# Titanic Survival Prediction — Naive Bayes Case Study
This notebook builds a Gaussian Naive Bayes classifier that predicts Titanic passenger survival, covering preprocessing, training, and evaluation.

## Step 1 — Upload Dataset
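Upload the Titanic dataset CSV downloaded from Kaggle. The file must include the `Survived` column, which is used as the target (in the Kaggle competition data, that file is `train.csv`).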

In [None]:
from google.colab import files
uploaded = files.upload()

## Step 2 — Import Libraries

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

## Step 3 — Load Dataset

In [None]:
df = pd.read_csv(list(uploaded.keys())[0])
df.head()

## Step 4 — Explore Dataset

In [None]:
df.info()

In [None]:
df.isnull().sum()

## Step 5 — Data Cleaning

In [None]:
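# Fill missing Age values with the median age
df['Age'] = df['Age'].fillna(df['Age'].median())

# Fill missing Embarked values with the most frequent port
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])

# Drop Cabin (too many missing values) and the identifier-like columns
# Name, Ticket, and PassengerId, which are not used as features here
df = df.drop(columns=['Cabin', 'Name', 'Ticket', 'PassengerId'])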

df.isnull().sum()
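
As a quick check of the cleaning logic, the sketch below applies the same fill-and-drop steps to a tiny hypothetical frame. The `toy` DataFrame and its values are invented for illustration and are not taken from the Titanic file. It assigns each filled column back to the frame, a form that behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame standing in for the Titanic data.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 38.0],
    "Embarked": ["S", None, "S"],
    "Cabin": [None, "C85", None],
})

# Median of the observed ages (22 and 38) fills the missing age.
toy["Age"] = toy["Age"].fillna(toy["Age"].median())

# The most frequent port fills the missing Embarked value.
toy["Embarked"] = toy["Embarked"].fillna(toy["Embarked"].mode()[0])

# Cabin is mostly empty in this toy frame, so it is dropped.
toy = toy.drop(columns=["Cabin"])

# Total number of missing cells left after cleaning.
remaining_missing = int(toy.isnull().sum().sum())
print(toy)
print("Missing cells:", remaining_missing)
```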

## Step 6 — Encode Categorical Variables

In [None]:
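# LabelEncoder assigns integer codes in sorted label order
# (for example, female -> 0, male -> 1). Using one encoder per column
# keeps each column's mapping available afterwards.
sex_encoder = LabelEncoder()
embarked_encoder = LabelEncoder()

df['Sex'] = sex_encoder.fit_transform(df['Sex'])
df['Embarked'] = embarked_encoder.fit_transform(df['Embarked'])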

df.head()
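
Integer codes put the categories on a numeric scale, and a Gaussian model will treat that scale as meaningful. The sketch below shows the mapping `LabelEncoder` produces on a small hypothetical `ports` series and, for comparison, one common alternative: one-hot columns from `pd.get_dummies`. The series and variable names are assumptions made for illustration. Whether one-hot encoding helps on this dataset is something to test, not something this sketch establishes.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of embarkation ports.
ports = pd.Series(["S", "C", "Q", "S"], name="Embarked")

# LabelEncoder sorts the distinct labels and numbers them from 0.
encoder = LabelEncoder()
codes = encoder.fit_transform(ports)
mapping = dict(zip(encoder.classes_, range(len(encoder.classes_))))

# One-hot alternative: one indicator column per port, no implied order.
one_hot = pd.get_dummies(ports, prefix="Embarked")

print("Label mapping:", mapping)
print(one_hot)
```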

## Step 7 — Split Data

In [None]:
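# Separate the features from the target column
X = df.drop('Survived', axis=1)
y = df['Survived']

# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)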
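
When the two classes are not equally frequent, a purely random split can leave the test set with a different class balance than the training set. A minimal sketch of one option, passing `stratify` to `train_test_split`, is shown below on hypothetical data (`toy_X` and `toy_y` are invented, with 20 positives out of 50 rows).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 50 rows, 20 of them labeled 1 (a 40% positive rate).
toy_X = pd.DataFrame({"feature": np.arange(50)})
toy_y = pd.Series([1] * 20 + [0] * 30, name="Survived")

# stratify=toy_y keeps the class proportions the same in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=42, stratify=toy_y
)

train_rate = y_tr.mean()
test_rate = y_te.mean()
print("Positive rate (train):", train_rate)
print("Positive rate (test):", test_rate)
```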

## Step 8 — Train Naive Bayes Model

In [None]:
model = GaussianNB()
model.fit(X_train,y_train)
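
To make the fitting step less opaque, the sketch below recomputes Gaussian Naive Bayes posteriors by hand: a per-class prior, a per-class mean and variance for every feature, and a product of independent normal densities. The toy arrays and the helper `gaussian_nb_posterior` are hypothetical, written only for this illustration. The result is compared with scikit-learn's `GaussianNB`, which additionally adds a very small smoothing term to the variances, so the two agree only up to a small tolerance.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: two numeric features (think Age and Fare), two classes.
toy_features = np.array([[22.0, 7.0], [30.0, 8.0], [26.0, 15.0],
                         [35.0, 20.0], [28.0, 30.0], [40.0, 50.0]])
toy_labels = np.array([0, 0, 0, 1, 1, 1])


def gaussian_nb_posterior(X_fit, y_fit, x_new):
    """Return class posteriors for x_new under a Gaussian Naive Bayes model."""
    log_joint = []
    for c in np.unique(y_fit):
        rows = X_fit[y_fit == c]
        prior = len(rows) / len(X_fit)
        mean = rows.mean(axis=0)
        var = rows.var(axis=0)  # population variance per feature
        # Sum of per-feature log densities: the "naive" independence assumption.
        log_like = -0.5 * np.sum(np.log(2 * np.pi * var) + (x_new - mean) ** 2 / var)
        log_joint.append(np.log(prior) + log_like)
    log_joint = np.array(log_joint)
    # Subtract the maximum before exponentiating for numerical stability.
    unnormalized = np.exp(log_joint - log_joint.max())
    return unnormalized / unnormalized.sum()


x_new = np.array([29.0, 16.0])
manual = gaussian_nb_posterior(toy_features, toy_labels, x_new)
reference = GaussianNB().fit(toy_features, toy_labels).predict_proba(x_new.reshape(1, -1))[0]

print("Manual posterior:   ", manual)
print("sklearn posterior:  ", reference)
```

The independence assumption is visible in the single `np.sum` over features: correlated features each contribute as if they carried separate evidence.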

## Step 9 — Predictions

In [None]:
y_pred = model.predict(X_test)

## Step 10 — Evaluation Metrics

In [None]:
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
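
The numbers in the report all derive from the confusion matrix. The sketch below works through that arithmetic on ten hypothetical labels (`actual` and `predicted` are invented, not model output): accuracy is the diagonal sum over the total, recall divides the diagonal by row sums (actual counts), and precision divides it by column sums (predicted counts).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions for ten passengers.
actual = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
predicted = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(actual, predicted)

accuracy = np.trace(cm) / cm.sum()
recall_per_class = np.diag(cm) / cm.sum(axis=1)
precision_per_class = np.diag(cm) / cm.sum(axis=0)

print(cm)
print("Accuracy:", accuracy)
print("Recall per class:", recall_per_class)
print("Precision per class:", precision_per_class)
```

Comparing the per-class recall values is one direct way to answer which class the model predicts better.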

## Step 11 — Visualization of Confusion Matrix

In [None]:
cm = confusion_matrix(y_test, y_pred)

# Annotated heatmap: rows are actual classes, columns are predicted classes
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

## Step 12 — Analysis & Insights
**Write your own analysis here.**

Suggested points:
- Which class was predicted better?
- Did preprocessing improve accuracy?
- Why does Naive Bayes work well for this dataset?
- What are the limitations of the model?