## Titanic dataset

The Titanic dataset is a classic dataset used in data analysis and machine learning. Its columns are:

1. **PassengerId**: A unique identifier for each passenger.
2. **Survived**: A binary variable indicating whether the passenger survived (1) or did not survive (0).
3. **Pclass**: The passenger's ticket class (1 = 1st, 2 = 2nd, 3 = 3rd).
4. **Name**: The passenger's name.
5. **Sex**: The passenger's gender (male or female).
6. **Age**: The passenger's age in years (missing for some passengers).
7. **SibSp**: The number of siblings or spouses aboard.
8. **Parch**: The number of parents or children aboard.
9. **Ticket**: The passenger's ticket number.
10. **Fare**: The fare paid for the ticket.
11. **Cabin**: The cabin number where the passenger stayed.
12. **Embarked**: The port at which the passenger boarded (C = Cherbourg, Q = Queenstown, S = Southampton).

Analysts and data scientists often use this dataset to explore relationships between these variables and to predict survival outcomes based on passenger information.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
df = pd.read_csv("/kaggle/input/titanic-dataset/Titanic-Dataset.csv")
df.head()

In [None]:
# Keep only Survived, Pclass, Sex, Age and Fare
df.drop(['PassengerId', 'Name', 'SibSp', 'Parch', 'Ticket', 'Cabin', 'Embarked'], axis='columns', inplace=True)
df.head()

In [None]:
y = df.Survived
X = df.drop('Survived',axis='columns')
X.head()

In [None]:
dummies = pd.get_dummies(X.Sex).astype(int)
dummies.head(3)

In [None]:
X = pd.concat([X,dummies],axis='columns')
X.head()

In [None]:
X.drop(['Sex'],axis=1,inplace=True)
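
The three encoding cells above (`get_dummies`, `concat`, `drop`) can probably be collapsed into a single call. The following is a minimal sketch on a toy frame, not the notebook's data: the values are made up, and `drop_first=True` is an assumption that a single indicator column is enough, since `female` and `male` are exact complements of each other.

```python
import pandas as pd

# Toy frame with the same column names as X above; the values are illustrative only.
toy = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.28, 7.92, 13.0],
})

# Passing columns=["Sex"] encodes that column and drops the original in one step.
# drop_first=True keeps a single indicator (Sex_male) instead of two complementary ones.
encoded = pd.get_dummies(toy, columns=["Sex"], drop_first=True, dtype=int)
print(encoded)
```

Keeping both indicator columns, as the notebook does, also works. With a naive Bayes model, though, two perfectly correlated features are counted as if they were independent evidence, which is one reason to consider dropping one of them.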

In [None]:
X

In [None]:
X.isnull().sum()

In [None]:
X.Age = X.Age.fillna(X.Age.mean())
X.head()

In [None]:
X.isnull().sum()
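
The cell above fills missing ages with the mean over all rows, including rows that later end up in the test split. One possible alternative is to compute the fill value from the training rows only. The sketch below uses two toy frames, `train` and `test`, which are hypothetical stand-ins and not variables from this notebook.

```python
import numpy as np
import pandas as pd

# Toy Age columns standing in for a train/test split; values are illustrative only.
train = pd.DataFrame({"Age": [22.0, np.nan, 38.0, 30.0]})
test = pd.DataFrame({"Age": [np.nan, 54.0]})

# Compute the fill value from the training rows only, then reuse it for both parts,
# so that no information from the test rows is used during preprocessing.
train_mean = train["Age"].mean()  # NaN values are skipped by default
train["Age"] = train["Age"].fillna(train_mean)
test["Age"] = test["Age"].fillna(train_mean)

print(train_mean)
print(test)
```

On a dataset of this size the difference between the two approaches is likely small, but the train-only version keeps the evaluation on the test split cleaner.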

In [None]:
type(X)

In [None]:
X.describe()

In [None]:
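# Take the labels from the counts themselves so they always match the slices
counts = y.value_counts()
plt.pie(counts, labels=counts.index, autopct='%1.1f%%')
plt.show()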

In [None]:
y.value_counts()
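
The class proportions shown in the pie chart can also be read off as numbers with `value_counts(normalize=True)`. A minimal sketch on a toy label series (`y_toy` is hypothetical and does not reflect the real class balance):

```python
import pandas as pd

# Toy labels, illustrative only: three non-survivors and two survivors.
y_toy = pd.Series([0, 0, 0, 1, 1], name="Survived")

# normalize=True returns fractions instead of raw counts.
balance = y_toy.value_counts(normalize=True)
print(balance)
```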

In [None]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=42)

In [None]:
print("Length of X_train - ",len(X_train))
print("Length of X_test - ",len(X_test))
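
Because the two classes are not equally frequent, a plain random split can give the train and test parts slightly different class ratios. `train_test_split` accepts a `stratify` argument that keeps the ratios aligned. The sketch below shows this on toy arrays (`X_toy` and `y_toy` are hypothetical, not the notebook's data).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples, 60% class 0 and 40% class 1 (illustrative only).
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

# stratify=y_toy asks the splitter to preserve the class proportions in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.5, random_state=42, stratify=y_toy
)

print("train class counts:", np.bincount(y_tr))
print("test class counts:", np.bincount(y_te))
```

Applied to the notebook, this would amount to adding `stratify=y` to the existing call. Whether it changes the scores noticeably here is not something this sketch establishes.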

In [None]:
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train,y_train)
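
To see what `GaussianNB` learns during `fit`, it can help to inspect a model trained on a tiny example. The sketch below uses toy one-feature data (hypothetical, not the Titanic features) and prints the class priors, the per-class feature means, and the predicted probabilities.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy one-feature data: class 0 clusters near 1, class 1 near 10 (illustrative only).
X_toy = np.array([[0.0], [1.0], [2.0], [9.0], [10.0], [11.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

clf = GaussianNB().fit(X_toy, y_toy)

print(clf.class_prior_)  # fraction of training samples per class
print(clf.theta_)        # per-class mean of each feature
print(clf.predict([[1.5], [9.5]]))
print(clf.predict_proba([[1.5], [9.5]]).round(3))
```

The model assumes each feature is normally distributed within a class and independent of the other features. That is a rough fit for columns such as `Pclass` or the 0/1 sex indicators, so the scores below are best read as a baseline.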

In [None]:
# Compare actual and predicted labels side by side
pd.DataFrame({'Actual': y_test, 'Predicted': model.predict(X_test)})

In [None]:
model.score(X_test,y_test)

In [None]:
from sklearn.metrics import accuracy_score,f1_score,confusion_matrix
y_pred = model.predict(X_test)
print("Accuracy Score - ",accuracy_score(y_test,y_pred)*100)
print("F1 Score - ",f1_score(y_test,y_pred)*100)
print("Confusion_matrix : \n",confusion_matrix(y_test,y_pred))
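
The accuracy and F1 score printed above can both be derived from the four cells of the confusion matrix. The sketch below works this through on toy labels (`y_true` and `y_hat` are made up and are not results from the Titanic model) and checks the hand-computed values against scikit-learn.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Toy labels, illustrative only (not results from the Titanic model).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_hat = [0, 0, 0, 1, 1, 1, 0, 0]

# For binary labels, ravel() returns the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, accuracy_score(y_true, y_hat))
print(f1, f1_score(y_true, y_hat))
```

In the matrix printed by the notebook, rows are the actual classes and columns are the predicted classes, so the top-left cell counts correctly predicted non-survivors and the bottom-right cell counts correctly predicted survivors.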