Titanic Survival Prediction using Naive Bayes

### *Importing basic Libraries*

In [1]:
import pandas as pd
import numpy as np

### *Load Dataset*

In [2]:
dataset = pd.read_csv('titanicsurvival.csv')

### *Summarize Dataset*

In [3]:
print(dataset.shape)
print(dataset.head(5))

(891, 5)
   Pclass  Gender   Age     Fare  Survived
0       3    male  22.0   7.2500         0
1       1  female  38.0  71.2833         1
2       3  female  26.0   7.9250         1
3       1  female  35.0  53.1000         1
4       3    male  35.0   8.0500         0
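Before any preprocessing, it can help to check column types, missing values, and class balance. The sketch below is only an illustration: `sample` is a toy frame assembled from rows printed in this notebook (rows 0-4 and row 888), not the full file. In the notebook itself the same calls would run on `dataset`.

```python
import numpy as np
import pandas as pd

# Toy frame built from rows shown in this notebook (rows 0-4 and row 888).
# `sample` is an illustrative stand-in for the full `dataset`.
sample = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 3],
    "Gender": ["male", "female", "female", "female", "male", "female"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, np.nan],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05, 23.45],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# Column types: Gender is still text at this point
print(sample.dtypes)

# Number of missing entries per column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Share of each outcome label (0 = did not survive, 1 = survived)
class_balance = sample["Survived"].value_counts(normalize=True)
print(class_balance)
```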


### *Mapping Text Data to Binary Value*

In [4]:
gender_labels = set(dataset['Gender'])  # distinct labels present before mapping
dataset['Gender'] = dataset['Gender'].map({'female': 0, 'male': 1}).astype(int)
print(dataset)

     Pclass  Gender   Age     Fare  Survived
0         3       1  22.0   7.2500         0
1         1       0  38.0  71.2833         1
2         3       0  26.0   7.9250         1
3         1       0  35.0  53.1000         1
4         3       1  35.0   8.0500         0
..      ...     ...   ...      ...       ...
886       2       1  27.0  13.0000         0
887       1       0  19.0  30.0000         1
888       3       0   NaN  23.4500         0
889       1       1  26.0  30.0000         1
890       3       1  32.0   7.7500         0

[891 rows x 5 columns]
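One thing to keep in mind with `Series.map` and a dictionary: any label that is not a key in the dictionary becomes `NaN`, and the subsequent `.astype(int)` then raises an error. The sketch below shows one way to spot such labels first. The `raw` series is hypothetical (the labels `'Female'` and `None` are invented for illustration and are not claimed to occur in the Titanic file).

```python
import pandas as pd

gender_codes = {"female": 0, "male": 1}

# Hypothetical column with one differently-cased label and one missing entry
raw = pd.Series(["male", "female", "Female", None], name="Gender")

# Labels without a dictionary entry are mapped to NaN
mapped = raw.map(gender_codes)
unmapped = raw[mapped.isna()]
print(unmapped)

# Normalising case and whitespace first recovers 'Female';
# the truly missing entry still has to be handled separately
normalised = raw.str.strip().str.lower().map(gender_codes)
print(normalised)
```

Checking `unmapped` before calling `.astype(int)` makes the failure mode explicit instead of surfacing as a conversion error.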


### *Segregate Dataset into X (Input/Independent Variable) & Y (Output/Dependent Variable)*

In [5]:
X = dataset.drop('Survived',axis='columns')
X

     Pclass  Gender   Age     Fare
0         3       1  22.0   7.2500
1         1       0  38.0  71.2833
2         3       0  26.0   7.9250
3         1       0  35.0  53.1000
4         3       1  35.0   8.0500
..      ...     ...   ...      ...
886       2       1  27.0  13.0000
887       1       0  19.0  30.0000
888       3       0   NaN  23.4500
889       1       1  26.0  30.0000


In [6]:
Y = dataset.Survived
Y

     Survived
0           0
1           1
2           1
3           1
4           0
..        ...
886         0
887         1
888         0
889         1


### *Finding & Filling NA Values in our Features X*

In [7]:
X.columns[X.isna().any()]

Index(['Age'], dtype='object')

In [8]:
X.Age = X.Age.fillna(X.Age.mean())

### *Check Again for Any Remaining NA Values*

In [9]:
X.columns[X.isna().any()]

Index([], dtype='object')

### *Splitting Dataset into Train & Test*

In [10]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)
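In this notebook the mean `Age` is computed over all rows before the split, so the held-out rows contribute to the fill value. A common alternative is to split first and learn the fill value from the training rows only. The sketch below shows that variant with scikit-learn's `SimpleImputer`. It is not what the notebook does, and `X_toy` / `y_toy` are hypothetical toy data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical toy features with two gaps in Age
X_toy = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 2, 1, 3],
    "Gender": [1, 0, 0, 0, 1, 1, 0, 1],
    "Age": [22.0, 38.0, np.nan, 35.0, 35.0, 27.0, np.nan, 32.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05, 13.0, 30.0, 7.75],
})
y_toy = pd.Series([0, 1, 1, 1, 0, 0, 1, 0])

# Split first, so the test rows stay unseen during preprocessing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0
)

# Learn the column means from the training rows only ...
imputer = SimpleImputer(strategy="mean")
X_tr_filled = pd.DataFrame(
    imputer.fit_transform(X_tr), columns=X_tr.columns, index=X_tr.index
)
# ... and reuse those same means to fill the test rows
X_te_filled = pd.DataFrame(
    imputer.transform(X_te), columns=X_te.columns, index=X_te.index
)
```

On a dataset of this size the difference in the fill value is usually small, but the split-then-fit order keeps the test score an honest estimate.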

### *Training*

In [11]:
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train, y_train)
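To make the training step less of a black box, the sketch below recomputes a Gaussian Naive Bayes prediction by hand: per class, it estimates a prior and a per-feature mean and variance, then scores each row with the sum of Gaussian log-densities plus the log prior. The toy rows are taken from the rows printed earlier in this notebook. The names `X_demo`, `y_demo`, and `nb` are illustrative, and the small variance offset mirrors `GaussianNB`'s default `var_smoothing=1e-9`.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Columns: Pclass, Gender, Age, Fare (rows 0-4, 886, 887, 889, 890 shown above)
X_demo = np.array([
    [3, 1, 22.0, 7.25],
    [1, 0, 38.0, 71.2833],
    [3, 0, 26.0, 7.925],
    [1, 0, 35.0, 53.1],
    [3, 1, 35.0, 8.05],
    [2, 1, 27.0, 13.0],
    [1, 0, 19.0, 30.0],
    [1, 1, 26.0, 30.0],
    [3, 1, 32.0, 7.75],
])
y_demo = np.array([0, 1, 1, 1, 0, 0, 1, 1, 0])

classes = np.unique(y_demo)
# Small offset added to every variance for numerical stability
eps = 1e-9 * X_demo.var(axis=0).max()

log_joint = np.empty((X_demo.shape[0], classes.size))
for k, c in enumerate(classes):
    Xc = X_demo[y_demo == c]
    prior = Xc.shape[0] / X_demo.shape[0]   # P(class)
    mu = Xc.mean(axis=0)                    # per-feature mean
    var = Xc.var(axis=0) + eps              # per-feature variance
    # Sum of independent Gaussian log-densities over the four features
    log_lik = -0.5 * np.sum(
        np.log(2 * np.pi * var) + (X_demo - mu) ** 2 / var, axis=1
    )
    log_joint[:, k] = np.log(prior) + log_lik

# Pick the class with the highest log posterior (up to a constant)
manual_pred = classes[np.argmax(log_joint, axis=1)]

nb = GaussianNB().fit(X_demo, y_demo)
print(manual_pred)
print(nb.predict(X_demo))
```

The "naive" part is the sum over features: each feature is treated as independent given the class, which is why only per-feature means and variances are needed.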

### *Predicting Whether a Person Survived or Not*

In [15]:
# Collect one passenger's features in the same column order used for training
pclassNo = int(input("Enter Person's Pclass number: "))
gender = int(input("Enter Person's Gender 0-female 1-male(0 or 1): "))
age = float(input("Enter Person's Age: "))  # Age is stored as a float in the dataset
fare = float(input("Enter Person's Fare: "))
person = [[pclassNo, gender, age, fare]]
result = model.predict(person)  # array holding a single 0/1 label
print(result)

if result[0] == 1:
    print("Person might have Survived")
else:
    print("Person might not have Survived")

Enter Person's Pclass number: 14
Enter Person's Gender 0-female 1-male(0 or 1): 412
Enter Person's Age: 124
Enter Person's Fare: 124
[1]
Person might have Survived
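The sample run above enters a Pclass of 14, a Gender of 412, and an Age of 124, all outside the values the model was trained on, and the model still returns a label. A small guard in front of `model.predict` can reject such input. The helper below is a hypothetical sketch: the name `validate_passenger` and the age limit of 120 are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd

FEATURES = ["Pclass", "Gender", "Age", "Fare"]

def validate_passenger(pclass, gender, age, fare):
    """Return a one-row DataFrame in training column order, or raise ValueError.

    Hypothetical helper; the accepted ranges are assumptions.
    """
    if pclass not in (1, 2, 3):
        raise ValueError(f"Pclass must be 1, 2 or 3, got {pclass}")
    if gender not in (0, 1):
        raise ValueError(f"Gender must be 0 (female) or 1 (male), got {gender}")
    if not 0 <= age <= 120:  # assumed plausible upper bound
        raise ValueError(f"Age must be between 0 and 120, got {age}")
    if fare < 0:
        raise ValueError(f"Fare must be non-negative, got {fare}")
    return pd.DataFrame([[pclass, gender, age, fare]], columns=FEATURES)

# A valid passenger passes through unchanged
row = validate_passenger(3, 1, 22.0, 7.25)
print(row)

# The values from the sample run are rejected before reaching the model
try:
    validate_passenger(14, 412, 124, 124)
except ValueError as err:
    print("Rejected:", err)
```

Returning a DataFrame with the training column names also lets the row be passed to `model.predict` with the same feature names the model was fitted on.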




### *Prediction for all Test Data*

In [16]:
y_pred = model.predict(X_test)
print(np.column_stack((y_pred,y_test)))

[[0 0]
 [0 0]
 [0 0]
 [1 1]
 [1 1]
 [0 1]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [0 0]
 [1 1]
 [0 0]
 [1 1]
 [1 1]
 [1 0]
 [0 0]
 [0 0]
 [0 0]
 [0 1]
 [0 0]
 [1 1]
 [0 0]
 [0 0]
 [1 0]
 [1 1]
 [0 0]
 [1 1]
 [1 1]
 [1 0]
 [0 0]
 [1 1]
 [0 0]
 [0 1]
 [0 0]
 [0 1]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [1 1]
 [0 0]
 [0 0]
 [0 0]
 [1 1]
 [0 0]
 [0 0]
 [1 1]
 [0 0]
 [1 0]
 [0 1]
 [0 1]
 [1 1]
 [0 0]
 [0 1]
 [0 0]
 [0 0]
 [1 0]
 [0 0]
 [0 1]
 [0 0]
 [1 0]
 [1 1]
 [0 0]
 [1 1]
 [0 0]
 [1 1]
 [0 0]
 [1 1]
 [1 1]
 [1 1]
 [0 1]
 [1 0]
 [0 0]
 [0 0]
 [1 1]
 [1 0]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [0 1]
 [1 0]
 [0 0]
 [0 0]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [1 0]
 [0 0]
 [0 0]
 [0 1]
 [1 1]
 [1 0]
 [0 0]
 [1 1]
 [0 0]
 [0 0]
 [1 1]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [1 0]
 [1 1]
 [1 1]
 [1 0]
 [0 0]
 [1 1]
 [0 0]
 [1 1]
 [0 1]
 [1 0]
 [1 1]
 [1 1]
 [1 1]
 [1 1]
 [0 0]
 [1 1]
 [0 1]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [0 1]
 [0 0]
 [0 0]
 [1 0]
 [0 0]
 [0 0]
 [1 0]
 [0 0]
 [0 0]
 [0 0]
 [1 0]
 [0 0]
 [0 0]
 [0 0]
 [1 1]
 [0 0]

### *Accuracy of our Model*

In [17]:
from sklearn.metrics import accuracy_score
print("Accuracy of the Model: {0}%".format(accuracy_score(y_test, y_pred)*100))

Accuracy of the Model: 77.57847533632287%
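A single accuracy figure does not show which kind of mistake the model makes. A confusion matrix separates missed survivors from false alarms. The sketch below uses only the first ten (predicted, actual) pairs printed in the test-data output above, so its numbers describe that small excerpt and not the full test set. The names `y_pred_head` and `y_true_head` are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# First ten rows of the printed (y_pred, y_test) pairs
y_pred_head = np.array([0, 0, 0, 1, 1, 0, 1, 1, 1, 1])
y_true_head = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])

# Rows are actual labels, columns are predicted labels (order: 0, 1)
cm = confusion_matrix(y_true_head, y_pred_head)
print(cm)

# Unpack the four cells for a binary problem
tn, fp, fn, tp = cm.ravel()
print("Survivors missed by the model:", fn)
print("Non-survivors predicted as survivors:", fp)

excerpt_accuracy = accuracy_score(y_true_head, y_pred_head)
print("Accuracy on this excerpt:", excerpt_accuracy)
```

In the notebook, the same two calls would take `y_test` and `y_pred` directly.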
