# What is the Naive Bayes Classifier?
The Naive Bayes classifier is a simple probabilistic classifier that applies Bayes' theorem under a strong ("naive") assumption: the features are conditionally independent of one another given the class. It is fast to train, and this notebook uses it to predict passenger survival on the Titanic dataset.
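
As a sketch of what the independence assumption buys, the posterior for a class can be written as a product of per-feature terms. The symbols $y$ (the class) and $x_1, \dots, x_n$ (the features) are introduced here for illustration and do not appear elsewhere in this notebook:

```latex
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The predicted class is the $y$ that maximizes the right-hand side, so each feature only needs its own one-dimensional distribution per class.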

### Bayes' Theorem
Bayes' theorem provides a way to update the probability estimate for a hypothesis as more evidence or information becomes available. The formula is:

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
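
A small numeric sketch may help make the formula concrete. The helper name `posterior` and all probabilities below are made up for illustration; they are not computed from the Titanic data.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) given P(A), P(B|A) and P(B|not A)."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1 - prior_a))
    # Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
    return likelihood_b_given_a * prior_a / evidence


# Hypothetical numbers: A = "survived", B = "is female"
p_survived = 0.4
p_female_given_survived = 0.7
p_female_given_died = 0.15

p_survived_given_female = posterior(
    p_survived, p_female_given_survived, p_female_given_died
)
print(round(p_survived_given_female, 3))  # 0.28 / 0.37, about 0.757
```

With these assumed inputs, observing the evidence raises the probability of the hypothesis from the prior of 0.4 to roughly 0.757.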

In [50]:
import pandas as pd

In [51]:
df = pd.read_csv("Titanic-Dataset.csv")
df.head()

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


In [52]:
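# Fill missing ages with the median age. Assigning the result back avoids
# calling fillna(inplace=True) on a column selection, which newer pandas
# versions warn about and which may not modify df.
df['Age'] = df['Age'].fillna(df['Age'].median())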

In [53]:
df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age              0
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

In [54]:
dataframe = df[["Age", "Sex", "Fare"]]
dataframe

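|   | Age | Sex | Fare |
|---|---|---|---|
| 0 | 22.0 | male | 7.2500 |
| 1 | 38.0 | female | 71.2833 |
| 2 | 26.0 | female | 7.9250 |
| 3 | 35.0 | female | 53.1000 |
| 4 | 35.0 | male | 8.0500 |
| ... | ... | ... | ... |
| 886 | 27.0 | male | 13.0000 |
| 887 | 19.0 | female | 30.0000 |
| 888 | 28.0 | female | 23.4500 |
| 889 | 26.0 | male | 30.0000 |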


In [55]:
dataframe.isnull().sum()

Age     0
Sex     0
Fare    0
dtype: int64

In [56]:
pd.get_dummies(dataframe.Sex) 

|   | female | male |
|---|---|---|
| 0 | False | True |
| 1 | True | False |
| 2 | True | False |
| 3 | True | False |
| 4 | False | True |
| ... | ... | ... |
| 886 | False | True |
| 887 | True | False |
| 888 | True | False |
| 889 | False | True |


In [57]:
dummies = pd.get_dummies(dataframe['Sex'])
dummies = dummies.astype(int)  # convert the boolean dummy columns to 0/1 integers
dummies

|   | female | male |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 0 |
| 2 | 1 | 0 |
| 3 | 1 | 0 |
| 4 | 0 | 1 |
| ... | ... | ... |
| 886 | 0 | 1 |
| 887 | 1 | 0 |
| 888 | 1 | 0 |
| 889 | 0 | 1 |


In [58]:
merge = pd.concat([dataframe, dummies], axis='columns')

In [59]:
merge.head()

|   | Age | Sex | Fare | female | male |
|---|---|---|---|---|---|
| 0 | 22.0 | male | 7.25 | 0 | 1 |
| 1 | 38.0 | female | 71.2833 | 1 | 0 |
| 2 | 26.0 | female | 7.925 | 1 | 0 |
| 3 | 35.0 | female | 53.1 | 1 | 0 |
| 4 | 35.0 | male | 8.05 | 0 | 1 |


In [60]:
# The original text column is no longer needed once it is one-hot encoded
merge = merge.drop(["Sex"], axis='columns')

In [61]:
merge

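|   | Age | Fare | female | male |
|---|---|---|---|---|
| 0 | 22.0 | 7.2500 | 0 | 1 |
| 1 | 38.0 | 71.2833 | 1 | 0 |
| 2 | 26.0 | 7.9250 | 1 | 0 |
| 3 | 35.0 | 53.1000 | 1 | 0 |
| 4 | 35.0 | 8.0500 | 0 | 1 |
| ... | ... | ... | ... | ... |
| 886 | 27.0 | 13.0000 | 0 | 1 |
| 887 | 19.0 | 30.0000 | 1 | 0 |
| 888 | 28.0 | 23.4500 | 1 | 0 |
| 889 | 26.0 | 30.0000 | 0 | 1 |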


In [62]:
x = merge        # features: Age, Fare, female, male
y = df.Survived  # target: 1 = survived, 0 = did not survive

In [68]:
from sklearn.model_selection import train_test_split
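# Hold out 20% of the rows for testing; the split is random, so the score changes between runs
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)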

In [70]:
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()

In [71]:
model.fit(x_train,y_train)

In [72]:
model.score(x_test,y_test)

0.7932960893854749
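
For a classifier, `score` reports mean accuracy: the fraction of test rows whose predicted label matches the true label. The value above is consistent with 142 correct predictions out of 179 test rows. A minimal sketch of that calculation, using a hypothetical helper `accuracy` and made-up labels rather than the notebook's data:

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration: 4 of the 5 predictions match
toy_true = [0, 1, 1, 0, 1]
toy_pred = [0, 1, 0, 0, 1]
toy_accuracy = accuracy(toy_true, toy_pred)
print(toy_accuracy)  # 0.8
```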

In [73]:
model.predict(x_test)

array([0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0,
       1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0,
       0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1,
       0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
       0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1,
       0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0,
       0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1,
       1, 0, 0], dtype=int64)
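
To make concrete what a Gaussian Naive Bayes model estimates during `fit` and evaluates during `predict`, here is a minimal from-scratch sketch using only the standard library. It is not scikit-learn's implementation: the function names, the fixed `1e-9` variance floor, and the toy `[age, fare]` rows are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate the prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    params = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))  # one tuple of values per feature
        means = [sum(col) / n for col in columns]
        # Population variance plus a small floor to avoid division by zero
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        params[label] = (n / len(rows), means, variances)
    return params


def predict_gaussian_nb(params, row):
    """Return the class with the highest log-posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in params.items():
        # log P(y) + sum_i log N(x_i; mean_i, var_i), using the naive
        # assumption that features are independent given the class
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data: each row is [age, fare]; labels are 0 or 1
toy_x = [[22.0, 7.0], [25.0, 8.0], [30.0, 9.0],
         [35.0, 70.0], [38.0, 75.0], [40.0, 80.0]]
toy_y = [0, 0, 0, 1, 1, 1]

toy_params = fit_gaussian_nb(toy_x, toy_y)
prediction_low = predict_gaussian_nb(toy_params, [24.0, 7.5])
prediction_high = predict_gaussian_nb(toy_params, [37.0, 72.0])
print(prediction_low, prediction_high)  # 0 1
```

Working in log space avoids numerical underflow when many small per-feature probabilities are multiplied together.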