# **Naive Bayes Classifier from scratch using Titanic dataset**
Understanding the concepts behind each statistical model is an important part of studying ML, yet a library such as Scikit-learn can feel like a black box to its users. The Naive Bayes method is easy to learn and fun to use. Here it is implemented from scratch in Python and applied to the Titanic survival prediction problem.

![Naive Bayes Classifier](Assets/NBC.jfif)

## What is Bayes Theorem?

Bayes' theorem is a mathematical formula for calculating conditional probability, named after the 18th-century British mathematician Thomas Bayes. Conditional probability is the probability of an outcome occurring given that another outcome has already occurred. Given new or additional evidence, Bayes' theorem can be used to revise existing predictions or theories (that is, to update probabilities). In finance, for example, it is used to assess the risk of lending money to prospective borrowers.


Bayes' theorem, also known as Bayes' Rule or Bayes' Law, is the cornerstone of Bayesian statistics.

It expresses the probability of an event based on prior knowledge of conditions that may be related to the event. Given that \\( x \\) has occurred, Bayes' theorem lets us calculate the probability of \\( y \\) occurring. Here \\( x \\) is the evidence, \\( P(y) \\) is the prior (what we know about \\( y \\) before seeing the evidence), \\( P(x|y) \\) is the likelihood, and \\( P(y|x) \\) is the posterior. In the Naive Bayes setting used here, the predictor variables are additionally assumed to be independent of each other given the class.

Its equation is as follows:

\\( P(y|x)= \dfrac{P(x|y)P(y)}{P(x)} \\)
where, 
* \\( y,x \\) = Events
* \\( P(y|x) \\) = Probability of \\( y \\) given \\( x \\)
* \\( P(x|y) \\) = Probability of \\( x \\) given \\( y \\)
* \\( P(y), P(x) \\) = Marginal probabilities of \\( y \\) and \\( x \\), each considered independently of the other
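
To make the formula concrete, here is a small worked example in Python. The numbers are made up for illustration and are not taken from the Titanic data; the function name `bayes_posterior` is likewise just a label chosen for this sketch. The evidence \\( P(x) \\) is obtained from the law of total probability.

```python
def bayes_posterior(likelihood, prior, evidence):
    """Return P(y|x) = P(x|y) * P(y) / P(x)."""
    return likelihood * prior / evidence

# Illustrative (made-up) numbers: y = "passenger survived", x = "passenger is female"
p_y = 0.4                # P(y): prior probability of survival
p_x_given_y = 0.7        # P(x|y): probability of being female among survivors
p_x_given_not_y = 0.15   # P(x|not y): probability of being female among non-survivors

# Law of total probability: P(x) = P(x|y)P(y) + P(x|not y)P(not y)
p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)

p_y_given_x = bayes_posterior(p_x_given_y, p_y, p_x)
print(round(p_y_given_x, 3))
```

With these inputs, knowing that the passenger is female raises the survival probability from the prior of 0.4 to roughly 0.76.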

Naive Bayes methods are a family of supervised learning algorithms based on Bayes' theorem and the "naive" assumption of conditional independence between every pair of features given the value of the class variable.

The variable \\( y \\) in our example is the class variable (Survived, 0 or 1), which indicates whether a passenger survived under the given circumstances. The features are represented by the variable \\( X \\), written as \\( X=(x_{1}, x_{2}, ..., x_{n}) \\), whose components map to Age, Class, Sex, and so on. Substituting \\( X \\) into Bayes' rule, expanding with the chain rule, and applying the independence assumption, we get:
        
\\( P(y|x_{1}, x_{2}, ..., x_{n})= \dfrac{P(x_{1}|y)P(x_{2}|y)...P(x_{n}|y)P(y)}{P(x_{1})P(x_{2})...P(x_{n})} \\)

By examining the dataset, you can now compute the value of each term and plug it into the equation. The class variable \\( y \\) has two possible outcomes in our case: 0 or 1. So, for each passenger, the score must be computed for both '*Survival*' and '*No survival*' (1 and 0, respectively). The denominator is the same for both classes, so it does not affect which score is larger. The predicted outcome is the class with the higher score. That is, the passenger is predicted to survive if \\( P(y=1|X) > P(y=0|X) \\).
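
The decision rule can be sketched in a few lines. The conditional probabilities below are the values this article obtains from the training data for a first-class female passenger in age group 4; the class priors (0.38 and 0.62) are rounded assumptions, since the priors are not printed in this article. The names `prior`, `likelihoods` and `scores` are chosen for this sketch only. The shared denominator is omitted because it does not change which class wins.

```python
import math

# Assumed (rounded) class priors: 1 = survived, 0 = died
prior = {1: 0.38, 0: 0.62}

# P(Pclass=1|y), P(Sex=female|y), P(Age group=4|y) for each class y
likelihoods = {
    1: [0.397661, 0.681287, 0.295322],
    0: [0.145719, 0.147541, 0.367942],
}

# Unnormalized score per class: prior times the product of the likelihoods
scores = {y: prior[y] * math.prod(likelihoods[y]) for y in prior}

# Predict the class with the highest score
prediction = max(scores, key=scores.get)
print(scores, prediction)
```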



In [1]:
import pandas as pd
import numpy as np


df_train=pd.read_csv('train.csv')
df_test=pd.read_csv('test.csv')

Let's do some basic feature engineering. For simplicity, we will use only 3 features (Class, Sex and Age):

In [2]:
gender={"male":0, "female":1}
df_train.Sex=[gender[item] for item in df_train.Sex]
df_test.Sex=[gender[item] for item in df_test.Sex]

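#Fill missing ages with the mean age (assign back, since chained inplace=True may not modify the DataFrame in newer pandas)
df_train['Age']=df_train['Age'].fillna(df_train['Age'].mean())
df_test['Age']=df_test['Age'].fillna(df_test['Age'].mean())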

df_train.Age=df_train.Age.astype(int)
df_test.Age=df_test.Age.astype(int)

#Ages grouped
data = [df_train, df_test]
for dataset in data:
    dataset.loc[ dataset['Age'] <= 11, 'Age'] = 0
    dataset.loc[(dataset['Age'] > 11) & (dataset['Age'] <= 18), 'Age'] = 1
    dataset.loc[(dataset['Age'] > 18) & (dataset['Age'] <= 22), 'Age'] = 2
    dataset.loc[(dataset['Age'] > 22) & (dataset['Age'] <= 27), 'Age'] = 3
    dataset.loc[(dataset['Age'] > 27) & (dataset['Age'] <= 33), 'Age'] = 4
    dataset.loc[(dataset['Age'] > 33) & (dataset['Age'] <= 40), 'Age'] = 5
    dataset.loc[(dataset['Age'] > 40) & (dataset['Age'] <= 66), 'Age'] = 6
    dataset.loc[ dataset['Age'] > 66, 'Age'] = 7
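
The chain of `.loc` assignments above works only because every assigned code (0 to 7) is at most 11, so a value that has already been binned can never match a later rule. As a possible alternative, `pd.cut` expresses the same grouping in one call. This is a sketch on a handful of illustrative ages, not the article's original code; `age_bins` and `age_groups` are names chosen here.

```python
import numpy as np
import pandas as pd

# Bin edges mirror the thresholds used above; intervals are right-closed: (low, high]
age_bins = [-np.inf, 11, 18, 22, 27, 33, 40, 66, np.inf]

ages = pd.Series([5, 11, 12, 18, 19, 30, 66, 67, 80])  # illustrative ages

# labels=False returns the integer bin index (0 to 7) instead of interval labels
age_groups = pd.cut(ages, bins=age_bins, labels=False)
print(age_groups.tolist())
```

Applied to the real data, this would be called once on the original `Age` column of each DataFrame rather than inside the loop of overwriting assignments.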

## The Machine Learning model construction starts here

With 3 features, the Bayes rule looks like the following equation:

\\( P(y|x_{1}, x_{2},x_{3})= \dfrac{P(x_{1}|y)P(x_{2}|y)P(x_{3}|y)P(y)}{P(x_{1})P(x_{2})P(x_{3})} \\)

where 

* \\( P(y) \\) = Probability of survival (for 0 and for 1), an array with 2 entries (denoted as p_y in the code)
* \\( P(x_{1}) \\) = Probability of Pclass, an array with 3 entries (denoted as p_Class in the code)
* \\( P(x_{2}) \\) = Probability of gender, an array with 2 entries (denoted as p_Sex in the code)
* \\( P(x_{3}) \\) = Probability of Age, an array with 8 entries (denoted as p_Age in the code)

and the conditional probabilities

*  \\( P(x_{1}|y) \\) = Probability of Pclass given survival (0 or 1)
*  \\( P(x_{2}|y) \\) = Probability of gender given survival (0 or 1)
*  \\( P(x_{3}|y) \\) =  Probability of Age given survival (0 or 1)

The probabilities are calculated below

In [3]:
#probabilities of the features
    
Class_counts=df_train['Pclass'].value_counts()  
p_Class=Class_counts/len(df_train)

Sex_counts=df_train['Sex'].value_counts()
p_Sex=Sex_counts/len(df_train)

Age_counts=df_train['Age'].value_counts()
p_Age=Age_counts/len(df_train)

# Survival and Death probabilities
y_counts=df_train['Survived'].value_counts()
p_y=y_counts/len(df_train)

df_survived=df_train.loc[df_train['Survived'] == 1]
df_died=df_train.loc[df_train['Survived'] == 0]
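
Dividing the counts by the number of rows can also be delegated to pandas: `value_counts(normalize=True)` returns relative frequencies directly. A minimal sketch on illustrative labels (the Series `survived` and the name `p_y_toy` are made up for this example):

```python
import pandas as pd

survived = pd.Series([0, 0, 1, 0, 1])  # illustrative labels, not the Titanic data

# Relative frequency of each class, equivalent to value_counts() / len(survived)
p_y_toy = survived.value_counts(normalize=True)
print(p_y_toy)
```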

Conditional Probabilities are calculated below

In [4]:
# Conditional probabilities
#class/survived
class_survived_counts=df_survived['Pclass'].value_counts()  
p_class_survived=class_survived_counts/len(df_survived)

# class/died
class_died_counts=df_died['Pclass'].value_counts()  
p_class_died=class_died_counts/len(df_died)

print("P Class Survived : \n", p_class_survived)
print("\nP Class Died : \n", p_class_died)

P Class Survived : 
 1    0.397661
3    0.347953
2    0.254386
Name: Pclass, dtype: float64

P Class Died : 
 3    0.677596
2    0.176685
1    0.145719
Name: Pclass, dtype: float64


In [5]:
#Age/survived
age_survived_counts=df_survived['Age'].value_counts()  
p_age_survived=age_survived_counts/len(df_survived)

age_died_counts=df_died['Age'].value_counts()  
p_age_died=age_died_counts/len(df_died)

print("P Age Survived : \n", p_age_survived)
print("\nP Age Died : \n", p_age_died)

P Age Survived : 
 4    0.295322
6    0.157895
5    0.131579
3    0.125731
0    0.114035
1    0.090643
2    0.081871
7    0.002924
Name: Age, dtype: float64

P Age Died : 
 4    0.367942
6    0.158470
2    0.116576
3    0.114754
5    0.105647
1    0.072860
0    0.052823
7    0.010929
Name: Age, dtype: float64


In [6]:
#sex/survived
sex_survived_counts=df_survived['Sex'].value_counts()  
p_sex_survived=sex_survived_counts/len(df_survived)

sex_died_counts=df_died['Sex'].value_counts()  
p_sex_died=sex_died_counts/len(df_died)

print("P Gender Survived : \n", p_sex_survived)
print("\nP Gender Died : \n", p_sex_died)

P Gender Survived : 
 1    0.681287
0    0.318713
Name: Sex, dtype: float64

P Gender Died : 
 0    0.852459
1    0.147541
Name: Sex, dtype: float64
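
The three cells above repeat the same pattern: filter by class, count, divide. One possible shortcut is `pd.crosstab` with `normalize="columns"`, which yields the whole table of \\( P(x|y) \\) for a feature at once. The sketch below uses a tiny made-up DataFrame named `toy`, not the Titanic data.

```python
import pandas as pd

toy = pd.DataFrame({
    "Pclass":   [1, 1, 2, 3, 3, 3],
    "Survived": [1, 1, 0, 0, 0, 1],
})

# normalize="columns" divides each column by its total, so every column sums to 1:
# column 1 holds P(Pclass | Survived=1), column 0 holds P(Pclass | Survived=0)
p_class_given_y = pd.crosstab(toy["Pclass"], toy["Survived"], normalize="columns")
print(p_class_given_y)

# Row label = Pclass value, column label = Survived value
print(p_class_given_y.loc[1, 1])  # P(Pclass=1 | Survived=1)
```

Unlike `value_counts`, the crosstab keeps combinations that never occur and reports them with probability 0 instead of leaving them out of the table.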


Now, let's define the Bayes function:

In [7]:
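def Bayes(py, px1y, px2y, px3y, px1, px2, px3):
    #Numerator: class prior times the three per-feature likelihoods
    numerator=px1y*px2y*px3y*py
    #Denominator: product of the feature probabilities (identical for both classes)
    denominator=px1*px2*px3
    p=numerator/denominator
    return p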
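
Because the denominator multiplies the individual feature probabilities, the two values returned by `Bayes` for a passenger are comparable with each other but need not sum to 1. If proper probabilities are wanted, one option is to divide each score by the sum of both. The helper below is a hypothetical addition for illustration, not part of the original code.

```python
def normalized_posteriors(score_survived, score_died):
    """Rescale two class scores so that they sum to 1."""
    total = score_survived + score_died
    return score_survived / total, score_died / total

# Illustrative unnormalized scores for one passenger
p1, p0 = normalized_posteriors(0.03, 0.01)
print(p1, p0)
```

The ranking of the two classes is unchanged by this step, so the predictions stay the same.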

The survival prediction for each passenger in the test set is computed below

In [8]:
result_array=[]

for i in range(len(df_test)):
    feature_class=df_test.iloc[i]['Pclass']
    feature_sex=df_test.iloc[i]['Sex']
    feature_age=df_test.iloc[i]['Age']
    
    P_Y1=Bayes(p_y[1], p_class_survived[feature_class], p_sex_survived[feature_sex], p_age_survived[feature_age], p_Class[feature_class], p_Sex[feature_sex], p_Age[feature_age])
    P_Y0=Bayes(p_y[0], p_class_died[feature_class], p_sex_died[feature_sex], p_age_died[feature_age], p_Class[feature_class], p_Sex[feature_sex], p_Age[feature_age])
    
    if P_Y0 > P_Y1:
        result=0
    else:
        result=1
        
    result_array.append(result)


output = pd.DataFrame({'PassengerId': df_test.PassengerId,'Survived': result_array})

print("Predicted survival for each passenger in the test set: \n" ,output)

Predicted survival for each passenger in the test set: 
      PassengerId  Survived
0            892         0
1            893         1
2            894         0
3            895         0
4            896         1
..           ...       ...
413         1305         0
414         1306         1
415         1307         0
416         1308         0
417         1309         0

[418 rows x 2 columns]
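
The row-by-row loop is easy to follow, but the same scores can be computed for all passengers at once with `Series.map`. The sketch below uses small hypothetical probability tables and only two features to stay short; the numbers are illustrative, and the names `prior`, `p_class_given`, `p_sex_given` and `test` are assumptions of this example rather than variables from the code above.

```python
import pandas as pd

# Hypothetical tables, keyed by class y (0 = died, 1 = survived)
prior = {0: 0.6, 1: 0.4}
p_class_given = {
    0: pd.Series({1: 0.15, 2: 0.18, 3: 0.67}),
    1: pd.Series({1: 0.40, 2: 0.25, 3: 0.35}),
}
p_sex_given = {
    0: pd.Series({0: 0.85, 1: 0.15}),
    1: pd.Series({0: 0.32, 1: 0.68}),
}

test = pd.DataFrame({"Pclass": [3, 1, 3], "Sex": [0, 1, 1]})

# One column of unnormalized scores per class; map() looks up each row's feature value
scores = pd.DataFrame({
    y: prior[y] * test["Pclass"].map(p_class_given[y]) * test["Sex"].map(p_sex_given[y])
    for y in (0, 1)
})

# The predicted class is the column label with the larger score in each row
predictions = scores.idxmax(axis=1)
print(predictions.tolist())
```

One difference from the loop: on an exact tie `idxmax` picks the first column (class 0), whereas the loop above predicts 1.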


### Advantages
- It is a straightforward method that is also quick to train and often accurate.
- The computational cost of Naive Bayes is quite low.
- It can handle a large dataset with ease.
- It works well with discrete (categorical) variables; continuous variables need either binning, as done for Age above, or an extra distributional assumption.
- It can be used to solve problems involving multiple classes.
- It also performs well on text analytics problems.
- When the independence assumption holds, a Naive Bayes classifier can perform as well as or better than models such as logistic regression.

### Disadvantages
- The assumption of independent features. In practice, finding a set of predictors that are completely independent is nearly impossible.

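- When a feature value never occurs together with a particular class in the training data, its estimated conditional probability is zero, which forces the whole posterior for that class to zero regardless of the other features. The model cannot make a sensible prediction in this scenario. This is known as the Zero Likelihood (or Zero Frequency) Problem.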
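
One common remedy, which the code in this article does not use, is additive (Laplace) smoothing: add a small constant `alpha` to every count so that no category ends up with probability zero. The sketch below is a minimal illustration on made-up data; the function name `smoothed_probabilities` and its parameters are hypothetical. Note also that `value_counts` leaves unseen values out of the table entirely, so reindexing over all known categories is needed before smoothing.

```python
import pandas as pd

def smoothed_probabilities(values, categories, alpha=1.0):
    """Estimate the probability of each category with additive (Laplace) smoothing."""
    categories = list(categories)
    # Include categories that never occur, with a count of 0
    counts = values.value_counts().reindex(categories, fill_value=0)
    return (counts + alpha) / (len(values) + alpha * len(categories))

# Illustrative age groups among survivors; group 7 never occurs in this sample
ages_survived = pd.Series([0, 4, 4, 6, 5, 3])
p_age = smoothed_probabilities(ages_survived, categories=range(8))
print(p_age)
```

Here the unseen age group 7 receives a small non-zero probability, and the smoothed probabilities still sum to 1.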

## Conclusion

Congratulations, we have successfully learnt to use Naive Bayes without any fancy libraries!

Here, we learned about the Naive Bayes algorithm, how it works, its independence assumption, its implementation, and its advantages and disadvantages. Along the way, you have also built a binary classifier and generated predictions without using scikit-learn.

Naive Bayes is one of the most straightforward classification algorithms, and it is surprisingly effective. Despite the significant advances in Machine Learning in recent years, it has proved its worth. It has been successfully deployed in many applications, from text analytics to recommendation engines.