Gaussian Naive Bayes Classifier From Scratch

In this notebook I build a Gaussian Naive Bayes classifier from scratch, using the Iris dataset. The dataset contains continuous attribute values, and the assumption is that the values within each class follow a Gaussian (normal) distribution. At the end I use the classifier to predict the class of a previously unseen instance.

Steps:

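1. Segment the data by class.
2. Compute the mean and standard deviation of each feature within each class.
3. Compute the likelihood of each feature value xi given a class, using the Gaussian probability density function.
4. Multiply the likelihoods by the class prior to get the unnormalized posterior of each class, and select the class with the highest value (Maximum A Posteriori).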

In [2]:
# Import libraries
import pandas as pd
import numpy as np
from sklearn import datasets

In [3]:
# Import dataset
iris = datasets.load_iris() 

In [5]:
# Create dataframe
df=pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                     columns= iris['feature_names'] + ['target'])

In [6]:
# Exploratory data analysis
df.head()

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,0.0
1,4.9,3.0,1.4,0.2,0.0
2,4.7,3.2,1.3,0.2,0.0
3,4.6,3.1,1.5,0.2,0.0
4,5.0,3.6,1.4,0.2,0.0


In [16]:
df.describe()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,target
count,150.0,150.0,150.0,150.0,150.0
mean,5.843333,3.057333,3.758,1.199333,1.0
std,0.828066,0.435866,1.765298,0.762238,0.819232
min,4.3,2.0,1.0,0.1,0.0
25%,5.1,2.8,1.6,0.3,0.0
50%,5.8,3.0,4.35,1.3,1.0
75%,6.4,3.3,5.1,1.8,2.0
max,7.9,4.4,6.9,2.5,2.0


In [7]:
# To check the number of targets
df['target'].unique()  

array([0., 1., 2.])

In [8]:
# To check the balance of dataset
df.groupby('target').size()

target
0.0    50
1.0    50
2.0    50
dtype: int64

In [17]:
# Rename column names
df=df.rename({'sepal length (cm)':'sepal_length', 'sepal width (cm)':'sepal_width', 'petal length (cm)':'petal_length', 'petal width (cm)':'petal_width'}, axis=1)

In [11]:
df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,target
0,5.1,3.5,1.4,0.2,0.0
1,4.9,3.0,1.4,0.2,0.0
2,4.7,3.2,1.3,0.2,0.0
3,4.6,3.1,1.5,0.2,0.0
4,5.0,3.6,1.4,0.2,0.0


## Naive Bayes algorithm intuition

Bayes' theorem lets us update the probability of a hypothesis in light of observed data. A Naive Bayes classifier uses it to estimate, for each class, the probability that a given data point belongs to that class. The class with the highest probability is taken as the prediction. Here is the classic form of Bayes' theorem:

### P(A∣B)= P(B∣A) * P(A) / P(B)
In words:
### P(class∣data) = P(data∣class) * P(class) / P(data)
                
class - a particular class (in our example 0, 1 or 2)

data - the feature values of a single observation

p(class∣data) - the posterior. This is what we are looking for.

p(data∣class) - the likelihood. For continuous data, as here, we compute it from a probability density function, once for every feature in the dataset. The "Gaussian" and "naive" in the name come from two assumptions built into this likelihood:

#### 1. Each feature is assumed to be conditionally independent of the other features given the class. This rarely holds in practice, which is why the assumption is called "naive" and the method "naive Bayes".
#### 2. The values of each feature (e.g. petal_length) are assumed to be normally (Gaussian) distributed within each class. This means that p(x∣class) for a single feature value x is computed by plugging the class mean and standard deviation of that feature into the probability density function of the normal distribution:
#### p(x∣class) = 1 / sqrt(2π * std^2) * exp(-(x - mean)^2 / (2 * std^2))

p(class) - the prior. This is the number of instances belonging to a particular class divided by the total number of instances in the dataset.

p(data) - the marginal probability (evidence). It is the same for every class, so it does not affect which class scores highest and can be ignored.
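
To see why p(data) can be ignored, here is a small sketch with made-up likelihoods and priors (the numbers are illustrative assumptions, not taken from the Iris data). Dividing every class score by the same p(data) rescales the scores so that they sum to 1, but it does not change which class is largest.

```python
import numpy as np

# Hypothetical values for three classes (illustrative only)
likelihood = np.array([0.20, 0.05, 0.01])   # p(data | class)
prior = np.array([0.5, 0.3, 0.2])           # p(class)

# Numerator of Bayes' theorem for each class
numerator = likelihood * prior

# p(data) is the sum of the numerators over all classes
evidence = numerator.sum()
posterior = numerator / evidence

print(numerator)
print(posterior)
# The winning class is the same with or without the division
print(np.argmax(numerator) == np.argmax(posterior))
```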


## In a Bayes classifier, we compute the numerator of the posterior for every class for each observation and pick the class with the largest value. This is known as the Maximum A Posteriori (MAP) decision rule.

Example from our dataset, for class 0 (∝ means "proportional to", because P(data) has been dropped):

P(class=0∣data) ∝ P(sepal_length∣class=0) * P(sepal_width∣class=0) * P(petal_length∣class=0) * P(petal_width∣class=0) * P(class=0)
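
Written compactly, the same rule for any class can be stated as follows. The symbols are introduced here for illustration: c ranges over the classes, x_1, ..., x_n are the feature values of one observation, and ŷ is the predicted class.

```latex
\hat{y} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} p(x_i \mid c)
```

For the Iris data, n = 4 and c takes the values 0, 1 and 2.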


In [12]:
# 1. Segment the data by class and compute the priors P(class)

df.groupby('target').size().to_numpy()

array([50, 50, 50])

In [13]:
total=len(df['target'])

In [14]:
# Count the instances per class once, then divide by the total ("klasa" = class)
class_counts=df.groupby('target').size().to_numpy()
Prior_klasa0=class_counts[0]/total
Prior_klasa1=class_counts[1]/total
Prior_klasa2=class_counts[2]/total

In [18]:
print('Prior probability of class 0:', Prior_klasa0)
print('Prior probability of class 1:', Prior_klasa1)
print('Prior probability of class 2:', Prior_klasa2)

Prior probability of class 0: 0.3333333333333333
Prior probability of class 1: 0.3333333333333333
Prior probability of class 2: 0.3333333333333333
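
The three priors can also be computed in one step. The sketch below uses a small stand-in Series named `target` (an assumption for illustration; in the notebook it would be `df['target']`) and pandas' `value_counts(normalize=True)`, which returns relative frequencies instead of counts.

```python
import pandas as pd

# Stand-in for df['target']: 50 instances of each class, as in the Iris dataset
target = pd.Series([0.0] * 50 + [1.0] * 50 + [2.0] * 50, name='target')

# Relative frequency of each class = prior P(class); sort_index orders by class label
priors = target.value_counts(normalize=True).sort_index()
print(priors)
```

Indexing the result by class label, for example `priors.loc[0.0]`, gives the same value as `Prior_klasa0`.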


In [19]:
# 2.Compute the mean and std for each class and feature combination:

sepal_length_mean=df['sepal_length'].groupby(df['target']).mean()
sepal_width_mean=df['sepal_width'].groupby(df['target']).mean()
petal_length_mean=df['petal_length'].groupby(df['target']).mean()
petal_width_mean=df['petal_width'].groupby(df['target']).mean()

In [20]:
print('Sepal_length_mean for 0,1 and 2 class:', sepal_length_mean)
print('Sepal_width_mean for 0,1 and 2 class:', sepal_width_mean)
print('Petal_length_mean for 0,1 and 2 class:', petal_length_mean)
print('Petal_width_mean for 0,1 and 2 class:', petal_width_mean)

Sepal_length_mean for 0,1 and 2 class: target
0.0    5.006
1.0    5.936
2.0    6.588
Name: sepal_length, dtype: float64
Sepal_width_mean for 0,1 and 2 class: target
0.0    3.428
1.0    2.770
2.0    2.974
Name: sepal_width, dtype: float64
Petal_length_mean for 0,1 and 2 class: target
0.0    1.462
1.0    4.260
2.0    5.552
Name: petal_length, dtype: float64
Petal_width_mean for 0,1 and 2 class: target
0.0    0.246
1.0    1.326
2.0    2.026
Name: petal_width, dtype: float64


In [21]:
sepal_length_mean.to_numpy()

array([5.006, 5.936, 6.588])

In [22]:
sepal_width_mean.to_numpy()

array([3.428, 2.77 , 2.974])

In [23]:
petal_length_mean.to_numpy()

array([1.462, 4.26 , 5.552])

In [24]:
petal_width_mean.to_numpy()

array([0.246, 1.326, 2.026])

In [25]:
sepal_length_std=df['sepal_length'].groupby(df['target']).std()
sepal_width_std=df['sepal_width'].groupby(df['target']).std()
petal_length_std=df['petal_length'].groupby(df['target']).std()
petal_width_std=df['petal_width'].groupby(df['target']).std()

In [26]:
print('Sepal_length_std for 0,1 and 2 class:', sepal_length_std)
print('Sepal_width_std for 0,1 and 2 class:', sepal_width_std)
print('Petal_length_std for 0,1 and 2 class:', petal_length_std)
print('Petal_width_std for 0,1 and 2 class:', petal_width_std)

Sepal_length_std for 0,1 and 2 class: target
0.0    0.352490
1.0    0.516171
2.0    0.635880
Name: sepal_length, dtype: float64
Sepal_width_std for 0,1 and 2 class: target
0.0    0.379064
1.0    0.313798
2.0    0.322497
Name: sepal_width, dtype: float64
Petal_length_std for 0,1 and 2 class: target
0.0    0.173664
1.0    0.469911
2.0    0.551895
Name: petal_length, dtype: float64
Petal_width_std for 0,1 and 2 class: target
0.0    0.105386
1.0    0.197753
2.0    0.274650
Name: petal_width, dtype: float64


In [27]:
sepal_length_std.to_numpy()

array([0.35248969, 0.51617115, 0.63587959])

In [28]:
sepal_width_std.to_numpy()

array([0.37906437, 0.31379832, 0.32249664])

In [29]:
petal_length_std.to_numpy()

array([0.173664  , 0.46991098, 0.5518947 ])

In [30]:
petal_width_std.to_numpy()

array([0.10538559, 0.19775268, 0.27465006])
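
The per-class means and standard deviations of all features can also be gathered in a single table. This is a sketch on a tiny made-up frame `toy` (hypothetical data, not the Iris values) using `groupby(...).agg(['mean', 'std'])`. Note that pandas' `std` computes the sample standard deviation (ddof=1) by default, so the results can differ slightly from implementations that use the population variance (ddof=0).

```python
import pandas as pd

# Tiny hypothetical dataset with two features and two classes
toy = pd.DataFrame({
    'petal_length': [1.4, 1.3, 1.5, 4.7, 4.5, 4.9],
    'petal_width':  [0.2, 0.2, 0.3, 1.4, 1.5, 1.5],
    'target':       [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
})

# One row per class; columns are (feature, statistic) pairs
stats = toy.groupby('target').agg(['mean', 'std'])
print(stats)

# Look up a single parameter, e.g. the mean petal_length of class 1
mu = stats.loc[1.0, ('petal_length', 'mean')]
print(mu)
```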

In [31]:
# 3. Compute the likelihood P(B|A) (p_x_given_class) with the probability density function of the normal distribution:
# P(x) = 1 / (std * sqrt(2*PI)) * exp(-(x - mean)^2 / (2 * std^2))

In [32]:
def p_x_given_y(x, mean, std):
    """Gaussian probability density of x for the given mean and standard deviation."""
    # Normalizing constant multiplied by the exponential term of the normal PDF
    p = 1/(np.sqrt(2*np.pi*std**2)) * np.exp((-(x-mean)**2)/(2*std**2))
    return p
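
As a quick sanity check of the density function, the sketch below redefines it under the name `gaussian_pdf` (a stand-in for `p_x_given_y`, so that the block runs on its own) and verifies numerically that it integrates to roughly 1. It also shows that a density value can exceed 1 when the standard deviation is small, so individual likelihood values should not be read as probabilities. The mean and standard deviation used are the class 0 petal_width values printed earlier.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    # Same formula as p_x_given_y: normal probability density at x
    return 1 / np.sqrt(2 * np.pi * std**2) * np.exp(-(x - mean)**2 / (2 * std**2))

mean, std = 0.246, 0.10538559   # petal_width, class 0

# Riemann-sum approximation of the integral over mean +/- 8 standard deviations
xs = np.linspace(mean - 8 * std, mean + 8 * std, 20001)
dx = xs[1] - xs[0]
area = np.sum(gaussian_pdf(xs, mean, std)) * dx
print(area)

# The density at the mean equals 1 / (std * sqrt(2*pi)), which is greater than 1 here
peak = gaussian_pdf(mean, mean, std)
print(peak)
```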

In [48]:
# Compute p_x_given_class for one example (the first row of the dataset) and for each class:

In [34]:
p_x_given_0=p_x_given_y(df['sepal_length'][0], sepal_length_mean[0], sepal_length_std[0])*p_x_given_y(df['sepal_width'][0], sepal_width_mean[0], sepal_width_std[0])*p_x_given_y(df['petal_width'][0], petal_width_mean[0], petal_width_std[0])*p_x_given_y(df['petal_length'][0], petal_length_mean[0], petal_length_std[0])
p_x_given_1=p_x_given_y(df['sepal_length'][0], sepal_length_mean[1], sepal_length_std[1])*p_x_given_y(df['sepal_width'][0], sepal_width_mean[1], sepal_width_std[1])*p_x_given_y(df['petal_width'][0], petal_width_mean[1], petal_width_std[1])*p_x_given_y(df['petal_length'][0], petal_length_mean[1], petal_length_std[1])
p_x_given_2=p_x_given_y(df['sepal_length'][0], sepal_length_mean[2], sepal_length_std[2])*p_x_given_y(df['sepal_width'][0], sepal_width_mean[2], sepal_width_std[2])*p_x_given_y(df['petal_width'][0], petal_width_mean[2], petal_width_std[2])*p_x_given_y(df['petal_length'][0], petal_length_mean[2], petal_length_std[2])

In [35]:
print('P(B|A) for the first instance in the dataset and for class 0:',p_x_given_0)
print('P(B|A) for the first instance in the dataset and for class 1:',p_x_given_1)
print('P(B|A) for the first instance in the dataset and for class 2:',p_x_given_2)

P(B|A) for the first instance in the dataset and for class 0: 8.374601751530664
P(B|A) for the first instance in the dataset and for class 1: 2.4967278599904303e-17
P(B|A) for the first instance in the dataset and for class 2: 1.8025267716033326e-24


In [36]:
# 4. Compute the Maximum A Posteriori score P(A|B) ∝ P(B|A) * P(A), i.e. p_x_given_class * prior (skipping the marginal probability)

In [37]:
p_0_given_x=p_x_given_0*Prior_klasa0
p_1_given_x=p_x_given_1*Prior_klasa1
p_2_given_x=p_x_given_2*Prior_klasa2

In [38]:
print('Unnormalized P(A|B) for the first instance in the dataset and for class 0:',p_0_given_x)
print('Unnormalized P(A|B) for the first instance in the dataset and for class 1:',p_1_given_x)
print('Unnormalized P(A|B) for the first instance in the dataset and for class 2:',p_2_given_x)

Unnormalized P(A|B) for the first instance in the dataset and for class 0: 2.7915339171768876
Unnormalized P(A|B) for the first instance in the dataset and for class 1: 8.3224261999681e-18
Unnormalized P(A|B) for the first instance in the dataset and for class 2: 6.008422572011108e-25


In [40]:
# The highest score is for class 0, which is correct according to the dataset.
# These values are unnormalized (P(data) was dropped), so they need not lie between 0 and 1.
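
The per-class products can also be computed for all classes at once with NumPy broadcasting. This is a minimal sketch, not the notebook's original code: the arrays `means` and `stds` are typed in from the values printed earlier (rows are classes 0 to 2, columns are sepal_length, sepal_width, petal_length, petal_width), and `x` is the first row of the dataset.

```python
import numpy as np

means = np.array([[5.006, 3.428, 1.462, 0.246],
                  [5.936, 2.770, 4.260, 1.326],
                  [6.588, 2.974, 5.552, 2.026]])
stds = np.array([[0.35248969, 0.37906437, 0.173664,   0.10538559],
                 [0.51617115, 0.31379832, 0.46991098, 0.19775268],
                 [0.63587959, 0.32249664, 0.5518947,  0.27465006]])
priors = np.array([1/3, 1/3, 1/3])

x = np.array([5.1, 3.5, 1.4, 0.2])   # first row of the dataset

# Gaussian density of every feature under every class: shape (3 classes, 4 features)
dens = 1 / np.sqrt(2 * np.pi * stds**2) * np.exp(-(x - means)**2 / (2 * stds**2))

# Naive assumption: multiply across features, then weight by the prior
scores = dens.prod(axis=1) * priors
print(scores)
print(np.argmax(scores))
```

Keeping the parameters in two arrays avoids repeating the same expression once per class and per feature, which makes index mix-ups less likely.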

Below I create a new instance for which we know feature values but not the class. The goal is to predict the class.

In [41]:
# One-row dataframe holding the feature values of the new instance
df_test=pd.DataFrame({'sepal_length':[6.7], 'sepal_width':[3.0], 'petal_length':[5.2], 'petal_width':[2.3]})

In [42]:
df_test

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
0,6.7,3.0,5.2,2.3


In [43]:
test=df_test.to_numpy()

In [44]:
test[0,1]

3.0

In [45]:
# Each class must use its own means and standard deviations (index 0, 1 or 2 throughout)
new_klasa0=p_x_given_y(test[0,0], sepal_length_mean[0], sepal_length_std[0])*p_x_given_y(test[0,1], sepal_width_mean[0], sepal_width_std[0])*p_x_given_y(test[0,2], petal_length_mean[0], petal_length_std[0])*p_x_given_y(test[0,3], petal_width_mean[0], petal_width_std[0])*Prior_klasa0
new_klasa1=p_x_given_y(test[0,0], sepal_length_mean[1], sepal_length_std[1])*p_x_given_y(test[0,1], sepal_width_mean[1], sepal_width_std[1])*p_x_given_y(test[0,2], petal_length_mean[1], petal_length_std[1])*p_x_given_y(test[0,3], petal_width_mean[1], petal_width_std[1])*Prior_klasa1
new_klasa2=p_x_given_y(test[0,0], sepal_length_mean[2], sepal_length_std[2])*p_x_given_y(test[0,1], sepal_width_mean[2], sepal_width_std[2])*p_x_given_y(test[0,2], petal_length_mean[2], petal_length_std[2])*p_x_given_y(test[0,3], petal_width_mean[2], petal_width_std[2])*Prior_klasa2

In [46]:
# Unnormalized posterior of each class, rounded to 3 significant digits
print('Unnormalized posterior of class 0:', f'{new_klasa0:.3g}')
print('Unnormalized posterior of class 1:', f'{new_klasa1:.3g}')
print('Unnormalized posterior of class 2:', f'{new_klasa2:.3g}')

Unnormalized posterior of class 0: 1.43e-188
Unnormalized posterior of class 1: 1.05e-07
Unnormalized posterior of class 2: 0.132


In [47]:
# Because the posterior for target = 2 is the greatest, we predict class 2 for the test instance.
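
Products of many small densities can underflow to 0 in floating point (the class 0 value above is already around 1e-188). A common remedy, sketched here as a suggestion rather than as part of the original notebook, is to sum log-densities instead of multiplying densities. The `means` and `stds` arrays are typed in from the values printed earlier, and the last step uses the log-sum-exp trick to recover posterior probabilities that sum to 1.

```python
import numpy as np

means = np.array([[5.006, 3.428, 1.462, 0.246],
                  [5.936, 2.770, 4.260, 1.326],
                  [6.588, 2.974, 5.552, 2.026]])
stds = np.array([[0.35248969, 0.37906437, 0.173664,   0.10538559],
                 [0.51617115, 0.31379832, 0.46991098, 0.19775268],
                 [0.63587959, 0.32249664, 0.5518947,  0.27465006]])
log_priors = np.log(np.array([1/3, 1/3, 1/3]))

x = np.array([6.7, 3.0, 5.2, 2.3])   # the new instance

# Log of the Gaussian density for every (class, feature) pair
log_dens = -0.5 * np.log(2 * np.pi * stds**2) - (x - means)**2 / (2 * stds**2)

# Sum over features instead of multiplying, then add the log prior
log_scores = log_dens.sum(axis=1) + log_priors
predicted = int(np.argmax(log_scores))

# Subtract the maximum before exponentiating (log-sum-exp), then normalize
shifted = log_scores - log_scores.max()
posterior = np.exp(shifted) / np.exp(shifted).sum()
print(predicted)
print(posterior)
```

Because the logarithm is monotonic, the class with the largest log score is the same class that maximizes the original product.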