## Demo of Naive Bayes Classifier

### Iris data set

There are 3 different species:
- Iris setosa
- Iris versicolor
- Iris virginica

![image1](../data/iris.png)

The sepal and petal measurements differ between species:

![image2](../data/iris2.png)

In [6]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the iris measurements (four feature columns plus a 'class' label column)
data = pd.read_csv('../data/iris.csv')

# Split into training (80%) and test (20%) sets; stratifying on 'class'
# keeps the class proportions the same in both sets
train_data, test_data = train_test_split(data, test_size=0.2, stratify=data['class'])

display(train_data)
display(test_data)


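|     | sepal_length | sepal_width | petal_length | petal_width | class |
|-----|--------------|-------------|--------------|-------------|-------|
| 143 | 6.8 | 3.2 | 5.9 | 2.3 | Iris-virginica |
| 93  | 5.0 | 2.3 | 3.3 | 1.0 | Iris-versicolor |
| 24  | 4.8 | 3.4 | 1.9 | 0.2 | Iris-setosa |
| 4   | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 60  | 5.0 | 2.0 | 3.5 | 1.0 | Iris-versicolor |
| ... | ... | ... | ... | ... | ... |
| 90  | 5.5 | 2.6 | 4.4 | 1.2 | Iris-versicolor |
| 7   | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
| 44  | 5.1 | 3.8 | 1.9 | 0.4 | Iris-setosa |
| 47  | 4.6 | 3.2 | 1.4 | 0.2 | Iris-setosa |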


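|     | sepal_length | sepal_width | petal_length | petal_width | class |
|-----|--------------|-------------|--------------|-------------|-------|
| 26  | 5.0 | 3.4 | 1.6 | 0.4 | Iris-setosa |
| 40  | 5.0 | 3.5 | 1.3 | 0.3 | Iris-setosa |
| 138 | 6.0 | 3.0 | 4.8 | 1.8 | Iris-virginica |
| 111 | 6.4 | 2.7 | 5.3 | 1.9 | Iris-virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
| 57  | 4.9 | 2.4 | 3.3 | 1.0 | Iris-versicolor |
| 131 | 7.9 | 3.8 | 6.4 | 2.0 | Iris-virginica |
| 1   | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 73  | 6.1 | 2.8 | 4.7 | 1.2 | Iris-versicolor |
| 42  | 4.4 | 3.2 | 1.3 | 0.2 | Iris-setosa |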


In [8]:
classes = train_data['class'].unique()
print(classes)

['Iris-virginica' 'Iris-versicolor' 'Iris-setosa']


Let's compute the class prior probabilities $P(y_k)$, estimated as the fraction of training instances that belong to each class. This example has 3 classes, so $k \in \{0, 1, 2\}$.

In [9]:
import numpy as np


# number of classes
K = len(classes)

# number of training instances in each class
class_nk = np.zeros(K)

for k in range(K):
    # count the rows whose label equals class k
    class_nk[k] = (train_data['class'] == classes[k]).sum()

# prior P(y_k) = fraction of training instances in class k
prior_prob = class_nk / np.sum(class_nk)

print(prior_prob)


[0.33333333 0.33333333 0.33333333]
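
As a cross-check on the loop above, the same priors can be obtained from relative label frequencies. The sketch below is only illustrative: `labels` is a toy stand-in for `train_data['class']`, and `prior_series` is a hypothetical name not used elsewhere in this notebook.

```python
import pandas as pd

# Toy label column standing in for train_data['class'] (not the notebook's data)
labels = pd.Series(
    ["Iris-setosa", "Iris-versicolor", "Iris-virginica"] * 4,
    name="class",
)

# Relative frequency of each label = empirical class prior P(y_k)
prior_series = labels.value_counts(normalize=True)
print(prior_series)
```

The result is indexed by label rather than by position, so it should be looked up by class name instead of assuming it follows the order of `classes`.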


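Now a model is needed to compute the likelihood $P(x \mid y_k)$, where $x$ is the feature vector (sepal length, sepal width, petal length, petal width) and $k$ is the class index. The index follows the order of `classes` printed above, which depends on the order in which the labels first appear in the shuffled training data. For this run:

| k   | Class    |
|-----|----------|
| 0   | Iris-virginica |
| 1   | Iris-versicolor |
| 2   | Iris-setosa |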

Let's assume (naively) that the features are independent of each other given the class, and that each feature is normally distributed within a class. Under this model, all we need is the mean $\mu$ and standard deviation $\sigma$ of each feature for each class.
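
Written out, the naive independence assumption together with the normal model gives the likelihood as a product of one-dimensional Gaussian densities. The subscripted symbols $\mu_{k,m}$ and $\sigma_{k,m}$ are notation introduced here for the mean and standard deviation of feature $m$ in class $k$; they are not defined elsewhere in this notebook.

$$
P(x \mid y_k) = \prod_{m=0}^{3} \frac{1}{\sqrt{2\pi}\,\sigma_{k,m}} \exp\left(-\frac{(x_m - \mu_{k,m})^2}{2\sigma_{k,m}^2}\right)
$$

Combined with the prior, the posterior is proportional to $P(x \mid y_k)\,P(y_k)$, and the predicted class is the $k$ that maximizes this product.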

In [14]:
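# one row per class, one column per feature
mu = np.zeros((K, 4))
std = np.zeros((K, 4))

for k in range(K):
    # rows of class k; numeric_only=True skips the string 'class' column
    class_rows = train_data[train_data['class'] == classes[k]]
    mu[k] = class_rows.mean(numeric_only=True)
    std[k] = class_rows.std(numeric_only=True)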

print("Class mean values")
print(mu)

print("Class std. dev.")
print(std)

Class mean values
[[6.57   2.9625 5.515  2.04  ]
 [5.9425 2.77   4.2875 1.3325]
 [5.0025 3.4475 1.4575 0.2425]]
Class std. dev.
[[0.6251974  0.29237094 0.54610509 0.26583203]
 [0.53247728 0.30229888 0.47457809 0.20177101]
 [0.35481125 0.40445436 0.17814932 0.10349656]]
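
The per-class statistics above can also be computed in one step with a pandas `groupby`. This is a minimal sketch on a toy frame: `toy`, `feature_cols`, `mu_df`, and `std_df` are hypothetical names, and the values are illustrative rather than taken from the iris data.

```python
import pandas as pd

# Toy frame standing in for train_data (values are illustrative only)
toy = pd.DataFrame({
    "petal_length": [1.4, 1.6, 4.5, 4.7, 5.9, 6.1],
    "petal_width": [0.2, 0.4, 1.4, 1.6, 2.1, 2.3],
    "class": ["Iris-setosa"] * 2 + ["Iris-versicolor"] * 2 + ["Iris-virginica"] * 2,
})

feature_cols = ["petal_length", "petal_width"]
grouped = toy.groupby("class")[feature_cols]

mu_df = grouped.mean()  # one row per class, one column per feature
std_df = grouped.std()  # sample standard deviation (ddof=1), like DataFrame.std()

print(mu_df)
print(std_df)
```

Note that `groupby` sorts the rows by class label, so the row order may differ from the order of `classes`; indexing by label avoids mixing the two up.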


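Let's go through each instance in the test data, compute its likelihood under each class, and multiply by the prior to get the (unnormalized) posterior. The predicted label is the class with the largest posterior.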
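
Multiplying many small densities can underflow when there are many features, so a common variant sums log-densities instead. The sketch below shows the per-instance computation described above in log space, using NumPy only. The function names `log_gaussian` and `predict_log_posterior` and all toy parameters are assumptions made for illustration; they are not part of this notebook.

```python
import numpy as np

def log_gaussian(x, mean, sd):
    """Log of the normal density, evaluated elementwise."""
    return -0.5 * np.log(2.0 * np.pi * sd**2) - (x - mean) ** 2 / (2.0 * sd**2)

def predict_log_posterior(x, mu, std, prior):
    """Return (index of best class, unnormalized log-posteriors) for one instance.

    x: shape (M,); mu, std: shape (K, M); prior: shape (K,)
    """
    # Broadcasting x against (K, M) gives per-class, per-feature log densities;
    # summing over features replaces the product of likelihoods.
    log_post = log_gaussian(x, mu, std).sum(axis=1) + np.log(prior)
    return int(np.argmax(log_post)), log_post

# Toy parameters for 2 classes and 2 features (illustrative, not from the notebook)
toy_mu = np.array([[1.5, 0.3], [4.5, 1.4]])
toy_std = np.array([[0.2, 0.1], [0.5, 0.2]])
toy_prior = np.array([0.5, 0.5])

best_k, log_post = predict_log_posterior(np.array([1.4, 0.2]), toy_mu, toy_std, toy_prior)
print(best_k, log_post)
```

Because the logarithm is monotonic, the argmax is the same as for the direct product. With only four features, as in the cell that follows, the direct product works fine.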

In [29]:
from scipy.stats import norm

for n in range(len(test_data)):
    likelihood = np.ones(K)
    posterior = np.zeros(K)
    for k in range(K):
        # naive assumption: multiply the per-feature Gaussian densities
        for m in range(4):
            likelihood[k] *= norm.pdf(test_data.iloc[n, m], loc=mu[k][m], scale=std[k][m])
        # unnormalized posterior: likelihood times prior
        posterior[k] = likelihood[k] * prior_prob[k]
    print(f'test data {n}')
    print(f"Real label: {test_data.iloc[n]['class']}")
    print(f"Predicted label: {classes[np.argmax(posterior)]}\n\n")


test data 0
Real label: Iris-setosa
Predicted label: Iris-setosa


test data 1
Real label: Iris-setosa
Predicted label: Iris-setosa


test data 2
Real label: Iris-virginica
Predicted label: Iris-virginica


test data 3
Real label: Iris-virginica
Predicted label: Iris-virginica


test data 4
Real label: Iris-virginica
Predicted label: Iris-virginica


test data 5
Real label: Iris-versicolor
Predicted label: Iris-versicolor


test data 6
Real label: Iris-virginica
Predicted label: Iris-virginica


test data 7
Real label: Iris-setosa
Predicted label: Iris-setosa


test data 8
Real label: Iris-versicolor
Predicted label: Iris-versicolor


test data 9
Real label: Iris-setosa
Predicted label: Iris-setosa


test data 10
Real label: Iris-virginica
Predicted label: Iris-versicolor


test data 11
Real label: Iris-virginica
Predicted label: Iris-virginica


test data 12
Real label: Iris-virginica
Predicted label: Iris-virginica


test data 13
Real label: Iris-setosa
Predicted label: Iris-setosa

