In [37]:
!pip install tabulate scikit-learn



# Naive Bayes Classifier

This notebook introduces the basics of the Naive Bayes algorithm for classification tasks. It covers:

- Brief overview of the Naive Bayes (NB) Classifier
- An example exercise of performing inference with NB


## What is a classifier?

A classifier is a machine learning model that distinguishes between different objects based on their features: given sample data $X$, it predicts the class $y$ that the sample belongs to.

## What is Naive Bayes Classifier?

A Naive Bayes classifier is a probabilistic machine learning model for classification tasks. It is based on Bayes' theorem and makes a strong assumption: the features are independent of one another given the class.

## Bayes' Theorem

$$ P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} $$

This lets us compute the probability of event $A$ given that event $B$ has occurred. Here $B$ is the evidence and $A$ is the hypothesis. Naive Bayes additionally assumes that the features are conditionally independent given the class: once the class is known, the value of one feature tells us nothing more about another. This simplifying (and often unrealistic) assumption is why the method is called naive.

In a classification task, the classifier predicts the class $y$ from the observation $X$. Replacing $A$ with $y$ and $B$ with $X$, Bayes' theorem becomes

$$ P(y \mid X) = \frac{P(X \mid y) \, P(y)}{P(X)} $$

The formula consists of four components:

- $P(y \mid X)$: the posterior probability, i.e. the probability of class $y$ given the observation $X$.
- $P(y)$: the prior probability, i.e. the initial belief in class $y$ before $X$ is observed.
- $P(X \mid y)$: the likelihood, i.e. the probability of the observation $X$ given class $y$.
- $P(X)$: the evidence, i.e. the probability of the observation $X$.
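The evidence can be obtained from the other components with the law of total probability, $P(X) = \sum_{y} P(X \mid y) \, P(y)$. The sketch below uses a hypothetical helper `posterior` with exact fractions to turn likelihoods and priors into posteriors this way. The numbers are the Gender likelihoods and class priors from the pet example later in this notebook, and the result agrees with the answer to Task 2b.

```python
from fractions import Fraction

def posterior(likelihood, prior):
    """Return P(y | X) for every class y, given P(X | y) and P(y) as dicts keyed by class.

    The evidence P(X) is computed with the law of total probability:
    P(X) = sum over y of P(X | y) * P(y).
    """
    joint = {y: likelihood[y] * prior[y] for y in prior}  # P(X | y) * P(y)
    evidence = sum(joint.values())                         # P(X)
    return {y: joint[y] / evidence for y in joint}

# Observation X = (Gender = Male); likelihoods P(Male | y) and priors P(y) from the pet data
likelihood_male = {"Yes": Fraction(2, 3), "No": Fraction(3, 5)}
prior = {"Yes": Fraction(3, 8), "No": Fraction(5, 8)}
print(posterior(likelihood_male, prior))  # → {'Yes': Fraction(2, 5), 'No': Fraction(3, 5)}
```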

In classification tasks, $y$ is the class label and $X$ represents the features, which usually span multiple dimensions:

$$ X = (x_1, x_2, x_3, \ldots, x_n) $$

where $x_1, x_2, \ldots, x_n$ are the features. NB assumes they are conditionally independent given the class, i.e. $x_i \perp x_j \mid y$ for all $i \neq j$ with $i, j \in \{1, 2, \ldots, n\}$. Under this assumption the class-conditional likelihood factorizes into a product of per-feature likelihoods:

$$ P(y \mid x_1, x_2, \ldots, x_n) = \frac{P(x_1, x_2, \ldots, x_n \mid y) \, P(y)}{P(x_1, x_2, \ldots, x_n)} = \frac{P(x_1 \mid y) \, P(x_2 \mid y) \cdots P(x_n \mid y) \, P(y)}{P(x_1, x_2, \ldots, x_n)} $$
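Conditional independence given $y$ does not make the features independent marginally, so the evidence $P(x_1, \ldots, x_n)$ generally differs from $\prod_i P(x_i)$. A small, entirely made-up two-class model with binary features shows this with exact fractions:

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-class model in which x1 and x2 are conditionally independent given y
p_y = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p_x_is_1_given_y = {0: Fraction(1, 10), 1: Fraction(9, 10)}  # same for x1 and x2

def p_x_given_y(x, y):
    """P(x_i = x | y) for a binary feature."""
    p1 = p_x_is_1_given_y[y]
    return p1 if x == 1 else 1 - p1

def p_joint(x1, x2):
    """P(x1, x2) = sum_y P(y) P(x1 | y) P(x2 | y), using the naive factorization."""
    return sum(p_y[y] * p_x_given_y(x1, y) * p_x_given_y(x2, y) for y in p_y)

def p_marginal(x):
    """P(x_i = x) = sum_y P(y) P(x_i = x | y)."""
    return sum(p_y[y] * p_x_given_y(x, y) for y in p_y)

# e.g. P(x1=1, x2=1) = 41/100, but P(x1=1) P(x2=1) = 1/4
for x1, x2 in product([0, 1], repeat=2):
    print(x1, x2, p_joint(x1, x2), p_marginal(x1) * p_marginal(x2))
```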

The denominator $P(X) = P(x_1, x_2, \ldots, x_n)$ is the same for every class, so it only normalizes the scores and can be dropped when deciding which class is most probable. Using the independence assumption and ignoring the denominator, the NB formula becomes:

$$ P(y \mid x_1, x_2, \ldots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y) $$

In binary classification the class variable $y$ has two possible outcomes (more generally, NB handles any number of classes). The predicted class is the one with the highest posterior:

$$ \hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y) $$
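The decision rule can be sketched in a few lines of Python. This minimal sketch covers categorical features only. The function name `predict_nb` and the nested-dict layout of the likelihood tables are assumptions made for illustration. It sums log-probabilities instead of multiplying probabilities, which gives the same argmax but avoids floating-point underflow when there are many features. The example numbers are the priors and likelihoods from Task 2a.

```python
import math

def predict_nb(priors, likelihoods, x):
    """Return (best class, scores) where score(y) = log P(y) + sum_i log P(x_i | y).

    priors:      {class: P(y)}
    likelihoods: {feature: {class: {value: P(x_i = value | y)}}}
    x:           {feature: observed value}
    """
    scores = {}
    for y, p_y in priors.items():
        score = math.log(p_y)
        for feature, value in x.items():
            score += math.log(likelihoods[feature][y][value])
        scores[y] = score
    return max(scores, key=scores.get), scores

priors = {"Yes": 3 / 8, "No": 5 / 8}
likelihoods = {
    "Gender": {"Yes": {"Male": 2 / 3, "Female": 1 / 3}, "No": {"Male": 3 / 5, "Female": 2 / 5}},
    "Education": {"Yes": {"University": 2 / 3, "HighSchool": 1 / 3},
                  "No": {"University": 2 / 5, "HighSchool": 3 / 5}},
}
label, scores = predict_nb(priors, likelihoods, {"Gender": "Female", "Education": "University"})
print(label)  # → No
```

A zero likelihood would make `math.log` raise `ValueError`. Smoothing the counts (for example Laplace smoothing) is the usual remedy.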

## An example exercise of performing inference with NB

We will use the following example to strengthen our understanding of NB. The toy dataset is for predicting whether a person owns a pet. Each observation $X$ has three features, two categorical ("Gender" and "Education") and one numerical ("Income"). The class label $y$ ("Has_pet") indicates whether the person owns a pet.

In [38]:
from IPython.display import HTML, display
import tabulate
tab_cat = [["Gender", "Education", "Income", "Has_pet"],
          ["Female", "University", 103000,   "Yes"],
          ["Female", "HighSchool", 90500,   "No"],
          ["Female", "HighSchool", 114000,   "No"],
          ["Male",   "University", 102000,   "No"],
          ["Male",   "University", 75000,   "Yes"],
          ["Male",   "HighSchool", 90000,   "No"],
          ["Male",   "HighSchool", 85000,   "Yes"],
          ["Male",   "University", 86000,   "No"]]
display(HTML(tabulate.tabulate(tab_cat, tablefmt='html')))

| Gender | Education | Income | Has_pet |
|---|---|---|---|
| Female | University | 103000 | Yes |
| Female | HighSchool | 90500 | No |
| Female | HighSchool | 114000 | No |
| Male | University | 102000 | No |
| Male | University | 75000 | Yes |
| Male | HighSchool | 90000 | No |
| Male | HighSchool | 85000 | Yes |
| Male | University | 86000 | No |


<div class='alert alert-block alert-success' style="font-weight:bolder">

### Task 2a - Compute the likelihood table of owning a pet for each categorical feature, as well as the marginal probabilities.

- $P(Gender|Has\_pet)$: $P(Male|Yes)$, $P(Female|Yes)$, $P(Male|No)$, $P(Female|No)$
    
- $P(Education|Has\_pet)$: $P(University|Yes)$, $P(HighSchool|Yes)$, $P(University|No)$, $P(HighSchool|No)$
    
</div>
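The likelihood tables can also be derived directly by counting rows. The following is a minimal sketch in plain Python. It uses only the standard library rather than the pandas used elsewhere in this notebook, and the helpers `likelihood_table` and `marginal` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Rows of the toy pet dataset: (Gender, Education, Income, Has_pet)
rows = [
    ("Female", "University", 103000, "Yes"),
    ("Female", "HighSchool", 90500, "No"),
    ("Female", "HighSchool", 114000, "No"),
    ("Male", "University", 102000, "No"),
    ("Male", "University", 75000, "Yes"),
    ("Male", "HighSchool", 90000, "No"),
    ("Male", "HighSchool", 85000, "Yes"),
    ("Male", "University", 86000, "No"),
]

def likelihood_table(rows, feature_index, label_index=3):
    """Return {(value, label): P(feature = value | Has_pet = label)} as exact fractions."""
    label_counts = Counter(row[label_index] for row in rows)
    pair_counts = Counter((row[feature_index], row[label_index]) for row in rows)
    return {pair: Fraction(count, label_counts[pair[1]]) for pair, count in pair_counts.items()}

def marginal(rows, index):
    """Return {value: P(column = value)} for one column."""
    counts = Counter(row[index] for row in rows)
    return {value: Fraction(count, len(rows)) for value, count in counts.items()}

gender = likelihood_table(rows, 0)
education = likelihood_table(rows, 1)
print(gender[("Male", "Yes")], gender[("Male", "No")])  # → 2/3 3/5
print(marginal(rows, 3))
```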

In [39]:
tab_likelihood_gender = [
    ["likelihood","-",  "Has_pet", "-", "-"],
    ["-",          "-",  "Yes", "No", "P(Gender)"],
    ["Gender", "Male", "2/3", "3/5", "5/8"], 
    ["-", "Female",    "1/3", "2/5", "3/8"],
    ["-", "P(Has_pet)","3/8", "5/8", ""]
]
display(HTML(tabulate.tabulate(tab_likelihood_gender, tablefmt='html')))


tab_likelihood_education = [
    ["likelihood","-",  "Has_pet", "-", "-"],
    ["-",          "-",  "Yes", "No", "P(Education)"],
    ["Education", "University", "2/3", "2/5", "4/8"], 
    ["-", "HighSchool", "1/3", "3/5", "4/8"],
    ["-", "P(Has_pet)", "3/8", "5/8", ""]
]
display(HTML(tabulate.tabulate(tab_likelihood_education, tablefmt='html')))

| Gender | Has_pet = Yes | Has_pet = No | P(Gender) |
|---|---|---|---|
| Male | 2/3 | 3/5 | 5/8 |
| Female | 1/3 | 2/5 | 3/8 |
| P(Has_pet) | 3/8 | 5/8 | |


| Education | Has_pet = Yes | Has_pet = No | P(Education) |
|---|---|---|---|
| University | 2/3 | 2/5 | 4/8 |
| HighSchool | 1/3 | 3/5 | 4/8 |
| P(Has_pet) | 3/8 | 5/8 | |


<div class='alert alert-block alert-success' style="font-weight:bolder">

### Task 2b - Compute posterior probabilities

- $P(\text{No}|\text{Male})$, $P(\text{Yes}|\text{Female})$
    
- $P(\text{Yes}|\text{University})$, $P(\text{No}|\text{HighSchool})$

</div>


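$P(\text{No} \mid \text{Male}) = 3/5$, $P(\text{Yes} \mid \text{Female}) = 1/3$

$P(\text{Yes} \mid \text{University}) = 2/4 = 1/2$, $P(\text{No} \mid \text{HighSchool}) = 3/4$

Each value follows from Bayes' theorem and the tables above, e.g. $P(\text{No} \mid \text{Male}) = \frac{P(\text{Male} \mid \text{No}) \, P(\text{No})}{P(\text{Male})} = \frac{(3/5)(5/8)}{5/8} = 3/5$.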

<div class='alert alert-block alert-success' style="font-weight:bolder">

### Task 2c (Extra Credit) - Compute the likelihood of Income for each class using the mean, standard deviation, and normal distribution function:

- Mean: $ \mu = \frac{1}{n} \sum^{n}_{i=1}{x_i} $
    
- Standard deviation: $ \sigma = \left[ \frac{1}{n-1} \sum^{n}_{i=1}{(x_i-\mu)^2} \right]^\frac{1}{2}  $
    
- Normal distribution: $f(x)=\dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\dfrac{(x-\mu)^2}{2\sigma{}^2}}$
    
Compute $P( \text{Income}=9000 \mid \text{Yes})$, $P( \text{Income}=9000 \mid \text{No})$

</div>

In [40]:
import numpy as np
import pandas as pd

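def normal_dist(x, mean, std):
    """Normal (Gaussian) probability density of x for the given mean and standard deviation."""
    var = std**2
    temp = 1 / (std * np.sqrt(2 * np.pi))
    LH = temp * np.exp(-((x - mean)**2) / (2 * var))
    return LH


# The first row of tab_cat holds the column names; building the frame this way
# keeps Income as an integer column instead of a generic object column
df = pd.DataFrame(tab_cat[1:], columns=tab_cat[0])

df_Y = df[df['Has_pet'] == 'Yes']
df_N = df[df['Has_pet'] == 'No']

Income_Y = df_Y['Income']
Income_N = df_N['Income']

# Note: np.std uses ddof=0 (divides by n) by default, whereas the task's formula
# divides by n - 1, i.e. np.std(..., ddof=1). The outputs below use the default.
mean_Y = np.mean(Income_Y)
std_Y = np.std(Income_Y)
P_Income_9000_Y = normal_dist(9000, mean_Y, std_Y)
print('𝑃(Income=9000∣Yes):', P_Income_9000_Y)

mean_N = np.mean(Income_N)
std_N = np.std(Income_N)
P_Income_9000_N = normal_dist(9000, mean_N, std_N)
print('𝑃(Income=9000∣No):', P_Income_9000_N)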



𝑃(Income=9000∣Yes): 3.351319167585588e-15
𝑃(Income=9000∣No): 5.7103460856402206e-21
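Two remarks on the cell above. First, for a continuous feature such as Income, the normal formula yields a probability *density*, not a probability, so these tiny values are only meaningful relative to each other. Second, `np.std` divides by $n$ by default, while the task's formula divides by $n-1$. The sketch below uses the standard-library `statistics` module and computes both variants so the difference is visible. The helper name `gaussian_likelihood` is hypothetical.

```python
import statistics

income_yes = [103000, 75000, 85000]
income_no = [90500, 114000, 102000, 90000, 86000]

def gaussian_likelihood(x, values, sample=True):
    """Normal density of x with mean and std estimated from values.

    sample=True uses statistics.stdev (divides by n - 1, as in the task's formula);
    sample=False uses statistics.pstdev (divides by n, like np.std's default ddof=0).
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values) if sample else statistics.pstdev(values)
    return statistics.NormalDist(mu, sigma).pdf(x)

for label, values in [("Yes", income_yes), ("No", income_no)]:
    print(label,
          gaussian_likelihood(9000, values, sample=False),  # population std
          gaussian_likelihood(9000, values, sample=True))   # sample std (n - 1)
```

With `sample=False` the values should agree with the cell above up to floating-point rounding.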


<div class='alert alert-block alert-success' style="font-weight:bolder">

### Task 2d (Extra Credit) - Performing inference (making predictions)

- $X=(Education=University, Gender=Female, Income=100000)$
    
- $X=(Education=HighSchool, Gender=Male, Income=92000)$

</div>



In [41]:

# Class-independent subsets, used for the marginal probabilities in the denominators
df_Edu_University = df[df['Education'] == 'University']
df_Edu_HighSchool = df[df['Education'] == 'HighSchool']
df_Income = df['Income']

df_Gend_Female = df[df['Gender'] == 'Female']
df_Gend_Male = df[df['Gender'] == 'Male']

# Class-conditional likelihoods of the categorical features
P_EduUniversity_Has_Pet_Y = len(df_Y[df_Y['Education'] == 'University']) / len(df_Y)  # P(Education=University|Has_Pet=Yes)
P_GendFemale_Has_Pet_Y = len(df_Y[df_Y['Gender'] == 'Female']) / len(df_Y)  # P(Gender=Female|Has_Pet=Yes)

P_EduUniversity_Has_Pet_N = len(df_N[df_N['Education'] == 'University']) / len(df_N)  # P(Education=University|Has_Pet=No)
P_GendFemale_Has_Pet_N = len(df_N[df_N['Gender'] == 'Female']) / len(df_N)  # P(Gender=Female|Has_Pet=No)

P_EduHighschool_Has_Pet_Y = len(df_Y[df_Y['Education'] == 'HighSchool']) / len(df_Y)  # P(Education=HighSchool|Has_Pet=Yes)
P_GendMale_Has_Pet_Y = len(df_Y[df_Y['Gender'] == 'Male']) / len(df_Y)  # P(Gender=Male|Has_Pet=Yes)

P_EduHighschool_Has_Pet_N = len(df_N[df_N['Education'] == 'HighSchool']) / len(df_N)  # P(Education=HighSchool|Has_Pet=No)
P_GendMale_Has_Pet_N = len(df_N[df_N['Gender'] == 'Male']) / len(df_N)  # P(Gender=Male|Has_Pet=No)

# Class-conditional Gaussian likelihoods of Income
# (mean_Y, std_Y, mean_N and std_N were computed in the previous cell)
P_Income_100000_Y = normal_dist(100000, mean_Y, std_Y)
P_Income_100000_N = normal_dist(100000, mean_N, std_N)
P_Income_92000_Y = normal_dist(92000, mean_Y, std_Y)
P_Income_92000_N = normal_dist(92000, mean_N, std_N)

# Class priors and marginal probabilities of the categorical features
P_Has_Pet_Y = len(df_Y) / len(df)  # P(Has_Pet=Yes)
P_Has_Pet_N = len(df_N) / len(df)  # P(Has_Pet=No)
P_Edu_University = len(df_Edu_University) / len(df)  # P(Education=University)
P_Gend_Female = len(df_Gend_Female) / len(df)  # P(Gender=Female)
P_Edu_Highschool = len(df_Edu_HighSchool) / len(df)  # P(Education=HighSchool)
P_Gend_Male = len(df_Gend_Male) / len(df)  # P(Gender=Male)

# Marginal Gaussian density of Income over all rows
mean_Income = np.mean(df_Income)
std_Income = np.std(df_Income)
P_Income_100000 = normal_dist(100000, mean_Income, std_Income)
P_Income_92000 = normal_dist(92000, mean_Income, std_Income)

# NOTE: each denominator below is the product of marginals P(x_1) P(x_2) P(x_3). NB assumes
# independence only given the class, so this product is not the true evidence P(X), and the
# Yes/No values printed for one sample need not sum to 1. Both classes share the same
# denominator, so the predicted class (the larger value) is unaffected.

# P(University|Yes) P(Female|Yes) P(Income=100000|Yes) P(Yes) / P(University) P(Female) P(Income=100000)
P_Univer_Female_Income_100000_Y_Num = (P_EduUniversity_Has_Pet_Y * P_GendFemale_Has_Pet_Y * P_Income_100000_Y * P_Has_Pet_Y)
P_Univer_Female_Income_100000_Y_Den = (P_Edu_University * P_Gend_Female * P_Income_100000)
P_Univer_Female_Income_100000_Y = P_Univer_Female_Income_100000_Y_Num / P_Univer_Female_Income_100000_Y_Den
print('Has_Pet=yes|Education=University,Gender=Female,Income=100000:', P_Univer_Female_Income_100000_Y)

# P(University|No) P(Female|No) P(Income=100000|No) P(No) / P(University) P(Female) P(Income=100000)
P_Univer_Female_Income_100000_N_Num = (P_EduUniversity_Has_Pet_N * P_GendFemale_Has_Pet_N * P_Income_100000_N * P_Has_Pet_N)
P_Univer_Female_Income_100000_N_Den = (P_Edu_University * P_Gend_Female * P_Income_100000)
P_Univer_Female_Income_100000_N = P_Univer_Female_Income_100000_N_Num / P_Univer_Female_Income_100000_N_Den
print('Has_Pet=No|Education=University,Gender=Female,Income=100000:', P_Univer_Female_Income_100000_N)

# P(HighSchool|Yes) P(Male|Yes) P(Income=92000|Yes) P(Yes) / P(HighSchool) P(Male) P(Income=92000)
P_HighSch_Male_Income_92000_Y_Num = P_EduHighschool_Has_Pet_Y * P_GendMale_Has_Pet_Y * P_Income_92000_Y * P_Has_Pet_Y
P_HighSch_Male_Income_92000_Y_Den = P_Edu_Highschool * P_Gend_Male * P_Income_92000
P_HighSch_Male_Income_92000_Y = P_HighSch_Male_Income_92000_Y_Num / P_HighSch_Male_Income_92000_Y_Den
print('Has_Pet=yes|Education=Highschool,Gender=Male,Income=92000:', P_HighSch_Male_Income_92000_Y)

# P(HighSchool|No) P(Male|No) P(Income=92000|No) P(No) / P(HighSchool) P(Male) P(Income=92000)
P_HighSch_Male_Income_92000_N_Num = P_EduHighschool_Has_Pet_N * P_GendMale_Has_Pet_N * P_Income_92000_N * P_Has_Pet_N
P_HighSch_Male_Income_92000_N_Den = P_Edu_Highschool * P_Gend_Male * P_Income_92000
P_HighSch_Male_Income_92000_N = P_HighSch_Male_Income_92000_N_Num / P_HighSch_Male_Income_92000_N_Den
print('Has_Pet=No|Education=Highschool,Gender=Male,Income=92000:', P_HighSch_Male_Income_92000_N)


Has_Pet=yes|Education=University,Gender=Female,Income=100000: 0.29980422577392163
Has_Pet=No|Education=University,Gender=Female,Income=100000: 0.6762245549002567
Has_Pet=yes|Education=Highschool,Gender=Male,Income=92000: 0.24998231035170046
Has_Pet=No|Education=Highschool,Gender=Male,Income=92000: 0.7431752722700269
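The values printed above are divided by the product of the marginals rather than by the evidence $P(X)$, which is why the Yes and No values for the same sample do not sum to 1. The following sketch in plain Python instead normalizes the class scores by their sum, i.e. it uses $P(X) = \sum_y P(y) \prod_i P(x_i \mid y)$. The function name `naive_bayes_posterior` is hypothetical. Income uses the population standard deviation (`statistics.pstdev`) to match `np.std`'s default in the cells above.

```python
import statistics

# Toy pet dataset: (Gender, Education, Income, Has_pet)
rows = [
    ("Female", "University", 103000, "Yes"),
    ("Female", "HighSchool", 90500, "No"),
    ("Female", "HighSchool", 114000, "No"),
    ("Male", "University", 102000, "No"),
    ("Male", "University", 75000, "Yes"),
    ("Male", "HighSchool", 90000, "No"),
    ("Male", "HighSchool", 85000, "Yes"),
    ("Male", "University", 86000, "No"),
]

def naive_bayes_posterior(rows, gender, education, income):
    """Normalized posterior P(Has_pet | Gender, Education, Income) for both classes."""
    scores = {}
    for label in ("Yes", "No"):
        in_class = [r for r in rows if r[3] == label]
        n = len(in_class)
        prior = n / len(rows)                                      # P(y)
        p_gender = sum(r[0] == gender for r in in_class) / n       # P(Gender | y)
        p_education = sum(r[1] == education for r in in_class) / n # P(Education | y)
        incomes = [r[2] for r in in_class]
        p_income = statistics.NormalDist(statistics.mean(incomes),
                                         statistics.pstdev(incomes)).pdf(income)  # density
        scores[label] = p_education * p_gender * p_income * prior
    evidence = sum(scores.values())  # P(X) by the law of total probability
    return {label: score / evidence for label, score in scores.items()}

for x in [("Female", "University", 100000), ("Male", "HighSchool", 92000)]:
    post = naive_bayes_posterior(rows, *x)
    print(x, post, "->", max(post, key=post.get))
```

The predicted class is the same as above (No for both samples); only the reported probabilities change.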


<div class='alert alert-block alert-success' style="font-weight:bolder">

### Task 2e (Extra Credit) - Implement a Naive Bayes classifier and use it to classify the Iris dataset. Note that the Iris dataset contains only numerical features.

</div>




In [42]:
import math 
import random 
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris["data"], iris["target"]
#print("data", X)
#print("class/label", y)
iris_data = pd.DataFrame(data=np.c_[iris['data'],iris['target']], columns = iris['feature_names']+['target'])
iris_data

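| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0.0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0.0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0.0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0.0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0.0 |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | 2.0 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | 2.0 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | 2.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 2.0 |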


In [43]:
# Split the data randomly into training and test sets
def splitting(mydataIn, mydataOut, ratio):
    train_num_in = int(len(mydataIn) * ratio)
    X_train_data = []
    y_train_data = []
    # initially the test set holds the whole dataset
    X_test_data = list(mydataIn)
    y_test_data = list(mydataOut)
    while len(X_train_data) < train_num_in:
        # random index in the range [0, len(X_test_data))
        index = random.randrange(len(X_test_data))
        # move that row (and its label) from the test set to the training set
        X_train_data.append(X_test_data.pop(index))
        y_train_data.append(y_test_data.pop(index))
    return X_train_data, X_test_data, y_train_data, y_test_data

ratio = 0.7
X_train_data, X_test_data, y_train_data, y_test_data = splitting(X, y, ratio)
# Array of the distinct class labels (0, 1, 2 for Iris), despite the name
NoOfClasses = np.unique(y_train_data)
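`splitting` draws from Python's global random state without a seed, so every run produces a different split (and a different error value). Below is a minimal sketch of a reproducible alternative that shuffles indices with a seeded `random.Random`. The helper `split_indices` and the seed value are illustrative assumptions.

```python
import random

def split_indices(n_samples, train_ratio, seed=None):
    """Return (train_idx, test_idx): a random partition of range(n_samples).

    An integer seed makes the partition identical on every run.
    """
    rng = random.Random(seed)          # private generator, independent of the global state
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(n_samples * train_ratio)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

train_idx, test_idx = split_indices(150, 0.7, seed=0)
print(len(train_idx), len(test_idx))  # → 105 45
```

With NumPy arrays the index lists can be applied directly, e.g. `X[train_idx]` and `y[test_idx]`.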

In [44]:
def CalMeanStdPriorProb_eachLabel(X_train_data, y_train_data, NoOfClasses):
    """Estimate per-class feature means, standard deviations and class priors."""
    X_arr = np.asarray(X_train_data)
    y_arr = np.asarray(y_train_data)
    classMean = dict()
    classStd = dict()
    PriorProb = dict()
    for label in NoOfClasses:
        X_label = X_arr[y_arr == label]  # training rows that belong to this class
        classMean[label] = np.mean(X_label, axis=0)
        classStd[label] = np.std(X_label, axis=0)
        PriorProb[label] = len(X_label) / len(X_arr)
    return classMean, classStd, PriorProb

def Likelihood_calc(x, mean, std):
    """For each row of x, the product over features of the Gaussian densities N(x_i; mean_i, std_i)."""
    x = np.asarray(x)
    var = std**2
    temp = 1 / (std * np.sqrt(2 * np.pi))
    # np.prod replaces np.product, which was deprecated and then removed in NumPy 2.0
    LH = np.prod(temp * np.exp(-((x - mean)**2) / (2 * var)), axis=1)
    return LH

def PosteriorProb_eachLabel(X_test_data, classMean, classStd, PriorProb, NoOfClasses):
    """Unnormalized posterior P(y) * prod_i P(x_i | y); column ii corresponds to NoOfClasses[ii]."""
    posteriorProbOut = np.zeros((len(X_test_data), len(NoOfClasses)))
    for ii, label in enumerate(NoOfClasses):
        Likelihood_eachLabel = Likelihood_calc(X_test_data, classMean[label], classStd[label])
        posteriorProbOut[:, ii] = Likelihood_eachLabel * PriorProb[label]
    return posteriorProbOut

In [45]:
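LabelMean, LabelStd, PriorProb = CalMeanStdPriorProb_eachLabel(X_train_data, y_train_data, NoOfClasses)
posteriorProbOut = PosteriorProb_eachLabel(X_test_data, LabelMean, LabelStd, PriorProb, NoOfClasses)
# The prediction is the label whose column has the largest unnormalized posterior
Prediction = NoOfClasses[np.argmax(posteriorProbOut, axis=1)]

# Mean of the per-sample squared label differences (squaring each difference before averaging).
# For classification, accuracy, np.mean(Prediction == np.asarray(y_test_data)), is usually more meaningful.
errors = np.asarray(y_test_data) - Prediction
print('Mean Squared Error', np.mean(errors**2))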

Mean Squared Error 0.022222222222222223
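Multiplying the four densities in `Likelihood_calc` works for Iris. With many features, however, or a test point far from every class mean, the product can underflow to 0 for every class, and a feature with zero variance within a class would divide by zero. The following is a minimal pure-Python sketch of the same Gaussian NB that works in log space and adds a small absolute `var_floor` to each variance. `var_floor` is a hypothetical parameter; scikit-learn's `GaussianNB` uses a related `var_smoothing`, expressed as a fraction of the largest feature variance. The toy data are made up for illustration and are not the Iris set.

```python
import math

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Per-class prior, feature means and population variances (plus var_floor)."""
    params = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + var_floor
                     for col, m in zip(zip(*rows), means)]
        params[label] = (n / len(y), means, variances)
    return params

def predict_gaussian_nb(params, x):
    """Class with the largest log P(y) + sum_i log N(x_i; mean_i, var_i)."""
    def log_score(label):
        prior, means, variances = params[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
    return max(params, key=log_score)

# Tiny illustrative data: two well-separated classes in 2-D
X_toy = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.9], [4.9, 6.1]]
y_toy = [0, 0, 0, 1, 1, 1]
params = fit_gaussian_nb(X_toy, y_toy)
print(predict_gaussian_nb(params, [1.1, 2.1]), predict_gaussian_nb(params, [5.1, 6.0]))  # → 0 1
```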
