In [None]:
!pip install tabulate scikit-learn scipy

# Naive Bayes Classifier

This notebook introduces the basics of the Naive Bayes algorithm for classification tasks. It covers the following:

- Brief overview of the Naive Bayes (NB) Classifier
- An example exercise of performing inference with NB


## What is a classifier?

A classifier is a machine learning model that distinguishes between different kinds of objects based on their features. Given a sample $X$, a classifier predicts the class $y$ to which it belongs.

## What is Naive Bayes Classifier?

A Naive Bayes classifier is a probabilistic machine learning model for classification tasks. It is based on Bayes theorem and makes a strong assumption: the features are independent of one another given the class.

## Bayes Theorem

$$ P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} $$

Bayes theorem gives the probability of event $A$ given that event $B$ has occurred. Here $B$ is the evidence and $A$ is the hypothesis. Naive Bayes additionally assumes that the features are conditionally independent given the class, i.e. once the class is known, the value of one feature tells us nothing about the others. This simplifying assumption is why the method is called naive.

In the context of classification, given the observation $X$, the classifier predicts the class $y$. Replacing $A$ with $y$ and $B$ with $X$, Bayes theorem becomes

$$ P(y \mid X) = \frac{P(X \mid y) \, P(y)}{P(X)} $$

The formula consists of four components:

- $P(y \mid X)$: the posterior probability, i.e. the probability of class $y$ given the observation $X$.

- $P(y)$: the prior probability, i.e. the initial belief in class $y$ before observing $X$.

- $P(X \mid y)$: the likelihood, i.e. the probability of observation $X$ given class $y$.

- $P(X)$: the evidence, i.e. the probability of observation $X$.
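To see how the four components fit together numerically, here is a small sketch for a two-class problem. It obtains the evidence $P(X)$ by summing $P(X \mid y)\,P(y)$ over the classes (the law of total probability), which is why the resulting posteriors sum to 1. The function name `posterior` and the prior and likelihood values are hypothetical, chosen only for illustration.

```python
def posterior(prior, likelihood):
    """Return P(y | X) for every class y.

    prior:      dict mapping class -> P(y)
    likelihood: dict mapping class -> P(X | y) for one fixed observation X
    """
    # Evidence P(X) via the law of total probability: sum over y of P(X | y) P(y)
    evidence = sum(likelihood[c] * prior[c] for c in prior)
    # Bayes theorem applied to each class
    return {c: likelihood[c] * prior[c] / evidence for c in prior}


# Hypothetical numbers, for illustration only
prior = {"Yes": 0.2, "No": 0.8}
likelihood = {"Yes": 0.9, "No": 0.1}
print(posterior(prior, likelihood))
```

With these numbers the posterior for "Yes" is $0.18 / 0.26 \approx 0.69$, even though its prior is only 0.2, because the observation is much more likely under "Yes".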

In classification tasks, the variable $y$ is the class label. The variable $X$ represents the features, and it usually has multiple dimensions:

$$ X = (x_1, x_2, x_3, ..., x_n) $$

where $x_1, x_2, ..., x_n$ are the features. NB assumes they are conditionally independent given the class, i.e. $x_i \perp x_j \mid y$ for all $i \neq j$ with $i, j \in \{1, 2, ..., n\}$. Under this assumption the joint likelihood factorises into a product of per-feature likelihoods, $P(x_1, ..., x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)$, so Bayes theorem becomes:

$$ P(y \mid x_1, x_2, ..., x_n) = \frac{P(x_1, x_2, ..., x_n \mid y) \, P(y)}{P(x_1, x_2, ..., x_n)} = \frac{P(x_1 \mid y) P(x_2 \mid y) P(x_3 \mid y) \cdots P(x_n \mid y) \, P(y)}{P(x_1, x_2, ..., x_n)} $$

The denominator $P(X) = P(x_1, x_2, ..., x_n)$ is the same for every class, so it acts only as a normalising constant and can be dropped when we just need to compare classes. Ignoring it, the NB formula becomes:

$$ P(y \mid x_1, x_2, ..., x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y) $$

In binary classification the class variable $y$ has two possible outcomes; in general it can take any number of values. The prediction is the class with the highest score: $\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)$.
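The decision rule above can be sketched in a few lines of Python. This is a minimal sketch, assuming the priors and per-feature likelihoods are already available as plain dictionaries; the function name `nb_predict` and the toy values are hypothetical. It sums logarithms instead of multiplying probabilities, a common choice that avoids numerical underflow when $n$ is large; since $\log$ is monotonic, the argmax is unchanged.

```python
import math


def nb_predict(x, priors, likelihoods):
    """Return the class maximising log P(y) + sum_i log P(x_i | y).

    x:           tuple of feature values (x_1, ..., x_n)
    priors:      dict class -> P(y)
    likelihoods: dict class -> list of dicts, one per feature,
                 each mapping a feature value -> P(x_i | y)
    """
    best_class, best_score = None, -math.inf
    for y, prior in priors.items():
        score = math.log(prior)
        for i, value in enumerate(x):
            score += math.log(likelihoods[y][i][value])
        if score > best_score:
            best_class, best_score = y, score
    return best_class


# Hypothetical toy model with two binary-valued features
priors = {"Yes": 0.5, "No": 0.5}
likelihoods = {
    "Yes": [{"a": 0.8, "b": 0.2}, {"c": 0.6, "d": 0.4}],
    "No":  [{"a": 0.3, "b": 0.7}, {"c": 0.1, "d": 0.9}],
}
print(nb_predict(("a", "c"), priors, likelihoods))  # Yes
```

Note that a likelihood of exactly 0 would make `math.log` raise an error; Laplace (add-one) smoothing of the counts is the usual remedy.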

## An example exercise of performing inference with NB

We will use the following toy dataset to strengthen our understanding of NB. The task is to classify whether a person owns a pet. Each observation $X$ has three features, two categorical ("Gender" and "Education") and one numerical ("Income"), and the class label $y$ ("Has_pet") indicates whether the person owns a pet.

In [1]:
from IPython.display import HTML, display
import tabulate
tab_cat = [["Gender", "Education", "Income", "Has_pet"],
          ["Female", "University", 103000,   "Yes"],
          ["Female", "HighSchool", 90500,   "No"],
          ["Female", "HighSchool", 114000,   "No"],
          ["Male",   "University", 102000,   "No"],
          ["Male",   "University", 75000,   "Yes"],
          ["Male",   "HighSchool", 90000,   "No"],
          ["Male",   "HighSchool", 85000,   "Yes"],
          ["Male",   "University", 86000,   "No"]]
display(HTML(tabulate.tabulate(tab_cat, tablefmt='html')))

| Gender | Education | Income | Has_pet |
|---|---|---|---|
| Female | University | 103000 | Yes |
| Female | HighSchool | 90500 | No |
| Female | HighSchool | 114000 | No |
| Male | University | 102000 | No |
| Male | University | 75000 | Yes |
| Male | HighSchool | 90000 | No |
| Male | HighSchool | 85000 | Yes |
| Male | University | 86000 | No |


<div class='alert alert-block alert-success' style="font-weight:bolder">

### Task 2a - Compute the likelihood table of owning a pet for each categorical feature, as well as the marginal probabilities.

- $P(Gender|Has\_pet)$: $P(Male|Yes)$, $P(Female|Yes)$, $P(Male|No)$, $P(Female|No)$
    
- $P(Education|Has\_pet)$: $P(University|Yes)$, $P(HighSchool|Yes)$, $P(University|No)$, $P(HighSchool|No)$
    
</div>

In [2]:


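# Count table for Gender vs Has_pet. Each cell is a count; dividing it by the
# column total (3 "Yes", 5 "No") gives the likelihood, e.g. P(Male|Yes) = 2/3.
# The last column and last row hold the marginals P(Gender) and P(Has_pet).
tab_likelihood_gender = [
    ["likelihood","-",  "Has_pet", "-", "-"],
    ["-",          "-",  "Yes", "No", "P(Gender)"],
    ["Gender", "Male", "2", "3", "0.625"], 
    ["-", "Female",    "1", "2", "0.375"],
    ["-", "P(Has_pet)","0.375", "0.625", ""]
]
display(HTML(tabulate.tabulate(tab_likelihood_gender, tablefmt='html')))

# Count table for Education vs Has_pet, with marginals P(Education) and P(Has_pet)
tab_likelihood_education = [
    ["likelihood","-",  "Has_pet", "-", "-"],
    ["-",          "-",  "Yes", "No", "P(Education)"],
    ["Education", "University", "2", "2", "0.5"], 
    ["-", "HighSchool", "1", "3", "0.5"],
    ["-", "P(Has_pet)", "0.375", "0.625", ""]
]
display(HTML(tabulate.tabulate(tab_likelihood_education, tablefmt='html')))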


| Gender | Has_pet = Yes | Has_pet = No | P(Gender) |
|---|---|---|---|
| Male | 2 | 3 | 0.625 |
| Female | 1 | 2 | 0.375 |
| P(Has_pet) | 0.375 | 0.625 | |


| Education | Has_pet = Yes | Has_pet = No | P(Education) |
|---|---|---|---|
| University | 2 | 2 | 0.5 |
| HighSchool | 1 | 3 | 0.5 |
| P(Has_pet) | 0.375 | 0.625 | |
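The counts in these tables become the likelihoods the task asks for once each count is divided by its class total (3 "Yes", 5 "No"). The sketch below does this with `collections.Counter` and keeps the results as exact fractions; it re-declares the toy rows so that it runs on its own, and the helper name `likelihood_table` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Same toy rows as tab_cat, without the header: (Gender, Education, Income, Has_pet)
rows = [
    ("Female", "University", 103000, "Yes"),
    ("Female", "HighSchool", 90500, "No"),
    ("Female", "HighSchool", 114000, "No"),
    ("Male", "University", 102000, "No"),
    ("Male", "University", 75000, "Yes"),
    ("Male", "HighSchool", 90000, "No"),
    ("Male", "HighSchool", 85000, "Yes"),
    ("Male", "University", 86000, "No"),
]


def likelihood_table(rows, feature_index, label_index=3):
    """Return {(value, label): P(value | label)} estimated from counts."""
    label_counts = Counter(r[label_index] for r in rows)
    joint_counts = Counter((r[feature_index], r[label_index]) for r in rows)
    return {
        (value, label): Fraction(count, label_counts[label])
        for (value, label), count in joint_counts.items()
    }


gender = likelihood_table(rows, 0)
education = likelihood_table(rows, 1)
print(gender[("Male", "Yes")], gender[("Female", "No")])                  # 2/3 2/5
print(education[("University", "Yes")], education[("HighSchool", "No")])  # 2/3 3/5
```

Value/class pairs that never occur are simply absent from the returned dictionary; every combination happens to occur in this toy table.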


<div class='alert alert-block alert-success' style="font-weight:bolder">

### Task 2b - Compute posterior probability

- $P(\text{No}|\text{Male})$, $P(\text{Yes}|\text{Female})$
    
- $P(\text{Yes}|\text{University})$, $P(\text{No}|\text{HighSchool})$

</div>


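$P(\text{No} \mid \text{Male}) = \dfrac{P(\text{Male} \mid \text{No})\,P(\text{No})}{P(\text{Male})} = \dfrac{(3/5)(0.625)}{0.625} = 0.6$

$P(\text{Yes} \mid \text{Female}) = \dfrac{P(\text{Female} \mid \text{Yes})\,P(\text{Yes})}{P(\text{Female})} = \dfrac{(1/3)(0.375)}{0.375} \approx 0.333$

$P(\text{Yes} \mid \text{University}) = \dfrac{P(\text{University} \mid \text{Yes})\,P(\text{Yes})}{P(\text{University})} = \dfrac{(2/3)(0.375)}{0.5} = 0.5$

$P(\text{No} \mid \text{HighSchool}) = \dfrac{P(\text{HighSchool} \mid \text{No})\,P(\text{No})}{P(\text{HighSchool})} = \dfrac{(3/5)(0.625)}{0.5} = 0.75$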
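When all probabilities are estimated from counts, Bayes theorem for a single feature reduces to a ratio of counts, $P(y \mid v) = \text{count}(v, y) / \text{count}(v)$, so each posterior can be cross-checked by counting rows directly. A small sketch (the helper name `posterior_by_count` is hypothetical; the rows are the same toy data as `tab_cat`):

```python
from fractions import Fraction

# Same toy rows as tab_cat, without the header: (Gender, Education, Income, Has_pet)
rows = [
    ("Female", "University", 103000, "Yes"),
    ("Female", "HighSchool", 90500, "No"),
    ("Female", "HighSchool", 114000, "No"),
    ("Male", "University", 102000, "No"),
    ("Male", "University", 75000, "Yes"),
    ("Male", "HighSchool", 90000, "No"),
    ("Male", "HighSchool", 85000, "Yes"),
    ("Male", "University", 86000, "No"),
]


def posterior_by_count(rows, feature_index, value, label, label_index=3):
    """Empirical P(label | feature == value) = count(value, label) / count(value)."""
    matching = [r for r in rows if r[feature_index] == value]
    hits = sum(1 for r in matching if r[label_index] == label)
    return Fraction(hits, len(matching))


print(posterior_by_count(rows, 0, "Male", "No"))         # 3/5
print(posterior_by_count(rows, 0, "Female", "Yes"))      # 1/3
print(posterior_by_count(rows, 1, "University", "Yes"))  # 1/2
print(posterior_by_count(rows, 1, "HighSchool", "No"))   # 3/4
```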

<div class='alert alert-block alert-success' style="font-weight:bolder">

### Task 2c - Compute the likelihood of Income given Has_pet, using the mean, standard deviation, and normal distribution function:

- Mean: $ \mu = \frac{1}{n} \sum^{n}_{i=1}{x_i} $
    
- Standard Deviation $ \sigma = \left[ \frac{1}{n-1} \sum^{n}_{i=1}{(x_i-\mu)^2} \right]^\frac{1}{2}  $
    
- Normal Distribution $f(x)=\dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\dfrac{(x-\mu)^2}{2\sigma{}^2}}$
    
Compute $P( \text{Income}=90000 \mid \text{Yes})$, $P( \text{Income}=90000 \mid \text{No})$

</div>

In [3]:
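import statistics as st
from scipy.stats import norm

# Split the Income column by class label (tab_cat[0] is the header row)
income_yes = [row[2] for row in tab_cat[1:] if row[3] == "Yes"]
income_no = [row[2] for row in tab_cat[1:] if row[3] == "No"]

# Class-conditional mean and sample standard deviation (n - 1 denominator)
mu_yes, sigma_yes = st.mean(income_yes), st.stdev(income_yes)
mu_no, sigma_no = st.mean(income_no), st.stdev(income_no)

# Normal density at Income = 90000 under each class.
# This is a density value used as the likelihood, not a probability.
p_yes = norm.pdf(90000, mu_yes, sigma_yes)
p_no = norm.pdf(90000, mu_no, sigma_no)

print(f"Yes: mean = {mu_yes:.2f}, std = {sigma_yes:.2f}")
print(f"No:  mean = {mu_no:.2f}, std = {sigma_no:.2f}")
print(f"P(Income=90000 | Yes) = {p_yes:.3e}")
print(f"P(Income=90000 | No)  = {p_no:.3e}")


Yes: mean = 87666.67, std = 14189.20
No:  mean = 96500.00, std = 11456.44
P(Income=90000 | Yes) = 2.774e-05
P(Income=90000 | No)  = 2.965e-05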
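The normal density formula from Task 2c can also be evaluated without SciPy. The sketch below implements $f(x)$ directly with the `math` module and uses the sample standard deviation (the $n-1$ form given in the task, which is what `statistics.stdev` computes); the function name `normal_pdf` is hypothetical. For a continuous feature such as Income, $f(x)$ is a density rather than a probability, and NB simply uses it in place of $P(x_i \mid y)$.

```python
import math
import statistics as st


def normal_pdf(x, mu, sigma):
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


# Income values split by Has_pet, taken from the toy table
income_yes = [103000, 75000, 85000]
income_no = [90500, 114000, 102000, 90000, 86000]

for label, values in [("Yes", income_yes), ("No", income_no)]:
    mu, sigma = st.mean(values), st.stdev(values)  # sample std, n - 1 denominator
    print(label, normal_pdf(90000, mu, sigma))
```

For the same arguments, the values should agree with `scipy.stats.norm.pdf(x, loc=mu, scale=sigma)`.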


<div class='alert alert-block alert-success' style="font-weight:bolder">

### Task 2d - Making inference / casting predictions

- $X=(Education=University, Gender=Female, Income=100000)$
    
- $X=(Education=HighSchool, Gender=Male, Income=92000)$

</div>



In [12]:

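import statistics as st
from scipy.stats import norm

# Naive Bayes inference computed directly from the toy table (tab_cat).
# Categorical features use count-based likelihoods (Task 2a); Income uses a
# class-conditional normal density (Task 2c).
rows = tab_cat[1:]  # drop the header row
classes = ["Yes", "No"]


def predict(education, gender, income):
    """Return the predicted class and the normalised posteriors P(y | X)."""
    scores = {}
    for c in classes:
        in_class = [r for r in rows if r[3] == c]
        n_c = len(in_class)
        prior = n_c / len(rows)                                  # P(y)
        p_edu = sum(r[1] == education for r in in_class) / n_c   # P(Education | y)
        p_gender = sum(r[0] == gender for r in in_class) / n_c   # P(Gender | y)
        incomes = [r[2] for r in in_class]
        p_income = norm.pdf(income, st.mean(incomes), st.stdev(incomes))  # f(Income | y)
        scores[c] = prior * p_edu * p_gender * p_income
    total = sum(scores.values())  # normalising constant (the evidence under the model)
    posteriors = {c: s / total for c, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


for x in [("University", "Female", 100000), ("HighSchool", "Male", 92000)]:
    label, post = predict(*x)
    print(x, "->", label, {c: round(float(p), 3) for c, p in post.items()})


('University', 'Female', 100000) -> No {'Yes': 0.326, 'No': 0.674}
('HighSchool', 'Male', 92000) -> No {'Yes': 0.236, 'No': 0.764}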

<div class='alert alert-block alert-success' style="font-weight:bolder">

### Task 2e (Extra Credit) - Implement a Naive Bayes classifier and use it to classify the Iris dataset. Note that the Iris dataset contains only numerical features.

</div>
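One possible approach to this task, sketched under the assumption that each Iris feature is modelled by a class-conditional Gaussian (the same idea as Task 2c, applied to every feature). This minimal NumPy implementation works in log space to avoid underflow; the class name `GaussianNaiveBayes` is hypothetical, and scikit-learn's `sklearn.naive_bayes.GaussianNB` provides a library equivalent. Unlike Task 2c it uses the maximum-likelihood variance (dividing by $n$ rather than $n-1$); with 35 training samples per class the difference is small.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, feature) pair."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Per-class feature means, shape (n_classes, n_features)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Per-class feature variances; a tiny constant keeps them strictly positive
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        # Log prior log P(y) from class frequencies
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def _joint_log_likelihood(self, X):
        # diff has shape (n_samples, n_classes, n_features)
        diff = X[:, None, :] - self.mean_[None, :, :]
        # Log of the normal density for each sample, class and feature
        log_pdf = -0.5 * (np.log(2 * np.pi * self.var_) + diff ** 2 / self.var_)
        # Naive assumption: sum per-feature log-likelihoods, then add the log prior
        return log_pdf.sum(axis=2) + self.log_prior_

    def predict(self, X):
        return self.classes_[np.argmax(self._joint_log_likelihood(X), axis=1)]


iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris["data"], iris["target"], test_size=0.3, random_state=0, stratify=iris["target"]
)
model = GaussianNaiveBayes().fit(X_train, y_train)
accuracy = np.mean(model.predict(X_test) == y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Evaluating on a held-out split rather than on the training data gives a fairer estimate of how well the classifier generalises.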




In [4]:
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris["data"], iris["target"]
print("data", X)
print("class/label", y)

data [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.4 3.7 1.5 0.2]
 [4.8 3.4 1.6 0.2]
 [4.8 3.  1.4 0.1]
 [4.3 3.  1.1 0.1]
 [5.8 4.  1.2 0.2]
 [5.7 4.4 1.5 0.4]
 [5.4 3.9 1.3 0.4]
 [5.1 3.5 1.4 0.3]
 [5.7 3.8 1.7 0.3]
 [5.1 3.8 1.5 0.3]
 [5.4 3.4 1.7 0.2]
 [5.1 3.7 1.5 0.4]
 [4.6 3.6 1.  0.2]
 [5.1 3.3 1.7 0.5]
 [4.8 3.4 1.9 0.2]
 [5.  3.  1.6 0.2]
 [5.  3.4 1.6 0.4]
 [5.2 3.5 1.5 0.2]
 [5.2 3.4 1.4 0.2]
 [4.7 3.2 1.6 0.2]
 [4.8 3.1 1.6 0.2]
 [5.4 3.4 1.5 0.4]
 [5.2 4.1 1.5 0.1]
 [5.5 4.2 1.4 0.2]
 [4.9 3.1 1.5 0.2]
 [5.  3.2 1.2 0.2]
 [5.5 3.5 1.3 0.2]
 [4.9 3.6 1.4 0.1]
 [4.4 3.  1.3 0.2]
 [5.1 3.4 1.5 0.2]
 [5.  3.5 1.3 0.3]
 [4.5 2.3 1.3 0.3]
 [4.4 3.2 1.3 0.2]
 [5.  3.5 1.6 0.6]
 [5.1 3.8 1.9 0.4]
 [4.8 3.  1.4 0.3]
 [5.1 3.8 1.6 0.2]
 [4.6 3.2 1.4 0.2]
 [5.3 3.7 1.5 0.2]
 [5.  3.3 1.4 0.2]
 [7.  3.2 4.7 1.4]
 [6.4 3.2 4.5 1.5]
 [6.9 3