# Naive Bayes Classifier

In this notebook I will implement the Naive Bayes classification algorithm from scratch using only the NumPy library.

## How does it work?

The Naive Bayes classifier works as follows:

1 - Suppose we have a training set in which each element is labeled as belonging to a certain class. Let's say there are m classes $C_1,C_2,C_3,...,C_m$.

2 - Each element or sample of the training set is a vector of d dimensions $X = (x_1,x_2,x_3,...,x_d)$.

3 - Given a new sample, the classifier will predict the class based on its learning from the training set.

4 - The prediction is made by calculating the posterior probability of each class and picking the class with the highest one; that is the class the new sample is assigned to.

To express this in mathematical form: the sample X is assigned to the class $C_i$ if and only if:

$$P(C_i/X) > P(C_j/X) \quad \text{for all } j \neq i$$

The idea is to find the class that maximizes $P(C_i/X)$, the probability of class $C_i$ given the sample X.

5 - To calculate $P(C_i/X)$, we use Bayes' theorem, which states that:
                                
$$P(C_i/X) = \frac{P(X/C_i)P(C_i)}{P(X)}$$

Because $P(X)$ is the same for all classes, we only need to maximize the numerator.

6 - Let's first compute the easy probability, $P(C_i)$. Classes are categorical variables, so $P(C_i)$ is estimated as the number of training samples that belong to class $C_i$ divided by the total number of samples in the training set.

7 - For the probability $P(X/C_i)$, we make an important assumption that justifies the name of the classifier: we assume that the components of X are conditionally independent given the class. So $P(X/C_i)$ can be written as follows:
                                
$$P(X/C_i)=P(x_1/C_i)P(x_2/C_i)P(x_3/C_i)...P(x_d/C_i)$$
                                
The probabilities $P(x_j/C_i)$ can be estimated with a Gaussian distribution if the component is continuous. For categorical components we can use the same counting method as for $P(C_i)$, restricted to the samples of class $C_i$.
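
For a continuous component, the Gaussian estimate can be written out explicitly. This is a sketch in which $\mu_{ij}$ and $\sigma_{ij}$ are symbols introduced here (they do not appear elsewhere in this notebook) for the mean and standard deviation of component $j$ over the training samples of class $C_i$:

$$P(x_j/C_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{ij}} \exp\left(-\frac{(x_j-\mu_{ij})^2}{2\sigma_{ij}^2}\right)$$

Strictly speaking this is a density value rather than a probability, which is fine here because we only compare the resulting values across classes.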
                                
ENOUGH THEORY, LET'S CODE                                   

## Time to apply what we have learnt

First of all, let's import the only library we need

In [1]:
import numpy as np

We will start with the following data

In [2]:
X = np.array([[-3,7],[1,5], [1,2], [-2,0], [2,3], [-4,0], [-1,1], [1,1], [-2,2], [2,7], [-4,1], [-2,7]])
Y = np.array([3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 4, 4])

Based on the data we have two classes, 3 and 4. Let's split the data into two arrays, one with the samples of each class.

In [3]:
X3 = X[Y == 3]  # samples that belong to class 3 (boolean mask on the labels)
print(X3)

[[-3  7]
 [ 1  5]
 [ 1  2]
 [-2  0]
 [-4  0]
 [-1  1]
 [-2  2]]


In [4]:
X4 = X[Y == 4]  # samples that belong to class 4 (boolean mask on the labels)
print(X4)

[[ 2  3]
 [ 1  1]
 [ 2  7]
 [-4  1]
 [-2  7]]


The features are continuous, so we will use the Gaussian distribution. Let's first compute the mean and standard deviation of each feature for each class.

In [5]:
means3 = np.mean(X3,axis=0) # means for class 3
std3 = np.std(X3,axis=0) # standard deviation for class 3

print(f'The means for class 3 = {means3}')
print(f'The standard deviation for class 3 = {std3}')

The means for class 3 = [-1.42857143  2.42857143]
The standard deviation for class 3 = [1.76126114 2.44114393]


In [6]:
means4 = np.mean(X4,axis=0) # means for class 4
std4 = np.std(X4,axis=0) # standard deviation for class 4

print(f'The means for class 4 = {means4}')
print(f'The standard deviation for class 4 = {std4}')

The means for class 4 = [-0.2  3.8]
The standard deviation for class 4 = [2.4        2.71293199]


Let's compute the class probabilities $P(C_i)$

In [7]:
P_class = np.array([len(X3)/len(X),len(X4)/len(X)])
print(P_class)

[0.58333333 0.41666667]
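
The same priors can be computed without hard-coding one array per class. The following is a minimal sketch, assuming the labels are stored in a NumPy array as above; the names `classes`, `counts` and `priors` are introduced here for illustration.

```python
import numpy as np

# Same labels as in the notebook above
Y = np.array([3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 4, 4])

# np.unique returns the sorted class labels and how often each one occurs
classes, counts = np.unique(Y, return_counts=True)

# Prior of each class = number of samples in the class / total number of samples
priors = counts / counts.sum()

for c, p in zip(classes, priors):
    print(f'P(C={c}) = {p:.4f}')
```

This version works for any number of classes, which may be convenient if the data set changes.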


Let's define the Gaussian probability density function

In [8]:
def calculateProbability(x, mean, std):
    # Gaussian density of x for the given mean and standard deviation (works element-wise on arrays)
    return (1 / (np.sqrt(2*np.pi) * std)) * (np.exp(-(np.square(x-mean)/(2*np.square(std)))))
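
Before using the function, a quick sanity check can help. The sketch below copies `calculateProbability` so that it runs on its own and checks a few known properties of the Gaussian density; the test values are chosen here for illustration only.

```python
import numpy as np

def calculateProbability(x, mean, std):
    # Copied from the cell above so this snippet is self-contained
    return (1 / (np.sqrt(2*np.pi) * std)) * np.exp(-(np.square(x-mean) / (2*np.square(std))))

# At the mean of a standard normal, the density peaks at 1/sqrt(2*pi)
peak = calculateProbability(0.0, 0.0, 1.0)
print(np.isclose(peak, 1 / np.sqrt(2*np.pi)))

# The density is symmetric around the mean
left = calculateProbability(-1.5, 0.0, 1.0)
right = calculateProbability(1.5, 0.0, 1.0)
print(np.isclose(left, right))

# A density is not a probability: with a small std it can exceed 1
narrow = calculateProbability(0.0, 0.0, 0.1)
print(narrow > 1)
```

The last check is a reminder that the values returned are densities, so they are only meaningful when compared with each other.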

Suppose we want to classify the following sample [-5,5]

In [9]:
x = np.array([-5,5])

In [10]:
P3 = calculateProbability(x,means3,std3)
P4 = calculateProbability(x,means4,std4)

And we multiply the per-feature probabilities together with the class probability, which gives the numerator of Bayes' theorem for each class

In [11]:
P_x_3 = P3[0]*P3[1]*P_class[0]
P_x_4 = P4[0]*P4[1]*P_class[1]

print(f'The unnormalized posterior for class 3 = {P_x_3}')
print(f'The unnormalized posterior for class 4 = {P_x_4}')

The unnormalized posterior for class 3 = 0.001586720448658235
The unnormalized posterior for class 4 = 0.001249926418695604


As we can see, the value for class 3 is greater than the value for class 4, so the point [-5,5] is assigned to class 3.
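
The two values above are the numerators of Bayes' theorem, so they do not sum to 1. If actual posterior probabilities are wanted, one option is to divide by $P(X)$, which is the sum of the numerators over all classes. The following is a minimal sketch under that assumption; the helper `gaussian` and the names `scores` and `posteriors` are introduced here for illustration.

```python
import numpy as np

X = np.array([[-3,7],[1,5],[1,2],[-2,0],[2,3],[-4,0],[-1,1],[1,1],[-2,2],[2,7],[-4,1],[-2,7]])
Y = np.array([3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 4, 4])
x = np.array([-5, 5])

def gaussian(x, mean, std):
    # Gaussian density, same formula as calculateProbability in the notebook
    return np.exp(-np.square(x - mean) / (2 * np.square(std))) / (np.sqrt(2 * np.pi) * std)

# Numerator P(X/C_i) * P(C_i) for each class, in the order [3, 4]
scores = np.array([
    np.prod(gaussian(x, X[Y == c].mean(axis=0), X[Y == c].std(axis=0))) * np.mean(Y == c)
    for c in (3, 4)
])

# P(X) is the sum of the numerators over all classes
posteriors = scores / scores.sum()
print(posteriors)
```

Dividing by the same constant does not change which class is largest, so the prediction stays the same.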

Let's try another sample [2,5]

In [12]:
x = np.array([2,5])

In [13]:
P3 = calculateProbability(x,means3,std3)
P4 = calculateProbability(x,means4,std4)

In [14]:
P_x_3 = P3[0]*P3[1]*P_class[0]
P_x_4 = P4[0]*P4[1]*P_class[1]

print(f'The unnormalized posterior for class 3 = {P_x_3}')
print(f'The unnormalized posterior for class 4 = {P_x_4}')

The unnormalized posterior for class 3 = 0.001864240041759532
The unnormalized posterior for class 4 = 0.006067494763935239


This time the value for class 4 is greater, so this point is assigned to class 4.
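
The steps above can be collected into two small functions. This is only a sketch: `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical names introduced here, and the prediction sums log-probabilities instead of multiplying probabilities, a common variation that avoids numerical underflow when there are many features. It assumes no feature has zero standard deviation within a class.

```python
import numpy as np

def fit_gaussian_nb(X, Y):
    # Hypothetical helper: per-class priors, feature means and feature standard deviations
    classes = np.unique(Y)
    priors = np.array([np.mean(Y == c) for c in classes])
    means = np.array([X[Y == c].mean(axis=0) for c in classes])
    stds = np.array([X[Y == c].std(axis=0) for c in classes])
    return classes, priors, means, stds

def predict_gaussian_nb(x, classes, priors, means, stds):
    # Log of the Gaussian density for every (class, feature) pair
    log_density = -0.5 * np.log(2 * np.pi * np.square(stds)) - np.square(x - means) / (2 * np.square(stds))
    # Summing logs replaces the product of probabilities
    log_scores = np.log(priors) + log_density.sum(axis=1)
    return classes[np.argmax(log_scores)]

X = np.array([[-3,7],[1,5],[1,2],[-2,0],[2,3],[-4,0],[-1,1],[1,1],[-2,2],[2,7],[-4,1],[-2,7]])
Y = np.array([3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 4, 4])

model = fit_gaussian_nb(X, Y)
pred_a = predict_gaussian_nb(np.array([-5, 5]), *model)
pred_b = predict_gaussian_nb(np.array([2, 5]), *model)
print(pred_a, pred_b)
```

Because the logarithm is monotonic, the class with the largest log-score is the same class that maximizes the product computed by hand above.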

## Verify the results

To verify the results, I will use the GaussianNB classifier from sklearn

In [15]:
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X, Y)

GaussianNB(priors=None, var_smoothing=1e-09)

In [16]:
print(clf.predict([[-5, 5]])) 

[3]


In [17]:
print(clf.predict([[2, 5]]))

[4]


As we can see, sklearn gives the same results.