## Naive Bayes classifier using only numpy (compared with GaussianNB from sklearn)
#### Based on:
https://github.com/python-engineer/MLfromscratch/blob/master/mlfromscratch/naivebayes.py

https://chrisalbon.com/machine_learning/naive_bayes/naive_bayes_classifier_from_scratch/

Bayes' theorem relates the probability of A given B to the probability of B given A, which lets us make predictions from data:

    P(A|B)=(P(B|A)*P(A))/P(B)

If A is a class and B is the observed data, the equation for classification becomes:

    P(Class|Data)=(P(Data|Class)*P(Class))/P(Data)
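
As a quick numeric illustration of the formula, the sketch below plugs made-up numbers into Bayes' theorem. The class names `spam` and `ham` and all probabilities are hypothetical, chosen only so the arithmetic is easy to follow; P(Data) is obtained by summing P(Data|Class)*P(Class) over the classes.

```python
# Hypothetical numbers, chosen only to illustrate the formula.
p_class = {"spam": 0.3, "ham": 0.7}              # P(Class)
p_data_given_class = {"spam": 0.8, "ham": 0.1}   # P(Data|Class)

# P(Data): sum of P(Data|Class) * P(Class) over all classes.
p_data = sum(p_data_given_class[c] * p_class[c] for c in p_class)

# P(Class|Data) for every class.
posterior = {c: p_data_given_class[c] * p_class[c] / p_data for c in p_class}
print(posterior)
```

With these numbers the posteriors sum to 1 and `spam` gets the larger share, even though its prior is smaller.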

In a Bayes classifier we can disregard P(Data), the marginal probability, and assign each observation to the class with the largest posterior value, P(Class|Data).

Assumptions of Gaussian naive Bayes:

1. The features are conditionally independent of each other given the class (the "naive" assumption).
2. Within each class, the values of each feature are normally (Gaussian) distributed.
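
To make the two assumptions concrete, here is a minimal sketch. The helper `gaussian_pdf` and the small array `X_c` are illustrative names and data, not taken from the sources above. Under the Gaussian assumption each feature is summarized by a per-class mean and variance; under the naive assumption P(Data|Class) is simply the product of the per-feature densities.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated elementwise at x."""
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Hypothetical training rows for one class: 4 samples, 2 features.
X_c = np.array([[1.0, 10.0],
                [2.0, 12.0],
                [3.0, 14.0],
                [2.0, 12.0]])

mean = X_c.mean(axis=0)  # per-feature mean
var = X_c.var(axis=0)    # per-feature (population) variance

x_new = np.array([2.0, 11.0])
per_feature = gaussian_pdf(x_new, mean, var)
likelihood = per_feature.prod()  # P(Data|Class) under the naive assumption
print(mean, var, likelihood)
```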
 
 
Note: for classification we do not need the actual posterior probability, only which class has the largest one. We can therefore drop the marginal probability: it is the same for every class, and it is hard to estimate in practice.
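
A small check of this point, using hypothetical priors and likelihoods for three classes: dividing every class score by the same P(Data) rescales the scores but cannot change which one is largest.

```python
import numpy as np

# Hypothetical priors and likelihoods for three classes.
priors = np.array([0.5, 0.3, 0.2])          # P(Class)
likelihoods = np.array([0.02, 0.10, 0.05])  # P(Data|Class)

unnormalized = likelihoods * priors             # numerator only
normalized = unnormalized / unnormalized.sum()  # divided by P(Data)

best_unnormalized = int(np.argmax(unnormalized))
best_normalized = int(np.argmax(normalized))
print(best_unnormalized, best_normalized)  # prints "1 1"
```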
 


In [43]:
import numpy as np

In [36]:
class NaiveBayes:
    """Gaussian naive Bayes classifier implemented with NumPy only."""

    def fit(self, X, y):
        """Estimate the prior, mean, and variance of every class from the training data."""
        n_samples, n_features = X.shape
        # Sorted array of the unique class labels
        self._classes = np.unique(y)
        n_classes = len(self._classes)
        # One row per class, one column per feature
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)

        # Index rows by position (idx), not by label (c), so labels need not be 0..n_classes-1
        for idx, c in enumerate(self._classes):
            # Only the samples that belong to class c
            X_c = X[y == c]
            self._mean[idx, :] = X_c.mean(axis=0)
            self._var[idx, :] = X_c.var(axis=0)
            # Prior = relative frequency of class c in the training samples
            self._priors[idx] = X_c.shape[0] / float(n_samples)

    def predict(self, X):
        """Predict a class label for every row of X."""
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        """Return the class with the highest log posterior for a single sample x."""
        posteriors = []

        for idx, c in enumerate(self._classes):
            # Work in log space: log prior + sum of per-feature log likelihoods
            prior = np.log(self._priors[idx])
            class_conditional = np.sum(np.log(self._pdf(idx, x)))
            posterior = prior + class_conditional
            posteriors.append(posterior)

        return self._classes[np.argmax(posteriors)]

    def _pdf(self, class_idx, x):
        """Gaussian probability density of x under the given class, per feature."""
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        numerator = np.exp(-(x - mean) ** 2 / (2 * var))
        denominator = np.sqrt(2 * np.pi * var)
        return numerator / denominator
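
The `_predict` method sums log densities instead of multiplying the raw densities. One practical reason for this is numerical: a product of many small numbers can underflow to zero in floating point, while the sum of their logarithms stays finite. The sketch below shows the effect with hypothetical density values for a sample with 500 features.

```python
import numpy as np

# Hypothetical per-feature densities for one sample with 500 features.
densities = np.full(500, 1e-3)

product = np.prod(densities)          # 1e-1500 is below float64 range: underflows to 0.0
log_sum = np.sum(np.log(densities))   # 500 * log(1e-3), a finite negative number

print(product, log_sum)
```

Because the logarithm is monotonic, the class with the largest log posterior is also the class with the largest posterior, so the prediction is unchanged.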
        

In [37]:
from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt



# Testing of Models

In [38]:
def accuracy(y_true,y_pred):
    accuracy = np.sum(y_true==y_pred) / len(y_true)
    return accuracy

X, y = datasets.make_classification(n_samples=1000,n_features=10,n_classes=2, random_state=123)
y=y.astype(int)  # np.int was removed in NumPy 1.24; the builtin int is equivalent
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=123)

nb=NaiveBayes()
nb.fit(X_train,y_train)
predictions=nb.predict(X_test)

In [40]:
print("Naive Bayes from scratch Classification accuracy", accuracy(y_test, predictions))

Naive Bayes from scratch Classification accuracy 0.965


In [41]:
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)
   

In [42]:
print("Naive Bayes from sklearn Classification accuracy", accuracy(y_test, y_pred))

Naive Bayes from sklearn Classification accuracy 0.965
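
Matching accuracy on this dataset does not guarantee that the two implementations behave identically on every input. One difference worth knowing: `GaussianNB` has a `var_smoothing` parameter (default `1e-9`) that adds a portion of the largest feature variance to all variances for numerical stability, whereas the from-scratch class divides by the raw variance and would fail if a feature were constant within a class. The sketch below illustrates the idea; `smooth_variances` is a hypothetical helper and the data is made up, so treat it as an approximation of the approach rather than scikit-learn's actual code.

```python
import numpy as np

def smooth_variances(var, X, var_smoothing=1e-9):
    """Add a small fraction of the largest feature variance to every variance."""
    epsilon = var_smoothing * X.var(axis=0).max()
    return var + epsilon

# Hypothetical data: the second feature is constant within one class.
X = np.array([[1.0, 5.0],
              [2.0, 5.0],
              [3.0, 7.0]])
X_c = X[:2]                      # rows belonging to that class
var_c = X_c.var(axis=0)          # second entry is exactly 0.0
var_smoothed = smooth_variances(var_c, X)
print(var_c, var_smoothed)
```

After smoothing, every variance is strictly positive, so the Gaussian density no longer divides by zero.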
