## Naive Bayes


https://www.youtube.com/watch?v=BqUmKsfSWho



$P(A|B) = \frac{P(B|A) P(A)}{P(B)}$
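
As a quick numeric illustration of the rule, the sketch below plugs made-up numbers into it. The probabilities are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical numbers, chosen only to illustrate the formula.
p_a = 0.01              # P(A): prior probability of event A
p_b_given_a = 0.9       # P(B|A)
p_b_given_not_a = 0.05  # P(B|not A)

# P(B) via the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(B) = {p_b:.4f}, P(A|B) = {p_a_given_b:.4f}")
```

Even though $P(B|A)$ is high, the small prior $P(A)$ keeps the posterior $P(A|B)$ at roughly 0.15.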

### In our case


$P(y|X) = \frac{P(X|y) P(y)}{P(X)}$

### with feature vector X

$X = (x_1, x_2, ..., x_n)$


### Assume that all features are mutually independent

$P(y|X) = \frac{P(x_1|y) \cdot P(x_2|y) \cdots P(x_n|y) \cdot P(y)}{P(X)}$

### Select class with highest probability

$\hat y = \underset{y}{\operatorname{argmax}} \frac{P(x_1|y) \cdot P(x_2|y) \cdots P(x_n|y) \cdot P(y)}{P(X)}$

$P(X)$ does not depend on $y$, so it can be dropped without changing the argmax:

$\hat y = \underset{y}{\operatorname{argmax}} P(x_1|y) P(x_2|y) \cdots P(x_n|y) P(y)$
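
The selected class is the same with or without the denominator, which a small sketch can confirm. The likelihood and prior values below are hypothetical and not taken from any dataset.

```python
# Hypothetical per-class values for a single sample (class labels 0 and 1).
likelihood = {0: 0.002, 1: 0.0005}  # P(x_1|y) * ... * P(x_n|y)
prior = {0: 0.3, 1: 0.7}            # P(y)

# Numerator of Bayes' theorem for each class
unnormalized = {c: likelihood[c] * prior[c] for c in prior}

# P(X) is the same for every class: the sum of the numerators
evidence = sum(unnormalized.values())
posterior = {c: unnormalized[c] / evidence for c in prior}

best_unnormalized = max(unnormalized, key=unnormalized.get)
best_posterior = max(posterior, key=posterior.get)
print(best_unnormalized, best_posterior)
```

Dividing every score by the same positive constant rescales the scores but does not change their order, so both selections agree.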

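Each probability lies between 0 and 1, so multiplying many of them can underflow to zero in floating point. Taking the **log** turns the product into a sum, and because the logarithm is monotonically increasing the argmax stays the same: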

$\hat y = \underset{y}{\operatorname{argmax}} \log P(x_1|y) + \log P(x_2|y) + \cdots + \log P(x_n|y) + \log P(y)$
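
The need for the log form can be seen in a short experiment. This is a minimal sketch using 1000 hypothetical probabilities of 0.01 each: the direct product collapses to zero in floating point, while the sum of logs stays representable.

```python
import math

# 1000 hypothetical per-feature probabilities, each equal to 0.01
probs = [0.01] * 1000

# The true value 1e-2000 is far below the smallest positive float
product = math.prod(probs)

# The same quantity on the log scale
log_sum = sum(math.log(p) for p in probs)

print(product)   # 0.0
print(log_sum)
```

Once the product is 0.0, every class gets the same score and the argmax is meaningless; the log sum keeps the classes distinguishable.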

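$P(y)$: prior probability of class $y$, estimated from the training data as the fraction of samples in that class.

$P(x_i|y)$: class-conditional probability of feature $x_i$ given class $y$.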


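In many cases we model the class-conditional probability of each feature as a Gaussian, where $\mu_y$ and $\sigma_y^2$ are the mean and variance of feature $x_i$ among the training samples of class $y$:

$P(x_i|y) = \frac{1}{\sqrt{2 \pi \sigma_y^{2}}} \exp\left( - \frac{(x_i-\mu_y)^2}{2 \sigma_y^2} \right)$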
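
Because only the log of this density is needed for classification, one option is to compute the log-density directly instead of exponentiating and then taking the log. The sketch below is an illustration, not part of the implementation that follows; `gaussian_log_pdf` is a hypothetical helper name and the sample values are made up.

```python
import numpy as np

def gaussian_log_pdf(x, mu, var):
    """Log of the Gaussian density, evaluated element-wise per feature."""
    return -0.5 * np.log(2.0 * np.pi * var) - (x - mu) ** 2 / (2.0 * var)

# Hypothetical sample with three features and made-up class statistics
x = np.array([0.0, 1.0, -2.0])
mu = np.array([0.0, 0.5, -1.0])
var = np.array([1.0, 0.25, 4.0])

# Density written exactly as in the formula, for comparison
pdf = np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

log_pdf = gaussian_log_pdf(x, mu, var)
print(np.allclose(log_pdf, np.log(pdf)))  # True
```

The two forms agree here; the direct log form additionally avoids `log(0)` when a sample lies far from the class mean and the density underflows.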


In [1]:
import numpy as np


class NaiveBayes:
    # We do not need init
    
    def fit(self, X, y):
        # We need priors
        n_samples, n_features  = X.shape
        self._classes = np.unique(y)
        n_classes = len(self._classes)
        
        # Per-class statistics: one row per class, one column per feature
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)

        # Index by position, not by label, so labels need not be 0..n_classes-1
        for idx, c in enumerate(self._classes):
            X_c = X[y == c]
            self._mean[idx, :] = X_c.mean(axis=0)
            self._var[idx, :] = X_c.var(axis=0)
            # Prior = fraction of training samples that belong to class c
            self._priors[idx] = X_c.shape[0] / float(n_samples)
    
    # Predict the class label for every sample (row) in X
    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)
    
    # Predict the class label for a single sample x
    def _predict(self, x):
        posteriors = []
        for idx, c in enumerate(self._classes):
            # log P(y) + sum_i log P(x_i|y)
            prior = np.log(self._priors[idx])
            class_conditional = np.sum(np.log(self._pdf(idx, x)))
            posterior = prior + class_conditional
            posteriors.append(posterior)

        # Return the label with the highest log-posterior
        return self._classes[np.argmax(posteriors)]
            
            
    # Gaussian density of every feature of x under the class at class_idx
    def _pdf(self, class_idx, x):
        mu = self._mean[class_idx]
        var = self._var[class_idx]
        numer = np.exp(-((x - mu) ** 2) / (2 * var))
        denom = np.sqrt(2 * np.pi * var)
        return numer / denom

In [2]:
import numpy as np
import sklearn
from sklearn.model_selection import train_test_split
from sklearn import datasets

In [3]:
X, y = datasets.make_classification(n_samples = 1000, n_features = 10, n_classes = 2,
                                    random_state = 123)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)


# fit the model
nb = NaiveBayes()
nb.fit(X_train, y_train)
predictions = nb.predict(X_test)


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels
    return np.sum(y_true == y_pred) / len(y_true)

print(f"Accuracy: {accuracy(y_test, predictions)}")

Accuracy: 0.965
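
One possible sanity check, not part of the original notebook, is to fit scikit-learn's `GaussianNB` on the same synthetic data and split. The sketch below assumes scikit-learn is installed; `reference` and `reference_accuracy` are names introduced here for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same synthetic data and split as in the cell above
X, y = datasets.make_classification(n_samples=1000, n_features=10, n_classes=2,
                                    random_state=123)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# Library implementation of Gaussian Naive Bayes, used as a reference
reference = GaussianNB()
reference.fit(X_train, y_train)

# score() returns the mean accuracy on the given test data
reference_accuracy = reference.score(X_test, y_test)
print(f"GaussianNB accuracy: {reference_accuracy}")
```

`GaussianNB` adds a small smoothing term to the variances (`var_smoothing`), so its predictions may differ slightly from the hand-written class.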
