***
$\mathbf{\text{Naive Bayes Algorithm (GaussianNB)}}$<br>
***

Naive Bayes is a supervised classification algorithm based on Bayes' theorem. In layman's terms, it is a probabilistic classifier: it estimates the probability that a sample belongs to each class and predicts the most probable one.

<li> It can be used for binary as well as multi-class classification.</li>
<li> It is simple and fast to train, and it often performs competitively on multi-class problems.</li>
<li> It is widely used for spam detection and sentiment analysis.</li>
    

The three common types of Naive Bayes classifier are:

<li> Gaussian: It is used for continuous features and assumes that, within each class, every feature follows a normal distribution.</li>
<li> Multinomial: MultinomialNB implements the naive Bayes algorithm for multinomially distributed data, and is one of the two classic naive Bayes variants used in text classification (where the data are typically represented as word vector counts, although tf-idf vectors are also known to work well in practice).</li>
<li> Bernoulli: BernoulliNB implements the naive Bayes training and classification algorithms for data that is distributed according to multivariate Bernoulli distributions; i.e., there may be multiple features but each one is assumed to be a binary-valued (Bernoulli, boolean) variable. </li>

It makes the naive assumption that every pair of features is conditionally independent given the class. Hence it is called Naive Bayes.

Mathematically, Bayes' theorem can be written as:
$$
P(y|X) = \frac{P(X|y) \cdot P(y)}{P(X)}
$$

<li>P(y|X) = posterior probability (probability of class y given that X has been observed),</li>
<li>P(X|y) = class conditional probability, or likelihood (probability of X given class y),</li>
<li>P(y) = prior probability of y (probability of class y before X is observed),</li>
<li>P(X) = evidence, i.e. the marginal probability of X (probability of observing X)</li>
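
To make these four terms concrete, here is a minimal numerical sketch of Bayes' theorem. All numbers are made up for illustration (a hypothetical spam filter in which X is the event "the message contains the word 'free'"); none of them come from a real dataset.

```python
# Hypothetical numbers for a two-class spam filter (illustrative only).
p_spam = 0.2                 # prior P(y = spam)
p_ham = 1.0 - p_spam         # prior P(y = ham)
p_word_given_spam = 0.6      # class conditional probability P(X | spam)
p_word_given_ham = 0.05      # class conditional probability P(X | ham)

# P(X) via the law of total probability over both classes.
p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham

# Posterior P(spam | X) from Bayes' theorem.
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))
```

Even though only 20% of the messages are assumed to be spam, observing the word raises the posterior probability of spam well above the prior.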

We can write our feature vector X as:

\begin{align}
X = (x_{1}, x_{2}, x_{3}, \ldots, x_{n})
\end{align}


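Assuming that all the features are conditionally independent given the class, P(X|y) factorizes into a product over the individual features:

\begin{align}
P(y|X) = \frac{P(x_{1}|y) \cdot P(x_{2}|y) \cdot P(x_{3}|y) \cdots P(x_{n}|y) \cdot P(y)}{P(X)}
\end{align}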

Now we select the class with the highest posterior probability:
\begin{align}
y = \arg\max_{y} P(y|X)
\end{align}

Substituting the factorized form of P(y|X):
\begin{align}
y = \arg\max_{y} \frac{P(x_{1}|y) \cdot P(x_{2}|y) \cdot P(x_{3}|y) \cdots P(x_{n}|y) \cdot P(y)}{P(X)}
\end{align}

As P(X) does not depend on y, it is the same for every class and does not affect the argmax, so we can omit it:

\begin{align}
y = \arg\max_{y} P(x_{1}|y) \cdot P(x_{2}|y) \cdot P(x_{3}|y) \cdots P(x_{n}|y) \cdot P(y)
\end{align}
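
The decision rule above can be sketched in a few lines of NumPy. The likelihoods and priors below are made-up values for two classes and three features, chosen only to show that dropping P(X) leaves the predicted class unchanged.

```python
import numpy as np

# Hypothetical per-feature likelihoods P(x_i | y): one row per class, one column per feature.
likelihoods = np.array([
    [0.5, 0.2, 0.1],   # class 0
    [0.1, 0.4, 0.5],   # class 1
])
priors = np.array([0.6, 0.4])  # hypothetical P(y)

# Numerator of the posterior: product over features times the prior.
joint = likelihoods.prod(axis=1) * priors

# Dividing by P(X), the sum of the numerator over all classes, gives the posterior.
posterior = joint / joint.sum()

# The argmax is the same with or without the division by P(X).
predicted_from_joint = int(np.argmax(joint))
predicted_from_posterior = int(np.argmax(posterior))
print(posterior, predicted_from_posterior)
```

Here class 0 has the larger prior, but the feature likelihoods favour class 1 strongly enough that class 1 is predicted.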

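Multiplying many probabilities results in a very small number that can underflow to zero. To avoid this, we take the logarithm, which turns the product into a sum and, being monotonically increasing, does not change the argmax:

\begin{align}
y = \arg\max_{y} \left[ \log P(x_{1}|y) + \log P(x_{2}|y) + \log P(x_{3}|y) + \cdots + \log P(x_{n}|y) + \log P(y) \right]
\end{align}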
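
The underflow problem is easy to reproduce. The sketch below uses 500 hypothetical likelihood values of 0.01 each; the exact count and value are arbitrary and only need to be large and small enough to exceed the range of 64-bit floats.

```python
import numpy as np

# 500 hypothetical per-feature likelihoods, each equal to 0.01.
probs = np.full(500, 0.01)

# The true product is 1e-1000, far below the smallest positive float64,
# so the direct product underflows to exactly 0.0.
direct_product = np.prod(probs)

# The sum of logs stays finite: 500 * log(0.01).
log_sum = np.sum(np.log(probs))

print(direct_product, log_sum)
```

Once every class score has underflowed to 0.0 the classes can no longer be ranked, whereas the log scores remain comparable.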

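For the Gaussian variant, the class conditional probability of feature x_i is given by the normal density, where \mu_{y,i} and \sigma^{2}_{y,i} are the mean and variance of feature i among the training samples of class y:

\begin{align}
P(x_{i}|y) = \frac{1}{\sqrt{2\pi\sigma^{2}_{y,i}}} \exp\left(-\frac{(x_{i} - \mu_{y,i})^{2}}{2\sigma^{2}_{y,i}}\right)
\end{align}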
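
Because the classifier only needs the logarithm of this density, the log can also be evaluated in closed form instead of computing the density first. The helper below is a hypothetical sketch (the name `gaussian_log_pdf` and the sample values are assumptions, not part of the implementation that follows).

```python
import numpy as np

def gaussian_log_pdf(x, mean, var):
    """Log of the univariate normal density, evaluated element-wise."""
    return -0.5 * np.log(2.0 * np.pi * var) - (x - mean) ** 2 / (2.0 * var)

# Made-up feature values, per-feature means and variances for one class.
x = np.array([5.1, 3.5])
mean = np.array([5.0, 3.4])
var = np.array([0.12, 0.14])

log_p = gaussian_log_pdf(x, mean, var)

# Same quantity via the density formula followed by a log.
p = np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

# For a point far from the mean the density underflows, the log-density does not.
far_log_p = gaussian_log_pdf(100.0, 0.0, 1.0)
```

The two routes agree for typical inputs; the closed form simply stays finite for points very far from the class mean, where the density itself would round to zero.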

In [1]:
import numpy as np

In [2]:
class NaiveBayes():
    
    def fit(self, X, y):
        n_samples, n_features = X.shape  # X is a 2-D numpy array of shape (n_samples, n_features)
        self._classes = np.unique(y) #To find unique elements of an array
        n_classes = len(self._classes) #To specify number of classes
        
        # initializing mean, variance and priors with zeros
        self._mean = np.zeros((n_classes, n_features), dtype = np.float64)
        self._var = np.zeros((n_classes, n_features), dtype = np.float64)
        self._priors = np.zeros(n_classes, dtype = np.float64)
        
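        for index, c in enumerate(self._classes):
            X_c = X[y == c]  # samples that belong to class c
            # Index by position, not by label, so that non-integer or non-contiguous labels also work
            self._mean[index, :] = X_c.mean(axis=0)
            self._var[index, :] = X_c.var(axis=0)
            self._priors[index] = X_c.shape[0] / float(n_samples)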
    
    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)
        
    def _predict(self, x):
        posteriors = []
        
        # Log of the unnormalized posterior for each class: log P(y) + sum of log P(x_i|y)
        for index, c in enumerate(self._classes):
            prior = np.log(self._priors[index])
            posterior = np.sum(np.log(self._pdf(index, x)))
            posterior = prior + posterior
            posteriors.append(posterior)
        
        # return class with highest probability
        return self._classes[np.argmax(posteriors)]
            
    def _pdf(self, class_index, x):
        mean = self._mean[class_index]
        var = self._var[class_index]
        numerator = np.exp(-((x - mean)**2)/(2*var))
        denominator = np.sqrt(2 * np.pi * var)
        
        return numerator / denominator
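
One edge case worth noting: if a feature is constant within a class, its variance is 0 and the density divides by zero. A common remedy is to add a small constant to every variance; scikit-learn's `GaussianNB` has a `var_smoothing` parameter for a similar purpose. The sketch below is an assumption about how this could be done, with a made-up toy array and an arbitrarily chosen factor of 1e-9; it is not part of the class above.

```python
import numpy as np

# Toy class subset in which the first feature is constant (made-up data).
X_c = np.array([[1.0, 2.0],
                [1.0, 3.0],
                [1.0, 4.0]])

var = X_c.var(axis=0)            # first entry is exactly 0.0
epsilon = 1e-9 * var.max()       # assumed smoothing constant, chosen for illustration
smoothed_var = var + epsilon     # strictly positive, safe to divide by

# Log-density of one sample under the smoothed variances.
x = np.array([1.0, 3.0])
mean = X_c.mean(axis=0)
log_pdf = -0.5 * np.log(2 * np.pi * smoothed_var) - (x - mean) ** 2 / (2 * smoothed_var)
print(var, smoothed_var, log_pdf)
```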

### Applying the algorithm to the Iris dataset from sklearn and checking the accuracy score

In [3]:
if __name__ == "__main__":
    
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    
    def accuracy(y_true, y_pred):
        accuracy = np.sum(y_true == y_pred) / len(y_true)
        return accuracy
    
    df = load_iris()
    
    X, y = df.data, df.target
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state=1234)
    
    nb = NaiveBayes()
    
    nb.fit(X_train, y_train)
    
    y_pred = nb.predict(X_test)
    
    print("The accuracy of the Gaussian Naive Bayes model is: ", accuracy(y_test, y_pred))

The accuracy of the Gaussian Naive Bayes model is:  0.9555555555555556
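
As a sanity check, the same split can be fed to scikit-learn's own `GaussianNB`. This is a minimal sketch that assumes scikit-learn is installed; it reuses the `test_size` and `random_state` values from the cell above so that both models see identical training and test sets.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data and same split as in the from-scratch experiment.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1234
)

# Fit the library implementation and score it on the held-out samples.
clf = GaussianNB()
clf.fit(X_train, y_train)
sk_accuracy = accuracy_score(y_test, clf.predict(X_test))
print("The accuracy of scikit-learn's GaussianNB is: ", sk_accuracy)
```

The two scores should be very close, since both models estimate per-class means, variances and priors from the same training samples.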
