# Naive Bayes Classifier

A <b>Naive Bayes classifier</b> is a probabilistic machine learning model that’s used for classification tasks. The classifier is built on Bayes' theorem.

### Bayes' Theorem

$$ P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} $$
<br>
$$ \text{posterior probability} = \frac{\text{likelihood} \times \text{prior}}{\text{marginal}} $$

Using Bayes' theorem, we can find the probability of A happening given that B has occurred. Here, B is the evidence and A is the hypothesis.
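
To make the formula concrete, here is a small worked example. The spam scenario and all of the probabilities are hypothetical numbers chosen only for illustration; the marginal is obtained with the law of total probability.

```python
def posterior(prior, likelihood, marginal):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / marginal

# Hypothetical numbers, for illustration only:
# A = "email is spam", B = "email contains the word 'offer'"
p_spam = 0.2               # P(A), the prior
p_offer_given_spam = 0.6   # P(B|A), the likelihood
p_offer_given_ham = 0.05   # P(B|not A)

# P(B), the marginal, via the law of total probability
p_offer = p_offer_given_spam * p_spam + p_offer_given_ham * (1 - p_spam)

p_spam_given_offer = posterior(p_spam, p_offer_given_spam, p_offer)
print(round(p_spam_given_offer, 2))  # 0.75
```

Seeing the word raises the probability of spam from the prior of 0.2 to a posterior of 0.75.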

### Assumptions

The Naive Bayes classifier assumes that <b>the features are independent of each other given the class</b>, i.e. the presence of one feature is treated as unrelated to the presence or absence of any other feature, even if those features actually depend on each other.     
In layman's terms, it behaves as if each feature contributes independently to the probability of the class. <br>

So, with words as features, "eat healthy avoid junk" would have the same effect/weightage as "eat junk avoid healthy", because both sentences contain exactly the same words.
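
The sentence example can be sketched in a few lines. This assumes words are the features and uses made-up per-word likelihoods for a single class; under the independence assumption the sentence likelihood is just the product of the per-word values, so word order cannot matter.

```python
import math

# Hypothetical per-word likelihoods P(word | class) for one class
word_likelihood = {"eat": 0.3, "healthy": 0.4, "avoid": 0.2, "junk": 0.1}

def sentence_likelihood(sentence):
    """Product of per-word likelihoods, as the independence assumption implies."""
    return math.prod(word_likelihood[word] for word in sentence.split())

a = sentence_likelihood("eat healthy avoid junk")
b = sentence_likelihood("eat junk avoid healthy")

# Compare with a tolerance, since floating-point products can differ in the last bit
print(math.isclose(a, b))  # True
```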

### Types of Naive Bayes Classifier:

The different Naive Bayes classifiers differ mainly in the assumptions they make about the distribution of $P(x_i \mid y)$. <br>

#### Gaussian Naive Bayes:
When the predictors take continuous rather than discrete values, we assume that the values within each class are sampled from a Gaussian distribution. The conditional probability of a feature value $x_i$ given class $y$ then becomes the following, where $\mu_y$ and $\sigma_y^2$ are the mean and variance of that feature over the training samples of class $y$:
$$ 
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i-\mu_y)^2}{2\sigma_y^2}\right)
$$
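
As a quick check of the formula above, here is a minimal sketch that evaluates the Gaussian density for one feature. The function name `gaussian_likelihood` and the mean and variance values are assumptions made for illustration.

```python
import math

def gaussian_likelihood(x, mean, var):
    """Gaussian density of x for a feature with the given class mean and variance."""
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * var)
    exponent = math.exp(-((x - mean) ** 2) / (2.0 * var))
    return coefficient * exponent

# Hypothetical feature with mean 5.0 and variance 4.0 within one class
at_mean = gaussian_likelihood(5.0, 5.0, 4.0)   # density at the class mean
far_away = gaussian_likelihood(9.0, 5.0, 4.0)  # two standard deviations away

print(at_mean > far_away)  # True
```

Values close to the class mean receive a higher likelihood, which is what pulls a sample towards that class.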


#### Multinomial Naive Bayes:
This is mostly used for document classification problems, i.e. deciding whether a document belongs to the category of sports, politics, technology, etc. The features/predictors used by the classifier are the frequencies of the words present in the document.
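
A minimal sketch of how such word-frequency features could be built is shown here. The vocabulary and the document are hypothetical, and splitting on whitespace is a simplification of real tokenization.

```python
from collections import Counter

# Hypothetical vocabulary and document, for illustration only
vocabulary = ["goal", "match", "election", "vote"]
document = "the match ended after a late goal and another goal"

# Count every word, then keep the counts of the vocabulary words in a fixed order
counts = Counter(document.split())
features = [counts[word] for word in vocabulary]

print(features)  # [2, 1, 0, 0]
```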

#### Bernoulli Naive Bayes:
This is similar to multinomial Naive Bayes, but the predictors are boolean variables. The features that we use to predict the class variable take only the values yes or no, for example whether a word occurs in the text or not.
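
For comparison with word counts, boolean features only record whether each word occurs at all. The sketch below uses a hypothetical vocabulary and document and a simple whitespace split.

```python
# Hypothetical vocabulary and document, for illustration only
vocabulary = ["goal", "match", "election", "vote"]
document = "the match ended after a late goal and another goal"

# 1 if the vocabulary word occurs in the document, 0 otherwise
words_present = set(document.split())
features = [int(word in words_present) for word in vocabulary]

print(features)  # [1, 1, 0, 0]
```

Note that "goal" appears twice in the document but still contributes a single 1.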

Other popular Naive Bayes models are **Categorical** Naive Bayes and **Complement** Naive Bayes.

### Advantages

- It is an easy and fast classification algorithm.
- When the assumption of independence holds, a Naive Bayes classifier can perform better than more sophisticated classification models, and it needs less training data.
- It also performs well in multi-class prediction.

### Disadvantages

- If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it a probability of 0 (zero) and will be unable to make a prediction. This is often known as the *Zero Frequency* problem. To solve it, we can use a *smoothing technique*, such as Laplace estimation.
- Naive Bayes assumes independent predictors. In real life, it is almost impossible to get a set of predictors that are completely independent.
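
The zero-frequency problem and its fix can be sketched as follows. The helper `smoothed_probabilities`, its `alpha` parameter, and the colour data are assumptions made for illustration; `alpha=1.0` corresponds to Laplace (add-one) estimation.

```python
from collections import Counter

def smoothed_probabilities(observed, categories, alpha=1.0):
    """Estimate the probability of each category with additive smoothing."""
    counts = Counter(observed)
    total = len(observed) + alpha * len(categories)
    return {c: (counts[c] + alpha) / total for c in categories}

# Hypothetical training values of one categorical feature within one class
train_values = ["red", "red", "blue"]
categories = ["red", "blue", "green"]  # "green" never appears in training

unsmoothed = smoothed_probabilities(train_values, categories, alpha=0.0)
smoothed = smoothed_probabilities(train_values, categories, alpha=1.0)

print(unsmoothed["green"])  # 0.0, which would zero out the whole product
print(smoothed["green"])    # 1/6, small but non-zero
```

With smoothing, an unseen category lowers the score of a class instead of eliminating it.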

### Naive Bayes Implementation from Scratch with Mathematical formulas (Gaussian distribution), explanation, steps and then code

NOTE: The following section implements Gaussian Naive Bayes, which assumes that the features follow a Normal distribution.

In [1]:
import numpy as np
import math

In [15]:
class NaiveBayes:
    
    def fit(self, X, y):
        '''
        Fits Machine Learning model with training data 'X' and 'y'
        Input: Features matrix 'X' and target vector 'y', dtype=ndarray
        '''
        self.X, self.y = X, y
        self.classes = np.unique(y)
        
        # calculate mean, variance of each feature for each class
        self.parameters = []  # one list per class of {"mean": mean, "var": variance} for each feature (column), shape=(num_classes, num_features)
        
        for idx, cls in enumerate(self.classes):
            # only select rows where labels equal the given class (cls)
            X_where_cls = X[np.where(y==cls)]
            self.parameters.append([])
            # add the mean and variance for each feature (column)
            for col in X_where_cls.T:
                parameters = {"mean": col.mean(), "var": col.var()}
                self.parameters[idx].append(parameters)
                
        
    def predict(self, X):
        '''
        Predict the class labels of samples in X
        '''
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)
    
    def _calculate_prior(self, cls):
        '''
        Calculate prior (frequency) of class cls
        prior = (samples where class == cls) / (total number of samples)
        '''
        prior = np.mean(self.y==cls)
        return prior
    
    def _calculate_likelihood(self, x, mean, var):
        '''
        Calculate Gaussian likelihood of the data x given mean and variance
        '''
        eps = 1e-4  # added to the variance terms to prevent division by zero
        coefficient = 1.0 / math.sqrt(2.0 * math.pi * var + eps)
        exponent = math.exp(-(math.pow(x - mean, 2) / (2 * var + eps)))
        return coefficient * exponent
        
    
    def _predict(self, x):
        ''' Classification using Bayes Rule P(Y|X) = P(X|Y)*P(Y)/P(X),
            or Posterior = Likelihood * Prior / Scaling Factor
        P(Y|X) - The posterior is the probability that sample x is of class y given the
                 feature values of x being distributed according to distribution of y and the prior.
        P(X|Y) - Likelihood of data X given class distribution Y.
                 Gaussian distribution (given by _calculate_likelihood)
        P(Y)   - Prior (given by _calculate_prior)
        P(X)   - Scales the posterior to make it a proper probability distribution.
                 This term is ignored in this implementation since it doesn't affect
                 which class distribution the sample is most likely to belong to.
        Classifies the sample as the class that results in the largest P(Y|X) (posterior)
        '''
        
        posteriors = []
        # go through the list of classes
        for idx, cls in enumerate(self.classes):
            prior = self._calculate_prior(cls)
            # start from the prior so that it is counted exactly once
            posterior = prior
            # multiply in the likelihood of each feature value under this class
            for feature_val, params in zip(x, self.parameters[idx]):
                likelihood = self._calculate_likelihood(feature_val, params["mean"], params["var"])
                posterior *= likelihood
            posteriors.append(posterior)
        # return the class with the largest posterior probability
        return self.classes[np.argmax(posteriors)]                     
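
The `_predict` docstring notes that P(X) is ignored because it does not change which class wins. A small sketch with hypothetical scores shows why: dividing every score by the same positive number rescales them but leaves the largest one in the same position.

```python
# Hypothetical unnormalized scores P(X|Y) * P(Y) for three classes
scores = [0.002, 0.010, 0.004]

# P(X) is the sum of the scores over all classes
evidence = sum(scores)
posteriors = [score / evidence for score in scores]

best_unnormalized = scores.index(max(scores))
best_normalized = posteriors.index(max(posteriors))

print(best_unnormalized == best_normalized)  # True
```

The normalized values are only needed when actual probabilities, rather than just the predicted label, are required.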

In [24]:
# testing model accuracy using synthetic data

from sklearn.model_selection import train_test_split
from sklearn import datasets

def accuracy(y_true, y_pred):
    accuracy = np.sum(y_true == y_pred) / len(y_true)
    return accuracy

# generate synthetic data
X, y = datasets.make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# test model's performance
model = NaiveBayes()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print("Naive Bayes classification accuracy", accuracy(y_test, predictions))

Naive Bayes classification accuracy 0.965


### Naive Bayes implementation using sklearn

NOTE: The following section implements the Gaussian Naive Bayes algorithm using the `GaussianNB` class from the `sklearn.naive_bayes` module.

In [22]:
from sklearn.naive_bayes import GaussianNB
# other popular Naive Bayes classes: MultinomialNB, ComplementNB, BernoulliNB, CategoricalNB

In [25]:
# testing model accuracy using synthetic data

from sklearn.model_selection import train_test_split
from sklearn import datasets

def accuracy(y_true, y_pred):
    accuracy = np.sum(y_true == y_pred) / len(y_true)
    return accuracy

# generate synthetic data
X, y = datasets.make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# test model's performance
model = GaussianNB()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print("Naive Bayes classification accuracy", accuracy(y_test, predictions))

Naive Bayes classification accuracy 0.965


## References

- <a href="https://scikit-learn.org/stable/modules/naive_bayes.html">Naive Bayes - sklearn </a>
- <a href="https://github.com/eriklindernoren/ML-From-Scratch/blob/master/mlfromscratch/supervised_learning/naive_bayes.py">Machine Learning from Scratch - Naive Bayes </a>
- <a href="https://github.com/AssemblyAI-Examples/Machine-Learning-From-Scratch/blob/main/06%20NaiveBayes/naive_bayes.py">How to implement Naive Bayes from scratch with Python</a>
- <a href="https://towardsdatascience.com/naive-bayes-classifier-81d512f50a7c">Naive Bayes Classifier - towardsdatascience blog </a>
- <a href="https://towardsdatascience.com/all-about-naive-bayes-8e13cef044cf">All about Naive Bayes - towardsdatascience blog</a>