## Naive Bayes

Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem. It assumes that the features are conditionally independent of one another given the class, hence the name "naive".

In Naive Bayes, we calculate the probability of a data point belonging to each possible class by combining the prior probability of that class with the probability of each feature given that class. Then, we choose the class with the highest probability as the predicted class for the data point.

Naive Bayes is commonly used for text classification, spam filtering, sentiment analysis, and recommendation systems. It is computationally efficient and requires a relatively small amount of training data. However, the independence assumption often does not hold in practice, which can reduce its accuracy.

The formula for Naive Bayes follows from Bayes' theorem, which states that the probability of a hypothesis (H) given the observed evidence (E) equals the product of the prior probability of the hypothesis (P(H)) and the likelihood of the evidence given the hypothesis (P(E|H)), divided by the marginal likelihood of the evidence (P(E)):

P(H|E) = P(H) * P(E|H) / P(E)

In the context of Naive Bayes, we use this formula to calculate the probability of a data point belonging to a specific class (H) given its features (E). We assume that the features are conditionally independent given the class, so we can calculate the likelihood as the product of the probabilities of each feature given the class:

P(E|H) = P(feature1|H) * P(feature2|H) * ... * P(featureN|H)
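
As a small illustration of this product, the sketch below multiplies per-feature probabilities for two classes. The class names, feature names, and every probability value are invented for illustration and are not estimates from any dataset.

```python
import math

# Hypothetical per-feature likelihoods P(feature | class); all values are invented
likelihoods = {
    "spam": {"contains_offer": 0.8, "contains_link": 0.6, "all_caps_subject": 0.3},
    "ham": {"contains_offer": 0.1, "contains_link": 0.4, "all_caps_subject": 0.05},
}


def naive_likelihood(feature_probs):
    """P(E|H) under the naive assumption: the product of P(feature_i | H)."""
    return math.prod(feature_probs.values())


for cls, probs in likelihoods.items():
    print(cls, naive_likelihood(probs))
```

With these made-up numbers the likelihood of the evidence is much larger under "spam" than under "ham", but the final decision also depends on the priors.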

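Then, we substitute this product into Bayes' theorem to calculate the probability of each class given the data point's features and choose the class with the highest probability:

P(H|E) = P(H) * P(feature1|H) * P(feature2|H) * ... * P(featureN|H) / P(E)

where P(E) is a normalization constant that ensures the probabilities of all possible classes sum to 1. Because P(E) is the same for every class, it can be ignored when we only need the most probable class.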
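
To illustrate the role of P(E), the sketch below normalizes the scores P(H) * P(E|H) for two classes. The priors and likelihoods are invented numbers, and the class names are placeholders.

```python
# Hypothetical priors P(H) and likelihoods P(E|H); all values are invented
priors = {"spam": 0.3, "ham": 0.7}
likelihoods = {"spam": 0.144, "ham": 0.002}

# Unnormalized scores P(H) * P(E|H)
scores = {cls: priors[cls] * likelihoods[cls] for cls in priors}

# P(E) is the sum of the scores over all classes
evidence = sum(scores.values())
posteriors = {cls: score / evidence for cls, score in scores.items()}

# Dividing by P(E) rescales every score equally, so the winning class is unchanged
best_unnormalized = max(scores, key=scores.get)
best_normalized = max(posteriors, key=posteriors.get)
print(posteriors, best_unnormalized == best_normalized)
```

The normalized values sum to 1, and the predicted class is the same with or without the division.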

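In practice, we estimate the priors and likelihoods from the training data and use them to classify new data points based on their features. The implementation below is a Gaussian Naive Bayes classifier: it models each feature within a class as normally distributed, estimates the per-class mean and variance from the training data, and sums log probabilities instead of multiplying raw probabilities to avoid numerical underflow.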
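
One way to carry out this estimation for continuous features, assuming each feature is Gaussian within a class, is sketched below on a toy dataset with a single feature. The sample values and the labels "a" and "b" are invented. The variance here is the population variance, which matches the default of NumPy's `var`.

```python
from collections import defaultdict
from statistics import fmean, pvariance

# Toy training set of (feature value, label) pairs; all values are invented
samples = [(1.0, "a"), (1.2, "a"), (0.8, "a"), (3.0, "b"), (3.4, "b")]

# Group the feature values by class label
by_class = defaultdict(list)
for value, label in samples:
    by_class[label].append(value)

# Prior: fraction of training samples that belong to each class
priors = {label: len(vals) / len(samples) for label, vals in by_class.items()}

# Gaussian likelihood parameters: per-class mean and population variance
params = {label: (fmean(vals), pvariance(vals)) for label, vals in by_class.items()}

print(priors)
print(params)
```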

In [None]:
import numpy as np

class NaiveBayes:
    
    def fit(self, X, y):
        X, y = np.asarray(X, dtype=np.float64), np.asarray(y)
        n_samples, n_features = X.shape
        self._classes = np.unique(y)
        n_classes = len(self._classes)

        # Per-class mean, variance, and prior, estimated from the training data
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)

        for idx, c in enumerate(self._classes):
            X_c = X[y == c]
            self._mean[idx, :] = X_c.mean(axis=0)
            # A small constant keeps the variance strictly positive, so a feature
            # that is constant within a class does not cause division by zero
            self._var[idx, :] = X_c.var(axis=0) + 1e-9
            self._priors[idx] = X_c.shape[0] / n_samples
            
    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)
    
    def _predict(self, x):
        # Score each class by log P(class) + sum of log P(feature | class).
        # P(E) is the same for every class, so it is omitted.
        log_posteriors = []

        for idx, c in enumerate(self._classes):
            log_prior = np.log(self._priors[idx])
            log_likelihood = np.sum(self._log_prob_density(idx, x))
            log_posteriors.append(log_prior + log_likelihood)

        return self._classes[np.argmax(log_posteriors)]

    def _log_prob_density(self, class_idx, x):
        # Log of the Gaussian density, computed directly so that points far
        # from the class mean give a large negative number instead of log(0)
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        return -0.5 * np.log(2 * np.pi * var) - ((x - mean) ** 2) / (2 * var)
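
The class above adds log probabilities rather than multiplying probabilities. The sketch below shows why this matters, using 100 invented per-feature likelihoods that are all equal to 1e-5: the direct product falls below the smallest positive floating-point number, while the sum of logs stays representable.

```python
import math

# 100 hypothetical per-feature likelihoods, each equal to 1e-5 (invented values)
feature_probs = [1e-5] * 100

# Direct product: the true value is 1e-500, too small for a 64-bit float
product = math.prod(feature_probs)

# Sum of logs: carries the same information in a comfortably representable range
log_sum = sum(math.log(p) for p in feature_probs)

print(product)
print(log_sum)
```

Once the product has underflowed to zero for every class, the classes can no longer be ranked, whereas the log scores remain comparable.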
    
      

In [None]:
from sklearn.model_selection import train_test_split
from sklearn import datasets

def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels
    return np.sum(y_true == y_pred) / len(y_true)

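# Fixed seeds (arbitrary values) make the synthetic data and the split reproducible
X, y = datasets.make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=123)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)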

nb = NaiveBayes()
nb.fit(X_train, y_train)
predictions = nb.predict(X_test)

print("Naive Bayes classification accuracy", accuracy(y_test, predictions))  