# (Model 2) Naive Bayes


The Naive Bayes model uses the assumption of class-conditional independence to apply Bayes' theorem when predicting the class of a sample, given the class-conditional distributions of its features and the prior probabilities of the classes. The posterior probability of a sample $x$ belonging to a given class $C_k$ is:

$$P(C_k | x) = \frac{P(x | C_k) \cdot P(C_k)}{P(x)}$$

Bayes' theorem itself always holds; what makes this model tractable is the "naive" assumption that the input variables are independent of each other conditional on the class. Under this assumption, the class-conditional likelihood factorises over the features, and the maximum likelihood solution is obtained by fitting each class separately on the labelled training samples belonging to that class. Because the variables in the dataset are continuous, I have implemented the Gaussian Naive Bayes model, in which each feature is modelled within each class by a normal density with mean $\mu$ and standard deviation $\sigma$.
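
Written out, the independence assumption and the resulting decision rule take the following form. The notation is an assumption introduced here: $d$ is the number of features, $x_i$ is the $i$-th feature of the sample, and $\hat{y}$ is the predicted class. The denominator $P(x)$ is the same for every class, so it can be dropped when comparing classes.

$$P(x | C_k) = \prod_{i=1}^{d} P(x_i | C_k)$$

$$\hat{y} = \arg\max_k \left[ \log P(C_k) + \sum_{i=1}^{d} \log P(x_i | C_k) \right]$$

In the Gaussian variant, each factor $P(x_i | C_k)$ takes the form of the univariate normal density: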

$$ f(x | \mu, \sigma) = \frac{1}{\sqrt{2\pi}\sigma} \cdot e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
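
As a small illustration of this density, the sketch below evaluates it directly and in log form using only the standard library. The function names `gaussian_pdf` and `gaussian_log_pdf` are hypothetical and chosen for this example. It also shows why working with log densities can matter: far from the mean the density underflows to zero in floating point, while the log density stays finite.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density f(x | mu, sigma)."""
    exponent = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
    return exponent / (math.sqrt(2 * math.pi) * sigma)

def gaussian_log_pdf(x, mu, sigma):
    """Logarithm of the same density, computed without exponentiating."""
    return (-0.5 * math.log(2 * math.pi)
            - math.log(sigma)
            - ((x - mu) ** 2) / (2 * sigma ** 2))

peak = gaussian_pdf(0.0, 0.0, 1.0)            # density at the mean, about 0.3989
tail = gaussian_pdf(40.0, 0.0, 1.0)           # exp(-800) underflows to 0.0
tail_log = gaussian_log_pdf(40.0, 0.0, 1.0)   # finite, about -800.92

print(peak, tail, tail_log)
```

Taking the logarithm of `tail` would fail or give negative infinity, which is one reason to compute the log density in closed form when a feature value lies far from a class mean.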

Our feature space is 20-dimensional, which makes Naive Bayes a reasonable choice for the classification task: it only needs a mean and a standard deviation per feature per class, so training and prediction scale linearly with the number of features. Because the model assumes conditionally independent features, part of the classification error may reflect correlation among the features, although other factors, such as non-Gaussian feature distributions, can contribute as well.
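
One way to probe the independence assumption directly is to inspect pairwise correlations between features. The sketch below is an illustration on synthetic data, not an analysis of the dataset used here; the helper name `max_abs_offdiag_correlation` and the generated arrays are assumptions made for the example. Note that it measures marginal correlation, whereas the model's assumption concerns correlation within each class, so in practice the check would be repeated per class.

```python
import numpy as np

def max_abs_offdiag_correlation(X):
    """Largest absolute Pearson correlation between two distinct columns of X."""
    corr = np.corrcoef(X, rowvar=False)              # (d, d) correlation matrix
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.max(np.abs(off_diag)))

rng = np.random.default_rng(0)

# Four independent standard-normal features.
independent = rng.normal(size=(500, 4))

# Same data, but feature 1 is made a noisy copy of feature 0.
correlated = independent.copy()
correlated[:, 1] = correlated[:, 0] + 0.1 * rng.normal(size=500)

low = max_abs_offdiag_correlation(independent)
high = max_abs_offdiag_correlation(correlated)
print(f"independent: {low:.3f}, correlated: {high:.3f}")
```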

In [None]:
# Importing necessary libraries
import numpy as np
from collections import defaultdict
from sklearn.metrics import accuracy_score

# Defining a class for our Naive Bayes model
class NaiveBayesClassifier:
    def __init__(self):
        self.class_probs = defaultdict(float)
        self.class_means = defaultdict(list)
        self.class_stds = defaultdict(list)


    # Fit the model by estimating, for each class, its prior probability and the per-feature means and standard deviations.
    def fit(self, X, y):
        unique_classes = np.unique(y)

        for c in unique_classes:
            class_instances = X[y == c]
            self.class_probs[c] = len(class_instances) / len(X)
            self.class_means[c] = np.mean(class_instances, axis=0)
            self.class_stds[c] = np.std(class_instances, axis=0)


    # Evaluate the Gaussian probability density of each feature value for the given mean and standard deviation.
    def calculate_likelihood(self, x, mean, std):
        exponent = np.exp(-((x - mean) ** 2) / (2 * (std ** 2)))
        return (1 / (np.sqrt(2 * np.pi) * std)) * exponent


    # Predict the class of a single sample by maximising the log prior plus the summed log likelihoods.
    def predict_instance(self, x):
        likelihoods = {}

        for c in self.class_probs:
            class_prob = np.log(self.class_probs[c])
            class_likelihood = np.sum(np.log(self.calculate_likelihood(x, self.class_means[c], self.class_stds[c])))
            likelihoods[c] = class_prob + class_likelihood

        return max(likelihoods, key=likelihoods.get)

    # Predict the class of every sample in X by calling predict_instance on each one and collecting the results in a list.
    def predict(self, X):
        predictions = []

        for instance in X:
            predictions.append(self.predict_instance(instance))

        return predictions

In [None]:
# This code block instantiates nb_classifier with the Naive Bayes Classifier class and performs subsequent model fitting and predictions.
nb_classifier = NaiveBayesClassifier()

# The nb_classifier model is fitted to the training data using the 'fit' method.
nb_classifier.fit(X_train, y_train)

# Model predictions are now made to get the outcomes using the 'predict' function defined within the classifier class.
predictions = nb_classifier.predict(X_test)

# The model's utility in solving the classification task is evaluated through the 'accuracy_score' metric from the sklearn library.
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")

Accuracy: 0.71
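
The per-sample loop in `predict` can also be expressed with NumPy broadcasting. The following is a minimal sketch rather than the implementation that produced the result above: the function name `gaussian_nb_log_posteriors`, the `eps` floor on the standard deviation (added to avoid division by zero for constant features), and the synthetic two-cluster data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_nb_log_posteriors(X, priors, means, stds, eps=1e-9):
    """Unnormalised log posteriors with shape (n_samples, n_classes).

    priors: (n_classes,), means and stds: (n_classes, n_features).
    eps is an assumed lower bound on the standard deviation.
    """
    X = np.asarray(X, dtype=float)[:, None, :]               # (n, 1, d)
    means = np.asarray(means, dtype=float)[None, :, :]       # (1, k, d)
    stds = np.maximum(np.asarray(stds, dtype=float), eps)[None, :, :]

    # Log of the univariate normal density for every sample, class and feature.
    log_pdf = (-0.5 * np.log(2 * np.pi)
               - np.log(stds)
               - (X - means) ** 2 / (2 * stds ** 2))         # (n, k, d)

    # Sum over features and add the log prior of each class.
    return np.log(np.asarray(priors, dtype=float))[None, :] + log_pdf.sum(axis=2)

# Synthetic, well-separated data used only to exercise the function.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-3.0, size=(100, 3)),
               rng.normal(loc=3.0, size=(100, 3))])
y = np.array([0] * 100 + [1] * 100)

classes = np.unique(y)
priors = np.array([np.mean(y == c) for c in classes])
means = np.array([X[y == c].mean(axis=0) for c in classes])
stds = np.array([X[y == c].std(axis=0) for c in classes])

scores = gaussian_nb_log_posteriors(X, priors, means, stds)
pred = classes[np.argmax(scores, axis=1)]
print(f"Training accuracy on synthetic data: {np.mean(pred == y):.2f}")
```

Working with log densities throughout avoids taking the logarithm of a density that has underflowed to zero, and the broadcasting replaces the Python-level loop over samples with array operations.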
