## 🚀 Program 5

### 📋 Objective

##### Write a program to implement the **Naive Bayesian Classifier** for a sample training dataset stored as a .CSV file. Compute the accuracy of the classifier, considering a few test datasets. Given a set of documents that need to be classified, use the Naive Bayesian Classifier model to perform this task. Built-in Java classes/APIs can be used to write the program. Calculate the accuracy, precision, and recall for your dataset.
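The objective also asks for classifying a set of documents, while the notebook below fits a Gaussian model to numeric CSV features. For the document case, a minimal sketch in Python (to match the notebook rather than Java) might use multinomial Naive Bayes with add-one (Laplace) smoothing over whitespace-separated words. The function names `train_multinomial_nb` and `classify` and the four toy reviews are hypothetical illustrations, not part of the original program.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels):
    """Estimate log class priors and Laplace-smoothed log word likelihoods."""
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)            # word_counts[c][w] = occurrences of w in class c
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        # Add-one smoothing so a word unseen in class c does not zero out that class
        log_lik[c] = {w: math.log((word_counts[c][w] + 1) / (total + len(vocab))) for w in vocab}
    return log_prior, log_lik, vocab

def classify(doc, log_prior, log_lik, vocab):
    # Sum log-probabilities instead of multiplying to avoid underflow; skip out-of-vocabulary words
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in vocab) for c in log_prior}
    return max(scores, key=scores.get)

# Hypothetical toy training documents
train_docs = [d.split() for d in ["great fun film", "loved the plot", "boring slow film", "hated the ending"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_multinomial_nb(train_docs, train_labels)
print(classify("fun plot".split(), *model))
print(classify("boring ending".split(), *model))
```

Working in log space keeps long documents from underflowing to zero, and smoothing keeps a single unseen word from eliminating a class.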

```python
# Import the necessary libraries
import math

import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
```

```python
# Load the dataset (the path is relative to the notebook's working directory)
df = pd.read_csv("sample_data.csv")
```

```python
# Separate the features (every column except the last) from the class labels (the last column)
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
```
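The contents of `sample_data.csv` are not shown, but the code implies its layout: numeric feature columns first and the class label last, with labels such as `"Yes"`/`"No"` (the metrics cell uses `pos_label='Yes'`). A small sketch with a hypothetical in-memory CSV makes that assumption concrete; the column names `f1`, `f2`, `label` and the values are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV contents: numeric feature columns first, class label ("Yes"/"No") last
csv_text = """f1,f2,label
1.0,2.1,Yes
1.2,1.9,Yes
3.1,0.4,No
2.9,0.7,No
"""
df = pd.read_csv(io.StringIO(csv_text))
X = df.iloc[:, :-1].values   # every column except the last -> 2-D feature array
y = df.iloc[:, -1].values    # last column -> 1-D label array
print(X.shape, y.shape)      # (4, 2) (4,)
print(list(y))               # ['Yes', 'Yes', 'No', 'No']
```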

```python
# Hold out 30% of the rows for testing; random_state=1 makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
```

```python
# Summarize the training data: for each class, the mean and sample standard deviation
# (n - 1 denominator) of every feature column
def summarize(X, y):
    return {c: [(sum(col)/len(col), math.sqrt(sum((x-sum(col)/len(col))**2 for x in col)/(len(col)-1)))
                for col in zip(*[X[i] for i in range(len(X)) if y[i]==c])] for c in set(y)}
```
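To see what `summarize` returns, the sketch below reruns the same definition (copied unchanged so the block is self-contained) on a small hypothetical dataset. The result maps each class to one `(mean, stdev)` pair per feature. Because the standard deviation divides by `len(col) - 1`, a class with only one training row would raise `ZeroDivisionError`.

```python
import math

def summarize(X, y):
    # Same computation as the notebook: per class, (mean, sample standard deviation) for each feature
    return {c: [(sum(col)/len(col), math.sqrt(sum((x-sum(col)/len(col))**2 for x in col)/(len(col)-1)))
                for col in zip(*[X[i] for i in range(len(X)) if y[i]==c])] for c in set(y)}

# Hypothetical toy data: two features, two classes with three rows each
X_toy = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0],
         [6.0, 0.0], [7.0, 1.0], [8.0, 2.0]]
y_toy = ["Yes", "Yes", "Yes", "No", "No", "No"]

summ_toy = summarize(X_toy, y_toy)
print(summ_toy["Yes"])   # [(2.0, 1.0), (12.0, 2.0)]
print(summ_toy["No"])    # [(7.0, 1.0), (1.0, 1.0)]
```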

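```python
# Gaussian probability density of x for a feature with the given mean and standard deviation;
# a zero standard deviation is handled as a special case (1 if x equals the mean, else 0)
def gaussian(x, mean, stdev):
    if stdev == 0: return 1 if x == mean else 0
    return (1/(math.sqrt(2*math.pi)*stdev))*math.exp(-((x-mean)**2)/(2*stdev**2))
```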
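For reference, `gaussian` evaluates the normal density below, where $\mu_{c,i}$ and $\sigma_{c,i}$ are the mean and standard deviation that `summarize` estimates for feature $i$ within class $c$. Because this is a density rather than a probability, its value exceeds 1 at or near the mean whenever $\sigma_{c,i} < 1/\sqrt{2\pi} \approx 0.399$; that is harmless here, since the values are only compared across classes.

$$
p(x_i \mid c) = \frac{1}{\sqrt{2\pi}\,\sigma_{c,i}} \exp\!\left(-\frac{(x_i - \mu_{c,i})^2}{2\sigma_{c,i}^2}\right)
$$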

```python
# Predict the class of one instance: choose the class with the largest product of per-feature
# Gaussian likelihoods (no class prior is included, so all classes are weighted equally)
def predict(summ, row):
    probs = {c: math.prod([gaussian(row[i], m, s) for i, (m, s) in enumerate(stats)]) for c, stats in summ.items()}
    return max(probs, key=probs.get)
```
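`predict` omits the class prior $P(c)$, so it effectively assumes every class is equally likely, and multiplying many small densities can underflow to `0.0` when there are many features. A common variant, sketched below under those assumptions, adds $\log P(c)$ estimated from class frequencies and sums log-densities instead. The names `gaussian_log_pdf`, `predict_with_prior`, `class_priors` and the `eps` variance floor are hypothetical; with unequal class frequencies this variant may predict differently from `predict`, so the metrics reported below would not necessarily carry over.

```python
import math
from collections import Counter

def gaussian_log_pdf(x, mean, stdev, eps=1e-9):
    # eps is an assumed variance floor so a zero-variance feature does not produce log(0)
    var = stdev ** 2 + eps
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def class_priors(y):
    # P(c) estimated as the fraction of training labels equal to c
    counts = Counter(y)
    return {c: n / len(y) for c, n in counts.items()}

def predict_with_prior(summ, priors, row):
    # MAP decision: argmax over classes of log P(c) + sum_i log p(x_i | c)
    scores = {c: math.log(priors[c]) + sum(gaussian_log_pdf(v, m, s) for v, (m, s) in zip(row, stats))
              for c, stats in summ.items()}
    return max(scores, key=scores.get)

# Hypothetical summaries in the same {class: [(mean, stdev), ...]} shape that summarize() returns
summ_toy = {"Yes": [(2.0, 1.0), (12.0, 2.0)], "No": [(7.0, 1.0), (1.0, 1.0)]}
priors_toy = class_priors(["Yes", "Yes", "Yes", "No", "No", "No"])
print(predict_with_prior(summ_toy, priors_toy, [2.5, 11.0]))  # Yes
print(predict_with_prior(summ_toy, priors_toy, [6.5, 1.5]))   # No
```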

```python
# Summarize the training data
summ = summarize(X_train, y_train)
```

```python
# Predict the class labels for the test set
preds = [predict(summ, row) for row in X_test]
```

```python
# Evaluate on the test set, treating "Yes" as the positive class for precision and recall
print(f"Accuracy: {accuracy_score(y_test, preds)*100:.2f}%")
print(f"Precision: {precision_score(y_test, preds, pos_label='Yes', average='binary'):.2f}")
print(f"Recall: {recall_score(y_test, preds, pos_label='Yes', average='binary'):.2f}")
```

```text
Accuracy: 40.00%
Precision: 0.67
Recall: 0.50
```
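These metrics follow from confusion-matrix counts for the positive class `"Yes"`: accuracy = correct / total, precision = TP / (TP + FP), and recall = TP / (TP + FN). The hypothetical helper `binary_metrics` recomputes them by hand, which can help when checking scikit-learn's numbers. The labels used here are made up and are not the notebook's test split. Returning `0.0` for a zero denominator mirrors scikit-learn's default of reporting 0 (with a warning) in that case.

```python
def binary_metrics(y_true, y_pred, pos_label="Yes"):
    # Count the confusion-matrix cells for the positive class
    tp = sum(t == pos_label and p == pos_label for t, p in zip(y_true, y_pred))
    fp = sum(t != pos_label and p == pos_label for t, p in zip(y_true, y_pred))
    fn = sum(t == pos_label and p != pos_label for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of the predicted positives, fraction truly positive
    recall = tp / (tp + fn) if tp + fn else 0.0      # of the true positives, fraction that were found
    return accuracy, precision, recall

# Hypothetical labels for illustration only
y_true = ["Yes", "No", "Yes", "No", "Yes", "No"]
y_pred = ["Yes", "Yes", "No", "No", "Yes", "No"]
acc, prec, rec = binary_metrics(y_true, y_pred)
print(f"Accuracy: {acc*100:.2f}%  Precision: {prec:.2f}  Recall: {rec:.2f}")
```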
