### Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?
Ans. The quantity asked for is a conditional probability that the problem statement already gives directly, so no Bayes' theorem calculation is needed. Let's define the events:

    A: Employee uses the health insurance plan
    B: Employee is a smoker

We are given:

    P(A) = 0.70 (Probability that an employee uses the health insurance plan)
    P(B|A) = 0.40 (Probability that an employee is a smoker given that they use the health insurance plan)

We want to find:

    P(B|A) (Probability that an employee is a smoker given that they use the health insurance plan)

The statement "40% of the employees who use the plan are smokers" is exactly P(B|A), so:

    P(B|A) = 0.40

As a check, the definition of conditional probability gives the same value:

    P(A and B) = P(B|A) * P(A) = 0.40 * 0.70 = 0.28
    P(B|A) = P(A and B) / P(A) = 0.28 / 0.70 = 0.40

Bayes' theorem, P(B|A) = (P(A|B) * P(B)) / P(A), would only be needed if we were given P(A|B) and P(B) instead. Neither is provided here, and the overall smoking rate P(B) cannot be derived from the given information.

So, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.40 or 40%.
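
The arithmetic behind this conditional probability can be sanity-checked with concrete head counts. The sketch below assumes a hypothetical workforce of 1,000 employees (an illustrative figure, not part of the survey) and uses exact fractions to avoid floating-point noise.

```python
from fractions import Fraction

# Hypothetical head count, chosen only to make the percentages concrete.
total_employees = 1000

# 70% of employees use the plan: P(A) = 0.70
plan_users = total_employees * Fraction(70, 100)

# 40% of plan users are smokers: P(B|A) = 0.40
smoking_plan_users = plan_users * Fraction(40, 100)

# Joint probability P(A and B), then the conditional probability recovered from it
p_a = plan_users / total_employees
p_a_and_b = smoking_plan_users / total_employees
p_b_given_a = p_a_and_b / p_a

print(f"P(A and B) = {float(p_a_and_b):.2f}")
print(f"P(B|A)     = {float(p_b_given_a):.2f}")
```

Dividing the joint probability by P(A) returns the 40% figure stated in the question, which is what the definition of conditional probability requires.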


### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?
Ans. The difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in the type of data they are designed to handle:

Bernoulli Naive Bayes: This classifier is used for binary feature data, where each feature takes one of two values, typically 0 or 1. It models each feature with a Bernoulli distribution per class, so the absence of a feature (0) counts as evidence just as its presence (1) does. It is commonly used in text classification when only term occurrence matters: each term either occurs (1) or does not occur (0) in a document.

Multinomial Naive Bayes: This classifier is used for count data, such as word counts or term frequencies in text classification. It models the feature counts of each class with a multinomial distribution, so how often a term appears affects the prediction, while terms that do not appear contribute nothing to the likelihood.

Both classifiers assume that features are conditionally independent given the class label. In summary, Bernoulli Naive Bayes is suitable for binary (present/absent) feature data, while Multinomial Naive Bayes is appropriate for discrete count data where the magnitude of each feature matters.
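
A small sketch can make the distinction concrete. It assumes scikit-learn's `BernoulliNB` and `MultinomialNB` with default settings and a made-up term-count matrix. Because `BernoulliNB` binarizes its input at the default threshold (`binarize=0.0`), it should fit the same model on raw counts and on presence flags, whereas `MultinomialNB` should not.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy term-count matrix (rows = documents, columns = terms); values are made up.
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 1, 2],
])
y = np.array([0, 0, 1, 1])

# Presence/absence version of the same data
X_binary = (X_counts > 0).astype(int)

# BernoulliNB binarizes its input at the default threshold (binarize=0.0),
# so counts and presence flags produce the same fitted model.
bnb_counts = BernoulliNB().fit(X_counts, y)
bnb_binary = BernoulliNB().fit(X_binary, y)
bernoulli_same = np.allclose(bnb_counts.feature_log_prob_, bnb_binary.feature_log_prob_)

# MultinomialNB uses the counts themselves, so discarding them changes the model.
mnb_counts = MultinomialNB().fit(X_counts, y)
mnb_binary = MultinomialNB().fit(X_binary, y)
multinomial_same = np.allclose(mnb_counts.feature_log_prob_, mnb_binary.feature_log_prob_)

print("BernoulliNB unchanged by binarizing:  ", bernoulli_same)
print("MultinomialNB unchanged by binarizing:", multinomial_same)
```

The comparison uses `feature_log_prob_`, the per-class log probabilities each model learns for its features.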

### Q3. How does Bernoulli Naive Bayes handle missing values?
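Ans. Bernoulli Naive Bayes has no built-in mechanism for missing values; how they are handled depends on preprocessing. A common convention is to treat a missing feature as not present (0) for that instance, which fits the binary (1 or 0) nature of the features. This is an imputation choice rather than something the model does automatically: scikit-learn's BernoulliNB, for example, raises an error if the input contains NaN, so missing entries must be filled in (or the affected rows or features dropped) before fitting or predicting.

Once missing features are encoded as 0, the classifier can calculate the probabilities of each feature being present or absent for each class and use Bayes' theorem to make predictions. Keep in mind that Bernoulli Naive Bayes counts an absent feature as evidence, so zero-filling can bias predictions when values are missing for reasons unrelated to the class.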
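
In scikit-learn, the "missing means absent" treatment has to be applied explicitly, since `BernoulliNB` is expected to reject NaN during input validation. The sketch below is one way to do that, assuming missing entries are encoded as NaN and using a made-up toy matrix; `SimpleImputer` with a constant fill of 0 is a choice made here for illustration, not the only option.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy binary data with missing entries encoded as NaN; values are made up.
X = np.array([
    [1.0, 0.0, np.nan],
    [1.0, np.nan, 0.0],
    [0.0, 1.0, 1.0],
    [np.nan, 1.0, 1.0],
])
y = np.array([0, 0, 1, 1])

# Passing NaN straight to BernoulliNB is expected to fail input validation.
try:
    BernoulliNB().fit(X, y)
    raw_nan_rejected = False
except ValueError:
    raw_nan_rejected = True

# Explicit "missing means absent" policy: replace NaN with 0 before fitting.
model = make_pipeline(
    SimpleImputer(strategy="constant", fill_value=0),
    BernoulliNB(),
)
model.fit(X, y)

# The same imputation is applied automatically at prediction time.
X_new = np.array([[1.0, np.nan, np.nan]])
prediction = model.predict(X_new)

print("NaN rejected without imputation:", raw_nan_rejected)
print("Predicted class:", prediction[0])
```

Wrapping the imputer and the classifier in one pipeline keeps the fill policy identical for training and prediction.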

### Q4. Can Gaussian Naive Bayes be used for multi-class classification?
Ans. Yes, Gaussian Naive Bayes can be used for multi-class classification. It assumes that, within each class, every feature follows a Gaussian (normal) distribution, which makes it a natural fit for continuous feature data.

No special extension is needed for more than two classes: the model estimates a prior probability for each class and a mean and variance for each feature within each class. Given a new instance, the classifier combines each class's prior with the Gaussian probability density of every feature value under that class, then predicts the class with the highest posterior probability.

In summary, Gaussian Naive Bayes is applicable to both binary and multi-class classification problems, as long as it is reasonable to model the features as continuous and approximately Gaussian within each class.

### Q5. Assignment:
Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message
is spam or not based on several input features.

Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.

Results:
Report the following performance metrics for each classifier:
    
    Accuracy
    Precision
    Recall
    F1 score

Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?

Conclusion:
Summarise your findings and provide some suggestions for future work.

Note: This dataset contains a binary classification problem with multiple features. The dataset is relatively small, but it can be used to demonstrate the performance of the different variants of Naive Bayes on a real-world problem.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Load the dataset. The file has no header row: 57 feature columns
# followed by the target column (1 = spam, 0 = not spam).
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
column_names = [f"feature_{i}" for i in range(57)] + ["spam"]
data = pd.read_csv(url, names=column_names)

# Split the data into features (X) and target variable (y)
X = data.drop("spam", axis=1)
y = data["spam"]

# Hold out 20% of the data as a test set; cross-validation uses the rest
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Naive Bayes classifiers with default hyperparameters
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

classifiers = [bernoulli_nb, multinomial_nb, gaussian_nb]
classifier_names = ["Bernoulli Naive Bayes", "Multinomial Naive Bayes", "Gaussian Naive Bayes"]

# Perform 10-fold cross-validation once per classifier and collect all four
# metrics from the same folds (instead of re-running the folds per metric)
scoring = ["accuracy", "precision", "recall", "f1"]
for name, classifier in zip(classifier_names, classifiers):
    print(f"Evaluating {name}:")
    scores = cross_validate(classifier, X_train, y_train, cv=10, scoring=scoring)
    print(f"Accuracy: {scores['test_accuracy'].mean():.2f}")
    print(f"Precision: {scores['test_precision'].mean():.2f}")
    print(f"Recall: {scores['test_recall'].mean():.2f}")
    print(f"F1 Score: {scores['test_f1'].mean():.2f}")
    print("-------------------------------------------------")

# Train one model on the entire training set and evaluate it on the test set.
# The test-set figures in the output below were produced with multinomial_nb.
# The cross-validation scores favour bernoulli_nb, so swap it in here to
# evaluate the best-performing variant.
best_model = multinomial_nb
best_model.fit(X_train, y_train)

y_pred = best_model.predict(X_test)
print(f"Accuracy on Test Set: {accuracy_score(y_test, y_pred):.2f}")
print(f"Precision on Test Set: {precision_score(y_test, y_pred):.2f}")
print(f"Recall on Test Set: {recall_score(y_test, y_pred):.2f}")
print(f"F1 Score on Test Set: {f1_score(y_test, y_pred):.2f}")
```

    Evaluating Bernoulli Naive Bayes:
    Accuracy: 0.89
    Precision: 0.88
    Recall: 0.81
    F1 Score: 0.85
    -------------------------------------------------
    Evaluating Multinomial Naive Bayes:
    Accuracy: 0.79
    Precision: 0.74
    Recall: 0.71
    F1 Score: 0.72
    -------------------------------------------------
    Evaluating Gaussian Naive Bayes:
    Accuracy: 0.82
    Precision: 0.70
    Recall: 0.96
    F1 Score: 0.81
    -------------------------------------------------
    Accuracy on Test Set: 0.79
    Precision on Test Set: 0.76
    Recall on Test Set: 0.72
    F1 Score on Test Set: 0.74
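
Rather than hard-coding which classifier to retrain, the choice can be made from the cross-validation scores. The sketch below is one possible approach, not part of the original run: `pick_best_by_cv` is a hypothetical helper, mean F1 is an assumed selection criterion, and the Poisson-generated data is a synthetic stand-in for the Spambase training split so the example runs without a download.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB


def pick_best_by_cv(models, X, y, cv=10, scoring="f1"):
    """Return (best_name, mean_scores) based on the mean cross-validated score."""
    mean_scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring=scoring).mean()
        for name, model in models.items()
    }
    best_name = max(mean_scores, key=mean_scores.get)
    return best_name, mean_scores


# Synthetic stand-in for the training split: non-negative count features
# and a binary target. Replace with X_train / y_train for the real data.
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 100)
rates = np.where(y_demo[:, None] == 1, [3.0, 0.5, 1.0], [0.5, 3.0, 1.0])
X_demo = rng.poisson(rates)

models = {
    "Bernoulli Naive Bayes": BernoulliNB(),
    "Multinomial Naive Bayes": MultinomialNB(),
    "Gaussian Naive Bayes": GaussianNB(),
}

best_name, mean_scores = pick_best_by_cv(models, X_demo, y_demo)
for name, score in mean_scores.items():
    print(f"{name}: mean F1 = {score:.2f}")
print("Selected:", best_name)

# cross_val_score fits clones, so the selected model still needs a final fit.
best_model = models[best_name].fit(X_demo, y_demo)
```

Selecting on cross-validation scores keeps the held-out test set out of the model choice, so the final test-set figures remain an unbiased estimate.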
