### Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?


Let the events be:

- A: Employee uses the company's health insurance plan.
- B: Employee is a smoker.

We are given the following probabilities:

- \( P(A) \) = Probability that an employee uses the health insurance plan = 0.70 (70%)
- \( P(B|A) \) = Probability that an employee is a smoker given that they use the health insurance plan = 0.40 (40%)

We want to find \( P(B|A) \), the probability that an employee is a smoker given that they use the health insurance plan. This is exactly the conditional probability stated in the problem ("40% of the employees who use the plan are smokers"), so no further calculation is needed:

\[ P(B|A) = 0.40 \]

Bayes' theorem,

\[ P(B|A) = \frac{P(A|B) \cdot P(B)}{P(A)} \]

does not help here. It requires \( P(A|B) \) and \( P(B) \), and neither is given. In particular, \( P(B) \) follows from the law of total probability:

\[ P(B) = P(B|A) \cdot P(A) + P(B|\text{not } A) \cdot P(\text{not } A) \]

The problem says nothing about smokers among employees who do not use the plan, so \( P(B|\text{not } A) \) is unknown and cannot be inferred with the complement rule.

What the given data do determine is the joint probability that an employee both uses the plan and is a smoker:

\[ P(A \cap B) = P(B|A) \cdot P(A) = 0.40 \cdot 0.70 = 0.28 \]

So, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.40, or 40%.
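As a sanity check, the given values can be run through Bayes' theorem for several assumed values of \( P(B|\text{not } A) \), which the problem does not supply. The sketch below uses a hypothetical helper, `smoker_given_plan`, and made-up trial values for the unknown probability. Whatever value is assumed, the result is \( P(B|A) = 0.40 \), because the unknown term cancels.

```python
def smoker_given_plan(p_a, p_b_given_a, p_b_given_not_a):
    """Recompute P(B|A) through Bayes' theorem from the given quantities.

    p_b_given_not_a is NOT given in the problem; it is an assumed trial value.
    """
    # Law of total probability for P(B)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    # P(A|B) from the definition of conditional probability
    p_a_given_b = (p_b_given_a * p_a) / p_b
    # Bayes' theorem
    return p_a_given_b * p_b / p_a


# Trial values for the unknown P(B | not A); all of them are assumptions.
trial_values = (0.0, 0.25, 1.0)
results = [smoker_given_plan(0.70, 0.40, q) for q in trial_values]

for q, r in zip(trial_values, results):
    print(f"assumed P(B|not A) = {q:.2f} -> P(B|A) = {r:.2f}")
```

Every row reports 0.40, which shows that the answer depends only on the conditional probability stated in the question.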

### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?



The difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in the types of data they are designed to handle and the way they model the features.

#### Bernoulli Naive Bayes:

- Suitable for binary or boolean features, where each feature can take only two possible values (e.g., 0 or 1, yes or no).
- Assumes that each feature follows a Bernoulli distribution within each class and that the features are conditionally independent given the class label.
- It models the presence or absence of each feature in a document or instance. The absence of a feature contributes explicitly to the likelihood, through the factor \( 1 - p \).
- Often used in text classification tasks, where the presence or absence of specific words or features in a document is relevant for classification.

#### Multinomial Naive Bayes:

- Suitable for discrete features that represent counts or frequencies of occurrences (e.g., word counts in a document).
- Assumes that the feature counts of an instance are drawn from a multinomial distribution whose parameters depend on the class.
- It models the frequency of each feature in a document or instance, considering the number of occurrences. A feature that does not occur contributes nothing to the likelihood.
- Commonly used in text classification tasks, such as spam detection, sentiment analysis, and document categorization.
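The contrast can be seen on a toy example. The sketch below fits scikit-learn's `BernoulliNB` and `MultinomialNB` on a made-up word-count matrix and on its binarized version; the data and variable names are illustrative assumptions. With its default `binarize=0.0`, `BernoulliNB` thresholds the counts itself, so it only ever sees presence or absence, while `MultinomialNB` learns different parameters once the counts are discarded.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy word-count matrix (rows = documents, columns = words); values are made up.
X_counts = np.array([[3, 0, 1],
                     [2, 0, 0],
                     [0, 4, 1],
                     [0, 1, 2]])
y_toy = np.array([1, 1, 0, 0])

# Presence/absence version of the same data
X_binary = (X_counts > 0).astype(int)

# BernoulliNB binarizes its input, so counts and 0/1 data give the same model.
bnb_counts = BernoulliNB().fit(X_counts, y_toy)
bnb_binary = BernoulliNB().fit(X_binary, y_toy)
bernoulli_same = np.allclose(bnb_counts.predict_proba(X_counts),
                             bnb_binary.predict_proba(X_binary))

# MultinomialNB uses the counts, so discarding them changes the fitted parameters.
mnb_counts = MultinomialNB().fit(X_counts, y_toy)
mnb_binary = MultinomialNB().fit(X_binary, y_toy)
multinomial_same = np.allclose(mnb_counts.feature_log_prob_,
                               mnb_binary.feature_log_prob_)

print("Bernoulli unaffected by counts:  ", bernoulli_same)
print("Multinomial unaffected by counts:", multinomial_same)
```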

### Q3. How does Bernoulli Naive Bayes handle missing values?


The Bernoulli Naive Bayes model has no built-in mechanism for missing values: it expects every feature to be 0 or 1. Implementations differ in how they react; scikit-learn's `BernoulliNB`, for example, raises an error if the input contains NaN. Missing values therefore have to be dealt with before or around the model. Ignoring a missing feature and treating "missing" as a state of its own are two distinct options, not the same thing.

Here's how missing values can be handled with Bernoulli Naive Bayes:

1. **Training Phase**:
   - During the training phase, the algorithm estimates, for each class, the probability that each feature equals 1, based on the available training data. The presence of a feature is denoted as 1, and its absence is denoted as 0.
   - If a specific feature value is missing for a data point in the training data, that data point can be left out of the estimate for that feature only. Its non-missing features still contribute, because each feature's probability is estimated independently.

2. **Prediction Phase**:
   - If a feature value is missing for a new instance, the corresponding factor can be dropped from the likelihood product for every class.
   - Under the conditional independence assumption, this is equivalent to marginalizing over the missing feature, so the remaining features alone determine the prediction.

3. **Alternatives**:
   - Impute the missing value (e.g., with the most frequent value of that feature) before training and prediction.
   - Add a separate binary indicator feature that records whether the value was missing, which lets the model learn from the missingness itself.
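The idea of skipping missing features can be sketched by hand with NumPy. This is a minimal illustration under stated assumptions, not scikit-learn's `BernoulliNB` (which does not accept NaN): missing entries are encoded as `np.nan`, the function names `fit_bernoulli_nb` and `predict_bernoulli_nb` are hypothetical, and the toy data are made up.

```python
import numpy as np

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and P(feature = 1 | class), skipping NaN entries."""
    classes = np.unique(y)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    theta = np.empty((len(classes), X.shape[1]))
    for i, c in enumerate(classes):
        Xc = X[y == c]
        n_observed = (~np.isnan(Xc)).sum(axis=0)  # non-missing count per feature
        n_ones = np.nansum(Xc, axis=0)            # ones among the observed values
        theta[i] = (n_ones + alpha) / (n_observed + 2 * alpha)  # Laplace smoothing
    return classes, log_prior, theta

def predict_bernoulli_nb(x, classes, log_prior, theta):
    """Predict one instance, dropping the likelihood factors of missing features."""
    observed = ~np.isnan(x)
    xo = x[observed]
    th = theta[:, observed]
    log_lik = (xo * np.log(th) + (1 - xo) * np.log(1 - th)).sum(axis=1)
    return classes[np.argmax(log_prior + log_lik)]

# Toy data with missing entries (np.nan); values are made up.
X_toy = np.array([[1, 0, 1],
                  [1, np.nan, 1],
                  [0, 1, 0],
                  [0, 1, np.nan]], dtype=float)
y_toy = np.array([1, 1, 0, 0])

model = fit_bernoulli_nb(X_toy, y_toy)
pred_a = predict_bernoulli_nb(np.array([1.0, np.nan, np.nan]), *model)
pred_b = predict_bernoulli_nb(np.array([np.nan, 1.0, np.nan]), *model)
print(pred_a, pred_b)
```

In both predictions only one feature is observed, and the class is decided from that feature together with the class prior.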


### Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification tasks, and it handles them natively. Gaussian Naive Bayes is a variant of the Naive Bayes algorithm for continuous features, which it assumes follow a Gaussian (normal) distribution within each class. Nothing in the model is specific to two classes, so no "one-vs-all" or "one-vs-one" decomposition is required.

Here's how Gaussian Naive Bayes classifies among \( K \) classes:

1. **Training Phase**:
   - For each class, the prior probability is estimated from the frequency of that class in the training data.
   - For each class and each feature, the mean and variance of the feature are estimated from the training points of that class.

2. **Prediction Phase**:
   - For each class, the posterior score is computed as the class prior multiplied by the Gaussian densities of the observed feature values (in practice, as a sum of logarithms).
   - The class with the highest posterior score is selected as the predicted class.

The One-vs-All (OvA) and One-vs-One (OvO) strategies exist for classifiers that are inherently binary. They split the problem into one-class-against-the-rest or pairwise subproblems and combine the binary decisions. They can be wrapped around Gaussian Naive Bayes, but this is generally unnecessary, since the model already produces a posterior for every class.
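As a short illustration, scikit-learn's `GaussianNB` can be fit directly on data with more than two classes, with no wrapper around it. The sketch below uses the three-class Iris dataset bundled with scikit-learn as an assumed stand-in example; it is not part of the assignment data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has three classes (labels 0, 1, 2) and four continuous features.
X_iris, y_iris = load_iris(return_X_y=True)

clf = GaussianNB().fit(X_iris, y_iris)

# One posterior probability per class for each sample
proba = clf.predict_proba(X_iris[:5])

print(clf.classes_)        # the three class labels learned by the model
print(proba.shape)         # (5, 3): five samples, three classes
print(proba.sum(axis=1))   # each row sums to 1
```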

### Q5. Assignment:

#### Data preparation:

Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message is spam or not based on several input features.

#### Implementation:

Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.

#### Results:

Report the following performance metrics for each classifier:

- Accuracy
- Precision
- Recall
- F1 score

#### Discussion:

Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?

#### Conclusion:

Summarise your findings and provide some suggestions for future work.

Note: This dataset contains a binary classification problem with multiple features. The dataset is relatively small, but it can be used to demonstrate the performance of the different variants of Naive Bayes on a real-world problem.

```python
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

# Load the data: "spambase.data" from the UCI archive has no header row.
# Columns 0-56 hold the 57 input features; column 57 is the label (1 = spam, 0 = not spam).
data = pd.read_csv("spambase.data", header=None)
X = data.iloc[:, :57]
y = data.iloc[:, 57]

# Metrics to report: display name -> scikit-learn scorer name
SCORING = {"Accuracy": "accuracy", "Precision": "precision",
           "Recall": "recall", "F1 Score": "f1"}

def evaluate_classifier(classifier, name):
    """Run 10-fold cross-validation once and print the mean of each metric."""
    scores = cross_validate(classifier, X, y, cv=10, scoring=SCORING)

    print(f"Results for {name} Naive Bayes Classifier:")
    for metric in SCORING:
        print(f"{metric}: {scores['test_' + metric].mean():.2f}")
    print("\n")

# Instantiate the classifiers with default hyperparameters and evaluate them
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

evaluate_classifier(bernoulli_nb, "Bernoulli")
evaluate_classifier(multinomial_nb, "Multinomial")
evaluate_classifier(gaussian_nb, "Gaussian")
```


```
Results for Bernoulli Naive Bayes Classifier:
Accuracy: 0.88
Precision: 0.89
Recall: 0.82
F1 Score: 0.85


Results for Multinomial Naive Bayes Classifier:
Accuracy: 0.79
Precision: 0.74
Recall: 0.72
F1 Score: 0.73


Results for Gaussian Naive Bayes Classifier:
Accuracy: 0.82
Precision: 0.71
Recall: 0.96
F1 Score: 0.81
```

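#### Discussion:

- Bernoulli Naive Bayes performed best on this dataset, with the highest accuracy (0.88), precision (0.89), and F1 score (0.85).
- Gaussian Naive Bayes achieved the highest recall (0.96) but the lowest precision (0.71), so it catches most spam at the cost of flagging more legitimate messages.
- Multinomial Naive Bayes had the lowest accuracy (0.79) and F1 score (0.73).
- A plausible explanation is that most Spambase features are word and character frequencies expressed as percentages rather than integer counts, which fits neither the multinomial count model nor a Gaussian well, while the presence or absence of a word remains informative.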
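The reported figures are means over 10 folds, so one way to judge whether the gap between variants is meaningful is to also look at the spread across folds. The sketch below shows this on synthetic count data, used as an assumed stand-in because the Spambase file may not be at hand; the helper `fold_summary`, the Poisson rates, and the `X_demo`/`y_demo` arrays are illustrative, and its numbers say nothing about Spambase itself.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Synthetic stand-in data: three count features with class-dependent Poisson rates.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=400)
rates = np.where(y_demo[:, None] == 1, [2.0, 0.2, 1.0], [0.2, 2.0, 1.0])
X_demo = rng.poisson(rates)

def fold_summary(classifier, X, y, cv=10):
    """Return the mean and standard deviation of per-fold accuracy."""
    scores = cross_val_score(classifier, X, y, cv=cv, scoring="accuracy")
    return scores.mean(), scores.std()

summaries = {
    "Bernoulli": fold_summary(BernoulliNB(), X_demo, y_demo),
    "Multinomial": fold_summary(MultinomialNB(), X_demo, y_demo),
    "Gaussian": fold_summary(GaussianNB(), X_demo, y_demo),
}

for name, (mean, std) in summaries.items():
    print(f"{name}: accuracy {mean:.2f} +/- {std:.2f}")
```

If the difference between two means is small relative to the per-fold standard deviation, the ranking of the two variants should be read with caution.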
