# **Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?**


### Probability of an Employee Being a Smoker Given Health Insurance Usage:
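The question already states the quantity we need: "40% of the employees who use the plan are smokers" is the conditional probability of being a smoker given plan usage, so no inversion with Bayes' theorem is required.

Let's denote:
* $S$: Event that an employee is a smoker.
* $H$: Event that an employee uses the health insurance plan.

We are given:
* $P(H) = 0.7$ (probability of using the health insurance plan)
* $P(S|H) = 0.4$ (probability of being a smoker given health insurance usage)

We want to find $P(S|H)$, which is given directly.

* As a consistency check, the definition of conditional probability gives the joint probability: $$ P(S \cap H) = P(S|H) \cdot P(H) = 0.4 \cdot 0.7 = 0.28 $$
* Dividing by $P(H)$ recovers the same value: $$ P(S|H) = \frac{P(S \cap H)}{P(H)} = \frac{0.28}{0.7} = 0.4 $$
* Bayes' theorem, $P(S|H) = \frac{P(H|S) \cdot P(S)}{P(H)}$, would only be needed if we were given $P(H|S)$ and $P(S)$, which this question does not provide.

Therefore, the probability that an employee is a smoker given that they use the health insurance plan is 0.4.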
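
As a quick sanity check, the arithmetic can be reproduced in a few lines of Python. This is only an illustrative sketch: the variable names are made up here, and exact fractions are used so that floating-point rounding does not obscure the result.

```python
from fractions import Fraction

# Values stated in the question, kept as exact fractions to avoid float rounding.
p_h = Fraction(7, 10)         # P(H): employee uses the health insurance plan
p_s_given_h = Fraction(2, 5)  # P(S|H): smoker, given plan usage

# Joint probability from the definition of conditional probability.
p_s_and_h = p_s_given_h * p_h

# Dividing the joint probability by P(H) recovers the conditional probability.
recovered = p_s_and_h / p_h

print(float(p_s_and_h))  # 0.28
print(float(recovered))  # 0.4
```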

# **Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?**


## Bernoulli vs. Multinomial Naive Bayes:
* Both Bernoulli Naive Bayes (BNB) and Multinomial Naive Bayes (MNB) are variants of the Naive Bayes algorithm.
* Bernoulli Naive Bayes:
    * Used for binary features (e.g., presence/absence of a feature).
    * Models the presence/absence of a feature.
    * Explicitly penalizes absence: a feature that does not occur contributes a factor of $1 - P(x_i = 1 | y)$ to the class likelihood.

* Example: Text classification where each word is either present or absent.

* Multinomial Naive Bayes:
    * Widely used for document classification.
    * Models the number of counts of a feature (e.g., word frequencies).
    * Only features that occur contribute to the likelihood; a feature with a count of 0 is ignored rather than penalized.
* Example: Classifying documents based on word frequencies.

* In summary:
BNB models whether each feature occurs (and penalizes its absence), while MNB models how many times each feature occurs.
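
The difference can be seen on a toy example. The sketch below uses a made-up word-count matrix (the data and variable names are assumptions for illustration) and relies on scikit-learn's `BernoulliNB` binarizing its input by default (`binarize=0.0`), whereas `MultinomialNB` uses the counts as given.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical word-count matrix: rows are documents, columns are words.
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 1, 2],
])
y = np.array([0, 0, 1, 1])

# BernoulliNB binarizes its input (binarize=0.0 by default), so it only
# sees whether each word is present or absent.
bnb_counts = BernoulliNB().fit(X_counts, y)
bnb_binary = BernoulliNB().fit((X_counts > 0).astype(int), y)

# MultinomialNB works with the counts themselves.
mnb = MultinomialNB().fit(X_counts, y)

# Two new documents that differ only in how often the first word occurs.
X_new = np.array([[5, 0, 0], [1, 0, 0]])
bnb_proba = bnb_counts.predict_proba(X_new)
mnb_proba = mnb.predict_proba(X_new)

print(bnb_proba)  # both rows are identical: only presence matters
print(mnb_proba)  # rows differ: the count changes the posterior
```

Fitting `BernoulliNB` on the raw counts or on the explicitly binarized matrix yields the same model, which is why the two documents in `X_new` are indistinguishable to it.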

# **Q3. How does Bernoulli Naive Bayes handle missing values?**


## Handling Missing Values in Bernoulli Naive Bayes:
* Bernoulli Naive Bayes assumes binary features (presence/absence).
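* The algorithm itself has no built-in mechanism for missing values (scikit-learn's `BernoulliNB`, for example, rejects `NaN` inputs), so they must be handled before fitting:
    - A common convention is to treat missing features as absent (i.e., value = 0).
    - This aligns with the binary nature of BNB, but because BNB penalizes absent features, the imputed zeros do influence the likelihood.
    - It keeps the model simple by not introducing additional complexity for handling missing data; alternatives include imputing the most frequent value or skipping the missing feature when computing a sample's likelihood.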
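
A minimal sketch of the "treat missing as absent" convention, assuming scikit-learn and a small made-up binary feature matrix. The zero-imputation via `np.nan_to_num` is one possible choice, not something `BernoulliNB` does on its own.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary features with two missing entries.
X = np.array([
    [1.0, 0.0, np.nan],
    [1.0, 1.0, 0.0],
    [0.0, np.nan, 1.0],
    [0.0, 0.0, 1.0],
])
y = np.array([1, 1, 0, 0])

# Fitting directly on data containing NaN fails input validation.
try:
    BernoulliNB().fit(X, y)
    nan_rejected = False
except ValueError:
    nan_rejected = True

# Assumed convention: a missing feature is treated as absent (0).
X_imputed = np.nan_to_num(X, nan=0.0)
clf = BernoulliNB().fit(X_imputed, y)

print(nan_rejected)
print(clf.predict(X_imputed))
```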

# **Q4. Can Gaussian Naive Bayes be used for multi-class classification?**

## Gaussian Naive Bayes for Multi-Class Classification:
* Gaussian Naive Bayes (GNB) assumes that features follow a Gaussian (normal) distribution.
* GNB is primarily used for continuous-valued features.
* It handles multi-class classification natively: it computes a posterior probability for every class and predicts the class with the highest one.
* Each class has its own Gaussian distribution (a mean and a variance) for each feature.
* GNB evaluates the likelihood of each continuous feature value under these per-class Gaussians.
* Therefore, yes, GNB can be used for multi-class classification.
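
As an illustration, the sketch below fits scikit-learn's `GaussianNB` on the three-class Iris dataset that ships with the library. The dataset choice is an assumption made here for the example; it is not part of the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has three classes and four continuous features.
X, y = load_iris(return_X_y=True)
clf = GaussianNB().fit(X, y)

print(clf.classes_)      # the class labels seen during fitting
print(clf.theta_.shape)  # (n_classes, n_features): per-class feature means

# One posterior probability per class for each sample.
proba = clf.predict_proba(X[:5])
print(proba.shape)
```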

# **Q5. Assignment:**
## **Data preparation:**
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message is spam or not based on several input features.
## **Implementation:**
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.
## **Results:**
**Report the following performance metrics for each classifier:**
* Accuracy
* Precision
* Recall
* F1 score
## **Discussion:**
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?
## **Conclusion:**
Summarise your findings and provide some suggestions for future work.

```python
from ucimlrepo import fetch_ucirepo
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

spambase = fetch_ucirepo(id=94)

# data (as pandas dataframes)
X = spambase.data.features
y = spambase.data.targets
```
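
One way the 10-fold evaluation described in the assignment could be set up is sketched below. `evaluate_nb` is a hypothetical helper introduced here, the shuffled `StratifiedKFold` split and its seed are assumptions, and the demo runs on synthetic non-negative data so that the block works without downloading anything. All three classifiers keep their default hyperparameters.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB


def evaluate_nb(X, y, n_splits=10, seed=0):
    """Mean accuracy, precision, recall and F1 per classifier (hypothetical helper)."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scoring = ["accuracy", "precision", "recall", "f1"]
    models = {
        "BernoulliNB": BernoulliNB(),
        "MultinomialNB": MultinomialNB(),
        "GaussianNB": GaussianNB(),
    }
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        # cross_validate stores each metric under the key "test_<metric>".
        results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}
    return results


# Synthetic, non-negative stand-in for the real features
# (MultinomialNB requires non-negative inputs).
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 100)
rates = np.where(y_demo[:, None] == 1, [4.0, 1.0, 2.0], [1.0, 3.0, 2.0])
X_demo = rng.poisson(rates)

results = evaluate_nb(X_demo, y_demo)
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

On the real data the call would presumably look like `evaluate_nb(X, y.values.ravel())`, assuming `y` is a single-column DataFrame of 0/1 labels as returned by `fetch_ucirepo`.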

