Q1. A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers.
What is the probability that an employee is a smoker given that he/she uses the health insurance plan?  
Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?   
Q3. How does Bernoulli Naive Bayes handle missing values?  
Q4. Can Gaussian Naive Bayes be used for multi-class classification?  

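### Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?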

The probability that an employee is a smoker given that he/she uses the health insurance plan is a conditional probability, and the question states it directly. Bayes' theorem is not required.

Let:
- \( A \) be the event that an employee uses the health insurance plan.
- \( B \) be the event that an employee is a smoker.

We are given:
- \( P(A) = 0.70 \) (probability that an employee uses the health insurance plan)
- \( P(B|A) = 0.40 \) (probability that an employee is a smoker given that they use the health insurance plan)

We want to find \( P(B|A) \). The statement "40% of the employees who use the plan are smokers" is exactly this quantity, so:

\[ P(B|A) = 0.40 \]

We can confirm this with the definition of conditional probability. The joint probability that an employee both uses the plan and smokes is:

\[ P(A \cap B) = P(B|A) \times P(A) = 0.40 \times 0.70 = 0.28 \]

Dividing by \( P(A) \) recovers the same value:

\[ P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{0.28}{0.70} = 0.40 \]

Two related quantities cannot be computed from the given data. The overall probability of being a smoker follows from the law of total probability:

\[ P(B) = P(B|A) \times P(A) + P(B|\neg A) \times P(\neg A) \]

Here \( P(\neg A) = 1 - P(A) = 0.30 \), but \( P(B|\neg A) \), the probability that an employee who does not use the plan is a smoker, is not given. Without it, neither \( P(B) \) nor the reverse conditional \( P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} \) can be determined. Neither is needed to answer the question.

So, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.40. No independence assumption is required.
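
As a quick numerical check, the arithmetic can be reproduced in a few lines of Python. This is a minimal sketch: the variable names are illustrative, and the values tried for \( P(B|\neg A) \) are hypothetical, since the question does not provide that quantity.

```python
# Values given in the question
p_plan = 0.70                # P(A): employee uses the health insurance plan
p_smoker_given_plan = 0.40   # P(B|A): employee smokes, given plan use

# Joint probability: P(A and B) = P(B|A) * P(A)
p_plan_and_smoker = p_smoker_given_plan * p_plan

# Definition of conditional probability: P(B|A) = P(A and B) / P(A)
p_check = p_plan_and_smoker / p_plan

print(f"P(A and B) = {p_plan_and_smoker:.2f}")
print(f"P(B|A)     = {p_check:.2f}")

# The overall smoker rate P(B) depends on P(B|not A), which is NOT given.
# The values below are hypothetical and only show how P(B) would vary.
for p_smoker_given_no_plan in (0.0, 0.4, 1.0):
    p_smoker = p_plan_and_smoker + p_smoker_given_no_plan * (1 - p_plan)
    print(f"if P(B|not A) = {p_smoker_given_no_plan:.1f}, then P(B) = {p_smoker:.2f}")
```

The loop shows that \( P(B) \) could lie anywhere between 0.28 and 0.58 depending on the unknown smoker rate among employees outside the plan, while \( P(B|A) \) stays at 0.40.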

### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

The main difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in how they model the features and the type of data they are suitable for.

1. **Bernoulli Naive Bayes**:
   - Bernoulli Naive Bayes assumes that features are binary-valued (e.g., presence or absence of a feature).
   - It is commonly used for binary feature vectors, where each feature represents the presence or absence of a particular attribute.
   - In text classification, for example, each feature could represent the presence or absence of a word in a document (e.g., "word is present" or "word is absent").
   - Bernoulli Naive Bayes models each feature in each class with an independent Bernoulli distribution, so the absence of a feature also contributes to the likelihood.

2. **Multinomial Naive Bayes**:
   - Multinomial Naive Bayes assumes that features represent counts or frequencies (e.g., word counts in text).
   - It is suitable for datasets where features are represented as integer counts, such as the frequency of words in a document.
   - In text classification, each feature typically represents the frequency of a word or term in a document.
   - Multinomial Naive Bayes models the feature counts in each class with a multinomial distribution, so a feature with a count of zero contributes nothing to the likelihood.

In summary, Bernoulli Naive Bayes is used when features are binary-valued, whereas Multinomial Naive Bayes is used when features represent counts or frequencies. The choice between the two depends on the nature of the data and how the features are represented.
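
The contrast can be made concrete by scoring the same documents under both likelihood models. The sketch below uses a hypothetical three-word vocabulary with made-up class parameters (`p_present` and `p_token` are illustrative values, not estimated from any data), and it computes only the class-conditional log-likelihood, not a full classifier.

```python
import math

# Hypothetical parameters for ONE class over a 3-word vocabulary (not fitted).
# Bernoulli: probability that each word appears at least once in a document.
p_present = [0.8, 0.1, 0.5]
# Multinomial: probability that a token drawn from this class is each word.
p_token = [0.6, 0.1, 0.3]

def bernoulli_log_likelihood(counts, p):
    """Use presence/absence only; an absent word contributes log(1 - p)."""
    total = 0.0
    for count, p_j in zip(counts, p):
        total += math.log(p_j) if count > 0 else math.log(1.0 - p_j)
    return total

def multinomial_log_likelihood(counts, theta):
    """Use counts; a word with count 0 contributes nothing.
    The multinomial coefficient is omitted because it does not depend on the class."""
    return sum(count * math.log(t) for count, t in zip(counts, theta))

doc_once = [1, 0, 0]   # word 0 appears once
doc_many = [5, 0, 0]   # word 0 appears five times

# Bernoulli treats both documents as the same binary vector [1, 0, 0].
print(bernoulli_log_likelihood(doc_once, p_present))
print(bernoulli_log_likelihood(doc_many, p_present))

# Multinomial scales the score with the count of word 0.
print(multinomial_log_likelihood(doc_once, p_token))
print(multinomial_log_likelihood(doc_many, p_token))
```

Under the Bernoulli model the two documents receive identical scores, and the absent words 1 and 2 lower the score through the `log(1 - p)` terms. Under the multinomial model the second document's score is five times the first, and the absent words play no role.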

### Q3. How does Bernoulli Naive Bayes handle missing values?

Bernoulli Naive Bayes has no built-in mechanism for missing values. Like many machine learning algorithms, it expects complete data, and scikit-learn's `BernoulliNB` raises an error if the input contains `NaN`. If your dataset contains missing values, you need to handle them before fitting the model.

Here are some common strategies to handle missing values in a dataset before using Bernoulli Naive Bayes:

1. **Imputation**: Replace missing values with a suitable estimate. For binary features, you might impute missing values with the mode (most frequent value) of that feature. However, be cautious with imputation, as it can introduce bias into your dataset.

2. **Deletion**: Remove rows or columns with missing values. If the number of missing values is small compared to the size of your dataset, this might be a viable option. However, you risk losing valuable information by deleting observations or features.

3. **Model-based imputation**: Use other models to predict missing values based on the non-missing values in the dataset. For example, you could train a separate classifier to predict missing values based on the other features in the dataset.

4. **Encoding missing values**: In some cases, you might treat "missing" as information in its own right, for example by adding a binary indicator feature that marks where a value was absent, so that the input stays binary. However, this approach could introduce noise into your data, so use it judiciously.

It's important to carefully consider the implications of each approach and choose the one that best suits your dataset and the underlying problem. Additionally, it's good practice to evaluate the performance of your model after handling missing values to ensure that the chosen approach does not adversely affect the model's performance.
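
Two of the strategies above, mode imputation and indicator encoding, can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: missing entries are represented as `None`, the toy matrix `X` is made up, and the helper `impute_mode` is a hypothetical name rather than a library function.

```python
from collections import Counter

def impute_mode(rows):
    """Replace None in each binary column with that column's most frequent
    observed value. Assumes every column has at least one observed value."""
    n_cols = len(rows[0])
    modes = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if value is None else value
             for j, value in enumerate(row)]
            for row in rows]

# Toy binary feature matrix with missing entries (made-up data)
X = [[1, 0, None],
     [1, None, 1],
     [0, 0, 1],
     [1, 0, 0]]

# Indicator encoding: one extra 0/1 feature per column marking missing entries
indicators = [[int(value is None) for value in row] for row in X]

# Mode imputation of the original columns
X_imputed = impute_mode(X)

# Combined representation: imputed features followed by missingness flags.
# Every entry is 0 or 1, so the result remains valid Bernoulli input.
X_ready = [imputed + flags for imputed, flags in zip(X_imputed, indicators)]

for row in X_ready:
    print(row)
```

In a real pipeline the column modes should be computed on the training folds only and then reused for the test folds, so that no information leaks from the evaluation data.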

### Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification. In fact, Gaussian Naive Bayes is naturally suited for multi-class classification tasks where the features are continuous and assumed to follow a Gaussian (normal) distribution.

In the case of multi-class classification, Gaussian Naive Bayes estimates the parameters (mean and variance) of the Gaussian distribution for each class based on the training data. When making predictions on new data, it calculates the likelihood of the observed features given each class using the Gaussian probability density function. It then combines these likelihoods with the prior probabilities of each class to compute the posterior probability of each class given the observed features.

The class with the highest posterior probability is predicted as the output class for the given input features.

In summary, Gaussian Naive Bayes can handle multi-class classification by estimating Gaussian distributions for each class and making predictions based on the likelihoods of the observed features. It is a simple and efficient algorithm suitable for classification tasks with continuous features.
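
The procedure described above can be sketched from scratch for three classes. This is an illustrative implementation under stated assumptions, not scikit-learn's `GaussianNB`: the toy data, the class labels, and the `var_floor` constant (a small value added to each variance to avoid division by zero) are all made up for the example.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a prior and a per-feature mean and variance for every class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, target in zip(X, y) if target == label]
        columns = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "means": [mean(col) for col in columns],
            "vars": [pvariance(col) + var_floor for col in columns],
        }
    return model

def log_posteriors(model, x):
    """Unnormalised log posterior per class:
    log prior + sum over features of the Gaussian log density."""
    scores = {}
    for label, params in model.items():
        score = math.log(params["prior"])
        for value, mu, var in zip(x, params["means"], params["vars"]):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        scores[label] = score
    return scores

def predict(model, x):
    """Return the class with the highest posterior score."""
    scores = log_posteriors(model, x)
    return max(scores, key=scores.get)

# Toy data: two continuous features, three well-separated classes
X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
     [5.0, 5.2], [5.1, 4.9], [4.8, 5.0],
     [9.0, 1.0], [9.2, 1.1], [8.9, 0.9]]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

model = fit_gaussian_nb(X, y)
for point in ([1.0, 1.0], [5.0, 5.0], [9.0, 1.0]):
    print(point, "->", predict(model, point))
```

Nothing in the code is specific to two classes: the same loop over `model.items()` handles any number of classes, which is why no one-vs-rest wrapper is needed.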

### Q5. Assignment:

Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message is spam or not based on several input features.

Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.

Results:
Report the following performance metrics for each classifier:
- Accuracy
- Precision
- Recall
- F1 score

Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?

Conclusion:
Summarise your findings and provide some suggestions for future work.

In [1]:
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

# Data preparation
# The file has no header row; the last column is the label (1 = spam, 0 = not spam).
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
data = pd.read_csv(url, header=None)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Implementation: default hyperparameters for each classifier
classifiers = {'Bernoulli Naive Bayes': BernoulliNB(),
               'Multinomial Naive Bayes': MultinomialNB(),
               'Gaussian Naive Bayes': GaussianNB()}

# Reported metric names mapped to scikit-learn scorer names
scoring = {'Accuracy': 'accuracy',
           'Precision': 'precision',
           'Recall': 'recall',
           'F1 Score': 'f1'}

results = {}

for name, clf in classifiers.items():
    # 10-fold cross-validation; all four metrics are computed on the same folds in one pass
    cv_scores = cross_validate(clf, X, y, cv=10, scoring=scoring)
    # Average each metric over the 10 folds
    results[name] = {metric: cv_scores[f'test_{metric}'].mean() for metric in scoring}

# Results
for name, scores in results.items():
    print(f'Classifier: {name}')
    for metric, value in scores.items():
        print(f'{metric}: {value:.4f}')
    print('-------------------------------------------')

# Discussion
# Discuss the results obtained and compare the performance of each variant of Naive Bayes.
# Analyze which variant performed the best and discuss possible reasons for its performance.
# Also, mention any limitations of Naive Bayes observed during the evaluation.

# Conclusion
# Summarize the findings, including which Naive Bayes variant performed the best and why.
# Provide suggestions for future work, such as exploring feature engineering techniques or trying other classifiers.



Classifier: Bernoulli Naive Bayes
Accuracy: 0.8839
Precision: 0.8870
Recall: 0.8152
F1 Score: 0.8481
-------------------------------------------
Classifier: Multinomial Naive Bayes
Accuracy: 0.7863
Precision: 0.7393
Recall: 0.7215
F1 Score: 0.7283
-------------------------------------------
Classifier: Gaussian Naive Bayes
Accuracy: 0.8218
Precision: 0.7104
Recall: 0.9570
F1 Score: 0.8131
-------------------------------------------
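
When reading these numbers, note that the F1 score is the harmonic mean of precision and recall, and that the F1 values above are averages of per-fold F1 scores. The sketch below recomputes F1 from the averaged precision and recall printed above, to show that the two quantities are close but not identical. The helper name `f1_from` is illustrative.

```python
# Mean precision, recall, and F1 copied from the printed results above
reported = {
    "Bernoulli Naive Bayes":   {"precision": 0.8870, "recall": 0.8152, "f1": 0.8481},
    "Multinomial Naive Bayes": {"precision": 0.7393, "recall": 0.7215, "f1": 0.7283},
    "Gaussian Naive Bayes":    {"precision": 0.7104, "recall": 0.9570, "f1": 0.8131},
}

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

for name, m in reported.items():
    f1_of_means = f1_from(m["precision"], m["recall"])
    print(f"{name}: F1 from mean P and R = {f1_of_means:.4f}, "
          f"mean of per-fold F1 = {m['f1']:.4f}")
```

The small gap arises because averaging per-fold F1 scores is not the same operation as taking the harmonic mean of averaged precision and recall. Either convention is reasonable as long as it is applied consistently across the three classifiers.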
