## Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

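Answer:
The survey states directly that 40% of the employees who use the plan are smokers, so this figure is already the conditional probability: P(smoker | uses plan) = 0.4, or 40%. The 70% figure is P(uses plan) and is not needed to answer the question.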
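
As a quick check of that reasoning, the sketch below rebuilds the joint probability from the two survey figures and then applies the definition P(A | B) = P(A and B) / P(B). The variable names are illustrative, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction

# Figures taken from the question
p_plan = Fraction(70, 100)               # P(uses plan)
p_smoker_given_plan = Fraction(40, 100)  # P(smoker | uses plan)

# Joint probability: P(smoker and uses plan) = P(smoker | plan) * P(plan)
p_smoker_and_plan = p_smoker_given_plan * p_plan

# Definition of conditional probability recovers the stated 40%
recovered = p_smoker_and_plan / p_plan

print(float(p_smoker_and_plan))  # 0.28
print(float(recovered))          # 0.4
```

So 28% of all employees are smokers who use the plan, and dividing by the 70% who use the plan gives back 0.4.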

## Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

Answer:

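Bernoulli Naive Bayes is used for binary features (0 or 1). It models only whether a feature is present or absent, and the absence of a feature also contributes to the likelihood.

Multinomial Naive Bayes is used for count-based features. It models how often each feature occurs (for example, word counts in text), so repeated occurrences carry more weight.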
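
The difference can be seen on a tiny example. The sketch below uses a made-up word-count matrix (the data and variable names are hypothetical) and fits scikit-learn's `BernoulliNB` and `MultinomialNB` on both the raw counts and a 0/1 version. With its default `binarize=0.0`, `BernoulliNB` thresholds the counts itself, so it should learn the same parameters either way, while `MultinomialNB` learns different parameters once the counts are discarded.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical word counts: rows are documents, columns are vocabulary terms
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 1, 2],
])
y = np.array([0, 0, 1, 1])

# Presence/absence version of the same data
X_binary = (X_counts > 0).astype(int)

# BernoulliNB binarizes its input, so counts and 0/1 data give the same model
bnb_counts = BernoulliNB().fit(X_counts, y)
bnb_binary = BernoulliNB().fit(X_binary, y)
bernoulli_same = np.allclose(bnb_counts.feature_log_prob_,
                             bnb_binary.feature_log_prob_)

# MultinomialNB uses the frequencies, so discarding them changes the model
mnb_counts = MultinomialNB().fit(X_counts, y)
mnb_binary = MultinomialNB().fit(X_binary, y)
multinomial_same = np.allclose(mnb_counts.feature_log_prob_,
                               mnb_binary.feature_log_prob_)

print("BernoulliNB unchanged by binarizing:", bernoulli_same)
print("MultinomialNB unchanged by binarizing:", multinomial_same)
```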

## Q3. How does Bernoulli Naive Bayes handle missing values?

Answer:
Bernoulli Naive Bayes does not handle missing values automatically, so you need to deal with them before fitting the model. You can either impute them or drop the rows/columns that contain missing data. For binary features the mode (most frequent value) is usually the natural imputation choice, because a mean or median may not be a valid 0/1 value.
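
One way to do the manual handling is sketched below, assuming scikit-learn and a small made-up binary dataset with `np.nan` marking the missing entries. It imputes each column with its most frequent value using `SimpleImputer` and chains that step with `BernoulliNB` in a pipeline. This is an illustration of one option, not the only reasonable approach.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary features with missing entries (np.nan)
X = np.array([
    [1, 0, np.nan],
    [0, 1, 1],
    [1, np.nan, 1],
    [0, 0, 0],
    [1, 1, np.nan],
    [0, 0, 1],
])
y = np.array([1, 0, 1, 0, 1, 0])

# Fitting directly on data containing NaN is expected to fail in scikit-learn
try:
    BernoulliNB().fit(X, y)
    raw_fit_failed = False
except ValueError:
    raw_fit_failed = True

# Replace each missing value with the most frequent value of its column
imputer = SimpleImputer(strategy="most_frequent")
X_imputed = imputer.fit_transform(X)

# Same idea as a single estimator: impute, then classify
pipe = make_pipeline(SimpleImputer(strategy="most_frequent"), BernoulliNB())
pipe.fit(X, y)
preds = pipe.predict(X)

print("Direct fit on NaN data failed:", raw_fit_failed)
print(X_imputed)
print(preds)
```

Putting the imputer inside the pipeline means the fill values are learned only from the training folds when the pipeline is cross-validated.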

## Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Answer:
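Yes, Gaussian Naive Bayes can be used for multi-class classification problems. It estimates a separate mean and variance for every feature within each class, so it supports any number of classes. It works best when the features are continuous and approximately normally distributed within each class.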
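
A minimal sketch of the multi-class case, assuming scikit-learn and its bundled Iris dataset (three classes, four continuous features). The dataset choice is only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has three classes, so this is a multi-class problem
X, y = load_iris(return_X_y=True)

model = GaussianNB().fit(X, y)

# One row of per-feature means is learned for each class
print("Classes:", model.classes_)
print("Per-class feature means shape:", model.theta_.shape)

# predict_proba returns one probability column per class
proba = model.predict_proba(X)
print("Probability matrix shape:", proba.shape)
```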



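## Q5. Assignment - Naive Bayes on Spambase Dataset

The code below runs on scikit-learn's built-in Digits dataset rather than Spambase. It evaluates Bernoulli, Multinomial, and Gaussian Naive Bayes with 10-fold stratified cross-validation.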

```python
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load the inbuilt Digits dataset
digits = load_digits()
X = digits.data
y = digits.target

# Define 10-fold cross-validation
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# Define classifiers
models = {
    "BernoulliNB": BernoulliNB(),
    "MultinomialNB": MultinomialNB(),
    "GaussianNB": GaussianNB()
}

# Store results
results = {}

# Loop through models and evaluate
for name, model in models.items():
    y_pred = cross_val_predict(model, X, y, cv=cv)

    results[name] = {
        "Accuracy": accuracy_score(y, y_pred),
        "Precision (macro avg)": precision_score(y, y_pred, average='macro', zero_division=0),
        "Recall (macro avg)": recall_score(y, y_pred, average='macro', zero_division=0),
        "F1 Score (macro avg)": f1_score(y, y_pred, average='macro', zero_division=0)
    }

# Convert results to DataFrame for display
results_df = pd.DataFrame(results).T
print("\nEvaluation Results on Digits Dataset:\n")
print(results_df)
```

Evaluation Results on Digits Dataset:

| Model | Accuracy | Precision (macro avg) | Recall (macro avg) | F1 Score (macro avg) |
|---|---|---|---|---|
| BernoulliNB | 0.853088 | 0.854572 | 0.853334 | 0.852106 |
| MultinomialNB | 0.900390 | 0.905234 | 0.900618 | 0.900947 |
| GaussianNB | 0.842515 | 0.866145 | 0.842328 | 0.843662 |
