In [None]:
Q1. A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
probability that an employee is a smoker given that he/she uses the health insurance plan?
-----------------------------------------

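Let
P(H) be the probability that an employee uses the health insurance plan.
P(S∩H) be the probability that an employee uses the plan and is a smoker.
P(S∣H) be the probability that an employee is a smoker given that they use the health insurance plan.

From the survey:
P(H) = 0.70
P(S∣H) = 0.40   (40% of the employees who use the plan are smokers)

The question asks for P(S∣H), which the survey gives directly. As a check, apply the definition of conditional probability:
P(S∩H) = P(S∣H) × P(H) = 0.40 × 0.70 = 0.28
P(S∣H) = P(S∩H) / P(H) = 0.28 / 0.70
        = 0.40

Note that 0.28 is the joint probability P(S∩H), not P(S). Bayes' theorem, P(S∣H) = P(H∣S) × P(S) / P(H), cannot be used here because the survey gives neither P(S) nor P(H∣S), and in general P(H∣S) ≠ P(S∣H).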
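
A quick numerical check of the definition P(S∣H) = P(S∩H) / P(H) with the survey figures. This is only a sketch; the variable names are illustrative and not part of the original question.

```python
# Survey figures from the question
p_h = 0.70            # P(H): employee uses the health insurance plan
p_s_given_h = 0.40    # P(S|H): smoker, given that the employee uses the plan

# Joint probability of being a plan user and a smoker
p_s_and_h = p_s_given_h * p_h

# Recover the conditional probability from the joint and the marginal
p_s_given_h_check = p_s_and_h / p_h

print("P(S and H):", round(p_s_and_h, 4))
print("P(S|H):", round(p_s_given_h_check, 4))
```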

In [None]:
Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?
-------------------------------------------

1. Bernoulli Naive Bayes:
    Bernoulli Naive Bayes assumes that every feature is a binary (Boolean) variable.
    Each feature takes the value 1 when it is present in an instance and 0 when it is absent.
    For each class it estimates the probability that each feature is present.
    The likelihood of an instance uses both presences and absences, so an absent feature counts as evidence against classes in which that feature is common.

2. Multinomial Naive Bayes:
    Multinomial Naive Bayes assumes that the features are non-negative counts, or count-like values such as term frequencies.
    It is commonly used for document classification, where each feature is the count or frequency of a term (word) in a document.
    For each class it estimates the probability of each term.
    The likelihood of an instance depends on how many times each term occurs; terms with a count of 0 do not contribute.
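
The contrast can be sketched on a tiny, made-up count matrix (the data below is purely illustrative). With its default `binarize=0.0`, scikit-learn's `BernoulliNB` thresholds the counts to 0/1 before fitting, so it only sees presence or absence, while `MultinomialNB` uses the counts themselves.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical term counts: 4 documents x 3 terms
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 1, 2],
])
y = np.array([1, 1, 0, 0])

# Presence/absence version of the same data
X_binary = (X_counts > 0).astype(int)

# BernoulliNB binarizes its input (threshold 0.0 by default),
# so fitting on counts or on 0/1 indicators gives the same model
bnb_counts = BernoulliNB().fit(X_counts, y)
bnb_binary = BernoulliNB().fit(X_binary, y)
same_bernoulli = np.allclose(
    bnb_counts.predict_proba(X_counts),
    bnb_binary.predict_proba(X_binary),
)

# MultinomialNB works with the raw counts
mnb = MultinomialNB().fit(X_counts, y)
mnb_proba = mnb.predict_proba(X_counts)

print("Bernoulli ignores count magnitude:", same_bernoulli)
print("Multinomial class probabilities:\n", mnb_proba)
```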

In [None]:
Q3. How does Bernoulli Naive Bayes handle missing values?
--------------------------------------------

Bernoulli Naive Bayes has no built-in mechanism for missing values, so they are typically handled before training, either by ignoring instances with missing values or by imputing the missing values with a specific value.
1. Ignoring instances with missing values:
    In this approach, instances with missing values are excluded from the analysis.
    This is straightforward, but it discards data and can hurt the model when a significant portion of the dataset contains missing values.
2. Imputing missing values:
    In this approach, the missing values are imputed (filled in) with a predetermined value.
    Because the features are binary, a missing value can be imputed with either 0 or 1 (for example, the most frequent value of that feature), depending on which is more appropriate for the context.
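
Both options can be sketched with scikit-learn on a small made-up binary dataset. The most-frequent-value strategy is an assumption chosen for illustration, not a rule prescribed by the algorithm.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary features with some missing entries (NaN)
X = np.array([
    [1, 0, np.nan],
    [1, np.nan, 0],
    [0, 1, 1],
    [0, 1, np.nan],
    [np.nan, 0, 0],
], dtype=float)
y = np.array([1, 1, 0, 0, 1])

# Option 1: drop every instance that has at least one missing value
complete_rows = ~np.isnan(X).any(axis=1)
X_complete, y_complete = X[complete_rows], y[complete_rows]

# Option 2: impute each feature with its most frequent value, then fit
model = make_pipeline(SimpleImputer(strategy="most_frequent"), BernoulliNB())
model.fit(X, y)
preds = model.predict(X)

print("Rows kept after dropping:", X_complete.shape[0], "of", X.shape[0])
print("Predictions after imputation:", preds)
```

In this toy example dropping incomplete rows leaves very little data, which is the main drawback of the first option.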

In [None]:
Q4. Can Gaussian Naive Bayes be used for multi-class classification?
--------------------------------------------

Yes, Gaussian Naive Bayes can be used for multi-class classification tasks.
It models each feature within each class as a Gaussian (normal) distribution, estimating a mean and a variance per feature and class, and uses Bayes' theorem to compute the posterior probability of every class given the features of the instance.
It then selects the class with the highest posterior probability as the predicted class for the instance, which works for any number of classes.
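
As a small illustration, the sketch below fits scikit-learn's `GaussianNB` on the three-class Iris dataset bundled with the library. The choice of dataset is an assumption made only to have more than two classes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has three classes (0, 1, 2) and four continuous features
X, y = load_iris(return_X_y=True)

clf = GaussianNB().fit(X, y)

# One posterior probability per class for each instance
proba = clf.predict_proba(X[:5])
pred = clf.predict(X[:5])

# The predicted class is the one with the highest posterior probability
argmax_pred = clf.classes_[proba.argmax(axis=1)]

print("Classes:", clf.classes_)
print("Posterior probabilities:\n", proba)
print("Predictions:", pred)
```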

In [None]:
'''Q5. Assignment:
Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository
(https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is
to predict whether a message is spam or not based on several input features.
Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.
Results:
Report the following performance metrics for each classifier:
Accuracy
Precision
Recall
F1 score
Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?
----------------------------------------------'''

import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

# Load the Spambase data; the file has no header row
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
data = pd.read_csv(url, header=None)

# Split features and target (the last column is the spam label)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Classifiers with default hyperparameters
classifiers = {
    "Bernoulli Naive Bayes": BernoulliNB(),
    "Multinomial Naive Bayes": MultinomialNB(),
    "Gaussian Naive Bayes": GaussianNB(),
}

# Define scoring metrics
scoring = ['accuracy', 'precision', 'recall', 'f1']

# cross_val_score accepts only a single metric, so use cross_validate,
# which returns a dict with one 'test_<metric>' array (one value per fold)
for name, clf in classifiers.items():
    scores = cross_validate(clf, X, y, cv=10, scoring=scoring)

    # Print the mean of each metric over the 10 folds
    print(name + ":")
    print("Accuracy:", scores['test_accuracy'].mean())
    print("Precision:", scores['test_precision'].mean())
    print("Recall:", scores['test_recall'].mean())
    print("F1 score:", scores['test_f1'].mean())
    print()