Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan,
while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that
he/she uses the health insurance plan?

Ans. Let's denote the event "an employee uses the health insurance plan" as A, and the event "an employee is a smoker" as B. We want to find the conditional probability P(B|A), which is the probability of an employee being a smoker given that he/she uses the health insurance plan.

From the information given in the problem, we know that:

P(A) = 0.7 (70% of employees use the health insurance plan)
P(B|A) = 0.4 (40% of employees who use the plan are smokers)

The statement "40% of the employees who use the plan are smokers" is already conditioned on A, so it is the requested probability and no further calculation is needed. Bayes' theorem,

P(B|A) = P(A|B) * P(B) / P(A)

is not required here, and it could not be applied anyway, because neither P(A|B) nor P(B) is given and neither can be derived without making up a value for P(B|A').

The only additional quantity that follows from the data is the joint probability:

P(A and B) = P(B|A) * P(A)
= 0.4 * 0.7
= 0.28

so 28% of all employees both use the plan and smoke. Dividing this by P(A) recovers P(B|A) = 0.28 / 0.7 = 0.4.

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4, or 40%.
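
As a sanity check, the definition P(B|A) = P(A and B) / P(A) can be evaluated by counting. The sketch below assumes a hypothetical workforce of 1,000 employees (a number not in the problem, chosen only so the percentages give whole counts) and uses exact fractions to avoid rounding noise.

```python
from fractions import Fraction

# Hypothetical headcount, chosen only so that the percentages give whole numbers.
total_employees = 1000

# 70% of all employees use the plan; 40% of those plan users smoke.
insured = total_employees * 70 // 100
insured_smokers = insured * 40 // 100

p_a = Fraction(insured, total_employees)                # P(A)
p_a_and_b = Fraction(insured_smokers, total_employees)  # P(A and B)
p_b_given_a = p_a_and_b / p_a                           # P(B|A)

print("P(A) =", float(p_a))
print("P(A and B) =", float(p_a_and_b))
print("P(B|A) =", float(p_b_given_a))
```

The result does not depend on the assumed headcount, since it cancels in the ratio.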

Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?
Ans. The main difference between Bernoulli Naive Bayes and Multinomial Naive Bayes is the type of data they are best suited for. 
  Bernoulli Naive Bayes is typically used for binary or boolean data, where each feature is either present or absent. On the other hand, 
  Multinomial Naive Bayes is commonly used for discrete data, where the features represent counts or frequencies of events, such as word 
  counts in a document. Both assume the features are conditionally independent given the class; they differ in the likelihood they use. 
  Bernoulli Naive Bayes models each feature as a binary variable and explicitly penalises the absence of a feature, while Multinomial 
  Naive Bayes models the feature counts and effectively ignores features whose count is zero.
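
The difference can be seen on a toy word-count matrix. The sketch below uses made-up counts for three hypothetical words in four documents (none of this data comes from the assignment). It relies on scikit-learn's BernoulliNB thresholding its input at binarize=0.0 by default, so it only sees whether a word occurs, whereas MultinomialNB accumulates the counts themselves.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Made-up word counts: rows are documents, columns are three hypothetical words.
X = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 1, 2],
])
y = np.array([1, 1, 0, 0])  # 1 = spam, 0 = not spam (illustrative labels)

bnb = BernoulliNB().fit(X, y)    # binarizes X at 0.0 by default
mnb = MultinomialNB().fit(X, y)  # uses the raw counts

# Per-class sufficient statistics (rows follow bnb.classes_ / mnb.classes_):
print("Bernoulli: documents containing each word\n", bnb.feature_count_)
print("Multinomial: total occurrences of each word\n", mnb.feature_count_)

# BernoulliNB gives the same answer on the counts and on the 0/1 version.
X_binary = (X > 0).astype(int)
same = np.allclose(bnb.predict_proba(X), bnb.predict_proba(X_binary))
print("Bernoulli ignores count magnitude:", same)
```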

Q3. How does Bernoulli Naive Bayes handle missing values?
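Ans. Bernoulli Naive Bayes has no built-in mechanism for missing values: each feature is modelled as binary, and scikit-learn's 
  BernoulliNB raises an error if the input contains NaN. Missing values therefore have to be handled before fitting. One option is to 
  treat "missing" as information in its own right: add a binary indicator feature that records whether the original value was missing, 
  and fill the original value with a default such as the most frequent value. The probability of the indicator is then estimated from 
  the frequency of missing values in the training set for each class, and it enters the Naive Bayes formula like any other feature. 
  Plain imputation works well if the missing values are not too frequent and are missing completely at random; the indicator helps when 
  missingness itself is related to the class.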
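
One way to give the model an explicit "missing" signal in scikit-learn is sketched below. It assumes a tiny made-up binary dataset with NaN marking missing entries, and combines SimpleImputer (with add_indicator=True) and BernoulliNB in a pipeline. This is only one reasonable preprocessing choice, not something BernoulliNB does on its own.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Made-up binary features; np.nan marks a missing entry.
X = np.array([
    [1, 0],
    [1, np.nan],
    [0, 1],
    [np.nan, 1],
    [0, 0],
    [1, 1],
], dtype=float)
y = np.array([1, 1, 0, 0, 0, 1])

# Fill each missing entry with the column's most frequent value and append
# one 0/1 "was missing" indicator column per feature that had missing data.
imputer = SimpleImputer(strategy="most_frequent", add_indicator=True)
model = make_pipeline(imputer, BernoulliNB())
model.fit(X, y)

X_filled = model.named_steps["simpleimputer"].transform(X)
print("Columns after imputation + indicators:", X_filled.shape[1])

# A new sample with a missing first feature can now be scored.
pred = model.predict(np.array([[np.nan, 1.0]]))
print("Predicted class:", pred[0])
```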

Q4. Can Gaussian Naive Bayes be used for multi-class classification?
Ans. Yes. Gaussian Naive Bayes handles multi-class classification directly, under the assumption that the features are normally
distributed within each class. The model estimates the mean and variance of each feature for each class, together with the class priors, 
and uses the Gaussian probability density function to compute the probability of a new data point belonging to each class; the class 
with the highest posterior probability is predicted. However, if the assumption of normality is strongly violated, the model may not 
perform well. When the features are counts or binary indicators rather than continuous measurements, Multinomial Naive Bayes or 
Bernoulli Naive Bayes may be more appropriate.
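
A minimal multi-class sketch, assuming the three-class Iris dataset bundled with scikit-learn as a stand-in (it is not part of this assignment): GaussianNB learns one mean per feature per class and returns one probability per class for each sample.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris: 150 samples, 4 continuous features, 3 classes.
X, y = load_iris(return_X_y=True)

gnb = GaussianNB().fit(X, y)

# One row of per-feature means for each class.
print("Classes:", gnb.classes_)
print("Per-class feature means, shape:", gnb.theta_.shape)

# Posterior probabilities: one column per class, each row sums to 1.
proba = gnb.predict_proba(X[:5])
print("Posterior shape:", proba.shape)
print("Predicted classes:", gnb.predict(X[:5]))
```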

Q5. Assignment:
Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). 
This dataset contains features extracted from email messages, where the goal is to predict whether a message is spam or not based on several input features.
Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library 
in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default
hyperparameters for each classifier.
Results:
Report the following performance metrics for each classifier:
Accuracy
Precision
Recall
F1 score
Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case?
Are there any limitations of Naive Bayes that you observed?
Conclusion:
Summarise your findings and provide some suggestions for future work.

Ans. To implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers in scikit-learn, we can use the following code:

import pandas as pd
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.model_selection import cross_validate

# Load the Spambase data. Assumption: the file spambase.data from the UCI
# archive has been downloaded into the working directory. It has no header
# row; the last column is the label (1 = spam, 0 = not spam).
data = pd.read_csv("spambase.data", header=None)
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

# Metrics to compute in each of the 10 folds
scoring = ["accuracy", "precision", "recall", "f1"]

# All three classifiers use their default hyperparameters
classifiers = {
    "Bernoulli Naive Bayes": BernoulliNB(),
    "Multinomial Naive Bayes": MultinomialNB(),
    "Gaussian Naive Bayes": GaussianNB(),
}

# Evaluate each classifier with 10-fold cross-validation
for name, clf in classifiers.items():
    scores = cross_validate(clf, X, y, cv=10, scoring=scoring)
    print(name)
    print("Accuracy:", scores["test_accuracy"].mean())
    print("Precision:", scores["test_precision"].mean())
    print("Recall:", scores["test_recall"].mean())
    print("F1 score:", scores["test_f1"].mean())
    print()

The real dataset is used rather than randomly generated data because MultinomialNB requires non-negative features, which the Spambase features satisfy but arbitrary synthetic features generally do not.

After running this code, we will have the average accuracy, precision, recall, and F1 score for each classifier over 10 folds of cross-validation.

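Which variant of Naive Bayes performs best depends on the specific dataset and has to be read from the cross-validation scores above. In general, 
Bernoulli Naive Bayes is often used for text classification tasks such as spam detection, since it is designed for binary input features 
(i.e. whether a word is present or not in a document). The Spambase features are word and character frequencies plus capital-letter run 
lengths, so BernoulliNB with its default binarize=0.0 reduces each one to "present or absent", MultinomialNB treats the non-negative values 
as count-like weights, and GaussianNB assumes each feature is normally distributed within a class, which skewed frequency features with 
many zeros do not satisfy well.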

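One limitation of Naive Bayes is that it assumes the input features are conditionally independent given the class, which rarely holds 
exactly (for example, words in an email tend to co-occur). Additionally, Naive Bayes can struggle with rare events: a feature value 
that never appears with a class in the training data gets an estimated probability of zero, which wipes out the whole product for that 
class unless smoothing is applied.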
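
The rare-event issue is usually handled with additive (Laplace) smoothing, which scikit-learn's MultinomialNB and BernoulliNB expose as the alpha parameter (default 1.0). The sketch below is a hand-rolled illustration of the smoothed estimate (count + alpha) / (total + alpha * n_features); the helper name and the word counts are hypothetical.

```python
def smoothed_likelihoods(counts, alpha):
    """Estimate P(feature | class) from counts with additive smoothing."""
    total = sum(counts)
    n_features = len(counts)
    return [(c + alpha) / (total + alpha * n_features) for c in counts]

# Hypothetical counts of three words within one class; the second word
# never occurred with this class in the training data.
word_counts = [5, 0, 1]

unsmoothed = smoothed_likelihoods(word_counts, alpha=0.0)
smoothed = smoothed_likelihoods(word_counts, alpha=1.0)

print("alpha = 0:", unsmoothed)  # the unseen word gets probability 0
print("alpha = 1:", smoothed)    # every word keeps a non-zero probability
```

With alpha = 0 a single unseen word forces the class likelihood to zero; with alpha = 1 the unseen word is assigned 1/9 here, and the estimates still sum to one.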

In conclusion, the performance of each variant of Naive Bayes should be evaluated on the specific dataset being used. Bernoulli
Naive Bayes is often used for text classification tasks, but there may be other variants that perform better depending on the input 
features. Future work could involve exploring different variants of Naive Bayes or other machine learning algorithms for spam detection, 
as well as potentially using feature engineering or other techniques to address the limitations of Naive Bayes.