In [None]:
Q1. A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
probability that an employee is a smoker given that he/she uses the health insurance plan?
Ans-
  The answer can be read directly from the problem statement. Let S be the event that an employee is a smoker, and H be the event that an employee uses the health insurance plan. The survey gives:

P(H) = 0.70
P(S|H) = 0.40

The phrase "40% of the employees who use the plan are smokers" conditions on the plan users, so it is already P(S|H), not P(H|S). Therefore:

P(S|H) = 0.40

As a check with the definition of conditional probability, the joint probability is P(S and H) = P(S|H) * P(H) = 0.40 * 0.70 = 0.28, and dividing by P(H) recovers the same value:

P(S|H) = P(S and H) / P(H) = 0.28 / 0.70 = 0.40

Bayes' theorem, P(S|H) = P(H|S) * P(S) / P(H), would be needed only if the problem gave P(H|S) and P(S) instead. Neither is given here, and neither is required. The probability that an employee is a smoker given that he/she uses the health insurance plan is 0.40, i.e. 40%.
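
The arithmetic for Q1 can be checked in a few lines of Python. This is only a sanity check of the numbers in the problem statement; the variable names are chosen here for illustration.

```python
# Survey figures from the problem statement.
p_h = 0.70          # P(H): employee uses the health insurance plan
p_s_given_h = 0.40  # P(S|H): employee is a smoker, given that they use the plan

# Joint probability of being a smoker and using the plan.
p_s_and_h = p_s_given_h * p_h

# Recover the conditional from the joint: P(S|H) = P(S and H) / P(H).
p_s_given_h_check = p_s_and_h / p_h

print(f"P(S and H) = {p_s_and_h:.2f}")
print(f"P(S|H)     = {p_s_given_h_check:.2f}")
```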

In [None]:
Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?
Ans-
   The main difference between Bernoulli Naive Bayes and Multinomial Naive Bayes is in the type of data they are designed to handle. Both classifiers are based on the Naive Bayes algorithm and assume that the features are conditionally independent given the class label. However, they differ in the type of data that they are suitable for.

Bernoulli Naive Bayes is used for binary or boolean data, where each feature takes one of two values (e.g. the presence or absence of a word). It models each feature with a Bernoulli distribution whose parameter is the probability that the feature is present given the class label, so the absence of a feature also contributes evidence to the prediction. Bernoulli Naive Bayes is often used in text classification, where the presence or absence of certain words or phrases is used as features.

Multinomial Naive Bayes is used for discrete count data, such as word counts in text classification. It models the feature vector with a multinomial distribution: each class has a probability for every feature (e.g. every word in the vocabulary), and the likelihood depends on how many times each feature occurs. Multinomial Naive Bayes is often used in text classification, spam filtering, and sentiment analysis, where the frequency of occurrence of certain words or phrases is used as features.

In summary, Bernoulli Naive Bayes models whether each feature is present or absent, while Multinomial Naive Bayes models how often each feature occurs. Both classifiers are commonly used in natural language processing tasks, but the choice between them depends on whether the counts themselves carry useful information for the problem being solved.
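
The difference can be illustrated with a small sketch. The word counts below are made up for illustration and are not part of the assignment. It relies on the scikit-learn default `binarize=0.0` of `BernoulliNB`, which maps every value greater than zero to 1 before fitting.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical word counts for four documents over a three-word vocabulary.
X_counts = np.array([[3, 0, 1],
                     [2, 0, 0],
                     [0, 4, 1],
                     [0, 1, 2]])
y = np.array([0, 0, 1, 1])
X_binary = (X_counts > 0).astype(int)  # keep presence/absence only

# BernoulliNB binarizes its input at 0.0 by default, so raw counts and
# presence/absence indicators lead to the same fitted model.
bnb_counts = BernoulliNB().fit(X_counts, y)
bnb_binary = BernoulliNB().fit(X_binary, y)
same_bernoulli = np.allclose(bnb_counts.feature_log_prob_,
                             bnb_binary.feature_log_prob_)

# MultinomialNB uses the counts themselves, so discarding them changes the model.
mnb_counts = MultinomialNB().fit(X_counts, y)
mnb_binary = MultinomialNB().fit(X_binary, y)
same_multinomial = np.allclose(mnb_counts.feature_log_prob_,
                               mnb_binary.feature_log_prob_)

print("BernoulliNB unchanged by binarizing the counts:", same_bernoulli)
print("MultinomialNB unchanged by binarizing the counts:", same_multinomial)
```

In other words, feeding counts to Bernoulli Naive Bayes silently throws the frequency information away, which is the practical reason to prefer the multinomial variant when frequencies matter.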

In [None]:
Q3. How does Bernoulli Naive Bayes handle missing values?
Ans-
   Bernoulli Naive Bayes has no built-in mechanism for missing values. It assumes that each feature is binary or boolean, i.e. it can take on only two values: 0 or 1. The likelihood of each feature given the class label is estimated from the number of times the feature is present (value = 1) and the number of times it is absent (value = 0) in the training data. There is no third "missing" state in the model.

In practice, missing values must therefore be dealt with before the classifier is used. scikit-learn's BernoulliNB, for example, raises an error when the input contains NaN. The usual options are to remove the instances (or features) that have missing values, to impute the missing entries (e.g. with the most frequent value of the feature), or to encode missingness explicitly by adding a binary indicator feature (1 = the value was missing), which keeps all features binary.

From a probabilistic point of view, a Naive Bayes model could also skip a missing feature in a test instance and multiply only the likelihoods of the observed features, because the features are assumed to be conditionally independent given the class label. This is a property of the model, not something standard implementations do automatically.

Overall, Bernoulli Naive Bayes does not handle missing values by itself: they have to be removed, imputed, or encoded as an additional binary feature before training and prediction.
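
A minimal sketch of the imputation route, assuming scikit-learn and a tiny made-up dataset: fitting `BernoulliNB` directly on data with NaN fails input validation, while a pipeline that first fills each missing entry with the most frequent value of its column works. The choice of `most_frequent` is one reasonable option, not the only one.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary features with some entries missing (NaN).
X = np.array([[1.0, 0.0, np.nan],
              [1.0, np.nan, 0.0],
              [0.0, 1.0, 1.0],
              [np.nan, 1.0, 1.0]])
y = np.array([0, 0, 1, 1])

# Fitting directly on data that contains NaN is rejected by input validation.
try:
    BernoulliNB().fit(X, y)
    raised = False
except ValueError:
    raised = True
print("Direct fit on NaN data raised ValueError:", raised)

# Impute each missing entry with the column's most frequent value, then classify.
model = make_pipeline(SimpleImputer(strategy="most_frequent"), BernoulliNB())
model.fit(X, y)
predictions = model.predict(X)
print("Predictions after imputation:", predictions)
```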

In [None]:
Q4. Can Gaussian Naive Bayes be used for multi-class classification?
Ans-
   Yes, Gaussian Naive Bayes can be used for multi-class classification problems. In multi-class classification, the goal is to classify instances into one of several possible classes. Gaussian Naive Bayes is a type of Naive Bayes algorithm that assumes that the features are continuous and follow a Gaussian (normal) distribution. It can be used for both binary and multi-class classification problems.

To use Gaussian Naive Bayes for multi-class classification, the algorithm fits a separate Gaussian distribution for each feature within each class (a mean and a variance estimated from the training instances of that class) and uses it to calculate the likelihood of each feature value given each class. The prior probabilities of each class are also estimated from the training data. When classifying a new instance, the algorithm calculates the posterior probability of each class given the feature values using Bayes' theorem and chooses the class with the highest probability as the predicted class.

In summary, Gaussian Naive Bayes can be used for both binary and multi-class classification problems, and is a useful algorithm when the features are continuous and follow a Gaussian distribution.
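
As an illustration, the sketch below fits scikit-learn's `GaussianNB` on the three-class Iris dataset. Iris is used here only as a convenient built-in example and is not part of the original question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has 3 classes and 4 continuous features.
X, y = load_iris(return_X_y=True)
gnb = GaussianNB().fit(X, y)

# One mean per (class, feature) pair, and one prior per class.
print("Classes:", gnb.classes_)
print("Shape of per-class feature means:", gnb.theta_.shape)
print("Class priors:", gnb.class_prior_)

# Posterior probability of every class for the first five instances;
# the predicted class is the one with the highest posterior.
proba = gnb.predict_proba(X[:5])
pred = gnb.predict(X[:5])
print("Posteriors:\n", proba.round(3))
print("Predicted classes:", pred)
```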

In [None]:
Q5. Assignment:
Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase).
This dataset contains email messages, where the goal is to predict whether a message
is spam or not based on several input features.
Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.
Results:
Report the following performance metrics for each classifier:
Accuracy
Precision
Recall
F1 score
Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?
Conclusion:
Summarise your findings and provide some suggestions for future work.

Note: This dataset contains a binary classification problem with multiple features. The dataset is
relatively small, but it can be used to demonstrate the performance of the different variants of Naive
Bayes on a real-world problem.

Ans-

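First, we load the dataset and split it into training and test sets:

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Fetch Spambase from OpenML (requires an internet connection on first use)
data = fetch_openml(name='spambase')
X = data.data    # feature matrix
y = data.target  # class labels (spam vs. not spam)

# Hold out 20% of the data; the cross-validation further down uses the full X and y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)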

Next, we can create instances of the Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers with their default hyperparameters and fit them to the training data:

from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

# Bernoulli Naive Bayes
bnb = BernoulliNB()
bnb.fit(X_train, y_train)

# Multinomial Naive Bayes
mnb = MultinomialNB()
mnb.fit(X_train, y_train)

# Gaussian Naive Bayes
gnb = GaussianNB()
gnb.fit(X_train, y_train)

To evaluate the performance of each classifier using 10-fold cross-validation, we can use the cross_validate function from scikit-learn, which computes several metrics in one pass (cross_val_score returns only a single metric):

import numpy as np
from sklearn.model_selection import cross_validate

# Encode the labels as integers so that 1 (spam) is the positive class for
# precision, recall, and F1 (assumes the labels are '0'/'1' strings or 0/1 numbers)
y_bin = np.asarray(y).astype(int)

scoring = ["accuracy", "precision", "recall", "f1"]
classifiers = {
    "Bernoulli Naive Bayes": bnb,
    "Multinomial Naive Bayes": mnb,
    "Gaussian Naive Bayes": gnb,
}

for name, clf in classifiers.items():
    # cross_validate clones the estimator and refits it on each of the 10 folds
    scores = cross_validate(clf, X, y_bin, cv=10, scoring=scoring)
    print(name + ":")
    print("Accuracy:", scores["test_accuracy"].mean())
    print("Precision:", scores["test_precision"].mean())
    print("Recall:", scores["test_recall"].mean())
    print("F1 score:", scores["test_f1"].mean())
    print()

This prints the mean accuracy, precision, recall, and F1 score over the 10 folds for each classifier. Once you have these metrics, you can discuss the results and draw conclusions about the performance of each variant of Naive Bayes, as well as any limitations you observed.

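Note that preprocessing choices interact with the variant used: MultinomialNB requires non-negative feature values, so a scaling step that produces negative values (e.g. standardization) cannot be applied before it, while BernoulliNB binarizes every feature at a threshold (the binarize parameter, 0.0 by default). The assignment asks for the default hyperparameters, but as future work you could tune them (e.g. alpha, binarize, var_smoothing) to try to obtain better performance. You can also use other evaluation techniques (e.g. confusion matrix, ROC curve) to gain more insights into the performance of the classifiers.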
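
The additional evaluation techniques mentioned above can be sketched as follows. To keep the example self-contained it uses synthetic data from `make_classification` as a stand-in for Spambase (an assumption made purely for illustration) and `GaussianNB` as the classifier; `cross_val_predict` collects out-of-fold predictions so that the confusion matrix and ROC AUC cover every sample exactly once.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification data standing in for the real dataset.
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=42)

model = GaussianNB()

# Out-of-fold class predictions and positive-class probabilities (10 folds).
y_pred = cross_val_predict(model, X_demo, y_demo, cv=10)
y_score = cross_val_predict(model, X_demo, y_demo, cv=10,
                            method="predict_proba")[:, 1]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_demo, y_pred)
auc = roc_auc_score(y_demo, y_score)

print("Confusion matrix:\n", cm)
print("ROC AUC:", round(auc, 3))
```

Replacing `X_demo` and `y_demo` with the Spambase features and integer labels, and `GaussianNB` with any of the three variants, would give the corresponding results for the assignment.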
    