In [None]:
Q1. A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
probability that an employee is a smoker given that he/she uses the health insurance plan?
ans:
The statement "40% of the employees who use the plan are smokers" is already a conditional probability: it describes the share of smokers among plan users. No further calculation is needed:

P(smoker | uses health insurance plan) = 0.4

The 70% figure is the probability of using the plan:

P(uses health insurance plan) = 0.7

It is not needed to answer the question, but it does give the joint probability that an employee both uses the plan and smokes:

P(smoker and uses health insurance plan) = P(smoker | uses health insurance plan) * P(uses health insurance plan) = 0.4 * 0.7 = 0.28

Bayes' theorem, P(smoker | uses health insurance plan) = P(uses health insurance plan | smoker) * P(smoker) / P(uses health insurance plan), would only be required if the problem had given P(uses health insurance plan | smoker) instead. It does not, and P(smoker) is not given either.

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4 (40%).
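
The arithmetic can be checked with a few lines of Python. This is a minimal sketch: the probabilities come from the question, while the headcount of 1000 employees is an assumed figure used only as a sanity check.

```python
# Values stated in the question
p_plan = 0.70                # P(uses plan)
p_smoker_given_plan = 0.40   # P(smoker | uses plan), stated directly in the question

# The requested conditional probability is the stated figure itself
answer = p_smoker_given_plan

# Joint probability via the product rule: P(A and B) = P(A | B) * P(B)
p_smoker_and_plan = p_smoker_given_plan * p_plan

# Sanity check on a hypothetical company of 1000 employees (assumed size)
n_employees = 1000
n_plan = round(n_employees * p_plan)                    # plan users
n_plan_smokers = round(n_plan * p_smoker_given_plan)    # smokers among plan users
check = n_plan_smokers / n_plan

print(answer, round(p_smoker_and_plan, 2), check)
```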

In [None]:
Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?
ans:
Bernoulli Naive Bayes and Multinomial Naive Bayes are two variants of the Naive Bayes algorithm used in machine learning for classification problems. The key difference between
them lies in the type of data they can handle.

Bernoulli Naive Bayes is used when the features (or variables) are binary (i.e., they can take only two values, typically 0 and 1). For example, in a spam email classification problem, the presence or absence of a particular word in an email can be represented as a binary feature. For each class (e.g., spam or not spam), the Bernoulli Naive Bayes algorithm models the probability that each feature is present, explicitly penalizes features that are absent, and calculates the posterior probability of the class given the feature values.

On the other hand, Multinomial Naive Bayes is used when the features represent discrete counts or frequencies, such as word counts in text data. In this case, the Multinomial 
Naive Bayes algorithm models the likelihood of each feature (word) occurring in each class and calculates the posterior probability of the class given the frequency of the 
features.

In summary, Bernoulli Naive Bayes is used for binary data, while Multinomial Naive Bayes is used for count data. However, both algorithms assume that the features are 
independent of each other, given the class label (hence the "Naive" in the name).
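
The difference shows up in how each model scores one document. The sketch below is illustrative only: the three-word vocabulary and the per-class probabilities are assumed values, and the two helper functions are hypothetical names rather than library calls.

```python
import math

# Hypothetical parameters of one class for a 3-word vocabulary (assumed values)
p_present = [0.8, 0.1, 0.5]   # Bernoulli: P(word i appears at least once | class)
p_word = [0.6, 0.1, 0.3]      # Multinomial: P(a token is word i | class), sums to 1

counts = [3, 0, 1]            # word counts in one document


def bernoulli_log_likelihood(counts, p_present):
    """Uses only presence/absence; an absent word contributes log(1 - p)."""
    total = 0.0
    for c, p in zip(counts, p_present):
        x = 1 if c > 0 else 0
        total += x * math.log(p) + (1 - x) * math.log(1 - p)
    return total


def multinomial_log_likelihood(counts, p_word):
    """Uses the counts; an absent word contributes nothing (0 * log p = 0).
    The multinomial coefficient is omitted because it is identical for every class."""
    return sum(c * math.log(p) for c, p in zip(counts, p_word))


print(bernoulli_log_likelihood(counts, p_present))
print(multinomial_log_likelihood(counts, p_word))
```

Changing `counts` to `[1, 0, 1]` leaves the Bernoulli score unchanged but alters the multinomial score, which is exactly the binary-versus-count distinction described above.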

In [None]:
Q3. How does Bernoulli Naive Bayes handle missing values?
ans:
The Bernoulli Naive Bayes algorithm assumes that the features (variables) are binary, taking values of 0 or 1. It has no built-in mechanism for missing values, so they must be handled before training: they can be imputed (replaced with an estimated value such as the most frequent value) or encoded as a separate "missing" indicator feature. scikit-learn's BernoulliNB, for instance, raises an error on NaN input rather than silently filling it in. Because the presence or absence of a feature (i.e., whether it has a value of 1 or 0) is the only information the model uses, a common shortcut is to treat a missing value as an absent feature (i.e., assign it a value of 0).

For example, suppose we have a dataset with binary features representing whether a person exercises daily (1) or not (0), whether they drink coffee (1) or not (0), and whether they smoke (1) or not (0). If a value in the exercise column is missing and is filled with 0, the model treats that person as not exercising. The Bernoulli Naive Bayes algorithm then estimates the probability of each feature being present in each class, given the training data, and calculates the posterior probability of each class given the feature values, including the missing values that were replaced by 0.

However, it is important to note that imputing missing values in this way can potentially introduce bias into the model, especially if there is a high proportion of missing 
values or if the missing values are not missing at random. In such cases, it may be better to consider more advanced imputation methods or explore alternative models that can
handle missing values more effectively.
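
As a rough illustration, the sketch below shows three simple ways to prepare a binary column with missing entries before it is passed to a Bernoulli Naive Bayes model. The data and the helper names are hypothetical, and missing values are assumed to be marked as `None`.

```python
from collections import Counter

# Toy binary column with missing entries marked as None (hypothetical data)
exercise = [1, 0, None, 1, None, 1]


def impute_zero(column):
    """Treat a missing value as an absent feature."""
    return [0 if v is None else v for v in column]


def impute_most_frequent(column):
    """Replace a missing value with the most common observed value."""
    observed = [v for v in column if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


def missing_indicator(column):
    """Extra binary feature that records where a value was missing."""
    return [1 if v is None else 0 for v in column]


print(impute_zero(exercise))
print(impute_most_frequent(exercise))
print(missing_indicator(exercise))
```

Keeping the indicator column alongside an imputed column lets the model learn from the fact that a value was missing, which can reduce the bias introduced by imputation when values are not missing at random.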

In [None]:
Q4. Can Gaussian Naive Bayes be used for multi-class classification?
ans:
Yes, Gaussian Naive Bayes can be used for multi-class classification problems. In this case, the algorithm assumes that the feature values in each class are normally distributed
(Gaussian distribution) and estimates the mean and variance of each feature for each class. Given a new set of feature values, the algorithm calculates the likelihood of each 
class based on the estimated mean and variance, and applies Bayes' theorem to calculate the posterior probability of each class given the feature values. The class with the 
highest posterior probability is then chosen as the predicted class for the new data point.

Because the algorithm fits one set of Gaussian parameters per class and compares the posteriors directly, Gaussian Naive Bayes is inherently multi-class and needs no special decomposition strategy. Generic strategies can still be wrapped around it, although this is rarely necessary. One is the one-vs-all (OvA) strategy, where a separate binary Gaussian Naive Bayes classifier is trained for each class, treating it as the positive class and the other classes as the negative class. For example, if there are three classes (A, B, and C), the OvA strategy would train three binary classifiers: A vs not A, B vs not B, and C vs not C. The predicted class for a new data point is then the one with the highest posterior probability among the three binary classifiers.

Another is the one-vs-one (OvO) strategy, where a separate binary Gaussian Naive Bayes classifier is trained for each pair of classes. For example, if there are three classes (A, B, and C), the OvO strategy would train three binary classifiers: A vs B, A vs C, and B vs C. The predicted class for a new data point is then the one that wins the most of these pairwise comparisons.
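
The direct multi-class procedure (per-class mean and variance, then the class with the highest posterior) can be sketched from scratch in plain Python. This is an illustrative sketch, not scikit-learn's implementation: the function names, the toy data, and the small variance floor of 1e-9 are all assumptions.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior and the per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant keeps the variance strictly positive (assumed value)
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def log_posterior(model, x):
    """Unnormalised log posterior per class: log prior + sum of Gaussian log densities."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for xi, m, v in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        scores[label] = score
    return scores


def predict(model, x):
    """Pick the class with the highest posterior; works for any number of classes."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)


# Hypothetical three-class toy data with two features
X = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
     [5.0, 5.2], [5.1, 4.9], [4.9, 5.0],
     [9.0, 1.0], [9.2, 1.1], [8.9, 0.9]]
y = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

model = fit_gaussian_nb(X, y)
print(predict(model, [1.0, 1.0]), predict(model, [5.0, 5.0]), predict(model, [9.0, 1.0]))
```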

In [5]:
# Q5. Assignment:
# Data preparation:
# Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/
# datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message
# is spam or not based on several input features.
# Implementation:
# Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
# scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
# dataset. You should use the default hyperparameters for each classifier.
# Results:
# Report the following performance metrics for each classifier:
# Accuracy
# Precision
# Recall
# F1 score
# Discussion:
# Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
# the case? Are there any limitations of Naive Bayes that you observed?
# Conclusion:
# Summarise your findings and provide some suggestions for future work.

# Note: Create your assignment in Jupyter notebook and upload it to GitHub & share that github repository
# link through your dashboard. Make sure the repository is public.
# Note: This dataset contains a binary classification problem with multiple features. The dataset is
# relatively small, but it can be used to demonstrate the performance of the different variants of Naive
# Bayes on a real-world problem

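import pandas as pd
from sklearn.model_selection import train_test_split

# Load the Spambase data. The file has no header row, so pandas takes the first
# record as column names (pass header=None to keep that record as data).
df = pd.read_csv('spambase.csv')
df.head()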



Unnamed: 0,0,0.64,0.64.1,0.1,0.32,0.2,0.3,0.4,0.5,0.6,...,0.41,0.42,0.43,0.778,0.44,0.45,3.756,61,278,1
0,0.21,0.28,0.5,0.0,0.14,0.28,0.21,0.07,0.0,0.94,...,0.0,0.132,0.0,0.372,0.18,0.048,5.114,101,1028,1
1,0.06,0.0,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.01,0.143,0.0,0.276,0.184,0.01,9.821,485,2259,1
2,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.137,0.0,0.137,0.0,0.0,3.537,40,191,1
3,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.135,0.0,0.135,0.0,0.0,3.537,40,191,1
4,0.0,0.0,0.0,0.0,1.85,0.0,0.0,1.85,0.0,0.0,...,0.0,0.223,0.0,0.0,0.0,0.0,3.0,15,54,1


In [6]:
# Features are every column except the last; the last column is the label (1 = spam)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
X, y

(         0  0.64  0.64.1  0.1  0.32   0.2   0.3   0.4   0.5   0.6  ...  0.40  \
 0     0.21  0.28    0.50  0.0  0.14  0.28  0.21  0.07  0.00  0.94  ...   0.0   
 1     0.06  0.00    0.71  0.0  1.23  0.19  0.19  0.12  0.64  0.25  ...   0.0   
 2     0.00  0.00    0.00  0.0  0.63  0.00  0.31  0.63  0.31  0.63  ...   0.0   
 3     0.00  0.00    0.00  0.0  0.63  0.00  0.31  0.63  0.31  0.63  ...   0.0   
 4     0.00  0.00    0.00  0.0  1.85  0.00  0.00  1.85  0.00  0.00  ...   0.0   
 ...    ...   ...     ...  ...   ...   ...   ...   ...   ...   ...  ...   ...   
 4595  0.31  0.00    0.62  0.0  0.00  0.31  0.00  0.00  0.00  0.00  ...   0.0   
 4596  0.00  0.00    0.00  0.0  0.00  0.00  0.00  0.00  0.00  0.00  ...   0.0   
 4597  0.30  0.00    0.30  0.0  0.00  0.00  0.00  0.00  0.00  0.00  ...   0.0   
 4598  0.96  0.00    0.00  0.0  0.32  0.00  0.00  0.00  0.00  0.00  ...   0.0   
 4599  0.00  0.00    0.65  0.0  0.00  0.00  0.00  0.00  0.00  0.00  ...   0.0   
 
        0.41   0.42  0.43 

In [10]:
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
X_train,y_train

(         0   0.64  0.64.1  0.1  0.32   0.2   0.3   0.4   0.5   0.6  ...  0.40  \
 1898  0.00   0.00    0.00  0.0  0.00  0.00  0.00  0.00  0.00  0.00  ...   0.0   
 1370  0.00   0.00    0.00  0.0  0.00  0.00  0.00  1.11  0.00  0.00  ...   0.0   
 3038  0.00   0.00    0.00  0.0  0.00  0.00  0.00  0.00  0.00  0.00  ...   0.0   
 2361  0.00   0.00    0.00  0.0  0.00  0.00  0.00  0.00  0.00  0.00  ...   0.0   
 156   0.00   0.00    0.00  0.0  1.36  0.45  0.45  0.00  0.00  0.00  ...   0.0   
 ...    ...    ...     ...  ...   ...   ...   ...   ...   ...   ...  ...   ...   
 4426  0.00   0.00    0.00  0.0  0.00  0.00  0.00  0.00  0.00  0.00  ...   0.0   
 466   0.09   0.18    0.36  0.0  0.09  0.00  0.09  0.00  0.55  0.27  ...   0.0   
 3092  0.00  14.28    0.00  0.0  0.00  0.00  0.00  0.00  0.00  0.00  ...   0.0   
 3772  0.00   0.00    0.00  0.0  1.23  0.00  0.00  0.00  0.00  0.00  ...   0.0   
 860   0.14   0.00    0.28  0.0  0.09  0.24  0.04  0.04  0.24  0.00  ...   0.0   
 
       0.41   

In [11]:
from sklearn.naive_bayes import GaussianNB
gnb=GaussianNB()

In [12]:
gnb.fit(X_train,y_train)

In [13]:
y_pred= gnb.predict(X_test)

In [14]:
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix

In [15]:
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test,y_pred))

[[380 150]
 [ 20 370]]
              precision    recall  f1-score   support

           0       0.95      0.72      0.82       530
           1       0.71      0.95      0.81       390

    accuracy                           0.82       920
   macro avg       0.83      0.83      0.82       920
weighted avg       0.85      0.82      0.82       920

0.8152173913043478
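
The figures in the report for class 1 (spam) follow directly from the confusion matrix. A minimal sketch of the arithmetic, assuming scikit-learn's layout in which rows are true classes and columns are predicted classes:

```python
# Gaussian Naive Bayes confusion matrix reported above
tn, fp = 380, 150   # true class 0: predicted 0, predicted 1
fn, tp = 20, 370    # true class 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted spam that really is spam
recall = tp / (tp + fn)      # share of real spam that was caught
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4), round(precision, 2), round(recall, 2), round(f1, 2))
```

The low precision (many legitimate messages flagged as spam) is what pulls the Gaussian model's accuracy down, even though its recall is high.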


# Bernoulli Naive Bayes

In [21]:
from sklearn.naive_bayes import BernoulliNB
bnb=BernoulliNB()
bnb.fit(X_train,y_train)

In [22]:
y_pred_bnb= bnb.predict(X_test)

In [23]:
print(confusion_matrix(y_test,y_pred_bnb))
print(classification_report(y_test,y_pred_bnb))
print(accuracy_score(y_test,y_pred_bnb))

[[493  37]
 [ 80 310]]
              precision    recall  f1-score   support

           0       0.86      0.93      0.89       530
           1       0.89      0.79      0.84       390

    accuracy                           0.87       920
   macro avg       0.88      0.86      0.87       920
weighted avg       0.87      0.87      0.87       920

0.8728260869565218


# Multinomial Naive Bayes

In [24]:
from sklearn.naive_bayes import MultinomialNB
mnb=MultinomialNB()

In [25]:
mnb.fit(X_train,y_train)

In [26]:
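# Predict with the multinomial model. The recorded output for this cell came from
# bnb.predict, so it matches the Bernoulli numbers; re-run to get the multinomial metrics.
y_pred_mnb = mnb.predict(X_test)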

In [27]:
print(confusion_matrix(y_test,y_pred_mnb))
print(classification_report(y_test,y_pred_mnb))
print(accuracy_score(y_test,y_pred_mnb))

[[493  37]
 [ 80 310]]
              precision    recall  f1-score   support

           0       0.86      0.93      0.89       530
           1       0.89      0.79      0.84       390

    accuracy                           0.87       920
   macro avg       0.88      0.86      0.87       920
weighted avg       0.87      0.87      0.87       920

0.8728260869565218


# Bernoulli Naive Bayes reaches a higher test accuracy than Gaussian Naive Bayes (0.87 vs 0.82), so we can use Bernoulli Naive Bayes
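
The assignment asks for 10-fold cross-validation with default hyperparameters, whereas the cells above use a single train/test split. A possible sketch of the cross-validated comparison follows. `evaluate_naive_bayes` is a hypothetical helper, and the synthetic Poisson data is only a stand-in so the block runs without `spambase.csv`; on the real data, pass the `X` and `y` built earlier.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB


def evaluate_naive_bayes(X, y, folds=10):
    """Mean accuracy, precision, recall and F1 per model over `folds`-fold CV."""
    models = {
        "Bernoulli": BernoulliNB(),
        "Multinomial": MultinomialNB(),
        "Gaussian": GaussianNB(),
    }
    scoring = ["accuracy", "precision", "recall", "f1"]
    results = {}
    for name, model in models.items():
        cv = cross_validate(model, X, y, cv=folds, scoring=scoring)
        results[name] = {m: cv[f"test_{m}"].mean() for m in scoring}
    return results


# Stand-in data (assumption): non-negative counts, as MultinomialNB requires.
# Class 1 has higher rates on the first five features, class 0 on the last five.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=300)
rates_class0 = np.array([0.5] * 5 + [2.0] * 5)
rates_class1 = np.array([2.0] * 5 + [0.5] * 5)
X_demo = rng.poisson(np.where(y_demo[:, None] == 1, rates_class1, rates_class0))

results = evaluate_naive_bayes(X_demo, y_demo)
for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Averaging over ten folds gives a more stable estimate than one split, so the ranking of the three variants on Spambase should be read from these means rather than from a single 80/20 partition.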