# Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

                        +-----------+
                        | Employees |
                        +-----+-----+
                              |
                 70%          |          30%
              +---------------+---------------+
              |                               |
        +-----+-----+                 +-------+-------+
        | Plan user |                 | Plan non-user |
        +-----+-----+                 +---------------+
              |                     (smoking rate not given)
       40%    |    60%
        +-----+------+
        |            |
   +----+---+  +-----+------+
   | Smoker |  | Non-smoker |
   +--------+  +------------+
                    
The question asks for P(Smoker | Health Plan), and the survey states this value directly: 40% of the employees who use the plan are smokers. Bayes' theorem is not needed here. Applying it would require P(Health Plan | Smoker) and P(Smoker), and neither is given.

The definition of conditional probability confirms the result:

P(Smoker | Health Plan) = P(Smoker and Health Plan) / P(Health Plan)

where P(Health Plan) = 0.7 is the overall probability that an employee uses the health insurance plan, and P(Smoker and Health Plan) = 0.4 * 0.7 = 0.28 is the fraction of all employees who both use the plan and smoke.

Plugging in the values:

P(Smoker | Health Plan) = 0.28 / 0.7 = 0.4 or 40%

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 40%.
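
As a quick numerical check, the definition of conditional probability can be evaluated directly. This is a minimal sketch; the variable names are illustrative and not part of the original question.

```python
# Values stated in the survey
p_plan = 0.70                # P(Health Plan)
p_smoker_given_plan = 0.40   # P(Smoker | Health Plan), stated directly

# Joint probability: fraction of all employees who use the plan and smoke
p_smoker_and_plan = p_smoker_given_plan * p_plan

# The definition of conditional probability recovers the stated value
p_check = p_smoker_and_plan / p_plan

print(f"P(Smoker and Health Plan) = {p_smoker_and_plan:.2f}")
print(f"P(Smoker | Health Plan)   = {p_check:.2f}")
```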

# Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

The main difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in the type of data they are suited for.

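Bernoulli Naive Bayes is used when the input data is in the form of binary (0/1) variables. It assumes that each feature is binary and conditionally independent of the other features given the class. Because every feature is modelled as present or absent, the absence of a feature also counts as evidence. It is commonly used in text classification tasks, where the presence or absence of a word in a document is used as a feature.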

Multinomial Naive Bayes, on the other hand, is used when the input data is in the form of a count or frequency of occurrences of discrete features. It is commonly used in text classification tasks, where the frequency of occurrence of a word in a document is used as a feature.

In summary, Bernoulli Naive Bayes assumes binary features, while Multinomial Naive Bayes assumes count or frequency features.
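
The difference shows up in how each model scores a single document. The sketch below is a hand-written illustration, not scikit-learn's implementation; the vocabulary size, the counts, and the probabilities `p_present` and `theta` are all assumed values chosen for the example.

```python
import math

def bernoulli_log_likelihood(x, p):
    """Log-likelihood of a binary feature vector x given per-feature
    presence probabilities p. Absent features contribute log(1 - p)."""
    return sum(
        math.log(pi) if xi else math.log(1.0 - pi)
        for xi, pi in zip(x, p)
    )

def multinomial_log_likelihood(counts, theta):
    """Log-likelihood of a count vector given per-feature event
    probabilities theta. The multinomial coefficient is omitted because
    it is the same for every class. Zero counts contribute nothing."""
    return sum(c * math.log(t) for c, t in zip(counts, theta))

# Hypothetical document over a three-word vocabulary
counts = [3, 0, 1]                            # word frequencies
binary = [1 if c > 0 else 0 for c in counts]  # presence/absence

p_present = [0.8, 0.6, 0.3]   # assumed P(word present | class)
theta = [0.5, 0.3, 0.2]       # assumed P(word | class), sums to 1

bernoulli_ll = bernoulli_log_likelihood(binary, p_present)
multinomial_ll = multinomial_log_likelihood(counts, theta)
print(bernoulli_ll, multinomial_ll)
```

In the Bernoulli score the second word, which does not appear, still contributes `log(1 - 0.6)`. In the multinomial score it contributes nothing, while the first word counts three times.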

# Q3. How does Bernoulli Naive Bayes handle missing values?

Bernoulli Naive Bayes has no built-in mechanism for missing values. The model expects every feature to be observed as 0 or 1, and scikit-learn's BernoulliNB, like MultinomialNB, raises an error if the input contains NaN. Missing entries must therefore be handled before training. A common choice is to impute them as 0, which treats a missing value as a non-occurrence of the corresponding feature. This is a modelling assumption made by the user, not something the algorithm does on its own, and it matters because Bernoulli Naive Bayes counts the absence of a feature as evidence. Alternatives include imputing the most frequent value or, in a hand-written implementation, leaving the missing feature's term out of the likelihood product, which the independence assumption makes straightforward.
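
One way to make the treatment of missing values explicit in scikit-learn is to impute them before the classifier sees the data. The sketch below uses hypothetical toy data and assumes that coding a missing entry as 0 (feature absent) is acceptable for the problem at hand.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary data with two missing entries (np.nan)
X_toy = np.array([
    [1.0, 0.0, 1.0],
    [1.0, np.nan, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, np.nan],
])
y_toy = np.array([1, 1, 0, 0])

# Fitting directly on data that contains NaN is rejected
try:
    BernoulliNB().fit(X_toy, y_toy)
    nan_rejected = False
except ValueError:
    nan_rejected = True
print("NaN rejected:", nan_rejected)

# Explicit choice: treat a missing value as "feature absent" (0)
model = make_pipeline(
    SimpleImputer(strategy="constant", fill_value=0.0),
    BernoulliNB(),
)
model.fit(X_toy, y_toy)

# The same imputation is applied to new data at prediction time
pred = model.predict(np.array([[1.0, np.nan, 1.0]]))
print(pred)
```

Putting the imputer and the classifier in one pipeline keeps the imputation rule identical at training and prediction time.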

# Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification. It is a variant of Naive Bayes that assumes each feature follows a Gaussian (normal) distribution within each class. It works well with continuous data and **can be used for multi-class classification by using the maximum a posteriori (MAP) rule to determine the most likely class given the input features.** In the case of multi-class classification, the model estimates a mean and a variance for every feature within each class, and **the class with the highest posterior probability is selected as the output class.**
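
A short sketch of the MAP rule on a three-class problem follows. It uses the Iris dataset bundled with scikit-learn purely as a convenient example; that dataset is not part of this assignment.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X_iris, y_iris = load_iris(return_X_y=True)  # 3 classes, 4 continuous features

gnb_iris = GaussianNB().fit(X_iris, y_iris)

# One estimated mean per (class, feature) pair
print(gnb_iris.theta_.shape)

# MAP rule: pick the class with the highest posterior probability
posteriors = gnb_iris.predict_proba(X_iris[:5])
map_classes = gnb_iris.classes_[np.argmax(posteriors, axis=1)]

# predict() returns the same classes as the manual argmax
print(map_classes, gnb_iris.predict(X_iris[:5]))
```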

# Q5. Assignment:

## Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset describes 4,601 email messages using 57 numeric features (word and character frequencies and capital-letter run lengths) plus a class label. The goal is to predict whether a message is spam or not based on these input features.

## Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.

## Results:
Report the following performance metrics for each classifier:

* Accuracy
* Precision
* Recall
* F1 score

## Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?

## Conclusion:
Summarise your findings and provide some suggestions for future work.

#### Note: 
This dataset contains a binary classification problem with multiple features. The dataset is relatively small, but it can be used to demonstrate the performance of the different variants of Naive Bayes on a real-world problem.

In [1]:
import pandas as pd

In [3]:
import urllib.request

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data'
filename = 'spambase.csv'

urllib.request.urlretrieve(url, filename)


('spambase.csv', <http.client.HTTPMessage at 0x7ff1415b6560>)

In [5]:
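# spambase.data has no header row, so this call uses the first record as column
# names and loads 4,600 of the 4,601 rows; pass header=None to keep every row.
df = pd.read_csv('spambase.csv')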

In [8]:
df.head()

Unnamed: 0,0,0.64,0.64.1,0.1,0.32,0.2,0.3,0.4,0.5,0.6,...,0.41,0.42,0.43,0.778,0.44,0.45,3.756,61,278,1
0,0.21,0.28,0.5,0.0,0.14,0.28,0.21,0.07,0.0,0.94,...,0.0,0.132,0.0,0.372,0.18,0.048,5.114,101,1028,1
1,0.06,0.0,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.01,0.143,0.0,0.276,0.184,0.01,9.821,485,2259,1
2,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.137,0.0,0.137,0.0,0.0,3.537,40,191,1
3,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.135,0.0,0.135,0.0,0.0,3.537,40,191,1
4,0.0,0.0,0.0,0.0,1.85,0.0,0.0,1.85,0.0,0.0,...,0.0,0.223,0.0,0.0,0.0,0.0,3.0,15,54,1


In [14]:
X = df.iloc[:,:-1]
y = df.iloc[:,-1]

In [15]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

In [30]:
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import MultinomialNB
from sklearn.naive_bayes import GaussianNB

from sklearn.model_selection import cross_val_score

In [18]:
bnb=BernoulliNB()
mnb=MultinomialNB()
gnb=GaussianNB()

# BernoulliNB

In [19]:
bnb.fit(X_train,y_train)

In [20]:
y_pred=bnb.predict(X_test)

In [21]:
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix

In [22]:
# scikit-learn metrics expect (y_true, y_pred), in that order
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[770  52]
 [114 444]]
0.8797101449275362
              precision    recall  f1-score   support

           0       0.87      0.94      0.90       822
           1       0.90      0.80      0.84       558

    accuracy                           0.88      1380
   macro avg       0.88      0.87      0.87      1380
weighted avg       0.88      0.88      0.88      1380



# MultinomialNB

In [23]:
mnb.fit(X_train,y_train)

In [24]:
y_pred=mnb.predict(X_test)

In [25]:
# scikit-learn metrics expect (y_true, y_pred), in that order
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[684 138]
 [131 427]]
0.8050724637681159
              precision    recall  f1-score   support

           0       0.84      0.83      0.84       822
           1       0.76      0.77      0.76       558

    accuracy                           0.81      1380
   macro avg       0.80      0.80      0.80      1380
weighted avg       0.81      0.81      0.81      1380



# GaussianNB

In [26]:
gnb.fit(X_train,y_train)

In [27]:
y_pred=gnb.predict(X_test)

In [28]:
# scikit-learn metrics expect (y_true, y_pred), in that order
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[600 222]
 [ 15 543]]
0.8282608695652174
              precision    recall  f1-score   support

           0       0.98      0.73      0.84       822
           1       0.71      0.97      0.82       558

    accuracy                           0.83      1380
   macro avg       0.84      0.85      0.83      1380
weighted avg       0.87      0.83      0.83      1380



In [38]:
bnb_scores = cross_val_score(bnb, X, y, cv=10)
mnb_scores = cross_val_score(mnb, X, y, cv=10)
gnb_scores = cross_val_score(gnb, X, y, cv=10)


scores = bnb_scores, mnb_scores, gnb_scores

print('Bernoulli Naive Bayes Mean Accuracy:', bnb_scores.mean())
print('Multinomial Naive Bayes Mean Accuracy:', mnb_scores.mean())
print('Gaussian Naive Bayes Mean Accuracy:', gnb_scores.mean())

print('--------------------------------------------------------')

print('Bernoulli Naive Bayes performed best on this dataset: it has the highest mean 10-fold cross-validation accuracy of the three variants.')

Bernoulli Naive Bayes Mean Accuracy: 0.8839130434782609
Multinomial Naive Bayes Mean Accuracy: 0.786086956521739
Gaussian Naive Bayes Mean Accuracy: 0.8217391304347826
--------------------------------------------------------
Bernoulli Naive Bayes performed best on this dataset: it has the highest mean 10-fold cross-validation accuracy of the three variants.
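
The assignment also asks for precision, recall, and F1 score under 10-fold cross-validation, and `cross_val_score` returns a single metric per call. One possible approach is scikit-learn's `cross_validate`, which accepts a list of scorers. The sketch below runs on synthetic stand-in data (`X_demo` and `y_demo` are assumptions made so that the block runs without a download), so its numbers say nothing about Spambase.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Synthetic stand-in data: non-negative counts whose rates depend on the class
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 100)
rates = np.where(
    y_demo[:, None] == 1,
    [3.0, 1.0, 0.5, 2.0],   # assumed feature rates for class 1
    [1.0, 1.0, 2.0, 0.5],   # assumed feature rates for class 0
)
X_demo = rng.poisson(rates)

metrics = ["accuracy", "precision", "recall", "f1"]
models = {
    "BernoulliNB": BernoulliNB(),
    "MultinomialNB": MultinomialNB(),
    "GaussianNB": GaussianNB(),
}

results = {}
for name, model in models.items():
    # cross_validate returns one array of per-fold scores per metric
    cv = cross_validate(model, X_demo, y_demo, cv=10, scoring=metrics)
    results[name] = {m: cv[f"test_{m}"].mean() for m in metrics}
    print(name, {m: round(v, 3) for m, v in results[name].items()})
```

Replacing `X_demo` and `y_demo` with the notebook's `X` and `y` would report all four metrics for the Spambase data.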


## Limitations of Naive Bayes


There are several limitations of the Naive Bayes algorithm:

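* Assumption of independence: Naive Bayes assumes that the features are conditionally independent of each other given the class, which is often not true in real-world scenarios. Correlated features are effectively counted more than once, which can lead to inaccurate predictions and overconfident probability estimates.

* Sensitivity to irrelevant features: Every feature contributes a factor to the likelihood, and the model has no mechanism for down-weighting uninformative or redundant ones, so irrelevant features can negatively impact the accuracy of the model.

* Limited feature-importance information: Although the estimated per-class feature probabilities can be inspected, Naive Bayes does not provide a direct measure of how important each feature was to a classification decision, which can make the model harder to interpret.

* Limited expressive power: Naive Bayes is a simple algorithm that cannot capture interactions between features or other complex relationships between the features and the target variable.

* Data scarcity: Naive Bayes relies on the availability of sufficient data to estimate the probabilities of the different classes and features accurately. In cases where data is scarce, the model may not perform well. In particular, a feature value never observed with a class receives zero probability unless smoothing is applied (the `alpha` parameter in scikit-learn's BernoulliNB and MultinomialNB).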
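
The effect of the independence assumption can be illustrated with a small calculation. The sketch below is a simplified, hypothetical scenario: two classes with equal priors and a single feature whose log-likelihood ratio is assumed to be 0.5. It shows what happens when that same feature is fed to the model several times as if each copy were independent evidence.

```python
import math

def posterior_from_log_odds(log_odds):
    """Convert log posterior odds for class 1 into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Assumed evidence: one feature contributes a log-likelihood ratio of 0.5
# in favour of class 1, and the priors are equal (log prior odds = 0).
llr_single = 0.5

posteriors_by_copies = {}
for copies in (1, 2, 5):
    # Naive Bayes adds one term per feature, so k identical copies of the
    # same feature multiply the evidence by k.
    posteriors_by_copies[copies] = posterior_from_log_odds(copies * llr_single)
    print(f"copies={copies}: P(class 1 | x) = {posteriors_by_copies[copies]:.3f}")
```

No new information is added by the copies, yet the posterior moves closer to 1 each time. This is why strongly correlated features tend to make Naive Bayes probabilities overconfident.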