### Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

In [None]:
To find the probability that an employee is a smoker given that they use the health insurance plan, we can use conditional probability.

Let:

A be the event that an employee uses the health insurance plan.
S be the event that an employee is a smoker.

We are given:

    P(A) = Probability that an employee uses the health insurance plan = 70% = 0.70.
    P(S | A) = Probability that an employee is a smoker given that they use the health insurance plan.

We are also given that 40% of the employees who use the plan are smokers. So:

    P(S | A) = Probability that an employee is a smoker given that they use the health insurance plan = 40% = 0.40.

Now, we can use the conditional probability formula:

    P(S∣A)= P(S∩A)/P(A)

    Where:
    P(S | A) is the conditional probability of being a smoker given that they use the health insurance plan.
    P(S ∩ A) is the joint probability of being a smoker and using the health insurance plan.
    P(A) is the probability of using the health insurance plan.
    
We already have P(A) and P(S | A), so we can calculate P(S ∩ A):

    P(S∩A)=P(S∣A)∗P(A)=0.40∗0.70=0.28

So, the probability that an employee is a smoker given that they use the health insurance plan is 0.28 or 28%.

### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

Bernoulli Naive Bayes and Multinomial Naive Bayes are two variants of the Naive Bayes classifier, each designed for different types of
data and assumptions. Here are the key differences between them:

#### Data Type:

* Bernoulli Naive Bayes: 
  It is used for binary or binarized data, where each feature is either present (1) or absent (0). It's commonly applied to text   classification tasks where the presence or absence of words in documents is considered.
 
 
* Multinomial Naive Bayes: 
  It is used for discrete data, typically representing counts or frequencies of categorical features. It's widely used in text 
  classification, where the features often represent word counts or TF-IDF values.

#### Feature Representation:

* Bernoulli Naive Bayes: 
  Assumes binary features, making it suitable for cases where you want to model whether a feature is present or not (e.g., word   presence in a document).
  
  
* Multinomial Naive Bayes: 
  Deals with discrete feature counts or frequencies, which is appropriate for cases where you want to capture the number of     
  occurrences of different categories (e.g., word counts in a document).

#### Feature Independence Assumption:

* Both Bernoulli and Multinomial Naive Bayes make the same strong independence assumption known as the "naive" assumption. They   assume that all features are conditionally independent given the class label, which simplifies the calculation of     
  probabilities.

#### Use Cases:

* Bernoulli Naive Bayes: 
  Useful for text classification tasks, spam detection, sentiment analysis, and situations where you want to model the presence   or absence of features.
        
        
* Multinomial Naive Bayes:
  Well-suited for text classification tasks, document categorization, and problems involving discrete   
  feature counts, such as bag-of-words representations.

#### Probability Estimation:

* Bernoulli Naive Bayes: 
  Estimates probabilities based on the presence or absence of features. It models feature occurrences as binary events.

* Multinomial Naive Bayes: 
  Estimates probabilities based on the frequency or counts of features. It models feature occurrences as discrete events.



##### Example:

For a Bernoulli Naive Bayes classifier, you might represent a document as a binary vector, indicating whether each word is present (1) or absent (0).For a Multinomial Naive Bayes classifier, you might represent a document as a vector of word counts, where each entry represents the count of a specific word.

In summary, the choice between Bernoulli Naive Bayes and Multinomial Naive Bayes depends on the nature of your data and the way you've encoded your features. If your features are binary (presence/absence), Bernoulli Naive Bayes is suitable. If your features are discrete counts or frequencies, Multinomial Naive Bayes is more appropriate, particularly in text classification scenarios.

## Q3. How does Bernoulli Naive Bayes handle missing values?


Bernoulli Naive Bayes, like other Naive Bayes variants, can handle missing values in a straightforward manner. When dealing with missing values, you typically have a few options:

* #### Ignoring Missing Values:
One approach is to simply ignore instances (rows) that contain missing values. This can be a reasonable choice if missing values are rare and not systematically related to the class labels. In this case, you would exclude instances with missing values from both training and testing datasets.

* #### Imputing Missing Values: 
Another option is to impute (fill in) the missing values with some suitable values. For Bernoulli Naive Bayes, which deals with binary features (0 or 1), you might impute missing values with the most frequent value (0 or 1) in the corresponding feature across the dataset or class. This imputation method should be chosen based on the nature of your data.

   
 
Treating Missing Values as a Separate Category: In some cases, missing values themselves may carry information. You can treat missing values as a distinct category (e.g., "unknown" or "-1") in your binary features. This approach allows the classifier to learn from instances with missing values, provided that missingness is informative.

It's essential to make a choice based on the specifics of your dataset and the problem you're trying to solve. Handling missing values appropriately can help prevent biased or inaccurate model predictions.

## Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification tasks. Gaussian Naive Bayes is a variant of the Naive Bayes classifier 
that assumes continuous data and models the distribution of each feature using a Gaussian (normal) distribution. 
While it's often used for binary or two-class classification, it can be extended to handle multi-class classification as well.

To use Gaussian Naive Bayes for multi-class classification, you typically follow one of two common approaches:

* #### One-vs-Rest (OvR) or One-vs-All (OvA): 
In this approach, you create a separate binary classifier for each class. For example, if you have N classes, you train N binary 
classifiers, where each classifier distinguishes one class from the rest (hence the name "One-vs-Rest" or "One-vs-All"). During prediction, you obtain the class with the highest probability from all the classifiers.

* #### Multinomial Naive Bayes: 
Alternatively, you can use a variation of Naive Bayes called "Multinomial Naive Bayes" for multi-class problems, especially when dealing with discrete or count-based data (e.g., text classification). Multinomial Naive Bayes extends the Naive Bayes model to handle multiple classes directly. It estimates probabilities using a multinomial distribution.

    
The choice between Gaussian Naive Bayes with OvR and Multinomial Naive Bayes depends on the nature of your data and the specific requirements 
of your classification task:

Use Gaussian Naive Bayes with OvR when dealing with continuous data that can be modeled well by Gaussian distributions, and when you prefer 
simplicity in handling multi-class problems.
Use Multinomial Naive Bayes when dealing with discrete data, such as text data represented by word counts or TF-IDF values, for multi-class 
text classification tasks.
Both approaches are valid and widely used, but the choice should align with the characteristics of your data and the assumptions of the models.

In [None]:
""""Q5. Assignment:

* Data preparation:
  Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset 
  contains email messages, where the goal is to predict whether a message is spam or not based on several input features.

* Implementation:
  Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. 
  Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each 
  classifier.

* Results:
  Report the following performance metrics for each classifier:
  Accuracy
  Precision
  Recall
  F1 score
  
* Discussion:
  Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations 
  of Naive Bayes that you observed?

* Conclusion:
  Summarise your findings and provide some suggestions for future work."""

DATASET LINK: https://archive.ics.uci.edu/datasets?search=Spambase

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB,BernoulliNB,MultinomialNB
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report

In [2]:
spam_data = pd.read_csv('spambase.data',header=0)

In [3]:
spam_data.head()

Unnamed: 0,0,0.64,0.64.1,0.1,0.32,0.2,0.3,0.4,0.5,0.6,...,0.41,0.42,0.43,0.778,0.44,0.45,3.756,61,278,1
0,0.21,0.28,0.5,0.0,0.14,0.28,0.21,0.07,0.0,0.94,...,0.0,0.132,0.0,0.372,0.18,0.048,5.114,101,1028,1
1,0.06,0.0,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.01,0.143,0.0,0.276,0.184,0.01,9.821,485,2259,1
2,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.137,0.0,0.137,0.0,0.0,3.537,40,191,1
3,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.135,0.0,0.135,0.0,0.0,3.537,40,191,1
4,0.0,0.0,0.0,0.0,1.85,0.0,0.0,1.85,0.0,0.0,...,0.0,0.223,0.0,0.0,0.0,0.0,3.0,15,54,1


In [5]:
spam_data.shape

(4600, 58)

In [6]:
X=spam_data.iloc[:,:-1]
y=spam_data.iloc[:,-1]

In [7]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=10)

In [8]:
# Bernoulli Naive Bayes
ber=BernoulliNB()
ber.fit(X_train,y_train)
y_pred=ber.predict(X_test)

print(accuracy_score(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))

0.8939130434782608
              precision    recall  f1-score   support

           0       0.89      0.94      0.92       699
           1       0.90      0.82      0.86       451

    accuracy                           0.89      1150
   macro avg       0.90      0.88      0.89      1150
weighted avg       0.89      0.89      0.89      1150

[[659  40]
 [ 82 369]]


In [9]:
# Multinomial Naive Bayes
mul=MultinomialNB()
mul.fit(X_train,y_train)
y_pred=mul.predict(X_test)

print(accuracy_score(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))

0.788695652173913
              precision    recall  f1-score   support

           0       0.82      0.83      0.83       699
           1       0.73      0.73      0.73       451

    accuracy                           0.79      1150
   macro avg       0.78      0.78      0.78      1150
weighted avg       0.79      0.79      0.79      1150

[[579 120]
 [123 328]]


In [10]:
# Gaussian Naive Bayes 
gau=MultinomialNB()
gau.fit(X_train,y_train)
y_pred=gau.predict(X_test)

print(accuracy_score(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))

0.788695652173913
              precision    recall  f1-score   support

           0       0.82      0.83      0.83       699
           1       0.73      0.73      0.73       451

    accuracy                           0.79      1150
   macro avg       0.78      0.78      0.78      1150
weighted avg       0.79      0.79      0.79      1150

[[579 120]
 [123 328]]


In [None]:
# Discussion
From above we can say that Bernoulli Naive Bayes has provided best result. 
As Bernoulli has only two value as classification either 0 or 1, this model perform well for spambase data 
Comapare to other model Bernoulli has higher accuracy and less number for FP and FT as well.

# Conclusion:
For email spam data set FP is more importan as we do not want to tag wrong mail as spam so considering this as requirement Bernoulli Bayes has 
provided more accurate result compare to others