## Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

To find the probability that an employee is a smoker given that they use the health insurance plan, we can use conditional probability.

Let's define the following events:
A: Employee uses the health insurance plan.
B: Employee is a smoker.

We are given the following probabilities:
P(A) = 70% = 0.70 (probability that an employee uses the health insurance plan)
P(B | A) = 40% = 0.40 (probability that an employee is a smoker given that they use the health insurance plan)

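We want to find P(B | A), the probability that an employee is a smoker given that they use the health insurance plan. The problem states this value directly (40%); the steps below confirm that it is consistent with the definition of conditional probability.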

Using the definition of conditional probability, we have:

P(B | A) = P(A and B) / P(A)

Here P(A and B) is the probability that an employee both uses the health insurance plan and is a smoker. This value is not given directly.

To find P(A and B), we can use the multiplication rule, which holds for any two events (independence is not required):

P(A and B) = P(A) * P(B | A) = 0.70 * 0.40 = 0.28

Now, substituting this into the conditional probability formula, we have:

P(B | A) = (P(A) * P(B | A)) / P(A)

Plugging in the given values:

P(B | A) = (0.70 * 0.40) / 0.70

Simplifying:

P(B | A) = 0.40

Therefore, the probability that an employee is a smoker given that they use the health insurance plan is 0.40 or 40%.
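The arithmetic above can be checked with a few lines of Python. This is only a sanity check of the numbers in the question; the variable names are chosen here for illustration.

```python
# Given quantities from the question
p_a = 0.70           # P(A): employee uses the health insurance plan
p_b_given_a = 0.40   # P(B | A): smoker, given that they use the plan

# Multiplication rule: P(A and B) = P(A) * P(B | A)
p_a_and_b = p_a * p_b_given_a

# Definition of conditional probability recovers the given value
recovered = p_a_and_b / p_a

print(f"P(A and B) = {p_a_and_b:.2f}")
print(f"P(B | A)   = {recovered:.2f}")
```

The joint probability is the only new quantity here: 28% of all employees both use the plan and smoke.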

## Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

Bernoulli Naive Bayes and Multinomial Naive Bayes are both variants of the Naive Bayes algorithm, but they have some differences in their assumptions and handling of feature vectors. Here are the key differences:

1. Nature of Features:

- `Bernoulli Naive Bayes:` It assumes binary features, where each feature is either present (1) or absent (0). It suits text classification when only the presence or absence of a word in a document matters.
- `Multinomial Naive Bayes:` It assumes that features are discrete counts, such as the number of times each word occurs in a document. It is commonly used in text classification; fractional weights such as TF-IDF (term frequency-inverse document frequency) are also used in practice, although the model is formally defined for counts.

2. Feature Independence:

Both algorithms assume that features are conditionally independent given the class, which is the defining assumption of Naive Bayes. They differ in the distribution used for each feature.

- `Bernoulli Naive Bayes:` Each feature is modeled as a Bernoulli variable. Only presence or absence counts, frequency is ignored, and the absence of a feature contributes explicitly to the likelihood.
- `Multinomial Naive Bayes:` The feature vector is modeled as a draw from a multinomial distribution, so the actual counts weight each feature's contribution and features with a zero count contribute nothing.

3. Handling of Zero Counts:

- `Bernoulli Naive Bayes:` It adds a smoothing term (usually Laplace smoothing) to the estimated probabilities, so that a feature never observed with a class during training does not receive zero probability.
- `Multinomial Naive Bayes:` It applies the same kind of additive smoothing (Laplace or Lidstone) to the per-class feature counts.

In summary, Bernoulli Naive Bayes is suitable for binary feature vectors, whereas Multinomial Naive Bayes is appropriate for count-based features. They differ in the per-feature distribution they assume, while both rely on additive smoothing to avoid zero probabilities.
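To make the contrast concrete, here is a minimal sketch of the two class-conditional log-likelihoods written in plain Python. The probabilities and the three-word vocabulary are made-up illustration values, not estimates from any dataset, and the multinomial coefficient is omitted because it is the same for every class.

```python
import math

def bernoulli_log_likelihood(present, p_present):
    """log P(x | class) when each feature is a Bernoulli (present/absent) variable."""
    return sum(
        math.log(p) if x else math.log(1.0 - p)
        for x, p in zip(present, p_present)
    )

def multinomial_log_likelihood(counts, p_word):
    """log P(x | class) up to the multinomial coefficient (constant across classes)."""
    return sum(c * math.log(p) for c, p in zip(counts, p_word))

# Hypothetical per-class parameters for a three-word vocabulary
p_present = [0.5, 0.2, 0.4]  # P(word appears in a document | class)
p_word = [0.5, 0.2, 0.3]     # P(a word occurrence is this word | class), sums to 1

counts = [3, 0, 1]                             # word counts in one document
present = [1 if c > 0 else 0 for c in counts]  # binarized view of the same document

bern = bernoulli_log_likelihood(present, p_present)
multi = multinomial_log_likelihood(counts, p_word)

# Doubling every count leaves the Bernoulli view unchanged
# but doubles the multinomial log-likelihood.
doubled = [2 * c for c in counts]
bern_doubled = bernoulli_log_likelihood([1 if c > 0 else 0 for c in doubled], p_present)
multi_doubled = multinomial_log_likelihood(doubled, p_word)
```

Note that the absent second word still contributes `log(1 - 0.2)` to the Bernoulli score, while it contributes nothing to the multinomial one.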






## Q3. How does Bernoulli Naive Bayes handle missing values?

Bernoulli Naive Bayes assumes that features are binary variables, where each feature can be present (1) or absent (0). When dealing with missing values in Bernoulli Naive Bayes, the approach typically depends on the specific implementation or library used. However, there are a few common strategies for handling missing values:

1. `Ignoring Instances:` One approach is to simply ignore instances or samples that have missing values. This means that if a feature value is missing for a particular instance, that instance is excluded from the training and prediction process.

2. `Filling Missing Values:` Another strategy is to fill in missing values with a default value or imputation technique. For Bernoulli Naive Bayes, the most common approach is to assign a default value (such as 0 or 1) to represent the missing value. The choice of the default value may depend on the specific problem or dataset.

3. `Incorporating Missingness as a Separate Category:` In some cases, missing values can be considered as a separate category or feature state. This means that instead of imputing or ignoring missing values, a new category or feature level can be created to represent the missingness of a feature.

It's important to note that the handling of missing values in Bernoulli Naive Bayes can have an impact on the performance and accuracy of the model. The chosen strategy should align with the specific characteristics of the dataset and the objectives of the analysis. Additionally, preprocessing steps, such as data imputation or cleaning, may be required before applying the Bernoulli Naive Bayes algorithm to handle missing values effectively.
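The three strategies above can be sketched with NumPy on a tiny made-up binary matrix in which `np.nan` marks a missing entry. The array and the choice of 0 as the fill value are assumptions for illustration only; the resulting matrices would then be passed to a Bernoulli Naive Bayes implementation.

```python
import numpy as np

# Toy binary feature matrix; np.nan marks a missing value (illustrative data)
X = np.array([
    [1.0, 0.0, np.nan],
    [0.0, np.nan, 1.0],
    [1.0, 1.0, 0.0],
])
missing = np.isnan(X)

# 1. Ignoring instances: keep only rows with no missing values
X_dropped = X[~missing.any(axis=1)]

# 2. Filling missing values: replace each missing entry with a default (here 0, "absent")
X_filled = np.where(missing, 0.0, X)

# 3. Missingness as a separate state: append one binary indicator column per feature
X_with_flags = np.hstack([X_filled, missing.astype(float)])

print(X_dropped.shape, X_filled.shape, X_with_flags.shape)
```

Dropping rows is the simplest option but here it discards two of the three samples, which shows why it is usually reserved for datasets with few missing entries.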






## Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification. The Gaussian Naive Bayes algorithm is a variant of the Naive Bayes algorithm that assumes that the features follow a Gaussian (normal) distribution. It is commonly used for continuous numerical features.

In the case of multi-class classification, where there are more than two classes to be predicted, Gaussian Naive Bayes can be extended to handle multiple classes. The algorithm calculates the probability of each class given the feature values and predicts the class with the highest probability.

The general steps for using Gaussian Naive Bayes for multi-class classification are as follows:

1. `Data Preparation:` Prepare the dataset with features and corresponding class labels. Ensure that the features are continuous numerical variables.

2. `Training:` Estimate the parameters of the Gaussian distribution (mean and variance) for each feature in the training set for each class. This involves calculating the mean and variance of each feature for each class.

3. `Class Prior Probabilities:` Calculate the prior probability of each class, which is the probability of each class occurring in the training data.

4. `Prediction:` Given a new instance, compute the class-conditional likelihood for each class as the product of the per-feature Gaussian probability densities. Multiply it by the prior probability of the class (in practice, sum the logarithms to avoid numerical underflow) and select the class with the highest result as the predicted class.

5. `Model Evaluation:` Evaluate the performance of the Gaussian Naive Bayes model using appropriate evaluation metrics such as accuracy, precision, recall, or F1 score.

It's important to note that Gaussian Naive Bayes assumes that the features are independent given the class variable. However, this assumption may not hold in some real-world datasets. Therefore, it's advisable to assess the independence assumption and consider other algorithms if the independence assumption is violated.

Overall, Gaussian Naive Bayes can be extended to handle multi-class classification problems and is particularly useful when dealing with continuous numerical features.
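The training and prediction steps above can be sketched from scratch with NumPy for a three-class problem. This is a simplified illustration rather than scikit-learn's implementation: the function names, the synthetic data, and the small absolute `variance_floor` added to each variance are assumptions made for this example.

```python
import numpy as np

def fit_gaussian_nb(X, y, variance_floor=1e-9):
    """Estimate per-class feature means, variances, and class priors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + variance_floor
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, variances, priors

def predict_gaussian_nb(model, X):
    """Pick the class maximizing log prior + sum of per-feature Gaussian log densities."""
    classes, means, variances, priors = model
    diff = X[:, None, :] - means[None, :, :]            # (n_samples, n_classes, n_features)
    log_density = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                          + diff ** 2 / variances[None, :, :])
    log_posterior = np.log(priors)[None, :] + log_density.sum(axis=2)
    return classes[np.argmax(log_posterior, axis=1)]

# Synthetic three-class data: 50 points around each of three centers
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([rng.normal(center, 1.0, size=(50, 2)) for center in centers])
y = np.repeat([0, 1, 2], 50)

model = fit_gaussian_nb(X, y)
pred = predict_gaussian_nb(model, X)
accuracy = np.mean(pred == y)
print(f"training accuracy: {accuracy:.3f}")
```

Nothing in the procedure is specific to two classes: the same argmax over per-class scores works for any number of classes.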







## Q5. Assignment:
Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository 
(https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message
is spam or not based on several input features.
Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.
Results:
Report the following performance metrics for each classifier:
Accuracy
Precision
Recall
F1 score
Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?
Conclusion:
Summarise your findings and provide some suggestions for future work.
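The evaluation requested above (10-fold cross-validation reporting accuracy, precision, recall and F1 for each classifier) can be sketched with scikit-learn's `cross_validate`. The snippet uses synthetic non-negative data as a stand-in for Spambase so that it runs without a download; the data generator and the scores it produces are illustrative assumptions, not results for the assignment.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Synthetic stand-in data: Poisson counts whose rate depends on the class
rng = np.random.default_rng(42)
n_samples, n_features = 400, 10
y = rng.integers(0, 2, size=n_samples)
rates = np.full((2, n_features), 0.3)
rates[1, :5] = 2.0                      # class 1 uses the first five features more often
X = rng.poisson(rates[y]).astype(float)

models = {
    "BernoulliNB": BernoulliNB(),       # default hyperparameters throughout
    "MultinomialNB": MultinomialNB(),
    "GaussianNB": GaussianNB(),
}
scoring = ["accuracy", "precision", "recall", "f1"]

results = {}
for name, model in models.items():
    cv_scores = cross_validate(model, X, y, cv=10, scoring=scoring)
    # Average each metric over the 10 folds
    results[name] = {metric: cv_scores[f"test_{metric}"].mean() for metric in scoring}
    print(name, {metric: round(value, 3) for metric, value in results[name].items()})
```

Compared with a single-candidate `GridSearchCV`, this returns all four metrics per fold in one call and does not require a separate train/test split.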



In [1]:
import pandas as pd
import numpy as np
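# Note: spambase.data has no header row. Read as below, pandas takes the first record as
# column names, which is why 4600 rows appear instead of 4601; header=None would keep it as data.
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data")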

In [2]:
df.head()

Unnamed: 0,0,0.64,0.64.1,0.1,0.32,0.2,0.3,0.4,0.5,0.6,...,0.41,0.42,0.43,0.778,0.44,0.45,3.756,61,278,1
0,0.21,0.28,0.5,0.0,0.14,0.28,0.21,0.07,0.0,0.94,...,0.0,0.132,0.0,0.372,0.18,0.048,5.114,101,1028,1
1,0.06,0.0,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.01,0.143,0.0,0.276,0.184,0.01,9.821,485,2259,1
2,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.137,0.0,0.137,0.0,0.0,3.537,40,191,1
3,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.135,0.0,0.135,0.0,0.0,3.537,40,191,1
4,0.0,0.0,0.0,0.0,1.85,0.0,0.0,1.85,0.0,0.0,...,0.0,0.223,0.0,0.0,0.0,0.0,3.0,15,54,1


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4600 entries, 0 to 4599
Data columns (total 58 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       4600 non-null   float64
 1   0.64    4600 non-null   float64
 2   0.64.1  4600 non-null   float64
 3   0.1     4600 non-null   float64
 4   0.32    4600 non-null   float64
 5   0.2     4600 non-null   float64
 6   0.3     4600 non-null   float64
 7   0.4     4600 non-null   float64
 8   0.5     4600 non-null   float64
 9   0.6     4600 non-null   float64
 10  0.7     4600 non-null   float64
 11  0.64.2  4600 non-null   float64
 12  0.8     4600 non-null   float64
 13  0.9     4600 non-null   float64
 14  0.10    4600 non-null   float64
 15  0.32.1  4600 non-null   float64
 16  0.11    4600 non-null   float64
 17  1.29    4600 non-null   float64
 18  1.93    4600 non-null   float64
 19  0.12    4600 non-null   float64
 20  0.96    4600 non-null   float64
 21  0.13    4600 non-null   float64
 22  

In [4]:
#splitting the dataset into independent and dependent features
X = df.iloc[:,:-1]
y = df.iloc[:,-1]

In [5]:
#confirming the split
X.shape,y.shape

((4600, 57), (4600,))

In [6]:
#train test splitting the data
from sklearn.model_selection import train_test_split
X_train , X_test , y_train , y_test = train_test_split(X,y , test_size=0.2, random_state=42)

In [7]:
X_train.shape , X_test.shape

((3680, 57), (920, 57))

In [8]:
#BernoulliNB
from sklearn.naive_bayes import BernoulliNB
classifier = BernoulliNB()

In [9]:
##Cross validation
parameters = {
    "alpha" : [1.0],
    "force_alpha" : [False] ,
    "binarize" : [0],
    "fit_prior" : [True ],
    "class_prior" : [None]
}

In [10]:
from sklearn.model_selection import GridSearchCV
clf = GridSearchCV(classifier ,param_grid=parameters,scoring="accuracy" , cv = 10 , verbose= 3)

In [11]:
clf.fit(X_train,y_train)

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[CV 1/10] END alpha=1.0, binarize=0, class_prior=None, fit_prior=True, force_alpha=False;, score=0.883 total time=   0.0s
[CV 2/10] END alpha=1.0, binarize=0, class_prior=None, fit_prior=True, force_alpha=False;, score=0.856 total time=   0.0s
[CV 3/10] END alpha=1.0, binarize=0, class_prior=None, fit_prior=True, force_alpha=False;, score=0.913 total time=   0.0s
[CV 4/10] END alpha=1.0, binarize=0, class_prior=None, fit_prior=True, force_alpha=False;, score=0.870 total time=   0.0s
[CV 5/10] END alpha=1.0, binarize=0, class_prior=None, fit_prior=True, force_alpha=False;, score=0.902 total time=   0.0s
[CV 6/10] END alpha=1.0, binarize=0, class_prior=None, fit_prior=True, force_alpha=False;, score=0.880 total time=   0.0s
[CV 7/10] END alpha=1.0, binarize=0, class_prior=None, fit_prior=True, force_alpha=False;, score=0.897 total time=   0.0s
[CV 8/10] END alpha=1.0, binarize=0, class_prior=None, fit_prior=True, force_alpha=Fa

In [13]:
y_pred_BNB = clf.predict(X_test)

In [14]:
from sklearn.metrics import classification_report
print("BernoulliNB classification report")
print(classification_report(y_test , y_pred_BNB))

BernoulliNB classification report
              precision    recall  f1-score   support

           0       0.86      0.93      0.89       530
           1       0.89      0.79      0.84       390

    accuracy                           0.87       920
   macro avg       0.88      0.86      0.87       920
weighted avg       0.87      0.87      0.87       920



In [15]:
#Multinomial
from sklearn.naive_bayes import MultinomialNB
classifier = MultinomialNB()

In [16]:
##Cross validation
parameters = {
    "alpha" : [1.0],
    "force_alpha" : [False],
    "fit_prior" : [True ],
    "class_prior" : [None]
}

In [17]:
clf = GridSearchCV(classifier ,param_grid=parameters,scoring="accuracy" , cv = 10 , verbose= 3)

In [18]:
clf.fit(X_train,y_train)

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[CV 1/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=False;, score=0.777 total time=   0.0s
[CV 2/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=False;, score=0.818 total time=   0.0s
[CV 3/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=False;, score=0.818 total time=   0.0s
[CV 4/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=False;, score=0.802 total time=   0.0s
[CV 5/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=False;, score=0.818 total time=   0.0s
[CV 6/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=False;, score=0.799 total time=   0.0s
[CV 7/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=False;, score=0.818 total time=   0.0s
[CV 8/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=False;, score=0.739 total time=   0.0s
[CV 9/10] END alpha=1.0, class_prior=None, fit_prior=True, 

In [19]:
y_pred_MNB = clf.predict(X_test)

In [20]:
print("Multinomial classification report")
print(classification_report(y_test , y_pred_MNB))

Multinomial classification report
              precision    recall  f1-score   support

           0       0.78      0.83      0.80       530
           1       0.75      0.68      0.71       390

    accuracy                           0.77       920
   macro avg       0.76      0.76      0.76       920
weighted avg       0.77      0.77      0.76       920



In [21]:
#Gaussian NB
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()

In [22]:
##Cross validation
parameters = {
    "priors" : [None],
    # np.e**(-9) is about 1.23e-4; scikit-learn's default var_smoothing is 1e-9
    "var_smoothing" :[np.e**(-9)]
}

In [23]:
clf = GridSearchCV(classifier ,param_grid=parameters,scoring="accuracy" , cv = 10 , verbose= 3)

In [24]:
clf.fit(X_train,y_train)

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[CV 1/10] END priors=None, var_smoothing=0.00012340980408667962;, score=0.658 total time=   0.0s
[CV 2/10] END priors=None, var_smoothing=0.00012340980408667962;, score=0.666 total time=   0.0s
[CV 3/10] END priors=None, var_smoothing=0.00012340980408667962;, score=0.688 total time=   0.0s
[CV 4/10] END priors=None, var_smoothing=0.00012340980408667962;, score=0.688 total time=   0.0s
[CV 5/10] END priors=None, var_smoothing=0.00012340980408667962;, score=0.668 total time=   0.0s
[CV 6/10] END priors=None, var_smoothing=0.00012340980408667962;, score=0.707 total time=   0.0s
[CV 7/10] END priors=None, var_smoothing=0.00012340980408667962;, score=0.666 total time=   0.0s
[CV 8/10] END priors=None, var_smoothing=0.00012340980408667962;, score=0.663 total time=   0.0s
[CV 9/10] END priors=None, var_smoothing=0.00012340980408667962;, score=0.652 total time=   0.0s
[CV 10/10] END priors=None, var_smoothing=0.00012340980408667962;,

In [25]:
y_pred_GNB = clf.predict(X_test)

In [26]:
print("GaussianNB classification report")
# Report the GaussianNB predictions (y_pred_GNB), not the MultinomialNB ones
print(classification_report(y_test , y_pred_GNB))



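`Discussion:` Bernoulli Naive Bayes performed best: it reached 0.87 accuracy on the held-out test set, against 0.77 for Multinomial Naive Bayes, and its precision, recall and F1 scores were higher for both classes. Gaussian Naive Bayes had the lowest cross-validation accuracy, roughly 0.65 to 0.71 per fold. The binary target does not explain this ranking, since all three variants handle a two-class problem; what differs is how each one models the features. BernoulliNB binarizes every feature (`binarize=0`), so it only uses whether a word or character occurs in an email, and that presence/absence signal appears to be what separates spam from non-spam here. MultinomialNB expects count-like features, whereas the Spambase columns mix word and character frequencies with capital-run-length statistics on very different scales, so the largest columns can dominate. GaussianNB assumes each feature is normally distributed within a class, which is likely a poor fit for non-negative, heavily skewed features with many zeros.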

While Naive Bayes is a powerful and widely used classification algorithm, it also has some limitations:

Strong independence assumption: Naive Bayes assumes that features are conditionally independent given the class, which rarely holds in real-world data. Correlated features are effectively counted more than once, which can hurt accuracy and tends to push predicted probabilities toward 0 or 1.

Sensitivity to redundant features: Each feature contributes its own likelihood term, and no feature weights are learned jointly. A feature that is distributed similarly in every class has little effect, but redundant or strongly correlated features can skew the prediction.

Limited expressive power: Bernoulli and Multinomial Naive Bayes produce linear decision boundaries, and Gaussian Naive Bayes produces at most quadratic ones, so none of them captures interactions between features.

Limited data: Class-conditional probabilities are estimated from frequencies in the training set. With few examples per class these estimates are noisy, and a feature value never seen with a class gets zero probability unless smoothing is applied.

Handling of continuous data: Bernoulli and Multinomial Naive Bayes expect binary and count features respectively. Continuous features must either be discretized, which loses information, or modeled with a parametric distribution as in Gaussian Naive Bayes, which performs poorly when the features are far from normal.

Despite these limitations, Naive Bayes remains a popular and effective classification algorithm, particularly for high-dimensional datasets with sparse features. Its simplicity, efficiency, and ease of implementation make it a good choice for many real-world applications.

`Conclusion:`

- Bernoulli NB had the highest scores on every reported metric compared with Multinomial and Gaussian NB, most likely because the presence or absence of a word is a strong spam signal in this dataset.
- Bernoulli NB reduces each continuous feature to a binary value, so it discards frequency information that might otherwise improve the results.
- With about 4,600 emails, Bernoulli NB reached 0.87 test accuracy; more data might help, although the independence assumption probably limits the model as well.
- The models were trained without hyperparameter tuning, so trying different values (for example `alpha` and `binarize`) during cross-validation may improve accuracy.
- Logistic regression handles continuous features with a binary outcome well and could be tested in the future.
- A random forest classifier may also suit this problem and could be tried in the future.
