# 1] A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?


In [1]:
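# The question states P(smoker | uses plan) directly: 40% of plan users smoke.
# By definition, P(S | H) = P(S and H) / P(H) = (0.4 * 0.7) / 0.7, which reduces to 0.4.
p_smoker_given_plan = 0.4
p_smoker_given_plan

0.4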
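
As a sanity check, the conditional probability can be read off from head counts. The sketch below assumes a hypothetical workforce of 1,000 employees (the survey gives only percentages). Among plan users, the share of smokers is the 40% stated in the question, regardless of how many employees use the plan.

```python
# Hypothetical head count; the survey itself reports only percentages.
total_employees = 1000

plan_users = total_employees * 70 // 100      # 70% use the plan -> 700
smokers_on_plan = plan_users * 40 // 100      # 40% of plan users smoke -> 280

# P(smoker | uses plan) = P(smoker and uses plan) / P(uses plan)
p_smoker_given_plan = smokers_on_plan / plan_users
print(p_smoker_given_plan)  # 0.4
```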

# 2] What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?


## 1) Feature Type:

### => Bernoulli Naive Bayes: This variant assumes that features are binary or follow a Bernoulli distribution. Each feature is treated as a binary indicator of presence or absence of a particular attribute.
### => Multinomial Naive Bayes: It is designed for discrete features that represent counts or frequencies, such as word frequencies in text classification or occurrence counts of events in a fixed-size set.
## 2) Feature Representation:

### => Bernoulli Naive Bayes: Each feature is represented as a binary value (0 or 1), indicating the absence or presence of the feature.
### => Multinomial Naive Bayes: Features are represented as discrete counts or frequencies. For example, in text classification, each feature may represent the number of times a word appears in a document, or that word's relative frequency within the document.
## 3) Probability Estimation:

### => Bernoulli Naive Bayes: For each class, the model estimates the probability that a feature is present (1) from the fraction of training samples in that class that contain the feature. The probability of absence (0) is one minus that value, and both enter the likelihood.
### => Multinomial Naive Bayes: For each class, the model estimates the probability of each feature as its share of the total feature count in that class. A sample's likelihood multiplies these probabilities, each raised to the power of the observed count.
## 4) Handling Missing Features:

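### => Bernoulli Naive Bayes: A feature that does not occur is encoded as absent (0), and the absence itself counts as evidence: it contributes a factor of (1 - p) to the class likelihood, where p is the estimated probability of presence.
### => Multinomial Naive Bayes: A feature with a zero count contributes nothing to the likelihood, so features that do not occur in a sample are effectively ignored.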
## 5) Application Domains:

### => Bernoulli Naive Bayes: It is commonly used in text classification tasks where features represent the presence or absence of specific words or features in documents.
### => Multinomial Naive Bayes: It is widely used in text classification, sentiment analysis, and other applications involving discrete feature counts.
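
The difference in feature representation can be seen on a toy example. The sketch below uses a small, hypothetical word-count matrix (not taken from the assignment data) and relies on scikit-learn's `BernoulliNB` binarizing its input at the default threshold `binarize=0.0`. Under that assumption, Bernoulli Naive Bayes learns the same parameters from counts as from presence/absence indicators, whereas Multinomial Naive Bayes does not.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy data: rows are documents, columns are word counts.
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 1, 2],
])
y_toy = np.array([1, 1, 0, 0])
X_binary = (X_counts > 0).astype(int)  # presence/absence version

# BernoulliNB binarizes its input, so counts and indicators give the same fit.
bnb_counts = BernoulliNB().fit(X_counts, y_toy)
bnb_binary = BernoulliNB().fit(X_binary, y_toy)
same_bernoulli = np.allclose(bnb_counts.feature_log_prob_,
                             bnb_binary.feature_log_prob_)

# MultinomialNB uses the counts themselves, so the two fits differ.
mnb_counts = MultinomialNB().fit(X_counts, y_toy)
mnb_binary = MultinomialNB().fit(X_binary, y_toy)
same_multinomial = np.allclose(mnb_counts.feature_log_prob_,
                               mnb_binary.feature_log_prob_)

print(same_bernoulli, same_multinomial)
```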


# 3] How does Bernoulli Naive Bayes handle missing values?


### => Bernoulli Naive Bayes has no dedicated mechanism for missing values; the usual convention is to treat them as absent (0). In this variant of Naive Bayes, each feature is a binary value indicating the presence or absence of a particular attribute, so a missing value is encoded as if the feature were absent (0). In scikit-learn, `BernoulliNB` does not accept NaN inputs, so this encoding has to be applied before fitting.

### => By treating missing values as 0, Bernoulli Naive Bayes assumes that the missing values do not contribute to the presence of the feature. This assumption aligns with the Bernoulli distribution, where the feature is represented as a binary indicator.

### => When calculating the conditional probabilities in Bernoulli Naive Bayes, the absence of a feature (0) is considered along with the presence (1) within each class. The frequency of feature occurrence is estimated, taking into account the presence or absence of the feature for each class.

### => It's important to note that the treatment of missing values as absent (0) may introduce a bias if the missingness is not random or if the missing values carry some specific meaning. In such cases, alternative approaches, such as imputation techniques, may be considered to handle missing values appropriately.
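
A minimal sketch of the "missing means absent" convention follows. The data are hypothetical, and the choice of `SimpleImputer` with a constant fill value of 0 is one possible way to apply the encoding, not the only one. The `try` block checks whether the classifier rejects NaN inputs on its own.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary features with two missing entries (NaN).
X_missing = np.array([
    [1, 0, np.nan],
    [0, 1, 1],
    [1, np.nan, 0],
    [0, 1, 1],
], dtype=float)
y_missing = np.array([1, 0, 1, 0])

# Fitting directly on data that contains NaN is expected to fail.
try:
    BernoulliNB().fit(X_missing, y_missing)
    raised = False
except ValueError:
    raised = True

# Encode missing values as absent (0) before the classifier sees them.
model = make_pipeline(
    SimpleImputer(strategy="constant", fill_value=0),
    BernoulliNB(),
)
model.fit(X_missing, y_missing)
pred = model.predict(X_missing)
print(raised, pred)
```

If missingness is informative, replacing the constant fill with another imputation strategy (or adding a separate "was missing" indicator column) may be more appropriate.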

# 4] Can Gaussian Naive Bayes be used for multi-class classification?


### => Yes, Gaussian Naive Bayes can be used for multi-class classification. Gaussian Naive Bayes is a variant of Naive Bayes that assumes a Gaussian (normal) distribution for continuous or real-valued features. It is commonly used when the features are continuous and can be modeled by a Gaussian distribution.

### => In multi-class classification, where there are more than two classes, Gaussian Naive Bayes applies unchanged. The classifier fits a separate Gaussian model for each class from the training data, estimating the mean and variance of each feature within that class. During prediction, it computes the likelihood of the instance under each class's Gaussian parameters, combines it with the class prior via Bayes' theorem, and determines the most probable class.

### => The decision rule in Gaussian Naive Bayes assigns the class label with the highest posterior probability given the observed feature values. This process can be extended to multiple classes by comparing the posterior probabilities of each class and selecting the one with the highest probability.
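
As an illustration, the sketch below fits `GaussianNB` on the three-class Iris dataset bundled with scikit-learn. The dataset is chosen only for convenience and is not part of the assignment. The fitted model holds one row of feature means per class, and prediction picks the class with the highest posterior probability.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X_iris, y_iris = load_iris(return_X_y=True)   # 3 classes, 4 continuous features
gnb = GaussianNB().fit(X_iris, y_iris)

print(gnb.classes_)        # one entry per class
print(gnb.theta_.shape)    # per-class feature means: (n_classes, n_features)

# Posterior probabilities for a few samples; each row sums to 1.
proba = gnb.predict_proba(X_iris[:5])
predicted = gnb.classes_[proba.argmax(axis=1)]  # class with the highest posterior
print(predicted)
```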

# 5] Assignment:
## Data preparation:
### Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message is spam or not based on several input features.
## Implementation:
### Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.
## Results:
### Report the following performance metrics for each classifier:
### Accuracy
### Precision
### Recall
### F1 score
## Discussion:
### Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?
## Conclusion: 
### Summarise your findings and provide some suggestions for future work.
## Note:
### This dataset contains a binary classification problem with multiple features. The dataset is relatively small, but it can be used to demonstrate the performance of the different variants of Naive Bayes on a real-world problem.

In [2]:
import pandas as pd

In [3]:
df=pd.read_csv("spambase_csv (1).csv")
df.head()

Unnamed: 0,word_freq_make,word_freq_address,word_freq_all,word_freq_3d,word_freq_our,word_freq_over,word_freq_remove,word_freq_internet,word_freq_order,word_freq_mail,...,char_freq_%3B,char_freq_%28,char_freq_%5B,char_freq_%21,char_freq_%24,char_freq_%23,capital_run_length_average,capital_run_length_longest,capital_run_length_total,class
0,0.0,0.64,0.64,0.0,0.32,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.778,0.0,0.0,3.756,61,278,1
1,0.21,0.28,0.5,0.0,0.14,0.28,0.21,0.07,0.0,0.94,...,0.0,0.132,0.0,0.372,0.18,0.048,5.114,101,1028,1
2,0.06,0.0,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.01,0.143,0.0,0.276,0.184,0.01,9.821,485,2259,1
3,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.137,0.0,0.137,0.0,0.0,3.537,40,191,1
4,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.135,0.0,0.135,0.0,0.0,3.537,40,191,1


In [4]:
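# Features are all columns except the target. Note that "Unnamed: 0" is the saved row index
# rather than an email feature; it stays in X here, which is what the outputs shown were computed with.
X=df.drop(["class"],axis=1)
y=df["class"]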

In [5]:
from sklearn.naive_bayes import GaussianNB,BernoulliNB,MultinomialNB

In [6]:
from sklearn.model_selection import train_test_split

In [7]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=29)

In [8]:
clf_g=GaussianNB()

In [9]:
clf_b=BernoulliNB()

In [10]:
clf_m=MultinomialNB()

In [11]:
from sklearn.model_selection import GridSearchCV

In [12]:
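# An empty grid keeps the default hyperparameters; GridSearchCV then simply
# runs 10-fold cross-validation on that single configuration.
params={}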

In [13]:
clf_gcv=GridSearchCV(estimator=clf_g,param_grid=params,cv=10)

In [14]:
clf_bcv=GridSearchCV(estimator=clf_b,param_grid=params,cv=10)
clf_mcv=GridSearchCV(estimator=clf_m,param_grid=params,cv=10)

In [15]:
clf_gcv.fit(X_train,y_train)

GridSearchCV(cv=10, estimator=GaussianNB(), param_grid={})

In [18]:
clf_bcv.fit(X_train,y_train)


GridSearchCV(cv=10, estimator=BernoulliNB(), param_grid={})

In [19]:
clf_mcv.fit(X_train,y_train)

GridSearchCV(cv=10, estimator=MultinomialNB(), param_grid={})
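
The assignment asks for accuracy, precision, recall and F1 under 10-fold cross-validation. One way to collect all four per classifier is scikit-learn's `cross_validate`, sketched below. To keep the example runnable without the CSV file, it uses synthetic Poisson count data as a stand-in for the Spambase features (an assumption; the names `X_demo`, `y_demo` and `cv_summary` are illustrative), so the numbers it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Synthetic stand-in data: non-negative counts whose rate depends on the class.
rng = np.random.default_rng(29)
y_demo = rng.integers(0, 2, size=500)
rates = np.where(y_demo[:, None] == 1, 1.5, 0.5)   # shape (500, 1)
X_demo = rng.poisson(rates, size=(500, 20))

scoring = ["accuracy", "precision", "recall", "f1"]
models = {
    "Bernoulli": BernoulliNB(),
    "Multinomial": MultinomialNB(),
    "Gaussian": GaussianNB(),
}

# Mean of each metric over the 10 folds, per classifier.
cv_summary = {}
for name, model in models.items():
    scores = cross_validate(model, X_demo, y_demo, cv=10, scoring=scoring)
    cv_summary[name] = {m: scores[f"test_{m}"].mean() for m in scoring}

for name, metrics in cv_summary.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```

With the real data, `X_demo` and `y_demo` would be replaced by the feature matrix and `class` column loaded earlier.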

In [21]:
y_predg=clf_gcv.predict(X_test)
y_predb=clf_bcv.predict(X_test)
y_predm=clf_mcv.predict(X_test)

In [22]:
# Order: Bernoulli, Gaussian, Multinomial; the reports printed by the loop follow this order.
lst=[y_predb,y_predg,y_predm]

In [23]:
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix

In [26]:
for i in lst:
    print(classification_report(y_test,i))
    print(accuracy_score(y_test,i))
    print(confusion_matrix(y_test,i))
    print("\n")

              precision    recall  f1-score   support

           0       0.87      0.93      0.90       543
           1       0.89      0.80      0.85       378

    accuracy                           0.88       921
   macro avg       0.88      0.87      0.87       921
weighted avg       0.88      0.88      0.88       921

0.8794788273615635
[[506  37]
 [ 74 304]]


              precision    recall  f1-score   support

           0       0.96      0.73      0.83       543
           1       0.71      0.96      0.82       378

    accuracy                           0.82       921
   macro avg       0.84      0.84      0.82       921
weighted avg       0.86      0.82      0.83       921

0.8241042345276873
[[398 145]
 [ 17 361]]


              precision    recall  f1-score   support

           0       0.81      0.83      0.82       543
           1       0.75      0.72      0.73       378

    accuracy                           0.78       921
   macro avg       0.78      0.77      0
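
The per-class figures in a classification report follow directly from the confusion matrix. As a check, the sketch below recomputes the spam-class (label 1) metrics from the first confusion matrix printed above, `[[506, 37], [74, 304]]`, using scikit-learn's layout of true labels in rows and predicted labels in columns. The variable names are illustrative.

```python
# First confusion matrix from the output above: rows = true class, columns = predicted class.
tn, fp = 506, 37
fn, tp = 74, 304

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)    # of the emails predicted spam, how many were spam
recall = tp / (tp + fn)       # of the actual spam emails, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4), round(precision, 2), round(recall, 2), round(f1, 2))
```

The rounded values agree with the first report: accuracy of about 0.88, precision 0.89, recall 0.80 and F1 0.85 for class 1.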

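### => Bernoulli Naive Bayes performed best on the held-out test set, with an accuracy of about 0.88, compared with 0.82 for Gaussian Naive Bayes and 0.78 for Multinomial Naive Bayes. All three variants can handle binary classification, so the difference comes from how well each model's feature assumptions fit the data. A plausible explanation is that many Spambase feature values are zero, so whether a word or character appears at all carries much of the signal, which is exactly the information the Bernoulli model uses. The nonzero values are relative frequencies rather than raw counts and are unlikely to be Gaussian, which fits the Multinomial and Gaussian models less well.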

## limitations:
## 1) Gaussian Naive Bayes: 
### => This variant assumes that continuous features follow a Gaussian (normal) distribution. The limitations associated with Gaussian Naive Bayes include:

### => It assumes that, within each class, every feature is unimodal and roughly bell-shaped, with its own mean and variance. If the actual distributions are skewed or multimodal, or otherwise deviate significantly from this assumption, performance may suffer.
### => It may struggle with data that has outliers or features that do not follow a Gaussian distribution.
## 2) Multinomial Naive Bayes: 
### => This variant is commonly used for text classification tasks, where features represent word frequencies or occurrences. Some limitations of Multinomial Naive Bayes include:

### => It assumes that features are conditionally independent given the class and follow a multinomial distribution. If the features are dependent or have a different distribution, the classifier may not perform optimally.
### => Without smoothing, it may not handle rare or unseen feature occurrences well, since it relies on counting frequencies in the training data.
## 3) Bernoulli Naive Bayes: 
### => This variant is similar to Multinomial Naive Bayes but is specifically designed for binary feature data. Some limitations of Bernoulli Naive Bayes include:

### => It assumes that features are independent and have a Bernoulli distribution. If the independence assumption is violated or the features have a different distribution, it may lead to suboptimal results.
### => It may struggle with imbalanced datasets or when the occurrence of certain features is rare.

## Findings:

### => Naive Bayes relies on the strong assumption of feature independence, which may not hold true in real-world scenarios.
### => It fails to capture feature interactions, which can be important for accurate classification.
### => The algorithm assumes specific feature distributions, and deviations from these assumptions can impact performance.
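### => The zero probability problem occurs when a feature value never appears together with a class in the training data. The estimated likelihood is then zero, which wipes out the whole posterior for that class and can lead to incorrect classifications unless smoothing is applied.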
### => Naive Bayes has limited expressive power compared to more complex classifiers.
### => Insufficient training data or data sparsity can affect the accuracy of Naive Bayes.
## Suggestions for future work:

### => Relaxing the independence assumption: Explore methods to relax the strict assumption of feature independence, such as using feature selection techniques or incorporating feature dependencies into the model.
### => Handling feature interactions: Investigate techniques to capture feature interactions, such as using higher-order Naive Bayes models or integrating other machine learning algorithms that can capture complex relationships.
### => Robustness to feature distributions: Develop approaches to handle deviations from assumed feature distributions, such as using non-parametric methods or adapting Naive Bayes to handle different types of feature distributions.
### => Zero probability problem mitigation: Investigate strategies to address the zero probability problem, such as smoothing techniques like Laplace smoothing or employing more sophisticated probability estimation methods.
### => Enhancing model complexity: Explore ensemble methods or hybrid models that combine Naive Bayes with other classifiers to improve its expressive power and capture more complex patterns in the data.
### => Handling data scarcity: Investigate techniques to handle data sparsity, such as incorporating prior knowledge or leveraging techniques like data augmentation or transfer learning to improve classification performance with limited training data.
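
The Laplace smoothing mentioned for the zero probability problem can be sketched in a few lines. The function below is illustrative only (its name and arguments are not from any library). It uses the additive form commonly given for Multinomial Naive Bayes: add `alpha` to each feature count and `alpha * n_features` to the class total.

```python
def smoothed_feature_prob(feature_count, class_total, n_features, alpha=1.0):
    """Additive (Laplace/Lidstone) estimate of P(feature | class)."""
    return (feature_count + alpha) / (class_total + alpha * n_features)


# Hypothetical class with 10 word occurrences spread over a 3-word vocabulary,
# where one word was never seen in that class (count 0).
unsmoothed = smoothed_feature_prob(0, 10, 3, alpha=0.0)  # zero likelihood
smoothed = smoothed_feature_prob(0, 10, 3, alpha=1.0)    # 1 / 13

print(unsmoothed, smoothed)
```

With `alpha=0.0` the unseen word gets probability 0 and would zero out the whole class likelihood. With `alpha=1.0` it gets a small positive probability instead.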