## Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?


The question asks for a conditional probability that the survey already reports, so Bayes' theorem is not needed.

Let:
- \( A \) be the event that an employee uses the health insurance plan, with \( P(A) = 0.70 \)
- \( B \) be the event that an employee is a smoker, with \( P(B|A) = 0.40 \) (40% of plan users are smokers)

We want \( P(B|A) \), which is given directly as 0.40. Therefore, the probability that an employee is a smoker given that they use the health insurance plan is 40%. The 70% figure does not enter this answer. It would only matter for a joint quantity such as \( P(A \cap B) = P(B|A)\,P(A) = 0.40 \times 0.70 = 0.28 \), meaning 28% of all employees are smokers who use the plan.
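As a quick numerical check, the sketch below computes the joint probability from the two survey figures and then recovers the conditional probability from its definition \( P(B|A) = P(A \cap B) / P(A) \). The variable names are illustrative only.

```python
# A = employee uses the health insurance plan, B = employee is a smoker
p_a = 0.70          # P(A), from the survey
p_b_given_a = 0.40  # P(B|A), from the survey

# Joint probability: P(A and B) = P(B|A) * P(A)
p_a_and_b = p_b_given_a * p_a

# Definition of conditional probability: P(B|A) = P(A and B) / P(A)
recovered = p_a_and_b / p_a

print(f"P(A and B) = {p_a_and_b:.2f}")  # prints P(A and B) = 0.28
print(f"P(B|A)     = {recovered:.2f}")  # prints P(B|A)     = 0.40
```

Dividing the joint probability by \( P(A) \) simply undoes the multiplication, which is why the 70% figure cancels out of the answer.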


## Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?


The main differences between Bernoulli Naive Bayes and Multinomial Naive Bayes are:

- **Bernoulli Naive Bayes**: Models each feature as a binary (Bernoulli) variable, i.e. whether some property or event is present or absent. Absent features also contribute to the likelihood, so the model explicitly penalizes the non-occurrence of a feature that is typical of a class. In scikit-learn, `BernoulliNB` binarizes its input using the `binarize` threshold (0.0 by default), so any positive count is treated as "present".

- **Multinomial Naive Bayes**: Models discrete, non-negative counts or frequencies (e.g., word counts in text classification) with a multinomial distribution. A feature that occurs several times contributes more evidence than one that occurs once, while a zero count contributes nothing to the likelihood.
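The contrast can be seen on a small toy count matrix. This is a minimal sketch with made-up data: both scikit-learn models are fit on the same counts, but `BernoulliNB` only ever sees presence or absence, so giving it the already-binarized input yields identical probabilities.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical word-count matrix: rows are documents, columns are words
X = np.array([
    [3, 0, 1],
    [2, 1, 0],
    [0, 4, 2],
    [0, 3, 5],
])
y = np.array([0, 0, 1, 1])

# BernoulliNB thresholds features at `binarize` (default 0.0): any count > 0 becomes 1
bnb = BernoulliNB(binarize=0.0).fit(X, y)

# MultinomialNB uses the raw counts directly
mnb = MultinomialNB().fit(X, y)

x_new = np.array([[5, 1, 0]])
print("Bernoulli  :", bnb.predict_proba(x_new))
print("Multinomial:", mnb.predict_proba(x_new))

# BernoulliNB cannot tell a count of 5 from a count of 1,
# so the pre-binarized row gives exactly the same probabilities
x_new_binary = (x_new > 0).astype(int)
print(np.allclose(bnb.predict_proba(x_new), bnb.predict_proba(x_new_binary)))  # prints True
```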


## Q3. How does Bernoulli Naive Bayes handle missing values?


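Bernoulli Naive Bayes does not have a built-in mechanism for missing values, and scikit-learn's `BernoulliNB` raises an error if the input contains NaN. Missing values must therefore be handled before training, either by imputing them or by removing the affected records. Because the model expects binary features, mean imputation is a poor fit: it produces fractional values that do not correspond to presence or absence. Mode (most-frequent) imputation, or treating a missing value as "absent" (0), keeps every feature binary. How missing values are handled can noticeably affect classification results.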
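A minimal sketch of mode imputation in front of `BernoulliNB`, assuming a small hypothetical binary matrix with `np.nan` marking missing entries. Wrapping both steps in a pipeline means the same imputation is applied at prediction time. Swapping in `SimpleImputer(strategy="constant", fill_value=0)` would instead treat every missing value as "absent".

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary feature matrix with missing entries
X = np.array([
    [1, 0, np.nan],
    [1, np.nan, 0],
    [0, 1, 1],
    [np.nan, 1, 1],
    [1, 0, 0],
    [0, 1, np.nan],
])
y = np.array([0, 0, 1, 1, 0, 1])

# Replace each missing value with its column's most frequent value,
# so the features stay strictly 0/1 before reaching BernoulliNB
model = make_pipeline(SimpleImputer(strategy="most_frequent"), BernoulliNB())
model.fit(X, y)

# New rows with missing values are imputed the same way before prediction
print(model.predict([[1, np.nan, 0]]))
```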


## Q4. Can Gaussian Naive Bayes be used for multi-class classification?


Yes, Gaussian Naive Bayes handles any number of classes natively. For each class it fits a separate mean and variance for every feature, modeling each feature as Gaussian (normal) within that class. At prediction time it computes the posterior probability of every class given the features and selects the class with the highest posterior, so no one-vs-rest wrapper is needed for problems with more than two classes.
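To illustrate, the sketch below fits scikit-learn's `GaussianNB` on the bundled three-class Iris dataset (chosen here only as a convenient example). The fitted `theta_` attribute holds one mean per (class, feature) pair, and `predict` returns the class with the largest posterior from `predict_proba`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris: 3 classes, 4 continuous features
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

gnb = GaussianNB().fit(X_train, y_train)

# Per-class feature means: shape (n_classes, n_features)
print(gnb.theta_.shape)  # prints (3, 4)

# One posterior probability per class for each sample
proba = gnb.predict_proba(X_test[:5])
print(proba.round(3))

# The predicted label is the class with the highest posterior
print(gnb.predict(X_test[:5]))
```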


## Q5. Implementation

### Data Preparation


```python
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# UCI Spambase: 57 numeric features per email; the last column is the label (1 = spam, 0 = not spam)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
column_names = [f'feature_{i}' for i in range(57)] + ['label']
data = pd.read_csv(url, header=None, names=column_names)

# Separate features from the target
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Hold out 20% of the data for the final test-set evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_train.shape, X_test.shape
```


```text
((3680, 57), (921, 57))
```
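The features are left unscaled before training. That matters for `MultinomialNB`, which requires non-negative inputs: standardizing centers each feature at zero, so many values become negative and scikit-learn rejects them with a `ValueError`. The sketch below shows this on hypothetical Poisson-distributed counts rather than the Spambase data.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical non-negative, count-like features
X = rng.poisson(lam=2.0, size=(20, 3))
y = np.array([0, 1] * 10)

MultinomialNB().fit(X, y)  # fine: every value is >= 0

# Standardization shifts each feature to zero mean, producing negative values
X_scaled = StandardScaler().fit_transform(X)

raised = False
try:
    MultinomialNB().fit(X_scaled, y)
except ValueError as err:
    raised = True
    print("MultinomialNB rejected the scaled data:", err)
```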

### Implementation


```python
# Each variant with scikit-learn's default hyperparameters
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

def evaluate_classifier(classifier, X, y):
    """Return the mean accuracy over 10-fold cross-validation."""
    scores = cross_val_score(classifier, X, y, cv=10, scoring='accuracy')
    return scores.mean()

# Cross-validate on the training split only; the test split is reserved for the final evaluation
accuracy_bernoulli = evaluate_classifier(bernoulli_nb, X_train, y_train)
accuracy_multinomial = evaluate_classifier(multinomial_nb, X_train, y_train)
accuracy_gaussian = evaluate_classifier(gaussian_nb, X_train, y_train)

accuracy_bernoulli, accuracy_multinomial, accuracy_gaussian
```


```text
(0.8853260869565217, 0.7918478260869566, 0.8206521739130433)
```
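`cross_val_score` returns one score per fold, and passing an integer `cv` for a classifier with a binary or multiclass target makes scikit-learn use `StratifiedKFold` without shuffling, so each fold keeps roughly the original class balance. The sketch below checks that equivalence on synthetic data (a stand-in for the spam set, not the real data) and also reports the spread across folds, which the mean alone hides.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification problem
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

clf = GaussianNB()

# Integer cv on a classifier: stratified 10-fold split, no shuffling
scores_int = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
scores_explicit = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=10), scoring="accuracy")

print(np.allclose(scores_int, scores_explicit))  # prints True
print(f"mean={scores_int.mean():.3f}, std={scores_int.std():.3f}")
```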

### Results


```python
# Fit each model on the full training split
bernoulli_nb.fit(X_train, y_train)
multinomial_nb.fit(X_train, y_train)
gaussian_nb.fit(X_train, y_train)

# Predict on the held-out test split
y_pred_bernoulli = bernoulli_nb.predict(X_test)
y_pred_multinomial = multinomial_nb.predict(X_test)
y_pred_gaussian = gaussian_nb.predict(X_test)

def performance_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1, treating spam (label 1) as the positive class."""
    return {
        'Accuracy': accuracy_score(y_true, y_pred),
        'Precision': precision_score(y_true, y_pred),
        'Recall': recall_score(y_true, y_pred),
        'F1 Score': f1_score(y_true, y_pred)
    }

metrics_bernoulli = performance_metrics(y_test, y_pred_bernoulli)
metrics_multinomial = performance_metrics(y_test, y_pred_multinomial)
metrics_gaussian = performance_metrics(y_test, y_pred_gaussian)

metrics_bernoulli, metrics_multinomial, metrics_gaussian
```


```text
({'Accuracy': 0.8805646036916395,
  'Precision': 0.9069767441860465,
  'Recall': 0.8,
  'F1 Score': 0.8501362397820164},
 {'Accuracy': 0.7861020629750272,
  'Precision': 0.7643835616438356,
  'Recall': 0.7153846153846154,
  'F1 Score': 0.7390728476821192},
 {'Accuracy': 0.8208469055374593,
  'Precision': 0.7192982456140351,
  'Recall': 0.9461538461538461,
  'F1 Score': 0.8172757475083057})
```
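Reading these numbers is easier with the definitions in hand: precision is \( TP/(TP+FP) \) and recall is \( TP/(TP+FN) \), so a model with high recall but lower precision, like Gaussian NB here, catches most spam while also flagging more legitimate email. The sketch below derives the metrics from a confusion matrix on small hypothetical labels and checks them against scikit-learn's functions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Hypothetical labels and predictions (1 = spam, 0 = not spam)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1, 0, 1])

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # share of flagged messages that really are spam
recall = tp / (tp + fn)     # share of actual spam that was caught
f1 = 2 * precision * recall / (precision + recall)

print(tn, fp, fn, tp)  # prints 4 1 1 4
print(precision, recall, f1)
```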