## Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

The probability that an employee is a smoker given that he/she uses the health insurance plan can be calculated using the conditional probability formula. 

Let's denote:
- \( A \): The event that an employee uses the health insurance plan.
- \( B \): The event that an employee is a smoker.

The probability of an employee using the health insurance plan is \( P(A) = 0.70 \) (70%), and the probability of an employee who uses the plan being a smoker is \( P(B|A) = 0.40 \) (40%).

The conditional probability of being a smoker given the use of the health insurance plan is given by:

\[ P(B|A) = \frac{P(A \cap B)}{P(A)} \]

where \( P(A \cap B) \) is the probability of both events A and B occurring.

We know that \( P(B|A) = 0.40 \) and \( P(A) = 0.70 \). Rearranging the formula to solve for \( P(A \cap B) \), we get:

\[ P(A \cap B) = P(B|A) \cdot P(A) \]

Substitute the values:

\[ P(A \cap B) = 0.40 \cdot 0.70 \]

Now, calculate the result:

\[ P(A \cap B) = 0.28 \]

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is \( 0.28 \) or 28%.

## Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

Bernoulli Naive Bayes and Multinomial Naive Bayes are two variants of the Naive Bayes algorithm, and they differ in terms of the types of data they are designed to handle.

1. **Bernoulli Naive Bayes:**
   - **Features:** It is specifically designed for binary features, meaning features that can take on only two values (0 or 1).
   - **Use Case:** It is often used in text classification problems, where the presence or absence of words in a document is considered. Each feature represents the presence (1) or absence (0) of a term.
   - **Probability Model:** Assumes that features are binary variables and follows a Bernoulli distribution. The model considers only whether a particular feature is present or not, not the frequency of its occurrence.

2. **Multinomial Naive Bayes:**
   - **Features:** It is suitable for features that represent counts or frequencies. The features are non-negative integers, typically representing the number of occurrences of a term.
   - **Use Case:** Commonly used in text classification problems, especially when the frequency of words matters. It considers the frequency of each term in the document.
   - **Probability Model:** Assumes that features are generated from a multinomial distribution. It models the likelihood of observing a particular frequency of each term.

**Key Differences:**

- **Type of Features:**
  - Bernoulli Naive Bayes is suitable for binary features.
  - Multinomial Naive Bayes is suitable for features representing counts or frequencies.

- **Probability Distribution:**
  - Bernoulli Naive Bayes assumes a Bernoulli distribution for features.
  - Multinomial Naive Bayes assumes a multinomial distribution for features.

- **Representation of Data:**
  - In Bernoulli Naive Bayes, the focus is on the presence or absence of features (binary representation).
  - In Multinomial Naive Bayes, the focus is on the frequencies or counts of features.

- **Application:**
  - Bernoulli Naive Bayes is often used in text classification problems where the presence or absence of words is important.
  - Multinomial Naive Bayes is also used in text classification but takes into account the frequency of words.

Both variants make the "naive" assumption of feature independence, meaning that features are considered to be conditionally independent given the class label. The choice between Bernoulli and Multinomial Naive Bayes depends on the nature of the data and the problem at hand.

## Q3. How does Bernoulli Naive Bayes handle missing values?

In the context of Naive Bayes classifiers, including Bernoulli Naive Bayes, handling missing values is an important consideration. The presence of missing values in the dataset can affect the performance of the classifier, and various strategies can be employed to address this issue. Here are some common approaches:

1. **Ignoring Missing Values:**
   - One simple approach is to ignore instances with missing values during training and classification. This means that any instance with missing values in one or more features is excluded from consideration. While straightforward, this approach may lead to a loss of information.

2. **Imputation:**
   - Another strategy is to impute (fill in) missing values with some estimated values. Common imputation methods include replacing missing values with the mean, median, or mode of the observed values in the feature. However, for Bernoulli Naive Bayes, which deals with binary features, imputation might not be as straightforward.

3. **Treating Missing as a Separate Category:**
   - Instead of imputing missing values, you can treat missing values as a separate category or level for each feature. This approach is feasible for categorical or discrete features but may not be suitable for binary features in a Bernoulli Naive Bayes setting.

4. **Advanced Imputation Techniques:**
   - For more advanced scenarios, you may explore machine learning-based imputation techniques, such as using other classifiers or regression models to predict missing values based on the available information. This can be especially useful when dealing with complex relationships in the data.

The specific choice of how to handle missing values depends on the characteristics of the data and the problem at hand. It's important to carefully consider the implications of each approach and evaluate their impact on the performance of the classifier.

In the case of Bernoulli Naive Bayes, where features are binary, the handling of missing values should align with the nature of binary features. Ignoring missing values or treating them as a separate category are more straightforward options in this context. Imputation might require additional consideration and may not be as intuitive as in the case of continuous or categorical features.

## Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification. The Gaussian Naive Bayes classifier is an extension of the Naive Bayes algorithm that is designed for continuous features, assuming that the values of each feature are normally (Gaussian) distributed within each class. Despite the "Gaussian" assumption, it can still be applied to problems with more than two classes.

In the context of multi-class classification, the classifier assigns a class label to an instance based on the class that maximizes the posterior probability given the observed feature values. The decision rule for multi-class Gaussian Naive Bayes is typically based on comparing the posterior probabilities for each class.

The probability of an instance belonging to class \( C_i \) given its feature vector \( X \) is calculated using Bayes' theorem:

\[ P(C_i | X) \propto P(X | C_i) \cdot P(C_i) \]

Here:
- \( P(C_i | X) \) is the posterior probability of class \( C_i \) given the observed features.
- \( P(X | C_i) \) is the likelihood of the observed features given class \( C_i \), which is modeled as a multivariate Gaussian distribution.
- \( P(C_i) \) is the prior probability of class \( C_i \).

The classification decision is made by selecting the class with the highest posterior probability.

While Gaussian Naive Bayes is applicable to multi-class problems, it's important to note that its performance may be influenced by the assumption of feature independence, which is part of the "naive" aspect of the algorithm. Additionally, its effectiveness depends on the degree to which the feature distributions within each class approximate a multivariate Gaussian distribution. In practice, it is often used for problems where the Gaussian assumption is reasonable and computational efficiency is a priority.

Q5. Assignment:
Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message
is spam or not based on several input features.
Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.
Results:
Report the following performance metrics for each classifier:
Accuracy
Precision
Recall
F1 score
Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?
Conclusion:
Summarise your findings and provide some suggestions for future work.

Note: Create your assignment in Jupyter notebook and upload it to GitHub & share that github repository
link through your dashboard. Make sure the repository is public.
Note: This dataset contains a binary classification problem with multiple features. The dataset is
relatively small, but it can be used to demonstrate the performance of the different variants of Naive
Bayes on a real-world problem.

In [6]:
import pandas as pd

# Load the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
columns = [...]  # Specify column names based on the dataset documentation
data = pd.read_csv(url)


In [7]:
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Split the data into features (X) and target variable (y)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Implement and evaluate Bernoulli Naive Bayes
bernoulli_nb = BernoulliNB()
bernoulli_scores = cross_val_score(bernoulli_nb, X, y, cv=10)

# Implement and evaluate Multinomial Naive Bayes
multinomial_nb = MultinomialNB()
multinomial_scores = cross_val_score(multinomial_nb, X, y, cv=10)

# Implement and evaluate Gaussian Naive Bayes
gaussian_nb = GaussianNB()
gaussian_scores = cross_val_score(gaussian_nb, X, y, cv=10)


In [8]:
# Function to calculate performance metrics
def calculate_metrics(y_true, y_pred):
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    return accuracy, precision, recall, f1

# Calculate metrics for each classifier
bernoulli_metrics = calculate_metrics(y, cross_val_predict(bernoulli_nb, X, y, cv=10))
multinomial_metrics = calculate_metrics(y, cross_val_predict(multinomial_nb, X, y, cv=10))
gaussian_metrics = calculate_metrics(y, cross_val_predict(gaussian_nb, X, y, cv=10))

# Print the results
print("Bernoulli Naive Bayes Metrics:", bernoulli_metrics)
print("Multinomial Naive Bayes Metrics:", multinomial_metrics)
print("Gaussian Naive Bayes Metrics:", gaussian_metrics)


Bernoulli Naive Bayes Metrics: (0.8839130434782608, 0.8812649164677804, 0.815121412803532, 0.8469036697247706)
Multinomial Naive Bayes Metrics: (0.7860869565217391, 0.7320627802690582, 0.7207505518763797, 0.7263626251390434)
Gaussian Naive Bayes Metrics: (0.8217391304347826, 0.7003231017770598, 0.956953642384106, 0.808768656716418)
