Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

The quantity asked for is a conditional probability that the problem statement supplies directly, so Bayes' theorem is not needed. Let's denote the events as follows:

- Event A: An employee uses the company's health insurance plan.
- Event B: An employee is a smoker.

We are asked to find \( P(B|A) \), the probability that an employee is a smoker given that he/she uses the health insurance plan.

Given:
- \( P(A) \) = Probability that an employee uses the health insurance plan = 70% = 0.70
- \( P(B|A) \) = Probability that an employee is a smoker given that he/she uses the health insurance plan = 40% = 0.40, because the survey reports that 40% of the employees who use the plan are smokers

Note that the 40% figure is neither \( P(B) \), the overall share of smokers, nor \( P(A|B) \), the share of smokers who use the plan. Neither of those is given, so Bayes' theorem

\[ P(B|A) = \frac{P(A|B) \times P(B)}{P(A)} \]

cannot be evaluated from the survey figures, and it is not required here. What the two given figures do determine is the joint probability that an employee both uses the plan and smokes:

\[ P(A \cap B) = P(B|A) \times P(A) = 0.40 \times 0.70 = 0.28 \]

Dividing by \( P(A) \) recovers the answer from the definition of conditional probability:

\[ P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{0.28}{0.70} = 0.40 \]

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.40 or 40%.
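
The definition \( P(B|A) = P(A \cap B) / P(A) \) can be checked by simple counting. The sketch below assumes a hypothetical workforce of 1,000 employees (the size is an illustration only, not part of the question) and applies the two survey percentages to it.

```python
# Hypothetical workforce size (an assumption; any size gives the same ratio).
total_employees = 1000

# 70% of all employees use the health insurance plan.
plan_users = total_employees * 70 // 100

# 40% of the plan users are smokers.
smoking_plan_users = plan_users * 40 // 100

# Joint probability of "uses the plan AND smokes" over the whole workforce.
p_a_and_b = smoking_plan_users / total_employees

# Conditional probability: restrict attention to plan users only.
p_b_given_a = smoking_plan_users / plan_users

print(f"P(A and B) = {p_a_and_b:.2f}")
print(f"P(B | A)   = {p_b_given_a:.2f}")
```

Restricting the count to plan users is exactly what conditioning on A means, which is why the survey's 40% figure is already the requested probability.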

Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

The main difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in the type of features they are designed to handle and the underlying probability distributions they assume for those features:

1. **Bernoulli Naive Bayes**:
   - **Features**: Bernoulli Naive Bayes is typically used for binary feature data, where each feature can take on one of two possible values (e.g., presence or absence of a term in a document).
   - **Probability Distribution**: It assumes that each feature follows a Bernoulli distribution, which is a discrete probability distribution for a binary random variable (0 or 1).
   - **Example**: Bernoulli Naive Bayes is commonly used in text classification tasks, such as sentiment analysis or spam detection, where the presence or absence of certain words or features is used to classify documents.

2. **Multinomial Naive Bayes**:
   - **Features**: Multinomial Naive Bayes is suitable for discrete count data, where each feature represents the number of times a term occurs in a document or sample (e.g., word counts in a document). In practice it is also applied to fractional weights such as TF-IDF values.
   - **Probability Distribution**: It assumes that, within each class, the feature vector as a whole is drawn from a multinomial distribution, which is a generalization of the binomial distribution to more than two possible outcomes.
   - **Example**: Multinomial Naive Bayes is commonly used in text classification tasks, such as document categorization or topic classification, where the frequency of words or features in documents is used for classification.

In summary, Bernoulli Naive Bayes models whether each feature is present or absent, and it explicitly counts the absence of a feature as evidence. Multinomial Naive Bayes models how often each feature occurs, and a feature that does not occur simply contributes nothing to the likelihood. The choice between the two depends on whether occurrence counts carry useful information beyond mere presence in a given classification task.
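
The difference can be illustrated with scikit-learn on a toy term-count matrix. The data below is made up for illustration. The sketch relies on `BernoulliNB`'s default `binarize=0.0`, which maps every value greater than zero to 1, so that model sees only presence or absence, while `MultinomialNB` uses the counts themselves.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy term-count matrix: rows are documents, columns are terms (made-up values).
X_counts = np.array([
    [3, 0, 1, 0],
    [2, 0, 0, 1],
    [0, 4, 0, 2],
    [0, 1, 0, 3],
])
y = np.array([0, 0, 1, 1])

# Presence/absence version of the same matrix.
X_binary = (X_counts > 0).astype(int)

# BernoulliNB binarizes its input, so counts and 0/1 indicators give the same model.
bnb_counts = BernoulliNB().fit(X_counts, y)
bnb_binary = BernoulliNB().fit(X_binary, y)
same_bernoulli = np.allclose(
    bnb_counts.predict_proba(X_counts), bnb_binary.predict_proba(X_binary)
)

# MultinomialNB uses the magnitudes, so its per-class term probabilities change.
mnb_counts = MultinomialNB().fit(X_counts, y)
mnb_binary = MultinomialNB().fit(X_binary, y)
multinomial_changes = not np.allclose(
    mnb_counts.feature_log_prob_, mnb_binary.feature_log_prob_
)

print("BernoulliNB unaffected by count magnitudes:", same_bernoulli)
print("MultinomialNB affected by count magnitudes:", multinomial_changes)
```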

Q3. How does Bernoulli Naive Bayes handle missing values?

The Bernoulli Naive Bayes model has no built-in mechanism for missing values: every feature is expected to be either 1 (present) or 0 (absent). In scikit-learn, `BernoulliNB` raises an error if the input contains NaN, so missing values have to be dealt with before training and prediction.

A common workaround is to encode a missing value as 0, that is, to treat the feature as absent. This is reasonable when "missing" really means "not observed in the sample" (for example, a word that does not appear in a document). It is not a neutral choice, however: Bernoulli Naive Bayes uses the absence of a feature as evidence through the \( 1 - P(x_i = 1 \mid y) \) term, so a value encoded as 0 actively shifts the class probabilities.

An alternative that follows from the conditional independence assumption is to leave the missing feature out of the likelihood product for that sample, both when estimating the per-class feature probabilities during training and when scoring a new instance. This has to be implemented manually, since `BernoulliNB` does not provide it.

In summary, Bernoulli Naive Bayes does not handle missing values on its own. They must be imputed, encoded as absent, or skipped explicitly, and the choice should reflect what a missing value actually means in the dataset.
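
A minimal sketch of one way to prepare data with missing entries for scikit-learn's `BernoulliNB`, assuming that "missing" should be read as "absent". The toy matrix and the choice of a constant-zero `SimpleImputer` are illustrative assumptions, not a recommendation for every dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy binary feature matrix with two missing entries (made-up values).
X = np.array([
    [1, 0, 1],
    [0, np.nan, 1],
    [1, 1, 0],
    [0, 0, np.nan],
], dtype=float)
y = np.array([1, 0, 1, 0])

# Fitting directly on data that contains NaN is rejected with a ValueError.
try:
    BernoulliNB().fit(X, y)
    nan_rejected = False
except ValueError:
    nan_rejected = True

# Assumption: a missing value means the feature is absent, so fill with 0.
model = make_pipeline(
    SimpleImputer(strategy="constant", fill_value=0),
    BernoulliNB(),
)
model.fit(X, y)
predictions = model.predict(X)

print("NaN input rejected:", nan_rejected)
print("Predictions after imputing 0:", predictions)
```

Putting the imputer inside the pipeline means the same encoding is applied at training and prediction time.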

Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification tasks. Gaussian Naive Bayes is a variant of the Naive Bayes algorithm that assumes that continuous features follow a Gaussian (normal) distribution. It is commonly used when the features are continuous and normally distributed.

In multi-class classification, there are more than two classes to predict. Gaussian Naive Bayes needs no special adaptation for this case: the model keeps one prior and one set of per-feature Gaussian parameters for every class, however many classes there are. For a given instance, the algorithm evaluates the probability density function (PDF) of the Gaussian distribution for each feature in each class, multiplies the densities by the class prior, and predicts the class with the highest resulting posterior.

During training, Gaussian Naive Bayes estimates the mean and variance of each feature for each class from the training data. Then, during classification, it computes the probability of each class given the observed feature values using Bayes' theorem and the Gaussian probability density function.

In summary, Gaussian Naive Bayes can be used for both binary and multi-class classification tasks, making it a versatile algorithm for a wide range of machine learning problems. However, it assumes that the continuous features in the dataset are normally distributed, so it may not perform well if this assumption is violated.
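
As an illustration, the sketch below fits scikit-learn's `GaussianNB` on the three-class Iris dataset that ships with the library. The dataset is chosen only for convenience and is not part of this assignment.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has 4 continuous features and 3 classes.
X, y = load_iris(return_X_y=True)

clf = GaussianNB().fit(X, y)

# One row of per-feature means is estimated for each class.
n_classes = len(clf.classes_)
class_means_shape = clf.theta_.shape

# Posterior probabilities over all classes for the first five samples.
proba = clf.predict_proba(X[:5])

print("Number of classes:", n_classes)
print("Shape of per-class feature means:", class_means_shape)
print("Posterior shape for 5 samples:", proba.shape)
```

No multi-class wrapper such as one-vs-rest is involved; the same fit call covers two or more classes.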

Q5. Assignment:

Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains numeric features extracted from email messages (word and character frequencies and statistics on runs of capital letters), where the goal is to predict whether a message is spam or not based on these input features.

In [1]:
import pandas as pd

# URL of the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"

# Column names for the dataset
column_names = [
    "word_freq_make", "word_freq_address", "word_freq_all", "word_freq_3d", "word_freq_our",
    "word_freq_over", "word_freq_remove", "word_freq_internet", "word_freq_order", "word_freq_mail",
    "word_freq_receive", "word_freq_will", "word_freq_people", "word_freq_report", "word_freq_addresses",
    "word_freq_free", "word_freq_business", "word_freq_email", "word_freq_you", "word_freq_credit",
    "word_freq_your", "word_freq_font", "word_freq_000", "word_freq_money", "word_freq_hp", "word_freq_hpl",
    "word_freq_george", "word_freq_650", "word_freq_lab", "word_freq_labs", "word_freq_telnet", "word_freq_857",
    "word_freq_data", "word_freq_415", "word_freq_85", "word_freq_technology", "word_freq_1999", "word_freq_parts",
    "word_freq_pm", "word_freq_direct", "word_freq_cs", "word_freq_meeting", "word_freq_original", "word_freq_project",
    "word_freq_re", "word_freq_edu", "word_freq_table", "word_freq_conference", "char_freq_;", "char_freq_(",
    "char_freq_[", "char_freq_!", "char_freq_$", "char_freq_#", "capital_run_length_average", "capital_run_length_longest",
    "capital_run_length_total", "is_spam"
]

# Read the dataset into a DataFrame
df = pd.read_csv(url, header=None, names=column_names)

# Display the first few rows of the DataFrame
print(df.head())

   word_freq_make  word_freq_address  word_freq_all  word_freq_3d  \
0            0.00               0.64           0.64           0.0   
1            0.21               0.28           0.50           0.0   
2            0.06               0.00           0.71           0.0   
3            0.00               0.00           0.00           0.0   
4            0.00               0.00           0.00           0.0   

   word_freq_our  word_freq_over  word_freq_remove  word_freq_internet  \
0           0.32            0.00              0.00                0.00   
1           0.14            0.28              0.21                0.07   
2           1.23            0.19              0.19                0.12   
3           0.63            0.00              0.31                0.63   
4           0.63            0.00              0.31                0.63   

   word_freq_order  word_freq_mail  ...  char_freq_;  char_freq_(  \
0             0.00            0.00  ...         0.00        0.000   
1 

Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.

In [2]:
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
# Split features and target variable
X = df.drop(columns=["is_spam"])
y = df["is_spam"]

# Instantiate the Naive Bayes classifiers
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

# Perform 10-fold cross-validation and calculate mean accuracy
scores_bernoulli = cross_val_score(bernoulli_nb, X, y, cv=10, scoring='accuracy')
scores_multinomial = cross_val_score(multinomial_nb, X, y, cv=10, scoring='accuracy')
scores_gaussian = cross_val_score(gaussian_nb, X, y, cv=10, scoring='accuracy')

# Print mean accuracy for each classifier
print("Mean Accuracy (Bernoulli Naive Bayes):", scores_bernoulli.mean())
print("Mean Accuracy (Multinomial Naive Bayes):", scores_multinomial.mean())
print("Mean Accuracy (Gaussian Naive Bayes):", scores_gaussian.mean())

Mean Accuracy (Bernoulli Naive Bayes): 0.8839380364047911
Mean Accuracy (Multinomial Naive Bayes): 0.7863496180326323
Mean Accuracy (Gaussian Naive Bayes): 0.8217730830896915


Results:
Report the following performance metrics for each classifier:
Accuracy
Precision
Recall
F1 score

In [3]:
from sklearn.metrics import classification_report

# Define a function to calculate and print performance metrics
def print_metrics(y_true, y_pred, classifier_name):
    print("Performance metrics for", classifier_name)
    print(classification_report(y_true, y_pred))

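# Fit each classifier on the full dataset and report metrics on that same data.
# Note: these are in-sample (training-set) metrics, so they are optimistic
# compared with the 10-fold cross-validated accuracy computed above.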
print_metrics(y, bernoulli_nb.fit(X, y).predict(X), "Bernoulli Naive Bayes")
print_metrics(y, multinomial_nb.fit(X, y).predict(X), "Multinomial Naive Bayes")
print_metrics(y, gaussian_nb.fit(X, y).predict(X), "Gaussian Naive Bayes")

Performance metrics for Bernoulli Naive Bayes
              precision    recall  f1-score   support

           0       0.89      0.93      0.91      2788
           1       0.89      0.82      0.85      1813

    accuracy                           0.89      4601
   macro avg       0.89      0.87      0.88      4601
weighted avg       0.89      0.89      0.88      4601

Performance metrics for Multinomial Naive Bayes
              precision    recall  f1-score   support

           0       0.82      0.84      0.83      2788
           1       0.74      0.72      0.73      1813

    accuracy                           0.79      4601
   macro avg       0.78      0.78      0.78      4601
weighted avg       0.79      0.79      0.79      4601

Performance metrics for Gaussian Naive Bayes
              precision    recall  f1-score   support

           0       0.97      0.73      0.83      2788
           1       0.70      0.96      0.81      1813

    accuracy                           0.82
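
The reports above are computed on the same data the models were fitted on. A possible way to obtain all four metrics from cross-validation instead is sketched below. The helper name `cross_validated_metrics`, the shuffled `StratifiedKFold`, and the synthetic stand-in data are assumptions made for illustration. With the notebook's data, the call would be `cross_validated_metrics(X, y)`.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB


def cross_validated_metrics(X, y, n_splits=10, seed=0):
    """Return mean accuracy, precision, recall and F1 per Naive Bayes variant."""
    # Shuffled, stratified folds (an assumption; cv=10 uses unshuffled folds).
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scoring = ["accuracy", "precision", "recall", "f1"]
    models = {
        "Bernoulli": BernoulliNB(),
        "Multinomial": MultinomialNB(),
        "Gaussian": GaussianNB(),
    }
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        # cross_validate stores each metric under the key "test_<metric>".
        results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}
    return results


# Synthetic non-negative stand-in data, used only so the sketch runs offline.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=200)
X_demo = rng.poisson(lam=1.0 + 2.0 * y_demo[:, None], size=(200, 5))

demo_results = cross_validated_metrics(X_demo, y_demo)
for name, metrics in demo_results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Precision, recall and F1 here refer to the positive class (label 1), which is the spam class in Spambase.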

Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?

Based on the 10-fold cross-validated accuracy and the classification reports above, we can discuss the performance of each variant of Naive Bayes:

1. **Bernoulli Naive Bayes**:
   - **Performance**: Best of the three, with a mean cross-validated accuracy of about 0.884. In the classification report, precision is 0.89 for both classes, with recall of 0.93 for non-spam and 0.82 for spam.
   - **Observation**: With its default `binarize=0.0` setting, `BernoulliNB` reduces every feature to present (greater than zero) or absent. Most Spambase features are word and character frequencies, and the sample rows shown above contain many zeros, so whether a word such as "free" or "remove" appears at all likely carries much of the signal.
   - **Limitations**: Binarizing discards how often a word occurs. The capital-run-length features are almost always positive, so they become nearly constant after binarization and contribute little.

2. **Multinomial Naive Bayes**:
   - **Performance**: Worst of the three, with a mean cross-validated accuracy of about 0.786 and spam precision and recall of 0.74 and 0.72.
   - **Observation**: Multinomial Naive Bayes is designed for counts, whereas the Spambase word and character features are percentages. The three capital-run-length features are on a much larger scale than the frequency features, so they probably dominate the likelihood.
   - **Limitations**: The model assumes multinomially distributed, conditionally independent features and is sensitive to mixing features with very different scales.

3. **Gaussian Naive Bayes**:
   - **Performance**: In between, with a mean cross-validated accuracy of about 0.822. Its errors are lopsided: spam recall is 0.96 but spam precision is only 0.70, and non-spam recall is 0.73, so it labels many legitimate emails as spam.
   - **Observation**: The frequency features are non-negative, contain many zeros, and are strongly skewed, which is far from the bell-shaped distribution the model assumes for each class.
   - **Limitations**: Gaussian Naive Bayes assumes that features are continuous and follow a Gaussian distribution within each class, and its performance suffers when that assumption is badly violated, as it appears to be here.

Overall, Bernoulli Naive Bayes performed best on this dataset, most likely because the presence or absence of particular words is a robust signal for spam and binarization sidesteps the skewed, differently scaled feature values that hurt the other two variants. Two caveats apply. First, the classification reports were computed on the training data, so the cross-validated accuracies are the fairer basis for comparison. Second, all three variants assume conditionally independent features, which is unlikely to hold for word frequencies in email.

Conclusion:
Summarise your findings and provide some suggestions for future work.

In conclusion, we explored three variants of Naive Bayes classifiers (Bernoulli, Multinomial, and Gaussian) and evaluated their performance on the "Spambase Data Set" using 10-fold cross-validation. Here are the key findings:

1. **Performance Comparison**: Mean 10-fold cross-validated accuracy was about 0.884 for Bernoulli Naive Bayes, 0.822 for Gaussian Naive Bayes, and 0.786 for Multinomial Naive Bayes.
2. **Best Performing Variant**: Bernoulli Naive Bayes performed best, with balanced precision and recall. Reducing the features to presence or absence appears to suit Spambase better than modelling the skewed frequency values as Gaussian or treating them as counts. Gaussian Naive Bayes caught most spam (recall 0.96) at the cost of many false positives (precision 0.70).
3. **Limitations**: Naive Bayes classifiers have certain limitations, such as the assumption of feature independence and the requirement for features to follow specific distributions, which may not always hold true in real-world datasets.
4. **Future Work Suggestions**:
   - Investigate feature engineering techniques to enhance the performance of Naive Bayes classifiers by transforming features or creating new ones that better capture the underlying patterns in the data.
   - Explore ensemble methods that combine multiple Naive Bayes classifiers or integrate them with other machine learning algorithms to improve overall predictive performance.
   - Evaluate Naive Bayes classifiers on diverse datasets with varying characteristics to gain insights into their strengths and weaknesses across different domains.
   - Investigate techniques for handling imbalanced datasets, as Naive Bayes classifiers may struggle with class imbalances and biased predictions.
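
As one example of the feature engineering suggested above, skewed non-negative features can be log-transformed before Gaussian Naive Bayes. The sketch below is an untested idea for this dataset: the `log1p` transform, the helper name `make_log_gaussian_nb`, and the synthetic stand-in data are all assumptions, and whether the transform helps on Spambase would have to be checked with cross-validation.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def make_log_gaussian_nb():
    """Gaussian Naive Bayes preceded by a log(1 + x) transform of every feature."""
    # log1p is defined at zero, which matters for features with many zeros.
    return make_pipeline(FunctionTransformer(np.log1p), GaussianNB())


# Synthetic skewed, non-negative stand-in data so the sketch runs offline.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=300)
X_demo = rng.exponential(scale=1.0 + 3.0 * y_demo[:, None], size=(300, 4))

model = make_log_gaussian_nb().fit(X_demo, y_demo)
demo_predictions = model.predict(X_demo)
demo_accuracy = model.score(X_demo, y_demo)

print("In-sample accuracy on synthetic data:", round(demo_accuracy, 3))
```

Because the transform sits inside the pipeline, passing `make_log_gaussian_nb()` to `cross_val_score` would keep the evaluation procedure identical to the one used for the other classifiers.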

Overall, Naive Bayes classifiers offer simplicity, efficiency, and interpretability, making them suitable for a wide range of classification tasks. However, understanding their limitations and exploring strategies to mitigate them can lead to more effective and robust machine learning models.