In [1]:
# Answer 1

# Using Bayes' theorem, P(A | B) = P(B | A) ⋅ P(A) / P(B):
#  => P(smoker | health plan) = P(health plan | smoker) ⋅ P(smoker) / P(health plan)
#  => 0.40 ⋅ p / 0.70 ≈ 0.571 ⋅ p

# Here p = P(smoker) is the prior probability that an employee smokes. It is not given, so the exact probability cannot be computed without it.
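
# A minimal sketch of the same calculation in code. The function name `posterior` and the trial values of p are illustrative assumptions; only 0.40 and 0.70 come from the answer above.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# 0.40 and 0.70 are the figures used above; p = P(smoker) is not given,
# so the values tried here are purely hypothetical.
for p in (0.10, 0.25, 0.50):
    result = posterior(0.40, p, 0.70)
    print(f"p = {p:.2f} -> P(smoker | health plan) = {result:.4f}")
```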

In [2]:
# Answer 2

# Bernoulli Naive Bayes and Multinomial Naive Bayes are both variants of the Naive Bayes algorithm that are widely used for text classification. They differ in the kind of feature values they model:

# Bernoulli Naive Bayes:

# Type of Features: Bernoulli Naive Bayes is used when features are binary or boolean, meaning they are either present (1) or absent (0).
# Assumption: It assumes that features are conditionally independent given the class label.
# Application: It's commonly used for text classification tasks where features represent the presence or absence of specific words in a document (binary bag-of-words representation).
# Example: In email classification, each feature could represent the presence or absence of a particular word in the email.
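# Formula: Each feature is modeled as a Bernoulli variable. The likelihood uses P(x_i = 1 | class) for a feature that is present and 1 - P(x_i = 1 | class) for a feature that is absent, so the absence of a word also counts as evidence.

# Multinomial Naive Bayes: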

# Type of Features: Multinomial Naive Bayes is used when features are discrete and represent counts or frequencies. These could be the counts of words or tokens in a document.
# Assumption: Similar to Bernoulli Naive Bayes, it assumes that features are conditionally independent given the class label.
# Application: It's commonly used for text classification tasks where features represent word frequencies or counts (integer values).
# Example: In email classification, each feature could represent the frequency of a word in the email.
# Formula: The likelihood of a document is proportional to the product, over all words, of P(word i | class) raised to the count of word i. Words that do not occur in the document contribute nothing.
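
# The two likelihoods can be sketched side by side as follows. The vocabulary size, the counts, and the class parameters (`p_spam`, `theta_spam`) are made-up values for illustration, and the function names are hypothetical.

```python
import math

def bernoulli_log_likelihood(x, p):
    """log P(x | class), where x[i] is 0/1 and p[i] = P(word i present | class)."""
    return sum(math.log(pi if xi else 1.0 - pi) for xi, pi in zip(x, p))

def multinomial_log_likelihood(counts, theta):
    """log P(counts | class) up to a constant that does not depend on the class.

    theta[i] = P(a token is word i | class); words with count 0 add nothing.
    """
    return sum(c * math.log(t) for c, t in zip(counts, theta))

# Hypothetical document over a three-word vocabulary.
counts = [3, 0, 1]                              # how often each word occurs
present = [1 if c > 0 else 0 for c in counts]   # binary bag-of-words view

p_spam = [0.8, 0.6, 0.3]       # P(word present | spam), illustrative
theta_spam = [0.5, 0.3, 0.2]   # P(token is word | spam), sums to 1

# The Bernoulli model penalises the absent second word via log(1 - 0.6);
# the multinomial model ignores it but weights the first word three times.
print(bernoulli_log_likelihood(present, p_spam))
print(multinomial_log_likelihood(counts, theta_spam))
```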


In [3]:
# Answer 3

# Bernoulli Naive Bayes, like other variants of the Naive Bayes algorithm, is designed to work with complete data where all features are observed. It does not handle missing values on its own, so they have to be dealt with before training. Common options are:

# Imputation with Defaults: One simple approach is to impute missing values with a default value that makes sense in the context of binary features. For example, you could replace missing values with 0 (absence) if the feature represents the presence or absence of a certain attribute.

# Use a Separate Category: Bernoulli Naive Bayes expects binary features, so a third value such as M (missing) cannot be stored in the feature itself. Instead, add a second binary indicator feature that is 1 when the original value was missing and 0 otherwise, and fill the original feature with a default value.

# Drop Missing Values: If only a small proportion of values is missing and they are missing at random, you can drop the affected instances from the dataset. This discards data, and it can bias the model when the values are not missing at random.

# Advanced Imputation Techniques: If you have a significant amount of missing data, you could use more advanced imputation techniques like k-Nearest Neighbors imputation or matrix factorization methods.
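
# A small sketch of default imputation combined with missing-value indicators, so that every feature stays binary. The data and the helper name `impute_with_indicators` are hypothetical, and None is assumed to mark a missing entry.

```python
# Hypothetical binary feature rows; None marks a missing entry.
rows = [
    [1, None, 0],
    [0, 1, None],
    [1, 0, 1],
]

def impute_with_indicators(row, default=0):
    """Fill missing values with `default` and append one 0/1 'was missing' flag per feature."""
    filled = [default if value is None else value for value in row]
    flags = [1 if value is None else 0 for value in row]
    return filled + flags

augmented = [impute_with_indicators(row) for row in rows]
for row in augmented:
    print(row)
```

# Each row now has twice as many columns: the original features with missing entries set to 0, followed by the indicator flags.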

In [4]:
# Answer 4

# Yes, Gaussian Naive Bayes can be used for multi-class classification. Naive Bayes computes a posterior probability for every class and predicts the class with the largest one, so it handles any number of classes without modification.

# In the context of multi-class classification, Gaussian Naive Bayes calculates the likelihood of each class using a Gaussian distribution (normal distribution) for each feature. The assumption is that each class's feature values follow a Gaussian distribution.

# Here's how Gaussian Naive Bayes works for multi-class classification:

# Training Phase:

# Calculate the mean and standard deviation of each feature for each class.
# Estimate the prior probability of each class from its relative frequency in the training data.

# Prediction Phase:

# Given a new instance, evaluate the Gaussian probability density function (PDF) of each feature value under each class, using that class's mean and standard deviation.
# Multiply the per-feature PDFs by the class prior, following Bayes' theorem, to obtain a score proportional to the posterior probability of each class.
# Select the class with the highest posterior probability as the predicted class.

# The Gaussian Naive Bayes algorithm is mathematically straightforward and can be effective when the features are approximately normally distributed within each class. However, keep in mind that it assumes that all features are continuous and follow a Gaussian distribution within each class.
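
# The training and prediction phases can be sketched from scratch as follows. This is an illustrative implementation, not scikit-learn's: the function names, the toy data, and the small `var_floor` added to each variance (to avoid division by zero) are all assumptions. Log probabilities are used instead of a raw product of PDFs to avoid numerical underflow.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Training: estimate each class's prior and per-feature mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / n + var_floor
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def log_posterior(model, x):
    """Prediction: log prior plus summed log Gaussian PDFs, per class (unnormalised)."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        log_pdf = sum(-0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
                      for xi, m, var in zip(x, means, variances))
        scores[label] = math.log(prior) + log_pdf
    return scores

def predict(model, x):
    """Return the class with the highest posterior score."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)

# Made-up toy data: three classes, two continuous features.
X = [[1.0, 1.2], [0.8, 1.0], [1.2, 0.9],
     [5.0, 5.1], [5.2, 4.8], [4.9, 5.3],
     [9.0, 1.0], [9.2, 1.1], [8.8, 0.9]]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

model = fit_gaussian_nb(X, y)
print(predict(model, [5.1, 5.0]))
```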



In [1]:
# Answer 5

import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

# Load the dataset from the UCI repository (the file has no header row)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
data = pd.read_csv(url, header=None)

# Split the dataset into features (X) and target labels (y); the last column is the class label
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Instantiate classifiers
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

# Metrics to report: display name -> scikit-learn scorer name
SCORING = {"Accuracy": "accuracy", "Precision": "precision", "Recall": "recall", "F1 Score": "f1"}

# Function to evaluate classifiers and print performance metrics
def evaluate_classifier(classifier, X, y):
    """Print the mean of each metric over one 10-fold cross-validation run."""
    # cross_validate fits each fold once and scores all metrics on it,
    # instead of repeating the whole cross-validation for every metric.
    scores = cross_validate(classifier, X, y, cv=10, scoring=SCORING)
    for name in SCORING:
        mean_score = scores["test_" + name].mean()
        print(f"{name}: {mean_score:.4f}")
    print()

# Evaluate each classifier
print("Bernoulli Naive Bayes:")
evaluate_classifier(bernoulli_nb, X, y)

print("Multinomial Naive Bayes:")
evaluate_classifier(multinomial_nb, X, y)

print("Gaussian Naive Bayes:")
evaluate_classifier(gaussian_nb, X, y)


Bernoulli Naive Bayes:
Accuracy: 0.8839
Precision: 0.8870
Recall: 0.8152
F1 Score: 0.8481

Multinomial Naive Bayes:
Accuracy: 0.7863
Precision: 0.7393
Recall: 0.7215
F1 Score: 0.7283

Gaussian Naive Bayes:
Accuracy: 0.8218
Precision: 0.7104
Recall: 0.9570
F1 Score: 0.8131



In [None]:
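# Based on these results, Bernoulli Naive Bayes performs best overall on this dataset, with the highest accuracy (0.8839), precision (0.8870), and F1 score (0.8481). Gaussian Naive Bayes has the highest recall (0.9570) but the lowest precision (0.7104), so it catches most of the positive class at the cost of more false positives. Multinomial Naive Bayes has the lowest accuracy (0.7863), recall (0.7215), and F1 score (0.7283). These differences are likely due to the characteristics of the dataset and the assumptions made by each variant.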

# Conclusion:
    
# In conclusion, the choice of the best variant of Naive Bayes depends on the nature of the dataset and the problem you are trying to solve. In summary:

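# Bernoulli Naive Bayes: This variant is suitable when dealing with binary features, such as the presence or absence of certain words in text classification tasks. It assumes features are conditionally independent given the class label. Note that scikit-learn's BernoulliNB binarizes its input at a threshold of 0.0 by default, so the features above were reduced to zero versus non-zero before fitting.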

# Multinomial Naive Bayes: This variant is useful for discrete features that represent counts or frequencies. It's commonly used for text classification where features represent word frequencies. It also assumes feature independence.

# Gaussian Naive Bayes: This variant is intended for continuous features that follow a Gaussian distribution. It might perform well if features exhibit a normal distribution within classes.