In [None]:
""" Q1. A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
probability that an employee is a smoker given that he/she uses the health insurance plan? """

# ans
""" No calculation beyond reading the problem statement is needed here. Let:

A = Employee is a smoker.
B = Employee uses the health insurance plan.

You are given:

P(B) = 0.70 (probability that an employee uses the health insurance plan).
P(A|B) = 0.40 (probability that an employee is a smoker given that they use the health insurance plan).

The question asks for P(A|B), the probability that an employee is a smoker given that he/she uses the plan, and this is stated directly: P(A|B) = 0.40.

As a check, the joint probability is P(A and B) = P(A|B) * P(B) = 0.40 * 0.70 = 0.28, and dividing it by P(B) recovers P(A|B) = 0.28 / 0.70 = 0.40.

Bayes' theorem would only be needed for the reverse quantity, P(B|A), the probability that an employee uses the plan given that they are a smoker:

P(B|A) = [P(A|B) * P(B)] / P(A)

where, by the law of total probability,

P(A) = P(A|B) * P(B) + P(A|~B) * P(~B)

Here, ~B represents an employee not using the health insurance plan. Computing P(B|A) requires P(A|~B), the share of smokers among non-users, which the problem does not provide. """
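
As a quick numeric check of Q1, the sketch below plugs in the two figures from the problem statement. The variable names are illustrative and not part of the original notebook; it only confirms that the conditional probability asked for is the given 0.40 and that the joint probability of being a plan user and a smoker is 0.28.

```python
# Figures taken from the problem statement
p_plan = 0.70               # P(B): employee uses the health insurance plan
p_smoker_given_plan = 0.40  # P(A|B): employee is a smoker, given plan use

# Joint probability P(A and B) = P(A|B) * P(B)
p_smoker_and_plan = p_smoker_given_plan * p_plan

# Dividing the joint by P(B) recovers the conditional asked for in Q1
p_recovered = p_smoker_and_plan / p_plan

print(f"P(A and B) = {p_smoker_and_plan:.2f}")
print(f"P(A|B)     = {p_recovered:.2f}")
```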

In [None]:
""" Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes? """

# ans
""" Bernoulli Naive Bayes and Multinomial Naive Bayes are two variants of the Naive Bayes classifier used for different types of data:

Bernoulli Naive Bayes is suitable for binary data, where features represent the presence (1) or absence (0) of specific attributes. It's commonly used for text classification problems, where each feature represents the presence or absence of a word in a document.

Multinomial Naive Bayes is designed for discrete data, such as text data where features are integer counts, representing how many times each term appears in a document. It's used for tasks like text classification, spam detection, and sentiment analysis.

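The key difference is the event model. Bernoulli Naive Bayes models each feature as a yes/no variable, so the absence of a word explicitly counts as evidence. Multinomial Naive Bayes models how often each feature occurs, so only the terms that actually appear in a document contribute to its likelihood. """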
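
The difference can be seen on a tiny example. The sketch below fits both classifiers on a made-up term-count matrix (the data and variable names are assumptions for illustration only) and inspects the `feature_count_` attribute that scikit-learn exposes after fitting: MultinomialNB accumulates the raw counts per class, while BernoulliNB, with its default `binarize=0.0`, only counts in how many documents each term is present.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy term-count matrix (rows: documents, columns: terms); invented for illustration
X_counts = np.array([[3, 0, 1],
                     [2, 0, 0],
                     [0, 4, 1],
                     [0, 1, 2]])
y_toy = np.array([0, 0, 1, 1])

mnb_toy = MultinomialNB().fit(X_counts, y_toy)
bnb_toy = BernoulliNB().fit(X_counts, y_toy)  # default binarize=0.0 maps counts > 0 to 1

# Per-class totals of raw counts vs. per-class number of documents containing the term
print("Multinomial feature counts:\n", mnb_toy.feature_count_)
print("Bernoulli feature counts:\n", bnb_toy.feature_count_)

# Fitting BernoulliNB on explicitly binarized data gives the same model
X_binary = (X_counts > 0).astype(int)
bnb_binary = BernoulliNB().fit(X_binary, y_toy)
same_probabilities = np.allclose(bnb_toy.predict_proba(X_counts),
                                 bnb_binary.predict_proba(X_binary))
print("Same predictions after explicit binarization:", same_probabilities)
```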

In [None]:
""" Q3. How does Bernoulli Naive Bayes handle missing values? """

# ans
""" Bernoulli Naive Bayes has no built-in mechanism for missing values. In practice a missing binary feature is usually treated as absent: in text data, a term with no recorded value in a document is encoded as 0 in the binary feature vector. Note that a word simply not occurring in a document is an observed 0, not a missing value. Implementations such as scikit-learn's BernoulliNB expect a complete numeric matrix, so genuinely missing entries have to be imputed (for example with 0 or the most frequent value), or the affected rows dropped, before fitting. Whether encoding missing as 0 is suitable depends on the specific problem and dataset, since it treats an unknown value the same as an absent one. """
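
One way to apply the "treat missing as absent" idea with scikit-learn is to impute before the classifier. The sketch below is a minimal example under that assumption: the toy matrix is invented, and filling with 0 is just one possible choice, not something the original answer prescribes.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy binary feature matrix with some missing entries (np.nan); invented for illustration
X_missing = np.array([[1.0, 0.0, np.nan],
                      [1.0, np.nan, 0.0],
                      [0.0, 1.0, 1.0],
                      [np.nan, 1.0, 1.0]])
y_missing = np.array([0, 0, 1, 1])

# Replace every missing entry with 0 ("feature absent"), then fit Bernoulli Naive Bayes
model = make_pipeline(
    SimpleImputer(strategy="constant", fill_value=0.0),
    BernoulliNB(),
)
model.fit(X_missing, y_missing)

preds = model.predict(X_missing)
print(preds)
```

Swapping `strategy="constant"` for `strategy="most_frequent"` would impute each feature's most common value instead.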

In [None]:
""" Q4. Can Gaussian Naive Bayes be used for multi-class classification? """

# ans
""" Yes. Gaussian Naive Bayes handles multi-class classification out of the box. It is intended for continuous features: for each class it estimates a prior and a per-feature mean and variance, then predicts the class with the highest posterior probability. Nothing in this procedure is limited to two classes, so one-vs-all (OvA) or one-vs-one (OvO) wrappers are not needed. scikit-learn's GaussianNB, for example, accepts any number of class labels directly. """
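
To see how scikit-learn's GaussianNB behaves on more than two classes, the sketch below fits it on the three-class Iris dataset that ships with scikit-learn. The dataset choice and variable names are assumptions for illustration and are not part of the assignment.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has three classes and four continuous features
X_iris, y_iris = load_iris(return_X_y=True)

# A single GaussianNB model is fitted on all three classes at once
gnb_multi = GaussianNB().fit(X_iris, y_iris)
print("Classes seen during fit:", gnb_multi.classes_)

# predict_proba returns one posterior probability per class for each sample
proba = gnb_multi.predict_proba(X_iris[:5])
print("Posterior shape:", proba.shape)
```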

In [None]:
""" Q5. Assignment:
Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository
(https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to
predict whether a message is spam or not based on several input features.

Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.

Results:
Report the following performance metrics for each classifier:
Accuracy
Precision
Recall
F1 score"""

In [9]:
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

# Load the dataset (no header row; the last column is the spam label)
data = pd.read_csv("spambase.data.csv", header=None)
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

# Hold out 25% of the rows as a test set (fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=10)

In [10]:
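# Create a Bernoulli Naive Bayes classifier (features are binarized at the default threshold of 0.0)
bnb = BernoulliNB()

# Fit on the training split and evaluate on the held-out test split
bnb.fit(X_train, y_train)
y_pred = bnb.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))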

0.8905299739357081
              precision    recall  f1-score   support

           0       0.89      0.93      0.91       694
           1       0.89      0.83      0.86       457

    accuracy                           0.89      1151
   macro avg       0.89      0.88      0.88      1151
weighted avg       0.89      0.89      0.89      1151



In [11]:
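# Create a Multinomial Naive Bayes classifier (requires non-negative feature values)
mnb = MultinomialNB()

# Fit on the training split and evaluate on the held-out test split
mnb.fit(X_train, y_train)
y_pred = mnb.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))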

0.8132059079061685
              precision    recall  f1-score   support

           0       0.84      0.85      0.85       694
           1       0.77      0.76      0.76       457

    accuracy                           0.81      1151
   macro avg       0.80      0.80      0.80      1151
weighted avg       0.81      0.81      0.81      1151



In [12]:
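# Create a Gaussian Naive Bayes classifier (models each feature as normally distributed per class)
gnb = GaussianNB()

# Fit on the training split and evaluate on the held-out test split
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))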

0.8253692441355344
              precision    recall  f1-score   support

           0       0.97      0.73      0.84       694
           1       0.71      0.96      0.81       457

    accuracy                           0.83      1151
   macro avg       0.84      0.85      0.82      1151
weighted avg       0.86      0.83      0.83      1151
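
The assignment asks for 10-fold cross-validation, whereas the cells above score a single 75/25 hold-out split. A minimal sketch of the cross-validated version follows, using scikit-learn's `cross_validate` with a stratified 10-fold splitter. The helper name `evaluate_nb_models`, the shuffling seed, and the synthetic stand-in data (used so the block runs without the CSV file) are all assumptions for illustration; the printed numbers say nothing about Spambase.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

SCORING = ["accuracy", "precision", "recall", "f1"]


def evaluate_nb_models(X, y, n_splits=10, seed=10):
    """Hypothetical helper: mean cross-validated metrics for each Naive Bayes variant."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    models = {
        "BernoulliNB": BernoulliNB(),
        "MultinomialNB": MultinomialNB(),
        "GaussianNB": GaussianNB(),
    }
    results = {}
    for name, model in models.items():
        # cross_validate returns one score per fold under the key "test_<metric>"
        scores = cross_validate(model, X, y, cv=cv, scoring=SCORING)
        results[name] = {m: float(scores[f"test_{m}"].mean()) for m in SCORING}
    return results


# Synthetic, non-negative stand-in data (NOT Spambase) so the sketch is runnable as-is
rng = np.random.default_rng(0)
X_demo = rng.poisson(2.0, size=(300, 5)).astype(float)
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 4).astype(int)

demo_results = evaluate_nb_models(X_demo, y_demo)
for name, metrics in demo_results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```

Calling `evaluate_nb_models(X, y)` with the arrays loaded from `spambase.data.csv` would give the fold-averaged accuracy, precision, recall, and F1 score for each classifier.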

