Q1. A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
probability that an employee is a smoker given that he/she uses the health insurance plan?

In [None]:
# Given probabilities
p_H = 0.7          # P(H): an employee uses the health insurance plan
p_S_given_H = 0.4  # P(S|H): an employee is a smoker, given that he/she uses the plan

# The question asks for P(S|H), which the problem statement gives directly.
# As a check, recover it from the joint probability: P(S|H) = P(S and H) / P(H)
p_S_and_H = p_S_given_H * p_H  # P(S and H) = 0.28
p_S_given_H_check = p_S_and_H / p_H

# Output the result
print(f"The probability that an employee is a smoker given that he/she uses the health insurance plan is: {p_S_given_H_check:.4f}")


Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?
Ans:-
The main difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in the nature of the features they are designed to handle:

Bernoulli Naive Bayes:

Assumes that the features are binary, meaning they can take only two possible values (typically 0 or 1).
Commonly used for text classification problems where only the presence or absence of each word in a document is considered, not how often the word occurs.
Well-suited for problems where the feature vectors are binary, for example one 0/1 indicator per vocabulary word.

Multinomial Naive Bayes:

Assumes that the features represent counts or frequencies of events.
Often used for text classification problems where the features are the counts of word occurrences in a document.
Suitable for problems where the feature vectors represent discrete data (e.g., word counts) and are non-negative.
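
The difference can be seen on a small example. The sketch below uses a made-up word-count matrix (the data and variable names are assumptions for illustration) and compares what each scikit-learn classifier accumulates during fitting: MultinomialNB sums the raw counts per class, while BernoulliNB binarizes the input first and only counts the documents in which a word appears.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical word-count matrix: rows are documents, columns are vocabulary words
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 1, 2],
])
y = np.array([0, 0, 1, 1])

# MultinomialNB accumulates the raw counts per class
mnb = MultinomialNB().fit(X_counts, y)

# BernoulliNB binarizes its input first (default threshold binarize=0.0),
# so it only records in how many documents of each class a word appears
bnb = BernoulliNB().fit(X_counts, y)

print("MultinomialNB feature_count_:\n", mnb.feature_count_)
print("BernoulliNB feature_count_:\n", bnb.feature_count_)
```

For class 0, the first word contributes 5 to MultinomialNB (3 + 2 occurrences) but only 2 to BernoulliNB (it appears in 2 documents).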

Q3. How does Bernoulli Naive Bayes handle missing values?
Ans:- Bernoulli Naive Bayes, like other Naive Bayes variants, has no built-in way to handle missing values. It assumes that every feature is binary (0 or 1), and a missing entry is neither; scikit-learn's BernoulliNB, for example, raises an error when the input contains NaN. In the context of text classification, a feature might represent the presence or absence of a particular word in a document.

If you have missing values in your data and you are using Bernoulli Naive Bayes, you might need to handle them before applying the classifier. Here are a couple of common strategies:

Imputation:

You can impute the missing values by replacing them with a suitable value (e.g., 0 or 1).
The choice of imputation depends on the nature of your data and the meaning of the missing values; for example, 0 is a natural fill value when a missing entry means the word was not observed.

Feature Engineering:

If missing values are common and imputation is not straightforward, you might consider creating an additional binary feature to explicitly indicate whether a value is missing or not.
This way, the missing information is not lost, and the model can potentially learn how to handle missing values.
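
The indicator-feature strategy can be sketched with scikit-learn's SimpleImputer, whose add_indicator option appends a 0/1 column for each feature that had missing values during fitting. The data here is made up for illustration, and filling with 0 is an assumption, not a recommendation for every dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary data with one missing entry (np.nan)
X_train = np.array([[1, 1, 0], [0, np.nan, 1], [1, 0, 1], [0, 1, 0]])
y_train = np.array([0, 1, 0, 1])

# Fill missing entries with 0 and append a 0/1 "was missing" column
# for every feature that had missing values during fit
imputer = SimpleImputer(strategy="constant", fill_value=0, add_indicator=True)
X_filled = imputer.fit_transform(X_train)
print(X_filled)  # three original columns plus one indicator column

# The same imputer can sit in front of the classifier in a pipeline
model = make_pipeline(
    SimpleImputer(strategy="constant", fill_value=0, add_indicator=True),
    BernoulliNB(),
)
model.fit(X_train, y_train)
predictions = model.predict(np.array([[1, np.nan, 0]]))
print(predictions)
```

Because the indicator column is itself binary, it fits the Bernoulli model's assumptions without further preprocessing.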

In [None]:
from sklearn.naive_bayes import BernoulliNB
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
import numpy as np

# Example data with missing values
X_train = np.array([[1, 1, 0], [0, np.nan, 1], [1, 0, 1], [0, 1, 0]])
y_train = np.array([0, 1, 0, 1])

# Create a pipeline with a Bernoulli Naive Bayes classifier and a simple imputer
model = make_pipeline(SimpleImputer(strategy='most_frequent'), BernoulliNB())

# Fit the model on the data with missing values
model.fit(X_train, y_train)

# Example test data with missing values
X_test = np.array([[1, np.nan, 0]])

# Make predictions on the test data
predictions = model.predict(X_test)

print(f"Predictions for the test data: {predictions}")


Q4. Can Gaussian Naive Bayes be used for multi-class classification?
Ans:-
Yes. Gaussian Naive Bayes estimates a prior for each class and a per-class mean and variance for each feature, then predicts the class with the highest posterior probability, so it works with any number of classes. The example below fits it to the three-class Iris dataset.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report

# Load the Iris dataset (a well-known multi-class dataset)
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a Gaussian Naive Bayes classifier
gnb = GaussianNB()

# Fit the model on the training data
gnb.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = gnb.predict(X_test)

# Calculate accuracy and display classification report
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}\n")

# Display detailed classification report
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print("Classification Report:\n", report)
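
To see the multi-class structure directly, the fitted model can be inspected. This is a minimal sketch, assuming the same Iris data but fitted on the full dataset for brevity: it prints the class labels, the shape of the per-class feature means, and the shape of the predicted probabilities (one column per class).

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
gnb = GaussianNB().fit(X, y)

print(gnb.classes_)      # the class labels seen during fit
print(gnb.theta_.shape)  # per-class feature means: (n_classes, n_features)

# predict_proba returns one probability per class for each sample
proba = gnb.predict_proba(X[:5])
print(proba.shape)
```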
