- In **Multinomial Naive Bayes**, as in every Naive Bayes variant, the feature probabilities are calculated by assuming that the features are conditionally independent given the class, which is why the method is called "naive".
- The output of a **Gaussian Naive Bayes classifier** is a probability distribution over classes. For each instance, the classifier calculates the probability of that instance belonging to each class, based on the values of its features. The class with the highest probability is then chosen as the predicted class for that instance. This output is useful because it not only provides a predicted class label, but also a measure of confidence in that prediction.
-  **Gaussian Naive Bayes** is best suited for numerical data, where each feature has a continuous value. This is because it assumes that, within each class, every feature follows a Gaussian (normal) distribution, which is often a reasonable approximation for continuous measurements but is not guaranteed to hold. For discrete count data, a different variant such as **Multinomial Naive Bayes** would be more appropriate. Text and image data are usually preprocessed into numerical features before being used with Naive Bayes classifiers.
- Bayesian inference involves the use of prior knowledge or beliefs about a parameter or hypothesis, which are updated in light of new data using Bayes' theorem. Classical inference, on the other hand, does not involve the use of prior knowledge and instead relies on the properties of the data sample and the sampling distribution. Bayesian inference is often used in cases where prior knowledge or expertise can provide valuable insights or help to inform the analysis.
- AUC (Area Under the Curve) is a measure of how well a binary classifier can distinguish between positive and negative examples. It measures the area under the ROC curve and provides a single score to evaluate the overall performance of a classification model.
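The ranking interpretation of AUC can be made concrete with a small sketch. It computes AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as half. The function name `auc_by_pairs` and the toy scores are illustrative assumptions, not part of the original notes.

```python
def auc_by_pairs(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy scores (made up): 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = auc_by_pairs([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(toy_auc)
```

A value of 1.0 means every positive outranks every negative, while 0.5 corresponds to random ranking.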

## 9th April Assignment-1

## Q1. What is Bayes' theorem?  Q2. What is the formula for Bayes' theorem?

Ans 1. Bayes' theorem is a mathematical concept in probability theory that describes the probability of an event based on prior knowledge or information. It was named after the Reverend Thomas Bayes, an 18th-century British statistician and theologian who developed the idea.

Ans 2. The formula for Bayes' theorem is as follows:

$$P(A|B) = \frac{P(B|A) \, P(A)}{P(B)}$$

where:

- P(A|B) is the conditional (posterior) probability of A given B.
- P(B|A) is the conditional probability of B given A.
- P(A) is the prior probability of A.
- P(B) is the marginal probability of B, which must be greater than zero.

In words, the formula states that the probability of A given B is equal to the probability of B given A times the prior probability of A, divided by the marginal probability of B.
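As a quick numeric check of the formula, the sketch below applies it to a hypothetical diagnostic test. The prevalence, sensitivity, and false-positive rate are made-up numbers chosen only for illustration, and `bayes_posterior` is an assumed helper name.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) via Bayes' theorem, with P(B) from the law of total probability."""
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate.
posterior = bayes_posterior(prior_a=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05)
print(round(posterior, 3))
```

With these assumed numbers the posterior stays near 16%, because the small prior P(A) outweighs the fairly accurate test.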







## Q3. How is Bayes' theorem used in practice?

Bayes' theorem is widely used in practice across various fields, including statistics, machine learning, artificial intelligence, data science, and engineering. Some of the practical applications of Bayes' theorem include:

- Spam filtering: Bayes' theorem is used in spam filtering to classify emails as spam or not spam based on the probability of the words in the email.

- Medical diagnosis: Bayes' theorem is used in medical diagnosis to determine the probability of a patient having a disease based on their symptoms and medical history.

- Risk assessment: Bayes' theorem is used in risk assessment to calculate the probability of an event occurring based on historical data.

- Prediction: Bayes' theorem is used in machine learning and artificial intelligence to make predictions based on prior knowledge and data.

- Fraud detection: Bayes' theorem is used in fraud detection to determine the probability of a transaction being fraudulent based on historical data.

- Sensor fusion: Bayes' theorem is used in sensor fusion to combine data from multiple sensors to make accurate predictions.

Overall, Bayes' theorem is a fundamental concept in many areas of applied mathematics and is widely used to make predictions, classification, and decisions based on probabilities.







## Q4. What is the relationship between Bayes' theorem and conditional probability?

Bayes' theorem and conditional probability are closely related concepts. In fact, Bayes' theorem is based on the concept of conditional probability. Conditional probability is the probability of an event occurring given that another event has already occurred. Bayes' theorem uses conditional probability to calculate the probability of a hypothesis given the observed evidence.

More specifically, Bayes' theorem states that the probability of a hypothesis (H) given the observed evidence (E) is equal to the product of the probability of the evidence given the hypothesis and the prior probability of the hypothesis, divided by the probability of the evidence:

P(H | E) = P(E | H) * P(H) / P(E)

where P(H | E) is the posterior probability of the hypothesis given the evidence, P(E | H) is the likelihood of the evidence given the hypothesis, P(H) is the prior probability of the hypothesis, and P(E) is the probability of the evidence.

Conditional probability is used to calculate the likelihood of the evidence given the hypothesis (P(E | H)), while the prior probability of the hypothesis (P(H)) is usually determined by previous experience or expert knowledge. The probability of the evidence (P(E)) can be calculated with the law of total probability, by summing P(E | H) * P(H) over all competing hypotheses. **By using Bayes' theorem, we can update our prior beliefs about the probability of a hypothesis in light of new evidence**. This makes Bayes' theorem a powerful tool in machine learning, as it allows us to update our models as new data becomes available.
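The idea of updating beliefs as evidence arrives can be sketched as a loop in which each posterior becomes the prior for the next observation. The coin hypotheses and their probabilities below are hypothetical, and `update` is an assumed helper name.

```python
def update(prior, likelihoods):
    """One Bayesian update over a set of competing hypotheses.

    prior:       dict mapping hypothesis -> P(H)
    likelihoods: dict mapping hypothesis -> P(E | H) for the evidence just observed
    """
    unnormalised = {h: prior[h] * likelihoods[h] for h in prior}
    evidence = sum(unnormalised.values())  # P(E) by the law of total probability
    return {h: value / evidence for h, value in unnormalised.items()}


# Hypothetical setup: a coin is either fair or biased towards heads (P(heads) = 0.8).
belief = {"fair": 0.5, "biased": 0.5}
p_heads = {"fair": 0.5, "biased": 0.8}

for _ in range(3):  # observe three heads in a row
    belief = update(belief, p_heads)

print(belief)
```

Each head shifts probability towards the biased hypothesis, and the beliefs always sum to 1 because every step divides by the evidence.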

## Q5. How do you choose which type of Naive Bayes classifier to use for any given problem?

There are three main types of Naive Bayes classifiers: Gaussian, Multinomial, and Bernoulli.

1. **Gaussian Naive Bayes** assumes that the features follow a normal (Gaussian) distribution. It is typically used for classification problems with **continuous** input variables.

2. **Multinomial Naive Bayes** assumes that the features are counts or frequencies of discrete events, such as word counts in a document. It is typically used for **text classification** problems.

3. **Bernoulli Naive Bayes** is similar to Multinomial Naive Bayes, but it assumes that the features are **binary (0 or 1)**, representing the absence or presence of a particular feature. It is also typically used for text classification problems.

The choice of which type of Naive Bayes classifier to use depends on the type of data you have and the nature of the problem you are trying to solve. If your input features are continuous, you might choose Gaussian Naive Bayes. If your input features are discrete, you might choose Multinomial Naive Bayes when they are counts or frequencies and Bernoulli Naive Bayes when they are binary indicators. In some cases, you might want to try all three and compare their performance on your specific problem.

The mathematical formulas behind each of these classifiers are as follows:

1. **Gaussian Naive Bayes**:
Gaussian Naive Bayes assumes that the features are normally distributed. For a given class, the probability density function (PDF) of each feature is estimated using mean and variance. The class with the highest posterior probability is chosen as the predicted class. The formula for Gaussian Naive Bayes is as follows:
$$P(y|x) \propto P(y) \prod_{i} P(x_i|y) = P(y) \prod_{i} \frac{1}{\sqrt{2\pi\sigma_{i,y}^2}} \exp\left(-\frac{(x_i-\mu_{i,y})^2}{2\sigma_{i,y}^2}\right)$$

where,
- P(y|x) is the posterior probability of class y given feature vector x, up to the normalising constant P(x), which is the same for every class
- P(y) is the prior probability of class y
- $P(x_i|y)$ is the PDF of feature i given class y
- $\mu_{i,y}$ is the mean of feature i given class y
- $\sigma_{i,y}^2$ is the variance of feature i given class y

2. **Multinomial Naive Bayes**
Multinomial Naive Bayes is used for discrete data such as text. It assumes that the features are generated from a multinomial distribution. For a given class, the probability of each feature is estimated using the frequency count. The class with the highest posterior probability is chosen as the predicted class. The formula for Multinomial Naive Bayes is as follows:
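$$P(y|x) \propto P(y) \, \frac{\left(\sum_i x_i\right)!}{\prod_i x_i!} \prod_i p_{i,y}^{x_i}$$

where,
- P(y|x) is the posterior probability of class y given feature vector x
- P(y) is the prior probability of class y
- $p_{i,y}$ is the probability of feature i given class y
- $x_i$ is the count of feature i in the document
- the factorial term multiplied by the product is the multinomial probability of generating the document given class y; the factorial term does not depend on y, so it can be dropped when comparing classes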

3. **Bernoulli Naive Bayes**:
Bernoulli Naive Bayes is also used for discrete data such as text. It assumes that the features are generated from a Bernoulli distribution. For a given class, the probability of each feature is estimated using the frequency count. The class with the highest posterior probability is chosen as the predicted class. The formula for Bernoulli Naive Bayes is as follows:
$$P(y|x) \propto P(y) \prod_i p_{i,y}^{x_i} \, (1-p_{i,y})^{1-x_i}$$

where,
- P(y|x) is the posterior probability of class y given feature vector x
- P(y) is the prior probability of class y
- $p_{i,y}$ is the probability that feature i is present given class y
- $x_i$ is the presence or absence of feature i in the document (1 if present, 0 if absent)
- the product is the probability of generating the document given class y; note that the exponent $1-x_i$ applies to the whole factor $(1-p_{i,y})$, so absent features also contribute
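The three class-conditional likelihoods above can be sketched as small functions. This is a minimal illustration working in log space to avoid numerical underflow; the function names and the example parameters are assumptions, and the probabilities passed in are assumed to be strictly between 0 and 1 (for example after smoothing).

```python
import math


def gaussian_log_likelihood(x, means, variances):
    """Sum over features of log N(x_i; mean_i, variance_i) for one class."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * var) - (xi - mu) ** 2 / (2.0 * var)
        for xi, mu, var in zip(x, means, variances)
    )


def multinomial_log_likelihood(counts, probs):
    """Sum of x_i * log(p_i). The multinomial coefficient is dropped because
    it is identical for every class."""
    return sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)


def bernoulli_log_likelihood(x, probs):
    """Sum of x_i*log(p_i) + (1 - x_i)*log(1 - p_i); absent features contribute too."""
    return sum(
        math.log(p) if xi == 1 else math.log(1.0 - p)
        for xi, p in zip(x, probs)
    )


# Illustrative parameters for a single class (made-up values).
print(gaussian_log_likelihood([0.0], [0.0], [1.0]))
print(multinomial_log_likelihood([2, 0, 1], [0.5, 0.3, 0.2]))
print(bernoulli_log_likelihood([1, 0], [0.5, 0.25]))
```

To classify, one would add log P(y) to the relevant log-likelihood for each class and pick the class with the largest total.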







## Here's an example calculation for Naive Bayes using the GaussianNB classifier:

Suppose we have a dataset of fruits with two features: weight (in grams) and sweetness (on a scale from 0 to 10). We want to predict whether a fruit is an apple or an orange based on its weight and sweetness.

Our training dataset looks like this:

| Fruit | Weight (g) | Sweetness |
|---|---|---|
| Apple | 150 | 8 |
| Orange | 130 | 4 |
| Apple | 200 | 9 |
| Orange | 160 | 3 |
| Orange | 180 | 6 |
| Apple | 120 | 7 |

We can use the GaussianNB classifier to predict the fruit type based on weight and sweetness. Here's how we can do this:

First, we split the dataset into training and testing sets. Let's say we use 4 instances for training and 2 instances for testing. We can randomly select the following instances for testing:

| Fruit | Weight (g) | Sweetness |
|---|---|---|
| Orange | 160 | 3 |
| Apple | 120 | 7 |

Next, we compute the mean and standard deviation for each feature (weight and sweetness) and for each class (apple and orange) based on the training data. Here's what the mean and standard deviation look like for each feature and class:

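| Class | Feature | Mean | Standard deviation |
|---|---|---|---|
| Apple | Weight | 175.0 | 25.0 |
| Apple | Sweetness | 8.5 | 0.5 |
| Orange | Weight | 155.0 | 25.0 |
| Orange | Sweetness | 5.0 | 1.0 |

These values come from the four training rows only (two apples and two oranges), and the standard deviations use the population formula, which divides by the number of rows in each class.

Now, we can use these values to compute the probability of each test instance belonging to each class. For example, for the first test instance (orange with weight 160g and sweetness 3), we can compute the probability of it being an apple as follows: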

$$P(\text{apple} \mid \text{weight}=160, \text{sweetness}=3) = \frac{P(\text{weight}=160 \mid \text{apple}) \, P(\text{sweetness}=3 \mid \text{apple}) \, P(\text{apple})}{P(\text{weight}=160, \text{sweetness}=3)}$$

where:

- P(apple|weight=160, sweetness=3) is the probability of the instance being an apple given its weight and sweetness
- P(weight=160|apple) is the Gaussian density of an apple weighing 160g, evaluated with the apple mean and standard deviation from our training data
- P(sweetness=3|apple) is the Gaussian density of an apple having a sweetness rating of 3, evaluated the same way
- P(apple) is the prior probability of an apple (i.e., the proportion of apples in our training data)
- P(weight=160, sweetness=3) is the evidence, obtained by summing the numerator over both classes. It is the same for apple and orange, so it can be ignored when we only need to compare the classes

We can compute the same probability for the instance being an orange and compare the two probabilities to decide which class the instance belongs to.

In this way, we can use the GaussianNB classifier to predict the fruit type based on weight and sweetness. Note that this is just a simple example and in practice, we would likely use more features and more instances to train our model.
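The calculation can be reproduced from scratch with the sketch below, which fits per-class means and variances on the four training rows and scores the two held-out fruits. It is a hand-rolled illustration rather than scikit-learn's implementation: the helper names `fit` and `predict_proba` are assumptions, and the variance uses the population formula (dividing by the number of rows per class). scikit-learn's `GaussianNB` also adds a small smoothing term to the variances, so its numbers may differ slightly.

```python
import math

# Training rows from the example; the two held-out test fruits are excluded.
train = [
    ("Apple", 150, 8),
    ("Orange", 130, 4),
    ("Apple", 200, 9),
    ("Orange", 180, 6),
]


def fit(rows):
    """Estimate the prior and per-feature mean/variance for every class."""
    by_class = {}
    for label, *features in rows:
        by_class.setdefault(label, []).append(features)
    model = {}
    for label, samples in by_class.items():
        n = len(samples)
        means = [sum(col) / n for col in zip(*samples)]
        # Population variance (divide by n); an assumption of this sketch.
        variances = [
            sum((v - m) ** 2 for v in col) / n
            for col, m in zip(zip(*samples), means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict_proba(model, x):
    """Posterior P(class | x) from Gaussian likelihoods, normalised over classes."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for xi, mu, var in zip(x, means, variances):
            score += -0.5 * math.log(2.0 * math.pi * var) - (xi - mu) ** 2 / (2.0 * var)
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max before exponentiating
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {label: v / total for label, v in exp_scores.items()}


model = fit(train)
for x in [(160, 3), (120, 7)]:
    proba = predict_proba(model, x)
    print(x, max(proba, key=proba.get), proba)
```

On this tiny split both held-out fruits are assigned to Orange, so the apple at (120, 7) is misclassified. With only two training rows per class the variance estimates are too crude to be reliable, which is why more data is needed in practice.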







## Q5 You have a dataset with two features, X1 and X2, and two possible classes, A and B. You want to use Naive Bayes to classify a new instance with features X1 = 3 and X2 = 4. The following table shows the frequency of each feature value for each class:

| Class | X1=1 | X1=2 | X1=3 | X2=1 | X2=2 | X2=3 | X2=4 |
|---|---|---|---|---|---|---|---|
| A | 3 | 3 | 4 | 4 | 3 | 3 | 3 |
| B | 2 | 2 | 1 | 2 | 2 | 2 | 3 |

- Assuming equal prior probabilities for each class, which class would Naive Bayes predict the new instance to belong to?


To use Naive Bayes to predict the class of a new instance with features X1=3 and X2=4, we need to compute the posterior probabilities for each class using Bayes' theorem and then choose the class with the highest probability.

Let A be the event that the new instance belongs to class A and B be the event that it belongs to class B. We want to compute

$$P(A \mid X1=3, X2=4) \quad \text{and} \quad P(B \mid X1=3, X2=4)$$

According to Naive Bayes, the joint probability of the features given the class is the product of the probabilities of each feature given the class. For example,

$$P(X1=3, X2=4 \mid A) = P(X1=3 \mid A) \, P(X2=4 \mid A)$$

We can estimate these probabilities from the frequency table by dividing each count by the total for that feature within the class. Note that the X1 and X2 counts do not add up to the same total within a class (10 and 13 for class A, 5 and 9 for class B), so each feature is normalised by its own total:

- P(X1=3|A) = 4/10 = 0.4
- P(X2=4|A) = 3/13 ≈ 0.231
- P(X1=3|B) = 1/5 = 0.2
- P(X2=4|B) = 3/9 ≈ 0.333

Assuming equal prior probabilities for each class (i.e., P(A) = P(B) = 0.5), the unnormalised posteriors (prior times likelihood) are:

- P(A) * P(X1=3|A) * P(X2=4|A) = 0.5 * 0.4 * 0.231 ≈ 0.0462
- P(B) * P(X1=3|B) * P(X2=4|B) = 0.5 * 0.2 * 0.333 ≈ 0.0333

The evidence is the sum of these two terms. The law of total probability is applied to the joint event, not to each feature separately:

P(X1=3, X2=4) ≈ 0.0462 + 0.0333 = 0.0795

Dividing by the evidence gives the posterior probabilities, which sum to 1:

- P(A|X1=3, X2=4) ≈ 0.0462 / 0.0795 ≈ 0.581
- P(B|X1=3, X2=4) ≈ 0.0333 / 0.0795 ≈ 0.419

Therefore, Naive Bayes would predict that the new instance belongs to class A, since it has the higher posterior probability.
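The same calculation can be checked with a short script. It is a sketch under one stated assumption: each conditional probability is the count divided by that feature's own total within the class, because the X1 and X2 totals in the table differ. The names `counts`, `priors`, and `posterior` are illustrative.

```python
# Frequency table from the question: counts of each feature value per class.
counts = {
    "A": {"X1": {1: 3, 2: 3, 3: 4}, "X2": {1: 4, 2: 3, 3: 3, 4: 3}},
    "B": {"X1": {1: 2, 2: 2, 3: 1}, "X2": {1: 2, 2: 2, 3: 2, 4: 3}},
}
priors = {"A": 0.5, "B": 0.5}
instance = {"X1": 3, "X2": 4}


def posterior(counts, priors, instance):
    """Naive Bayes posterior for categorical features, normalised over classes."""
    scores = {}
    for label, features in counts.items():
        score = priors[label]
        for name, value in instance.items():
            table = features[name]
            # Assumption: normalise each count by that feature's total in the class.
            score *= table[value] / sum(table.values())
        scores[label] = score
    evidence = sum(scores.values())  # P(X1=3, X2=4) summed over both classes
    return {label: s / evidence for label, s in scores.items()}


result = posterior(counts, priors, instance)
print(result)
print(max(result, key=result.get))
```

Under this assumption the posteriors work out to 18/31 for class A and 13/31 for class B, so class A is predicted.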







## Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

This is a conditional probability problem. Let S denote the event that an employee is a smoker, and H denote the event that an employee uses the health insurance plan. Then, we are given:

- P(H) = 0.7 (probability of using the health insurance plan)
- P(S|H) = 0.4 (probability of being a smoker given that the employee uses the health insurance plan)

We want to find P(S|H), the probability of an employee being a smoker given that he/she uses the health insurance plan. The statement "40% of the employees who use the plan are smokers" is exactly this conditional probability, so no further calculation is needed:

P(S|H) = 0.4

As a check, the definition of conditional probability gives the same value. The joint probability of being a smoker and using the plan is

P(S and H) = P(S|H) * P(H) = 0.4 * 0.7 = 0.28

and therefore

P(S|H) = P(S and H) / P(H) = 0.28 / 0.7 = 0.4

Bayes' theorem would be needed for the reverse question, P(H|S) = P(S|H) * P(H) / P(S). That calculation requires the overall smoking rate P(S), or equivalently P(S|not H), which the question does not provide.

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4, or 40%.







## Q2 What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

Bernoulli Naive Bayes and Multinomial Naive Bayes are two variants of the Naive Bayes algorithm used for classification tasks. The main difference between them lies in the type of data they are best suited for.

Bernoulli Naive Bayes is used for binary data, where each feature can take on only two possible values (0 or 1). This makes it ideal for text classification tasks, where each feature represents the presence or absence of a specific word in a document. It assumes that the features are conditionally independent given the class, and the probability distribution of each feature is a Bernoulli distribution.

On the other hand, Multinomial Naive Bayes is used for discrete count data, where each feature represents the frequency of a word or term in a document. This makes it well-suited for text classification tasks where the features are counts or frequencies of words. It assumes that the features are conditionally independent given the class, and that the vector of counts for a document follows a multinomial distribution. Unlike Bernoulli Naive Bayes, it does not explicitly penalise the absence of a word: a word with a count of zero simply contributes nothing to the likelihood.

In summary, Bernoulli Naive Bayes is used for binary data, while Multinomial Naive Bayes is used for discrete count data.
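The difference in input representation can be seen by encoding the same document both ways. The vocabulary and the document below are hypothetical and chosen only for illustration.

```python
from collections import Counter

vocabulary = ["free", "offer", "meeting"]  # hypothetical vocabulary
document = "free free offer"               # hypothetical document

word_counts = Counter(document.split())

# Multinomial NB input: how many times each vocabulary word occurs.
count_features = [word_counts[word] for word in vocabulary]

# Bernoulli NB input: whether each vocabulary word occurs at all.
binary_features = [1 if word_counts[word] > 0 else 0 for word in vocabulary]

print(count_features)   # [2, 1, 0]
print(binary_features)  # [1, 1, 0]
```

The repeated word "free" carries extra weight in the count representation, whereas the binary representation only records that it appeared.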







## Q3. How does Bernoulli Naive Bayes handle missing values?

Bernoulli Naive Bayes has no built-in mechanism for missing values, so they must be handled before or during parameter estimation. One common approach is to ignore them during the probability calculations: because the Bernoulli distribution assumes binary features taking the value 0 or 1, a missing value is treated as an absence of evidence rather than as a 0 or a 1, and it is left out when the per-class feature probabilities are estimated.

For example, if a feature has values of 1, 0, 1, and missing, the probability of this feature being 1 would be calculated based on the 1s and 0s only, giving 2/3. The missing value is not included in the calculation.

However, many implementations, including scikit-learn's `BernoulliNB`, do not accept missing values in the input and require them to be replaced (imputed) with a default value, such as 0 or 1, beforehand. In this case, the choice of default value may have an impact on the classification results and should be carefully considered.
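The effect of the two strategies on the estimated probability can be sketched as follows. Missing values are represented as `None`, and the helper names `estimate_p1` and `impute` are assumptions made for this illustration.

```python
def estimate_p1(values):
    """Estimate P(feature = 1) from binary observations, skipping missing entries (None)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to estimate from")
    return sum(observed) / len(observed)


def impute(values, fill):
    """Replace missing entries with a fixed default before estimation."""
    return [fill if v is None else v for v in values]


feature = [1, 0, 1, None]  # the example above: 1, 0, 1, and one missing value

print(estimate_p1(feature))             # 2 of the 3 observed values are 1
print(estimate_p1(impute(feature, 0)))  # imputing 0 lowers the estimate to 2/4
print(estimate_p1(impute(feature, 1)))  # imputing 1 raises it to 3/4
```

The three estimates differ, which shows why the choice of default value matters when imputation is required.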

## Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification. In this case, the model learns the parameters of the Gaussian distribution (a mean and a variance per feature) for each class and then predicts the class with the highest posterior probability based on the input features. Because each class has its own variances, the decision boundaries between classes are in general quadratic (curved) rather than linear; they become linear only when the classes share the same variances. If the classes cannot be separated by such boundaries, or if the features are strongly correlated, other classifiers such as SVM or decision trees may be more appropriate.