## Q1. What is Bayes' theorem?

Bayes' theorem is a fundamental concept in probability theory and statistics that provides a framework for updating probability beliefs or estimates based on new evidence or information. It's named after the 18th-century statistician and philosopher Thomas Bayes.

The core idea of Bayes' theorem is to calculate the probability of an event (A) occurring given that another event (B) has occurred. 

This conditional probability, denoted as P(A|B), is calculated using the following formula:

P(A|B) = (P(B|A) * P(A)) / P(B)

Where:
- P(A|B) represents the conditional probability of event A occurring given that event B has occurred.
- P(B|A) is the conditional probability of event B occurring given that event A has occurred, also called the likelihood.
- P(A) is the prior probability of event A occurring independently of any information about event B.
- P(B) is the marginal probability of event B (also called the evidence), i.e. the overall probability of B without conditioning on A.

Bayes' theorem allows us to update our belief or estimate of the probability of an event (A) occurring based on new evidence or information (B). It is commonly used in various fields, including machine learning, statistics, and data science, for tasks such as Bayesian inference, spam filtering, medical diagnosis, and more. Bayesian statistics, in particular, is a branch of statistics that relies heavily on Bayes' theorem to update probability distributions based on data and prior beliefs.
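
The formula translates directly into a one-line function. This is a minimal sketch; the function name `bayes_posterior` and the input values are hypothetical and only illustrate the arithmetic.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must be in the interval (0, 1]")
    return p_b_given_a * p_a / p_b


# Hypothetical inputs: P(B|A) = 0.9, P(A) = 0.01, P(B) = 0.05
print(bayes_posterior(0.9, 0.01, 0.05))  # approximately 0.18
```

With these inputs, observing B raises the probability of A from the prior 0.01 to a posterior of about 0.18.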

## Q2. What is the formula for Bayes' theorem?

P(A|B) = (P(B|A) * P(A)) / P(B)

Where:

- P(A|B) represents the conditional probability of event A occurring given that event B has occurred. This is the probability we want to calculate.
- P(B|A) is the conditional probability of event B occurring given that event A has occurred, also called the likelihood.
- P(A) is the prior probability of event A occurring independently of any information about event B.
- P(B) is the marginal probability of event B (also called the evidence), i.e. the overall probability of B without conditioning on A.
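
In practice P(B) is often not given directly. When A is a yes/no event, it can be computed with the law of total probability, P(B) = P(B|A) * P(A) + P(B|not A) * P(not A). The sketch below assumes that setting; the function name `posterior_from_rates` and the example numbers (a hypothetical diagnostic test) are illustrative only.

```python
def posterior_from_rates(p_b_given_a: float, p_b_given_not_a: float, p_a: float) -> float:
    """P(A|B) when P(B) has to be built from both branches, A and not A."""
    # Law of total probability: P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    # Bayes' theorem with the computed denominator
    return p_b_given_a * p_a / p_b


# Hypothetical test: 90% sensitivity, 5% false-positive rate, 1% prevalence
p = posterior_from_rates(p_b_given_a=0.9, p_b_given_not_a=0.05, p_a=0.01)
print(f"P(disease | positive test) = {p:.4f}")
```

Even with a fairly accurate test, the posterior here is only about 0.15 (exactly 2/13) because the prior P(A) is small.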

## Q3. How is Bayes' theorem used in practice?

Say we have a dataset with three independent (input) features, age, height and weight, and one dependent (output) feature, overweight:

| age | height | weight | overweight |
|-----|--------|--------|------------|
| 17  | 170    | 73     | no         |
| 21  | 165    | 91     | yes        |
| 28  | 178    | 88     | yes        |
| 42  | 160    | 60     | no         |

Now we want to predict whether a person is overweight based on their age, height and weight.

Let:
- x1, x2 and x3 denote age, height and weight respectively, and
- y denote overweight (yes or no).

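By Bayes' theorem,

P(y | x1, x2, x3) = (P(x1, x2, x3 | y) * P(y)) / P(x1, x2, x3)

Naive Bayes makes the "naive" assumption that the features are conditionally independent given the class, so the joint likelihood P(x1, x2, x3 | y) factorizes into a product of per-feature likelihoods:

P(y | x1, x2, x3) = (P(y) * P(x1 | y) * P(x2 | y) * P(x3 | y)) / P(x1, x2, x3)

Writing this out for both classes:

P(yes | x1, x2, x3) = (P(yes) * P(x1 | yes) * P(x2 | yes) * P(x3 | yes)) / P(x1, x2, x3)
<br>P(no | x1, x2, x3) = (P(no) * P(x1 | no) * P(x2 | no) * P(x3 | no)) / P(x1, x2, x3)

The denominator P(x1, x2, x3) is the same for both classes, so it does not change which class scores higher and can be dropped when comparing them.

The final formula is therefore:

P(yes | x1, x2, x3) ∝ P(yes) * P(x1 | yes) * P(x2 | yes) * P(x3 | yes)
<br>P(no | x1, x2, x3) ∝ P(no) * P(x1 | no) * P(x2 | no) * P(x3 | no)

The class with the highest score is chosen as the output.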
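
The steps above can be sketched in plain Python on the four-row table. Because age, height and weight are continuous, this sketch assumes each P(xi | y) is modelled with a normal distribution (the Gaussian Naive Bayes choice described in Q5), estimated from the population variance plus a tiny epsilon. The query person (age 25, height 170, weight 90) is hypothetical, and four rows are far too few for a trustworthy model.

```python
import math
from statistics import mean, pvariance

# The four rows of the table: (age, height, weight) -> overweight
rows = [
    ((17, 170, 73), "no"),
    ((21, 165, 91), "yes"),
    ((28, 178, 88), "yes"),
    ((42, 160, 60), "no"),
]


def gaussian_pdf(x, mu, var):
    """Normal density, used here as the per-feature likelihood P(xi | y)."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit(data):
    """Per class: the prior P(y) and a (mean, variance) pair for each feature."""
    model = {}
    for label in sorted({y for _, y in data}):
        xs = [x for x, y in data if y == label]
        prior = len(xs) / len(data)
        # A tiny epsilon keeps the variance positive if a feature is constant.
        stats = [(mean(col), pvariance(col) + 1e-9) for col in zip(*xs)]
        model[label] = (prior, stats)
    return model


def predict(model, x):
    """Score = P(y) * P(x1|y) * P(x2|y) * P(x3|y); P(x1, x2, x3) is dropped."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = prior
        for xi, (mu, var) in zip(x, stats):
            score *= gaussian_pdf(xi, mu, var)
        scores[label] = score
    # The class with the highest score is the prediction.
    return max(scores, key=scores.get), scores


# Hypothetical new person: age 25, height 170, weight 90
label, scores = predict(fit(rows), (25, 170, 90))
print(label, scores)
```

Skipping the shared denominator is safe here because only the comparison between the two scores matters, not their absolute values.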

## Q4. What is the relationship between Bayes' theorem and conditional probability?

Bayes' theorem is a mathematical formula that describes how to calculate conditional probabilities, and it provides a framework for updating probability beliefs based on new information or evidence.

The relationship between Bayes' theorem and conditional probability is as follows:

1. **Conditional Probability**: Conditional probability is a fundamental concept that deals with the probability of an event occurring given that another event has already occurred. It's denoted as P(A|B), which represents the probability of event A occurring given that event B has occurred.

2. **Bayes' Theorem**: Bayes' theorem is a specific formula that allows you to calculate conditional probabilities. It relates the conditional probability P(A|B) to the prior probability of A (P(A)), the marginal probability of B (P(B)), and the likelihood of B given A (P(B|A)). The formula is as follows:

   P(A|B) = (P(B|A) * P(A)) / P(B)

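   P(A|B) is the conditional probability you want to calculate. The theorem follows directly from the definition of conditional probability: since P(A ∩ B) = P(A|B) * P(B) and also P(A ∩ B) = P(B|A) * P(A), equating the two and dividing by P(B) gives the formula above.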

Bayes' theorem is a mathematical tool for calculating conditional probabilities that formalizes the process of updating probability beliefs in light of new evidence. It is a fundamental concept in Bayesian probability and statistics and plays a crucial role in applications such as machine learning and decision-making under uncertainty.
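
The relationship can be checked numerically by enumerating a small sample space. The sketch below uses one roll of a fair six-sided die, with A = "the roll is even" and B = "the roll is greater than 4" (an illustrative choice), and compares P(A|B) from the definition of conditional probability with the value from Bayes' theorem using exact fractions.

```python
from fractions import Fraction

outcomes = range(1, 7)                     # one roll of a fair die
A = {x for x in outcomes if x % 2 == 0}    # event A: roll is even
B = {x for x in outcomes if x > 4}         # event B: roll is greater than 4


def prob(event) -> Fraction:
    """Probability of an event when all six outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


# Definition of conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b_definition = prob(A & B) / prob(B)
p_b_given_a = prob(A & B) / prob(A)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b_bayes = p_b_given_a * prob(A) / prob(B)

print(p_a_given_b_definition, p_a_given_b_bayes)  # both 1/2
```

Here P(B|A) = 1/3 while P(A|B) = 1/2, which shows that the two conditional probabilities generally differ and that Bayes' theorem is what converts one into the other.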

## Q5. How do you choose which type of Naive Bayes classifier to use for any given problem?

The choice depends mainly on the kind of feature values in the data. The three most common types of Naive Bayes classifiers are:

1. **Gaussian Naive Bayes (GNB)**:
   - **Data Type**: GNB is suitable for continuous or real-valued data.
   - **Assumptions**: It assumes that the features follow a Gaussian (normal) distribution.
   - **Example Applications**: GNB is often used for problems involving continuous features, such as medical diagnosis based on physiological measurements.

2. **Multinomial Naive Bayes (MNB)**:
   - **Data Type**: MNB is commonly used with discrete data, especially when dealing with text data.
   - **Assumptions**: It assumes that the features represent the counts or frequencies of events (e.g., word counts in text documents).
   - **Example Applications**: MNB is well-suited for text classification problems like sentiment analysis or document categorization, where features often represent word counts or term frequencies.

3. **Bernoulli Naive Bayes (BNB)**:
   - **Data Type**: BNB is also used with discrete data, but it's tailored for binary features (i.e., features that are either present or absent).
   - **Assumptions**: It assumes that features are binary, where 1 represents the presence of a feature and 0 represents the absence.
   - **Example Applications**: BNB is commonly applied in text classification tasks where binary features indicate whether specific words appear in a document or not (e.g., spam or not spam classification based on the presence of certain words).
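
The choice above is essentially a rule of thumb about feature types. The hypothetical helper `suggest_naive_bayes` below encodes that rule for a list of feature rows; it is a sketch for illustration, not a substitute for comparing the variants on validation data. (In scikit-learn the three variants correspond to `GaussianNB`, `MultinomialNB` and `BernoulliNB` in `sklearn.naive_bayes`.)

```python
def suggest_naive_bayes(rows):
    """Hypothetical rule of thumb: pick a Naive Bayes variant from feature values."""
    values = [v for row in rows for v in row]
    if all(v in (0, 1) for v in values):
        return "Bernoulli"      # binary presence/absence features
    if all(v >= 0 and float(v).is_integer() for v in values):
        return "Multinomial"    # non-negative counts, e.g. word counts
    return "Gaussian"           # continuous, real-valued measurements


print(suggest_naive_bayes([[0, 1, 1], [1, 0, 0]]))     # Bernoulli
print(suggest_naive_bayes([[3, 0, 2], [1, 5, 0]]))     # Multinomial
print(suggest_naive_bayes([[5.1, 3.5], [6.2, 2.9]]))   # Gaussian
```

The checks run from most to least specific, because binary data would also pass the count check.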

## Q6. Assignment: You have a dataset with two features, X1 and X2, and two possible classes, A and B. You want to use Naive Bayes to classify a new instance with features X1 = 3 and X2 = 4. The following table shows the frequency of each feature value for each class:

<table>
  <thead>
    <tr>
      <th>Class</th>
      <th>X1=1</th>
      <th>X1=2</th>
      <th>X1=3</th>
      <th>X2=1</th>
      <th>X2=2</th>
      <th>X2=3</th>
      <th>X2=4</th>
    </tr>
  </thead>
  <tbody align="center">
    <tr>
      <td>A</td>
      <td>3</td>
      <td>3</td>
      <td>4</td>
      <td>4</td>
      <td>3</td>
      <td>3</td>
      <td>3</td>
    </tr>
    <tr>
      <td>B</td>
      <td>2</td>
      <td>2</td>
      <td>1</td>
      <td>2</td>
      <td>2</td>
      <td>2</td>
      <td>3</td>
    </tr>
 </tbody>
</table>
        

###  Assuming equal prior probabilities for each class, which class would Naive Bayes predict the new instance to belong to?

Let's denote:
- P(A) as the prior probability of class A. Because the classes have equal prior probabilities, P(A) = P(B) = 1/2.
- P(X1 = 3 | A) as the conditional probability of observing X1 = 3 given class A.
- P(X2 = 4 | A) as the conditional probability of observing X2 = 4 given class A.
- P(X1 = 3 | B) as the conditional probability of observing X1 = 3 given class B.
- P(X2 = 4 | B) as the conditional probability of observing X2 = 4 given class B.

You can calculate these conditional probabilities from the frequency table by dividing each count by the total count of that feature within the class:

- P(X1 = 3 | A) = 4 / (3 + 3 + 4) = 4/10
- P(X2 = 4 | A) = 3 / (4 + 3 + 3 + 3) = 3/13
- P(X1 = 3 | B) = 1 / (2 + 2 + 1) = 1/5
- P(X2 = 4 | B) = 3 / (2 + 2 + 2 + 3) = 3/9 = 1/3

Now use Naive Bayes to compute an unnormalized score for each class. The shared denominator P(X1 = 3, X2 = 4) is omitted, so these scores are proportional to (∝), not equal to, the posterior probabilities:

For class A:
- P(A | X1 = 3, X2 = 4) ∝ P(X1 = 3 | A) * P(X2 = 4 | A) * P(A)
  <br>= (4/10) * (3/13) * (1/2)
  <br>= 12/260 = 3/65
  <br>≈ 0.04615

For class B:
- P(B | X1 = 3, X2 = 4) ∝ P(X1 = 3 | B) * P(X2 = 4 | B) * P(B)
  <br>= (1/5) * (1/3) * (1/2)
  <br>= 1/30
  <br>≈ 0.03333

Normalizing the two scores so that they sum to 1 gives the posterior probabilities:

- P(A | X1 = 3, X2 = 4) = 0.04615 / (0.04615 + 0.03333) = 18/31 ≈ 58.06%
- P(B | X1 = 3, X2 = 4) = 0.03333 / (0.04615 + 0.03333) = 13/31 ≈ 41.94%

Since the posterior probability of class A (58.06%) is greater than that of class B (41.94%), Naive Bayes would predict that the new instance with features X1 = 3 and X2 = 4 belongs to class **A**.
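
The calculation can be reproduced with exact fractions. This sketch copies the frequency table from the question, assumes P(Xi = v | class) is the count for v divided by the total count of that feature within the class, and applies equal priors of 1/2; the variable names are illustrative.

```python
from fractions import Fraction

# Frequency table from the question: counts of each feature value per class
counts = {
    "A": {"X1": {1: 3, 2: 3, 3: 4}, "X2": {1: 4, 2: 3, 3: 3, 4: 3}},
    "B": {"X1": {1: 2, 2: 2, 3: 1}, "X2": {1: 2, 2: 2, 3: 2, 4: 3}},
}
priors = {"A": Fraction(1, 2), "B": Fraction(1, 2)}  # equal priors
instance = {"X1": 3, "X2": 4}


def likelihood(cls: str, feature: str, value: int) -> Fraction:
    """P(feature = value | cls) = count / total count of that feature in cls."""
    table = counts[cls][feature]
    return Fraction(table[value], sum(table.values()))


# Unnormalized scores: P(class) * product of per-feature likelihoods
scores = {}
for cls, prior in priors.items():
    score = prior
    for feature, value in instance.items():
        score *= likelihood(cls, feature, value)
    scores[cls] = score

# Normalize so the posteriors sum to 1, then pick the most probable class
total = sum(scores.values())
posteriors = {cls: s / total for cls, s in scores.items()}
prediction = max(posteriors, key=posteriors.get)
print(posteriors, prediction)
```

Under these assumptions the posteriors come out as 18/31 ≈ 0.58 for class A and 13/31 ≈ 0.42 for class B. Because the priors are equal, they cancel during normalization and do not affect the prediction.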