# <div style="padding: 10px; background-color: #64CCC5; margin: 10px; color: #000000; font-family: 'New Times Roman', serif; font-size: 60%; text-align: center; border-radius: 10px; overflow: hidden; font-weight: bold;"> Question 1 : What is Bayes' theorem?</div>

Bayes' theorem, named after Thomas Bayes, is a fundamental concept in probability theory and statistics. It provides a way to update probabilities based on new evidence or information. The theorem is formulated as follows:

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$
where:
- $P(A|B)$ is the posterior probability of event A occurring given that event B has occurred.
- $P(B|A)$ is the likelihood: the probability of event B occurring given that event A has occurred.
- $P(A)$ is the prior probability of event A.
- $P(B)$ is the marginal probability of event B (the evidence), which must be greater than zero.

Bayes' theorem is particularly useful in Bayesian statistics and is widely applied in various fields, including machine learning, medical diagnosis, and information retrieval. It allows for the incorporation of new evidence to update and refine the probability of a hypothesis or event.
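
As a minimal sketch, the formula translates directly into a small Python function. The medical-screening numbers below are hypothetical values chosen for illustration, and $P(B)$ is computed from them with the law of total probability.

```python
def bayes_posterior(prior_a: float, likelihood_b_given_a: float, evidence_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence_b <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood_b_given_a * prior_a / evidence_b


# Hypothetical screening test (illustrative numbers, not real data):
# A = "patient has the disease", B = "test is positive".
p_a = 0.01              # P(A): prevalence
p_b_given_a = 0.95      # P(B|A): sensitivity
p_b_given_not_a = 0.05  # P(B|not A): false-positive rate

# P(B) via the law of total probability over A and not A.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

print(round(bayes_posterior(p_a, p_b_given_a, p_b), 4))  # 0.161
```

Even with a fairly accurate test, the low prior keeps the posterior at about 16%, which is why the prior matters so much in diagnosis.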

# <div style="padding: 10px; background-color: #64CCC5; margin: 10px; color: #000000; font-family: 'New Times Roman', serif; font-size: 60%; text-align: center; border-radius: 10px; overflow: hidden; font-weight: bold;"> Question 2 : What is the formula for Bayes' theorem?</div>

Bayes' theorem is expressed mathematically as:

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$
where:
- $P(A|B)$ is the posterior probability of event A occurring given that event B has occurred.
- $P(B|A)$ is the likelihood: the probability of event B occurring given that event A has occurred.
- $P(A)$ is the prior probability of event A.
- $P(B)$ is the marginal probability of event B (the evidence), which must be greater than zero.
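
For a binary hypothesis (A or not A), the denominator is often expanded with the law of total probability, giving a form of the formula that needs only the prior and the two likelihoods:

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|\neg A) \cdot P(\neg A)}, \qquad P(\neg A) = 1 - P(A)$$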


# <div style="padding: 10px; background-color: #64CCC5; margin: 10px; color: #000000; font-family: 'New Times Roman', serif; font-size: 60%; text-align: center; border-radius: 10px; overflow: hidden; font-weight: bold;"> Question 3 : How is Bayes' theorem used in practice? </div>

Bayes' theorem is widely used in various fields and applications to update probabilities based on new evidence or information. Here's a general overview of how Bayes' theorem is applied in practice:

1. **Defining Prior Probabilities (Prior Beliefs):** Before any new evidence is considered, each hypothesis has an initial probability that reflects existing beliefs. This is the prior probability, denoted $P(A)$ for a hypothesis A.

2. **Incorporating New Evidence (Likelihood):** When new evidence or information becomes available, the likelihood of observing that evidence under each hypothesis is assessed. This likelihood is represented by $P(B|A)$, where A is the hypothesis and B is the evidence.

3. **Calculating Joint Probabilities:** The joint probability of the hypothesis and the evidence is the prior probability of the hypothesis multiplied by the likelihood of the evidence: $P(A \cap B) = P(B|A) \cdot P(A)$.

4. **Calculating the Normalization Factor (Marginal Likelihood):** The marginal likelihood (or evidence), $P(B)$, accounts for every way the evidence could occur across all hypotheses. For a set of mutually exclusive and exhaustive hypotheses $A_i$, it is the sum of their joint probabilities: $P(B) = \sum_{i} P(B|A_i) \cdot P(A_i)$.

5. **Updating Probabilities (Posterior Probability):** Bayes' theorem is then used to calculate the posterior probability, $P(A|B)$, which represents the updated probability of the hypothesis given the new evidence. The formula is:

   $$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$
   
6. **Iterative Process:** As more evidence arrives, the posterior from one update becomes the prior for the next, so beliefs are refined over time.
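
The steps above can be sketched in Python for a set of mutually exclusive, exhaustive hypotheses. The function name `bayes_update` and the coin example (a fair coin versus one hypothetically biased to land heads 80% of the time) are illustrative assumptions, not part of any library.

```python
def bayes_update(priors: dict[str, float], likelihoods: dict[str, float]) -> dict[str, float]:
    """One Bayesian update over mutually exclusive, exhaustive hypotheses.

    priors:      P(A_i) for each hypothesis A_i (step 1)
    likelihoods: P(B | A_i) for the observed evidence B (step 2)
    """
    # Step 3: joint probabilities P(B|A_i) * P(A_i)
    joint = {h: likelihoods[h] * p for h, p in priors.items()}
    # Step 4: marginal likelihood P(B) = sum_i P(B|A_i) * P(A_i)
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    # Step 5: posterior P(A_i|B) = joint / P(B)
    return {h: j / evidence for h, j in joint.items()}


# Hypothetical example: is a coin fair or biased toward heads?
beliefs = {"fair": 0.5, "biased": 0.5}
p_heads = {"fair": 0.5, "biased": 0.8}

# Step 6: each posterior becomes the prior for the next observation.
for flip in ["H", "H", "T", "H"]:
    lik = {h: (p if flip == "H" else 1 - p) for h, p in p_heads.items()}
    beliefs = bayes_update(beliefs, lik)
    print(flip, {h: round(p, 3) for h, p in beliefs.items()})
```

Because each update is normalized, processing the flips one at a time gives the same final posterior as multiplying all four likelihoods together and normalizing once.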

Applications of Bayes' theorem include:
- **Medical Diagnosis:** Updating the probability of a disease given new test results.
- **Spam Filtering:** Updating the probability of an email being spam based on observed features.
- **Machine Learning:** Bayesian methods are used for parameter estimation and model updating.
- **Finance:** Assessing the probability of different financial outcomes based on market information.

Bayes' theorem provides a principled way to incorporate new information into existing knowledge and is particularly useful in situations where uncertainty needs to be quantified and updated as evidence accumulates.

# <div style="padding: 10px; background-color: #64CCC5; margin: 10px; color: #000000; font-family: 'New Times Roman', serif; font-size: 60%; text-align: center; border-radius: 10px; overflow: hidden; font-weight: bold;"> Question 4: What is the relationship between Bayes' theorem and conditional probability? </div>

Bayes' theorem is closely related to conditional probability, and it can be derived from the definition of conditional probability. Let's explore this relationship:

**Conditional Probability:**
Conditional probability is the probability of an event occurring given that another event has already occurred. It is denoted as $P(A|B)$ and, provided $P(B) > 0$, is calculated using the formula:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

where:
- $P(A|B)$ is the conditional probability of event A given that event B has occurred.
- $P(A \cap B)$ is the probability of both events A and B occurring.
- $P(B)$ is the probability of event B occurring.

**Bayes' Theorem:**
Bayes' theorem is a way of expressing conditional probability in terms of other conditional probabilities and prior probabilities. It is formulated as:

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$

where:
- $P(A|B)$ is the conditional probability of event A given that event B has occurred.
- $P(B|A)$ is the conditional probability of event B given that event A has occurred.
- $P(A)$ is the prior probability of event A.
- $P(B)$ is the marginal probability of event B (the evidence).

**Relationship:**
Bayes' theorem follows directly from the definition of conditional probability. Applying the definition in both directions gives two expressions for the same joint probability:

$$P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A)$$

Dividing both sides by $P(B)$ (assuming $P(B) > 0$) yields Bayes' theorem:

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$

In other words, Bayes' theorem is a rearrangement of the definition of conditional probability that lets you compute $P(A|B)$ when $P(B|A)$ is easier to obtain. This makes it a powerful tool for updating probabilities based on new evidence, particularly in situations where you want to refine your beliefs as more information becomes available.
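
A quick numerical check of the derivation, using a small hypothetical joint distribution over two binary events. Exact fractions from Python's `fractions` module avoid floating-point noise, so the two routes to $P(A|B)$ can be compared with `==`.

```python
from fractions import Fraction

# Hypothetical joint distribution over two binary events A and B.
# Keys are (a, b) outcomes; values are P(A=a, B=b) and sum to 1.
joint = {
    (True, True): Fraction(3, 20),
    (True, False): Fraction(5, 20),
    (False, True): Fraction(2, 20),
    (False, False): Fraction(10, 20),
}

p_a = sum(p for (a, _), p in joint.items() if a)  # P(A)
p_b = sum(p for (_, b), p in joint.items() if b)  # P(B)
p_a_and_b = joint[(True, True)]                   # P(A ∩ B)

# Definition of conditional probability, in both directions.
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Bayes' theorem rebuilt from P(B|A), P(A) and P(B).
bayes = p_b_given_a * p_a / p_b

print(p_a_given_b, bayes)  # 3/5 3/5
```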

# <div style="padding: 10px; background-color: #64CCC5; margin: 10px; color: #000000; font-family: 'New Times Roman', serif; font-size: 60%; text-align: center; border-radius: 10px; overflow: hidden; font-weight: bold;"> Question 5 : How do you choose which type of Naive Bayes classifier to use for any given problem? </div>

Choosing the right type of Naive Bayes classifier depends mainly on the type and distribution of your features; all three variants share the same assumption that features are conditionally independent given the class. The three main types of Naive Bayes classifiers are:

1. **Gaussian Naive Bayes:**
   - **Assumption:** Assumes that, within each class, the continuous features follow a Gaussian (normal) distribution.
   - **Use Case:** Suitable for datasets where the features are continuous and approximately normally distributed.

2. **Multinomial Naive Bayes:**
   - **Assumption:** Assumes that features are non-negative counts (or frequencies) generated by a multinomial distribution, such as word counts in text classification problems.
   - **Use Case:** Commonly used in text classification tasks where features represent the frequency of words in documents.

3. **Bernoulli Naive Bayes:**
   - **Assumption:** Assumes that features are binary (i.e., present or absent).
   - **Use Case:** Suitable for binary data, such as document classification where the presence or absence of certain words is considered.
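
To make the three event models concrete, here is a minimal sketch of the per-class likelihood each variant computes. The function names and the parameter values in the usage lines are hypothetical; in practice the means, variances, and probabilities would be estimated from training data for each class.

```python
import math


def gaussian_likelihood(x: float, mean: float, var: float) -> float:
    """Gaussian NB: density of a continuous feature value under N(mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def multinomial_log_likelihood(counts: list[int], word_probs: list[float]) -> float:
    """Multinomial NB: sum_i count_i * log(p_i), i.e. log P(doc | class) up to a
    constant. The multinomial coefficient is identical for every class, so it
    is dropped when comparing classes. word_probs should sum to 1."""
    return sum(c * math.log(p) for c, p in zip(counts, word_probs) if c > 0)


def bernoulli_likelihood(present: list[int], word_probs: list[float]) -> float:
    """Bernoulli NB: product of p_i for present features and (1 - p_i) for absent ones.
    Each p_i is an independent presence probability; they need not sum to 1."""
    result = 1.0
    for x, p in zip(present, word_probs):
        result *= p if x else 1 - p
    return result


# Hypothetical per-class parameters for a 3-word vocabulary.
print(gaussian_likelihood(0.0, 0.0, 1.0))
print(multinomial_log_likelihood([2, 0, 1], [0.5, 0.3, 0.2]))
print(bernoulli_likelihood([1, 0, 1], [0.9, 0.4, 0.1]))
```

One practical difference shows up here: the Bernoulli model multiplies in $1 - p_i$ for every absent feature, whereas the multinomial model simply ignores zero counts.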

**Guidelines for Choosing:**

1. **Nature of Features:**
   - **Continuous Features:** If your features are continuous and approximately normally distributed, consider Gaussian Naive Bayes.
   - **Discrete Features:** If your features are discrete, such as word counts or presence/absence indicators, consider Multinomial or Bernoulli Naive Bayes.

2. **Dataset Size:**
   - **Small Datasets:** Naive Bayes classifiers often perform well on small datasets because they estimate relatively few parameters. The choice among Gaussian, Multinomial, or Bernoulli still depends on the nature of the features.

3. **Independence Assumption:**
   - **Features are Conditionally Independent:** Naive Bayes classifiers assume that features are independent of each other given the class. If features are strongly correlated, the predicted probabilities can be poorly calibrated and accuracy may suffer. Evaluate whether the assumption is reasonable for your data.

4. **Text Classification:**
   - **Word Frequency Data:** For text classification tasks where features represent word frequencies, Multinomial Naive Bayes is commonly used.
   - **Presence/Absence of Words:** If you're dealing with binary data indicating the presence or absence of words (e.g., in spam filtering), Bernoulli Naive Bayes may be appropriate.

5. **Experimentation:**
   - **Try Different Types:** It's often beneficial to try different types of Naive Bayes classifiers on your dataset and compare their performance through cross-validation or other evaluation methods.

Remember that the choice between these classifiers is not always clear-cut, and experimentation is key to finding the most suitable model for your specific problem. Additionally, preprocessing steps, such as feature scaling or transformation, can also influence the performance of Naive Bayes classifiers.
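
The feature-type guidelines above can be condensed into a rough heuristic. This is a hypothetical helper, not a rule from any library; the returned strings are the corresponding scikit-learn class names, and cross-validation should still make the final decision.

```python
def suggest_nb_variant(rows: list[list[float]]) -> str:
    """Rough, hypothetical heuristic based on feature values only:
    all values 0/1               -> Bernoulli
    all non-negative integers    -> Multinomial (count data)
    anything else (real-valued)  -> Gaussian
    """
    values = [v for row in rows for v in row]
    if all(v in (0, 1) for v in values):
        return "BernoulliNB"
    if all(v >= 0 and float(v).is_integer() for v in values):
        return "MultinomialNB"
    return "GaussianNB"


print(suggest_nb_variant([[0, 1, 1], [1, 0, 0]]))  # presence/absence of words
print(suggest_nb_variant([[3, 0, 2], [1, 5, 0]]))  # word counts
print(suggest_nb_variant([[5.1, 3.5], [4.9, 3.0]]))  # continuous measurements
```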

# <div style="padding: 10px; background-color: #64CCC5; margin: 10px; color: #000000; font-family: 'New Times Roman', serif; font-size: 60%; text-align: center; border-radius: 10px; overflow: hidden; font-weight: bold;"> Question 6 : Assignment:</div>

**You have a dataset with two features, X1 and X2, and two possible classes, A and B. You want to use Naive Bayes to classify a new instance with features X1 = 3 and X2 = 4. The following table shows the frequency of each feature value for each class:**

| Class | X1=1 | X1=2 | X1=3 | X2=1 | X2=2 | X2=3 | X2=4 |
|-------|------|------|------|------|------|------|------|
| A     | 3    | 3    | 4    | 4    | 3    | 3    | 3    |
| B     | 2    | 2    | 1    | 2    | 2    | 2    | 3    |
    
**Assuming equal prior probabilities for each class, which class would Naive Bayes predict the new instance to belong to?**

To use Naive Bayes for classification, you need to calculate the likelihood and the prior probabilities for each class. The class with the highest posterior probability given the new instance's features is the predicted class.

Let's denote:
- $P(A)$ and $P(B)$ as the prior probabilities of classes A and B, respectively. Since the prior probabilities are assumed to be equal, $P(A) = P(B) = 0.5$.
- $P(X1=3 | A)$ and $P(X1=3 | B)$ as the likelihoods of $X1=3$ given classes A and B, respectively.
- $P(X2=4 | A)$ and $P(X2=4 | B)$ as the likelihoods of $X2=4$ given classes A and B, respectively.

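The likelihoods are estimated from the table by dividing each count by the total count for that feature within the class. The X1 and X2 counts have different totals (class A: 10 for X1 and 13 for X2; class B: 5 for X1 and 9 for X2), so each feature is normalized by its own total:

$$P(X1=3 | A) = \frac{4}{3+3+4} = \frac{4}{10} = 0.4$$
$$P(X1=3 | B) = \frac{1}{2+2+1} = \frac{1}{5} = 0.2$$

$$P(X2=4 | A) = \frac{3}{4+3+3+3} = \frac{3}{13} \approx 0.2308$$
$$P(X2=4 | B) = \frac{3}{2+2+2+3} = \frac{3}{9} \approx 0.3333$$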

Now, the posterior probabilities for each class given the new instance's features are calculated using Bayes' theorem together with the naive assumption that X1 and X2 are conditionally independent given the class:

$$P(A | X1=3, X2=4) \propto P(X1=3 | A) \times P(X2=4 | A) \times P(A)$$
$$P(B | X1=3, X2=4) \propto P(X1=3 | B) \times P(X2=4 | B) \times P(B)$$

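Since we only need the relative probabilities, we can compare the products without normalizing (the equal priors would cancel in any case):

$$P(A | X1=3, X2=4) \propto 0.4 \times \frac{3}{13} \times 0.5 = \frac{3}{65} \approx 0.0462$$
$$P(B | X1=3, X2=4) \propto 0.2 \times \frac{1}{3} \times 0.5 = \frac{1}{30} \approx 0.0333$$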

Comparing the values, $P(A | X1=3, X2=4)$ is higher (normalizing the two scores gives about 0.58 for A versus 0.42 for B), so Naive Bayes would predict that the new instance belongs to class A.
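
The calculation can be reproduced in plain Python as a check. This sketch stores the frequency table from the question in a nested dictionary (the layout is an illustrative choice) and normalizes each feature by its own total within the class.

```python
# Frequency table from the question: counts[class][feature][value]
counts = {
    "A": {"X1": {1: 3, 2: 3, 3: 4}, "X2": {1: 4, 2: 3, 3: 3, 4: 3}},
    "B": {"X1": {1: 2, 2: 2, 3: 1}, "X2": {1: 2, 2: 2, 3: 2, 4: 3}},
}
priors = {"A": 0.5, "B": 0.5}  # equal priors, as stated in the question
instance = {"X1": 3, "X2": 4}

# Unnormalized score: P(class) * product of P(feature=value | class)
scores = {}
for cls, features in counts.items():
    score = priors[cls]
    for feat, value in instance.items():
        # Normalize by this feature's own total count within the class.
        score *= features[feat][value] / sum(features[feat].values())
    scores[cls] = score

# Normalize the scores into posterior probabilities and pick the largest.
total = sum(scores.values())
posteriors = {cls: s / total for cls, s in scores.items()}
prediction = max(posteriors, key=posteriors.get)
print(prediction, {cls: round(p, 4) for cls, p in posteriors.items()})
# A {'A': 0.5806, 'B': 0.4194}
```

No Laplace smoothing is applied because none of the needed counts is zero; with sparser tables, adding a small pseudo-count to every cell would avoid zero probabilities.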

# <div style="padding: 15px; background-color: #D2E0FB; margin: 15px; color: #000000; font-family: 'New Times Roman', serif; font-size: 110%; text-align: center; border-radius: 10px; overflow: hidden; font-weight: bold;"> Complete</div>