
# Colab 5: Conditional Probability & Bayes' Theorem

Learning Goals

- Understand conditional probability
- Apply the multiplication rule
- Learn Bayes' theorem with intuition
- See how it works with a simple dataset

### 1. Conditional Probability

👉 Definition:

$P(A∣B) = \frac{P(A \cap B)}{P(B)}$, where $P(B) > 0$

"The probability of A, given that B has occurred."

Example (one roll of a fair six-sided die):

*   A = rolling an even number
*   B = rolling a number greater than 3

In [None]:
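import numpy as np

die = np.arange(1, 7)   # sample space: faces 1..6 of a fair die
n = len(die)
A = {2, 4, 6}           # even numbers
B = {4, 5, 6}           # numbers greater than 3

P_A = len(A) / n
P_B = len(B) / n
P_A_and_B = len(A & B) / n   # A & B is the set intersection {4, 6}

# Definition of conditional probability: P(A|B) = P(A and B) / P(B)
P_A_given_B = P_A_and_B / P_B
print(f"The probability of A given B is: {P_A_given_B}")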

The probability of A given B is: 0.6666666666666666
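
As a sanity check, the same conditional probability can be estimated by simulation. This is a minimal sketch, assuming NumPy's `default_rng`; the seed and the number of rolls are arbitrary choices, not part of the original example.

```python
import numpy as np

# Simulate many rolls of a fair six-sided die (seed fixed only for reproducibility)
rng = np.random.default_rng(seed=0)
rolls = rng.integers(1, 7, size=100_000)   # integers from 1 to 6 inclusive

is_even = rolls % 2 == 0   # event A
is_gt_3 = rolls > 3        # event B

# Among the rolls where B happened, what fraction also satisfy A?
estimate = is_even[is_gt_3].mean()
print(f"Simulated P(A|B): {estimate:.3f}")
```

The estimate should land close to the exact value of 2/3, and it gets closer as the number of rolls grows.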


### 2. Multiplication Rule
$P(A \cap B) = P(A∣B) \cdot P(B)$

This connects conditional probability back to joint probability: the chance that both A and B occur is the chance of B, times the chance of A once B is known.
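
The rule can be checked on the die example from Section 1. The sketch below uses exact fractions and computes the conditional probability by counting inside the conditioning event, so the check is not circular. The helper names `prob` and `cond_prob` are introduced here for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # fair six-sided die
A = {2, 4, 6}                 # even
B = {4, 5, 6}                 # greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

def cond_prob(event, given):
    """P(event | given), found by restricting the sample space to `given`."""
    return Fraction(len(event & given), len(given))

joint = prob(A & B)
print(joint)                       # 1/3
print(cond_prob(A, B) * prob(B))   # 1/3
print(cond_prob(B, A) * prob(A))   # 1/3
```

The last line shows that the rule works in either order: $P(A \cap B) = P(B∣A) \cdot P(A)$ as well.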

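### 3. Bayes' Theorem

Formula:

$P(A∣B) = \frac{P(B∣A) \cdot P(A)}{P(B)}$

Bayes' theorem reverses the direction of conditioning: it turns $P(B∣A)$ into $P(A∣B)$.

It is essential when we know the likelihood $P(B∣A)$ and the prior $P(A)$ but want the posterior $P(A∣B)$.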
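
Bayes' theorem follows from writing the multiplication rule in both orders. A short derivation, assuming $P(A) > 0$ and $P(B) > 0$:

$P(A \cap B) = P(A∣B) \cdot P(B) = P(B∣A) \cdot P(A)$

$\Rightarrow \quad P(A∣B) = \frac{P(B∣A) \cdot P(A)}{P(B)}$

The second line comes from dividing both sides of the right-hand equality by $P(B)$.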

The examples below apply the formula to concrete numbers.

### Example 1: Coin Flip

Imagine you have two coins:
*   Coin A: A fair coin (50% chance of heads, 50% chance of tails). $P(H∣A) = 0.5$, $P(T∣A) = 0.5$
*   Coin B: A biased coin (70% chance of heads, 30% chance of tails). $P(H∣B) = 0.7$, $P(T∣B) = 0.3$

You pick one coin at random, each with probability 0.5, so $P(A) = 0.5$ and $P(B) = 0.5$, and flip it. It lands heads. What is the probability that you picked the biased Coin B? We want to find $P(\text{Coin B}∣\text{Head})$.

In [None]:
# Given values
P_Coin_A = 0.5
P_Coin_B = 0.5

P_Head_given_Coin_A = 0.5
P_Head_given_Coin_B = 0.7

# Evidence Probability P(Head) using the Law of Total Probability
P_Head = (P_Head_given_Coin_A * P_Coin_A) + (P_Head_given_Coin_B * P_Coin_B)

# Bayes theorem to find P(Coin B | Head)
P_Coin_B_given_Head = (P_Head_given_Coin_B * P_Coin_B) / P_Head

print(f"The probability of picking Coin B given a Head is: {P_Coin_B_given_Head}")

The probability of picking Coin B given a Head is: 0.5833333333333334
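
The same calculation can be done for every hypothesis at once. The sketch below is one possible vectorized layout, assuming NumPy arrays ordered as (Coin A, Coin B); the values are the ones given in the example.

```python
import numpy as np

# Values taken from the coin example
priors = np.array([0.5, 0.5])        # P(Coin A), P(Coin B)
likelihoods = np.array([0.5, 0.7])   # P(Head | Coin A), P(Head | Coin B)

# Numerators of Bayes' theorem, one per coin
unnormalized = likelihoods * priors

# Their sum is P(Head), by the law of total probability
evidence = unnormalized.sum()

posteriors = unnormalized / evidence
for name, p in zip(["Coin A", "Coin B"], posteriors):
    print(f"P({name} | Head) = {p:.4f}")
```

Because the coins are mutually exclusive and exhaustive, the two posteriors sum to 1. Seeing a head shifts belief toward Coin B but does not settle the question.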


### Example 2: Spam Filter

Let A be the event that an email is Spam, and B be the event that the email contains the word "free".

*   Suppose 20% of emails are Spam: $P(A) = 0.20$
*   Suppose 80% of Spam emails contain the word "free": $P(B∣A) = 0.80$
*   Suppose 10% of non-Spam emails contain the word "free": $P(B∣\neg A) = 0.10$

If an email contains the word "free", what is the probability that it is Spam? We want to find $P(A∣B)$.

In [None]:
# Given values
P_Spam = 0.20
P_not_Spam = 1 - P_Spam # P(~A)

P_Free_given_Spam = 0.80 # P(B|A)
P_Free_given_not_Spam = 0.10 # P(B|~A)

# Evidence Probability P(Free) using the Law of Total Probability
P_Free = (P_Free_given_Spam * P_Spam) + (P_Free_given_not_Spam * P_not_Spam) # P(B)

# Bayes theorem to find P(Spam | Free) - P(A|B)
P_Spam_given_Free = (P_Free_given_Spam * P_Spam) / P_Free

print(f"The probability that an email is Spam given it contains 'free' is: {P_Spam_given_Free}")

The probability that an email is Spam given it contains 'free' is: 0.6666666666666666
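
The result can also be read off a small dataset by counting. The 100 emails below are a hypothetical toy dataset, constructed so that the counts match the rates stated in the example; they are not real data.

```python
# Hypothetical toy dataset: 100 emails as (is_spam, contains_free) pairs,
# built so the counts match the rates in the spam example.
emails = (
    [(True, True)] * 16       # spam with "free"        (80% of 20 spam)
    + [(True, False)] * 4     # spam without "free"
    + [(False, True)] * 8     # non-spam with "free"    (10% of 80 non-spam)
    + [(False, False)] * 72   # non-spam without "free"
)

n_free = sum(1 for is_spam, has_free in emails if has_free)
n_spam_and_free = sum(1 for is_spam, has_free in emails if is_spam and has_free)

# Conditional probability as a ratio of counts: restrict to emails with "free"
p_spam_given_free = n_spam_and_free / n_free
print(f"{n_spam_and_free} of {n_free} emails containing 'free' are spam "
      f"-> P(Spam | Free) = {p_spam_given_free:.4f}")
```

Counting within the subset of emails that contain "free" gives the same answer as the formula, which is a useful way to build intuition for what conditioning means.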


### Understanding Bayes' Theorem

Bayes' Theorem is a fundamental concept in probability theory that describes how to update the probability of a hypothesis based on new evidence.

Here's a breakdown of the formula and its components:

$P(A∣B) = \frac{P(B∣A) \cdot P(A)}{P(B)}$

*   **$P(A∣B)$ (Posterior Probability):** This is what you want to find – the probability of hypothesis A being true given the evidence B.
*   **$P(B∣A)$ (Likelihood):** This is the probability of observing the evidence B given that hypothesis A is true.
*   **$P(A)$ (Prior Probability):** This is your initial belief about the probability of hypothesis A being true before you see any evidence.
*   **$P(B)$ (Evidence Probability):** This is the probability of observing the evidence B, regardless of whether hypothesis A is true or not. It can be calculated as $P(B) = P(B∣A) \cdot P(A) + P(B∣\neg A) \cdot P(\neg A)$, where $\neg A$ is the complement of A.

In essence, Bayes' Theorem lets you revise your initial probability of a hypothesis (the prior) in light of new data (the evidence) to get an updated probability (the posterior). It is widely used in statistics, machine learning, and medical diagnosis.
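
To see how much the prior shapes the posterior, the sketch below evaluates the formula for a few priors while holding the likelihoods fixed. The helper name `posterior` and the priors other than 0.20 are assumptions for illustration; the likelihoods 0.80 and 0.10 come from the spam example.

```python
def posterior(prior, likelihood, likelihood_if_false):
    """P(A|B) for a binary hypothesis A and evidence B.

    prior               = P(A)
    likelihood          = P(B|A)
    likelihood_if_false = P(B|not A)
    """
    evidence = likelihood * prior + likelihood_if_false * (1 - prior)  # P(B)
    return likelihood * prior / evidence

# Likelihoods from the spam example; priors other than 0.20 are hypothetical
for prior in [0.01, 0.20, 0.50]:
    print(f"prior = {prior:.2f} -> posterior = {posterior(prior, 0.80, 0.10):.3f}")
```

With the same evidence, a smaller prior gives a smaller posterior. This is the effect behind the medical test result in the next section.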

### 4. Example: Medical Test Problem

Suppose:

*   1% of people have a disease: $P(D) = 0.01$
*   The test detects 99% of true cases (its sensitivity): $P(\text{Pos}∣D) = 0.99$
*   The test gives a false positive for 5% of healthy people: $P(\text{Pos}∣\neg D) = 0.05$

If a patient tests positive, what is the probability they actually have the disease?

In [None]:
# Given values
P_D = 0.01
P_notD = 1 - P_D

P_Pos_given_D = 0.99
P_Pos_given_notD = 0.05

# Total probability of positive
P_Pos = P_Pos_given_D*P_D + P_Pos_given_notD*P_notD

# Bayes theorem
P_D_given_Pos = (P_Pos_given_D * P_D) / P_Pos
print(f"The probability of having the disease given a positive test is: {P_D_given_Pos}")

The probability of having the disease given a positive test is: 0.16666666666666669


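The result is about 16.7%, which is surprising.
Even though the test detects 99% of true cases, most positives are false alarms: the disease is rare, so the 5% false positive rate applied to the large healthy group produces far more positives than the 1% of people who are actually ill.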
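
Counting people instead of multiplying probabilities makes the result easier to see. The sketch below assumes a hypothetical cohort of 10,000 people, a size chosen only so that every count is a whole number.

```python
population = 10_000   # hypothetical cohort size, chosen to give whole numbers

diseased = population * 1 // 100        # 1% prevalence -> 100 people
healthy = population - diseased         # 9,900 people

true_positives = diseased * 99 // 100   # 99% of the diseased test positive -> 99
false_positives = healthy * 5 // 100    # 5% of the healthy test positive -> 495

all_positives = true_positives + false_positives
print(f"True positives:  {true_positives}")
print(f"False positives: {false_positives}")
print(f"P(D | Pos) = {true_positives}/{all_positives} = {true_positives / all_positives:.4f}")
```

Of the 594 people who test positive, only 99 have the disease, which is exactly 1 in 6.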

The **Law of Total Probability** states that if you have a set of mutually exclusive and exhaustive events (like A and its complement $\neg A$), the probability of an event B can be calculated by summing the probabilities of B occurring under each of those events, weighted by the probability of each event.

In terms of generic events A and B, the formula is:

$P(B) = P(B∣A) \cdot P(A) + P(B∣\neg A) \cdot P(\neg A)$

Where:
*   $P(B)$ is the total probability of event B occurring.
*   $P(B∣A)$ is the probability of event B occurring given that event A has occurred.
*   $P(A)$ is the probability of event A occurring.
*   $P(B∣\neg A)$ is the probability of event B occurring given that event A has *not* occurred ($\neg A$ is the complement of A).
*   $P(\neg A)$ is the probability of event A not occurring (which is $1 - P(A)$).
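
The two-event form is a special case. For any set of mutually exclusive and exhaustive events $A_1, \dots, A_n$ (notation introduced here for illustration), the same weighting applies:

$P(B) = \sum_{i=1}^{n} P(B∣A_i) \cdot P(A_i)$

With $n = 2$, $A_1 = A$, and $A_2 = \neg A$, this reduces to the two-term formula. The coin example uses it with the two coins as the events: $P(\text{Head}) = 0.5 \cdot 0.5 + 0.7 \cdot 0.5 = 0.6$.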