### Joint Probability

Suppose you flip a coin and roll a die at the same time, and you want to know the chance (probability) that two things happen together:
- Event $A$: the coin shows heads, with probability $P(A) = \frac{1}{2}$.
- Event $B$: the die rolls a 6, with probability $P(B) = \frac{1}{6}$.

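The probability that both events occur together is called the **joint probability**, written $P(A \text{ and } B)$ or $P(A \cap B)$.

Because the coin and the die do not influence each other (the events are independent), the joint probability is the product of the individual probabilities: Joint Probability $=$ Chance of heads $\times$ Chance of rolling a 6.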

\begin{aligned}
P(A \cap B) &= \frac{1}{2} \times \frac{1}{6} = \frac{1}{12} \\
\end{aligned}
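As a quick check of the arithmetic above, the sketch below enumerates every (coin, die) outcome and counts the one where both events occur. It assumes a fair coin and a fair die; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# All 12 equally likely (coin, die) outcomes (assumes a fair coin and die).
outcomes = list(product(coin, die))

# Keep only the outcomes where the coin is heads AND the die shows 6.
favourable = [o for o in outcomes if o == ("H", 6)]
p_joint = Fraction(len(favourable), len(outcomes))

print(p_joint)  # 1/12
```

Counting outcomes and multiplying $\frac{1}{2} \times \frac{1}{6}$ give the same answer here because the two events are independent.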

### Marginal Probability

You’re flipping a coin and rolling a die, but now you only care about one thing: the chance (probability) of rolling a 6, no matter what happens with the coin.

This is called **marginal probability** because you’re focusing on just one event and ignoring everything else.

So $P(B) = \frac{1}{6} $
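The same value can be recovered from the joint distribution by summing over every coin outcome, which is what "ignoring the coin" means numerically. This is a minimal sketch assuming a fair coin and a fair die; the dictionary layout is just one convenient choice.

```python
from fractions import Fraction

# Joint probability of each (coin, die) pair, assuming a fair coin and die.
joint = {(c, d): Fraction(1, 12) for c in ("H", "T") for d in range(1, 7)}

# Marginalise over the coin: add up every entry whose die value is 6.
p_six = sum(p for (c, d), p in joint.items() if d == 6)

print(p_six)  # 1/6
```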

### Joint Probability Tables

Imagine a survey of 100 people asked two questions:
- Do you prefer Coffee or Tea?
- Do you add Milk or No Milk?

Here are the survey results:

- 40 people like coffee with milk.
- 20 people like coffee without milk.
- 25 people like tea with milk.
- 15 people like tea without milk.

Dividing each count by the 100 respondents gives the joint probability table:

| | Milk ($M$) | No Milk ($NM$) | Total |
|---|---|---|---|
| Coffee ($C$) | 0.40 | 0.20 | 0.60 |
| Tea ($T$) | 0.25 | 0.15 | 0.40 |
| Total | 0.65 | 0.35 | 1.00 |

- Joint Probability:
    - P(Coffee and Milk) = 0.40
    - P(Tea and No Milk) = 0.15
- Marginal Probability:
    - P(Coffee) = 0.60 (sum of the Coffee row: 0.40 + 0.20)
    - P(Milk) = 0.65 (sum of the Milk column: 0.40 + 0.25)
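The table and its totals can be rebuilt from the raw survey counts. The sketch below uses exact fractions to avoid floating-point rounding; the `marginal` helper is a hypothetical name introduced for illustration.

```python
from fractions import Fraction

# Survey counts from the example: (drink, milk preference) -> respondents.
counts = {
    ("Coffee", "Milk"): 40,
    ("Coffee", "No Milk"): 20,
    ("Tea", "Milk"): 25,
    ("Tea", "No Milk"): 15,
}
total = sum(counts.values())  # 100 respondents

# Joint probabilities: each count divided by the total.
joint = {key: Fraction(n, total) for key, n in counts.items()}

def marginal(joint, axis, value):
    """Sum the joint over the other variable (axis 0 = drink, 1 = milk)."""
    return sum(p for key, p in joint.items() if key[axis] == value)

p_coffee = marginal(joint, 0, "Coffee")  # row total
p_milk = marginal(joint, 1, "Milk")      # column total

print(float(p_coffee), float(p_milk))  # 0.6 0.65
```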

### The Chain Rule of Probability

In ML, we often need to calculate the joint probability of several random variables, and this is where the chain rule of probability is helpful. The chain rule for the probability of $n$ random variables can be expressed as:

\begin{aligned}
P(X_1, X_2, \dots, X_n) &= \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1}) = P(X_1) \cdot P(X_2 \mid X_1) \cdot P(X_3 \mid X_1, X_2) \cdots P(X_n \mid X_1, \dots, X_{n-1}) \\
\end{aligned}

For example, if there are three random variables $X$, $Y$, and $Z$, this formula can be written as follows (the variables can be expanded in any order; here we start from $Z$):

\begin{aligned}
P(X,Y,Z) &= P(X|Y,Z) \cdot P(Y,Z) = P(X|Y,Z) \cdot P(Y|Z) \cdot P(Z) \\
\end{aligned}

The chain rule breaks down a joint probability into the product of smaller joint probabilities and conditional probabilities.
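To see the three-variable factorisation hold numerically, the sketch below builds a small joint distribution over binary $X$, $Y$, $Z$ and checks that $P(X \mid Y,Z) \cdot P(Y \mid Z) \cdot P(Z)$ reproduces every entry. The weights are made-up illustrative numbers, not data from any real problem, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over binary (X, Y, Z); the weights are
# arbitrary illustrative values, normalised so the probabilities sum to 1.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
states = list(product([0, 1], repeat=3))
joint = {s: Fraction(w, sum(weights)) for s, w in zip(states, weights)}

def p_z(z):
    """Marginal P(Z = z): sum the joint over X and Y."""
    return sum(p for (_, _, zz), p in joint.items() if zz == z)

def p_yz(y, z):
    """Smaller joint P(Y = y, Z = z): sum the joint over X."""
    return sum(p for (_, yy, zz), p in joint.items() if (yy, zz) == (y, z))

def chain_rule(x, y, z):
    """P(X|Y,Z) * P(Y|Z) * P(Z), each factor computed from the joint."""
    p_x_given_yz = joint[(x, y, z)] / p_yz(y, z)
    p_y_given_z = p_yz(y, z) / p_z(z)
    return p_x_given_yz * p_y_given_z * p_z(z)

chain_ok = all(chain_rule(*s) == joint[s] for s in states)
print(chain_ok)  # True
```

Exact fractions are used so the equality check is not affected by floating-point rounding.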

For example, suppose a company has 100 people, 16 of whom are working on an ML project. If we pick 3 people at random (without replacement), what is the probability that none of them has been working on the ML project?

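- Let $A_i$ be the event that the $i$-th person picked has not been working on the ML project.
- Using the chain rule: $P(A_3, A_2, A_1) = P(A_3 \mid A_2, A_1) \cdot P(A_2 \mid A_1) \cdot P(A_1)$
- $P(A_1)$ is the probability that the first person picked has not been working on the ML project. Each later factor conditions on the earlier picks, so one person who is not on the ML project is removed from both the numerator and the denominator each time.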

\begin{aligned}
P(A_1) &= \frac{100-16}{100} = \frac{84}{100} \\
P(A_2|A_1) &= \frac{99-16}{99} = \frac{83}{99} \\
P(A_3|A_2,A_1) &= \frac{98-16}{98} = \frac{82}{98} \\
P(A_3,A_2,A_1) &= \frac{84}{100} \cdot \frac{83}{99} \cdot \frac{82}{98} \approx 0.5893
\end{aligned}
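The worked example can be reproduced in a few lines. The sketch below multiplies the three conditional probabilities in turn and, as a cross-check not used in the text above, compares the result with the counting formula $\binom{84}{3} / \binom{100}{3}$. It assumes sampling without replacement and requires Python 3.8+ for `math.comb`.

```python
from fractions import Fraction
from math import comb

total, ml = 100, 16
non_ml = total - ml  # 84 people not on the ML project

# Chain rule: after each pick, one non-ML person and one person overall
# are removed from the pool (sampling without replacement).
p = Fraction(1)
for i in range(3):
    p *= Fraction(non_ml - i, total - i)

# Cross-check: ways to choose 3 non-ML people over ways to choose any 3.
p_comb = Fraction(comb(non_ml, 3), comb(total, 3))

print(round(float(p), 4))  # 0.5893
```

Both routes give the same exact fraction, which is a useful sanity check when the conditional factors are written out by hand.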