# Probability Theory

## Set Theory:
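- The sample space S of an experiment is the set of all possible outcomes of the experiment
- $S \cup T$: the union of S and T (outcomes in S, in T, or in both)
- $S \cap T$: the intersection of S and T (outcomes in both S and T)
- $T \subset S$: T is a subset of S (every outcome in T is also in S)
- $T^c$: the complement of T (outcomes not in T)
- S and T are disjoint if $S \cap T = \emptyset$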

### Example: Deck of 52 cards (Sample Space S):
- A: the card is an Ace (subset A, 4 cards)
- B: the card is a Heart (subset B, 13 cards)
- Intersection of A and B ($A \cap B$): the Ace of Hearts
- Union of A and B ($A \cup B$): the card is an Ace or a Heart (or both), 4 + 13 - 1 = 16 cards
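A minimal Python sketch of the card example using built-in sets. Encoding each card as a `(rank, suit)` tuple is an illustrative assumption, not part of the original notes.

```python
from itertools import product

# Illustrative encoding: each card is a (rank, suit) tuple.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
S = set(product(ranks, suits))  # sample space: 52 cards

A = {card for card in S if card[0] == "A"}       # Aces
B = {card for card in S if card[1] == "Hearts"}  # Hearts

print(A & B)       # intersection: {('A', 'Hearts')}
print(len(A | B))  # union: 4 + 13 - 1 = 16 cards
print(len(S - B))  # complement of B within S: 39 non-Heart cards
```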

## Definition of Probability:
- Naive definition of the probability of A, valid when S is finite and all outcomes are equally likely: $P(A) = \frac{|A|}{|S|}$
- Example: Which sum of two fair dice is more likely, 11 or 12? There are 36 equally likely ordered outcomes.
    - 11 -> (5,6), (6,5): $P = \frac{2}{36}$
    - 12 -> (6,6): $P = \frac{1}{36}$
    - 11 is twice as likely as 12
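The dice example can be checked by enumerating all 36 ordered outcomes and applying the naive definition. This small sketch uses `fractions.Fraction` to keep probabilities exact; the helper name `p_sum` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of rolling two fair dice.
S = list(product(range(1, 7), repeat=2))
sum_counts = Counter(a + b for a, b in S)

def p_sum(total):
    """Naive probability P(sum == total) = |A| / |S|."""
    return Fraction(sum_counts[total], len(S))

print(p_sum(11), p_sum(12))   # 1/18 1/36
print(p_sum(11) / p_sum(12))  # 2
```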

## General Definition of Probability:
- $0 \leq P(A) \leq 1$
- The probability of an event is never negative: $P(A) \geq 0$
- If two events are mutually exclusive (disjoint), then $P(A \cup B) = P(A) + P(B)$
- An event either happens or does not, so $P(A \cup A^c) = P(A) + P(A^c) = P(S) = 1$
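A quick sketch checking these properties on the two-dice sample space, with A = "sum is 11" and B = "sum is 12" (events chosen only for illustration). Each `assert` fails if the corresponding property does not hold.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))  # two fair dice, 36 outcomes

def P(event):
    """Naive probability of an event (a subset of S)."""
    return Fraction(len(event), len(S))

A = {o for o in S if sum(o) == 11}
B = {o for o in S if sum(o) == 12}
not_A = S - A  # complement of A

assert all(0 <= P(E) <= 1 for E in (A, B, not_A))  # bounds
assert A.isdisjoint(B)                             # mutually exclusive
assert P(A | B) == P(A) + P(B)                     # additivity
assert P(A | not_A) == P(A) + P(not_A) == 1        # complement rule
```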

---

# Probability Functions
- Cumulative Distribution Function
- Probability Mass Function
- Probability Density Function

## Cumulative Distribution Function: CDF
- The $CDF$ of a random variable $X$ is defined as the probability of the event $\{X \leq x\}$
- $CDF(x) = F(x) = P(X \leq x)$: the probability that $X$ takes on a value in the interval $(-\infty, x]$
- Example: the $CDF$ $F(x)$ of the random variable $X$, defined as the number of heads in three tosses of a fair coin, which takes only the values $\{0, 1, 2, 3\}$
    - $x < 0$ -> $0$
    - $0 \leq x < 1$ -> $\frac{1}{8}$
    - $1 \leq x < 2$ -> $\frac{1}{2}$
    - $2 \leq x < 3$ -> $\frac{7}{8}$
    - $x \geq 3$ -> $1$
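A sketch that evaluates the CDF of the three-toss example at arbitrary real $x$, showing that $F(x)$ is a step function that only jumps at 0, 1, 2, and 3. The function name `cdf` is illustrative.

```python
from fractions import Fraction
from itertools import product

# Number of heads in each of the 8 equally likely outcomes of three fair tosses.
heads = [seq.count("H") for seq in product("HT", repeat=3)]

def cdf(x):
    """F(x) = P(X <= x), defined for any real x, not just 0..3."""
    return Fraction(sum(1 for h in heads if h <= x), len(heads))

# F stays constant between the integers and jumps at 0, 1, 2, 3.
for x in (-0.5, 0, 0.5, 1, 1.7, 2, 3, 4):
    print(x, cdf(x))
```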

## Probability Mass Function: PMF
- When a random variable $X$ is discrete, the $PMF$ assigns a non-negative probability to each value $x$ that $X$ can take; these probabilities sum to 1 and together form the probability distribution of $X$
- $PMF(x) = P(X = x)$
- Example: $PMF(x)$ of the random variable $X$ defined as the number of heads in three tosses of a fair coin
    - $pmf(0)$ ->  $\frac{1}{8}$
    - $pmf(1)$ ->  $\frac{3}{8}$
    - $pmf(2)$ ->  $\frac{3}{8}$
    - $pmf(3)$ ->  $\frac{1}{8}$
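The PMF can be obtained by enumerating the 8 equally likely toss sequences and counting heads. As a cross-check, this sketch compares the result with the binomial formula $\binom{3}{k} / 2^3$, which applies here because the tosses are independent and fair.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb

# Enumerate the 8 equally likely outcomes of three fair coin tosses.
outcomes = list(product("HT", repeat=3))
counts = Counter(seq.count("H") for seq in outcomes)
pmf = {k: Fraction(counts[k], len(outcomes)) for k in range(4)}

print(pmf)  # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
assert sum(pmf.values()) == 1
# Cross-check against the binomial formula C(3, k) / 2**3.
assert all(pmf[k] == Fraction(comb(3, k), 2**3) for k in range(4))
```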

## Probability Density Function: PDF
- If the cumulative distribution function $CDF(x) = F(x)$ is differentiable, the *"probability density function"* is defined as its first derivative: $PDF(x) = f(x) = \frac{dF(x)}{dx}$
- The $PDF$ is not itself a probability (it can exceed 1); for a small interval of width $h$ near $x$, $P(x \leq X \leq x + h) \approx f(x) \cdot h$
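To illustrate the derivative relationship numerically, this sketch assumes an exponential distribution with rate 1, chosen only because its CDF $F(x) = 1 - e^{-x}$ and PDF $f(x) = e^{-x}$ (for $x \geq 0$) have simple closed forms. It compares a finite-difference slope of $F$ with $f$.

```python
import math

# Illustrative assumption: X ~ Exponential(rate=1), so F(x) = 1 - exp(-x) for x >= 0.
def F(x):
    return 1 - math.exp(-x) if x >= 0 else 0.0

def f(x):
    """Analytic density: f(x) = dF/dx = exp(-x) for x >= 0."""
    return math.exp(-x) if x >= 0 else 0.0

x, h = 1.0, 1e-4
finite_difference = (F(x + h) - F(x)) / h  # approximates dF/dx at x
prob_in_interval = F(x + h) - F(x)         # P(x <= X <= x + h)

print(finite_difference, f(x))
print(prob_in_interval, f(x) * h)  # f(x) * h approximates the interval probability
```

The approximation error shrinks as `h` gets smaller, which is the sense in which the density describes probability per unit length near $x$.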

---

# Probability Distributions Used in ML
- Joint
- Marginal
- Conditional

## Joint Probability
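- The joint probability that outcome $A_{i}$ of one experiment and outcome $B_{j}$ of another occur together is denoted $P(A_{i}, B_{j})$; joint probabilities capture how the two experiments depend on each other
- Summed over all possible pairs of outcomes, the joint probabilities equal 1:
$$\sum_{i} \sum_{j} P(A_{i}, B_{j}) = 1$$
- Example: joint $PMF$ $P(X, Y)$ of two binary random variables $X$ and $Y$: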

|        | Y=1     | Y=0    |
| -------|:-------:| ------:|
| X=1    | 5/100   | 20/100 |
| X=0    | 3/100   | 72/100 |
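The table can be stored as a dictionary keyed by $(x, y)$. This sketch (the variable name `joint` is an assumption) checks that the entries sum to 1.

```python
from fractions import Fraction

# Joint PMF from the table above, keyed by (x, y).
joint = {
    (1, 1): Fraction(5, 100), (1, 0): Fraction(20, 100),
    (0, 1): Fraction(3, 100), (0, 0): Fraction(72, 100),
}

# A valid joint distribution sums to 1 over all (x, y) pairs.
assert sum(joint.values()) == 1
print(joint[(1, 1)])  # P(X=1, Y=1) = 1/20
```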


## Marginal Probability
- The probability distribution over a subset of the variables is known as the *marginal probability distribution*
- For discrete random variables $X$ and $Y$, the marginal $PMF$ of $X$ is $$P(X = x) = \sum_{y} P(X = x, Y = y)$$
- In the marginal $PMF$ of $X$, $X$ is viewed individually rather than jointly with $Y$
- Summing over the possible values of $Y$ to convert the joint $PMF$ into the marginal $PMF$ of $X$ is known as *marginalizing out Y*

|        | Y=1     | Y=0    |Total  |
| -------|:-------:| ------:|------:|
| X=1    | 5/100   | 20/100 | 25/100|
| X=0    | 3/100   | 72/100 | 75/100|
| Total  | 8/100   | 92/100 | 100/100|
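A sketch of marginalizing the same joint table in Python. The helper `marginal` is hypothetical; it sums the joint probabilities over the variable being removed, reproducing the Total row and column.

```python
from fractions import Fraction

joint = {
    (1, 1): Fraction(5, 100), (1, 0): Fraction(20, 100),
    (0, 1): Fraction(3, 100), (0, 0): Fraction(72, 100),
}

def marginal(joint, axis):
    """Sum the joint PMF over the other variable (axis 0 -> P(X), axis 1 -> P(Y))."""
    result = {}
    for key, p in joint.items():
        result[key[axis]] = result.get(key[axis], 0) + p
    return result

p_x = marginal(joint, 0)  # marginalize out Y
p_y = marginal(joint, 1)  # marginalize out X
print(p_x[1], p_x[0], p_y[1], p_y[0])  # 1/4 3/4 2/25 23/25
```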

## Conditional Probability
- $P(B|A)$: the probability of B given that A has occurred
- $A \cap B$: the intersection of A and B
- With equally likely outcomes: $P(B|A) = \frac{\text{number of elements of } A \cap B}{\text{number of elements of } A}$
- In general, for $P(A) > 0$: $P(B|A) = \frac{P(A \cap B)}{P(A)}$
- Product rule: $P(A \cap B) = P(A) \cdot P(B|A)$
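Applying $P(B|A) = \frac{P(A \cap B)}{P(A)}$ to the joint table from the Joint Probability section, with A = $\{X = 1\}$ and B = $\{Y = 1\}$, gives $P(Y = 1 | X = 1) = \frac{5/100}{25/100} = \frac{1}{5}$. The sketch below computes this with a hypothetical helper `conditional_y_given_x` and checks the product rule.

```python
from fractions import Fraction

# Joint PMF of X and Y (same values as the Joint Probability table).
joint = {
    (1, 1): Fraction(5, 100), (1, 0): Fraction(20, 100),
    (0, 1): Fraction(3, 100), (0, 0): Fraction(72, 100),
}

def conditional_y_given_x(joint, x):
    """P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

cond = conditional_y_given_x(joint, 1)
print(cond)  # {1: Fraction(1, 5), 0: Fraction(4, 5)}
# Product rule: P(X=1, Y=1) = P(X=1) * P(Y=1 | X=1)
assert joint[(1, 1)] == Fraction(25, 100) * cond[1]
```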

### Example: Elder child is a girl vs at least one child is a girl
- Probability that both children in a two-child family are girls under each condition, assuming the four outcomes below are equally likely
- $P(\text{both girls} \mid \text{at least one girl}) = \frac{P(\text{both girls}, \text{at least one girl})}{P(\text{at least one girl})} = \frac{\frac{1}{4}}{\frac{3}{4}} = \frac{1}{3}$
- $P(\text{both girls} \mid \text{elder is girl}) = \frac{P(\text{both girls}, \text{elder is girl})}{P(\text{elder is girl})} = \frac{\frac{1}{4}}{\frac{1}{2}} = \frac{1}{2}$
- $\{GG, GB, BG, BB\}$ (first letter = elder child): at least one girl -> $\{GG, GB, BG\}$, of which only GG has two girls -> $\frac{1}{3}$
- $\{GG, GB, BG, BB\}$: elder is a girl -> $\{GG, GB\}$, of which only GG has two girls -> $\frac{1}{2}$
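The two-children example can be verified by brute-force enumeration. This sketch assumes the four outcomes are equally likely and encodes each family as a string whose first letter is the elder child (an illustrative choice); the helper `conditional` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Two-child families; first letter = elder child. Assumes equally likely outcomes.
S = ["".join(kids) for kids in product("GB", repeat=2)]  # ['GG', 'GB', 'BG', 'BB']

def conditional(event, given):
    """Naive P(event | given) = |event and given| / |given| over equally likely outcomes."""
    given_outcomes = [s for s in S if given(s)]
    return Fraction(sum(1 for s in given_outcomes if event(s)), len(given_outcomes))

both_girls = lambda s: s == "GG"
at_least_one_girl = lambda s: "G" in s
elder_is_girl = lambda s: s[0] == "G"

print(conditional(both_girls, at_least_one_girl))  # 1/3
print(conditional(both_girls, elder_is_girl))      # 1/2
```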

---