# Bayes' Theorem Overview

This entire course will be about **Bayes' Theorem**: $P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$

Where:
- A - hypothesis
- B - evidence
- P(A) - prior. In the absence of any data, what is your belief in the hypothesis?
- P(B) - evidence. How likely was the evidence that you observed?
- P(B|A) - likelihood. The likelihood of observing the evidence assuming the hypothesis, i.e., the fit of your data to the hypothesis.
- P(A|B) - posterior. Given the evidence, what is your updated belief in the hypothesis?

For a binary hypothesis, the evidence $P(B)$ is usually expanded as $P(B) = P(B|A) \cdot P(A) + P(B|\overline{A}) \cdot P(\overline{A})$. This is a sum over all possible hypotheses, here just $A$ and its complement $\overline{A}$.

#### 1. Prior P(A):
* This represents your initial belief or knowledge about a hypothesis (or event) **before** observing any new data.
* In Bayesian terms, the prior is the probability distribution that expresses your beliefs about the hypothesis $A$ before you get any new evidence.
* For example, if you're trying to estimate whether it will rain tomorrow, your prior could be the general historical probability of rain on any given day in your location.

#### 2. Likelihood P(B|A):
* This term represents the likelihood of observing the new evidence ($B$) given that the hypothesis ($A$) is true.
* It tells you how probable the new data ($B$) is, assuming the hypothesis ($A$) holds. It quantifies the fit of the data to the hypothesis.
* For example, if the hypothesis is that it will rain tomorrow, the likelihood could be the probability of observing certain weather patterns (like cloud cover or humidity) if it were actually going to rain.

#### 3. Evidence P(B):
* This is the overall probability of observing the evidence, regardless of the hypothesis.
* It acts as a normalizing constant: dividing by it ensures that the posterior probabilities over all hypotheses sum to 1.
* The evidence tells you how likely the data is, across all possible hypotheses.

#### 4. Posterior P(A|B):
* This is the key result of Bayes’ theorem. It represents the updated probability of the hypothesis ($A$) after incorporating the new evidence ($B$).
* The posterior is what you’re ultimately interested in: given the evidence, what is the updated belief or probability that the hypothesis is true?
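
The four terms above can be combined in a few lines of code. The following is a minimal sketch for a binary hypothesis, reusing the rain example; the function name `posterior` and all of the numbers are hypothetical and chosen only for illustration.

```python
def posterior(prior, lik_if_true, lik_if_false):
    """Return P(A|B) for a binary hypothesis A and observed evidence B."""
    # Evidence P(B): sum over both hypotheses (A and not-A).
    evidence = lik_if_true * prior + lik_if_false * (1.0 - prior)
    return lik_if_true * prior / evidence


# Hypothetical numbers, not taken from any real data set:
p_rain = 0.3              # prior P(A): assumed historical chance of rain
p_clouds_if_rain = 0.8    # likelihood P(B|A): heavy clouds today if it rains tomorrow
p_clouds_if_dry = 0.2     # P(B|not A): heavy clouds today if it stays dry

p_rain_given_clouds = posterior(p_rain, p_clouds_if_rain, p_clouds_if_dry)
print(f"P(rain | clouds) = {p_rain_given_clouds:.3f}")
```

With these assumed inputs the evidence is $0.8 \cdot 0.3 + 0.2 \cdot 0.7 = 0.38$, so the belief in rain rises from the prior of 0.3 to a posterior of about 0.63.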



### Bayesian Inference on Students Wearing Glasses

Let:

- **x** be the proportion of students who wear glasses.
- **y** be the observation of whether a student wears glasses (`y = 1`) or not (`y = 0`).

We want to compute $P(x \mid y)$, and we assume $P(x) = 1$ for $x \in [0, 1]$ (i.e., a uniform prior).

The likelihoods are given by:
  $$
  P(y = 0 \mid x) = 1 - x
  $$
  $$
  P(y = 1 \mid x) = x
  $$
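
These two cases are the Bernoulli likelihood. As a small sketch, they can be written as a single function; the name `likelihood` and the input checks are illustrative choices, not part of the course material.

```python
def likelihood(y, x):
    """P(y | x): probability of observation y when a proportion x wears glasses."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a proportion in [0, 1]")
    if y == 1:
        return x          # the student wears glasses
    if y == 0:
        return 1.0 - x    # the student does not wear glasses
    raise ValueError("y must be 0 or 1")


# For any fixed x the two outcomes are exhaustive, so their probabilities sum to 1.
print(likelihood(1, 0.3), likelihood(0, 0.3))
```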


#### Example 1: Observe one person **not** wearing glasses ($y = 0$)

$$
P(x \mid y = 0) = \frac{P(y = 0 \mid x) \cdot P(x) }{\int_0^1 P(y = 0 \mid x) \cdot P(x) \, dx}
    = \frac{(1 - x) \cdot 1}{\int_0^1 (1 - x) \cdot 1 \, dx} = \frac{1 - x}{1/2} = 2 \cdot (1-x)
$$
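
The result can be checked numerically. The sketch below approximates the integral in the denominator with a midpoint rule and confirms that the resulting posterior density integrates to 1; the helper names (`integrate`, `unnormalized`, `posterior_density`) and the grid size are assumptions made for this illustration.

```python
def integrate(f, n=10_000):
    """Approximate the integral of f over [0, 1] with the midpoint rule."""
    h = 1.0 / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h


def unnormalized(x):
    # Likelihood P(y = 0 | x) times the uniform prior P(x) = 1.
    return (1.0 - x) * 1.0


evidence = integrate(unnormalized)   # approximates P(y = 0)


def posterior_density(x):
    return unnormalized(x) / evidence


total = integrate(posterior_density)
print(f"evidence ~ {evidence:.4f}, posterior integrates to ~ {total:.4f}")
print(f"posterior at x = 0.25 ~ {posterior_density(0.25):.4f}")
```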


#### Example 2: Observe **two** people not wearing glasses ($y = 0, 0$)

$$
\begin{aligned}
P(x \mid y_1 = 0, y_2 = 0)
    &= \frac{P(y_1=0,y_2=0 \mid x) \cdot P(x)}{P(y_1 = 0, y_2 = 0)} \\
    &= \frac{P(y_1=0,y_2=0 \mid x) \cdot P(x)}{\int_0^1 P(y_1=0,y_2=0 \mid x) \cdot P(x) \, dx} \\
    &= \frac{P(y_1 = 0 \mid x) \cdot P(y_2=0 \mid x) \cdot P(x)}{\int_0^1 P(y_1 = 0 \mid x) \cdot P(y_2=0 \mid x) \cdot P(x) \, dx} \\
    &= \frac{(1 - x)^2}{\int_0^1 (1 - x)^2 \, dx}
    = 3 \cdot (1-x)^2
\end{aligned}
$$

NOTE: 
* $ P(y_1=0, y_2 = 0) \neq P(y_1=0)\cdot P(y_2=0) $
* $P(y_1 = 0, y_2 = 0\mid x) = P(y_1=0 \mid x) \cdot P(y_2=0 \mid x)$
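* When we fix $x$, both observations are drawn from the same distribution, so they are independent given $x$ (conditionally independent). Without fixing $x$, the first observation tells us something about $x$ and therefore about the second observation, which is why the joint probability in the first bullet does not factorize.
* It is possible that $y_1$ and $y_2$ came from different distributions (perhaps we observed a student from a different university who was visiting campus). The factorization above assumes this is not the case.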
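
The difference between the first two bullets can be illustrated with a small simulation. This is a sketch under the assumptions used in this section: $x$ is drawn from the uniform prior, and both observations share that same $x$. The seed and the number of trials are arbitrary choices.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 200_000

first_zero = second_zero = both_zero = 0
for _ in range(trials):
    x = rng.random()             # unknown proportion, drawn from the uniform prior
    y1 = int(rng.random() < x)   # two observations, independent once x is fixed
    y2 = int(rng.random() < x)
    first_zero += (y1 == 0)
    second_zero += (y2 == 0)
    both_zero += (y1 == 0 and y2 == 0)

p_joint = both_zero / trials
p_product = (first_zero / trials) * (second_zero / trials)
print(f"P(y1=0, y2=0)   ~ {p_joint:.3f}")     # analytic value: integral of (1-x)^2 = 1/3
print(f"P(y1=0) P(y2=0) ~ {p_product:.3f}")   # analytic value: (1/2) * (1/2) = 1/4
```

The joint frequency should land near $1/3$ while the product of the marginals stays near $1/4$: seeing one student without glasses makes a second one more likely, because it shifts our belief about $x$.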


#### Example 3: Observe one person **wearing** glasses ($y = 1$)

$$
P(x \mid y = 1) = \frac{P(y = 1 \mid x) \cdot P(x)}{\int_0^1 P(y = 1 \mid x) \cdot P(x) \, dx}
    = \frac{x}{\int_0^1 x \, dx} = \frac{x}{1/2} = 2 \cdot x
$$


#### Example 4: Observe **two** people wearing glasses ($y = 1, 1$)

$$
P(x \mid y = 1, 1) = \frac{P(y = 1 \mid x)^2 \cdot P(x)}{\int_0^1 P(y = 1 \mid x)^2 \cdot P(x) \, dx}
    = \frac{x^2}{\int_0^1 x^2 \, dx} = \frac{x^2}{1/3} = 3 \cdot x^2
$$
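
One way to sanity-check this result is to update one observation at a time: take the posterior after a single glasses-wearer, $2 \cdot x$ from Example 3, and use it as the prior for the second observation. A sketch of that calculation, assuming the two observations are conditionally independent given $x$:

$$
P(x \mid y_1 = 1, y_2 = 1) = \frac{P(y_2 = 1 \mid x) \cdot P(x \mid y_1 = 1)}{\int_0^1 P(y_2 = 1 \mid x) \cdot P(x \mid y_1 = 1) \, dx}
    = \frac{x \cdot 2x}{\int_0^1 x \cdot 2x \, dx} = \frac{2x^2}{2/3} = 3 \cdot x^2
$$

Processing the observations together or one by one gives the same posterior.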




#### Example 5: Observe **one** person wearing glasses and another person not wearing glasses ($y = 0, 1$)

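$$
P(x \mid y = 0, 1) = \frac{P(y = 0 \mid x) \cdot P(y=1 \mid x) \cdot P(x)}{\int_0^1 P(y = 0 \mid x) \cdot P(y=1 \mid x) \cdot P(x) \, dx}
    = \frac{x \cdot (1-x)}{\int_0^1 x \cdot (1-x) \, dx} = \frac{x \cdot (1-x)}{1/6} = 6 \cdot x \cdot (1-x)
$$

Here $\int_0^1 x \cdot (1-x) \, dx = \frac{1}{2} - \frac{1}{3} = \frac{1}{6}$, so the normalizing factor is 6. The density peaks at $x = 0.5$, where its value is $1.5$.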



#### General case: Observe **N** people wearing glasses and **M** people not wearing glasses

$$
P(x \mid y_1 = 0, y_2 = 1, \ldots, y_{M+N} = 0) = \frac{P(y = 1 \mid x)^N \cdot P(y=0 \mid x)^M \cdot P(x)}{\int_0^1 P(y = 1 \mid x)^N \cdot P(y=0 \mid x)^M \cdot P(x) \, dx}
    = \frac{x^N \cdot (1-x)^M}{\int_0^1 x^N \cdot (1-x)^M \, dx}
$$
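
The integral in the denominator has a standard closed form, the Beta integral: $\int_0^1 x^N \cdot (1-x)^M \, dx = \frac{N! \, M!}{(N+M+1)!}$. The posterior is therefore a Beta distribution with parameters $N+1$ and $M+1$. The sketch below uses that identity to compute the normalizing constant exactly; the function names are illustrative and not taken from the course.

```python
from fractions import Fraction
from math import factorial


def normalizing_constant(n_glasses, n_no_glasses):
    """Return 1 / integral_0^1 x^N (1-x)^M dx as an exact fraction."""
    n, m = n_glasses, n_no_glasses
    # Beta integral: integral_0^1 x^N (1-x)^M dx = N! M! / (N + M + 1)!
    return Fraction(factorial(n + m + 1), factorial(n) * factorial(m))


def posterior_density(x, n_glasses, n_no_glasses):
    """Posterior density at x under the uniform prior P(x) = 1."""
    c = normalizing_constant(n_glasses, n_no_glasses)
    return float(c) * x**n_glasses * (1.0 - x)**n_no_glasses


# (N, M) pairs: N students with glasses, M students without.
for n, m in [(0, 1), (0, 2), (1, 0), (2, 0), (1, 1)]:
    print(f"N={n}, M={m}: constant = {normalizing_constant(n, m)}")
```

The pairs in the loop correspond to one or two observations of the same kind, which give constants of 2 and 3, and to one observation of each kind, which gives 6 because $\int_0^1 x \cdot (1-x) \, dx = 1/6$.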