<a href="https://colab.research.google.com/github/paarthbamb/dataScience/blob/main/Unit2/Unit2ExercisesSF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Unit 2 Exercises: Bayesian Building Blocks

This first set of exercises focuses on conceptual understanding of the three parts of Bayesian statistics we'll manipulate the most: the prior, likelihood, and posterior.

These vocabulary words will help us categorize and explain all statistical models, even models that don't fit inside the standard Bayesian framework.

**Task1**:

Why do we make guesses? In other words, what is the benefit of trying to predict something we are uncertain about?

We make guesses because we are uncertain about the future or about unobserved aspects of the world. The benefit of trying to predict something we are uncertain about is that it allows us to make informed decisions and gain a better understanding of the underlying processes. Even if a guess is not perfectly accurate it can still be useful for reducing the impact of uncertainty.

**Task2**:

Is it possible to make a guess/prediction without making an assumption?

If yes, then give an example of such a guess, and state whether that guess would be useful or not.

If no, then briefly justify why we need assumptions to make predictions.

No. Every guess requires an assumption of some sort. To give a few examples, if I tried to estimate the number of cells in Yale's brain, I would be assuming he has a brain in the first place. Similarly, if I were to guess the number of jelly beans in a jar, I would be assuming that the jar contained jelly beans that I couldn't see.

**Task3**:

Should we use all the available information we have to make a guess/prediction? Justify your answer.

If our goal is to make the guess as accurate as possible, yes, we should certainly use all the available information. More information can lead to a more accurate prediction by providing a more complete picture of the situation. However, it is important to distinguish between relevant and irrelevant information. Using irrelevant information can introduce noise and potentially lead to worse predictions.

**Task4**:

What is a prior? How are priors related to
- context?
- assumptions?
- predictions?

A prior is the initial best guess or what you already believe about something before you see any new information. When you set up a prior, you're making some assumptions based on some existing knowledge which guide your starting point. Then, this prior combines with what the new data tells you (that's the likelihood part) to give you an updated belief, called the posterior. This posterior is what you use to make your final predictions. So, your initial guess (the prior), shaped by your context and assumptions, plays a role in where your predictions end up.



**Task5**:

What is a likelihood? How are likelihoods related to:

- context?
- assumptions?
- predictions?

A likelihood measures how well certain parameter values explain observed data. Probability predicts data given parameters, but likelihood evaluates parameters given data. Context matters because it defines what the data and parameters mean, while assumptions shape the model and the form of the likelihood. Predictions come after, since once parameters are estimated from likelihood, they can be used to forecast future outcomes.

**Task6**:

What is a posterior? How are posteriors related to:

- context?
- assumptions?
- predictions?

A posterior is the updated probability of a parameter after seeing data. It combines prior beliefs about the parameter with the likelihood of the observed data. Context matters because it tells us what the parameter and data represent in the real world. Assumptions matter because the choice of prior and the model for the data determine the exact form of the posterior. Predictions come from the posterior because once we update our beliefs about the parameter, we can use that distribution to forecast future outcomes with uncertainty built in.

**Task7**:

Why would anyone want to define a prior and a likelihood in order to make a prediction? In other words, what's the point of using a likelihood and a prior to form a posterior?

The point of combining a prior with a likelihood is to make predictions that use both what we already know and what we’ve observed. The prior encodes background knowledge or reasonable beliefs before seeing data, while the likelihood shows how well different parameter values explain the data. By putting them together, the posterior balances old information with new evidence. This makes predictions more informed and realistic, especially when data is limited or uncertain, because the prior prevents us from relying only on noisy observations and the likelihood ensures we still learn from the data.

## Bayes' Rule Math

The following exercises will be graded for completion, with no accuracy component. That said, correct answers below will replace mistakes in tasks 1-7.



### Mathematical Framing

In this series of exercises, we'll calculate a probability using the full version of Bayes' Rule.

The version seen in the notes, $p(θ|y) ∝ p(y|θ)p(θ)$, ignores the normalizing constant found in the full equation: $p(θ|y) = \frac{p(y|θ)p(θ)}{p(y)}$.

As stated in the notes, in practical applications we don't need to worry too much about $p(y)$, AKA the marginal likelihood, AKA the prior predictive density, AKA the normalizing constant. And when we do, we'll approximate like we do everything else.

The following exercises are closer to theoretical abstraction, rather than practicality.

So why do them?

These exercises will (hopefully) help you gain additional intuition for probability, and how it behaves.

As you work through the exercises, consider $p(y)$, the prior predictive density, and why using it to divide $p(y|θ)p(θ)$ guarantees that we get a probability.

Additionally, think about:
- the likelihood $p(y|θ)$,
- the prior $p(θ)$,
- why multiplying the likelihood and prior *almost* gives us the posterior $p(θ|y)$.
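
One way to see why dividing by $p(y)$ yields a probability, sketched under the assumption that $\theta$ takes only finitely many values (as in the marble problem in this notebook): the normalizing constant is the sum of likelihood times prior over every candidate value, so the posterior values must add up to 1.

$$
p(y) = \sum_{\theta'} p(y \mid \theta')\, p(\theta')
\qquad\Longrightarrow\qquad
\sum_{\theta} p(\theta \mid y) = \frac{\sum_{\theta} p(y \mid \theta)\, p(\theta)}{p(y)} = 1
$$

For a continuous $\theta$ the sum would become an integral; that case is not worked out here.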

### Problem Setting

Imagine we have a bag of red and white marbles, identical in every other way. Each individual marble is either entirely white or entirely red.

Let's assume there are 4 total marbles, that we can't see inside the bag, and when we grab a ball from the bag, we replace it and shake the bag to scramble the balls.

Additionally:

- we draw three balls in this order: red-white-red. Call these the data, $y$. Remember, we replaced the ball and shook the bag between each draw.
- we are interested in finding the true proportion of red balls in the bag, called $θ$

**Task8**:

Write out all the possible color compositions of the marbles in the bag, before we observed our data $y$ (which are the marbles drawn in the order of red-white-red).

Let each of these possible color compositions be a possible $θ$, or true proportion of red marbles.

(WWWW),(RWWW),(RRWW),(RRRW),(RRRR) with theta = 0, .25, .5, .75, and 1, respectively.
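
As a small sketch, the candidate compositions and their matching values of $θ$ can be generated rather than listed by hand. The names `TOTAL_MARBLES` and `compositions` are illustrative choices, not part of the exercise.

```python
from fractions import Fraction

TOTAL_MARBLES = 4  # bag size stated in the problem setting

# One composition per possible count of red marbles: 0, 1, ..., 4.
# Each label maps to theta, the proportion of red marbles in that bag.
compositions = {
    "R" * reds + "W" * (TOTAL_MARBLES - reds): Fraction(reds, TOTAL_MARBLES)
    for reds in range(TOTAL_MARBLES + 1)
}

for label, theta in compositions.items():
    print(label, float(theta))
```

Using `Fraction` keeps each proportion exact, which makes later hand calculations easier to check.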

**Task9**:

Which color compositions are possible after seeing the data $y$?

(RWWW),(RRWW),(RRRW) with theta = .25, .5, and .75.

**Task10**:

How many ways can you select red-white-red, assuming that there are 2 red marbles and 2 white marbles?

2 x 2 x 2 = 8 ways assuming we treat each marble as a distinct object and care about more than just color.

**Task11**:

How many different ways can you select three balls so that order matters, given that there are 2 red marbles and 2 white marbles?

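4 x 4 x 4 = 64 ways, since each of the three draws can be any of the 4 marbles (the marble is replaced after every draw).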



**Task12**:

What's the probability you select red-white-red, given that there are 2 red marbles and 2 white marbles?

Stated differently,

Find the likelihood $p(y|θ)$, where $θ=RRWW$

P(R) = 2/4, P(W) = 2/4, so P(RWR) = (2/4)^3 = 1/8.
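
The same likelihood can be checked two ways: by counting ordered draws, and by multiplying per-draw probabilities. This is only a sketch; it assumes the four marbles can be told apart by their position in the list `bag`, which is an illustrative device and not part of the exercise.

```python
from fractions import Fraction
from itertools import product

bag = ["R", "R", "W", "W"]  # theta = RRWW
data = ("R", "W", "R")      # observed draw order

# Counting approach: enumerate index triples so same-colored marbles stay distinct.
sequences = list(product(range(len(bag)), repeat=len(data)))
matches = sum(1 for seq in sequences if tuple(bag[i] for i in seq) == data)
by_counting = Fraction(matches, len(sequences))

# Product approach: draws are independent (with replacement),
# so the per-draw probabilities multiply.
by_product = Fraction(1)
for color in data:
    by_product *= Fraction(bag.count(color), len(bag))

print(matches, len(sequences))   # 8 64
print(by_counting, by_product)   # 1/8 1/8
```

Both routes agree because every ordered triple of marbles is equally likely when we replace and shake between draws.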

**Task13**:

If--before seeing the data--all color compositions are equally likely,

then what is $p(\theta)$, if $\theta = RRWW$?

P(RRWW) = 1/5 = 0.2.

**Task14**:

Find:

- $p(y|WWWW)$
- $p(y|RWWW)$
- $p(y|RRWW)$
- $p(y|RRRW)$
- $p(y|RRRR)$

- $p(y|WWWW) = 0$
- $p(y|RWWW) = (1/4)(3/4)(1/4) = 3/64$
- $p(y|RRWW) = (2/4)(2/4)(2/4) = 1/8$
- $p(y|RRRW) = (3/4)(1/4)(3/4) = 9/64$
- $p(y|RRRR) = 0$
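
Since the data contain two reds and one white, each likelihood has the form $θ \cdot (1-θ) \cdot θ$. A minimal sketch that evaluates it for every composition follows; the helper name `likelihood` and its parameters are hypothetical.

```python
from fractions import Fraction

def likelihood(reds, total=4, data="RWR"):
    """p(y | theta) for ordered draws with replacement from a bag holding `reds` red marbles."""
    theta = Fraction(reds, total)
    result = Fraction(1)
    for color in data:
        # A red draw has probability theta, a white draw 1 - theta.
        result *= theta if color == "R" else 1 - theta
    return result

for reds in range(5):
    label = "R" * reds + "W" * (4 - reds)
    print(label, likelihood(reds))
```

The all-white and all-red bags get likelihood 0 because the data contain both colors.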

**Task15**:

Assume that each color composition is equally likely before seeing the data.

Find:

- $p(y|WWWW)p(WWWW)$
- $p(y|RWWW)p(RWWW)$
- $p(y|RRWW)p(RRWW)$
- $p(y|RRRW)p(RRRW)$
- $p(y|RRRR)p(RRRR)$


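- $p(y|WWWW)p(WWWW) = 0$
- $p(y|RWWW)p(RWWW) = (3/64)(1/5) = 3/320$
- $p(y|RRWW)p(RRWW) = (1/8)(1/5) = 1/40$
- $p(y|RRRW)p(RRRW) = (9/64)(1/5) = 9/320$
- $p(y|RRRR)p(RRRR) = 0$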

**Task16**:

Find the probability of getting red-white-red, $p(y)$, given each possible color combination is equally likely.

0+3/320+1/40+9/320+0 = 1/16

**Task17**:

After observing a draw of red-white-red, find the probability that there are two red marbles and two white marbles in the bag. Assume that all color compositions were equally likely before the draw.

In other words, find $p(θ|y)$, where $θ=RRWW$.

((1/8)(1/5)) / (1/16) = (1/40) / (1/16) = 0.4
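
The whole chain (likelihood times prior, then the marginal, then the posterior) can be sketched in a few lines under the uniform prior. The helper `likelihood` and the dictionary names are illustrative, not from the notes.

```python
from fractions import Fraction

labels = ["WWWW", "RWWW", "RRWW", "RRRW", "RRRR"]

def likelihood(label, data="RWR"):
    """p(y | theta) for ordered draws with replacement from the bag named by `label`."""
    theta = Fraction(label.count("R"), len(label))
    p = Fraction(1)
    for color in data:
        p *= theta if color == "R" else 1 - theta
    return p

# Uniform prior: every composition equally likely before seeing the data.
prior = {label: Fraction(1, len(labels)) for label in labels}

# Likelihood times prior for each composition (not yet normalized).
joint = {label: likelihood(label) * prior[label] for label in labels}

# Marginal likelihood p(y): the sum of the unnormalized terms.
marginal = sum(joint.values())

# Dividing by p(y) turns the terms into probabilities that sum to 1.
posterior = {label: joint[label] / marginal for label in labels}

print(marginal)                 # 1/16
print(posterior["RRWW"])        # 2/5
print(sum(posterior.values()))  # 1
```

The last line is the point of the normalizing constant: the products of likelihood and prior sum to 1/16, so dividing each by 1/16 makes them sum to 1.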

**Task18**:

Story time: The marble factory produces bags of four marbles. They want to make red marbles rare, so that people will get excited about them.  Therefore, for each 1 bag containing four red, they made 2 that contain three red, 3 that contain two red, 4 that contain one red, and 5 that contain zero red.

With this new prior information, find $p(θ|y)$, where $θ=RRWW$.

**NOTE**: You MUST calculate  a new marginal likelihood $p(y)$ with the new prior information.

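First, convert the counts to prior probabilities. The total is 1+2+3+4+5 = 15, so p(RRRR) = 1/15, p(RRRW) = 2/15, p(RRWW) = 3/15 = 1/5, p(RWWW) = 4/15, and p(WWWW) = 5/15 = 1/3. The marginal is p(y) = (3/64)(4/15) + (1/8)(1/5) + (9/64)(2/15) = 1/80 + 1/40 + 3/160 = 9/160 (the WWWW and RRRR terms are 0). Thus the posterior is ((1/8)(1/5)) / (9/160) = 4/9, or about 44.4%.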
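
A sketch of the same calculation with the factory's production counts used as unnormalized prior weights. The names `factory_counts` and `likelihood` are hypothetical; the counts themselves come from the story above.

```python
from fractions import Fraction

# Bags produced per composition, as described in the factory story.
factory_counts = {"WWWW": 5, "RWWW": 4, "RRWW": 3, "RRRW": 2, "RRRR": 1}

def likelihood(label, data="RWR"):
    """p(y | theta) for ordered draws with replacement from the bag named by `label`."""
    theta = Fraction(label.count("R"), len(label))
    p = Fraction(1)
    for color in data:
        p *= theta if color == "R" else 1 - theta
    return p

# Normalize the counts into prior probabilities.
total_bags = sum(factory_counts.values())
prior = {label: Fraction(n, total_bags) for label, n in factory_counts.items()}

# New marginal likelihood under the new prior.
marginal = sum(likelihood(label) * prior[label] for label in prior)

posterior_rrww = likelihood("RRWW") * prior["RRWW"] / marginal

print(marginal)               # 9/160
print(posterior_rrww)         # 4/9
print(float(posterior_rrww))
```

Although the prior for RRWW is still 1/5, its posterior rises from 0.4 to 4/9 because the new prior shifts weight from RRRW toward RWWW and WWWW, which explain the data less well, and so the marginal $p(y)$ shrinks.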

**Task19**:

Write down similarities and differences between this marble example, and the Victor Wembanyama FT example.

Similarities: Both involve estimating an unknown probability based on observed outcomes. In each case, we use a likelihood to show how likely different values are given the data, and we combine that with a prior to form a posterior that updates our belief. Both also show how more data increases our confidence in the estimate.

Differences: The marble example has a small, finite number of possible outcomes (like 0-4 red marbles), while Wembanyama's true free throw percentage is continuous between 0 and 1. Marble draws are perfectly independent with replacement, whereas free throws may be influenced by fatigue, pressure, etc. Priors in the marble example can be counted exactly based on how the factory makes the bags, but Wembanyama's prior comes from his previous Euroleague performance and is more subjective.
