# Unit 2 Exercises: Bayesian Building Blocks

This first set of exercises focuses on conceptual understanding of the three parts of Bayesian statistics we'll manipulate the most: the prior, likelihood, and posterior.

These vocabulary words will help us categorize and explain all statistical models, even models that don't fit inside the standard Bayesian framework.

**Task1**:

Why do we make guesses? In other words, what is the benefit of trying to predict something we are uncertain about?

We make guesses because every decision that affects the future is guided by what we expect to happen. If we want to succeed at something, we have to predict how well different approaches will work, and if we want to avoid something bad, we have to guess what would cause it. To take the example in the notes, a basketball team would use a player's predicted free throw percentage to determine (in part) whether to hire them and how much to pay them.

**Task2**:

Is it possible to make a guess/prediction without making an assumption?

If yes, then give an example of such a guess, and state whether that guess would be useful or not.

If no, then briefly justify why we need assumptions to make predictions.

It is not possible to make a prediction without an assumption. Any prediction requires choosing one potential outcome over another, and that choice rests either on information you assume is relevant or on the assumption that no information is relevant. Either way, you make an assumption.

**Task3**:

Should we use all the available information we have to make a guess/prediction? Justify your answer.

We should not necessarily use all available information to make a prediction. Some information may be low-quality and mislead the predictor. For example, if you want to predict the weather, data such as current storm movements, cloud formations, and wind patterns should be used, but a random weather report from a month ago is probably too old to be useful. Another risk of including too much information is overfitting your model. You could end up with a model that perfectly records the random variation in your information but fails to see the broader trend, and thus cannot make accurate predictions.

**Task4**:

What is a prior? How are priors related to:

- context?
- assumptions?
- predictions?

A prior is the probability assigned to something before new information is taken into account. For example, if you are trying to predict whether someone has COVID-19 based on a test result, the prior might be the chance that a random person has COVID-19, without any knowledge of the test result. Setting a prior requires assumptions about how to interpret context (e.g. that the person has an average chance of having COVID-19). A prior can be used to make predictions on its own, or it can be updated with more information to form a posterior that is then used to make predictions.

**Task5**:

What is a likelihood? How are likelihoods related to:

- context?
- assumptions?
- predictions?

A likelihood is the probability of observing the data given a particular value of the thing you are trying to predict. For example, in the COVID-19 example above, the likelihood would be the chance of a positive test given that you have COVID-19. It brings new context (the data) into your predictions by updating your prior to give your posterior, and it rests on assumptions about how the data are generated.

**Task6**:

What is a posterior? How are posteriors related to:

- context?
- assumptions?
- predictions?

A posterior is the updated probability after taking into account more information than the prior had. To continue the example above, the posterior would be the probability that you have COVID-19 given a positive test. Because it combines the prior with the observed data, it is usually the best basis for predictions, but it is only as good as the context and assumptions built into the prior and likelihood.

**Task7**:

Why would anyone want to define a prior and a likelihood in order to make a prediction? In other words, what's the point of using a likelihood and a prior to form a posterior?

The prior allows pre-existing information to be incorporated into your prediction, and the likelihood allows you to update that prediction with new data. The posterior combines both, so predictions made from it reflect what you knew beforehand plus what the data showed.
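The COVID-19 example from Tasks 4-6 can be sketched numerically. This is only an illustration: the prevalence and test accuracy figures below are made up for the sketch and do not come from the notes or from any real test.

```python
# Hypothetical numbers, chosen only to illustrate the update.
prior = 0.05            # assumed P(COVID) before seeing a test result
sensitivity = 0.90      # assumed P(positive | COVID), the likelihood
false_positive = 0.02   # assumed P(positive | no COVID)

# P(positive): total probability of a positive test across both cases
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Bayes' rule: P(COVID | positive), the posterior
posterior = sensitivity * prior / p_positive
print(round(posterior, 3))
```

With these assumed inputs, a positive test moves the probability from the 5% prior to a much higher posterior, and a different prior would give a different posterior from the same test result.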

## Bayes' Rule Math

The following exercises will be graded for completion, with no accuracy component. That said, correct answers below will replace mistakes in tasks 1-7.



### Mathematical Framing

In this series of exercises, we'll calculate a probability using the full version of Bayes' Rule.

The version seen in the notes, $p(θ|y) ∝ p(y|θ)p(θ)$, ignores the normalizing constant found in the full equation: $p(θ|y) = \frac{p(y|θ)p(θ)}{p(y)}$.

As stated in the notes, in practical applications we don't need to worry too much about $p(y)$, AKA the marginal likelihood, AKA the prior predictive density, AKA the normalizing constant. And when we do, we'll approximate it, like we do everything else.

But these exercises are closer to theoretical abstraction than to practicality.

So why do them?

These exercises will hopefully help you gain additional intuition for probability and how it behaves.

As you work through the exercises, consider $p(y)$, the prior predictive density, and why using it to divide $p(y|θ)p(θ)$ guarantees we get a probability.

Additionally, think about:
- the likelihood $p(y|θ)$,
- the prior $p(θ)$,
- why multiplying the likelihood and prior (almost) gives us the posterior $p(θ|y)$.
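One way to see the role of $p(y)$, assuming $θ$ can take only finitely many values (as it will in the marble problem), is to write it as a sum over every candidate $θ'$:

$$p(y) = \sum_{θ'} p(y|θ')\,p(θ')$$

Dividing by this sum forces the posterior probabilities to add up to 1:

$$\sum_{θ} p(θ|y) = \frac{\sum_{θ} p(y|θ)\,p(θ)}{\sum_{θ'} p(y|θ')\,p(θ')} = 1$$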

### Problem Setting

Imagine we have a bag of red and white marbles, identical in every other way. Let's assume there are 4 total marbles, we can't see inside the bag, and each time we draw a marble from the bag, we replace it and shake the bag to scramble the marbles.

Additionally:

- we draw three marbles in this order: red-white-red. Call these the data, $y$. Remember, we replaced the marble and shook the bag between each draw.
- we are interested in finding the true proportion of red marbles in the bag, called $θ$

**Task8**:

Write out all the possible color compositions of the marbles in the bag, before we observed our data $y$.

Let each of these possible color compositions be a possible $θ$, or true proportion of red marbles.

4 red, 0 white <br>
3 red, 1 white <br>
2 red, 2 white <br>
1 red, 3 white <br>
0 red, 4 white
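The same list can be generated programmatically. The sketch below pairs each composition with its $θ$; the `RRWW`-style labels follow the notation used in the later tasks, and the variable names are just illustrative choices.

```python
from fractions import Fraction

N_MARBLES = 4  # bag size from the problem setting

# One entry per possible number of red marbles: 0, 1, ..., 4
compositions = [
    ("R" * reds + "W" * (N_MARBLES - reds), Fraction(reds, N_MARBLES))
    for reds in range(N_MARBLES + 1)
]

for label, theta in compositions:
    print(label, float(theta))
```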

**Task9**:

Which color compositions are possible after seeing the data $y$?

3 red, 1 white <br>
2 red, 2 white <br>
1 red, 3 white

**Task10**:

How many ways can you select red-white-red, assuming that there are 2 red marbles and 2 white marbles?

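8 (each red draw can be either of the 2 red marbles, and the white draw can be either of the 2 white marbles: 2 × 2 × 2)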

**Task11**:

How many different ways can you select three marbles so that order matters, given that there are 2 red marbles and 2 white marbles?

64 (any of the 4 marbles can come up on each of the 3 draws: 4 × 4 × 4)
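One way to check the counts in Tasks 10 and 11 is brute-force enumeration, treating each of the four marbles as distinguishable. The labels `R1`, `R2`, `W1`, `W2` are hypothetical names used only to tell the marbles apart.

```python
from itertools import product

bag = ["R1", "R2", "W1", "W2"]  # hypothetical labels for the 2 red, 2 white bag
data = ("R", "W", "R")          # the observed colors, in order

# Every ordered sequence of three draws with replacement
all_draws = list(product(bag, repeat=len(data)))

# Keep the sequences whose colors read red-white-red
matching = [d for d in all_draws if tuple(m[0] for m in d) == data]

print(len(matching), len(all_draws), len(matching) / len(all_draws))
```

The ratio of the two counts is the likelihood asked for in Task 12.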

**Task12**:

What's the probability you select red-white-red, given that there are 2 red marbles and 2 white marbles?

Stated differently, find the likelihood $p(y|θ)$, where $θ=RRWW$.

0.125
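Because each marble is replaced before the next draw, the three draws are independent, so the likelihood of red-white-red factors into a product. Writing $θ$ for the proportion of red marbles, here $\frac{2}{4}$:

$$p(y|θ) = θ \cdot (1-θ) \cdot θ = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8} = 0.125$$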

**Task13**:

Find:

- $p(y|WWWW)$
- $p(y|RWWW)$
- $p(y|RRWW)$
- $p(y|RRRW)$
- $p(y|RRRR)$

0 <br>
0.046875 <br>
0.125 <br>
0.140625 <br>
0
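These five values can be reproduced with a short sketch that applies the product $θ(1-θ)θ$ to each composition. Exact fractions are used to avoid rounding; the function name `likelihood` is an illustrative choice.

```python
from fractions import Fraction

def likelihood(theta):
    """P(red, white, red | theta) for independent draws with replacement."""
    return theta * (1 - theta) * theta

bags = ["WWWW", "RWWW", "RRWW", "RRRW", "RRRR"]

# theta is the fraction of red marbles in each bag
likelihoods = {bag: likelihood(Fraction(bag.count("R"), len(bag))) for bag in bags}

for bag, lik in likelihoods.items():
    print(bag, lik, float(lik))
```

The two all-one-color bags get a likelihood of exactly 0, because neither can produce both a red and a white draw.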

**Task14**:

Find the probability of getting red-white-red, $p(y)$.

0.0625
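This value depends on the prior. A worked version, assuming each of the five compositions has prior probability $\frac{1}{5}$ (the assumption stated in Task 15) and using the likelihoods from Task 13:

$$p(y) = \sum_{θ} p(y|θ)\,p(θ) = \frac{1}{5}\left(0 + \frac{3}{64} + \frac{8}{64} + \frac{9}{64} + 0\right) = \frac{1}{5} \cdot \frac{20}{64} = \frac{1}{16} = 0.0625$$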

**Task15**:

Given that all color compositions are equally likely, and after observing a draw of red-white-red, find the probability that there are two red marbles and two white marbles in the bag.

In other words, find $p(θ|y)$, where $θ=RRWW$.

0.4
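The full posterior over all five compositions can be sketched as follows, assuming the uniform prior stated in the task. The helper names `likelihood` and `posterior` are illustrative, not from the notes.

```python
from fractions import Fraction

def likelihood(bag):
    """P(red, white, red | bag) for draws with replacement."""
    theta = Fraction(bag.count("R"), len(bag))
    return theta * (1 - theta) * theta

def posterior(prior):
    """Apply Bayes' rule over a dict mapping each bag to its prior probability."""
    unnormalized = {bag: likelihood(bag) * p for bag, p in prior.items()}
    p_y = sum(unnormalized.values())  # the normalizing constant p(y)
    return {bag: u / p_y for bag, u in unnormalized.items()}

bags = ["WWWW", "RWWW", "RRWW", "RRRW", "RRRR"]
uniform_prior = {bag: Fraction(1, len(bags)) for bag in bags}
result = posterior(uniform_prior)

for bag, prob in result.items():
    print(bag, float(prob))
```

The posterior values sum to 1 because of the division by `p_y`, and passing a different prior dictionary to `posterior` changes the result without changing the likelihood.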

**Task16**:

Story time: The marble factory produces bags of four marbles. They want to make red marbles rare, so that people will get excited about them. Therefore, for every 1 bag containing four red, they make 2 that contain three red, 3 that contain two red, 4 that contain one red, and 5 that contain zero red.

With this new prior information, find $p(θ|y)$, where $θ=RRWW$.

0.444
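Under the factory's production ratios, the prior probability of each composition is its bag count divided by the total of $1+2+3+4+5=15$. Only the three mixed-color bags have nonzero likelihood, so the calculation (with terms ordered RWWW, RRWW, RRRW in the denominator) works out as:

$$p(RRWW|y) = \frac{\frac{8}{64} \cdot \frac{3}{15}}{\frac{3}{64} \cdot \frac{4}{15} + \frac{8}{64} \cdot \frac{3}{15} + \frac{9}{64} \cdot \frac{2}{15}} = \frac{24}{12 + 24 + 18} = \frac{24}{54} = \frac{4}{9} \approx 0.444$$

The prior weight of 3 must appear in the numerator as well as the denominator; leaving it out of the numerator gives $\frac{8}{54} \approx 0.148$ instead.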

**Task17**:

Write down similarities and differences between this marble example and the Victor Wembanyama FT example.

The two examples used different data. In this example, we were trying to predict the makeup of the bag given what we drew from it. In the FT example, we were trying to predict the future free throw percentage given the past free throw percentage. In both examples, we used Bayes' rule to update a prior to get a posterior.