
# Unit 2 Exercises: Bayesian Building Blocks

This first set of exercises focuses on conceptual understanding of the three parts of Bayesian statistics we'll manipulate the most: the prior, the likelihood, and the posterior.

These vocabulary words will help us categorize and explain all statistical models, even models that don't fit inside the standard Bayesian framework.

**Task1**:

Why do we make guesses? In other words, what is the benefit of trying to predict something we are uncertain about?

Making guesses gives us valuable information even when we can't be certain of a result. A probability distribution represents that uncertainty explicitly, while still letting us use summary statistics (such as the mean, median, or mode) to arrive at a single best guess. Ultimately, these tools give us a reasonable idea of what the future may look like, which is far better than nothing at all.

**Task2**:

Is it possible to make a guess/prediction without making an assumption?

If yes, then give an example of such a guess, and state whether that guess would be useful or not.

If no, then briefly justify why we need assumptions to make predictions.

No, we need assumptions to make predictions. Predictions imply a level of uncertainty. In order to deal with this, we have to make assumptions about what models and parameters to use in order to represent a future scenario. Whether we include a factor or don't, we are making an assumption about the factor's importance in our predictive analysis.

**Task3**:

Should we use all the available information we have to make a guess/prediction? Justify your answer.

I would say yes. While there is always uncertainty, real-life scenarios have a number of known inputs that can help make predictions more accurate. We can look at just past free throws, but that would leave out important factors such as morale, injuries, whether it's a home game, etc. that might affect the result of the free throw. Ultimately, we currently have no way to create a 1:1 model of reality, nor do our computing resources allow for it. Thus, we must choose the most important and quantifiable factors within the constraints of our data collection/computing resources to make a prediction.

**Task4**:

What is a prior? How are priors related to:

- context?
- assumptions?
- predictions?

A prior is a probability distribution over the quantity we want to estimate, specified before taking the data into account. We make assumptions based on the context of the situation we are trying to analyze. We also make assumptions when deciding which model to use for a prior. In particular, we have to decide, based on the context of the situation, which inputs are important enough to inform the prior. The prior takes these inputs and makes predictions.

**Task5**:

What is a likelihood? How are likelihoods related to:

- context?
- assumptions?
- predictions?

A likelihood is the probability of the observed data given a particular value of the parameter. Compared across parameter values, it measures how well each value explains what we actually saw. Its context comes from how the data were generated and collected. We make assumptions when we choose the model that describes that data-generating process (for example, that each free throw is independent of the others). Combined with a prior, the likelihood lets us make predictions that are informed by past events.

**Task6**:

What is a posterior? How are posteriors related to:

- context?
- assumptions?
- predictions?

A posterior is the combination of a prior and a likelihood. It is essentially a model of our uncertainty that we can use to make predictions. The context for the posterior comes from the prior and the likelihood. We make assumptions about the compatibility of the prior and likelihood used to make the posterior. The posterior can be used as a tool to make predictions based on everything we know about a certain topic.

**Task7**:

Why would anyone want to define a prior and a likelihood in order to make a prediction? In other words, what's the point of using a likelihood and a prior to form a posterior?

[*write your answer here*]

## Bayes' Rule Math

The following exercises will be graded for completion, with no accuracy component. That said, correct answers below will replace mistakes in tasks 1-7.



### Mathematical Framing

In this series of exercises, we'll calculate a probability using the full version of Bayes' Rule.

The version seen in the notes, $p(θ|y) ∝ p(y|θ)p(θ)$, ignores the normalizing constant found in the full equation: $p(θ|y) = \frac{p(y|θ)p(θ)}{p(y)}$.
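To make the role of the normalizing constant concrete, here is a minimal sketch of the full rule on a small discrete problem. The setup is hypothetical and deliberately not the marble problem below: a coin whose heads probability $θ$ is assumed to be one of three candidate values, an equal prior over them, and data $y$ of heads-heads-tails. All names and numbers are illustrative assumptions.

```python
# Hypothetical setup: three candidate values for a coin's heads probability.
thetas = [0.25, 0.5, 0.75]
prior = [1 / 3, 1 / 3, 1 / 3]  # assumption: every candidate is equally plausible


def likelihood(theta):
    """p(y | theta) for the assumed data y = heads, heads, tails."""
    return theta * theta * (1 - theta)


# Numerator of Bayes' Rule: p(y | theta) * p(theta) for each candidate.
unnormalized = [likelihood(t) * p for t, p in zip(thetas, prior)]

# Denominator: p(y), the sum of the numerator over every candidate theta.
p_y = sum(unnormalized)

# Full Bayes' Rule: dividing by p(y) rescales the numerators to sum to 1.
posterior = [u / p_y for u in unnormalized]

print("unnormalized:", unnormalized, "sum:", sum(unnormalized))
print("posterior:   ", posterior, "sum:", sum(posterior))
```

The unnormalized values have the same relative sizes as the posterior, which is all the proportional version of the rule promises; only after dividing by `p_y` do they sum to 1.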

As stated in the notes, in practical applications we don't need to worry too much about $p(y)$, AKA the marginal likelihood, AKA the prior predictive density, AKA the normalizing constant. And when we do, we'll approximate like we do everything else.

But these exercises are closer to theoretical abstraction than to practicality.

So why do them?

These exercises will hopefully help you gain additional intuition for probability and how it behaves.

As you work through the exercises, consider $p(y)$, the prior predictive density, and why using it to divide $p(y|θ)p(θ)$ guarantees we get a probability.
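As a starting point for that reflection, here is a short sketch of the argument, assuming $θ$ can take only a finite set of values (an assumption that matches the marble problem; for a continuous $θ$ the sums would become integrals). The marginal likelihood adds up the numerator over every candidate value:

$$p(y) = \sum_{θ'} p(y|θ')\,p(θ')$$

Summing the posterior over all candidate values then gives

$$\sum_{θ} p(θ|y) = \sum_{θ} \frac{p(y|θ)\,p(θ)}{p(y)} = \frac{1}{p(y)} \sum_{θ} p(y|θ)\,p(θ) = \frac{p(y)}{p(y)} = 1$$

Each term is also non-negative, so the posterior values behave like probabilities.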

Additionally, think about:
- the likelihood $p(y|θ)$,
- the prior $p(θ)$,
- why multiplying the likelihood and prior (almost) gives us the posterior $p(θ|y)$.

### Problem Setting

Imagine we have a bag of red and white marbles, identical in every other way. Let's assume there are 4 marbles in total, we can't see inside the bag, and each time we grab a marble from the bag, we put it back and shake the bag to scramble the marbles.

Additionally:

- we draw three marbles in this order: red-white-red. Call these the data, $y$. Remember, we replaced the marble and shook the bag between each draw.
- we are interested in finding the true proportion of red marbles in the bag, called $θ$.
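If the draw-replace-shake procedure feels abstract, it can be simulated. The sketch below is one possible way to do that, not part of the original exercise: the bag (one red, two white) and the target sequence (red-red) are hypothetical and deliberately different from the problem above, and the function names are made up for illustration.

```python
import random


def draw_with_replacement(bag, n_draws, rng):
    """Draw n_draws marbles, putting each one back before the next draw."""
    return [rng.choice(bag) for _ in range(n_draws)]


def estimate_sequence_probability(bag, target, n_trials=100_000, seed=0):
    """Estimate how often `target` is drawn, in order, by repeated simulation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(n_trials):
        if draw_with_replacement(bag, len(target), rng) == list(target):
            hits += 1
    return hits / n_trials


# Hypothetical bag: 1 red and 2 white marbles (not the 4-marble bag above).
example_bag = ["R", "W", "W"]
estimate = estimate_sequence_probability(example_bag, ["R", "R"])

# Because draws are independent, the exact value here is (1/3) * (1/3).
exact = (1 / 3) ** 2
print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

Swapping in a different bag and target sequence gives a way to sanity-check hand calculations later in this section.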

**Task8**:

Write out all the possible color compositions of the marbles in the bag, before we observed our data $y$.

Let each of these possible color compositions be a possible $θ$, or true proportion of red marbles.

[*write your answer here*]

**Task9**:

Which color compositions are possible after seeing the data $y$?

[*write your answer here*]

**Task10**:

How many ways can you select red-white-red, assuming that there are 2 red marbles and 2 white marbles?

[*write your answer here*]

**Task11**:

How many different ways can you select three marbles so that order matters, given that there are 2 red marbles and 2 white marbles?
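Counting questions like this one and Task10 can be checked by brute-force enumeration. The sketch below shows the idea on a hypothetical, smaller case (a bag with one red and two white marbles, and two draws) so that it does not give away the answer; the marble labels and variable names are assumptions made for illustration.

```python
from itertools import product

# Hypothetical bag: label each marble so that same-colored marbles stay distinct.
bag = ["R1", "W1", "W2"]
target = ["R", "W"]  # the color sequence we want to count

# Every ordered way to draw len(target) marbles with replacement.
all_draws = list(product(bag, repeat=len(target)))

# Keep only the draws whose colors (first character of each label) match target.
matching = [draw for draw in all_draws if [m[0] for m in draw] == target]

total_ways = len(all_draws)
matching_ways = len(matching)
print(f"{matching_ways} of {total_ways} ordered draws give {'-'.join(target)}")
```

Labeling the marbles individually is the design choice that matters here: it makes every ordered draw equally likely, so the ratio of matching draws to total draws is a probability.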

[*write your answer here*]

**Task12**:

What's the probability you select red-white-red, given that there are 2 red marbles and 2 white marbles?

Stated differently, find the likelihood $p(y|θ)$, where $θ=RRWW$.

[*write your answer here*]

**Task13**:

Find:

- $p(y|WWWW)$
- $p(y|RWWW)$
- $p(y|RRWW)$
- $p(y|RRRW)$
- $p(y|RRRR)$

[*write your answer here*]

**Task14**:

Find the probability of getting red-white-red, $p(y)$.

[*write your answer here*]

**Task15**:

Given that all color compositions are equally likely, and after observing a draw of red-white-red, find the probability that there are two red marbles and two white marbles in the bag.

In other words, find $p(θ|y)$, where $θ=RRWW$.

[*write your answer here*]

**Task16**:

Story time: The marble factory produces bags of four marbles. They want to make red marbles rare, so that people will get excited about them. Therefore, for every 1 bag containing four red marbles, they make 2 that contain three red, 3 that contain two red, 4 that contain one red, and 5 that contain zero red.

With this new prior information, find $p(θ|y)$, where $θ=RRWW$.
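Production ratios like these become a prior once they are rescaled to sum to 1. A minimal sketch of that step, using a hypothetical two-type factory with a 3:1 ratio rather than the numbers above (the helper name `counts_to_prior` and the bag types are assumptions for illustration):

```python
from fractions import Fraction


def counts_to_prior(counts):
    """Turn relative counts into probabilities by dividing by their total."""
    total = sum(counts.values())
    return {name: Fraction(count, total) for name, count in counts.items()}


# Hypothetical factory: 3 bags of type A are made for every 1 bag of type B.
example_counts = {"bag type A": 3, "bag type B": 1}
example_prior = counts_to_prior(example_counts)

for name, prob in example_prior.items():
    print(f"p({name}) = {prob}")
```

Exact fractions are used instead of floats so the result is easy to compare against hand arithmetic.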

[*write your answer here*]

**Task17**:

Write down similarities and differences between this marble example and the Victor Wembanyama FT example.

[*write your answer here*]