<a id='top'></a>

# CSCI 3022: Intro to Data Science - Fall 2018 Practicum 
***

This practicum is due on Moodle by **11:55pm on Wednesday December 12**. Your solutions to theoretical questions should be done in Markdown/MathJax directly below the associated question.  Your solutions to computational questions should include any specified Python code and results as well as written commentary on your conclusions.  

**Here are the rules:** 

1. All work, code and analysis, must be your own. 
1. You may use your course notes, posted lecture slides, textbooks, in-class notebooks, and homework solutions as resources.  You may also search online for answers to general knowledge questions like the form of a probability distribution function or how to perform a particular operation in Python/Pandas. 
1. This is meant to be like a coding portion of your final exam. So, the instructional team will be much less helpful than we typically are with homework. For example, we will not check answers, help debug your code, and so on.
1. If something is left open-ended, it is because we want to see how you approach the kinds of problems you will encounter in the wild, where it will not always be clear what sort of tests/methods should be applied. Feel free to ask clarifying questions though.
2. You may **NOT** post to message boards or other online resources asking for help.
3. You may **NOT** copy-paste solutions *from anywhere*.
4. You may **NOT** collaborate with classmates or anyone else.
5. In short, **your work must be your own**. It really is that simple.

Violation of the above rules will result in an immediate academic sanction (*at the very least*, you will receive a 0 on the practicum or an F in the course, depending on severity), and a trip to the Honor Code Council.

**By submitting this assignment, you agree to abide by the rules given above.**

***

**Name**:  
Alex Kennedy
***


**NOTES**: 

- You may not use late days on the practicum nor can you drop your practicum grade. 
- If you have a question for us, post it as a **PRIVATE** message on Piazza.  If we decide that the question is appropriate for the entire class, then we will add it to a Practicum clarifications thread. 
- Do **NOT** load or use any Python packages that are not available in Anaconda 3.6. 
- Some problems with code may be autograded.  If we provide a function API **do not** change it.  If we do not provide a function API then you're free to structure your code however you like. 
- Submit only this Jupyter notebook to Moodle.  Do not compress it using tar, rar, zip, etc. 
- This should go without saying, but... For any question that asks you to calculate something, you **must show all work to receive credit**. Sparse or nonexistent work will receive sparse or nonexistent credit.

---
**Shortcuts:**  [Problem 1](#p1) | [Problem 2](#p2) | [Problem 3](#p3)

---

In [7]:
from scipy import stats
import numpy as np 
import statsmodels.api as sm
import pandas as pd
import matplotlib.pyplot as plt
import random
%matplotlib inline

<br>

---
[Back to top](#top)
<a id='p1'></a>

### [35 points] Problem 1: Yahtzee!

**Part A:** You are playing [Yahtzee](https://en.wikipedia.org/wiki/Yahtzee) with your friends. A player's turn in Yahtzee consists of rolling a set of 5 dice. Then the player is given two additional rolls, where they are allowed to re-roll any number of the dice, including potentially all of them or none of them. The goal is to obtain certain combinations of the dice values resulting after the third roll. Different combinations are worth different amounts of points, and the goal of the game is to get as many points as possible.

This game of Yahtzee is a bit unlike any you have ever played before, however. This is because Darth Ketelsen is back, and with her she brought her famous **5-sided dice**. These are fair dice with sides numbered 1-5. So, you are playing Yahtzee with a Sith Lord with 5-sided dice. Indeed, things just got real.

A **straight** in Darth Ketelsen's game consists of 5 values all in a row. For example, the outcome $[1,2,3,4,5]$ is a  straight but the outcome $[1,2,3,4,4]$ is not.

**Do two things:**
1. Compute by hand the probability of rolling a straight in a single roll of all 5 dice. Show all work.
2. Write a simulation to verify the probability that you computed. Run at least 10,000 simulations. 

***
__Part A:__

With five 5-sided dice, a straight must contain each of the values 1 through 5 exactly once, so the probability of rolling a straight in a single roll of all dice is the probability of rolling 5 distinct numbers. The first die can show any of 5 values, the second any of the remaining 4 (excluding the value already rolled), the third any of the remaining 3, and so on. This means there are $5! = 120$ ways to roll a straight.

In total, there are $5^5 = 3125$ equally likely ways to roll the dice.

This means there is a $\frac{5!}{5^5}$ chance of rolling a straight in one roll.

$ \frac{5!}{5^5} = \frac{120}{3125} = 0.0384$
***
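As an exact cross-check of the hand calculation, the sketch below enumerates every equally likely outcome of the five dice and counts those in which all values are distinct (which, with five 5-sided dice, is the same as a straight). The helper name `exact_straight_probability` is illustrative and not part of the assignment.

```python
from fractions import Fraction
from itertools import product

def exact_straight_probability(sides=5, num_dice=5):
    """Exact probability that every die shows a different value."""
    outcomes = list(product(range(1, sides + 1), repeat=num_dice))
    straights = sum(1 for roll in outcomes if len(set(roll)) == num_dice)
    return Fraction(straights, len(outcomes))

prob = exact_straight_probability()
print(prob, float(prob))  # 24/625 0.0384
```

Using `Fraction` keeps the result exact, so it can be compared directly with $\frac{5!}{5^5}$ rather than with a rounded decimal.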

In [8]:
# randomly roll 5 dice
def roll5dice():
    dice = []
    for _ in range(5):
        dice.append(random.randint(1,5))
    return dice

# checks if the dice are rolled in a straight
def isStraight(dice):
    s = set()
    for die in dice:
        if die in s:
            return False
        else: 
            s.add(die)
    return True

print(120/5**5)

# calculate the probability of rolling a straight with a single roll turn
def prob_straight_single_roll(num_sims):
    straights = 0
    for _ in range(num_sims):
        if(isStraight(roll5dice())):
            straights += 1
    return straights / num_sims
        
print(f"Calculated probability: {120/5**5}")
print(f"Simulated probability: {prob_straight_single_roll(100000)}")

0.0384
Calculated probability: 0.0384
Simulated probability: 0.03905


**Part B:** The goal of this problem is to compute the probability of getting a straight using all three of your rolls, instead of just the single roll approach that you computed in Part A. Here, we'll need to implement a strategy so that after the first roll and after the second roll, we keep the dice that get us closer to a straight and re-roll the dice that are not useful for our straight.

For instance, suppose your first roll comes up $[1,2,3,3,3]$. You really want to get that straight! So, you would follow the strategy of saving the $[1,2,3]$ and re-roll two of the threes, hoping for a 4 and 5 to get the straight. Then, for your third roll, you would save as many of the dice as possible that would be part of a straight, and re-roll any remaining dice.

Finish the function below called `dire_straights` so that it simulates many complete 3-roll turns and computes the probability of ending your turn with a straight. The only input to the function should be `ntrial`, an integer for the number of turns to simulate. Remember, each turn consists of 3 rolls.


Then, use your function to estimate the probability of a straight after a full turn of Yahtzee. Use at least 10,000 simulations, and comment on the results.

In [95]:
# build a binary array deciding which dice to keep (the ones that build a straight)
def getkeepersstraight(dice):
    keepers = [0,0,0,0,0]
    # check for two adjacent
    num_dice = len(dice)
    foundAdjacent = False
    min_die, max_die = 0,0
    for i in range(num_dice):
        if foundAdjacent:
            break
        for j in range(i + 1, num_dice):
            if foundAdjacent:
                break
            dice_difference = dice[i] - dice[j]
            if abs(dice_difference) == 1:
                keepers[i] = 1
                keepers[j] = 1
                if dice[i] < dice[j]:
                    min_die = dice[i]
                    max_die = dice[j]
                else:
                    min_die = dice[j]
                    max_die = dice[i]
                    
                foundAdjacent = True
    # add on to adjacent at the ends
    for i in range(len(dice)):
        if dice[i] - min_die == -1:
            min_die = dice[i]
            keepers[i] = 1
        elif dice[i] - max_die == 1:
            max_die = dice[i]
            keepers[i] = 1
    return keepers

# reroll the ones that you aren't keeping
def reroll(dice_set, keepers):
    newdice = []
    for i in range(len(dice_set)):
        if keepers[i] == 0:
            newdice.append(random.randint(1,5))
        else:
            newdice.append(dice_set[i])
    return newdice

def dire_straights(ntrial):
    num_straights = 0
    for trial in range(ntrial):
        dice = roll5dice()
        # roll 3 times OR until you get a straight (first roll plus up to two re-rolls)
        for turn in range(2):
            if isStraight(dice):
                break
            dice = reroll(dice, getkeepersstraight(dice))
        # the turn counts as a success if it ends with a straight
        if isStraight(dice):
            num_straights += 1
    return num_straights / ntrial

print(f"Probability of getting a straight with strategy: {dire_straights(20000)}")

Probability of getting a straight with strategy: 0.1972


***
__Part B Conclusions:__

With 20,000 simulated turns, the probability of ending a three-roll turn with a straight is consistently around 20%. An alternative strategy would be to save all distinct values. In theory this should give a higher probability, because a straight is made up of five distinct values. The two strategies differ on a roll such as [1,2,3,3,5]: the strategy above saves 1, 2, and 3, whereas the alternative saves 1, 2, 3, and 5. Saving the second set gets you closer to the straight than saving the first. This alternative strategy is tested in Part C.
***

**Part C:** Write a simulation to estimate the probability of obtaining a straight if the first roll contains exactly three distinct unique values. For example, a valid first roll could be $[1,5,3,3,3]$ but not $[1,3,3,4,5]$. You are still using the set of 5-sided dice.

In [96]:
def getdistinct(dice):
    keptvalues = set()
    keepers = [0,0,0,0,0]
    for die_index in range(len(dice)):
        if dice[die_index] not in keptvalues:
            keptvalues.add(dice[die_index])
            keepers[die_index] = 1
    return keepers

def distinct_straight_sim(ntrials):
    num_straights = 0
    for trial in range(ntrials):
        # start with the condition which is 3 distinct dice exactly
        dice = [1,2,3,3,3]
        # play the rest of the two turns to see if you can pull a straight
        for turn in range(2):
            # re-roll the dice that aren't distinct
            dice = reroll(dice, getdistinct(dice))
            # test for straight
            if isStraight(dice):
                num_straights += 1
                break
    # return the probability of getting a straight in the trials
    return num_straights / ntrials

print(f"Simulated probability of getting a straight given 3 distinct starter dice {round(distinct_straight_sim(20000), 3)}")

Simulated probability of getting a straight given 3 distinct starter dice 0.219


**Part D:** Verify your calculation from Part C by hand. Show all work, and comment on whether the two agree.

*Hint: you will need to consider a variety of different cases - what are all the ways you could end up with a straight, given that your first roll contained exactly 3 unique values?*

***
__Part D:__

We are given that the first roll contains exactly 3 distinct values. We keep one die of each value and re-roll the other two dice, so exactly two values are still missing. There are only a few ways left to get a straight, and using the law of total probability we can combine them to get the total probability of rolling a straight given this starting condition.

Ways to get a straight:

* Roll 0 missing values on the second roll, and both missing values on the third roll. Rolling 0 missing values on the second roll means both re-rolled dice repeat values that were already saved, which has probability $\frac{3}{5} \cdot \frac{3}{5} = \frac{9}{25}$. Finishing the straight on the third roll then has probability $\frac{2}{25}$: a $\frac{2}{5}$ chance that the first die shows one of the two missing values AND a $\frac{1}{5}$ chance that the second die shows the other.

* Roll exactly 1 missing value on the second roll, and the last missing value on the third roll. The probability of this event is $\frac{14}{25} \cdot \frac{1}{5}$. Rolling exactly 1 missing value on the second roll has a probability of $\frac{14}{25}$, as one can see from a chart that displays all possible outcomes of the two re-rolled dice: with 1-5 on the x axis and 1-5 on the y axis, 14 of the 25 boxes fulfill this requirement. On the third roll a single die is re-rolled, and it has a $\frac{1}{5}$ chance to show the value that finishes the straight.

* Roll both missing values on the second roll, so the straight is finished there. The probability of this is the same as for the third roll in the first case: $\frac{2}{25}$.

Applying the law of total probability to these cases, the final calculated result is

$$
    \frac{9}{25} \cdot \frac{2}{25} + \frac{14}{25} \cdot \frac{1}{5} + \frac{2}{25} = \frac{138}{625} = 0.2208
$$

This result agrees with the simulated value of 0.219 from Part C.
***
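The case analysis above can also be checked exactly by treating the number of distinct values held as the state of a small Markov chain. The sketch below assumes the keep-one-die-per-distinct-value strategy from Part C; the names `reroll_distribution` and `straight_probability` are illustrative helpers, not part of the assignment.

```python
from fractions import Fraction
from itertools import product

SIDES = 5  # five 5-sided dice, so a straight needs all SIDES values

def reroll_distribution(num_held):
    """Distribution of distinct values held after re-rolling the other dice.

    Assumes one die of each of `num_held` distinct values is kept and the
    remaining SIDES - num_held dice are re-rolled.
    """
    held = set(range(num_held))
    rolls = list(product(range(SIDES), repeat=SIDES - num_held))
    dist = {}
    for roll in rolls:
        new_count = len(held | set(roll))
        dist[new_count] = dist.get(new_count, 0) + Fraction(1, len(rolls))
    return dist

def straight_probability(num_held, rerolls_left):
    """Probability of holding all five values within the remaining re-rolls."""
    if num_held == SIDES:
        return Fraction(1)
    if rerolls_left == 0:
        return Fraction(0)
    return sum(prob * straight_probability(k, rerolls_left - 1)
               for k, prob in reroll_distribution(num_held).items())

exact = straight_probability(num_held=3, rerolls_left=2)
print(exact, float(exact))  # 138/625 0.2208
```

Calling `reroll_distribution(3)` reproduces the $\frac{9}{25}$, $\frac{14}{25}$, and $\frac{2}{25}$ split used in the three cases.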

**Part E:**  
Your friend offers you the following deal. Each time your Yahtzee turn (i.e., all three rolls) results in a 5-of-a-kind, she will give you \$5.
Each time your Yahtzee turn results in a straight, she will give you \$3.
But, she will charge you \$1 for each turn (where a turn includes all 3 rolls of the five 5-sided dice). Should you take this deal? Fully justify your answer using calculations that include expected values. You may include some simulations to estimate relevant probabilities. Clearly state any assumptions you are making in your modeling choices.

***
__Part E:__

There are 2 strategies: going for the straight or going for the 5-of-a-kind. The strategy could be decided based on the first roll.

1st strategy, going for the straight: save the distinct dice. First, calculate the probability of getting a straight, depending on how many distinct values are in the first roll, using a simulation.

2nd strategy, going for the five of a kind: save the dice showing the most common value and re-roll the rest.

This can be shortcut by just finding the expected winnings for each of the two strategies, assuming we commit to one strategy for the whole turn. The expected winnings for each strategy can be broken into an array with one entry per type of first roll. First rolls are scored by the number of dice that put you closer to the goal. In the case of the Yahtzee strategy, the index in the array (zero-based, plus one) is the largest number of dice on the first roll that show the same value.

In the case of the straight strategy, the index in the array (zero-based, plus one) is the number of distinct values that were rolled on the first roll.

For each array, the probability of winning from a given type of first roll is multiplied by the probability of starting with that type of first roll, and then by the win amount (\$5 for a 5-of-a-kind, \$3 for a straight), to get the contribution of that start to the expected winnings.

That is, contribution = prob(start) * prob(win | start) * win_amount, and the expected winnings E(X) for a strategy is the sum of its contributions.

With these two arrays printed, summing the entries gives expected winnings of about \$0.40 per turn for the Yahtzee strategy and about \$0.90 per turn for the straight strategy. Both are less than the \$1 that we pay per turn, so under either strategy we expect to lose money. Thus, we should not take this deal.

***

In [143]:
def distinct_straight_sim_return_array(ntrials):
    num_straights = [0,0,0,0,0]
    starts = [0,0,0,0,0]
    for trial in range(ntrials):
        foundStraight = False
        # roll all five dice and record how many distinct values the first roll has
        dice = roll5dice()
        num_distinct_start = sum(getdistinct(dice))
        if isStraight(dice):
            num_straights[num_distinct_start - 1] += 1
            foundStraight = True
        starts[num_distinct_start - 1] += 1
        # play the rest of the two turns to see if you can pull a straight
        for turn in range(2):
            if foundStraight: 
                break
            # re-roll the dice that aren't distinct
            dice = reroll(dice, getdistinct(dice))
            # test for straight
            if isStraight(dice):
                num_straights[num_distinct_start - 1] += 1
                break
    # probs of getting the straight given that the first roll had a number of distinct dice equal to the index (zero based) + 1
    prob_win = []
    for i in range(len(num_straights)):
        if(starts[i] != 0):
            prob_win.append(num_straights[i] / starts[i])
        else:
            prob_win.append(0)
    prob_roll_start = [x / ntrials for x in starts]
    # return the probability of getting a straight in the trials
    return prob_win, prob_roll_start

# return the keepers going for a five of a kind strategy, keeping the number that you have the most of
def get_most_same_keepers(dice):
    keepers = []
    num_with_most = 0
    most_found = 0
    for i in range(len(dice)):
        num_current = 1
        for j in range(len(dice)):
            if dice[i] == dice[j] and i != j:
                num_current += 1
        if num_current > most_found:
            most_found = num_current
            num_with_most = dice[i]
    
    for die in dice:
        if die == num_with_most:
            keepers.append(1)
        else:
            keepers.append(0)
    return keepers

def fiveofakind_sim_return_array(ntrials):
    num_fives = [0,0,0,0,0]
    starts = [0,0,0,0,0]
    for trial in range(ntrials):
        foundYahtzee = False
        dice = roll5dice()
        num_started = sum(get_most_same_keepers(dice))
        if sum(get_most_same_keepers(dice)) == 5:
            num_fives[num_started - 1] += 1
            foundYahtzee = True
        starts[num_started - 1] += 1
        for turn in range(2):
            if foundYahtzee == True:
                break
            dice = reroll(dice, get_most_same_keepers(dice))
            if sum(get_most_same_keepers(dice)) == 5:
                num_fives[num_started - 1] += 1
                break
    prob_win = []
    for i in range(len(num_fives)):
        if(starts[i] != 0):
            prob_win.append(num_fives[i] / starts[i])
        else:
            prob_win.append(0)
    prob_roll_start = [x / ntrials for x in starts]
    return prob_win, prob_roll_start

        
        

five_win = 5
straight_win = 3
prob_win_straight, prob_straight_start = distinct_straight_sim_return_array(100000)
prob_win_five, prob_five_start = fiveofakind_sim_return_array(100000)
# contribution of each type of first roll: P(start) * P(win | start) * payout
expected_five = [prob_five_start[i] * prob_win_five[i] * five_win for i in range(5)]
print(f"expected win for yahtzee strat: {expected_five}")
expected_straight = [prob_straight_start[i] * prob_win_straight[i] * straight_win for i in range(5)]
print(f"expected win for straight strat: {expected_straight}")

expected win for yahtzee strat: [0.005700000000000001, 0.16479999999999997, 0.16425, 0.055150000000000005, 0.0081]
expected win for straight strat: [0.00066, 0.05001, 0.31593, 0.41697, 0.11187]
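
Each entry printed above is the contribution of one type of first roll, so the expected winnings per turn for a strategy is the sum of its array. The sketch below adds up the printed values and subtracts the \$1 cost. It assumes that one strategy is used for the whole turn and that these simulation estimates are accurate enough for the comparison; `cost_per_turn` and `totals` are names introduced here for illustration.

```python
# Per-start contributions copied from the simulation output printed above.
expected_five = [0.005700000000000001, 0.16479999999999997, 0.16425,
                 0.055150000000000005, 0.0081]
expected_straight = [0.00066, 0.05001, 0.31593, 0.41697, 0.11187]
cost_per_turn = 1  # dollars charged for each turn

totals = {}
for name, contributions in [("yahtzee", expected_five),
                            ("straight", expected_straight)]:
    totals[name] = sum(contributions)
    net = totals[name] - cost_per_turn
    print(f"{name}: expected winnings {totals[name]:.3f}, expected net {net:.3f}")
```

Both totals come out below the cost of a turn, which is the comparison the decision rests on.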


<br>

---
[Back to top](#top)
<a id='p2'></a>

### [30 points] Problem 2: Sharknado Prediction

Governor Hickenlooper has charged you with the task of assessing the factors associated with sharknado risk in Colorado. As everyone knows, sharknadoes are a leading cause of sharknado-related illness, and you are a world-renowned data/shark scientist.

You decide to use multiple linear regression to understand and predict what factors lead to increased sharknado hazard. Your lead scientist, aptly named Fin, has collected lots of relevant data at a local sharknado hotspot, the Boulder Reservoir[\*](#footnote). The data cover a variety of sharknado-related environmental and other conditions, and you'll find this data in the file `sharknadoes.csv`. 

**Response**: 

- $\texttt{sharknado hazard}$: the hazard of a sharknado, where 1 is very unlikely and 100 is highly likely

**Features**: 

- $\texttt{taunts}$: the number of times over the past year that someone has taunted a shark
- $\texttt{clouds}$: what percentage of the sky was covered by clouds (fraction, 0-1)
- $\texttt{precipitation}$: amount of precipitation in the past 72 hours (inches)
- $\texttt{earthquake}$: the intensity of the most recent earthquake measured in the continental United States
- $\texttt{shark attacks}$: the number of shark attacks within 72 hours prior to the observation
- $\texttt{ice cream sold}$: the number of units of ice cream sold at the beach concession stand 
- $\texttt{misery index}$: an economic indicator for how miserable the average United States citizen is, based on the unemployment rate and the inflation rate. More [here](https://www.stuffyoushouldknow.com/podcasts/whats-the-misery-index.htm) and [here](https://en.wikipedia.org/wiki/Misery_index_(economics)). Higher values correspond to more miserable citizens.
- $\texttt{temperature}$: the outside temperature, measured in degrees Fahrenheit
- $\texttt{humidity}$: relative humidity (percent, 0-100)
- $\texttt{pizzas sold}$: the number of pizzas sold at the beach concession stand in the past year
- $\texttt{pressure}$: local air pressure (millibar) 
- $\texttt{octopuses}$: the number of octopuses in the vicinity on the day of the observation
- $\texttt{Dan's shoe size}$: the size of the shoes Dan was wearing when the observation was made
- $\texttt{Tony's shoe size}$: the size of the shoes Tony was wearing when the observation was made

**Part A**: Read the data from `sharknadoes.csv` into a Pandas DataFrame.  Note that since we will be doing a multiple linear regression we will need all of the features, so you should drop any row in the DataFrame that is missing data. 

**Part B**: Perform the appropriate statistical test at the $\alpha = 0.01$ significance level to determine if _at least one_ of the features is related to the response $y$.  Clearly describe your methodology and show all computations in Python. 
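
One standard choice for a question of this form is the overall F-test for the regression. As a sketch, with notation introduced here for illustration ($n$ complete rows, $p$ features, $SSE$ the sum of squared errors of the full model, and $SST$ the total sum of squares of the response), the hypotheses and test statistic would be:

$$
H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \quad \text{vs.} \quad H_1: \beta_k \neq 0 \text{ for at least one } k
$$

$$
F = \frac{(SST - SSE)/p}{SSE/(n-p-1)}, \qquad F \sim F_{p,\, n-p-1} \text{ under } H_0
$$

The null hypothesis is rejected at the $\alpha = 0.01$ level when the p-value, the upper-tail area of $F_{p,\, n-p-1}$ beyond the observed statistic, is below 0.01.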

**Part C**: Write a function `backward_select(df, resp_str, maxsse)` that takes in the DataFrame (`df`), the name of the column corresponding to the response (`resp_str`), and the maximum desired sum of squared errors (`maxsse`), and returns a list of feature names corresponding to the most important features via backward selection.  Use your code to determine the reduced MLR model with the minimal number of features such that the SSE of the reduced model is less than 570. At each stage in backward selection you should remove the feature that has the highest p-value associated with the hypothesis test for the given slope coefficient $\beta_k \neq 0$.

Your code should clearly indicate which feature was removed in each stage, and the SSE associated with the model fit before the feature's removal. _Specifically, please write your code to print the name of the feature that is going to be removed and the SSE before its removal_. Afterward, be sure to report all of the retained features and the SSE of the reduced model.

**Note**: The point of this exercise is to see if you can implement **backward_select** yourself.  You may of course use canned routines like statsmodels OLS, but you may not call any Python method that explicitly performs backward selection.

In [None]:
def backward_select(df, resp_str, maxsse):
    
    # your code goes here!
    
    remaining_features = [] # placeholder
    
    return remaining_features

**Part D**: Write down the multiple linear regression model, including estimated parameters, obtained by your backward selection process. 

**Part E**: Perform the appropriate statistical test at the $\alpha = 0.01$ significance level to determine whether there is a statistically significant difference between the full model with all features and the reduced model obtained by backward selection in **Part D**. You may use output from your model fit above, but all calculations should be set up in Markdown/MathJax.
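
A common way to compare nested models like these is the partial F-test. As a sketch, with notation introduced here for illustration ($n$ complete rows, $p$ features in the full model, $k < p$ features in the reduced model), the null hypothesis is that the coefficients of all removed features are zero, and the test statistic is:

$$
F = \frac{(SSE_{\text{red}} - SSE_{\text{full}})/(p-k)}{SSE_{\text{full}}/(n-p-1)}, \qquad F \sim F_{p-k,\, n-p-1} \text{ under } H_0
$$

A p-value below $\alpha = 0.01$ would indicate that the full model fits significantly better than the reduced model; otherwise the reduced model is preferred for its simplicity.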

**Part F**: Based on your conclusions in **Part E**, use the _better_ of the two models to predict the sharknado hazard when the following features are observed: 

- $\texttt{taunts}$: 47
- $\texttt{clouds}$: 0.8
- $\texttt{precipitation}$: 1 inch
- $\texttt{earthquake}$: 5
- $\texttt{shark attacks}$: 11
- $\texttt{ice cream sold}$: 120
- $\texttt{misery index}$: 15
- $\texttt{temperature}$: 70 degrees F
- $\texttt{humidity}$: 83
- $\texttt{pizzas sold}$: 5500
- $\texttt{pressure}$: 850 millibar 
- $\texttt{octopuses}$: 6
- $\texttt{Dan's shoe size}$: 9.5
- $\texttt{Tony's shoe size}$: 9

**Part G:** Consider the model you used in Part E, and consider the fact that you are trying to predict **sharknado hazard**. What is one critical drawback to the MLR model (or any MLR model) for predicting sharknado hazard? What are some modifications that could improve on this issue?

<br>

---
[Back to top](#top)
<a id='p3'></a>

### [35 points] Problem 3: FlipMaster5000

In the file `flips.csv` you'll find the results of an experiment that was conducted with Stella O'Flaherty (the famous octopus data scientist) flipping coins. Her experiment was as follows. 

1. She reaches into her coin purse and grabs one of two coins, labeled $x$ and $y$. 
2. She flips her coin until it comes up heads 8 times, and records the coin ID and the number of flips it took to get 8 heads. 
3. She then replaces the coin in her coin purse and repeats the experiment. 

**Part A:**

By considering the total number of flips and the total number of "heads" in the data file for each coin, estimate the bias of each coin $p_x$ and $p_y$, and use an appropriate statistical test to determine whether the coins have the same bias, i.e. whether $p_x$ and $p_y$ are the same. Perform your test at a significance level that will mistakenly reject the null hypothesis _when that null hypothesis is actually true_ 5% of the time. Report a p-value for your test, and clearly state your conclusions.
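
One possible approach is a two-sided, two-proportion z-test with a pooled standard error. The sketch below uses only the standard library and assumes the normal approximation is adequate for the pooled totals; the function name and the counts passed to it are placeholders for illustration, not values from `flips.csv`.

```python
import math

def two_proportion_z_test(heads_x, flips_x, heads_y, flips_y):
    """Two-sided z-test of H0: p_x == p_y using a pooled standard error."""
    p_x_hat = heads_x / flips_x
    p_y_hat = heads_y / flips_y
    pooled = (heads_x + heads_y) / (flips_x + flips_y)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / flips_x + 1 / flips_y))
    z = (p_x_hat - p_y_hat) / std_err
    # Two-sided tail area of the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Placeholder counts for illustration only; replace with the totals from flips.csv.
z, p_value = two_proportion_z_test(heads_x=50, flips_x=100, heads_y=30, flips_y=100)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

With the real totals, the null hypothesis of equal bias would be rejected at the 5% level when the p-value is below 0.05.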

**Part B:** 

You learn that, actually, the coin $x$ is from a manufacturer that produces coins whose biases follow some statistical regularity. In particular, the bias of the $x$ coin is in the set $$p_x \in \{0.1, 0.2, 0.3, \dots, 0.9\}.$$ Furthermore, these biases all occur with equal probability. In other words, $\tfrac{1}{9}$ of coins have bias $p_x=0.1$, $\tfrac{1}{9}$ of coins have bias $p_x=0.2$, and so on. 

For each possible value of $p_x$, compute the probability that Stella's $x$ coin has bias of $p_x$, given the data in her data file. 

Plot your results with $p_x$ on the horizontal axis and $Pr(p_x \mid \text{data})$ on the vertical axis. Make the points or lines that you plot blue. Plots without axis labels will receive zero credit.

_Hint_: We have done problems like this before! Think back to how you solved the problem on the midterm where you determined the probability that someone had ESP, given that they guessed the cards correctly. There was a "rule", and maybe a "law" involved in your calculation...
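
The calculation the hint points to can be sketched as a posterior over a discrete grid: multiply each prior probability by the likelihood of the data at that bias, then divide by the sum over all candidate biases. Treating the flips as independent, the likelihood depends on the bias $p$ only through the total heads $H$ and total tails $T$, and is proportional to $p^H (1-p)^T$. The function name `grid_posterior` and the totals used below are placeholders for illustration, not values from `flips.csv`.

```python
import math

def grid_posterior(biases, prior, total_heads, total_flips):
    """Posterior over a discrete set of biases via Bayes' rule."""
    total_tails = total_flips - total_heads
    # Work in log space so that large flip counts do not underflow to zero.
    log_weights = [
        math.log(pr) + total_heads * math.log(p) + total_tails * math.log(1 - p)
        for p, pr in zip(biases, prior)
    ]
    largest = max(log_weights)
    weights = [math.exp(lw - largest) for lw in log_weights]
    normalizer = sum(weights)  # law of total probability, up to a constant factor
    return [w / normalizer for w in weights]

biases = [k / 10 for k in range(1, 10)]
uniform_prior = [1 / len(biases)] * len(biases)
# Placeholder totals for illustration only; replace with the coin's totals from flips.csv.
posterior = grid_posterior(biases, uniform_prior, total_heads=8, total_flips=16)
for p, post in zip(biases, posterior):
    print(f"p = {p:.1f}: {post:.4f}")
```

Passing a different `prior` list to the same function gives the posterior under any other prior on the same set of biases.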

**Part C:**

You learn that, actually, the coin $y$ is from a different manufacturer that produces coins whose biases follow some statistical regularity. In particular, the bias of the $y$ coin is in the set $$p_y \in \{0.1, 0.2, 0.3, \dots, 0.9\}.$$ Furthermore, these biases all occur with different probability. In particular, the probability that a coin has bias $p_y$ is proportional to $p_y$, which could be written as 
$$Pr(p_y) \propto p_y \quad \text{for} \quad p_y \in \{0.1, 0.2, 0.3, \dots, 0.9\}$$

First, write clearly the PMF for $p_y$, based on the information above. 

Then, for each possible value of $p_y$, compute the probability that Stella's $y$ coin has bias of $p_y$, given the data in her data file. 

Plot your results with $p_y$ on the horizontal axis and $Pr(p_y \mid \text{data})$ on the vertical axis. Make the points or lines that you plot red. Plots without axis labels will receive zero credit.
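
The proportionality statement in this part determines the PMF once it is normalized to sum to 1. As a worked sketch, since $0.1 + 0.2 + \cdots + 0.9 = 4.5$:

$$
Pr(p_y) = \frac{p_y}{\sum_{q \in \{0.1, 0.2, \dots, 0.9\}} q} = \frac{p_y}{4.5} \quad \text{for} \quad p_y \in \{0.1, 0.2, 0.3, \dots, 0.9\}
$$

For example, $Pr(p_y = 0.9) = \frac{0.9}{4.5} = 0.2$, while $Pr(p_y = 0.1) = \frac{0.1}{4.5} \approx 0.022$.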

**Part D:**

The information that you have about the manufacturer of coin $x$ and coin $y$ is called _prior information_ since it can influence the estimates of a coin's bias at which you arrive, given the data from the coin's flipping. We often call the distribution $Pr(p_x)$ or $Pr(p_y)$ a _prior distribution_, and call $Pr(p_x \mid \text{data})$ or $Pr(p_y \mid \text{data})$ a _posterior distribution_, since it represents the estimate that you arrive at after you have taken the data into account. 

You have already computed posterior distributions for each coin's bias. However, you'll now investigate the importance of the prior by _switching the priors for the two coins_.

In other words, using the prior probabilities $Pr(p_x)$, what is your posterior distribution of $Pr(p_y \mid \text{data from y})$? Similarly, using the prior probabilities $Pr(p_y)$, what is your posterior distribution of $Pr(p_x \mid \text{data from x})$? 

Create two plots. 

1. In the first plot, show your results from Part B (the posterior distribution for $p_x$ with the correct prior) plotted with a blue solid line as well as your results from Part D for the posterior distribution for $p_x$ with the incorrect prior with a blue dashed line.  

2. In the second plot, show your results from Part C (the posterior distribution for $p_y$ with the correct prior) with a red solid line as well as your results from Part D for the posterior distribution for $p_y$ with the incorrect prior with a red dashed line.  

**Part E:**

What is the name of the distribution that Stella's experiment is drawn from?

<br>

---
[Back to Problem 2](#p2)

<a id='footnote'></a> Yeah yeah - fresh water versus salt water - I know, I know. But sharknadoes also are not real, so...