# Review
  1. You're looking at the weather forecast for this coming Saturday.  
  You note that P(rain) = 80% and P(windy) = 35%.  
  Assume that these are independent events.  
  
    a. What is P(rain and windy)?

    b. What is P(rain or windy)?

    c. Would your answers to (a) and (b) still be correct if the events aren't independent? Explain.

    d. What is P(rain and no rain)?

  2. A COVID-19 test has a 90% probability of giving a correct diagnosis. What is the probability that a person who has COVID-19 will receive a correct diagnosis after:

    a. Taking the test once?

    b. Taking the test twice?

    c. Taking the test 3 times?

  3. Explain why an *observed* probability might end up different from a calculated *expected* probability.


# Random Variables
We've been talking about random outcomes and finding the probability of a certain event happening. To formalize this idea, we can use the vocabulary of **random variables**. 

A **random variable** is a numerical variable whose value depends on the result of a random phenomenon. Each random variable has a corresponding **probability distribution** which describes both the sample space and the probability of each possible value. 

The probabilities in a probability distribution must add up to 1.
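
As a small illustration of these definitions, here is a minimal sketch that stores a probability distribution in a dictionary and checks that it is valid. The random variable `Y` (number of heads in two flips of a fair coin) and the helper `is_valid_distribution` are made up for this example and are not used elsewhere in the notebook.

```python
# Hypothetical example: Y = number of heads in two flips of a fair coin.
# Keys are the possible values of Y; values are their probabilities.
Y = {0: 1/4, 1: 1/2, 2: 1/4}

def is_valid_distribution(dist, tol=1e-9):
    """Return True if every probability is in [0, 1] and they sum to 1 (within tol)."""
    in_range = all(0 <= p <= 1 for p in dist.values())
    sums_to_one = abs(sum(dist.values()) - 1) < tol
    return in_range and sums_to_one

print(is_valid_distribution(Y))                 # True
print(is_valid_distribution({0: 0.5, 1: 0.6}))  # False: the probabilities sum to 1.1
```

The tolerance `tol` is there because floating-point sums are not always exact, so the check compares against 1 within a small margin instead of using `==`.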

## Example
We've now looked at many examples of rolling a 6-sided die. How do we describe this scenario using random variables and probability distributions?

  4. Let X = outcome of rolling a 6-sided die once  
    a. What is the sample space?

    b. What is the probability of rolling a 1?

    c. Write code in the cell below that generates a dictionary `X`. The keys of `X` should be each event in the sample space, and the values should be the corresponding probability.

    d. Write a function `P(x)` that takes an event and returns the probability of it happening.
    

In [10]:
# Write your code here


# The following tests should all print True
tests = []
error = 0.0000001


tests.append(abs(X[1] - 1/6) < error) # Compare floats within a tolerance instead of using ==
tests.append(X[3] == X[5])

tests.append(P(1) == P(2))
tests.append(abs(P(4) - 1/6) < error)
tests.append(P(7) == 0)
for t in tests:
    print(t)

  5. Another way of defining a **probability distribution** is simply the code that you just wrote for (4d): a function that returns a probability given an event. Write code for the probability distribution of flipping a fair coin.

  6. Write code to answer the following questions.  
    Let X = outcome of rolling two dice at the same time  
    a. What is the sample space? Use a list of **tuples**.

    b. Write a function for the probability distribution of X.  
    *Hint: Treat the input as a tuple.*

    c. Write a function `P_sum(n)` that returns the probability of the two dice adding up to `n`.
    

# Discrete vs Continuous
The previous few examples involve random variables with *discrete* outcomes. What if the outcomes aren't discrete? We'll go back to our friends the Palmer penguins.

When we conduct a statistical experiment, we treat each variable in the data as a random variable. For example, let X = the body mass of a Gentoo penguin.

Analysis of continuous random variables involves calculus and integration; however, we can get good estimates even without doing calculus. How? By looking at histograms!

In [12]:
#!pip3 install palmerpenguins
from palmerpenguins import load_penguins
import pandas as pd

penguins = load_penguins()
#penguins.sample(5)

## Using a Histogram
So how do we use a histogram to figure out the probability distribution of a continuous random variable? We'll walk through an example.
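
As a warm-up, here is a rough sketch of the idea on made-up data (not the penguin data). It sorts values into equal-width bins, which is what a histogram does, and then estimates the probability of landing in one bin as that bin's count divided by the total. The data, the bin edges, and the helper `bin_counts` are all assumptions chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the made-up data is reproducible

# Made-up measurements (NOT the penguin data): 1000 draws centered near 50.
data = [random.gauss(50, 10) for _ in range(1000)]

def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:  # values outside the binned range are skipped
            counts[i] += 1
    return counts

# Ten bins of width 10 covering [0, 100).
counts = bin_counts(data, start=0, width=10, n_bins=10)

# Estimated probability of landing in the bin [40, 50): that bin's count / total.
p_40_50 = counts[4] / len(data)
print(counts)
print(p_40_50)
```

The estimate depends on the bin width and on how much data there is, so it is an approximation of the underlying continuous distribution, not an exact probability.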

  7. Let X = the body mass of a randomly chosen penguin. Write code to generate a histogram for body mass. Use `altair` and add a **tooltip** so that you can see the `count` on hover.  
    a. What is the size of the sample space?

    b. How many penguins have a body mass between 4500 and 5000 grams?

    c. What is the probability of choosing a penguin with body mass between 4500 and 5000 grams?

    d. How would you describe the probability distribution of penguin body mass?
    

# Expected Value
Once we've figured out the probability distribution of a random variable, what can we do with this information? One useful application is something called **expected value**. In other words, what outcome do we generally *expect* to see? 

For a given random variable X, the **expected value** is the **weighted average** of its possible values. We multiply each possible value by the probability of that value occurring, and then we add all of those products together. In mathematical terms,

$$ \text{ExpectedValue}(X) = \sum_{x} x \, P(x), \quad \text{summing over every possible value } x \text{ of } X $$
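
The formula above can be sketched in a few lines of code. This example assumes the distribution is stored as a dictionary mapping each value to its probability; the variable `Y` (number of heads in two flips of a fair coin) and the function name `expected_value` are made up for illustration.

```python
# Hypothetical distribution: Y = number of heads in two flips of a fair coin.
Y = {0: 1/4, 1: 1/2, 2: 1/4}

def expected_value(dist):
    """Weighted average: sum of value * probability over every possible value."""
    return sum(x * p for x, p in dist.items())

print(expected_value(Y))  # 0*(1/4) + 1*(1/2) + 2*(1/4) = 1.0
```

Note that the expected value does not have to be a likely outcome, or even a possible one. It is the long-run average over many repetitions.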

  8. Your friend wants to make a bet with you: a fair coin will be flipped. If it lands on heads, you get a cup of coffee, but if it lands on tails, you lose a cup of coffee.  
    a. What is the random variable in this scenario?
    

    b. What is the probability distribution of this random variable?

    c. What is the expected value of this random variable?
    

  9. Your favorite coffee shop is having a promotion where each customer who enters gets to spin a wheel. They have a 50% chance of winning one free coffee and 30% chance of winning two free coffees. The rest of the wheel is blank.  
    a. What is the probability of not winning any coffee?

    b. On average, how many coffees would a customer expect to win upon spinning the wheel? 
    

  10. The code below describes the probability distribution P_10 for some random variable.  
    a. What is the sample space?

    b. What is P_10(3)?

    c. What is P_10(29)?

    d. What is the expected value of this random variable? Write code to find the answer.

In [3]:
def P_10(x):
    if x in range(1,6):
        return x/15
    else:
        return 0


# Questions
Answer the following questions. You may add code cells and write code if that will help you answer the question.

  11. A player stumbles upon a rare chest in a video game and knows that a random amount of gold will be generated upon opening it. This is the probability distribution:

|Probability|Gold|
|--------|----|
|1%       |1000|
|10%      |500 |
|50%      |100 |
|39%      |80  |

  What's the expected amount of gold this player will receive?

  12. A basketball player is analyzing their own stats, and realizes that so far they've made 70% of their free throws. Let's analyze a situation where this player is given 2 free throws.  
    a. What is the random variable and sample space?

    b. What is the probability distribution?

    c. Out of the 2 free throws, how many will this player make on average?
    

  13. You're shopping online, and you're deciding between two different sellers for what seems to be the same item. Seller A has a rating of 9.5 with 10 reviews, while Seller B has a rating of 9.0 with 300 reviews. Which seller would you rather buy from, and why?