## From Last Time
In the last worksheet, we established some basic vocabulary and concepts. Here's some review:

  1. What is the difference between experimental and theoretical probability?
  

  2. Dice with more than 6 sides are commonly used in board games. Suppose we had a die with 20 sides (numbered from 1 to 20). Find the following probabilities:  
    a. P(rolling a 3)
    
    b. P(not rolling a 3)

    c. P(rolling within the range (10, 15])

    d. P(rolling under 5 OR above 15)
    

  3. Suppose that you have a real 20-sided die and want to find out whether it is fair. In other words, does each side have the same probability of landing face up? Explain how you would find out, and justify your answer using the Law of Large Numbers.
  

**Income Level**|**Not Audited**|**Audited**|**Total**
:-----:|:-----:|:-----:|:-----:
Under 200k|143425|619|**144044**
200k to 1m|7789|44|**7833**
Over 1m|568|14|**582**
**Total**|**151782**|**677**|**152459**

  4. In the United States, tax returns are due around April 15th. The information above is taken from [the IRS](https://www.irs.gov/pub/irs-pdf/p55b.pdf) on individual tax returns in 2019; the numbers in the table are reported in the thousands.

    a. For a randomly selected taxpayer in 2019, what is the probability of being audited?

    b. For a taxpayer with an income of more than $1m, what is the probability of being audited?

    c. For a taxpayer with an income below $200k, what is the probability of not being audited?

    d. Is there a relationship between income level and likelihood of being audited? Support your answer using what you know about contingency tables and conditional proportions.



## Probability Continued
So far we've been looking at the results of a single random event: flipping a single coin, or rolling a die once. How do we think about combining probabilities?

To begin to formalize this idea, imagine we have a random generator that can do two things: it either fairly picks a color from {red, blue, green}, or it fairly rolls a 6-sided die.

### Intersection
When we're interested in the probability of an event A happening *and* another event B happening, we're looking at the **intersection** of these two events. How do we calculate P(A and B)? We'll work through an example.

Let's say we want a random color first, then a random number from our generator. What is P(green and 2)?  
First, we find P(green), which is 1/3.  
Then, we find P(2), which is 1/6.  
So how do we get to P(green and 2)? We **multiply**: 1/3 * 1/6 = 1/18.
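
As a quick arithmetic check, the multiplication above can be reproduced with Python's built-in `fractions` module, which keeps the result as an exact fraction. This is only a sketch; the variable names are ours and are not part of the worksheet.

```python
from fractions import Fraction

p_green = Fraction(1, 3)  # one of three equally likely colors
p_two = Fraction(1, 6)    # one of six equally likely die faces

# Multiply the two probabilities, as in the worked example
p_green_and_two = p_green * p_two
print(p_green_and_two)  # 1/18
```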

### Union
When we're interested in the probability of an event A happening *or* another event B happening, we're looking at the **union** of these two events. How do we calculate P(A or B)? We'll work through an example.

What is P(green or 2)? Well, we already know the individual probabilities, which are 1/3 and 1/6.  
If we get green, then we don't care what number is rolled.  
If we get 2, then we don't care what color was chosen.  
We can **add** the probabilities together, then subtract the intersection so that the outcome (green, 2) isn't counted twice.  
P(green or 2) = P(green) + P(2) - P(green and 2) = 1/3 + 1/6 - 1/18 = 8/18 = 4/9
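
The same calculation can be sketched with exact fractions. The variable names here are illustrative assumptions, and note that `Fraction` reduces 8/18 to lowest terms, so it prints 4/9.

```python
from fractions import Fraction

p_green = Fraction(1, 3)
p_two = Fraction(1, 6)
p_green_and_two = p_green * p_two  # 1/18, from the intersection example

# Addition rule: subtract the overlap so it is not counted twice
p_green_or_two = p_green + p_two - p_green_and_two
print(p_green_or_two)  # 4/9, which is 8/18 in lowest terms
```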


### Simulation
How can we reassure ourselves that the **multiplication** and **addition rules** work? We can go back to the idea of using events and sample space to confirm.
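
One empirical check, separate from listing the sample space, is to run the generator many times and see how often each event happens. The sketch below assumes Python's standard `random` module stands in for our fair generator; the seed, the number of trials, and the variable names are arbitrary choices, not part of the worksheet. The estimates should land close to 1/18 and 8/18 but will not match them exactly.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable (an arbitrary choice)

colors = ["red", "blue", "green"]
trials = 100_000

count_and = 0
count_or = 0
for _ in range(trials):
    color = random.choice(colors)   # fair pick among the three colors
    number = random.randint(1, 6)   # fair die roll, inclusive on both ends
    if color == "green" and number == 2:
        count_and += 1
    if color == "green" or number == 2:
        count_or += 1

estimate_and = count_and / trials
estimate_or = count_or / trials
print("P(green and 2) estimate:", estimate_and, "rule:", 1 / 18)
print("P(green or 2) estimate:", estimate_or, "rule:", 8 / 18)
```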

  5. Write code that prints out the sample space of the scenario above.  
    a. What is the size of the sample space?

    b. What is P(green and 2)? Justify your answer with your code.  
    *Hint: Count how many events in the sample space are green and 2*

    c. What is P(green or 2)? Justify your answer with your code.

  *Hint:* In your code, literally use `and` and `or`!

```python
# Write your code here.
```

  6. Let's say an unfair coin is flipped twice. You know that the coin has a 70% chance of landing on tails.  
    a. What is the probability of landing on heads?

    b. What is the sample space? An example event is TH, which indicates landing on tails then heads.

    c. What is the probability of landing on tails first?

    d. Since there are 4 possible outcomes and HH appears only once, someone concludes that the probability of flipping two heads is 1/4. Explain why this is incorrect.

    e. What is the expected probability of landing on heads twice?
    

  **Income Level**|**Not Audited**|**Audited**|**Total**
  :-----:|:-----:|:-----:|:-----:
  Under 200k|143425|619|**144044**
  200k to 1m|7789|44|**7833**
  Over 1m|568|14|**582**
  **Total**|**151782**|**677**|**152459**

  7. Let's return to our tax audit contingency table from before. Here it is again:  
    a. What is the probability that a randomly selected taxpayer has an income of under $200k?

    b. What is the probability that a randomly selected taxpayer is audited?

    c. What is the *expected* probability that a randomly selected taxpayer with income of under $200k is audited?

    d. What is the *observed* probability that a randomly selected taxpayer with income of under $200k is audited?

    e. Explain why your answers to (c) and (d) are not exactly the same.


  8. Continue to look at the tax table.  
    a. What is the *expected* probability that a randomly selected taxpayer has income of under $200k or is audited?

    b. What is the *observed* probability that a randomly selected taxpayer has income of under $200k or is audited?

    c. Explain why your answers to (a) and (b) are not exactly the same.

## Conditional Probability
We've seen this notation before: P(A | B) is the probability of an event A *given* that an event B has occurred.  
Using the example from before, let's break down how to calculate P(2 | green).

As a refresher, the possible colors are {red, green, blue}, and the number comes from a 6-sided die. We can assume green has already been chosen. Of the outcomes where green is chosen, exactly one has a 2 rolled.

So P(2 | green) = P(green and 2) / P(green) = (1/3 * 1/6) / (1/3) = 1/6
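
The division above can be checked with exact fractions. This is a minimal sketch using Python's `fractions` module, with variable names chosen only for illustration.

```python
from fractions import Fraction

p_green = Fraction(1, 3)
p_green_and_two = Fraction(1, 3) * Fraction(1, 6)  # 1/18

# Conditional probability: restrict to green, then ask how often 2 occurs
p_two_given_green = p_green_and_two / p_green
print(p_two_given_green)  # 1/6
```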


### Simulation
Similar to (5), we can use code to calculate conditional probabilities.

  9. Print the entire sample space again. (Same as (5))  
    a. What is the size of the subset of the sample space where green is chosen?

    b. Of the green subset, count how many times 2 is rolled.

    c. Calculate P(2 | green) using your answer to (a) and (b).
    

```python
# Write your code here
```

## Independent Events
Recall your answers to (7) and (8). There was a slight difference between the *expected* and the *observed* probabilities. This indicates that perhaps there is a better way to calculate P(A and B)!

The formula from the previous worksheet only works if A and B are **independent** events; in other words, A has no relationship at all with B. However, that's not always the case.

If we take conditional probabilities into account, P(A and B) = P(B) * P(A | B).
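
To see the formula at work on events that are related, here is a small sketch built on a hypothetical example that is not part of the worksheet: a bag with 3 red and 2 blue marbles, from which two marbles are drawn without replacement. Let B be "the first marble is red" and A be "the second marble is red".

```python
from fractions import Fraction

# Hypothetical bag: 3 red and 2 blue marbles, two draws without replacement.
p_b = Fraction(3, 5)          # first draw: 3 of the 5 marbles are red
p_a_given_b = Fraction(2, 4)  # one red is gone, so 2 of the remaining 4 are red

# General rule: P(A and B) = P(B) * P(A | B)
p_a_and_b = p_b * p_a_given_b
print(p_a_and_b)  # 3/10

# Multiplying the plain probabilities gives a different number here.
p_a = Fraction(3, 5)  # P(second is red) when nothing is known about the first draw
print(p_a * p_b)      # 9/25
```

The two printed values differ because the first draw changes what is left in the bag for the second draw.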

  10. What can you conclude about two events A and B if you know that P(A) = P(A|B)?

## Summary
This will be a good summary box to have for your notes!

  11. In terms of P(A) and P(B), find:  
    a. P(A and B)

    b. P(A or B)

    c. P(A | B)
    

## Probability Practice
Answer the following questions. You may add code cells and write code if that will help you answer the question.

  12. A school is trying to figure out the lunch menu for next year, and conducts a random sample of 20 students. All 20 students say that they are not vegetarians, so the school concludes that the probability of being a vegetarian is 0%. Explain why the school's conclusion is incorrect.


  13. Based on the number of cases in the US, the chance that a randomly selected person in the US has COVID-19 is 9.5%. A testing company claims to have a 99% accuracy rate for detecting COVID-19.  
Denote a positive test as + and negative test as -  
Denote having COVID-19 as C+ and not having COVID-19 as C-  
    a. According to the company's claim, what is P(+|C+)?

    b. What is P(C-)?

    c. Use the summary box to help you. P(+) = 0.1031  
    What is P(C+|+)? In other words, what is the probability that a person who tests positive actually has COVID-19?

    d. What is P(C+|-)? In other words, what is the probability that a person who tests negative actually has COVID-19?
    