# lab_simulation : Into the Matrix

As you learned in lecture, **simulation** is an extremely powerful tool for estimating the probability of an event: you simulate many observations of the event and determine what fraction of them were successful.

This lab will have you build increasingly interesting simulations and find the results.

## Simulation 0: Your Groups! 
Edit the next Python cell to add information about who you're working with in your lab section:

In [5]:
# First, meet your CAs and TA if you haven't already!
# ...first name is enough, we'll know who they are! :)
ta_name = "Sogol"
ca1_name = "Katy"
ca2_name = "Emily"


# Also, make sure to meet your team for this lab! Find out their name, what major they're in,
# and learn something new about them that you never knew before!
partner1_name = "Shawn"
partner1_major = "Info Sci"
partner1_qotd_answer = "California"

partner2_name = "Alex"
partner2_major = "Econ"
partner2_qotd_answer = "Home"

partner3_name = "Davin"
partner3_major = "Econ"
partner3_qotd_answer = "Work"

## Simulation 1: Pre-Quiz Dice Rolls

The pre-quiz question that was asked by Karle and Wade in lecture was as follows:

> You roll two different fair six-sided dice at the same time.  One die is colored blue, one is colored red.  What is the probability that the blue die lands on 4 or the red die lands on 2?

Simulate the above problem 1,000 times, storing your observations of the values of the red die and the blue die in `df1`.

In [4]:
# Step 0: Import any libraries you need:
import pandas as pd
import random


In [8]:
# Step 1: Always start with an empty list to store our simulation data:
data = []

# Step 2: Write the simulation inside of a for-loop
for i in range(1000):
    blue = random.randint(1, 6)
    red = random.randint(1, 6)
    d= {"blue":blue, "red":red}
    data.append(d)
# Step 3: Store the simulation data into a DataFrame
df1 = pd.DataFrame(data)



# ...and show a few random rows:
df1.sample(5)

Unnamed: 0,blue,red
72,4,6
373,2,5
6,1,2
111,6,2
36,3,3


### Puzzle 1.1: Probability Calculations

Find our estimation of the probability that the blue die lands on 4 or the red die lands on 2.

- To do this, create a `df1_success` DataFrame with only the rows that were successful.
- Use `df1` and `df1_success` to find the probability of success using your simulation and store that value in `P_puzzle1` below.

In [12]:
# Create a DataFrame that contains only the subset of observations that were successful:
df1_success = df1[(df1["blue"]==4)|(df1["red"]==2)]



# ...and show a few random rows:
df1_success.sample(5)

Unnamed: 0,blue,red
577,3,2
606,4,3
180,2,2
889,2,2
575,4,3


In [13]:
# Find the value of P_puzzle1, the probability of success:
P_puzzle1 = len(df1_success)/1000
P_puzzle1



0.318

### Puzzle 1.2: Finding the Exact Answer

This simulation modeled a fairly simple example that you can find an exact answer to!  Using the probability rules learned in lecture, calculate `P_puzzle1_exact`, the **exact** probability of the blue die landing on a 4 **or** the red die landing on a 2.

- No Python code is **required** for this question; we just need you to store the answer in `P_puzzle1_exact`.

In [14]:
P_puzzle1_exact = 11/36
P_puzzle1_exact



0.3055555555555556
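
One way to double-check the exact value is to list all 36 equally likely (blue, red) outcomes and count the successful ones. The sketch below is only a sanity check, not a required part of the lab; the names `outcomes`, `successes`, `exact_by_counting`, and `exact_by_formula` are illustrative and do not appear elsewhere in the notebook.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (blue, red) outcomes for two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Success: the blue die shows 4 OR the red die shows 2.
successes = [(blue, red) for blue, red in outcomes if blue == 4 or red == 2]

# Exact probability as a fraction: favorable outcomes / total outcomes.
exact_by_counting = Fraction(len(successes), len(outcomes))

# Inclusion-exclusion gives the same value: P(A) + P(B) - P(A and B).
exact_by_formula = Fraction(1, 6) + Fraction(1, 6) - Fraction(1, 36)

print(exact_by_counting, float(exact_by_counting))
```

Both approaches agree on 11/36: six outcomes have blue equal to 4, six have red equal to 2, and the single outcome (4, 2) would otherwise be counted twice.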

### Puzzle 1.3: Finding the Error

The **error** in a simulation is the difference between the exact value and value found from the simulation.  Subtract the estimated value (`P_puzzle1`) from the exact value (`P_puzzle1_exact`) to find the total error and store it in `puzzle1_error`.

In [15]:
puzzle1_error = P_puzzle1_exact - P_puzzle1
puzzle1_error



-0.012444444444444425
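
To get a feel for how the error behaves, the following sketch repeats the Simulation 1 experiment with different numbers of runs and prints the error (exact minus estimated) for each. It is a minimal illustration under assumptions: the function name `estimate_blue4_or_red2` and the seed value are made up for this example, and it uses plain Python counters rather than a DataFrame.

```python
import random


def estimate_blue4_or_red2(runs, seed=None):
    """Estimate P(blue == 4 or red == 2) from `runs` simulated rolls."""
    rng = random.Random(seed)  # private generator, so a seed gives repeatable results
    successes = 0
    for _ in range(runs):
        blue = rng.randint(1, 6)
        red = rng.randint(1, 6)
        if blue == 4 or red == 2:
            successes += 1
    return successes / runs


exact = 11 / 36
for runs in (100, 1_000, 10_000, 100_000):
    estimate = estimate_blue4_or_red2(runs, seed=107)
    print(f"{runs:>7} runs: estimate = {estimate:.4f}, error = {exact - estimate:+.4f}")
```

The error can be positive or negative on any given run, and it typically gets closer to zero as the number of runs grows, although a single larger simulation is not guaranteed to beat a smaller one.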

In [16]:
## == TEST CASES for Simulation 1 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any error or output, you PASSED all test cases!
# - If this cell results in any errors, check your previous cell, make changes, and RE-RUN your code and then this cell.

assert(len(df1) == 1000), "Make sure your df1 has exactly 1,000 observations"
assert(len(df1_success) < 1000), "Make sure your df1_success only has successes"
assert(P_puzzle1 > 0 and P_puzzle1 < 1), "Make sure your P_puzzle1 is a probability of success"
assert(round(P_puzzle1_exact, 3) == 0.306), "Make sure your P_puzzle1_exact contains the exact probability of success"
assert(puzzle1_error < 1), "Make sure to calculate the error by subtraction"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")
print()
print(f"Simulated Probability: P(blue == 4 | red == 2) = {round(100 * P_puzzle1, 2)}%")
print(f"    Exact Probability:                         = {round(100 * P_puzzle1_exact, 2)}%")

🎉 All tests passed! 🎉

Simulated Probability: P(blue == 4 | red == 2) = 31.8%
    Exact Probability:                         = 30.56%


## Simulation 2: Rolling Three Dice

Let's add another die into the mix.  Suppose we roll three dice: a **white**, a **red**, and a **blue** die.

Write a 1,000-run simulation of that event and store the observations in `df2`:

In [17]:
# Step 1: Always start with an empty list to store our simulation data:
data = []

# Step 2: Write the simulation inside of a for-loop
for i in range(1000):
    white = random.randint(1,6)
    red = random.randint(1,6)
    blue = random.randint(1,6)
    d = {"white":white, "red":red, "blue":blue}
    data.append(d)
# Step 3: Store the simulation data into a DataFrame
df2 = pd.DataFrame(data)



# ...and show a few random rows:
df2.sample(5)

Unnamed: 0,white,red,blue
54,1,6,3
732,5,2,3
839,2,3,6
265,3,2,1
12,3,3,2


### Puzzle 2.1: Probability Calculations

Find our estimation of the probability that the **sum of all three dice** is equal to exactly 9.

- To do this, create a `df2_success` DataFrame with only the rows that were successful.
- Store our estimation of the probability of success in `P_puzzle2` below.

In [18]:
# Create a DataFrame that contains only the subset of observations that were successful:
df2["score"]= df2["white"]+df2["red"]+df2["blue"]
df2_success = df2[df2["score"]==9]



# ...and show a few random rows:
df2_success.sample(5)

Unnamed: 0,white,red,blue,score
239,2,2,5,9
432,5,2,2,9
373,1,4,4,9
747,4,3,2,9
737,6,1,2,9


In [19]:
# Find the value of P_puzzle2, the probability of success:
P_puzzle2 = len(df2_success)/1000
P_puzzle2



0.104
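
The estimate can be compared against an exact value by tallying the sum of every possible roll of three dice. This is a sketch for checking your work, not part of the required solution; `sum_counts`, `total_outcomes`, and `exact_sum_9` are illustrative names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Tally the sum of every one of the 6 * 6 * 6 = 216 equally likely outcomes.
sum_counts = Counter(sum(roll) for roll in product(range(1, 7), repeat=3))

total_outcomes = sum(sum_counts.values())

# Exact probability that the three dice add up to 9.
exact_sum_9 = Fraction(sum_counts[9], total_outcomes)

print(sum_counts[9], total_outcomes, float(exact_sum_9))
```

There are 25 rolls out of 216 that sum to 9, so the exact probability is 25/216, or about 0.1157. A 1,000-run estimate will usually land within a few hundredths of that value.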

In [20]:
## == TEST CASES for Simulation 2 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any error or output, you PASSED all test cases!
# - If this cell results in any errors, check your previous cell, make changes, and RE-RUN your code and then this cell.

assert(len(df2) == 1000), "Make sure your df2 has exactly 1,000 observations"
assert(len(df2_success) < 1000), "Make sure your df2_success only has successes"
assert(P_puzzle2 > 0 and P_puzzle2 < 1), "Make sure your P_puzzle2 is a probability of success"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

🎉 All tests passed! 🎉


## Simulation 3: Flipping Four Coins

Suppose we flip **four coins**, one coin at a time, one after another.  Each coin has two sides, "Heads" and "Tails".

Write a simulation of that event, run it **50,000** times, and store the observations in `df3`:

In [26]:
# Refer to the previous simulations if needed, but write the code yourself (don't just copy/paste and edit it)!

data = []
for i in range(50000):
    one = random.randint(1,2)
    two = random.randint(1,2)
    three = random.randint(1,2)
    four = random.randint(1,2)
    
    d = {"one":one, "two":two, "three":three, "four":four}
    data.append(d)
    



df3 = pd.DataFrame(data)


# Encoding used above: 1 = Heads, 2 = Tails
# ...and show a few random rows:
df3.sample(5)

Unnamed: 0,one,two,three,four
10323,2,2,1,2
43254,2,2,2,1
23881,1,2,1,2
40011,2,2,1,1
24109,1,2,2,2


### Puzzle 3.1: Probability Calculations

Find our estimation of the probability that your first two coin flips were both heads and your last two coin flips were both tails.

- To do this, create a `df3_success` DataFrame with only the rows that were successful.
- Store our estimation of the probability of success in `P_puzzle3` below.

In [28]:
# Create a DataFrame that contains only the subset of observations that were successful:
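# Compare each column explicitly (1 = Heads, 2 = Tails); mixing bitwise & and == in one term depends on operator precedence.
df3_success = df3[(df3["one"] == 1) & (df3["two"] == 1) & (df3["three"] == 2) & (df3["four"] == 2)]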



# ...and show a few random rows:
df3_success.sample(5)

Unnamed: 0,one,two,three,four
46893,1,1,2,2
17587,1,1,2,2
48836,1,1,2,2
9690,1,1,2,2
32028,1,1,2,2


In [29]:
# Find the value of P_puzzle3, the probability of success:
P_puzzle3 = len(df3_success)/ 50000
P_puzzle3



0.06276
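
An alternative encoding that some find easier to read is to store the flips as the strings "H" and "T" instead of 1 and 2. The sketch below uses that encoding with a plain counter instead of a DataFrame, and compares the result to the exact value for four independent fair flips. The seed and the variable names are assumptions made for this example.

```python
import random

rng = random.Random(107)  # hypothetical seed, used only so the run is repeatable
runs = 50_000

successes = 0
for _ in range(runs):
    # Flip four coins in order, labelling each result "H" or "T".
    flips = [rng.choice(["H", "T"]) for _ in range(4)]
    # Success: the first two flips are heads and the last two are tails.
    if flips == ["H", "H", "T", "T"]:
        successes += 1

estimate = successes / runs
exact = (1 / 2) ** 4  # four independent fair flips, each with probability 1/2

print(f"estimate = {estimate:.4f}, exact = {exact}")
```

Because exactly one of the 16 equally likely head/tail sequences counts as a success, the exact probability is 1/16 = 0.0625.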

In [30]:
## == TEST CASES for Simulation 3 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any error or output, you PASSED all test cases!
# - If this cell results in any errors, check your previous cell, make changes, and RE-RUN your code and then this cell.

assert(len(df3) == 50000), "Make sure your df3 has exactly 50,000 observations"
assert(len(df3_success) < 10000), "Make sure your df3_success only has successes"
assert(P_puzzle3 > 0.03 and P_puzzle3 < 0.125), "Make sure your P_puzzle3 is a probability of success"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

🎉 All tests passed! 🎉


## Simulation 4: A short exam

Suppose you take a short exam with **four questions**:

- Two multiple choice questions with five possible responses, AND
- Two true/false questions

Write a simulation that randomly guesses on each question and store the observations in `df4`.  Run the simulation **107,000** times:

In [32]:
data = []
for i in range(107000):
    q1 = random.randint(1,5)
    q2 = random.randint(1,5)
    q3 = random.randint(1,2)  # 1 = True, 2 = False
    q4 = random.randint(1,2)
    
    d = {"q1":q1,"q2":q2,"q3":q3,"q4":q4}
    
    data.append(d)

df4 = pd.DataFrame(data)



# ...and show a few random rows:
df4.sample(5)

Unnamed: 0,q1,q2,q3,q4
68722,5,2,2,2
72304,2,3,2,1
17159,1,4,1,2
74009,3,4,1,2
61830,5,2,2,2


### Puzzle 4.1: Probability Calculations

Suppose you have a solution for the exam (the solution itself can be anything, you just need to make sure each question only has one correct answer).  Find an estimation of the probability that a student, who randomly guesses on each question, **earned a 100%** on the exam.

- To do this, create a `df4_success` DataFrame with only the rows that were successful.
- Store our estimation of the probability of success in `P_puzzle4` below.

In [38]:
# Create a DataFrame that contains only the subset of observations that were successful:
df4_success = df4[(df4["q1"]==1)&(df4["q2"]==1)&(df4["q3"]==1)&(df4["q4"]==1)]

# 1 is the correct answer for all!

# ...and show a few random rows:
df4_success.sample(5)

Unnamed: 0,q1,q2,q3,q4
27998,1,1,1,1
79206,1,1,1,1
49188,1,1,1,1
2451,1,1,1,1
45258,1,1,1,1


In [39]:
# Find the value of P_puzzle4, the probability of success:
P_puzzle4 = len(df4_success)/107000
P_puzzle4



0.010018691588785046

### Puzzle 4.2: Probability Calculations

Suppose you have a solution for the exam (the solution itself can be anything, you just need to make sure each question only has one correct answer).  Find an estimation of the probability that a student, who randomly guesses on each question, **earned a passing grade** on the exam.  *(Each question is worth the same amount, so a passing grade means you got at least three of the four questions correct.)* Hint: How many ways are there to get a **passing grade** on the exam?

- To do this, create a `df4_passing` DataFrame with only the rows that were successful.
- Store our estimation of the probability of success in `P_puzzle4_passing` below.

In [58]:
# Create a DataFrame that contains only the subset of observations that were successful:



df4_passing = df4[
    ((df4["q1"] == 1) & (df4["q2"] == 1) & (df4["q3"] == 1) & (df4["q4"] != 1))    # missed only q4
    | ((df4["q1"] == 1) & (df4["q2"] == 1) & (df4["q3"] != 1) & (df4["q4"] == 1))  # missed only q3
    | ((df4["q1"] == 1) & (df4["q2"] != 1) & (df4["q3"] == 1) & (df4["q4"] == 1))  # missed only q2
    | ((df4["q1"] != 1) & (df4["q2"] == 1) & (df4["q3"] == 1) & (df4["q4"] == 1))  # missed only q1
    | ((df4["q1"] == 1) & (df4["q2"] == 1) & (df4["q3"] == 1) & (df4["q4"] == 1))  # all four correct
]

# ...and show a few random rows:
df4_passing.sample(5)


Unnamed: 0,q1,q2,q3,q4,score
52870,1,3,1,1,6
8207,3,1,1,1,6
43277,1,5,1,1,8
48504,1,1,1,2,5
65046,1,3,1,1,6


In [59]:
# Find the value of P_puzzle4_passing, the probability of success:
P_puzzle4_passing = len(df4_passing)/107000
P_puzzle4_passing



0.11193457943925234
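
The hint about counting the ways to pass can be checked by listing every possible set of guesses and counting how many answers match the key. The sketch below assumes the same answer key the cells above use (option 1 is correct for every question); `answer_key`, `all_guesses`, and `number_correct` are illustrative names, not part of the lab.

```python
from fractions import Fraction
from itertools import product

# Assumed answer key: option 1 is correct on every question.
answer_key = (1, 1, 1, 1)

# Every equally likely way to guess: 5 * 5 * 2 * 2 = 100 combinations.
all_guesses = list(product(range(1, 6), range(1, 6), range(1, 3), range(1, 3)))


def number_correct(guess):
    """Count how many of the four guesses match the answer key."""
    return sum(g == k for g, k in zip(guess, answer_key))


perfect = sum(1 for guess in all_guesses if number_correct(guess) == 4)
passing = sum(1 for guess in all_guesses if number_correct(guess) >= 3)

exact_perfect = Fraction(perfect, len(all_guesses))
exact_passing = Fraction(passing, len(all_guesses))

print(exact_perfect, exact_passing)
```

Only 1 of the 100 combinations is a perfect score and 11 of them are passing, so the exact probabilities are 1/100 and 11/100. Counting correct answers and testing `>= 3` also avoids writing out each passing case by hand.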

In [60]:
## == TEST CASES for Simulation 4 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any error or output, you PASSED all test cases!
# - If this cell results in any errors, check your previous cell, make changes, and RE-RUN your code and then this cell.

assert(len(df4) == 107000), "Make sure your df4 has exactly 107,000 observations"
assert(len(df4_success) < (107000 * 0.05)), "Make sure your df4_success only has students scoring 100%"
assert(len(df4_passing) < (107000 * 0.2)), "Make sure your df4_passing has all students passing"

assert(P_puzzle4 > 0 and P_puzzle4 < 0.05), "Make sure your P_puzzle4 is a probability of earning a 100%"
assert(P_puzzle4_passing > 0.05 and P_puzzle4_passing < 0.2), "Make sure your P_puzzle4_passing is a probability of earning a passing grade"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

🎉 All tests passed! 🎉


## Simulation 5: Marbles in a Bag

Suppose you have a bag of 12 marbles.  The bag contains:

- Three red marbles,
- Four blue marbles, and
- Five clear marbles

Write a simulation that randomly draws a total of two marbles from the bag **with replacement** after each draw.  Run the simulation **50,000** times and store your observations in `df5`:

In [62]:
# Suggestion! You can use 1-3: red, 4-7: blue, 8-12: clear

data = []
for i in range(50000):
    m1 = random.randint(1,12)
    m2 = random.randint(1,12)
    y={"m1":m1,"m2":m2}
    data.append(y)


df5 = pd.DataFrame(data)



# ...and show a few random rows:
df5.sample(5)

Unnamed: 0,m1,m2
26110,12,12
2468,8,10
49720,12,9
13612,4,11
47936,1,1


### Puzzle 5.1: Probability Calculations

Find an estimation of the probability that you draw exactly one red marble and exactly one blue marble.

- To do this, create a `df5_success` DataFrame with only the rows that were successful.
- Store our estimation of the probability of success in `P_puzzle5` below.

In [63]:
# Create a DataFrame that contains only the subset of observations that were successful:
df5_success = df5[ ((df5["m1"]<=3)& ((df5["m2"]<=7)&(df5["m2"] >=4))) |
                  ((df5["m2"]<=3)&((df5["m1"]<=7)&(df5["m1"]>=4)))]



# ...and show a few random rows:
df5_success.sample(5)

Unnamed: 0,m1,m2
616,5,2
37824,7,2
17131,1,4
24426,4,2
48158,2,5


In [64]:
# Find the value of P_puzzle5, the probability of success:
P_puzzle5 = len(df5_success)/ 50000
P_puzzle5



0.16756
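
As a cross-check on the estimate, the exact probability can be found by listing all pairs of draws and translating marble numbers into colors. This sketch assumes the suggested numbering (1-3 red, 4-7 blue, 8-12 clear); the helper `color` and the other names are hypothetical and only used for this example.

```python
from fractions import Fraction
from itertools import product


def color(marble):
    """Map a marble number to its color using the 1-3 / 4-7 / 8-12 split."""
    if marble <= 3:
        return "red"
    if marble <= 7:
        return "blue"
    return "clear"


# With replacement, both draws range over all 12 marbles: 12 * 12 = 144 outcomes.
draws = list(product(range(1, 13), repeat=2))

# Success: one red and one blue marble, in either order.
successes = [d for d in draws if sorted(color(m) for m in d) == ["blue", "red"]]

exact_one_red_one_blue = Fraction(len(successes), len(draws))
print(exact_one_red_one_blue, float(exact_one_red_one_blue))
```

There are 3 * 4 = 12 red-then-blue pairs and 12 blue-then-red pairs, so 24 of the 144 outcomes succeed and the exact probability is 1/6, or about 0.1667.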

In [65]:
## == TEST CASES for Simulation 5 ==
# - This read-only cell contains test cases for your previous cell.
# - If this cell runs without any error or output, you PASSED all test cases!
# - If this cell results in any errors, check your previous cell, make changes, and RE-RUN your code and then this cell.

assert(len(df5) == 50000), "Make sure your df5 has exactly 50,000 observations"
assert(len(df5_success) < (50000 * 0.2)), "Make sure your df5_success only has successes"

assert(P_puzzle5 > 0.02 and P_puzzle5 < 0.2), "Make sure your P_puzzle5 is a probability of success"

## == SUCCESS MESSAGE ==
# You will only see this message (with the emoji showing) if you passed all test cases:
tada = "\N{PARTY POPPER}"
print(f"{tada} All tests passed! {tada}")

🎉 All tests passed! 🎉


## Submit Your Work!

Make sure to **Save and Checkpoint** your notebook, exit Jupyter, and submit your work! :)