## Simulation exercises

In [2]:
import numpy as np
import pandas as pd

np.random.seed(29)

1. How likely is it that you roll doubles when rolling two dice?

In [166]:
# generate dice rolls
n_trials = 10_000
n_dice = 2

rolls = np.random.choice([x for x in range(1,7)], n_trials*n_dice)
rolls = rolls.reshape(n_dice, n_trials)

rolls #this represents the roll as two parallel arrays

array([[3, 2, 4, ..., 3, 1, 1],
       [6, 5, 2, ..., 4, 3, 4]])

In [167]:
doubles = rolls[0] == rolls[1]

In [168]:
# take the mean of the boolean array to get the proportion of rolls that are doubles
doubles.mean()

0.1613
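
As a sanity check on the simulated rate, the exact probability can be found by enumerating all 36 equally likely outcomes of two fair dice. This is a minimal sketch; the names `outcomes` and `p_doubles` are illustrative and do not appear in the original notebook.

```python
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Exact probability of doubles: 6 matching pairs out of 36.
p_doubles = sum(a == b for a, b in outcomes) / len(outcomes)
print(round(p_doubles, 4))  # 0.1667
```

The simulated value should land close to 1/6, with the gap shrinking as `n_trials` grows.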

2. If you flip 8 coins, what is the probability of getting exactly 3 heads? What is the probability of getting more than 3 heads?

In [245]:
#simulate the coin flip a few thousand times
# let the heads be represented as 1
n_flip = 10_000
coins = 8

flips = np.random.choice([x for x in range(2)], n_flip*coins)
flips = flips.reshape(n_flip, coins)

flips

array([[0, 1, 0, ..., 0, 0, 0],
       [1, 1, 1, ..., 1, 1, 0],
       [0, 0, 1, ..., 1, 0, 0],
       ...,
       [1, 0, 1, ..., 1, 1, 1],
       [1, 1, 0, ..., 0, 1, 1],
       [0, 0, 1, ..., 1, 1, 1]])

In [241]:
heads_per_flip = flips.sum(axis=1)

In [242]:
# a. proportion of trials with exactly 3 heads
(heads_per_flip == 3).mean()

0.2203

In [243]:
# b. proportion of trials with more than 3 heads
(heads_per_flip > 3).mean()

0.6355
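
The coin-flip results can be compared against the exact binomial probabilities. The sketch below assumes fair, independent coins; `p_heads` is a hypothetical helper name, not something defined in the notebook.

```python
from math import comb

n_coins = 8

def p_heads(k, n=n_coins):
    """Exact probability of exactly k heads in n fair coin flips."""
    return comb(n, k) / 2**n

p_exactly_3 = p_heads(3)
p_more_than_3 = sum(p_heads(k) for k in range(4, n_coins + 1))
print(p_exactly_3, p_more_than_3)  # 0.21875 0.63671875
```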

3. There are approximately 3 web development cohorts for every 1 data science cohort at Codeup. Assuming that Codeup randomly selects an alumnus to put on a billboard, what are the odds that the two billboards I drive past both have data science students on them?

In [261]:
ds_odds = 0.25
# run a simulation of 1,000,000 pairs of billboards
n_ads = 10**6
n_drivebys = 2

#two parallel arrays
data = np.random.random((n_drivebys, n_ads))
data

array([[0.85425235, 0.13785384, 0.66182168, ..., 0.12709319, 0.40816526,
        0.6542652 ],
       [0.25960736, 0.74486039, 0.87969511, ..., 0.78079438, 0.06388333,
        0.8315508 ]])

In [262]:
((data[0] <= ds_odds) & (data[1] <= ds_odds)).mean()

0.062289

4. Codeup students buy, on average, 3 poptart packages a day from the snack vending machine, with a standard deviation of 1.5. If the machine is restocked with 17 poptart packages on Monday, how likely is it that I will be able to buy some poptarts on Friday afternoon? (Remember, if you have a mean and standard deviation, use np.random.normal.)

In [376]:
days_per_week = 5
weeks_to_simulate = 10_000
simulations = np.random.normal(3, 1.5, (weeks_to_simulate, days_per_week))

In [377]:
(simulations.sum(axis=1) < 17).mean()

0.7244
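
For comparison, the weekly total can also be treated analytically. Assuming the five daily purchase counts are independent normal draws (the same assumption the simulation makes), their sum is normal with mean 3 × 5 and standard deviation 1.5 × √5. The sketch uses the standard library's `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

days = 5

# Sum of 5 independent N(3, 1.5) days: mean 15, sd 1.5 * sqrt(5).
weekly = NormalDist(mu=3 * days, sigma=1.5 * sqrt(days))

# Poptarts remain if fewer than 17 packages were bought during the week.
p_poptarts_left = weekly.cdf(17)
print(round(p_poptarts_left, 3))
```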

5. Compare Heights

    - Men have an average height of 178 cm and a standard deviation of 8 cm.
    - Women have a mean of 170 cm and a standard deviation of 6 cm.
    - Since you have means and standard deviations, you can use np.random.normal to generate observations.
    - If a man and woman are chosen at random, what is the likelihood the woman is taller than the man?


In [203]:
n_choose = 2
number_of_pairs = 10_000
mens_heights = np.random.normal(178, 8, (number_of_pairs))
womens_heights = np.random.normal(170, 6, (number_of_pairs))

In [204]:
pairs = np.stack((mens_heights, womens_heights))
(pairs[0] < pairs[1]).mean()

0.2154
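
The height comparison also has a closed-form answer, since the difference of two independent normal variables is itself normal. The sketch below relies on `statistics.NormalDist`, which supports subtracting two independent distributions; the names `men`, `women`, and `diff` are illustrative.

```python
from statistics import NormalDist

men = NormalDist(mu=178, sigma=8)
women = NormalDist(mu=170, sigma=6)

# Difference of two independent normals is normal:
# mean 170 - 178 = -8, sd sqrt(6**2 + 8**2) = 10.
diff = women - men

# The woman is taller when the difference is positive.
p_woman_taller = 1 - diff.cdf(0)
print(round(p_woman_taller, 4))  # 0.2119
```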

6. When installing Anaconda on a student's computer, there's a 1 in 250 chance that the download is corrupted and the installation fails. What are the odds that after having 50 students download Anaconda, no one has an installation issue? 100 students?

What is the probability that we observe an installation issue within the first 150 students that download Anaconda?

How likely is it that 450 students all download Anaconda without an issue?


In [237]:
p_fail = 1/250
n_of_students = 50
n_simulations = 10_000
installs = np.random.random((n_simulations, n_of_students))
installs

array([[0.98298008, 0.54676326, 0.92522503, ..., 0.87024776, 0.68355857,
        0.12730155],
       [0.83945577, 0.49974993, 0.09158153, ..., 0.34469976, 0.25953574,
        0.3594448 ],
       [0.64204596, 0.94624976, 0.97094925, ..., 0.83704684, 0.17498806,
        0.54654674],
       ...,
       [0.82856731, 0.39333717, 0.38534737, ..., 0.97772652, 0.28848062,
        0.17142751],
       [0.57567622, 0.39686719, 0.34115879, ..., 0.42785237, 0.2008449 ,
        0.77761522],
       [0.48553343, 0.12669893, 0.90853923, ..., 0.6005764 , 0.14012644,
        0.38369589]])

In [238]:
# probability that a class of 50 has no installation issues
fails = (installs < p_fail)
(fails.sum(axis = 1) < 1).mean()

0.8144

In [239]:
# probability that a class of 100 has no installation issues
n_of_students = 100
n_simulations = 10_000
installs = np.random.random((n_simulations, n_of_students))
installs

fails = (installs < p_fail)
(fails.sum(axis = 1) < 1).mean()

0.666

In [254]:
# probability there is a download issue within the first 150 downloads
n_of_students = 150
n_simulations = 10_000
installs = np.random.random((n_simulations, n_of_students))
installs

success_installs = (installs > p_fail)
(success_installs.sum(axis = 1) < 150).mean()

0.4438

In [256]:
# probability 450 students will install w/o problems
n_of_students = 450
n_simulations = 10_000
installs = np.random.random((n_simulations, n_of_students))
installs

success_installs = (installs > p_fail)
# all 450 installs must succeed; expect a value near (1 - p_fail)**450, about 0.165
(success_installs.sum(axis = 1) == 450).mean()
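
Each of the Anaconda questions reduces to the probability that n independent installs all succeed, which is (1 − p)ⁿ. The sketch below computes that reference value for every class size in the exercise; `p_no_failures` is a hypothetical helper, and independence between installs is assumed.

```python
p_fail = 1 / 250

def p_no_failures(n_students, p=p_fail):
    """Probability that n independent installs all succeed."""
    return (1 - p) ** n_students

# Print P(no failures) and P(at least one failure) for each class size.
for n in (50, 100, 150, 450):
    none = p_no_failures(n)
    print(n, round(none, 4), round(1 - none, 4))
```

The simulated proportions should fall within sampling error of these values.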

7. There's a 70% chance on any given day that there will be at least one food truck at Travis Park. However, you haven't seen a food truck there in 3 days. How unlikely is this?

How likely is it that a food truck will show up sometime this week?


In [257]:
fd_prob = 0.7
n_days = 3
n_simulations = 10_000
days_simulations = np.random.random((n_simulations, n_days))
days_simulations

array([[0.88760839, 0.96248482, 0.01160843],
       [0.2000118 , 0.56322203, 0.76017823],
       [0.39744404, 0.27734156, 0.24623003],
       ...,
       [0.27564627, 0.07412439, 0.66291749],
       [0.64461413, 0.22018861, 0.65764524],
       [0.03073175, 0.53682888, 0.22091196]])

In [305]:
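# proportion of simulations with no food truck on any of the 3 days
# (a truck shows up on a day when the random draw is below fd_prob)
# expect a value near (1 - fd_prob)**3, about 0.027
taco_truck_days = (days_simulations < fd_prob)
(taco_truck_days.sum(axis = 1) == 0).mean()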

In [271]:
# probability a truck shows up this week

n_days = 7
n_simulations = 10_000
days_simulations = np.random.random((n_simulations, n_days))
days_simulations

taco_truck_days = (days_simulations < fd_prob)
# at least one truck during the week; expect a value near 1 - 0.3**7, about 0.9998
(taco_truck_days.sum(axis = 1) >= 1).mean()
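
Both food truck questions have simple closed-form answers when the days are treated as independent, which gives a reference point for the simulated proportions. A minimal sketch with illustrative variable names:

```python
fd_prob = 0.7

# No truck on each of 3 independent days.
p_none_in_3_days = (1 - fd_prob) ** 3

# At least one truck in 7 days is the complement of no truck all week.
p_at_least_one_in_week = 1 - (1 - fd_prob) ** 7

print(round(p_none_in_3_days, 4), round(p_at_least_one_in_week, 4))  # 0.027 0.9998
```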

8. If 23 people are in the same room, what are the odds that two of them share a birthday? What if it's 20 people? 40?

In [348]:
n_in_room = 23
n_simulations = 10_000

#assign a birthday to each person
b_days = np.random.choice([x for x in range(1,366)], n_simulations*n_in_room)
b_days = b_days.reshape(n_simulations, n_in_room)
b_days

array([[ 33, 348, 152, ..., 341, 310,  71],
       [312,  18, 327, ...,  38,  87, 293],
       [ 51, 247, 178, ..., 175, 347, 142],
       ...,
       [164, 140, 315, ..., 279,  33, 249],
       [302,   5,   2, ...,  15,  84, 364],
       [  7, 280, 364, ...,  87, 214, 150]])

In [356]:
matches = []
for row in b_days:
    # fewer unique values than people means at least two share a birthday
    if len(set(row)) < n_in_room:
        matches.append(row)
len(matches)/n_simulations

0.5222

In [359]:
# if the number of people is 20
n_in_room = 20
n_simulations = 10_000

#assign a birthday to each person
b_days = np.random.choice([x for x in range(1,366)], n_simulations*n_in_room)
b_days = b_days.reshape(n_simulations, n_in_room)
b_days

matches = []
for row in b_days:
    if len(set(row)) < n_in_room:
        matches.append(row)
len(matches)/n_simulations

0.4127

In [360]:
# if the number of people is 40
n_in_room = 40
n_simulations = 10_000

#assign a birthday to each person
b_days = np.random.choice([x for x in range(1,366)], n_simulations*n_in_room)
b_days = b_days.reshape(n_simulations, n_in_room)
b_days

matches = []
for row in b_days:
    if len(set(row)) < n_in_room:
        matches.append(row)
len(matches)/n_simulations

0.8933
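
The three birthday cells repeat the same steps with a different room size, so they could be folded into one function. The sketch below is one possible refactor, not the notebook's own code: it assumes 365 equally likely birthdays, uses NumPy's `Generator` API instead of `np.random.choice`, and detects duplicates by sorting each row and looking for equal neighbours. The name `p_shared_birthday` is hypothetical.

```python
import numpy as np

def p_shared_birthday(n_in_room, n_simulations=10_000, rng=None):
    """Estimate the probability that at least two people share a birthday."""
    rng = np.random.default_rng() if rng is None else rng
    # One row per simulated room, one birthday (1..365) per person.
    b_days = rng.integers(1, 366, size=(n_simulations, n_in_room))
    # After sorting a row, any shared birthday shows up as equal neighbours.
    b_days.sort(axis=1)
    has_match = (np.diff(b_days, axis=1) == 0).any(axis=1)
    return has_match.mean()

rng = np.random.default_rng(29)
for n in (20, 23, 40):
    print(n, p_shared_birthday(n, rng=rng))
```

Sorting and comparing neighbours avoids the Python-level loop over rows, which matters more as `n_simulations` grows.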

## bonus
Mage duel

In [453]:
# make dice
n_trials = 10_000
mage_1 = np.random.choice(list(range(1,5)), n_trials*6)
mage_1 = mage_1.reshape(n_trials, 6)
mage_2 = np.random.choice(list(range(1,7)), n_trials*4)
mage_2 = mage_2.reshape(n_trials, 4)

In [454]:
#now make an array of the sum of their rows
duel = np.stack((mage_1.sum(axis=1), mage_2.sum(axis=1)))

In [455]:
# True means mage_1 wins or ties (>= counts a tie as a win for mage_1)
(duel[0] >= duel[1]).mean()

0.6328
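
Because the comparison above uses `>=`, the simulated figure combines wins and ties for `mage_1`. The exact split can be computed by convolving single-die distributions, assuming the setup in the cells above (six fair 4-sided dice against four fair 6-sided dice). The helper `sum_distribution` is a hypothetical name introduced for this sketch.

```python
import numpy as np

def sum_distribution(n_dice, n_sides):
    """Return pmf where pmf[s] = P(total of n_dice fair n_sides-sided dice == s)."""
    single = np.zeros(n_sides + 1)
    single[1:] = 1 / n_sides          # faces 1..n_sides are equally likely
    pmf = np.array([1.0])             # total of zero dice is 0 with certainty
    for _ in range(n_dice):
        pmf = np.convolve(pmf, single)
    return pmf

pmf_1 = sum_distribution(6, 4)        # mage_1: totals indexed 0..24
pmf_2 = sum_distribution(4, 6)        # mage_2: totals indexed 0..24

# joint[i, j] = P(mage_1 total == i and mage_2 total == j)
joint = np.outer(pmf_1, pmf_2)
p_win = np.tril(joint, k=-1).sum()    # mage_1 total strictly greater
p_tie = np.trace(joint)               # totals equal
p_lose = np.triu(joint, k=1).sum()    # mage_2 total strictly greater

print(p_win, p_tie, p_lose)
```

The sum `p_win + p_tie` is the quantity the simulation estimates.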

chuck a luck