# Simulation Exercises

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'retina'
import viz # curriculum example visualizations

np.random.seed(29)

Using the repo setup directions, set up a new local and remote repository named `statistics-exercises`. The local version of your repo should live inside `~/codeup-data-science`.

Do your work for this exercise in either a Python file named `simulation.py` or a Jupyter notebook named `simulation.ipynb`.

In [2]:
# Use 100_000 trials for all exercises
n_trials = 10 ** 5

1. How likely is it that you roll doubles when rolling two dice? 

In [20]:
n_dice = 2

rolls = np.random.choice([1, 2, 3, 4, 5, 6], size = (n_trials, n_dice))
rolls

array([[3, 4],
       [3, 5],
       [3, 4],
       ...,
       [3, 6],
       [6, 2],
       [6, 6]])

In [21]:
rolls[0, :1]

array([3])

In [29]:
doubles_count = 0
for i in range(n_trials):
    # compare the first die with the second die in trial i
    if rolls[i, 0] == rolls[i, 1]:
        doubles_count += 1
doubles_count

16650

In [30]:
doubles_probability = doubles_count / n_trials
doubles_probability

0.1665
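The loop above can also be written as a single vectorized comparison. The following is a minimal sketch, not the notebook's own method: it assumes NumPy's `default_rng` generator instead of `np.random.choice`, and the names `sketch_rolls`, `simulated`, and `exact` are introduced here only for illustration. It also compares the estimate with the exact value, since 6 of the 36 equally likely outcomes are doubles.

```python
import numpy as np

rng = np.random.default_rng(29)  # seed chosen only for reproducibility
n_trials = 10 ** 5

# Each row is one trial of two dice; the upper bound 7 is exclusive
sketch_rolls = rng.integers(1, 7, size=(n_trials, 2))

# Elementwise comparison of the two columns; the mean of booleans is a proportion
simulated = (sketch_rolls[:, 0] == sketch_rolls[:, 1]).mean()

# Exact value: 6 doubles out of 36 equally likely outcomes
exact = 6 / 36

print(simulated, exact)
```

The simulated proportion should land close to 1/6, in line with the 0.1665 obtained above.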

2. If you flip 8 coins, what is the probability of getting exactly 3 heads? What is the probability of getting more than 3 heads?

In [24]:
n_coins = 8
# True for heads, False for tails
coins_flips = np.random.choice([True, False], size = (n_trials, n_coins))
coins_flips

array([[False,  True,  True, ...,  True, False, False],
       [ True, False, False, ...,  True, False,  True],
       [False, False,  True, ...,  True,  True,  True],
       ...,
       [ True, False, False, ..., False, False,  True],
       [ True, False,  True, ...,  True,  True,  True],
       [False, False,  True, ..., False, False,  True]])

In [25]:
# count the heads (True values) in each trial
heads_count = coins_flips.sum(axis=1)

In [26]:
heads_3 = 0
for c in heads_count:
    if c == 3:
        heads_3 += 1
heads_3

21998

In [28]:
heads_probability = heads_3 / n_trials
display(heads_probability)

0.21998
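The second half of the question (more than 3 heads) can be answered with the same per-trial head counts. The sketch below is one possible approach under stated assumptions: it uses `default_rng` and fair coins, and the names `flips`, `heads`, `p_exactly_3`, and `p_more_than_3` are illustrative. The binomial formula is included as a cross-check.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(29)  # seed chosen only for reproducibility
n_trials, n_coins = 10 ** 5, 8

# True means heads; each flip is heads with probability 0.5 (fair coin assumed)
flips = rng.random((n_trials, n_coins)) < 0.5
heads = flips.sum(axis=1)

p_exactly_3 = (heads == 3).mean()
p_more_than_3 = (heads > 3).mean()

# Binomial cross-check: C(8, k) / 2**8 summed over the relevant k
exact_3 = comb(n_coins, 3) / 2 ** n_coins
exact_more = sum(comb(n_coins, k) for k in range(4, n_coins + 1)) / 2 ** n_coins

print(p_exactly_3, exact_3)
print(p_more_than_3, exact_more)
```

The exact values are 56/256 for exactly 3 heads and 163/256 for more than 3 heads, so the simulated proportions should sit near 0.219 and 0.637.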

3. There are approximately 3 web development cohorts for every 1 data science cohort at Codeup. Assuming that Codeup randomly selects an alumnus to put on a billboard, what are the odds that the two billboards I drive past both have data science students on them?

In [72]:
n_billboards = 2
students = np.random.choice(['DS', 'web1', 'web2', 'web3'], size = (n_trials, n_billboards))
students = pd.DataFrame(students, columns = ['first', 'second'])
students

Unnamed: 0,first,second
0,DS,web3
1,web3,web3
2,web2,web2
3,web3,web2
4,DS,web3
...,...,...
99995,web3,web2
99996,DS,DS
99997,web1,web2
99998,web2,web1


In [80]:
students['DS'] = (students['first'] == 'DS') & (students['second'] == 'DS')
display(students)
students.DS.sum()

Unnamed: 0,first,second,DS
0,DS,web3,False
1,web3,web3,False
2,web2,web2,False
3,web3,web2,False
4,DS,web3,False
...,...,...,...
99995,web3,web2,False
99996,DS,DS,True
99997,web1,web2,False
99998,web2,web1,False


6201

In [68]:
# solution with numpy array
students1 = np.random.choice(['DS', 'web1', 'web2', 'web3'], size = (n_trials, n_billboards))
s_count = 0
for s in students1:
    if s[0] == 'DS' and s[1] == 'DS':
        s_count += 1
s_count / n_trials

0.06213
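Listing three web labels is one way to encode the 3:1 ratio. An alternative, sketched here as an assumption rather than as the intended solution, is to pass the probabilities explicitly through the `p` argument. The names `picks` and `p_both_ds` are illustrative. If the two billboards are independent, the exact answer is 0.25 squared.

```python
import numpy as np

rng = np.random.default_rng(29)  # seed chosen only for reproducibility
n_trials, n_billboards = 10 ** 5, 2

# One DS cohort for every three web cohorts: P(DS) = 0.25, P(web) = 0.75
picks = rng.choice(['DS', 'web'], size=(n_trials, n_billboards), p=[0.25, 0.75])

# A trial counts only if every billboard in the row shows a DS student
p_both_ds = (picks == 'DS').all(axis=1).mean()

# Exact value, assuming the two billboards are chosen independently
exact = 0.25 ** 2

print(p_both_ds, exact)
```

Both simulated results above (6201 / 100000 and 0.06213) are consistent with the exact value of 0.0625.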

4. Codeup students buy, on average, 3 poptart packages a day from the snack vending machine, with a standard deviation of 1.5. If the machine is restocked with 17 poptart packages on Monday, how likely is it that I will be able to buy some poptarts on Friday afternoon? (Remember: if you have a mean and a standard deviation, use `np.random.normal`.) You'll need to make a judgement call on how to handle some of your values.

In [113]:
stack = 17
mu = 3
sigma = 1.5
n_days = 5
# random daily purchases for the 5 weekdays, Monday to Friday
poptarts = np.random.normal(mu, sigma, size = (n_trials, n_days))
poptarts

array([[ 1.66364854,  3.9741408 ,  4.46893152,  5.05349235,  1.28526485],
       [ 5.34259414,  3.63936205, -0.60763753,  2.5101858 ,  4.80290744],
       [ 3.10302716,  4.03647717,  3.25565699,  1.73108976,  4.41178658],
       ...,
       [ 3.59195508,  4.65811161, -0.47587617,  2.30423293,  2.93002948],
       [ 0.52935102,  3.96366822,  0.5855362 ,  5.66207119,  2.65875443],
       [ 3.27668149,  4.81250614,  6.09775479, -0.30543894,  4.59810632]])

In [116]:
# total packages bought over the week in each trial
buys = poptarts.sum(axis = 1)
display(buys)
# share of trials in which weekly demand stays below the 17 stocked packages
(buys < stack).mean()

array([16.44547807, 15.6874119 , 16.53803766, ..., 13.00845292,
       13.39938107, 18.4796098 ])

0.72548
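The raw normal draws include negative and fractional purchases, which is where the judgement call comes in. One possible way to handle it, shown here purely as an assumption, is to clip negative values to zero and round to whole packages before summing. The names `raw`, `daily`, `weekly`, and `p_left` are illustrative, and the sketch keeps the same five-day window as the cell above.

```python
import numpy as np

rng = np.random.default_rng(29)  # seed chosen only for reproducibility
n_trials, n_days, stock = 10 ** 5, 5, 17

raw = rng.normal(3, 1.5, size=(n_trials, n_days))

# Judgement call (an assumption): nobody buys a negative number of packages,
# and packages are sold whole, so clip at zero and round to the nearest integer
daily = np.rint(np.clip(raw, 0, None))

weekly = daily.sum(axis=1)

# Share of trials in which fewer than 17 packages are bought over the week
p_left = (weekly < stock).mean()

print(p_left)
```

Clipping raises the average daily demand slightly, so this estimate is likely to come out somewhat lower than the one based on the unmodified draws. Other choices, such as summing only Monday through Thursday, are equally defensible.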

5. Compare Heights

- Men have an average height of 178 cm and a standard deviation of 8 cm.
- Women have an average height of 170 cm and a standard deviation of 6 cm.
- Since you have means and standard deviations, you can use `np.random.normal` to generate observations.
- If a man and woman are chosen at random, what is the likelihood the woman is taller than the man?

In [3]:
men = np.random.normal(178, 8, n_trials)
women = np.random.normal(170, 6, n_trials)
display(men)
display(women)

array([174.66014299, 183.64825677, 193.32787762, ..., 177.69471557,
       183.16628375, 183.60317424])

array([161.37004324, 171.32657141, 158.59955254, ..., 180.00215376,
       176.85589682, 164.05611134])

In [10]:
random_pairs = pd.DataFrame({'Men' : men, 'Women' : women})
random_pairs['Difference'] = (random_pairs.Men - random_pairs.Women) < 0
random_pairs.Difference.mean()

0.21252

In [12]:
random_pairs.Difference.sum() / n_trials

0.21252
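The simulated value can be checked against a closed form. If the two heights are independent normals, the difference between the woman's and the man's height is also normal, with mean 170 - 178 and variance 6² + 8². The sketch below assumes independence and uses a hypothetical helper, `normal_cdf`, built on `math.erf`.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(29)  # seed chosen only for reproducibility
n_trials = 10 ** 5

men = rng.normal(178, 8, n_trials)
women = rng.normal(170, 6, n_trials)
simulated = (women > men).mean()

# Women - Men is normal with these parameters (independence assumed)
diff_mean = 170 - 178
diff_sd = sqrt(6 ** 2 + 8 ** 2)

def normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# P(Women - Men > 0)
exact = 1 - normal_cdf((0 - diff_mean) / diff_sd)

print(simulated, exact)
```

The closed form evaluates to roughly 0.212, which agrees with the 0.21252 obtained from the simulation.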

6. When installing anaconda on a student's computer, there's a 1 in 250 chance that the download is corrupted and the installation fails. 
- What are the odds that after having 50 students download anaconda, no one has an installation issue? 100 students?
- What is the probability that we observe an installation issue within the first 150 students that download anaconda?
- How likely is it that 450 students all download anaconda without an issue?

In [41]:
def count_success_prob(number):
    '''
    Takes a number of installations as an argument.
    Probability of success is 249/250; probability of failure is 1/250.
    Runs n_trials (100_000) simulations, where
    True means no issues and False means the installation failed.
    Returns the proportion of simulations in which every installation succeeded.
    '''
    arr = np.random.choice([True, False], size = (n_trials, number), p = [249/250, 1/250])
    return (arr.sum(axis = 1) == number).mean()

In [42]:
def count_fail_prob(number):
    '''
    probability of failure = 1 - probability of success
    '''
    return 1 - count_success_prob(number)

In [44]:
#What are the odds that after having 50 students download anaconda, no one has an installation issue?
count_success_prob(50)

0.81817

In [45]:
# 100?
count_success_prob(100)

0.67085

In [46]:
count_fail_prob(150)

0.44753

In [47]:
# How likely is it that 450 students all download anaconda without an issue?
count_success_prob(450)

0.16685

In [40]:
# re-running the simulation gives a slightly different estimate
count_success_prob(450)

0.16458
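Because each download is treated as independent, these simulated values can be compared with the exact probability that all `n` installations succeed, which is (249/250) raised to the power `n`. The helper name `exact_all_succeed` is introduced here only for illustration.

```python
def exact_all_succeed(n_students, p_fail=1 / 250):
    """Probability that n independent installations all succeed."""
    return (1 - p_fail) ** n_students

p50 = exact_all_succeed(50)
p100 = exact_all_succeed(100)
p_issue_150 = 1 - exact_all_succeed(150)  # at least one failure in 150
p450 = exact_all_succeed(450)

print(p50, p100, p_issue_150, p450)
```

These come out to roughly 0.818, 0.670, 0.452, and 0.165, so the simulated estimates above are all within about half a percentage point of the exact values.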

7. There's a 70% chance on any given day that there will be at least one food truck at Travis Park. 
- However, you haven't seen a food truck there in 3 days. How unlikely is this?
- How likely is it that a food truck will show up sometime this week?

In [54]:
# You haven't seen a food truck there in 3 days. How unlikely is this?
food_truck = np.random.choice([True, False], size = (n_trials, 3), p = [0.7, 0.3])
(food_truck.sum(axis = 1) == 0).mean()

0.02621

In [57]:
# How likely is it that a food truck will show up sometime this week?
food_truck1 = np.random.choice([True, False], size = (n_trials, 7), p = [0.7, 0.3])
(food_truck1.sum(axis = 1) >= 1).mean()

0.99974
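Both food truck answers have simple closed forms if each day is independent: no truck for 3 days in a row has probability 0.3 cubed, and at least one truck in 7 days is the complement of 0.3 to the seventh power. A short check, with illustrative variable names:

```python
p_truck = 0.7
p_no_truck = 1 - p_truck

# No food truck on any of 3 consecutive days (days assumed independent)
p_none_in_3 = p_no_truck ** 3

# At least one food truck in a 7-day week
p_any_in_7 = 1 - p_no_truck ** 7

print(p_none_in_3, p_any_in_7)
```

These evaluate to about 0.027 and 0.99978, close to the simulated 0.02621 and 0.99974.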

8. If 23 people are in the same room, what are the odds that two of them share a birthday? What if it's 20 people? 40 people?

In [59]:
days_in_year = range(1, 366)
len(days_in_year)

365

In [71]:
def same_birthday_prob(number_of_people):
    days_in_year = range(1, 366)
    df = pd.DataFrame(np.random.choice(days_in_year, size=(n_trials, number_of_people)))
    return (df.nunique(axis = 1) < number_of_people).mean()                        

In [72]:
same_birthday_prob(23)

0.50596

In [97]:
same_birthday_prob(20)

0.4109

In [98]:
same_birthday_prob(40)

0.89221

In [93]:
def same_birthday_prob1(number_of_people):
    days_in_year = range(1, 366)
    arr = np.random.choice(days_in_year, size=(n_trials, number_of_people))
    count = 0
    for trial in arr:
        if len(np.unique(trial)) < number_of_people:
            count += 1
    return count / n_trials

In [94]:
same_birthday_prob1(23)

0.50716

In [95]:
same_birthday_prob1(20)

0.41146

In [96]:
same_birthday_prob1(40)

0.89212
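The simulated birthday probabilities can be compared with the exact value. Assuming 365 equally likely birthdays and no leap years, the probability that everyone in the room has a different birthday is the product of (365 - k) / 365 for k from 0 to n - 1, and the shared-birthday probability is its complement. The function name `exact_shared_birthday` is hypothetical.

```python
def exact_shared_birthday(n_people, n_days=365):
    """Exact probability that at least two of n_people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n_people):
        # Person k + 1 must avoid the k birthdays already taken
        p_all_distinct *= (n_days - k) / n_days
    return 1 - p_all_distinct

for n in (20, 23, 40):
    print(n, exact_shared_birthday(n))
```

The exact values are about 0.411 for 20 people, 0.507 for 23, and 0.891 for 40, which both simulation functions reproduce to within a fraction of a percentage point.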