## Probability Theory Pt.3

#### example: The birthday problem

What is the probability that 2 or more people in a room of 25 people will have the same birthday?

Simulate:

1. let 0 to 364 stand for days in the year
2. draw 25 with replacement and record 'yes' if there is one or more duplicate, otherwise record 'no'
3. repeat this 1000 times calculate the proportion of duplicates

In [3]:
import numpy as np
np.random.seed(123)

# create array of days to draw from
days = np.array(range(365))
days

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
       104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
       117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
       130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
       143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
       156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,
       169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 18

In [4]:
# like our deck creation lets create 1000 years to take samples from
years = np.array([days] * 1000)
years

array([[  0,   1,   2, ..., 362, 363, 364],
       [  0,   1,   2, ..., 362, 363, 364],
       [  0,   1,   2, ..., 362, 363, 364],
       ...,
       [  0,   1,   2, ..., 362, 363, 364],
       [  0,   1,   2, ..., 362, 363, 364],
       [  0,   1,   2, ..., 362, 363, 364]])

In [5]:
from collections import Counter

# take 1000 samples of 25 birthdays
def sample_birthdays(year):
    return np.random.choice(year, 25, replace=True)

birthdays = np.apply_along_axis(sample_birthdays, 1, years)

# count the rows with 2 or more same birthdays
def has_duplicate(birthday):
    counts = Counter(birthday).values()
    if len(set(counts)) > 1:
        return True
    return False

sum(np.apply_along_axis(has_duplicate, 1, birthdays)) / 1000

0.594

#### example: three daughters among four children

What is the probability that exactly three of the four children in a four child family are daughters?

1. Using coins let 'heads' be boy and 'tails' be girl
2. throw four coins
3. if there are exactly three tails record 'yes' else record 'no'
4. repeat step 2 two hundred times
5. count the proportion 'yes'

In [13]:
from collections import Counter

# 0 is heads, 1 is tails
coin = np.array([0,1])

# choose head/tail four times then repeat that 200 times
births = np.array([[np.random.choice(coin) for _ in range(4)] for _ in range(200)])

# return True if tails("1") count is == 3
def has_three_girls(birth):
    if Counter(birth)[1] == 3:
        return True
    return False

sum(np.apply_along_axis(has_three_girls, 1, births)) / 200

0.27

If we are to take into account the biological fact that more males are born than females 52:48 will slightly change our experiment.

Well add a weighted coin that comes up heads 52% of the time and tails 48% of the time

In [14]:
# now create a coin with 52% chance heads("0")
# and 48% tails("1")
choices = np.array([0,1])
weighted_coin = np.repeat(choices, [52,48])

# choose head/tail four times then repeat that 200 times
births = np.array([[np.random.choice(weighted_coin) for _ in range(4)] for _ in range(200)])

# return True if tails("1") count is == 3
def has_three_girls(birth):
    if Counter(birth)[1] == 3:
        return True
    return False

sum(np.apply_along_axis(has_three_girls, 1, births)) / 200

0.305

#### Binomial trials

three daughters four births problem is known as "binomial sampling experiment". fundamental property of binomial processes is the independence of the trials. Equaly likely outcomes mean we assume that the probability of girl or boy is the same.

#### example: three or more basketball shots in five attempts

What is the probability a basketball player will score three or more baskets in 5 shots if they succeed on average 25% of the time?

Here the probability of success isn't 50-50. So we can't just use our coin example

1. Let 'spade' stand for success and all the other suits stand for miss
2. Draw a card, record it's suit and replace it. repeat 5 times
3. Record if step 2 was three or more
4. Repeat steps 2-4 400 times
5. Count proportions yes out of 400 throws

Below we are going to be using a similar method that we used for the weighted coinflip above

In [15]:
# create an array with 75 0s and 25 1s
choices = np.array([0,1])
shot_probability = np.repeat(choices, [75,25])

# create our array of shots (400,5) chosen from our shot_probability array
shots = np.array([[np.random.choice(shot_probability) for _ in range(5)] for _ in range(400)])

num_scores = np.sum(shots, axis=1)

sum(num_scores >= 3) / 400


0.115

#### example: one in the black(bullseye), two in the white, and no misses in three archery shots

1. let 1=bullseye 2-7=in the white and 8-10=miss
2. choose three random numbers and examine wheter ther is one '1' two numbers '2-7' if so record 'yes' else 'no'
3. Repeat step 2 perhaps 400 times

traditionally this would be handled with the Multinomial Distribution

In [16]:
choices = [1,2,3,4,5,6,7,8,9,10]

shots = np.array([[np.random.choice(choices) for _ in range(3)] for i in range(400)])
shots

array([[8, 8, 8],
       [9, 2, 1],
       [5, 1, 7],
       ...,
       [4, 2, 4],
       [9, 4, 1],
       [3, 8, 1]])

In [17]:
# if there was one bullseye check
# to see if the other numbers are in [8,9,10]
# if they all aren't return True
def one_black_two_white(shot):
    if Counter(shot)[1] == 1:
        if sum([i in shot for i in [8,9,10]]) == 0:
            return True
    return False
    
bull_two_hits = np.apply_along_axis(one_black_two_white, 1, shots)

sum(bull_two_hits) / 400

0.09

#### example: two groups of heart patients

want to learn how likely it is that group A would have as little as two deaths more than group B

~ | Live | Die
------|------|----
Group A | 79 | 11
Group B | 21 | 9

this problem is the prototype of a problem we will consider later (chi square distribution)

1. put 120 balls in urn, 100 white (live) and 20 black (die)
2. Draw 30 balls and assign them to group B; assign the rest to group A
4. Count the number of black balls in the two groups and determine Group A's excess 'deaths' (black balls) compared to group B is two or fewer (eqivalent to whether there are 11 or fewer black balls in group A)
    - repeat 1000 times and compute proporion yes

In [18]:
# 0 live 1 die
choices = [0,1]
patients = np.repeat(choices,[100,20])
# shuffle since we are sampling by index
np.random.shuffle(patients)

groupAsums = []
for i in range(1000):
    indexes = list(range(120))
    groupBind = np.random.choice(indexes, 30, replace=False)
    groupB = patients[groupBind]
    groupA = np.delete(patients, groupBind)
    groupAsums.append(sum(groupA))

In [19]:
sum(np.array(groupAsums) <= 11) / 1000

0.024

#### example: dispersion of a sum if random variables - hammer lengths - heads and handles

The distribution of lengths for hammer handles is as follows: 20% are 10 inches, 30% are 10.1 inches, 30% are 10.2, 20% are 10.3 inches

The distribution of lengths for hammer heads is a s follows: 20% 2 inches, 20% 2.1 inches, 30% 2.2 inches, 20% 2.3 inches, 10% 2.4 inches

What proportion of heads and handles drawn at random will be equal or longer than 12.4 inches?


1. fill an urn with 2 balls marked 10 in, 3 balls marked 10.1 inches, 3 balls marked 10.2 inches and 2 marked 10.3 inches. fill another urn with 2 balls marked 2 inches, 2 balls marked 2.1 inches, 3 balls marked 2.2 inches, 2 balls marked 2.3 inches and 1 ball marked 2.4 inches
2. pick a ball from each urn and calulate the sum
3. repeat 200 times and calculate the proportion greater or equal to 12.4 inches?

In [21]:
handles = [10,10,10.1,10.1,10.1,10.2,10.2,10.2,10.3,10.3]
heads = [2,2,2.1,2.1,2.2,2.2,2.2,2.3,2.3,2.4]

hammer_lengths = np.array([np.random.choice(handles) + np.random.choice(heads) for _ in range(200)])
sum(hammer_lengths >= 12.4) / 200

0.295

#### example: flipping pennies to the end

two players each with 10 pennies. A coin is tossed if it's heads player A gives player B a penny. If it is tails player B gives player A a penny. What is the probability one player will loose all his pennies if they play for 200 tosses?

1. 1-5 = 'head' = +1 ; 6-10 = 'tail' = -1
2. proceed down a series of 200, keeping a running tally of the '+1' and the '-1'. if the tally reaches +10 or -10 on or before the two hundreth flip, record yes else no
3. repeat step 2 400 times and calculate the proportion 'yes'


In [22]:
coin = [0,1]
games = []
for _ in range(400):
    playerA = 10
    playerB = 10

    flips = [np.random.choice(coin) for _ in range(200)]
    for flip in flips:
        if flip == 0:
            playerA -= 1
            playerB += 1
        elif flip == 1:
            playerB -= 1
            playerA += 1
        
        if playerA == 0 or playerB == 0:
            games.append(True)
    games.append(False)

# calculate the proportion of games someone lost their pennies
sum(np.array(games)) / 400

8.3

#### example: state liquor store problem

[idk see text for this one]

1. write 42 prices on a card and shuffle
2. draw cards randomly with replacement into groups of 16 and 26 cards. calculate the mean price difference between the groups and compare the simulation trial difference to the observed mean difference of $.49. if it is greater record 'yes' else 'no'
3. repeat step 2 1000 times calculate proportion 'yes'

In [23]:
price_differences = np.array([np.mean(np.random.randint(100,150,26)) - np.mean(np.random.randint(100,150,26)) for _ in range(1000)])
sum(price_differences >= 0.49) / 1000

0.438