## Probability Theory Pt.4: Estimating Probabilities from Finite Universes

Now we are concerned with situations in which you are given a finite set of objects. Sampling without replacement occurs when items are chosen from a finite universe and not put back, so each draw changes the odds for the next one.

#### Building block problems

case 1: six balls labeled 1 through 6. What is the probability of choosing balls 1,2,3 in that order if we choose three balls without replacement?

P(123) = 1/6 * 1/5 * 1/4 = 1/120

[see book for diagrams]

case 2: same setup as case 1, but now what is the probability of choosing 1,2,3 in any order if we choose three balls without replacement?

P(123 any order) = 3/6 * 2/5 * 1/4 = 6/120 = 1/20

case 3: odd numbered balls 1,3,5 are painted red and even numbered balls 2,4,6 are painted black. What is the probability of getting a red ball then a black ball in that order?

P(red then black) = 3/6 * 3/5 = 9/30 = 3/10

case 4: What is the probability of getting two red balls and one black ball in any order?

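P(2 red 1 black) = 3 * (3/6 * 2/5 * 3/4) = 54/120 = 9/20

(3/6 * 2/5 * 3/4 is the probability of one specific order, red red black. The black ball can land in any of the three positions and each order has the same probability, hence the factor of 3.)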

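case 5: What is the probability of getting ball one on the first draw or ball two on the second draw or ball three on the third draw?

P(1 first or 2 second or 3 third) = 3 * 1/6 - 3 * (1/6 * 1/5) + 1/6 * 1/5 * 1/4 = 60/120 - 12/120 + 1/120 = 49/120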
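The building block cases are small enough to check by brute force. The sketch below lists every equally likely ordered draw and counts the ones that satisfy each event. The helper name `prob` and the use of `fractions.Fraction` are choices made for this illustration, not something taken from the book.

```python
from fractions import Fraction
from itertools import permutations

balls = range(1, 7)
red = {1, 3, 5}  # odd balls are red, even balls are black

def prob(event, n_draws):
    """Exact probability of `event` over all equally likely ordered draws."""
    draws = list(permutations(balls, n_draws))
    favorable = sum(1 for d in draws if event(d))
    return Fraction(favorable, len(draws))

case1 = prob(lambda d: d == (1, 2, 3), 3)                       # 1,2,3 in order
case2 = prob(lambda d: set(d) == {1, 2, 3}, 3)                  # 1,2,3 in any order
case3 = prob(lambda d: d[0] in red and d[1] not in red, 2)      # red then black
case4 = prob(lambda d: sum(b in red for b in d) == 2, 3)        # two red, one black
case5 = prob(lambda d: d[0] == 1 or d[1] == 2 or d[2] == 3, 3)  # at least one "match"

print(case1, case2, case3, case4, case5)
# 1/120 1/20 3/10 9/20 49/120
```

Enumeration only works because there are just 120 ordered draws of three balls; the larger examples further down need simulation or a counting formula.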

#### example: what is the probability of selecting four girls and one boy when selecting five students from twenty five girls and twenty five boys?

(sampling without replacement with two outcomes when order doesn't matter)

1. let 1-25 = girls, 26-50 = boys
2. select five non-duplicate numbers. Count whether there are four numbers in 1-25 and one number in 26-50; if so record 'yes', else record 'no'
3. repeat step 2 400 times and compute the proportion of 'yes'

Below I actually think we're going to do this using 25 0s and 25 1s, because I think the code for the check will be a bit simpler.

In [3]:
import numpy as np
from collections import Counter
np.random.seed(123)

# 0=girl 1=boy
kids = np.repeat([0,1], [25,25])

# sample 5 without replacement 400 times
selections = np.array([np.random.choice(kids, 5, replace=False) for i in range(400)])

def four_girls_one_boy(selection):
    if Counter(selection) == {0:4, 1:1}:
        return True
    return False

results = np.apply_along_axis(four_girls_one_boy, 1, selections)
sum(results) / 400

0.165
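As a cross-check on the simulated proportion, the same probability can be counted directly: choose 4 of the 25 girls and 1 of the 25 boys, out of all ways to choose 5 of 50 students. This is a minimal sketch using `math.comb`; the function name `hypergeom_pmf` is made up for this note.

```python
from math import comb

def hypergeom_pmf(k, n_success, n_failure, n_draws):
    """P(exactly k 'successes') when drawing n_draws items without replacement."""
    return (comb(n_success, k) * comb(n_failure, n_draws - k)
            / comb(n_success + n_failure, n_draws))

# four girls (k=4) out of 25 girls and 25 boys, five draws
p_exact = hypergeom_pmf(4, 25, 25, 5)
print(round(p_exact, 4))
# 0.1493
```

With only 400 trials, a simulated value a couple of percentage points away from this is ordinary sampling noise.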

#### example: nine spades and four clubs in bridge hand

(multiple outcome sampling without replacement order doesn't matter)



In [26]:
# 0 = spades, 1 = clubs, 2 and 3 = the other two suits
deck = np.repeat([0,1,2,3], [13]*4)

hands = np.array([np.random.choice(deck, 13, replace=False) for _ in range(1000)])

def nine_spades_four_clubs(hand):
    if Counter(hand) == {0:9,1:4}:
        return True
    return False

total_hands = sum(np.apply_along_axis(nine_spades_four_clubs, 1, hands))

total_hands / 1000

0.0
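A result of 0.0 does not mean the hand is impossible, only that it is far too rare to show up in 1000 deals. A direct count (a sketch with `math.comb`, assuming a standard 52-card deck with 13 cards per suit) gives the scale:

```python
from math import comb

# ways to pick 9 of 13 spades and 4 of 13 clubs, over all 13-card hands
p_exact = comb(13, 9) * comb(13, 4) / comb(52, 13)

# expected number of such hands in 1000 simulated deals
expected_hits = 1000 * p_exact

print(p_exact, expected_hits)
```

The probability is a little under one in a million, so a simulation would need many millions of deals before it produced a stable non-zero estimate.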

#### example: A total of fifteen points in a bridge hand

where A = 4, K = 3, Q = 2, J = 1

now we actually have to use a real deck

In [38]:
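# ranks 1-13 in each of four suits; here 13 = A, 12 = K, 11 = Q, 10 = J
deck = np.array([x for x in range(1,14)] * 4)

hands = [np.random.choice(deck, 13, replace=False) for _ in range(1000)]

def points(hand):
    # A = 4, K = 3, Q = 2, J = 1, everything else scores 0
    count = Counter(hand)
    return count[13] * 4 + count[12] * 3 + count[11] * 2 + count[10]

# store the totals under a new name so the points function is not overwritten
hand_points = np.apply_along_axis(points, 1, hands)

sum(hand_points == 15) / 1000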

0.034
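The point-count probability can also be computed exactly, since only the number of aces, kings, queens and jacks in the hand matters. The sketch below loops over every possible count of each honor and fills the rest of the hand from the 36 non-honor cards. The function name `prob_points` is hypothetical.

```python
from itertools import product
from math import comb

def prob_points(target, hand_size=13):
    """Exact probability that a hand scores `target` points (A=4, K=3, Q=2, J=1)."""
    ways = 0
    for a, k, q, j in product(range(5), repeat=4):
        honors = a + k + q + j
        if honors <= hand_size and 4 * a + 3 * k + 2 * q + j == target:
            ways += (comb(4, a) * comb(4, k) * comb(4, q) * comb(4, j)
                     * comb(36, hand_size - honors))
    return ways / comb(52, hand_size)

p_fifteen = prob_points(15)
print(round(p_fifteen, 4))
```

This should come out somewhere around 0.04, which puts the simulated 0.034 within the noise you would expect from 1000 hands.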

#### example: four girls then one boy from 25 girls and 25 boys order matters

sampling without replacement

1. Let 0 repeated 25 times stand for the girls and 1 repeated 25 times for the boys
2. sample 5 students from the group. If the sequence is exactly [0,0,0,0,1] record 'yes', else record 'no'
3. repeat step 2 1000 times and compute the proportion of 'yes'

In [46]:
kids = np.repeat([0,1],[25,25])

selections = np.array([np.random.choice(kids, 5, replace=False) for _ in range(1000)])

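# ordered check: the draw must be exactly girl, girl, girl, girl, boy
four_girls_then_boy = lambda sample: np.array_equal(sample, [0,0,0,0,1])

results = np.apply_along_axis(four_girls_then_boy, 1, selections)

sum(results) / 1000

The exact value is 25/50 * 24/49 * 23/48 * 22/47 * 25/46 ≈ 0.030, so the estimate should land near 0.03. The unordered version of this event (four girls and one boy in any order) is five times as likely.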

#### example: Four or more couples getting their own partners when ten couples are paired randomly

Ten couples come to a party. The host pairs them at random for the first dance. What is the chance that four or more couples will get the partners they came with?

This one uses a card example, but I don't think we'll use cards for our simulation.

1. Let A-10 of hearts be the first partners and A-10 of spades be the other partners
2. shuffle the hearts and deal them in a row; shuffle the spades and deal them in a row
3. count the pairs. A pair is one card from the heart row and one card from the spade row that have the same denomination. If 4 or more pairs match record 'yes', else record 'no'
4. Repeat steps 2 & 3 200 times
5. Compute the proportion of 'yes'

In [73]:
partners1 = np.arange(10)
partners2 = np.arange(10)

pairs = []
for i in range(200):
    # draw from partners
    p1 = np.random.choice(partners1, 10, replace=False)
    p2 = np.random.choice(partners2, 10, replace=False)
    # add partner pairs to array
    pairs.append([p1, p2])

pairs = np.array(pairs)

In [77]:
# add up the number of couples that are the same
sum_pairs = lambda couples: sum(couples[0] == couples[1])

n_matched_couples = []
for pairing in pairs:
    n_matched_couples.append(sum_pairs(pairing))

sum(np.array(n_matched_couples) >= 4) / 200

0.01
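For comparison, the number of couples who end up with their own partner can be computed exactly: pick which k couples match, then arrange the other n - k so that none of them match (a derangement). The sketch below uses the standard recurrence for derangement counts; `derangements` and `prob_matches` are names invented for this note.

```python
from math import comb, factorial

def derangements(n):
    """Number of permutations of n items with no item in its own place."""
    if n == 0:
        return 1
    d_prev, d = 1, 0  # D(0) = 1, D(1) = 0
    for m in range(2, n + 1):
        d_prev, d = d, (m - 1) * (d + d_prev)
    return d

def prob_matches(k, n):
    """P(exactly k of n items land in their own place) in a random pairing."""
    return comb(n, k) * derangements(n - k) / factorial(n)

p_four_or_more = sum(prob_matches(k, 10) for k in range(4, 11))
print(round(p_four_or_more, 4))
# 0.019
```

With 200 trials that is an expected count of about four 'yes' results, so single-digit counts in either direction are unsurprising.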

#### example: matching hats another famous problem

The hat checker at a restaurant mixes up the hats of a party of 6 men. What is the probability that at least one will get his own hat?

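Same problem as the couples, just with hats instead of partners. Each simulated trial shuffles the men and the hats and counts the positions where the two shuffles agree.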

In [98]:
men = np.arange(6)
hats = np.arange(6)

hat_pairings = np.array([(np.random.choice(men,6,replace=False), np.random.choice(hats,6,replace=False)) for _ in range(1000)])

matches = np.apply_along_axis(lambda pair: pair[0] == pair[1], 1, hat_pairings)
num_matches = matches.sum(1)

sum(num_matches >= 1) / 1000

0.648
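With only six hats there are 720 possible ways to hand them back, so the exact answer can be found by listing all of them. This is a small brute-force sketch, assuming every assignment of hats to men is equally likely:

```python
from fractions import Fraction
from itertools import permutations

n = 6
perms = list(permutations(range(n)))  # perms[i][m] = hat handed to man m

# count assignments where at least one man gets his own hat
at_least_one = sum(1 for p in perms if any(p[m] == m for m in range(n)))

p_exact = Fraction(at_least_one, len(perms))
print(p_exact, float(p_exact))
# 91/144 0.6319444444444444
```

The answer barely changes as the number of hats grows; it settles near 1 - 1/e ≈ 0.632.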

#### example: twenty executives are to be assigned to two divisions of a firm

What are the probabilities of the best 10 executives of the twenty being split between the two divisions in ratios of 5 and 5, 4 and 6, 3 and 7, etc., if they are randomly assigned?

1. Put 10 balls labeled 'w' for worst and 10 balls labeled 'b' for best in an urn
2. Draw 10 balls without replacement and count the 'w's
3. Repeat 400 times
4. Count the number of times each split occurs

In [102]:
executives = np.repeat([0,1], [10,10])

splits = np.array([np.random.choice(executives, 10, replace=False) for _ in range(400)])

np.histogram(splits.sum(1), bins=[1,2,3,4,5,6,7,8,9,10])

(array([  1,   8,  34, 114, 128,  84,  29,   2,   0]),
 array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]))
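The split probabilities follow the same counting pattern as the girls-and-boys example: choose k of the 10 best and 10 - k of the 10 worst for the first division. The sketch below tabulates the exact probability of each split next to the count you would expect in 400 trials; the dictionary name `split_probs` is made up here.

```python
from math import comb

total = comb(20, 10)  # ways to choose the 10 executives for the first division

# split_probs[k] = P(exactly k of the best ten land in the first division)
split_probs = {k: comb(10, k) * comb(10, 10 - k) / total for k in range(11)}

for k, p in split_probs.items():
    print(f"{k} and {10 - k}: p = {p:.4f}, expected in 400 trials = {400 * p:.1f}")
```

The table is symmetric around the 5 and 5 split, which is the single most likely outcome at roughly a third of all assignments.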

#### example: executives moving

A chain moves its store managers to new cities every three years. New locations are drawn at random, and a manager is allowed to draw the location they are currently in. What are the probabilities that 0, 1, 2, 3, ... managers will get their present post again if there are 30 managers?

1. number balls 1-30 and put them in urn A. Number balls 1-30 and put them in urn B.
2. draw a row of 30 balls from urn A; next to that draw a row of 30 balls from urn B
3. count how many of the pairs have the same number
4. repeat 2 & 3 1000 times, then tally on the scoreboard how often the count was 0, 1, 2, 3, ...

In [107]:
executives = np.arange(30)
locations = np.arange(30)

same_place = []
for _ in range(1000):
    num_same = sum(np.random.choice(executives, 30, replace=False) == np.random.choice(locations, 30, replace=False))
    same_place.append(num_same)

Counter(same_place)

Counter({4: 14, 1: 363, 2: 180, 0: 377, 3: 62, 5: 3, 6: 1})

#### example: state liquor systems again

26 private-ownership states: $4.82, $5.29, $4.89, $4.95, $4.55, $4.90,
$5.25, $5.30, $4.29, $4.85, $4.54, $4.75, $4.85, $4.85, $4.50, $4.75,
$4.79, $4.85, $4.79, $4.95, $4.95, $4.75, $5.20, $5.10, $4.80, $4.29

16 monopoly states: $4.65, $4.55, $4.11, $4.15, $4.20, $4.55, $3.80,
$4.00, $4.19, $4.75, $4.74, $4.50, $4.10, $4.00, $5.05, $4.20

1. write each of the 42 prices on a card
2. draw cards randomly without replacement into groups of 16 and 26 cards. Compute the mean price difference between the groups. Compare the simulated difference to the observed difference of $0.49; if it is greater record 'yes', else record 'no'
3. Repeat step 2 1000 times

In [113]:
private = [4.82, 5.29, 4.89, 4.95, 4.55, 4.90,
5.25, 5.30, 4.29, 4.85, 4.54, 4.75, 4.85, 4.85, 4.50, 4.75,
4.79, 4.85, 4.79, 4.95, 4.95, 4.75, 5.20, 5.10, 4.80, 4.29]

monopoly = [4.65, 4.55, 4.11, 4.15, 4.20, 4.55, 3.80,
4.00, 4.19, 4.75, 4.74, 4.50, 4.10, 4.00, 5.05, 4.20]

diffs = []
for _ in range(1000):
    choices = np.random.choice(private + monopoly, 42, replace=False)
    diff = np.mean(choices[:16]) - np.mean(choices[16:42])
    diffs.append(diff)

sum(np.array(diffs) > 0.49)

0

In [114]:
np.histogram(diffs)

(array([  8,  29,  96, 156, 253, 216, 138,  67,  33,   4]),
 array([-0.36802885, -0.29291346, -0.21779808, -0.14268269, -0.06756731,
         0.00754808,  0.08266346,  0.15777885,  0.23289423,  0.30800962,
         0.383125  ]))
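The $0.49 benchmark can be recomputed from the data rather than typed in by hand. The sketch below does that and then runs the same shuffle test with the difference taken as (26-card group) minus (16-card group), so its sign matches private minus monopoly. It uses `np.random.default_rng` with an arbitrary seed, which is a substitution for the `np.random.choice` call used in the notes, not the original method.

```python
import numpy as np

private = np.array([4.82, 5.29, 4.89, 4.95, 4.55, 4.90,
                    5.25, 5.30, 4.29, 4.85, 4.54, 4.75, 4.85, 4.85, 4.50, 4.75,
                    4.79, 4.85, 4.79, 4.95, 4.95, 4.75, 5.20, 5.10, 4.80, 4.29])
monopoly = np.array([4.65, 4.55, 4.11, 4.15, 4.20, 4.55, 3.80,
                     4.00, 4.19, 4.75, 4.74, 4.50, 4.10, 4.00, 5.05, 4.20])

# observed difference in mean price, private minus monopoly
observed = private.mean() - monopoly.mean()

rng = np.random.default_rng(0)  # seed is arbitrary, chosen only for repeatability
pooled = np.concatenate([private, monopoly])
n_trials = 1000

count = 0
for _ in range(n_trials):
    shuffled = rng.permutation(pooled)  # deal the 42 cards in random order
    diff = shuffled[:26].mean() - shuffled[26:].mean()
    if diff >= observed:
        count += 1

p_value = count / n_trials
print(round(observed, 2), p_value)
```

A proportion at or near zero says that random relabeling of the states almost never produces a gap as large as the one actually observed.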

#### example: compound problem: 5 or more spades in a bridge hand, and four girls and a boy in a 5 child family

1. use a bridge deck and 5 coins w/ 'heads' = girl
2. Deal a 13 card hand and count the spades. If 5 or more spades record 'yes' and continue; if not, record 'no' and end the trial
3. Throw 5 coins and count heads. If 4 heads record 'yes', otherwise record 'no'
4. repeat 2-3 1000 times. Compute the proportion of trials with 'yes' in both steps

In [126]:
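# 1 = spade, 0 = any other suit: 13 spades and 39 other cards make 52
deck = np.repeat([1,0], [13, 39])
# 1 = heads = girl; drawn with replacement below, so each toss is 50/50
coins = np.repeat([0,1], [5,5])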

number_spades = np.array([np.random.choice(deck, 13, replace=False) for _ in range(1000)]).sum(1)
greater_equal_5 = number_spades >= 5

number_heads = np.array([np.random.choice(coins, 5) for _ in range(1000)]).sum(1)
four_heads = number_heads == 4

sum(np.array([greater_equal_5, four_heads]).sum(0) == 2) / 1000

0.026
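Because the bridge hand and the family are unrelated, the compound probability is just the product of the two separate probabilities, and both can be counted exactly. This is a sketch assuming a standard 52-card deck and a 50/50 chance of a girl at each birth, as in the coin model above:

```python
from math import comb

# P(5 or more spades in a 13-card hand): sum over k = 5..13 spades
p_spades = sum(comb(13, k) * comb(39, 13 - k) for k in range(5, 14)) / comb(52, 13)

# P(exactly 4 girls in 5 children) with each child a fair coin flip
p_girls = comb(5, 4) / 2 ** 5

# independence lets us multiply
p_both = p_spades * p_girls
print(round(p_spades, 4), p_girls, round(p_both, 4))
```

The product should sit just under 0.03, which is the neighborhood the simulation lands in.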