## Problem 1

A bag contains a chip, known to be either white or black. A white chip is put in, the bag is shaken, and a chip is drawn out, which proves to be white. What is now the chance of drawing a white chip?

#### 1-A) Given this function that simulates the game $n$ times, where $n$ is some positive integer input, include comments in here.

In [24]:
import numpy as np
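def chip_game(n):
    results = []
    # count starts at 1, so the loop stops after n - 1 accepted games
    count = 1
    # Keep playing until enough games have a white first draw
    while count < n:
        # Initialize the bag: the white chip we add, plus the original chip (white or black, equally likely)
        bag = ['white']
        bag.append(np.random.choice(['white', 'black']))
        # Draw one of the two chips uniformly at random (by its index in the bag)
        first_chip = np.random.choice([0, 1])
        if bag[first_chip] == 'white':
            # The first draw matches what we observed: remove it and record the chip left in the bag
            bag.pop(first_chip)
            results.append(bag)
            count += 1
        # Games whose first draw is black contradict the observation, so they are discarded
    # Fraction of accepted games in which the chip left in the bag was white
    return float(results.count(['white'])) / len(results)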

#### 1-B) Simulate this game 5 times. Record your results. What do you conclude about them?

In [2]:
results = chip_game(6)  # chip_game(n) records n - 1 games, so this plays the 5 games asked for
print(results)

0.6


After five simulations the estimate is 0.6: in 3 of the 5 games the chip left in the bag was white. Five games is far too few to draw a firm conclusion from.

#### 1-C) Simulate this game 50 times. Record your results. What do you conclude about them?

In [3]:
results = chip_game(50)
print(results)

0.551020408163


This time the estimate is somewhat lower, about 55%. Fifty games should give a more reliable estimate than five, though it can still vary noticeably from run to run; it suggests a chance of roughly 55 to 60% of drawing a white chip.

#### 1-D) Simulate this game 50,000 times. Record your results. 

In [4]:
results = chip_game(50000)
print(results)

0.665533310666
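For 50,000 games the Python loop in `chip_game` is slow. A minimal vectorized sketch of the same experiment, assuming NumPy 1.17 or later for `np.random.default_rng`. The name `chip_game_vectorized` is hypothetical, and unlike `chip_game` its `n` counts attempted games, of which roughly three quarters survive the "first draw was white" filter:

```python
import numpy as np

def chip_game_vectorized(n, seed=None):
    """Hypothetical vectorized sketch of chip_game: plays n games at once."""
    rng = np.random.default_rng(seed)
    # Original chip: True = white, False = black, each with probability 1/2
    original_white = rng.random(n) < 0.5
    # Which chip is drawn first: True = the added white chip, False = the original chip
    drew_added = rng.random(n) < 0.5
    # The first draw is white if we drew the added chip, or if the original chip was white
    first_white = drew_added | original_white
    # Chip left in the bag: the original if we drew the added one, otherwise the added (white) one
    left_white = np.where(drew_added, original_white, True)
    # Keep only games consistent with the evidence (first draw white)
    return float(left_white[first_white].mean())

print(chip_game_vectorized(50000, seed=0))
```

Rejection (throwing away games that contradict the observation) works the same way as in `chip_game`; it is just done with boolean masks instead of a `while` loop.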


#### 1-E) By hand, find the probability that the second chip drawn is white. How does this match up with your answers from 1-B, 1-C, and 1-D?

There are two scenarios. Either the bag had a white chip, or the bag had a black chip.
If the bag had a white chip and we add a white chip, there is a 100% chance we will draw a white chip next.
If the bag had a black chip, then there is a 0% chance the next chip drawn will be white.

Let's start by finding the probability that we would have drawn a white chip first.

We need to calculate the probability that we are in a given scenario, based on the first event we have witnessed.

So, how many ways could we have drawn a white chip first? In the first scenario, we could have drawn the chip that was in the bag to start or the chip we added, so there are two ways. In the second scenario only the chip we added is white, so there is one way.


```
# P(scenario1) + P(scenario2): chances of drawing a white chip first in each scenario
P(scenario1) = original/total_chips + new/total_chips  # = 1/2 + 1/2, both chips are white
P(scenario2) = new/total_chips                         # = 1/2, only the new chip is white
```



Looking at the above, we can see there are 3 draws that are individually equally likely (the two scenarios are equally likely to begin with, and each chip in the bag is equally likely to be drawn): 2 belong to the first scenario and 1 to the second. Both draws from the first scenario leave a white chip in the bag, while the draw from the second scenario leaves the black chip. So the probability of scenario 1 divided by the probability of scenario 1 + scenario 2 equals the probability that we will draw a white chip next.

```
P = P(scenario1)/(P(scenario1) + P(scenario2))
P = (1/2 + 1/2)/((1/2 + 1/2) + 1/2)
```

In [5]:
P = (1/2.0 + 1/2.)/((1/2.+1/2.) + 1/2.)
print(P)

0.666666666667
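The same answer follows from Bayes' rule. A short sketch using exact fractions, assuming (as the argument above does) that the original chip is equally likely to be white or black:

```python
from fractions import Fraction

# Prior: the original chip is white or black with equal probability
prior = {"white": Fraction(1, 2), "black": Fraction(1, 2)}
# Likelihood of drawing a white chip first once the extra white chip is added
likelihood = {"white": Fraction(2, 2), "black": Fraction(1, 2)}

# Total probability of the evidence (a white first draw)
evidence = sum(prior[s] * likelihood[s] for s in prior)
# If the original chip was white, the chip left in the bag is white for sure
posterior_white = prior["white"] * likelihood["white"] / evidence
print(posterior_white)  # 2/3
```

Using `Fraction` keeps the result exact, so it prints `2/3` instead of a rounded decimal.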


We can see that with enough trials the estimate gets close to the true value: the 50,000-game estimate (0.6655) is within about 0.001 of 2/3. We can also see that with few trials the estimate does not have enough resolution to hit the true value. With 5 games the estimate must be a multiple of 0.2, so no combination of 5 simulations can average to .6667. The multiple of 0.2 closest to .6667 is .6.
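Both effects can be quantified. A small sketch, assuming each accepted game is an independent yes/no trial with success probability 2/3 (so the usual binomial standard error applies); the helper names are hypothetical:

```python
from fractions import Fraction
import math

TRUE_P = Fraction(2, 3)  # the exact answer from 1-E

def closest_estimate(n_games):
    """Closest value to 2/3 that an estimate from n_games games can take (k / n_games)."""
    candidates = (Fraction(k, n_games) for k in range(n_games + 1))
    return min(candidates, key=lambda q: abs(q - TRUE_P))

def standard_error(n_games, p=2 / 3):
    """Typical run-to-run spread of the estimate: sqrt(p * (1 - p) / n_games)."""
    return math.sqrt(p * (1 - p) / n_games)

for n_games in (5, 50, 50000):
    print(n_games, closest_estimate(n_games), round(standard_error(n_games), 4))
```

The spread shrinks like one over the square root of the number of games: about 0.21 with 5 games, about 0.067 with 50 and about 0.002 with 50,000.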

## Problem 2

Two players are playing a game where they flip a *not necessarily fair* coin, starting with Player 1. The first person to flip heads wins. The probability that a coin flipped lands on heads is p. What is the probability that Player 1 will win the game?

#### 2-A) Build a function that simulates this game with two inputs: n (the number of times to simulate this game) and p (the probability of flipping heads).

In [25]:

# Returns the fraction of the n games won by Player 1, and the average number of turns it took
# to find a winner
def coin_flip(n, p):
    # Cap on turns per game: it should never be reached when p > 0, but it is better to have it
    # than to risk an infinite loop (e.g. when p = 0)
    max_turns = 10000
    wins = 0
    rounds = []
    for _ in range(n):
        for turn in range(1, max_turns + 1):
            # np.random.random() is uniform on [0, 1), so this is True with probability p (heads)
            if np.random.random() < p:
                # Player 1 flips on the odd turns, so only heads on an odd turn is a win for Player 1
                if turn % 2 != 0:
                    wins += 1
                # Regardless of who flipped it, heads ends the game
                rounds.append(turn)
                break
        else:
            # No heads within max_turns (the p = 0 case): record the game as a tie lasting max_turns
            rounds.append(max_turns)

    return wins / float(n), sum(rounds) / float(len(rounds))

#### 2-B) Simulate this game 10,000 times each for p = 0.1, 0.25, 0.33, 0.5, 0.67, 0.75, 0.9, and 1. Record your results. Based on this, can you come up with an estimate of the probability that player 1 will win the game?

(_Hint: the answer will be a function of the probability of heads $p$ and the probability of tails $1-p$._)

In [7]:
p = [0.1, 0.25, 0.33, 0.5, 0.67, 0.75, 0.9, 1]

results = list(map(lambda a: coin_flip(10000,a)[0], p))

In [8]:
print(results)

[0.5175, 0.5732, 0.6046, 0.6736, 0.7551, 0.8032, 0.9127, 1.0]


The way I wrote this method, the results are the estimated probabilities that Player 1 wins the game, one for each value of p. Every estimate is above 0.5, which shows the advantage of going first, and the advantage grows as p increases.
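Following the hint, the estimate can be turned into a formula. Player 1 wins when the first heads lands on turn 1, 3, 5, ...; reaching turn $2m+1$ takes $2m$ tails first, so $P(\text{Player 1 wins}) = \sum_{m \ge 0} (1-p)^{2m} p = \frac{p}{1-(1-p)^2} = \frac{1}{2-p}$ for $0 < p \le 1$. A quick check of this closed form against the results recorded above:

```python
# Recorded simulation results from 2-B (10,000 games each)
p_values = [0.1, 0.25, 0.33, 0.5, 0.67, 0.75, 0.9, 1]
simulated = [0.5175, 0.5732, 0.6046, 0.6736, 0.7551, 0.8032, 0.9127, 1.0]

def p1_win_prob(p):
    """Closed form for P(Player 1 wins): p / (1 - (1 - p)**2), which equals 1 / (2 - p) for 0 < p <= 1."""
    return p / (1 - (1 - p) ** 2)

for p, sim in zip(p_values, simulated):
    print(f"p={p:<5} formula={p1_win_prob(p):.4f} simulated={sim:.4f}")
```

Every simulated value is within about 0.01 of the formula, and the formula approaches 1/2 as p goes to 0.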

#### 2-C) Suppose you simulate this game with p = 0. What do you expect to happen? Why?

I expect my method to return a win fraction of 0 and to take a long time to run: with p = 0 the coin never lands heads, so no game ever ends on its own. I wrote it so that it would not loop forever, by capping each game at a maximum number of turns after which it ends in a tie. Had I not done that, it would have looped forever waiting for heads to appear.

In [9]:
import time
time1 = time.time()
print("(win fraction, average turns): " + str(coin_flip(10000, 0)))
print("total time to run: " + str(time.time()-time1))

(win fraction, average turns): (0.0, 10000.0)
total time to run: 34.2978339195
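The turn on which the first heads appears follows a geometric distribution with success probability p (mean 1/p turns), which is why p = 0 never finishes. A hypothetical vectorized alternative, assuming NumPy 1.17 or later for `np.random.default_rng`: it draws every game's length at once and gives Player 1 the win when that length is odd. Instead of capping turns, the sketch rejects p = 0 outright, since there is no finite game to sample:

```python
import numpy as np

def coin_flip_vectorized(n, p, seed=None):
    """Hypothetical vectorized alternative to coin_flip (requires 0 < p <= 1).

    Returns (fraction of games won by Player 1, average number of turns).
    """
    if not 0 < p <= 1:
        # With p = 0 the first heads never comes, so the game length is undefined
        raise ValueError("p must satisfy 0 < p <= 1")
    rng = np.random.default_rng(seed)
    # Turn (1, 2, 3, ...) on which the first heads appears, for each of the n games
    turns = rng.geometric(p, size=n)
    # Player 1 flips on the odd-numbered turns
    p1_wins = np.mean(turns % 2 == 1)
    return float(p1_wins), float(turns.mean())

print(coin_flip_vectorized(10000, 0.5, seed=1))
```

The design choice here is to fail loudly on an impossible input rather than return a tie, which keeps the average-turns value meaningful.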


## Problem 3

Suppose I want to build two functions: `random_sample(k,data)` and `stratified_sample(k,var,data)`.

`random_sample(k,data)` should take a **simple random sample** of $k$ observations **without replacement** from `data`.

`stratified_sample(k,var,data)` should take a **stratified random sample** of $k$ observations **without replacement** from `data`, where the variable on which you stratify is `var`. (Hint: Use the simple random sample function you defined above, then find the percentage of observations from each stratum in `var`, then randomly select from each stratum in `var`, then combine the results.)

Build these functions.

In [26]:
def random_sample(k, data):
    return np.random.choice(data, k, replace=False)


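def stratified_rand_sample(k, var, data):
    # I am going to assume data is a NumPy array and var is a list of boolean arrays
    # that filter the data into non-overlapping sets (one per stratum).
    # Returns a list with one sample per stratum, in the same order as var.
    result = []
    for v in var:
        # Fraction of all observations that fall in this stratum
        frac = np.count_nonzero(v) / float(len(data))
        # Allocate round(k * frac) draws to this stratum and sample them without replacement
        # (rounded allocations may not add up to exactly k)
        result.append(random_sample(int(round(k * frac)), list(data[v])))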

    return result
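The problem statement describes `stratified_sample(k, var, data)` returning one combined sample, while `stratified_rand_sample` above takes boolean masks and returns one sample per stratum (which Problem 4 relies on). A hypothetical sketch of the described interface, assuming `var` is an array of stratum labels aligned with `data`. The largest-remainder rounding is my own choice, not part of the assignment; it makes the stratum sizes always add up to k:

```python
import numpy as np

def stratified_sample(k, var, data, rng=None):
    """Sketch: stratified random sample of k observations without replacement.

    Assumes `var` is an array of stratum labels, one per element of `data`.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    var = np.asarray(var)
    labels, counts = np.unique(var, return_counts=True)
    # Proportional allocation: each stratum's share of k
    exact = k * counts / len(data)
    alloc = np.floor(exact).astype(int)
    # Give leftover slots to the strata with the largest fractional remainders
    leftover = int(k - alloc.sum())
    order = np.argsort(-(exact - alloc))
    alloc[order[:leftover]] += 1
    # Simple random sample within each stratum, then combine
    parts = [rng.choice(data[var == label], size=a, replace=False)
             for label, a in zip(labels, alloc)]
    return np.concatenate(parts)
```

For example, with 20 men and 30 women and k = 7, the exact shares are 2.8 and 4.2, so the sketch draws 3 men and 4 women.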
        

## Problem 4

In a class of 50 students, there are 20 men and 30 women. Exactly one man is named `Sam` and exactly one woman is named `Miranda`.

4-A) What is the probability that a simple random sample of 5 people contains two men and three women?

4-B) What is the probability that a simple random sample of 5 people contains both Sam and Miranda?

4-C) What is the probability that a stratified random sample (stratified on sex) of 5 people contains both Sam and Miranda?

We are going to initialize a data set where men have even numbers and women have odd numbers. Taking a number mod 2 then tells us the person's sex, which makes counting easy, and the numbers double as unique IDs.

In [30]:
men = np.array(range(0, 40, 2))    # 20 men: even numbers 0-38
women = np.array(range(1, 60, 2))  # 30 women: odd numbers 1-59
data = np.append(men, women) # creating the data set
print(data)

[ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38  1  3  5  7  9
 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59]


In [31]:
def hacker_prob(k, condition, data, iterations=10000):
    # Estimate the probability that a simple random sample of size k satisfies `condition`
    count = 0
    # We iterate 10,000 times by default; pass a larger number for more accuracy
    for _ in range(iterations):
        temp = random_sample(k, data)
        if condition(temp):
            count += 1
    return count / float(iterations)

In [33]:
# temp % 2 is 1 for each woman (odd ID) and 0 for each man (even ID),
# so summing it counts the women in the sample.
cond = lambda temp: int(sum(temp%2)) == 3
print("The probability of exactly 3 women with 5 people is: " + str(hacker_prob(5, cond, data)))

The probability of exactly 3 women with 5 people is: 0.3664


In [32]:
# We are now trying to find two specific members of the group. We can use 0 and 1 to represent
# Sam (a man, even ID) and Miranda (a woman, odd ID)

cond = lambda temp: all(x in temp for x in [0,1])
print("The probability of both Sam and Miranda in a group of 5 people is: " + str(hacker_prob(5, cond, data)))

The probability of both Sam and Miranda in a group of 5 people is: 0.0083


In [15]:
var_filter_men = data % 2 == 0
var_filter_women = data % 2 != 0
var_filter = [var_filter_men, var_filter_women]
    
# temp[0] is the sample of men and temp[1] the sample of women;
# Sam is ID 0 (a man) and Miranda is ID 1 (a woman)
cond = lambda temp: 0 in temp[0] and 1 in temp[1]

def hacker_prob_2(k, condition, data, iterations=1000000):
    count = 0
    # We iterate 1,000,000 times by default; pass a larger number for more accuracy
    for _ in range(iterations):
        temp = stratified_rand_sample(k, var_filter, data)
        if condition(temp):
            count += 1
    return count / float(iterations)



In [16]:

print("The probability of both sam and miranda in respective groups: " + str(hacker_prob_2(5, cond, data)))

The probability of both sam and miranda in respective groups: 0.010061


## Below I work out the math by hand to see whether my hacker stats line up.

In [17]:
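p1 = 44*48*47*46*45   # ordered draws of 5 from the 48 people other than Sam and Miranda
p2 = 50*49*48*47*46   # ordered draws of 5 from all 50 people
p3 = 49*48*47*46*45   # ordered draws of 5 from the 49 people other than Sam (or other than Miranda)

In [18]:
# The probability that I would draw both Sam and Miranda, by inclusion-exclusion:
# 1 - (P(no Sam) + P(no Miranda) - P(neither))
t = 1-((p3/float(p2))*2 - p1/float(p2))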


In [19]:
t

0.008163265306122436

In [20]:
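# Stratified sample (2 men, 3 women): probability that neither Sam nor Miranda is chosen
m1 = 19 * 18.        # ordered ways to pick 2 men without Sam
m = 20 * 19.         # ordered ways to pick 2 of the 20 men
w1 = 29 * 28 * 27.   # ordered ways to pick 3 women without Miranda
w = 30 * 29 * 28.    # ordered ways to pick 3 of the 30 women

p = m1/m * w1/w      # (18/20) * (27/30) = 0.9 * 0.9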

In [21]:
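# Each of the 3 women drawn is equally likely to be any of the 30, so P(Miranda is drawn) = 3/30;
# likewise P(Sam is drawn) = 2/20. The two strata are sampled independently, so multiply.
t1 = 3/30.
t2 = 2/20.
t1 * t2

0.010000000000000002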

In [22]:
20/50. * 19/49. * 30/48. * 29/47. * 28/46. * 10  # multiply by C(5,2) = 10, the number of orders in which 2 men and 3 women can be drawn

0.3640808774943835

In [23]:
# What is the probability that you will select both Sam and Miranda? (same inclusion-exclusion as above, in one line)

1 -((2*(49.*48*47*46*45)/(50*49*48*47*46)) - ((48*47*46*45*44.)/(50*49*48*47*46)))

0.008163265306122436
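As an independent cross-check, all three answers can also be written with binomial coefficients. A sketch assuming Python 3.8 or later for `math.comb`, and, for 4-C, the proportional allocation of 2 men and 3 women that `stratified_rand_sample` uses:

```python
from math import comb

total = comb(50, 5)  # all simple random samples of 5 from 50 people

# 4-A: choose 2 of the 20 men and 3 of the 30 women
p_a = comb(20, 2) * comb(30, 3) / total
# 4-B: Sam and Miranda are both in, plus any 3 of the other 48 people
p_b = comb(48, 3) / total
# 4-C: Sam among the 2 men (19 choices for the other man) and
#      Miranda among the 3 women (C(29, 2) choices for the other two); strata are independent
p_c = (comb(19, 1) / comb(20, 2)) * (comb(29, 2) / comb(30, 3))

print(p_a, p_b, p_c)
```

For 4-C each factor is 0.1, so the exact answer is 0.01, close to the simulated 0.010061.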

So my hacker stats line up with my math stats (for example 0.3664 vs 0.3641 for 4-A and 0.0083 vs 0.0082 for 4-B), but I'm not totally sure either one is right.

