# Monty Hall

The "Monty Hall Problem" is a famous problem in probability and statistics based on the game show "Let's Make a Deal." (Monty Hall was the original host of the show.) If you haven't heard of the show, no worries: the setup is described below.

As part of "Let's Make a Deal," there are three doors labeled "A," "B," and "C." You are the contestant and are informed that behind exactly one door, there is a new car. Behind the other two doors are goats. Obviously, your goal as the contestant is to select the door with the car.

The game goes as follows:
1. You select a door.
2. The game show host, knowing which door hides the car, opens one of the doors you did not select to reveal a goat. (Important: If you selected a door with a goat, the host opens the other door with a goat. If you started by selecting the door with the car, the host picks from the remaining two doors at random.)
3. The host then asks you if you would like to stick with the door you originally picked, or if you would want to switch to the other remaining door.
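The three steps translate directly into a simulation. The sketch below is not part of the original assignment; the names `DOORS`, `host_opens`, and `play_round` are illustrative. It models the host's choice explicitly, including the random pick between two goat doors when the contestant's first choice hides the car.

```python
import random

DOORS = ['A', 'B', 'C']

def host_opens(picked, car_door):
    """Door the host opens in step 2: never the contestant's pick, never the car."""
    goat_doors = [d for d in DOORS if d != picked and d != car_door]
    # Two candidates when the contestant picked the car (host chooses at random),
    # exactly one candidate otherwise.
    return random.choice(goat_doors)

def play_round(picked, switch):
    """Play steps 1-3 once; return True if the contestant ends on the car."""
    car_door = random.choice(DOORS)          # car hidden uniformly at random
    opened = host_opens(picked, car_door)
    if switch:
        # step 3: move to the only door that is neither picked nor opened
        picked = next(d for d in DOORS if d not in (picked, opened))
    return picked == car_door

random.seed(0)
n = 100000
print('switch wins:', sum(play_round('A', True) for _ in range(n)) / n)
print('keep wins:  ', sum(play_round('A', False) for _ in range(n)) / n)
```

With this many rounds, the two rates should land close to 2/3 and 1/3; the exact digits depend on the seed.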

Question 1: Suppose you pick a door. The host opens one of the remaining doors. You are then asked whether to stick with your original door or switch to the remaining door. Based on your intuition, is it more advantageous to stick with your original door, to switch to the remaining door, or does your probability of success not change?

In [None]:
## Answer 1: Switch! Switching wins 2/3 of the time; sticking wins only 1/3 of the time.

Now let's apply some Bayesian reasoning to this problem. Recall that the formula for Bayes' Rule, as applied to some data $y$ and an unknown parameter $\theta$, is:

$$P(\theta|y) = \frac{P(y|\theta)P(\theta)}{P(y)}$$

Further recall that:
- $P(\theta)$ is the **prior probability** of $\theta$.
- $P(y|\theta)$ is the **likelihood** of our data $y$ given $\theta$.
- $P(y)$ is the **marginal likelihood** of our data $y$.

Our strategy here will be to find $P(\theta|y)$ for $\theta=A,B,C$ and decide which probability is highest.

For these scenarios, the data $y$ is that the host opens door B *and* that door B does not contain the car.
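All three posteriors can be computed at once with exact fractions, which avoids the rounding in decimals like 0.33333. This is a minimal sketch assuming, as in the questions that follow, that you picked door A and the host opened door B; the variable names are illustrative.

```python
from fractions import Fraction

doors = ['A', 'B', 'C']
prior = {d: Fraction(1, 3) for d in doors}   # car equally likely behind each door

# Likelihood P(y=B | theta), given that the contestant picked door A:
likelihood = {
    'A': Fraction(1, 2),   # car behind our door: host picks B or C at random
    'B': Fraction(0),      # host never opens the door hiding the car
    'C': Fraction(1),      # host must open B, the only remaining goat door
}

# Marginal likelihood P(y=B) by the law of total probability
marginal = sum(likelihood[d] * prior[d] for d in doors)

# Bayes' rule for each door
posterior = {d: likelihood[d] * prior[d] / marginal for d in doors}

print('P(y=B) =', marginal)
for d in doors:
    print('P(theta={}|y=B) = {}'.format(d, posterior[d]))
```

This prints P(y=B) = 1/2 and posteriors of 1/3, 0, and 2/3 for doors A, B, and C, so door C has the highest posterior.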

Question 2: Suppose you pick door A. The host opens door B to reveal a goat.

For $P(\theta=A|y=B)$, identify the prior $P(\theta=A)$, the likelihood $P(y=B|\theta=A)$, and the marginal likelihood $P(y=B)$. Then calculate the posterior $P(\theta=A|y=B)$.

In [1]:
## Answer 2:
## P(theta=A) = 1/3
prior = 0.33333
## prior: before any door is opened, the car is equally likely
## to be behind each of the 3 doors

## P(y=B|theta=A) = 1/2
likelihood = 0.5
## likelihood: probability that the host opens door B given the car is behind door A
## we picked A and it hides the car, so both B and C hide goats
## and the host chooses between them at random

## P(y=B) = 1/2
marginal = 0.5
## marginal: P(y=B|theta=A)*P(theta=A) + P(y=B|theta=B)*P(theta=B) + P(y=B|theta=C)*P(theta=C)
## P(y=B) = (1/2 * 1/3) + (0 * 1/3) + (1 * 1/3)
##        = 1/6 + 0 + 1/3 = 1/2

## P(theta=A|y=B) = (1/2 * 1/3) / (1/2) = 1/3

## probability that door A has the car
## given that the host opened door B and revealed a goat

def posterior(prior, likelihood, marginal):
    # Bayes' rule: P(theta|y) = P(y|theta) * P(theta) / P(y)
    return (float(likelihood) * float(prior)) / float(marginal)

posterior(prior, likelihood, marginal)

0.33333

Question 3: Is this surprising? Why or why not?

In [2]:
## Answer 3: 
## Not surprising. Whatever is behind door A, the host can always open
## a goat door other than ours, so seeing him open door B tells us nothing
## new about door A: P(y=B|theta=A) = P(y=B) = 1/2, so the ratio is 1
## and door A keeps its prior probability of 1/3.

Question 4: Suppose you pick door A. The host opens door B to reveal a goat.

For $P(\theta=B|y=B)$, identify the prior $P(\theta=B)$, the likelihood $P(y=B|\theta=B)$, and the marginal likelihood $P(y=B)$. Then calculate the posterior $P(\theta=B|y=B)$.

In [3]:
## Answer 4:

# prior P(theta=B): probability that the car is behind B
prior = 0.33333

# likelihood P(y=B|theta=B): probability that Monty opens B given that B hides the car
likelihood = 0

# marginal P(y=B): same as in Answer 2
marginal = 0.5

posterior(prior, likelihood, marginal)

0.0

Question 5: Is this surprising? Why or why not?

In [None]:
## Answer 5: 

# Not surprising. The host knows where the car is and, by the rules
# of the game, never opens the door that hides it. So if he opened
# door B, there is 0 chance the car is behind door B.

Question 6: Suppose you pick door A. The host opens door B to reveal a goat.

For $P(\theta=C|y=B)$, identify the prior $P(\theta=C)$, the likelihood $P(y=B|\theta=C)$, and the marginal likelihood $P(y=B)$. Then calculate the posterior $P(\theta=C|y=B)$.

In [4]:
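## Answer 6:

# prior P(theta=C): the car is equally likely to be behind each door
prior = 0.333333
# likelihood P(y=B|theta=C): we picked A and the car is behind C,
# so B is the only goat door the host can open
likelihood = 1
# marginal P(y=B): same as in Answer 2
marginal = 0.5

posterior(prior, likelihood, marginal)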

0.666666

Question 7: Is this surprising? Why or why not?

In [None]:
## Answer 7: 

## Yes, at least at first (setting aside that we saw this result in week 2).
## Intuition says that once door B is out, doors A and C should each have probability 1/2.
## But the host's choice carries information: if the car is behind C, he is forced
## to open B (likelihood 1), while if it is behind A, he opens B only half the time
## (likelihood 1/2). Door A keeps its prior of 1/3, so door C picks up
## the probability that door B lost: 1/3 + 1/3 = 2/3.
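The posteriors from Answers 2, 4, and 6 can also be checked empirically by rejection sampling: play many games, keep only those in which the contestant picked A and the host opened B, and count where the car was. This is a sketch under the game rules stated at the top; `simulate_posteriors` is an illustrative name, not part of the assignment.

```python
import random
from collections import Counter

def simulate_posteriors(n_games=100000, picked='A', opened='B', seed=1):
    """Estimate P(car behind each door | host opened `opened`) by rejection sampling."""
    rng = random.Random(seed)
    doors = ['A', 'B', 'C']
    counts = Counter()
    for _ in range(n_games):
        car = rng.choice(doors)                                    # hide the car
        goat_doors = [d for d in doors if d not in (picked, car)]  # doors the host may open
        if rng.choice(goat_doors) == opened:                       # keep only games matching y
            counts[car] += 1
    kept = sum(counts.values())
    return {d: counts[d] / kept for d in doors}

estimates = simulate_posteriors()
for d in ['A', 'B', 'C']:
    print('P(theta={}|y=B) ~ {:.3f}'.format(d, estimates[d]))
```

The estimate for door B is exactly 0 because the host never opens the car door, while A and C should come out near 1/3 and 2/3.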

Question 8: Build a function called lets_make_a_deal that runs the Let's Make a Deal game by taking:
- 'A', 'B', or 'C' as the input for the door
- 'K' or 'S' as the input indicating "keep" or "switch" when asked

The function should return:
- 'win' if the contestant won.
- 'lose' if the contestant lost.

Note that, before anything else, the computer needs to randomly select the door that hides the car.

In [7]:
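## Answer 8: 
import random
import numpy as np

def lets_make_a_deal(door, action):
    # door: the contestant's first pick, 'A', 'B', or 'C'
    # action: 'K' to keep the first pick, 'S' to switch
    doors = ['A', 'B', 'C']
    if door not in doors or action not in ('K', 'S'):
        raise ValueError("door must be 'A', 'B', or 'C' and action must be 'K' or 'S'")
    car_door = np.random.choice(doors)   # computer hides the car before anything else
    # The host always opens a goat door the contestant did not pick, so switching
    # lands on the car exactly when the first pick was wrong. That lets us skip
    # simulating which door the host opens.
    if door == car_door:
        if action == 'K':    # correct first pick, and you keep it
            return 'win'
        else:                # correct first pick, but you switch away
            return 'lose'
    else:
        if action == 'K':    # wrong first pick, and you stay put
            return 'lose'
        else:                # wrong first pick; switching lands on the car
            return 'win'

lets_make_a_deal('A', 'K')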

'lose'

Question 9: Simulate 10,000 games where the person always switches. Report your results.

In [8]:
## Answer 9:

def n_sims(n_games):
    results = []
    door = random.choice(['A', 'B', 'C'])   # contestant's first pick, reused for every game
    for i in range(n_games):
        results.append(lets_make_a_deal(door, 'S'))   # the contestant always switches
    pct_win = results.count('win') / float(n_games) * 100
    pct_loss = results.count('lose') / float(n_games) * 100
    print('Pct win = {:.2f}%\nPct loss = {:.2f}%'.format(pct_win, pct_loss))

In [9]:
n_sims(10000)

Pct win = 67.29%
Pct loss = 32.71%
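For contrast, a contestant who always keeps the first pick should win only about a third of the time. The sketch below is self-contained, so it re-creates the keep/switch shortcut from Answer 8 under the hypothetical name `shortcut_game`, and draws a fresh first pick for each game.

```python
import random

def shortcut_game(door, action, doors=('A', 'B', 'C')):
    """Hypothetical stand-in for lets_make_a_deal: switching wins iff the first pick missed."""
    car_door = random.choice(doors)
    if door == car_door:
        return 'win' if action == 'K' else 'lose'
    return 'win' if action == 'S' else 'lose'

def win_rate(action, n_games=10000):
    """Fraction of n_games won when the contestant always takes `action`."""
    wins = 0
    for _ in range(n_games):
        door = random.choice(['A', 'B', 'C'])   # fresh first pick each game
        if shortcut_game(door, action) == 'win':
            wins += 1
    return wins / float(n_games)

random.seed(42)
keep_rate = win_rate('K')
switch_rate = win_rate('S')
print('Pct win, always keep   = {:.2f}%'.format(keep_rate * 100))
print('Pct win, always switch = {:.2f}%'.format(switch_rate * 100))
```

The two rates should sit near 33% and 67%, matching the posteriors 1/3 and 2/3 computed earlier.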


Question 10: Take your function from question 8 and adapt it to include:
- 'A', 'B', or 'C' as the input for the door
- 'K' or 'S' as the input indicating "keep" or "switch" when asked
- A list of three probabilities indicating the probability that the car is behind door A, door B, and door C, respectively. (Note: What happens if the user submits a list that doesn't sum to 1? Can you force the probabilities to sum to 1?)

Build this function, play around with different inputs, and summarize your findings. Be sure to contrast these findings with the function you wrote for question 8.

In [23]:
## Answer 10:


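def list_probs(door, action, probs):
    doors = ['A', 'B', 'C']
    if door not in doors or action not in ('K', 'S'):
        raise ValueError("door must be 'A', 'B', or 'C' and action must be 'K' or 'S'")
    total = float(sum(probs))             # sum up the entered probabilities
    probs = [x / total for x in probs]    # rescale so they sum to 1, whether the total was above or below 1
    car_door = np.random.choice(doors, p=probs)   # hide the car using the supplied probabilities
    # As in lets_make_a_deal, the host always reveals a goat,
    # so switching wins exactly when the first pick was wrong.
    if door == car_door:
        return 'win' if action == 'K' else 'lose'
    return 'win' if action == 'S' else 'lose'
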

In [25]:
list_probs('A','S',[0.3,0.2,0.8])

'win'
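A single call only returns one random outcome, so a summary needs win rates. Because the host still always reveals a goat, switching wins exactly when the first pick is wrong, which suggests P(win | switch) = 1 - p(your door) after normalizing. The sketch below, with illustrative helper names, compares that formula with a simulation. Under equal probabilities it reduces to the 2/3 from Question 9, but when your door's probability exceeds 1/2 (as with [0.8, 0.1, 0.1] and door A), keeping becomes the better strategy.

```python
import numpy as np

def normalize(probs):
    """Rescale a list of nonnegative weights so they sum to 1."""
    total = float(sum(probs))
    return [p / total for p in probs]

def switch_win_prob(door, probs):
    """Exact P(win | switch): the car must be behind one of the other doors."""
    p = dict(zip(['A', 'B', 'C'], normalize(probs)))
    return 1 - p[door]

def simulate_switch(door, probs, n_games=10000, seed=0):
    """Monte Carlo estimate of P(win | switch) with the car hidden according to probs."""
    rng = np.random.default_rng(seed)
    cars = rng.choice(['A', 'B', 'C'], size=n_games, p=normalize(probs))
    return float(np.mean(cars != door))   # switching wins whenever the car is elsewhere

for probs in ([1, 1, 1], [0.3, 0.2, 0.8], [0.8, 0.1, 0.1]):
    print(probs, round(switch_win_prob('A', probs), 3), simulate_switch('A', probs))
```

`np.random.default_rng` requires NumPy 1.17 or later.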

Question 11: Consider the bowls of cookies example from lecture. (We didn't cover this yet!) Create a function called cookie_monster that:
- takes a list for bowl 1 as an input indicating the distribution of cookies (e.g. [30, 10])
- takes a list for bowl 2 as an input indicating the distribution of cookies (e.g. [20, 20])
- takes a list of probabilities as an input indicating how likely one is to select a cookie from bowl 1 or bowl 2. (Note: What happens if the user submits a list that doesn't sum to 1? Can you force the probabilities to sum to 1?)
- outputs the probabilities of each bowl being the one from which the cookie was selected. Be explicit with your labels/output.

Build this function, play around with different inputs, and summarize your findings.

In [38]:
## Answer 11:

def cookie_monster(bowl_1, bowl_2, probs):
    # bowl_1, bowl_2: [vanilla count, chocolate count]
    # probs: prior probabilities of choosing bowl 1 and bowl 2
    cookie = random.choice(['vanilla', 'chocolate'])   # the observed cookie flavor
    x = 0 if cookie == 'vanilla' else 1                # index of that flavor in each bowl list

    total = float(sum(probs))                          # sum up the entered probabilities
    probs = [p / total for p in probs]                 # rescale so they sum to 1

    # numerators of Bayes' rule: P(cookie | bowl) * P(bowl)
    joint_b1 = (bowl_1[x] / float(sum(bowl_1))) * probs[0]
    joint_b2 = (bowl_2[x] / float(sum(bowl_2))) * probs[1]

    p_chosen_cookie = joint_b1 + joint_b2              # marginal likelihood P(cookie)

    posterior_b1 = joint_b1 / p_chosen_cookie * 100    # P(bowl 1 | cookie), in percent
    posterior_b2 = joint_b2 / p_chosen_cookie * 100    # P(bowl 2 | cookie), in percent

    print('Given that we have a {} cookie, the probability that it came from\n'
          'Bowl 1 = {:.2f}% \nBowl 2 = {:.2f}%'.format(cookie, posterior_b1, posterior_b2))


In [40]:
cookie_monster(bowl_1 = [15,25], bowl_2 = [11, 7] , probs = [.5, .5])

Given that we have a vanilla cookie, the probability that it came from
Bowl 1 = 38.03% 
Bowl 2 = 61.97%
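Because `cookie_monster` draws the flavor at random, one run shows only one case. A deterministic variant that reports both flavors makes the findings easier to summarize. This sketch assumes, as above, that each bowl list is [vanilla count, chocolate count]; `cookie_posteriors` is an illustrative name.

```python
def cookie_posteriors(bowl_1, bowl_2, probs):
    """Return {flavor: (P(bowl 1 | flavor), P(bowl 2 | flavor))}."""
    total = float(sum(probs))
    prior_1, prior_2 = probs[0] / total, probs[1] / total   # normalized priors
    result = {}
    for i, flavor in enumerate(['vanilla', 'chocolate']):
        joint_1 = bowl_1[i] / float(sum(bowl_1)) * prior_1   # P(flavor | bowl 1) P(bowl 1)
        joint_2 = bowl_2[i] / float(sum(bowl_2)) * prior_2   # P(flavor | bowl 2) P(bowl 2)
        marginal = joint_1 + joint_2                         # P(flavor)
        result[flavor] = (joint_1 / marginal, joint_2 / marginal)
    return result

for flavor, (p1, p2) in cookie_posteriors([15, 25], [11, 7], [.5, .5]).items():
    print('{}: Bowl 1 = {:.2%}, Bowl 2 = {:.2%}'.format(flavor, p1, p2))
```

With equal priors, each flavor points toward the bowl where it makes up a larger share: vanilla favors bowl 2 (11/18 vs. 15/40 vanilla), giving the 38.03% / 61.97% split above, while chocolate favors bowl 1 (about 61.64% vs. 38.36%).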
