# Let's Make a Deal!

## I recently came across a brain-melting concept that goes to show how hard it is for humans to think in statistical terms. It's known as the "Monty Hall" problem, and I had to see it in action for myself.

### It goes like this: You are a contestant on a game show, choosing between three doors. One of them hides the grand prize (a car is usually used as the example) and the other two are losing choices.

### You choose a door. Your chance of winning the grand prize is 1 in 3. The game show host, who knows where the prize is, then opens one of the two doors you did not pick and shows you that it is a losing door (whatever you chose, at least one of the other two doors is a loser, so the host can always do this). You are then given the option to either keep your original choice or switch to the other unopened door. What should you do?

### The common (and, it turns out, **incorrect**) answer is that it doesn't matter. There are two doors left, and one of them has the prize, so intuitively you would think you have a 50/50 shot either way. HOWEVER, the statistics actually work out such that switching doors at this point gives you a 2 in 3 (about 67%) chance of winning, while keeping your original choice gives you only a 1 in 3 (about 33%) chance.
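### Before simulating anything, the 1/3 versus 2/3 split can be counted exactly. The sketch below is not part of the original notebook: it assumes the prize door and the first guess are each uniformly random, and that the host picks at random when two losing doors are available. The name `exact_win_probabilities` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)

def exact_win_probabilities():
    """Return (P(staying wins), P(switching wins)) by enumerating every case."""
    stay = Fraction(0)
    switch = Fraction(0)
    # Prize door and first guess are independent and uniform: 9 equally likely pairs.
    for winner, guess in product(DOORS, repeat=2):
        weight = Fraction(1, 9)
        # The host may open any door that is neither the prize nor the guess.
        openable = [d for d in DOORS if d != winner and d != guess]
        for reveal in openable:
            p = weight / len(openable)
            # Switching means taking the one door that is neither guessed nor opened.
            switched = next(d for d in DOORS if d != guess and d != reveal)
            stay += p * (guess == winner)
            switch += p * (switched == winner)
    return stay, switch

stay, switch = exact_win_probabilities()
print("stay:", stay, "switch:", switch)  # stay: 1/3 switch: 2/3
```

### Staying wins only in the three cases where the first guess was already right; in the other six, the host's reveal leaves the prize behind the one remaining door.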

# Say what now?

## Much has been written, exhaustively, about the math behind this, so I will refer you to a Google search to read up on how this works out. However, if you're like me, you need to see this work in the real world in order to believe it. Thus, to illustrate, I put together the code below.

In [199]:
import random

In [231]:
def pickwinner():
    winner = random.randint(1, 3) #pick the winning number between 1 and 3
    return winner

def pickguess():
    guess = random.randint(1, 3) #pick the initial guess between 1 and 3
    return guess

def revealfirst(winner, guess):
    alloptions = [1, 2, 3]
    alloptions.remove(winner) # Need to ensure the winning door is not possibly revealed by the host.

    if guess in alloptions:
        alloptions.remove(guess) #If our guess was not the winner, need to remove it from the list of doors to reveal.
        reveal = alloptions[0] #Only one losing door is left for the host to open.
    else:
        reveal = random.choice(alloptions) #If our guess was the winner, pick the revealed door at random from the non-winners.

    return reveal

In [201]:
def playgame(numberplays):
    
    scorekeeper = [] #list to keep track of wins and losses - will it be closer to 50% or 66%?
    numberplayed = 0
    
    while numberplayed < numberplays:
        
        winner = pickwinner() #pick winner at random
        guess = pickguess()   #pick guess at random
        reveal = revealfirst(winner, guess) #reveal a non-winning door
        
        doors = [1, 2, 3]
        doors.remove(reveal) #remove the revealed door from the final choice options
        doors.remove(guess)  #since we are going to switch every time, remove our original guess from the list
        
        switchchoice = doors[0]  #Our final choice (switchchoice) is the value that remains
        
        if switchchoice == winner:
            scorekeeper.append(1) #write a 1 to the list for every time we win with this strategy
        else:
            scorekeeper.append(0) #write a 0 to the list for every time we lose with this strategy
        
        numberplayed += 1
    
    percentwin = round(sum(scorekeeper)/numberplays, 4) * 100  #calculate our percentage of wins
    
    print("Switching strategy wins {}% of the time.".format(percentwin))
    return percentwin  #return the percentage so the variable assigned by the caller is not None

In [202]:
results10 = playgame(10)

Switching strategy wins 80.0% of the time.


In [203]:
results100 = playgame(100)

Switching strategy wins 61.0% of the time.


In [204]:
results1000 = playgame(1000)

Switching strategy wins 70.7% of the time.


In [205]:
results10000 = playgame(10000)

Switching strategy wins 66.64999999999999% of the time.


In [206]:
results100000 = playgame(100000)

Switching strategy wins 66.67999999999999% of the time.
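## A side note on the long tails in some of these outputs (66.64999999999999, for example): they are floating-point artifacts, not extra precision. The fraction is rounded first and then multiplied by 100, and that multiplication reintroduces a tiny binary rounding error. A minimal sketch of one way around it, using a hypothetical helper `format_percent` that is not in the original notebook, is to multiply first and let the format spec do the rounding:

```python
def format_percent(wins, plays, digits=2):
    """Format wins/plays as a percentage string with a fixed number of decimals."""
    # Multiply first, then let the format spec round, so no binary
    # floating-point tail leaks into the printed text.
    return f"{100 * wins / plays:.{digits}f}%"

# Example values: 6,665 wins out of 10,000 games.
print(format_percent(6665, 10000))  # 66.65%
```

## The simulation itself is unaffected either way; only the printed text changes.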


## As you can see, the switching strategy, using randomly generated winning doors and initial guesses, ends up being the correct choice around 2/3 of the time - and definitely more than 50% of the time. Isn't statistics neat?

## BUT - I can hear the skeptics now - you only showed one outcome. Mathematically, it should be clear that every game the switcher wins is a game the non-switcher loses, and vice versa. Still, just to put this completely to bed, let's write another playgame function, but this time **not** switch our choice, and make sure the numbers are in the 33% range.

In [207]:
def playgame_noswitch(numberplays):
    
    scorekeeper = [] #list to keep track of wins and losses - will it be closer to 50% or 33%?
    numberplayed = 0
    
    while numberplayed < numberplays:
        
        winner = pickwinner() #pick winner at random
        guess = pickguess()   #pick guess at random
        reveal = revealfirst(winner, guess) #reveal a non-winning door
        
        ## Since we're not going to switch our choice, the lines of code below are not needed.
        ## All we have to do is change from switchchoice == winner to guess == winner.
        
        #doors = [1, 2, 3]
        #doors.remove(reveal) #remove the revealed door from the final choice options
        #doors.remove(guess)  #since we are going to switch every time, remove our original guess from the list
        
        #switchchoice = doors[0]  #Our final choice (switchchoice) is the value that remains
        
        #if switchchoice == winner:
        #    scorekeeper.append(1) #write a 1 to the list for every time we win with this strategy
        #else:
        #    scorekeeper.append(0) #write a 0 to the list for every time we lose with this strategy
        
        if guess == winner: # This is the only change from the code above
            scorekeeper.append(1) #write a 1 to the list for every time we win with this strategy
        else:
            scorekeeper.append(0) #write a 0 to the list for every time we lose with this strategy
        
        numberplayed += 1
    
    percentwin = round(sum(scorekeeper)/numberplays, 4) * 100  #calculate our percentage of wins
    
    print("Not switching strategy wins {}% of the time.".format(percentwin))
    return percentwin  #return the percentage so the variable assigned by the caller is not None

In [208]:
noswitch_results10 = playgame_noswitch(10)

Not switching strategy wins 40.0% of the time.


In [209]:
noswitch_results100 = playgame_noswitch(100)

Not switching strategy wins 21.0% of the time.


In [210]:
noswitch_results1000 = playgame_noswitch(1000)

Not switching strategy wins 35.199999999999996% of the time.


In [211]:
noswitch_results10000 = playgame_noswitch(10000)

Not switching strategy wins 32.800000000000004% of the time.


In [212]:
noswitch_results100000 = playgame_noswitch(100000)

Not switching strategy wins 33.39% of the time.
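## The two runs above used different random games, so their percentages only roughly add up to 100. To check the "every switching win is a staying loss" claim directly, both strategies can be scored on the same games. This is a sketch, not the notebook's own code: `play_both` and its `seed` parameter are assumptions added for illustration, and the host logic is re-implemented inline so the block runs on its own.

```python
import random

def play_both(numberplays, seed=None):
    """Score staying and switching on the same randomly generated games."""
    rng = random.Random(seed)  # seed is an added assumption, for repeatable runs
    stay_wins = 0
    switch_wins = 0
    for _ in range(numberplays):
        winner = rng.randint(1, 3)
        guess = rng.randint(1, 3)
        # Host opens a door that is neither the prize nor the player's guess.
        reveal = rng.choice([d for d in (1, 2, 3) if d != winner and d != guess])
        # The switch target is the single door left over.
        switched = next(d for d in (1, 2, 3) if d != guess and d != reveal)
        stay_wins += guess == winner
        switch_wins += switched == winner
    return stay_wins, switch_wins

stay_wins, switch_wins = play_both(10000)
print(stay_wins + switch_wins)  # always equals the number of games played
```

## Because the prize is never behind the opened door, it is behind exactly one of the two remaining doors in every game, so the two win counts always sum to the number of plays.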


## So it's not a trick - when we don't switch, we win only about 1/3 of the time. When we switch, we win about 2/3 of the time. Mind-melting, but true!

# But Wait!

## First, let's make sure that this isn't just some weird statistical thing about switching a choice between two doors.

In [215]:
doors = [1, 2]

def pickwinner2():
    winner2 = random.randint(1, 2) #pick the winning number between 1 and 2
    return winner2

def pickguess2():
    guess2 = random.randint(1, 2) #pick the initial guess between 1 and 2
    return guess2

In [216]:
pickwinner2()

1

In [217]:
def playgame2(numberplays): #Play a 2-door game, no reveal, switch doors every time
    
    scorekeeper = [] #list to keep track of wins and losses - will it be closer to 50% or 66%?
    numberplayed = 0
    
    while numberplayed < numberplays:
        
        winner = pickwinner2() #pick winner at random
        guess = pickguess2()   #pick guess at random
        #reveal = revealfirst(winner, guess) #reveal a non-winning door - in this version, there is no reveal
        
        doors = [1, 2] # Two possible doors from the get-go.
        #doors.remove(reveal) #remove the revealed door from the final choice options
        doors.remove(guess)  #since we are going to switch every time, remove our original guess from the list
        
        switchchoice = doors[0]  #Our final choice (switchchoice) is the value that remains
        
        if switchchoice == winner:
            scorekeeper.append(1) #write a 1 to the list for every time we win with this strategy
        else:
            scorekeeper.append(0) #write a 0 to the list for every time we lose with this strategy
        
        numberplayed += 1
    
    percentwin = round(sum(scorekeeper)/numberplays, 4) * 100  #calculate our percentage of wins
    
    print("Switching strategy (2-door version) wins {}% of the time.".format(percentwin))

In [218]:
playgame2(100)

Switching strategy (2-door version) wins 43.0% of the time.


In [219]:
playgame2(1000)

Switching strategy (2-door version) wins 49.1% of the time.


In [220]:
playgame2(10000)

Switching strategy (2-door version) wins 49.730000000000004% of the time.


In [221]:
playgame2(100000)

Switching strategy (2-door version) wins 50.28% of the time.


## OK, so the universe still works the way I think it should - if only two doors are the starting condition, then switching vs. not is roughly a 50/50 proposition.
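## The 50/50 figure for the two-door game can also be counted exactly rather than simulated. A small sketch, assuming the prize door and the guess are each uniformly random (the function name `two_door_switch_probability` is hypothetical):

```python
from fractions import Fraction
from itertools import product

def two_door_switch_probability():
    """Exact chance that switching wins when the game starts with two doors."""
    wins = Fraction(0)
    # Four equally likely (winner, guess) pairs, each with probability 1/4.
    for winner, guess in product((1, 2), repeat=2):
        switched = 2 if guess == 1 else 1  # the only other door
        if switched == winner:
            wins += Fraction(1, 4)
    return wins

print(two_door_switch_probability())  # 1/2
```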

# BUT WAIT!!!

## WHY IS THIS DIFFERENT? And can we break the statistics? Let's find out.

## Let's add a wrinkle to the Monty Hall Problem and make it a two-player game. The first player picks a door, one incorrect option is revealed and removed. But this time, instead of that player making the choice whether to switch or not, we switch players. The new player has no awareness of the starting conditions of the game. The only thing they know is that there are two doors, and the first player chose one of them (and this choice is known.) The second player is then given the option - do they stick with the original player's choice, or do they switch to the other door?

## Put another way - is it the player's knowledge of the initial conditions that affects the outcome (as weird as that sounds), or will the outcome for the second player still depend on the initial conditions, even though they did not know them?

## To start, let's play the first half of the game, whose outputs will serve as the inputs for the second half.

In [249]:
winner = pickwinner() #pick winner at random
print(winner)
guess = pickguess()   #pick guess at random
print(guess)

def MH_interrupted(winner, guess):
    #print(winner)
    reveal = revealfirst(winner, guess) #reveal a non-winning door
        
    doors = [1, 2, 3]
    doors.remove(reveal) #remove the revealed door from the final choice options 
    #print(doors)
    
    return doors

3
3


In [250]:
doortochoose = MH_interrupted(winner, guess)

In [251]:
doortochoose

[2, 3]

## The list returned by MH_interrupted always contains the winning door. If the winning door is different from the guess, the guess is the other door in the list. If the guess is the winning door, the other door is one of the two losing doors, chosen at random.
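## These properties can be checked exhaustively instead of taken on faith. The sketch below re-implements the host's rule on its own (the helper names `remaining_doors` and `check_all_cases` are hypothetical and not part of the notebook) and walks through every winner, guess, and legal reveal:

```python
from itertools import product

def remaining_doors(winner, guess):
    """Yield every two-door list the host could leave for a given winner and guess."""
    for reveal in (1, 2, 3):
        if reveal != winner and reveal != guess:  # the host never opens these
            yield [d for d in (1, 2, 3) if d != reveal]

def check_all_cases():
    """Return True if the winner and the guess survive the reveal in every game."""
    for winner, guess in product((1, 2, 3), repeat=2):
        for doors in remaining_doors(winner, guess):
            if winner not in doors or guess not in doors:
                return False
    return True

print(check_all_cases())  # True
```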

## Now we build a new function that uses this list of doors and the initial guess as starting points, and then switches to the other option and evaluates whether this is a winner or loser.

In [292]:
def final_guess(doors, winner, guess):
    
    #print("doors = {}".format(doors))
    #print("winner = {}".format(winner))
    #print("first guess = {}".format(guess))
    
    if guess == doors[0]:
        switch = doors[1]
    else:
        switch = doors[0]
    
    #print("switched guess = {}".format(switch))
                
    if switch == winner:
        return 1
    else:
        return 0 

## Now we put this all together.

In [293]:
def playgame4(numberplays):
    
    scorekeeper = []
    numberplayed = 0
    
    while numberplayed < numberplays:
        
        winner = pickwinner()
        guess = pickguess()
        doors = MH_interrupted(winner, guess)
        
        outcome = final_guess(doors, winner, guess)
        
        scorekeeper.append(outcome)
        numberplayed += 1
    
    percentwin = round(sum(scorekeeper)/numberplays, 4) * 100  #calculate our percentage of wins
    
    print("Switching strategy (2-door MH_interrupted version) wins {}% of the time.".format(percentwin))    
    

In [294]:
playgame4(1000)

Switching strategy (2-door MH_interrupted version) wins 66.10000000000001% of the time.


## OK, so it looks like the 2/3 result still holds when the game is split into two halves. Let's try the game with our starting point as two doors:

In [295]:
def playgame5(numberplays):
    
    scorekeeper = []
    numberplayed = 0
    
    while numberplayed < numberplays:
        
        winner = pickwinner2()
        guess = pickguess2()
        doors = [1,2] #Changed the lines above to the two-door starting point, and then doors is always known
        
        outcome = final_guess(doors, winner, guess)
        
        scorekeeper.append(outcome)
        numberplayed += 1
    
    percentwin = round(sum(scorekeeper)/numberplays, 4) * 100  #calculate our percentage of wins
    
    print("Switching strategy (2-door initial start version) wins {}% of the time.".format(percentwin))    

In [296]:
playgame5(1000)

Switching strategy (2-door initial start version) wins 48.1% of the time.


## OK, weirdly, stats is holding up here! What happens if we abstract the labels? For example, the original doors 1, 2, and 3 get reduced to two doors, and then those two doors are relabeled 'A' and 'B'. This shouldn't change the outcome, but will it?

In [303]:
def playgame6(numberplays):
    
    scorekeeper = []
    numberplayed = 0
    
    while numberplayed < numberplays:
        
        winner = pickwinner()
        guess = pickguess()
        doors = MH_interrupted(winner, guess)
        #print(doors)
        
        if winner == doors[0]:
            winner = 'A'
        else:
            winner = 'B'
        
        if guess == doors[0]:
            guess = 'A'
        else:
            guess = 'B'
        
        doors[0] = 'A'
        doors[1] = 'B'
        
        #print(doors)
        
        outcome = final_guess(doors, winner, guess)
        
        scorekeeper.append(outcome)
        numberplayed += 1
    
    percentwin = round(sum(scorekeeper)/numberplays, 4) * 100  #calculate our percentage of wins
    
    print("Switching strategy (2-door MH_interrupted version) wins {}% of the time.".format(percentwin))    
    

In [304]:
playgame6(100)

Switching strategy (2-door MH_interrupted version) wins 67.0% of the time.


## So switching the labels on a per-iteration basis doesn't affect anything. What if we break the two steps completely apart?

In [305]:
def builddata(numberplays):
    
    output = []
    numberplayed = 0
    
    while numberplayed < numberplays:
        
        winner = pickwinner()
        guess = pickguess()
        doors = MH_interrupted(winner, guess)
        #print(doors)
        
        if winner == doors[0]:
            winner = 'A'
        else:
            winner = 'B'
        
        if guess == doors[0]:
            guess = 'A'
        else:
            guess = 'B'
        
        doors[0] = 'A'
        doors[1] = 'B'
        
        output.append([winner, guess, doors])
        
        numberplayed += 1
    
    return output

In [332]:
startingpoint = builddata(1000)

In [336]:
startingpoint[:20]

[['B', 'B', ['A', 'B']],
 ['B', 'B', ['A', 'B']],
 ['B', 'B', ['A', 'B']],
 ['B', 'A', ['A', 'B']],
 ['B', 'A', ['A', 'B']],
 ['B', 'A', ['A', 'B']],
 ['A', 'B', ['A', 'B']],
 ['A', 'B', ['A', 'B']],
 ['A', 'B', ['A', 'B']],
 ['A', 'A', ['A', 'B']],
 ['A', 'A', ['A', 'B']],
 ['A', 'A', ['A', 'B']],
 ['B', 'B', ['A', 'B']],
 ['A', 'B', ['A', 'B']],
 ['B', 'A', ['A', 'B']],
 ['A', 'B', ['A', 'B']],
 ['A', 'A', ['A', 'B']],
 ['A', 'A', ['A', 'B']],
 ['A', 'B', ['A', 'B']],
 ['A', 'B', ['A', 'B']]]

In [334]:
scorekeeper = []
for correct, chosen, dooroptions in startingpoint: #each row is [winner, guess, doors]
    outcome = final_guess(dooroptions, correct, chosen)
    scorekeeper.append(outcome)

In [335]:
import numpy as np
np.mean(scorekeeper)

0.668
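## One way to read this result: the second player above still wins about 2/3 of the time because they are told which door the first player chose, and they move off it. A second player who ignores that and picks between the two remaining doors blindly should win only about half the time. The sketch below compares the two on the same games. It is an illustration under those assumptions, not the notebook's own code: `second_player` and its `seed` parameter are hypothetical names.

```python
import random

def second_player(numberplays, seed=None):
    """Compare two second-player strategies on the same interrupted games."""
    rng = random.Random(seed)  # seed is an added assumption, for repeatable runs
    switch_wins = 0
    coinflip_wins = 0
    for _ in range(numberplays):
        winner = rng.randint(1, 3)
        guess = rng.randint(1, 3)
        # Host opens a door that is neither the prize nor the first player's guess.
        reveal = rng.choice([d for d in (1, 2, 3) if d != winner and d != guess])
        doors = [d for d in (1, 2, 3) if d != reveal]
        # Strategy 1: use the knowledge of the first pick and move off it.
        other = doors[0] if guess == doors[1] else doors[1]
        switch_wins += other == winner
        # Strategy 2: ignore the first pick and choose either door blindly.
        coinflip_wins += rng.choice(doors) == winner
    return switch_wins / numberplays, coinflip_wins / numberplays

print(second_player(100000))  # roughly (0.67, 0.50); exact values vary per run
```

## The doors do not change with the player's knowledge, but the decision does: the first pick is the only thing that tells the two remaining doors apart.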

In [343]:
def builddata2(numberplays):
    
    output = []
    numberplayed = 0
    
    while numberplayed < numberplays:
        
        winner = pickwinner()
        guess = pickguess()
        doors = MH_interrupted(winner, guess)
        #print(doors)
        
        output.append([winner, guess])
        
        numberplayed += 1
    
    return output

In [363]:
start2 = builddata2(1000)

In [364]:
TorF = []
for firstwinner, firstguess in start2: #each row is [winner, guess]
    if firstwinner == firstguess:
        TorF.append(1) #the first guess was already the winner
    else:
        TorF.append(0)

In [365]:
np.mean(TorF)

0.325