# Simulation Exercise: Monty Hall Problem

Installing additional Python libraries:

In [2]:

# NumPy is a library for working with arrays
import numpy as np
print("Numpy version:", np.__version__, "(need at least 1.7.1)")

# SciPy implements many different numerical algorithms
import scipy as sp
print("SciPy version:", sp.__version__, "(need at least 0.12.0)")

# Pandas makes working with data tables easier
import pandas as pd
print("Pandas version:", pd.__version__, "(need at least 0.11.0)")

# Module for plotting
import matplotlib
print("Matplotlib version:", matplotlib.__version__)

# SciKit Learn implements several Machine Learning algorithms
import sklearn
print("Scikit-Learn version:", sklearn.__version__)

# Requests is a library for getting data from the Web
import requests
print("requests version:", requests.__version__)

# NetworkX is a library for working with networks
import networkx as nx
print("NetworkX version:", nx.__version__)

# BeautifulSoup is a library to parse HTML and XML documents
import bs4
print("BeautifulSoup version:", bs4.__version__)

# MrJob is a library to run map reduce jobs on Amazon's computers
import mrjob
print("Mr Job version:", mrjob.__version__)

# Pattern has lots of tools for working with data from the internet
import pattern
print("Pattern version:", pattern.__version__)

# Seaborn is a nice library for visualizations
import seaborn
print("Seaborn version:", seaborn.__version__)

Numpy version: 1.10.4 (need at least 1.7.1)
SciPy version: 0.17.0 (need at least 0.12.0)
Pandas version: 0.18.0 (need at least 0.11.0)
Matplotlib version: 1.5.1
Scikit-Learn version: 0.17.1
requests version: 2.9.1
NetworkX version: 1.11
BeautifulSoup version: 4.4.1
Mr Job version: 0.5.1
Pattern version: 2.6
Seaborn version: 0.7.0


## The Problem:


Here's a fun and perhaps surprising statistical riddle, and a good way to get some practice writing Python functions.

In a gameshow, contestants try to guess which of 3 closed doors contains a cash prize (goats are behind the other two doors). Of course, the odds of choosing the correct door are 1 in 3. As a twist, the host of the show occasionally opens a door after a contestant makes his or her choice. This door is always one of the two the contestant did not pick, and is also always one of the goat doors (note that it is always possible to do this, since there are two goat doors). At this point, the contestant has the option of keeping his or her original choice, or switching to the other unopened door. The question is: is there any benefit to switching doors? The answer surprises many people who haven't heard the question before.

We can answer the problem by running simulations in Python. We'll do it in several parts.

First, write a function called `simulate_prizedoor`. This function simulates the location of the prize in many games, returning one random door number (0, 1, or 2) for each of `nsim` games:

In [3]:

def simulate_prizedoor(nsim):
    # One random door (0, 1, or 2) per simulated game; the upper bound 3 is exclusive
    return np.random.randint(0, 3, nsim)

print(simulate_prizedoor(3))


[2 2 1]
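As a quick sanity check, each door should hide the prize in roughly one third of the games. The sketch below assumes the doors are numbered 0, 1, and 2 as in the cell above; `door_frequencies` is a hypothetical helper name, not part of the exercise.

```python
import numpy as np

def simulate_prizedoor(nsim):
    # Same definition as above: one random door (0, 1, or 2) per game
    return np.random.randint(0, 3, nsim)

def door_frequencies(doors):
    # Hypothetical helper: fraction of games in which each door holds the prize
    counts = np.bincount(doors, minlength=3)
    return counts / counts.sum()

freqs = door_frequencies(simulate_prizedoor(30000))
print(freqs)  # each entry should be close to 1/3
```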


Next, let us write a function that simulates the contestant's guesses for `nsim` simulations. Call this function `simulate_guess`. This version always guesses door 1:

In [10]:
def simulate_guess(nsim):
    # Simplest strategy: guess door 1 in every game
    return [1] * nsim
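Always guessing door 1 is only one possible strategy. As an illustration, a contestant who picks a door uniformly at random could be sketched as follows; `simulate_random_guess` is a hypothetical name and not part of the solution above. Since the prize location is already random, this should not change the win rates.

```python
import numpy as np

def simulate_random_guess(nsim):
    # Hypothetical alternative: pick door 0, 1, or 2 uniformly at random per game
    return np.random.randint(0, 3, nsim)

guesses = simulate_random_guess(5)
print(guesses)
```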



Next, write a function, `goat_door`, to simulate randomly revealing one of the goat doors that a contestant didn't pick.

In [39]:
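def goat_door(prizedoors, guesses):
    # For each game, reveal a door that is neither the prize nor the guess.
    # When two goat doors qualify, this picks the lower-numbered one.
    doors = [0, 1, 2]
    goat = np.empty(len(prizedoors), dtype=int)
    for i in range(len(prizedoors)):
        chosen = [prizedoors[i], guesses[i]]
        goat[i] = np.setdiff1d(doors, chosen)[0]
    return goat

print(goat_door(np.array([0, 1, 2, 2, 1]), np.array([1, 1, 1, 1, 1])))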

[2 0 0 0 0]
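Note that the function above takes the first eligible door, so when the contestant has guessed the prize door it always reveals the lower-numbered goat. A variant that chooses at random between the two goats in that case might look like the sketch below. `goat_door_random` is a hypothetical name, and the sketch assumes doors are numbered 0 to 2. The choice does not affect the win percentages, because switching loses in that case whichever goat is shown.

```python
import numpy as np

def goat_door_random(prizedoors, guesses):
    # Hypothetical variant: choose uniformly among the doors that are
    # neither the prize door nor the contestant's guess
    doors = [0, 1, 2]
    goats = np.empty(len(prizedoors), dtype=int)
    for i, (prize, guess) in enumerate(zip(prizedoors, guesses)):
        candidates = np.setdiff1d(doors, [prize, guess])
        goats[i] = np.random.choice(candidates)
    return goats

prizes = np.array([0, 1, 2, 2, 1])
picks = np.array([1, 1, 1, 1, 1])
revealed = goat_door_random(prizes, picks)
print(revealed)
```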


Write a function, `switch_guess`, that represents the strategy of always switching a guess after the goat door is opened.

In [37]:

def switch_guess(guesses, goatdoors):
    # Switch to the one door that is neither the original guess nor the revealed goat
    doors = [0, 1, 2]
    guess = np.empty(len(goatdoors), dtype=int)
    for i in range(len(guesses)):
        chosen = [guesses[i], goatdoors[i]]
        guess[i] = np.setdiff1d(doors, chosen)[0]
    return guess

print(switch_guess(np.array([0, 1, 2]), np.array([1, 2, 1])))

[2 0 0]
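Because the door numbers 0, 1, and 2 sum to 3, the remaining door can also be computed without a loop. This is a sketch under the assumption that the guess and the goat door always differ; `switch_guess_vectorized` is a hypothetical name.

```python
import numpy as np

def switch_guess_vectorized(guesses, goatdoors):
    # Doors 0, 1, 2 sum to 3, so the remaining door is 3 minus the other two
    return 3 - np.asarray(guesses) - np.asarray(goatdoors)

switched = switch_guess_vectorized(np.array([0, 1, 2]), np.array([1, 2, 1]))
print(switched)  # [2 0 0], the same result as the loop version
```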


Last function: write a `win_percentage` function that takes an array of `guesses` and an array of `prizedoors`, and returns the percentage of correct guesses.

In [47]:

def win_percentage(guesses, prizedoors):
    # Fraction of games where the guess matches the prize door, as a percentage
    return 100 * np.equal(guesses, prizedoors).mean()
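A small hand-checkable example, using made-up inputs, shows the expected behaviour: two of the four guesses below match the prize door.

```python
import numpy as np

def win_percentage(guesses, prizedoors):
    # Same definition as above
    return 100 * np.equal(guesses, prizedoors).mean()

# Games 1 and 3 are wins (0 == 0 and 2 == 2); games 2 and 4 are losses
pct = win_percentage(np.array([0, 1, 2, 1]), np.array([0, 2, 2, 0]))
print(pct)  # 50.0
```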


Now, put it together. Simulate 10000 games where the contestant keeps his original guess, and 10000 games where the contestant switches his door after a goat door is revealed. Compute the percentage of time the contestant wins under either strategy. Is one strategy better than the other?

In [48]:
nsim = 10000

# Strategy 1: keep the original guess
print("Win percentage when keeping original door")
print(win_percentage(simulate_guess(nsim), simulate_prizedoor(nsim)))

# Strategy 2: switch after a goat door is revealed
prizedoors = simulate_prizedoor(nsim)
guess = simulate_guess(nsim)
goats = goat_door(prizedoors, guess)

guess = switch_guess(guess, goats)

print("Win percentage when switching doors")
print(win_percentage(guess, prizedoors))

Win percentage when keeping original door
33.25
Win percentage when switching doors
67.77
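The simulated percentages are not exactly 33.3 and 66.7 because each run is a finite random sample. A rough estimate of the sampling noise, assuming each game is an independent trial with win probability `p`, is the binomial standard error; `standard_error_pct` is a hypothetical helper name.

```python
import math

def standard_error_pct(p, nsim):
    # Standard error of a simulated win percentage, in percentage points
    return 100 * math.sqrt(p * (1 - p) / nsim)

se = standard_error_pct(1 / 3, 10000)
print(round(se, 2))  # 0.47
```

With 10000 games, run-to-run differences of around a percentage point are therefore unsurprising.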


Many people find this answer counter-intuitive. Famously, PhD mathematicians have incorrectly claimed the result must be wrong (clearly, none of them knew Python).
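The simulation can be cross-checked by enumerating every equally likely combination of prize door and first guess. This sketch assumes the host always opens a goat door the contestant did not pick, so switching wins exactly when the first guess was wrong; `exact_win_probabilities` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

def exact_win_probabilities():
    # Enumerate all 9 equally likely (prize, guess) pairs
    cases = list(product(range(3), repeat=2))
    weight = Fraction(1, len(cases))
    keep_wins = Fraction(0)
    switch_wins = Fraction(0)
    for prize, guess in cases:
        if guess == prize:
            keep_wins += weight    # keeping wins only if the first guess was right
        else:
            switch_wins += weight  # the only unopened door left hides the prize
    return keep_wins, switch_wins

keep, switch = exact_win_probabilities()
print(keep, switch)  # 1/3 2/3
```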
