This repository presents three vectorised simulations to the “Monty Hall Problem” in R using:
- Base R.
- A tinyverse approach using the
data.table
package. - A tidyverse approach using
dplyr
andtidyr
packages.
There is a good write up and discussion about the Monty Hall Problem on Wikipedia from which I quote:
The Monty Hall problem is a brain teaser, in the form of a probability puzzle, loosely based on the American television game show Let’s Make a Deal and named after its original host, Monty Hall.
Question
Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?
Standard Assumptions
- The host must always open a door that was not picked by the contestant.
- The host must always open a door to reveal a goat and never the car.
- The host must always offer the chance to switch between the originally chosen door and the remaining closed door.
Source: https://en.Wikipedia.org/wiki/Monty_Hall_problem
Use a vectorised approach to simulate the probability of winning for both, staying with the chosen door and switching door when offered.
The idea is to simulate multiple games (and outcomes) to evaluate whether staying or switching would be better.
We set the following parameters to make these simulations reproducible :
set.seed(663948)
simNum <- 100000 # Number of simulations
doorNum <- 3 # Number of doors (not strictly necessary)
Each of the simulations below will create the following variables:
sim
– simulation number.door
– door number.true
– behind true door: 1 = car & 0 = goat.guess
– contestant’s chosen door: 1 = car & 0 = goat.switch
– door offered to the contestant to switch to.
These are then summarised as follows:
stayWin
– proportion of winning by staying withguess
door.switchWin
– proportion of winning by switching toswitch
door.
The first simulation only uses Base R functions. I call it “R zeroverse”
as CRAN contributed packages are not allowed, just those that come with
R by default (e.g. base
, graphics
stats
).
# A function to simulate randomly chosen doors for the truth door and the guess.
rDoors <- function(sims){
unlist(lapply(1:sims, function(x) sample(c(0, 0, 1), replace = FALSE)))
}
# Note that for this method to work the data must be sorted by "sim" then the door
# (hence the order in the expand.grid() function).
mhbrGrid <- expand.grid(door=1:doorNum, sim=1:simNum)
mhbrGrid$true <- rDoors(simNum)
mhbrGrid$guess <- rDoors(simNum)
mhbr <- mhbrGrid[with(mhbrGrid, order(sim, -guess, true)), ]
mhbr$switch <- rep(c(0, 0, 1), simNum)
mhbr$stayWin <- with(mhbr, guess*true)
mhbr$switchWin <- with(mhbr, switch*true)
mhBaseR <- rbind(list(stayWin = sum(mhbr$stayWin)/simNum,
switchWin = sum(mhbr$switchWin)/simNum))
mhBaseR
## stayWin switchWin
## [1,] 0.3315 0.6685
A neat tinyverse simulation using
data.table
package.
library(data.table)
simdt <- CJ(sim=1:simNum, door=1:doorNum)
mhdt <- simdt[, `:=`(guess=sample(c(0, 0, 1)), true=sample(c(0, 0, 1)), switch=0), .(sim)][
order(sim, -guess, true)][
rowid(sim)==doorNum, switch:=1][
, .(stayWin = sum(guess*true)/simNum, switchWin = sum(switch*true)/simNum)]
mhdt
## stayWin switchWin
## 1: 0.33156 0.66844
A neat tidyverse simulation using dplyr
and
tidyr
packages.
library(dplyr)
library(tidyr)
mhtv <- expand_grid(sim=1:simNum, door=1:doorNum) %>%
group_by(sim) %>%
mutate(guess=sample(c(0, 0, 1)), true=sample(c(0, 0, 1))) %>%
arrange(sim, -guess, true) %>%
mutate(switch = as.numeric(row_number(sim)==doorNum)) %>%
ungroup %>%
summarise(stayWin = sum(guess*true)/simNum,
switchWin = sum(switch*true)/simNum)
mhtv
## # A tibble: 1 x 2
## stayWin switchWin
## <dbl> <dbl>
## 1 0.333 0.667
- All three simulations lead to the same conclusion – that probabilistically it is better to switch.
- I provide three vectorised solutions as the R community is diverse
and each member/group has different preferences.
- There are other ways to simulate this problem.
- Using a
for
loop based approach is perfectly fine. - I just wanted code it using a vectorised approach as it was bugging me ;)
Thanks for reading.
Thanks to the R core,
data.table
and
tidyverse authors, maintainers and
contributors.