Skip to content

saghirb/Monty_Hall_Vectorised_Simulations_R

Repository files navigation

Monty Hall Problem - Vectorised Simulations in R

Monty Hall Problem Image

This repository presents three vectorised simulations to the “Monty Hall Problem” in R using:

  1. Base R.
  2. A tinyverse approach using the data.table package.
  3. A tidyverse approach using dplyr and tidyr packages.

Monty Hall Problem

There is a good write up and discussion about the Monty Hall Problem on Wikipedia from which I quote:

The Monty Hall problem is a brain teaser, in the form of a probability puzzle, loosely based on the American television game show Let’s Make a Deal and named after its original host, Monty Hall.

Question

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

Standard Assumptions

  • The host must always open a door that was not picked by the contestant.
  • The host must always open a door to reveal a goat and never the car.
  • The host must always offer the chance to switch between the originally chosen door and the remaining closed door.

Source: https://en.Wikipedia.org/wiki/Monty_Hall_problem

Objective

Use a vectorised approach to simulate the probability of winning for both, staying with the chosen door and switching door when offered.

Simulations

The idea is to simulate multiple games (and outcomes) to evaluate whether staying or switching would be better.

We set the following parameters to make these simulations reproducible :

set.seed(663948)
simNum <- 100000  # Number of simulations
doorNum <- 3      # Number of doors (not strictly necessary)

Each of the simulations below will create the following variables:

  • sim – simulation number.
  • door – door number.
  • true – behind true door: 1 = car & 0 = goat.
  • guess – contestant’s chosen door: 1 = car & 0 = goat.
  • switch – door offered to the contestant to switch to.

These are then summarised as follows:

  • stayWin – proportion of winning by staying with guess door.
  • switchWin – proportion of winning by switching to switch door.

Base R (R Zeroverse) Simulation

The first simulation only uses Base R functions. I call it “R zeroverse” as CRAN contributed packages are not allowed, just those that come with R by default (e.g. base, graphics stats).

# A function to simulate randomly chosen doors for the truth door and the guess.
rDoors <- function(sims){
  unlist(lapply(1:sims, function(x) sample(c(0, 0, 1), replace = FALSE)))
}

# Note that for this method to work the data must be sorted by "sim" then the door 
# (hence the order in the expand.grid() function).
mhbrGrid <- expand.grid(door=1:doorNum, sim=1:simNum)
mhbrGrid$true <- rDoors(simNum)
mhbrGrid$guess <- rDoors(simNum)

mhbr <- mhbrGrid[with(mhbrGrid, order(sim, -guess, true)), ]
mhbr$switch <- rep(c(0, 0, 1), simNum)
mhbr$stayWin <- with(mhbr, guess*true)
mhbr$switchWin <- with(mhbr, switch*true)

mhBaseR <- rbind(list(stayWin = sum(mhbr$stayWin)/simNum, 
                      switchWin = sum(mhbr$switchWin)/simNum))
mhBaseR
##      stayWin switchWin
## [1,] 0.3315  0.6685

Tinyverse (data.table) Simulation

A neat tinyverse simulation using data.table package.

library(data.table)

simdt <- CJ(sim=1:simNum, door=1:doorNum)

mhdt <-  simdt[, `:=`(guess=sample(c(0, 0, 1)), true=sample(c(0, 0, 1)), switch=0), .(sim)][
    order(sim, -guess, true)][
    rowid(sim)==doorNum, switch:=1][
    , .(stayWin = sum(guess*true)/simNum, switchWin = sum(switch*true)/simNum)]

mhdt
##    stayWin switchWin
## 1: 0.33156   0.66844

Tidyverse (dplyr & tidyr) Simulation

A neat tidyverse simulation using dplyr and tidyr packages.

library(dplyr)
library(tidyr)

mhtv <- expand_grid(sim=1:simNum, door=1:doorNum) %>%
    group_by(sim) %>%
    mutate(guess=sample(c(0, 0, 1)), true=sample(c(0, 0, 1))) %>%
    arrange(sim, -guess, true) %>%
    mutate(switch = as.numeric(row_number(sim)==doorNum)) %>%
    ungroup %>%
    summarise(stayWin = sum(guess*true)/simNum,
              switchWin = sum(switch*true)/simNum)

mhtv
## # A tibble: 1 x 2
##   stayWin switchWin
##     <dbl>     <dbl>
## 1   0.333     0.667

Summary

  • All three simulations lead to the same conclusion – that probabilistically it is better to switch.
  • I provide three vectorised solutions as the R community is diverse and each member/group has different preferences.
    • There are other ways to simulate this problem.
    • Using a for loop based approach is perfectly fine.
    • I just wanted code it using a vectorised approach as it was bugging me ;)

Thanks for reading.

Acknowledgements

Thanks to the R core, data.table and tidyverse authors, maintainers and contributors.

About

Monty Hall Problem - Vectorised Simulations in R

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages