Throughout my data science training, I was presented with several fun coding exercises that involved building functions to explore probability.  Often these were brainteasers of sorts, but they did a great job of showing how a few simple loops could demonstrate probabilistic theory and formulas through extensive simulation.

Below are a few of my favorite exercises, each with a description of the problem and the solution I developed for it.

In [1]:
import numpy as np

### Problem 1 - The Chip Bag Game

A bag contains a chip, known to be either white or black. A white chip is put in, the bag is shaken, and a chip is drawn out, which proves to be white. What is now the chance of drawing a white chip?

In [2]:
#determining the probability of drawing another white chip, given drawing a first white chip.
#the higher the number of games played, the closer our results should approximate the probability of drawing a second
#white chip, given a first white chip being drawn.

def prob_two_whites(num):    #num signifies the number of times this game is played
    counter = 0  #counts the number of iterations or games played
    first_white = 0  #counts how many times the first pull was white
    double_white = 0 #counts how many times the second pull was white, if the first was also white
    while counter < num:    #runs a game until exactly num games have been played
        chip_1 = "white"    #this is the chip we know is white
        chip_2 = np.random.choice(["white","black"])    #chip two has an equal chance of being white or black for each game
        result = np.random.choice([chip_1, chip_2], 2, replace = False) #drawing both chips without replacement
        if result[0] == "white":    #if the first chip pulled is white
            first_white += 1        # add one to the first white
            if result[1] == "white":    #if the second pull was white, given the first was also white
                double_white += 1       # add one to double_white
        counter +=1          #add one to the counter
    return float(double_white)/float(first_white) #returns the proportion of double whites given a first white pull


In [3]:
prob_two_whites(50000)

0.6657340048498415

Given a first white chip being pulled from the bag, we would expect a second white chip to be pulled approximately two-thirds of the time.
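
The two-thirds figure can also be checked exactly with Bayes' rule instead of by simulation. The following is a minimal sketch, assuming (as the simulation does) that the original chip is equally likely to be white or black. The function name `exact_second_white` and its `prior_white` parameter are hypothetical and not part of the original exercise.

```python
from fractions import Fraction

def exact_second_white(prior_white=Fraction(1, 2)):
    """Exact P(second chip is white | first chip drawn is white)."""
    # Case 1: the original chip is white, so the bag holds two white chips.
    #   The first draw is white with probability 1, and the chip left is white.
    # Case 2: the original chip is black, so the bag holds one white, one black.
    #   The first draw is white with probability 1/2, and the chip left is black.
    first_white_orig_white = prior_white * 1
    first_white_orig_black = (1 - prior_white) * Fraction(1, 2)
    p_first_white = first_white_orig_white + first_white_orig_black
    # Only Case 1 leaves a white chip behind after a white first draw.
    return first_white_orig_white / p_first_white

print(exact_second_white())  # prints 2/3
```

The exact answer of 2/3 lines up with the simulated proportion above.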


### Problem 2 - The Weighted Coin Flip Game

Two players are playing a game where they flip a not necessarily fair coin, starting with Player 1. The first person to flip heads wins. The probability that a coin flipped lands on heads is p. What is the probability that Player 1 will win the game?

In [9]:
#arguments are number of games, and the probability the coin flip will result in heads.
#we can set the default probability of flipping a heads to .5, but can alter this as we like to change the odds

def P1_win_prob_weighted_coin_game(num_games, prob_heads=.5):  
    player_one_wins = 0           #set global counter for number of player one wins
    for n in range(0,num_games):  #for every game:
        num_flips = 0     #start with number of flips at zero (whether even or odd determines which player most recently flipped)
        win = 0
        while win == 0:         #if the game has no winner...
            turn = np.random.uniform(0,1)     #a player flips the coin..
            num_flips += 1                    #one is added to the number of flips taken
            if turn <= prob_heads:   #if the turn results in a winning condition (e.g., prob is .6 and value of turn is <= .6)
                if num_flips % 2 != 0:  #if number of flips is odd:
                    player_one_wins += 1   #player one wins
                win += 1                   #indicate win condition was met to move onto the next game.
    return float(player_one_wins)/float(num_games)  #return the proportion of player 1 wins to total games played


In [11]:
P1_win_prob_weighted_coin_game(50000)

0.66588

With a fair coin, we can see the clear advantage of being the first player to flip: Player 1 can be expected to win approximately two-thirds of games played.

In [12]:
P1_win_prob_weighted_coin_game(50000, .25)

0.57402

In [14]:
P1_win_prob_weighted_coin_game(50000, .01)

0.50208

Player 1's advantage shrinks as heads becomes rarer.  For the two players to have a roughly even chance of winning the coin flip game, the coin would have to land heads only about once in every hundred flips, and even then Player 1 keeps a slight edge.  While this would be brutal to test under real-life conditions (especially playing through 50,000 games under these odds), luckily we can have computers simulate these conditions in a fraction of the time!  Isn't technology grand?
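
These simulated values can be compared against a closed form. Player 1 wins on flip 1, 3, 5, and so on, which is a geometric series: with heads probability p and tails probability q = 1 - p, the win probability is p + q²p + q⁴p + ... = p / (1 - q²), which simplifies to 1 / (2 - p). Below is a minimal sketch of that formula; the function name `p1_win_exact` is hypothetical and not part of the original exercise.

```python
def p1_win_exact(prob_heads):
    """Closed-form probability that Player 1 wins the first-to-heads game."""
    if not 0 < prob_heads <= 1:
        raise ValueError("prob_heads must be in (0, 1]")
    q = 1 - prob_heads  # probability of tails on a single flip
    # Sum of the geometric series p + q^2 * p + q^4 * p + ...
    return prob_heads / (1 - q ** 2)

for p in (0.5, 0.25, 0.01):
    print(p, round(p1_win_exact(p), 4))
```

For heads probabilities of 0.5, 0.25, and 0.01 this gives roughly 0.6667, 0.5714, and 0.5025, in line with the simulated results. Because 1 / (2 - p) only reaches 0.5 as p approaches zero, Player 1 always retains some advantage.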

### Problem 3 - The Monty Hall Problem

The "Monty Hall Problem" is a famous problem in statistics based on the game show "Let's Make a Deal." (Monty Hall was the original host of this game show.)

As part of "Let's Make a Deal," there are three doors labeled "A," "B," and "C." Contestants are informed that behind exactly one door, there is a new car. Behind the other two doors are goats. The goal of the contestant is to select the door with the car.

The game goes as follows:
1. The contestant selects a door.
2. The game show host, knowing which door hides the car, opens one of the unselected doors to reveal a goat. (If the contestant selected a door with a goat, the host picks the other door with a goat. If the contestant started by selecting the door with the car, the host picks from the remaining two doors at random.)
3. The host then asks the contestant if they would like to stick with the door originally picked, or switch their choice to the other remaining door.

The task here was to build a function that runs the Let's Make a Deal game by taking:
-  A numeric input as the number of games to be played
- 'A', 'B', or 'C' as the input for the door
- 'K' or 'S' as the input indicating "keep" or "switch" when asked

In [15]:
def monty_hall_game(num_games, door_pick, keep_switch):  
    wins = 0            #counter for wins, starts as zero
    door_pick = door_pick.lower()   #allow the input to be upper or lower case by converting input to lower case
    keep_switch = keep_switch.lower()  #allow the input to be upper or lower case by converting input to lower case
    door_set = ["a", "b", "c"]      #possible doors include a, b, and c
    for n in range(0,num_games):             #for each game played...
        open_door_set = ["a", "b", "c"]  #initially set all doors capable of being opened
        unchosen_door_set = ["a", "b", "c"] #establishing set of doors not chosen - start with all and...
        unchosen_door_set.remove(door_pick)  #remove the door that the player picked
        win_door = np.random.choice(door_set, 1)   #the win door is randomly chosen from the set each game
        if door_pick == win_door:       # if the user picks the win door...
            open_door_set.remove(win_door)  #remove only the win door from the set of doors to open
        else:                           # if the user doesn't pick the win door,
            open_door_set.remove(win_door)  # remove both the win door and the door pick from the set of doors to open
            open_door_set.remove(door_pick)
        open_door = np.random.choice(open_door_set, 1) #from any items left in the open door set, pick one to open
        unchosen_door_set.remove(open_door) #now there is one door left that hasn't been picked or opened.
        if keep_switch == "k":         #if the user decided to keep their first pick
            if door_pick == win_door:  # if the door pick equals the win door:
                wins += 1              # the user wins! Add one to win counter
        if keep_switch == "s":        # if the user picks switch
            if unchosen_door_set[0] == win_door:  #if the unchosen door equals the win door
                wins += 1           # the user wins! Add one to win counter
    return float(wins)/float(num_games)

In [16]:
monty_hall_game(50000, "b", "k")

0.3301

In [17]:
monty_hall_game(50000, "b", "s")

0.66614

Over 50,000 games, the contestant could expect to win about two-thirds of the time if they choose to switch their door pick every time, while only winning about one-third of the time if always choosing to keep their initial pick.  
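
The same result can be reached with a much shorter, vectorized simulation. This is only a sketch: it relies on the observation that, because the host always reveals a goat, switching wins exactly when the first pick was wrong, so the host's step is never simulated explicitly. The function name `monty_hall_vectorized` and the `seed` parameter are illustrative assumptions, not part of the original exercise.

```python
import numpy as np

def monty_hall_vectorized(num_games, door_pick, keep_switch, seed=None):
    """Estimate the win rate for always keeping ('k') or always switching ('s')."""
    keep_switch = keep_switch.lower()
    if keep_switch not in ("k", "s"):
        raise ValueError("keep_switch must be 'k' or 's'")
    pick_index = "abc".index(door_pick.lower())    # door letter -> 0, 1, or 2
    rng = np.random.default_rng(seed)
    cars = rng.integers(0, 3, size=num_games)      # car position for every game
    first_pick_correct = cars == pick_index
    if keep_switch == "k":
        return float(first_pick_correct.mean())    # keeping wins if the pick was right
    return float((~first_pick_correct).mean())     # switching wins if it was wrong

print(monty_hall_vectorized(50000, "b", "k"))
print(monty_hall_vectorized(50000, "b", "s"))
```

Because all games are drawn in a single NumPy call, this version avoids the per-game Python loop, at the cost of hiding the host's behavior inside the shortcut described above.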

### Problem 4 - The Monty Hall Problem Revisited [Weighted Probabilities]

While the standard Monty Hall Problem is well-known, it can also be interesting to see how the game plays out if the prize has uneven probabilities of being behind each of the three doors.

Now, in addition to specifying the number of games, initial door pick, and keep/switch condition, we can pass a list of probabilities for the prize being behind any of the given doors.  We'll also build in a component to ensure that the passed probabilities sum to 1.0.

With this function, we can set the default probabilities to 1/3 for each door, but allow them to be altered through user input.

In [20]:
def weighted_monty_hall_game(num_games, door_pick, keep_switch, prob_list=[.333, .333, .333]):  
    wins = 0            #counter for wins, starts as zero
    door_pick = door_pick.lower()   #allow the input to be upper or lower case by converting input to lower case
    keep_switch = keep_switch.lower()  #allow the input to be upper or lower case by converting input to lower case
    door_set = ["a", "b", "c"]      #possible doors include a, b, and c
    new_probs = []   #ensure probabilities will retain proportionality while summing to 1.0
    new_probs.append(prob_list[0]/sum(prob_list))
    new_probs.append(prob_list[1]/sum(prob_list))
    new_probs.append(prob_list[2]/sum(prob_list))
    for n in range(0,num_games):             #for each game played...
        open_door_set = ["a", "b", "c"]  #initially set all doors capable of being opened
        unchosen_door_set = ["a", "b", "c"] #establishing set of doors not chosen - start with all and...
        unchosen_door_set.remove(door_pick)  #remove the door that the player picked
        win_door = np.random.choice(door_set, 1, p=new_probs)   #the win door is randomly chosen from the set each game
        if door_pick == win_door:       # if the user picks the win door...
            open_door_set.remove(win_door)  #remove only the win door from the set of doors to open
        else:                           # if the user doesn't pick the win door,
            open_door_set.remove(win_door)  # remove both the win door and the door pick from the set of doors to open
            open_door_set.remove(door_pick)
        open_door = np.random.choice(open_door_set, 1) #from any items left in the open door set, pick one to open
        unchosen_door_set.remove(open_door) #now there is one door left that hasn't been picked or opened.
        if keep_switch == "k":         #if the user decided to keep their first pick
            if door_pick == win_door:  # if the door pick equals the win door:
                wins += 1              # the user wins! Add one to win counter
        if keep_switch == "s":        # if the user picks switch (I know I could have written an else statement here...)
            if unchosen_door_set[0] == win_door:  #if the unchosen door equals the win door
                wins += 1           # the user wins! Add one to win counter
    return float(wins)/float(num_games)

In [19]:
weighted_monty_hall_game(50000, "b", "s")

0.66848

In [25]:
#setting impossible probabilities to ensure probability transformation is working properly.
weighted_monty_hall_game(50000, "a", "s", [1.0, 1.0, 1.0])

0.67058

In [21]:
weighted_monty_hall_game(50000, "b", "s", [.25, .5, .25])

0.5027

In [22]:
weighted_monty_hall_game(50000, "b", "k", [.25, .5, .25])

0.50014

In [23]:
weighted_monty_hall_game(50000, "a", "k", [.25, .5, .25])

0.24852

Passing in a list of probabilities really drives home the probabilistic nature of the door choice and keep/switch condition.  Effectively, the contestant's likelihood of winning if switching their choice approaches the sum of the two non-chosen doors' chances of holding the prize, while the likelihood of winning if keeping the initial pick approaches the singular probability initially associated with their chosen door.
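
That observation can be written down directly as a formula: keeping wins with the (normalized) probability of the chosen door, and switching wins with one minus that value. Here is a minimal sketch for comparison with the simulated outputs; the function name `weighted_monty_hall_exact` is hypothetical, and the normalization mirrors the one used in the simulation.

```python
def weighted_monty_hall_exact(door_pick, keep_switch, prob_list=(1, 1, 1)):
    """Exact win probability for the weighted Monty Hall game."""
    doors = ["a", "b", "c"]
    total = float(sum(prob_list))                  # normalize so the weights sum to 1.0
    p_pick = prob_list[doors.index(door_pick.lower())] / total
    if keep_switch.lower() == "k":
        return p_pick                              # keep: win only if the first pick was right
    return 1.0 - p_pick                            # switch: win whenever the first pick was wrong

print(weighted_monty_hall_exact("b", "s", [.25, .5, .25]))  # 0.5
print(weighted_monty_hall_exact("b", "k", [.25, .5, .25]))  # 0.5
print(weighted_monty_hall_exact("a", "k", [.25, .5, .25]))  # 0.25
```

These exact values sit close to the simulated 0.5027, 0.50014, and 0.24852 above.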

### Problem 5 - Bayesian Cookie Jar

Consider two bowls of cookies, each containing a different mix of vanilla and chocolate cookies. Assume that someone who is blindfolded picks a single vanilla cookie out of one of the two bowls.  Given what we know about the cookie distributions in the two bowls, what is the probability that the cookie was selected from each of the bowls?

We'll build a function to address this problem.  The function will:
- take a list for bowl 1 as an input indicating the number of vanilla and chocolate cookies (e.g., [30, 10])
- take a list for bowl 2 as an input indicating the number of vanilla and chocolate cookies (e.g., [20, 20])
- take a list of probabilities as an input indicating how likely one is to select a cookie from bowl 1 or bowl 2 (e.g., it may be more likely to select from bowl 1 compared to bowl 2, irrespective of their actual cookie distributions).


In [15]:
def cookie_monster(bowl1, bowl2, bowl_probs=[.5, .5]):
    #ensure probabilities will retain proportionality while summing to 1.0
    new_probs = []    
    new_probs.append(float(bowl_probs[0])/float(sum(bowl_probs)))
    new_probs.append(float(bowl_probs[1])/float(sum(bowl_probs)))
    #ensure that integer inputs are still treated as float values for each bowl
    new_bowl_1 = [float(i) for i in bowl1]   
    new_bowl_2 = [float(i) for i in bowl2]
    # calculating the base probability for a vanilla cookie to be drawn from either bowl
    # this is influenced by the proportion of vanilla:chocolate cookies in each bowl as well as the probability of
    # selecting a cookie from each of the two bowls
    bowl1_prob = ((new_bowl_1[0]/sum(new_bowl_1)) * new_probs[0]) 
    bowl2_prob = ((new_bowl_2[0]/sum(new_bowl_2)) * new_probs[1])
    # to normalize the probabilities (so they sum to 1.0), we divide the base probability of each bowl by the
    # sum of the two bowls' probabilities.  This is normalizing probabilities using their "marginal likelihood"
    norm_bowl_1 = bowl1_prob/(bowl1_prob + bowl2_prob)
    norm_bowl_2 = bowl2_prob/(bowl1_prob + bowl2_prob)
    return norm_bowl_1, norm_bowl_2


In [16]:
cookie_monster([30, 10], [20, 20])

(0.6, 0.4)

In [17]:
cookie_monster([30, 10], [20, 20], [.2, .8])

(0.27272727272727276, 0.7272727272727273)

In [19]:
#setting impossible probabilities to ensure probability transformation is working properly.
cookie_monster([30, 10], [20, 20], [1, 1])

(0.6, 0.4)

In [22]:
cookie_monster([20, 20], [20, 20])

(0.5, 0.5)

In [21]:
cookie_monster([20, 20], [20, 20], [.333, .667])

(0.333, 0.667)

From the above outputs, we can see the interplay between the cookie distribution within each bowl and the probability of choosing a cookie from either bowl.  The more similar the two bowls' cookie distributions are, the more the likelihood of picking from either bowl drives the outcome. Similarly, the more even our likelihood of choosing from either bowl, the more the cookie distributions within the bowls drive the outcome.

We can understand this as the interplay between the "likelihood" and the "prior probability" in Bayes' theorem, which together determine our "posterior probabilities."
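
In keeping with the simulation approach of the earlier problems, the posterior probabilities can also be estimated empirically: draw many cookies, keep only the vanilla ones, and count which bowl each came from. The following is a minimal sketch under those assumptions; the function name `simulate_cookie_posterior` and the `num_draws` and `seed` parameters are hypothetical additions, not part of the original exercise.

```python
import numpy as np

def simulate_cookie_posterior(bowl1, bowl2, bowl_probs=(0.5, 0.5),
                              num_draws=200_000, seed=None):
    """Estimate P(bowl | vanilla cookie drawn) for each bowl by simulation."""
    rng = np.random.default_rng(seed)
    priors = np.asarray(bowl_probs, dtype=float)
    priors = priors / priors.sum()                 # normalize the prior so it sums to 1.0
    # Likelihood of vanilla in each bowl: vanilla count over total cookies
    p_vanilla = np.array([bowl1[0] / sum(bowl1), bowl2[0] / sum(bowl2)], dtype=float)
    bowls = rng.choice(2, size=num_draws, p=priors)          # bowl used for each draw (0 or 1)
    is_vanilla = rng.random(num_draws) < p_vanilla[bowls]    # was the drawn cookie vanilla?
    vanilla_bowls = bowls[is_vanilla]                        # condition on seeing vanilla
    share_bowl1 = float(np.mean(vanilla_bowls == 0))
    return share_bowl1, 1.0 - share_bowl1

print(simulate_cookie_posterior([30, 10], [20, 20]))
```

With enough draws, the estimate for the first example should land near the analytic (0.6, 0.4) returned by `cookie_monster`.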