# Intro to Bayesian Statistics Lab

Complete the following set of exercises to solidify your knowledge of Bayesian statistics and Bayesian data analysis.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## 1. Cookie Problem

Suppose we have two bowls of cookies. Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies. Bowl 2 contains 20 of each. You randomly pick one cookie out of one of the bowls, and it is vanilla. Use Bayes' Theorem to calculate the probability that the vanilla cookie you picked came from Bowl 1.

In [2]:
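def bayes_rule(priors, likelihoods):
    # Unnormalized posterior: prior times likelihood for each hypothesis
    joint = np.multiply(priors, likelihoods)
    # Dividing by the marginal probability of the data normalizes the result
    post = np.divide(joint, np.sum(joint))
    return post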

priors = [1/2, 1/2]
likelihoods = [0.75, 0.5]

bayes_rule(priors, likelihoods)
#p1= 0.6

array([0.6, 0.4])
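
The same result can be checked by hand. As a worked sketch, write $B_1$ and $B_2$ for the two bowls and $V$ for drawing a vanilla cookie (this notation is introduced here for illustration and is not part of the lab):

```latex
P(B_1 \mid V)
  = \frac{P(B_1)\,P(V \mid B_1)}{P(B_1)\,P(V \mid B_1) + P(B_2)\,P(V \mid B_2)}
  = \frac{\tfrac{1}{2} \cdot \tfrac{3}{4}}{\tfrac{1}{2} \cdot \tfrac{3}{4} + \tfrac{1}{2} \cdot \tfrac{1}{2}}
  = \frac{0.375}{0.625}
  = 0.6
```

The denominator, 0.625, is the marginal probability of drawing a vanilla cookie, which is the quantity `bayes_rule` divides by.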

What is the probability that it came from Bowl 2?

In [3]:
#p2 = 0.4

What if the cookie you had picked was chocolate? What are the probabilities that the chocolate cookie came from Bowl 1 and Bowl 2 respectively?

In [4]:
likelihoods = [0.25, 0.5]

bayes_rule(priors, likelihoods)

#p1 = 1/3
#p2 = 2/3

array([0.33333333, 0.66666667])

## 2. Candy Problem

Suppose you have two bags of candies:

- In Bag 1, the mix of colors is:
    - Brown - 30%
    - Yellow - 20%
    - Red - 20%
    - Green - 10%
    - Orange - 10%
    - Tan - 10%
    
- In Bag 2, the mix of colors is:
    - Blue - 24%
    - Green - 20%
    - Orange - 16%
    - Yellow - 14%
    - Red - 13%
    - Brown - 13%
    
Not knowing which bag is which, you randomly draw one candy from each bag. One is yellow and one is green. What is the probability that the yellow one came from Bag 1?

*Hint: For the likelihoods, you will need to multiply the probabilities of drawing yellow from one bag and green from the other bag and vice versa.*

In [10]:
priors = [1/2, 1/2]
likelihoods = [0.2*0.2, 0.1*0.14]

bayes_rule(priors, likelihoods)

array([0.74074074, 0.25925926])

What is the probability that the yellow candy came from Bag 2?

In [11]:
b1_g = 2 * 0.1 * 0.14
b1_y = 2 * 0.2 * 0.2
b1_given_y = b1_y/ (b1_g + b1_y) 

1 - b1_given_y


0.2592592592592592

What are the probabilities that the green one came from Bag 1 and Bag 2 respectively?

In [12]:
#b1_g = b2_y = 2 * 0.1 * 0.14
#b1_y = b2_g = 2 * 0.2 * 0.2
#b1_given_g = b1_g/ (b1_g + b2_g) 
#b2_given_g = b2_g/ (b2_g + b1_g) 

priors = [1/2, 1/2]
likelihoods = [0.1*0.14, 0.2*0.2]

bayes_rule(priors, likelihoods)

array([0.25925926, 0.74074074])
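
The likelihoods used in this section can be spelled out with named hypotheses. The following is a minimal sketch in plain Python, where the dictionary names `bag1` and `bag2` and the labels "Hypothesis A" and "Hypothesis B" are illustrative choices rather than part of the lab:

```python
# Colour proportions from the problem statement (only the two colours drawn).
bag1 = {"yellow": 0.20, "green": 0.10}
bag2 = {"yellow": 0.14, "green": 0.20}

# Hypothesis A: yellow came from Bag 1 and green came from Bag 2.
# Hypothesis B: yellow came from Bag 2 and green came from Bag 1.
like_a = bag1["yellow"] * bag2["green"]
like_b = bag2["yellow"] * bag1["green"]

# With equal priors of 1/2 the priors cancel, so each posterior is
# simply that hypothesis's likelihood divided by the sum of both.
post_a = like_a / (like_a + like_b)
post_b = like_b / (like_a + like_b)

print(f"P(yellow from Bag 1, green from Bag 2) = {post_a:.4f}")
print(f"P(yellow from Bag 2, green from Bag 1) = {post_b:.4f}")
```

Because "yellow from Bag 1" and "green from Bag 2" describe the same hypothesis, the answers to the yellow and green questions are mirror images of each other.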

## 3. Monty Hall Problem

Suppose you are a contestant on the popular game show *Let's Make a Deal*. The host of the show (Monty Hall) presents you with three doors - Door A, Door B, and Door C. He tells you that there is a sports car behind one of them and if you choose the correct one, you win the car!

You select Door A, but then Monty makes things a little more interesting. He opens Door B to reveal that there is no sports car behind it and asks you if you would like to stick with your choice of Door A or switch your choice to Door C. Given this new information, what are the probabilities of you winning the car if you stick with Door A versus if you switch to Door C?

In [18]:
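# Sticking with Door A wins with probability 1/3; switching to Door C wins with probability 2/3.
# Assumptions: Monty knows where the car is, never opens the door hiding it,
# and picks at random when he has two doors to choose from.
# P(opens B | car behind A) = 1/2, P(opens B | car behind B) = 0, P(opens B | car behind C) = 1
# P(A | opens B) = (1/3 * 1/2) / (1/3 * 1/2 + 1/3 * 0 + 1/3 * 1) = 1/3
# P(C | opens B) = (1/3 * 1)   / (1/3 * 1/2 + 1/3 * 0 + 1/3 * 1) = 2/3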
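
The Monty Hall probabilities can be checked both exactly and by simulation. The sketch below assumes the standard rules: Monty knows where the car is, never reveals it, and chooses at random when two doors are available to him. The function name `simulate` and its parameters are hypothetical, introduced only for this illustration:

```python
from fractions import Fraction
import random

# Hypotheses: the car is behind Door A, B, or C, each equally likely a priori.
priors = [Fraction(1, 3)] * 3

# Likelihood that Monty opens Door B after the contestant picks Door A:
#   car behind A -> he picks B or C at random (assumed), so 1/2
#   car behind B -> he never reveals the car, so 0
#   car behind C -> B is the only door he can open, so 1
likelihoods = [Fraction(1, 2), Fraction(0), Fraction(1)]

joint = [p * l for p, l in zip(priors, likelihoods)]
posterior = [j / sum(joint) for j in joint]
print("P(car behind A, B, C | Monty opens B):", posterior)


def simulate(n_games=100_000, seed=0):
    """Estimate win rates for sticking and switching, given pick A and Monty opening B."""
    rng = random.Random(seed)
    stick = switch = games = 0
    for _ in range(n_games):
        car = rng.choice("ABC")
        # Monty opens a door that is neither the contestant's pick (A) nor the car.
        opened = rng.choice([d for d in "BC" if d != car])
        if opened != "B":
            continue  # keep only games that match the scenario in the text
        games += 1
        stick += car == "A"
        switch += car == "C"
    return stick / games, switch / games


stick_rate, switch_rate = simulate()
print("stick:", stick_rate, "switch:", switch_rate)
```

The two doors are not equally likely after the reveal because Monty's choice depends on where the car is, so opening Door B is evidence in favour of Door C.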

## 4. Bayesian Analysis 

Suppose you work for a landscaping company, and they want to advertise their service online. They create an ad and sit back waiting for the money to roll in. On the first day, the ad sends 100 visitors to the site and 14 of them sign up for landscaping services. Create a generative model to come up with the posterior distribution and produce a visualization of what the posterior distribution would look like given the observed data.

In [20]:
# A uniform prior treats every sign-up rate between 0 and 1 as equally likely
# before any data is observed.

n_draws = 100000
prior = pd.Series(np.random.uniform(0, 1, size=n_draws))
observed = 14

# For each rate drawn from the prior, simulate 100 visitors arriving at the site
# and count how many of them sign up.

def generative_model(param):
    result = np.random.binomial(100, param)
    return result

# Empty list that will hold the simulated sign-up counts
sim_data = list()

# Populate the list with one simulated result per prior draw

for p in prior:
    sim_data.append(generative_model(p))

# Keep only the prior draws whose simulated count matches the observed count

posterior = prior[list(map(lambda x: x == observed, sim_data))]
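
The exercise also asks for a visualization of the posterior. The following sketch is one possible way to do it, not the lab's reference solution: it repeats the rejection-sampling step in vectorized form and wraps a histogram in a helper. The names `simulate_posterior` and `plot_posterior`, the seed, and the bin count are all assumptions made for illustration.

```python
import numpy as np


def simulate_posterior(n_draws=100_000, n_visitors=100, observed=14, seed=0):
    """Rejection sampling: keep prior draws whose simulated sign-ups equal `observed`."""
    rng = np.random.default_rng(seed)
    prior = rng.uniform(0, 1, size=n_draws)
    # One binomial draw per candidate rate; NumPy broadcasts over the array of rates.
    sim_data = rng.binomial(n_visitors, prior)
    return prior[sim_data == observed]


def plot_posterior(samples, bins=30):
    """Histogram of the accepted rates (matplotlib is imported only when plotting)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.hist(samples, bins=bins)
    ax.set_xlabel("Sign-up rate")
    ax.set_ylabel("Number of accepted draws")
    ax.set_title("Posterior after 14 sign-ups from 100 visitors")
    return ax


post = simulate_posterior()
print(len(post), post.mean())
```

In the notebook, calling `plot_posterior(posterior)` followed by `plt.show()` would draw the histogram for the `posterior` series computed in this section.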


Produce a set of descriptive statistics for the posterior distribution.

In [21]:
posterior.describe()


count    1050.000000
mean        0.147020
std         0.033614
min         0.043286
25%         0.123005
50%         0.144831
75%         0.168741
max         0.256592
dtype: float64

What is the 90% credible interval range?

In [23]:
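# mean is 0.147
# median is 0.145

# The 0.025 and 0.975 quantiles bound a 95% credible interval, not a 90% one.
# For a 90% credible interval, use posterior.quantile(.05) and posterior.quantile(.95).
print(posterior.quantile(.025), '|', posterior.quantile(.975))
# the sign-up rate lies between these two values with 95% posterior probability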


0.0894708303978861 | 0.21969348968376157
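
To compare a 90% interval with a 95% one without rerunning the simulation, the sketch below swaps in a different technique from the one used in the notebook: with a uniform prior and 14 sign-ups out of 100 visitors, the exact posterior is a Beta(15, 87) distribution, so it can be sampled directly. The helper name `central_interval`, the seed, and the sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Uniform prior + 14 successes in 100 trials -> Beta(15, 87) posterior (conjugacy).
samples = rng.beta(14 + 1, 86 + 1, size=200_000)


def central_interval(draws, mass):
    """Equal-tailed interval containing `mass` of the draws."""
    tail = (1 - mass) / 2
    return np.quantile(draws, tail), np.quantile(draws, 1 - tail)


lo90, hi90 = central_interval(samples, 0.90)  # 5th to 95th percentile
lo95, hi95 = central_interval(samples, 0.95)  # 2.5th to 97.5th percentile
print(f"90%: {lo90:.3f} to {hi90:.3f}")
print(f"95%: {lo95:.3f} to {hi95:.3f}")
```

The 90% interval always sits inside the 95% interval, because it excludes 5% of the posterior mass in each tail rather than 2.5%.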


What is the Maximum Likelihood Estimate?

In [31]:
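# Round the posterior draws to whole percents, then take the most frequent value
# (the posterior mode) and its relative frequency among the accepted draws.
# With a uniform prior, the posterior mode approximates the maximum likelihood estimate.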

rounded = posterior.round(2)
mode = rounded.mode()[0]
probability = list(rounded).count(mode)/len(rounded)
print('Maximum Likelihood Estimate: ', mode, '|',probability)

#print(list(rounded)) = probabilities rounded
#print(list(rounded).count(mode)) = number of times modal value (0.15) appears
#print(mode)

Maximum Likelihood Estimate:  0.15 | 0.12571428571428572
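
The mode of rounded posterior draws is a noisy, simulation-based approximation. For comparison, the maximum likelihood estimate of a binomial rate can be found directly by maximizing the likelihood, and it equals the observed proportion. The sketch below is a grid search under that assumption. The names `log_likelihood` and `grid` are illustrative and not part of the lab.

```python
import math

n_visitors, signups = 100, 14


def log_likelihood(p):
    """Binomial log-likelihood, dropping the constant term that does not depend on p."""
    return signups * math.log(p) + (n_visitors - signups) * math.log(1 - p)


# Evaluate candidate rates 0.001, 0.002, ..., 0.999 and keep the best one.
grid = [i / 1000 for i in range(1, 1000)]
mle = max(grid, key=log_likelihood)

print("grid-search MLE:", mle)
print("closed form k/n:", signups / n_visitors)
```

Both approaches agree on 14/100, which is close to, but not the same as, the value obtained by rounding the simulated posterior.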
