#### Copyright 2019 Google LLC.

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Probability

Probability underlies many algorithms used in both traditional programming and machine learning applications. This module discusses the differences between Frequentist and Bayesian approaches, and why probability is not always what it seems.

- Basic Probability
- Conditional Probability
- Frequentist vs. Bayesian
- Bayes' Theorem
- Monty Hall Problem
- Estimation


## Overview

### Learning Objectives

 * Understand what probability is and how to use it as a data scientist.
 * Design an experiment around independence.
 * Code up simple experiments using ```random``` and ```scipy```.
 * Explain how the frequentist measure of probability differs from the Bayesian one.
 * Solve the cookie problem using Bayes' Theorem.
 * Apply Bayesian statistics to solve the Monty Hall Problem.
 * Estimate the probability of a certain outcome of an experiment.

### Prerequisites

* Intermediate Python
* Pandas
* Visualizations

### Estimated Duration

90 minutes

### Grading Criteria

Each exercise is worth 3 points. The rubric for calculating those points is:

| Points | Description |
|--------|-------------|
| 0      | No attempt at exercise |
| 1      | Attempted exercise, but code does not run |
| 2      | Attempted exercise, code runs, but produces incorrect answer |
| 3      | Exercise completed successfully |

There are 8 exercises in this Colab so there are 24 points available. The grading scale will be 24 points.

### Load Packages

In [0]:
%matplotlib inline

import random
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

## Basic Probability

Probability is a number between 0 and 1. So what is the probability of getting heads when you flip a coin? Most would say 50%, which is the estimate for an experiment in which you flip exactly one fair coin.

$P(Heads)=1/2$

In the case of a coin flip, the probability of the event heads is 50%. We could then estimate that if we flip a coin 1000 times, we will get about 500 heads. Most of the time we will not get exactly 500, but the results of the coin flip experiment show that we come pretty close. The deviation from our estimate could be due to some bias in the coin, or simply to random noise. We can, however, expect that most of the time our experiment will yield a fairly even split.
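The deviation from 500 heads can be seen by repeating the 1000-flip experiment a few times. The snippet below is a minimal sketch using only the standard library; the helper name `count_heads`, the seed, and the choice of five repetitions are assumptions made for illustration.

```python
import random


def count_heads(flips, rng):
    """Count heads in `flips` fair tosses (a draw below 0.5 counts as heads)."""
    return sum(rng.random() < 0.5 for _ in range(flips))


rng = random.Random(42)  # fixed seed so the run is repeatable
results = [count_heads(1000, rng) for _ in range(5)]

for heads in results:
    # Show how far each experiment lands from the expected 500 heads.
    print(f"{heads} heads, {heads - 500:+d} from the expected 500")
```

Each run should land near 500 without usually hitting it exactly, which is the "fairly even split" described above.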




## Independence and Dependence

Two events are independent when the outcome of one does not change the probability of the other; otherwise they are dependent. When thinking like a data scientist, it is important to take note of the sequence of events, especially when designing an experiment. Assuming that two events are independent is a convenience used in statistics so that we can test our hypotheses. However, if there is significant evidence that two events are not independent, then it is imperative to determine their order of dependence, as well as their conditional independence. Some events happen in their own chronological order, while others happen as a result of some other trigger.
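One common way to check independence is the product rule: events A and B are independent when $P(A \cap B) = P(A) * P(B)$. The sketch below enumerates the four outcomes of two fair coin flips and applies that rule exactly. The event names and the helpers `prob` and `independent` are illustrative assumptions, not part of this module's exercises.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair coin flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))


def prob(event):
    """Exact probability of an event, given as a predicate over outcomes."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


def independent(a, b):
    """Product rule: P(A and B) == P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)


def first_heads(o):
    return o[0] == "H"


def second_heads(o):
    return o[1] == "H"


def both_heads(o):
    return o == ("H", "H")


print(independent(first_heads, second_heads))  # separate flips
print(independent(first_heads, both_heads))    # second event depends on the first flip
```

The first pair passes the product rule, while the second pair fails it because "both heads" cannot happen unless the first flip is heads.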





## Conditional Probability 

When thinking about conditional probability, it is important to consider independence versus dependence. Say you had a bag of marbles with 5 red and 5 blue. To see how conditional probability works, you can experiment by drawing marbles with and without replacement.

### Marbles with replacement
First, experiment with independent events by using replacement. What is the probability that you:

1. Draw one particular color, either blue or red?
2. Draw a red followed by a blue?
3. Draw two blues in a row?


1. $P(Blue) = P(Red) =1/2$

2. $P(Blue \cap  Red) = P(Red) * P(Blue) = 1/4$

3. $P(Blue \cap Blue) = P(Blue) * P(Blue) = 1/4$
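These values can be checked empirically. The following sketch simulates two draws with replacement from the 5 red / 5 blue bag; the seed, the trial count, and the variable names are assumptions chosen for illustration.

```python
import random

rng = random.Random(0)        # fixed seed so the run is repeatable
bag = ["R"] * 5 + ["B"] * 5   # 5 red and 5 blue marbles
trials = 20_000

counts = {"RB": 0, "BB": 0}   # red then blue, and two blues in a row
for _ in range(trials):
    first = rng.choice(bag)
    second = rng.choice(bag)  # the bag is unchanged: the first marble was put back
    pair = first + second
    if pair in counts:
        counts[pair] += 1

p_red_then_blue = counts["RB"] / trials
p_two_blues = counts["BB"] / trials
print(p_red_then_blue, p_two_blues)
```

Both estimates should come out close to 1/4, in line with items 2 and 3 above.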

### Marbles without replacement

The probability of drawing a marble of one color, given the color of the marble already drawn, is as follows:
1. $P(Blue \mid Red) = 5/(10-1) = 5/9$
2. $P(Blue \mid Blue) = (5-1)/(10-1) = 4/9$

We can then say that the probability of both events A and B happening equals the probability of event A multiplied by the probability of B given A.

$P(A \cap B) = P(A) * P(B \mid A)$


What is the probability that you: 

1. Draw a blue followed by a red?
2. Draw two blues in a row?

1. $P(Blue) * P(Red \mid Blue) = \frac{1}{2} * \frac{5}{9} = \frac{5}{18}$
2. $P(Blue) * P(Blue \mid Blue) = \frac{1}{2} * \frac{4}{9} = \frac{2}{9}$
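Because the bag is small, these answers can also be confirmed by exact enumeration instead of simulation. The sketch below lists every ordered pair of distinct marbles (10 * 9 = 90 pairs) and counts the favorable ones. The helper name `prob` is an assumption for illustration.

```python
from fractions import Fraction
from itertools import permutations

bag = ["R"] * 5 + ["B"] * 5
# Every ordered way to draw two different marbles, identified by position in the bag.
ordered_pairs = list(permutations(range(len(bag)), 2))


def prob(first_color, second_color):
    """Exact probability of drawing the two colors in this order, without replacement."""
    hits = sum(
        1 for i, j in ordered_pairs
        if bag[i] == first_color and bag[j] == second_color
    )
    return Fraction(hits, len(ordered_pairs))


p_blue_then_red = prob("B", "R")
p_two_blues = prob("B", "B")
print(p_blue_then_red, p_two_blues)

# The same numbers via P(A and B) = P(A) * P(B | A).
assert p_blue_then_red == Fraction(1, 2) * Fraction(5, 9)
assert p_two_blues == Fraction(1, 2) * Fraction(4, 9)
```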

## Bayes' Theorem

Say the probability of A given B is unknown, but the probability of B given A is known. Then $P(A|B)$ can be calculated by multiplying the probability of A by the probability of B given A and dividing the result by the probability of B.

$P(A|B) =\frac{P(A)P(B|A)}{P(B)}$
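As a worked illustration, the formula can be applied to the marble bag from the Conditional Probability section (drawing without replacement): given that the second marble is blue, what is the probability that the first was red? The sketch below is one way to set this up; the function name `bayes` and the choice of question are assumptions made for this example.

```python
from fractions import Fraction


def bayes(p_a, p_b_given_a, p_b):
    """Bayes' Theorem: P(A|B) = P(A) * P(B|A) / P(B)."""
    return p_a * p_b_given_a / p_b


p_first_red = Fraction(1, 2)                   # P(A): first marble is red
p_second_blue_given_first_red = Fraction(5, 9)  # P(B|A): 5 blues among the 9 left

# P(B) by total probability: the first marble is either red or blue.
p_second_blue = (
    Fraction(1, 2) * Fraction(5, 9)    # first red, then blue
    + Fraction(1, 2) * Fraction(4, 9)  # first blue, then blue
)

posterior = bayes(p_first_red, p_second_blue_given_first_red, p_second_blue)
print(posterior)
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared directly with a hand calculation.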




## Frequentist vs. Bayesian

Not all probability estimates are created equal. Depending on your perspective and the context in which you are modeling something, probability estimates can differ. Frequentists say that if an experiment were repeated over many trials, the probability of, say, heads in a coin flip would be 50%. Bayesians look at a different context. For instance, a Bayesian sets up a 100-flip experiment and wants to estimate the probability of getting at least 50 heads.

To solve this we can look at a [Probability Density Function](https://en.wikipedia.org/wiki/Probability_density_function). For 100 flips of a fair coin, the standard deviation is $\sigma = \sqrt{np(1-p)} = \sqrt{100 * 0.5 * 0.5} = 5$. Approximating the number of heads with a normal distribution, we can see from the third plot below that there is a 68.27% chance of landing within 50 heads ± 1σ, which is equivalent to a range of approximately 45 to 55 heads.

<a title="Jhguch at en.wikipedia [CC BY-SA 2.5 (https://creativecommons.org/licenses/by-sa/2.5)], via Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:Boxplot_vs_PDF.svg"><img width="364" alt="Boxplot vs PDF" src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/Boxplot_vs_PDF.svg/512px-Boxplot_vs_PDF.svg.png"></a>


In [0]:
#@title #### Change the parameters to make the standard deviation lower than 4

n=100 #@param {type:"integer"} number of flips
p=0.5 #@param {type:"number"} probability of heads
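size=10000 #@param {type:"integer"} number of experiments
b = np.random.binomial(n=n, p=p, size=size)

# distplot is deprecated in recent seaborn releases; histplot with kde=True
# gives a comparable histogram with a density curve.
sns.histplot(b, kde=True, stat="density", discrete=True)
print("Standard Deviation:", np.std(b))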
plt.show()

#### How to interpret probability as a frequency distribution

If we conducted a 100-flip experiment 10,000 times, we would expect to see exactly 50 heads about 800 times, so the probability of seeing exactly 50 heads is about 8%.
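This frequency reading can be tried out directly. The sketch below repeats a 100-flip experiment 10,000 times with the standard library and counts how often exactly 50 heads appear. Treating each of 100 random bits as one fair flip is an implementation shortcut assumed here, and the seed and names are illustrative.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n_flips, n_experiments = 100, 10_000


def heads_in_experiment():
    """Run one experiment: each of the 100 random bits is one fair flip (1 = heads)."""
    return bin(rng.getrandbits(n_flips)).count("1")


heads_counts = [heads_in_experiment() for _ in range(n_experiments)]
exactly_50 = sum(1 for h in heads_counts if h == 50)
fraction_50 = exactly_50 / n_experiments

print(f"{exactly_50} of {n_experiments} experiments had exactly 50 heads "
      f"({fraction_50:.1%})")
```

The count should fall in the neighborhood of 800, though it will vary from seed to seed.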

## Estimation

If we have a biased coin, how could we estimate the probability of getting, say, k heads?

The Probability Mass Function (PMF) allows us to calculate the probability of a distinct set of outcomes for an experiment with a predetermined number of trials. For example, we can use the PMF to estimate the probability of getting *k heads* over *n tosses*, given that P(heads) = p, i.e., the probability of heads is p.


The function states that the probability of exactly k successes, given n trials that each succeed with probability p, equals the number of combinations of n items taken k at a time, multiplied by p to the kth power, multiplied by (1 - p) to the (n - k)th power.

$P(k | p,n) = \binom{n}{k} (p)^{k}(1-p)^{n-k}$
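To make the formula concrete for a biased coin, here is a minimal sketch that evaluates it with `math.comb` from the standard library (Python 3.8+). The function name `binomial_pmf` and the example values (p = 0.6, n = 10, k = 7) are assumptions picked for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(k | p, n): probability of exactly k heads in n tosses with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# A biased coin that lands heads 60% of the time, tossed 10 times.
p_seven_heads = binomial_pmf(k=7, n=10, p=0.6)
print(round(p_seven_heads, 3))  # about 0.215

# The probabilities over every possible k (0 through n) sum to 1.
total = sum(binomial_pmf(k, 10, 0.6) for k in range(11))
print(round(total, 6))
```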


## Resources

 <a href="http://www.greenteapress.com/thinkstats/"><img height="150px" src="http://www.greenteapress.com/thinkstats/think_stats_comp.png" align="left" hspace="100px" vspace="10px">
</a>
 


 <a href="https://greenteapress.com/wp/think-bayes/"><img height="150px" src="http://www.greenteapress.com/thinkbayes/think_bayes_cover_medium.png" align="left" hspace="100px" vspace="10px">
</a>




# Exercises

## Exercise 1

Create a function that executes a coin flip experiment, and a second function that visualizes the results with a barplot.

### Student Solution

In [0]:
# Your answer goes here

### Answer Key

**Solution**

In [0]:
def barPlotter(series, title):
    """
    Take in a Pandas Series and a String,
    Create a horizontal bar plot with numbers printed
    """
    fig, ax = plt.subplots(1,1,figsize=(5,4))
    series.sort_values().plot.barh(ax=ax)  # draw on the axes created above
    ax.set_facecolor('white')
    
    for loc in ['right', 'top', 'bottom']:
        ax.spines[loc].set_visible(False)
    for i, v in enumerate(series.sort_values(ascending=True).tolist()):
        ax.text(v + 1, i, str(v), fontweight='bold')
        
    fig.subplots_adjust(right=1)
    plt.xticks([])
    plt.suptitle(title)
    plt.show()
    
    return ax

def coinFlips(times=1000):
    """
    Execute a coin flip experiment and return the results as a Series
    """
    coin = ['H', 'T']  
    flips = []
    for _ in range(times):
        random.shuffle(coin)
        flips.append(coin[0])
    return pd.Series(flips).value_counts()
 

**Validation**

In [0]:
!pip install plotchecker

from plotchecker import BarPlotChecker
pc = BarPlotChecker(barPlotter(coinFlips(), 'coinflips'))
pc.assert_num_bars(2)

## Exercise 2

Design an experiment where you could test the theory of independence.  Minimum 100 words.

### Student Solution

In [0]:
# Your answer goes here
answer = """ your answer """

### Answer Key

**Solution**

In [0]:
answer = ('a '*100)

**Validation**

In [0]:
assert len(answer.split()) >= 100

## Exercise 3

Code a bag of marbles, draw 1000 marbles, and visualize the results with a barplot.

### Student Solution

In [0]:
# Your answer goes here

### Answer Key

**Solution**

In [0]:
def createMarbles(colors=('R', 'B'), number=10):
    """
    Take a sequence of colors and the total bag count.
    Create a bag of evenly distributed marbles.
    """
    bag = []
    for c in colors:
        for _ in range(number//len(colors)):
            bag.append(c)
    
    random.shuffle(bag)
    return bag


def marbleDraws(times=1000):
    """
    Execute a marble draw experiment and return the results as a Series
    """
    draws = []
    m = createMarbles()
    for _ in range(times):
        draws.append(random.choice(m))
    return pd.Series(draws).value_counts()


**Validation**

In [0]:
from plotchecker import BarPlotChecker
pc = BarPlotChecker(barPlotter(marbleDraws(), 'marble draws'))
pc.assert_num_bars(2)

## Exercise 4

Code an experiment where you draw a first marble, then draw a second marble, and return both to the bag afterward. Visualize the results for 1000 draws with a barplot.

### Student Solution

In [0]:
# Your Answer Goes Here

### Answer Key

**Solution**

In [0]:
def marbleReplace(times=1000):
    """
    Draw two marbles in a row from a freshly filled bag, record the pair,
    and repeat with a full bag. Return the pair counts as a Series.
    """
    draws = []
    for _ in range(times):
        m = createMarbles()
        m1 = m.pop()
        m2 = m.pop()
        draws.append((m1, m2))
    return pd.Series(draws).value_counts()

**Validation**

In [0]:
from plotchecker import BarPlotChecker
pc = BarPlotChecker(barPlotter(marbleReplace(), '2 marble draws'))
pc.assert_num_bars(4)

## Exercise 5

Use the plot below to visually estimate the probability of flipping exactly 50 heads.

In [0]:
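np.random.seed(0)  # seed NumPy's generator, which is the one used below
# kdeplot draws the density curve that distplot(hist=False) used to draw
sns.kdeplot(np.random.binomial(n=100, p=0.5, size=10000))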
plt.show()

### Student Solution

In [0]:
# Your Answer Goes Here
answer = 'some float'

### Answer Key

**Solution**

In [0]:
answer = .079

**Validation**

In [0]:
assert answer >= 0.075 and answer <= 0.085

## Exercise 6: The Cookie Problem


There are 2 jars of cookies. Jar A has 30 chocolate and 10 raisin cookies, and Jar B has 20 of each. You pick a jar at random and select 1 cookie from it at random; it happens to be chocolate.

What is the probability the chocolate cookie came from Jar A?  Provide your answer as a fraction.

>***Hint: Use Bayes' Theorem***

### Student Solution

In [0]:
# Your answer goes here
answer = 'some number/some other number'

$P(J_{A}|C) =\frac{P(J_{A})P(C|J_{A})}{P(C)}=\frac{(\frac{1}{2})(\frac{3}{4})}{\frac{5}{8}}=3/5$

### Answer Key

**Solution**

In [0]:
answer = 3/5

**Validation**

In [0]:
assert answer == 3/5

## Exercise 7: Monty Hall Problem


Monty Hall was a game show host, and the problem named after him is a classic case for the Bayesian approach of updating probabilities with new information. The problem states that there are 3 doors to choose from. One door hides a prize, and the other two hide duds. You choose a door. Monty then reveals one of the other doors, one that does not hide the prize and is not the door you chose. You then have the choice of sticking with your original door or switching.


- Do you switch doors?
- What is the probability of getting the prize if you switch doors?

<a href="https://commons.wikimedia.org/wiki/File:Monty_open_door.svg#/media/File:Monty_open_door.svg"><img align="center" src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Monty_open_door.svg/1200px-Monty_open_door.svg.png" alt="Monty open door.svg" height="100px"></a>

### Student Solution

In [0]:
answer = ('yes or no', 'some number/some other number')

### Answer Key

**Solution**

In [0]:
answer = ('Yes', 2/3)

**Validation**

In [0]:
assert answer[0].lower() in ['y', 'yes'] and answer[1]==2/3

## Exercise 8: Probability Mass Function

$P(k | p,n) = \binom{n}{k} (p)^{k}(1-p)^{n-k}$

Code the formula above and calculate the likelihood of exactly 55 heads in a 100 coin toss experiment.

>***Hint: Look at [SciPy](https://docs.scipy.org)***
🔬🐍

### Student Solution

In [0]:
# Your answer goes here

answer = 'some float'

### Answer Key

**Solution**

In [0]:
from scipy.special import comb
N=100
k=55
p=0.5

answer = comb(N, k) * (p**k) * (1 - p)**(N - k)  # use p rather than a hard-coded .5
print(answer)

**Validation**

In [0]:
assert answer >= 0.0484 and answer <= 0.0485