# <center>Discrete Probability Distributions</center>
## <center>PMF and CDF, Expected Value and Variance </center>


## <center>Learning Outcomes</center><br>
<center> Students can apply probability mass functions (PMFs) and cumulative distribution functions (CDFs) to find the likelihood of events occurring in a population, explain the relationship between statistics and parameters, and use the specific distribution functions for Bernoulli, Binomial, and Poisson scenarios to find the likelihood of events. </center>

<center>Import libraries.</center>

In [None]:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import pandas as pd
from itertools import product

## One Die

In [None]:
die = [1,2,3,4,5,6]

What are the possible outcomes? How many ways can they occur?

In [None]:
outcomes = dict(Counter(die))
outcomes

What are the probabilities of each outcome?

In [None]:
# divide each count by the total number of equally likely faces
probs = np.divide(list(outcomes.values()),len(die))
probs

What if we want to plot the probabilities?

### PMF - Probability Mass Function

Function to plot PMF.

In [None]:
def plot_pmf(outcomes, probs, xlabel, title):
    plt.figure(figsize=(10,10))
    plt.plot(list(outcomes),probs, 'o', ms=12, mec='g', color='green')
    plt.vlines(list(outcomes),0,probs)
    plt.ylabel('Prob')
    plt.xlabel(xlabel)
    plt.xticks(list(outcomes))
    plt.title(title)
    plt.show()

In [None]:
plot_pmf(die, probs, 'Die roll', 'PMF for One Die')

What is the probability of rolling a 1 or 2? What about rolling a 4 or less?
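Because the outcomes of a single roll are mutually exclusive, both questions can be answered by adding PMF values. The sketch below assumes a fair six-sided die and uses `fractions.Fraction` (not used elsewhere in this notebook) to keep the arithmetic exact. The names `fair_pmf`, `p_one_or_two`, and `p_four_or_less` are illustrative.

```python
from fractions import Fraction

# PMF of a fair six-sided die (assumption: every face is equally likely)
fair_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# P(X = 1 or X = 2): add the probabilities of the two mutually exclusive faces
p_one_or_two = fair_pmf[1] + fair_pmf[2]

# P(X <= 4): add the probabilities of every face up to and including 4
p_four_or_less = sum(p for face, p in fair_pmf.items() if face <= 4)

print('P(1 or 2) =', p_one_or_two)
print('P(4 or less) =', p_four_or_less)
```

The second question is a running total of the PMF, which is what the CDF records for every outcome at once.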

### CDF - Cumulative Distribution Function

In [None]:
cumulative_probs = np.cumsum(probs)
print('Cumulative probabilities:', cumulative_probs)

In [None]:
def plot_cdf(outcomes, cumulative_probs, xlabel, title):
    plt.figure(figsize=(10,10))
    plt.plot(list(outcomes),cumulative_probs, 'o', ms=12, mec='g', color='green')
    plt.step(list(outcomes), cumulative_probs, color='green', where='post')
    plt.ylabel('Prob')
    plt.xlabel(xlabel)
    plt.xticks(list(outcomes))
    plt.title(title)
    plt.show()

In [None]:
plot_cdf(die, cumulative_probs, 'Die roll', 'One Die Roll CDF')
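A CDF answers "at most" questions directly and interval questions by subtraction: for integer outcomes, P(a <= X <= b) = F(b) - F(a - 1). The following is a minimal sketch assuming a fair die. `cdf_at` and `prob_between` are hypothetical helpers, not part of this notebook.

```python
import numpy as np

faces = np.arange(1, 7)
fair_probs = np.full(6, 1 / 6)        # assumption: fair die
fair_cdf = np.cumsum(fair_probs)      # F(1), F(2), ..., F(6)

def cdf_at(x):
    """Return F(x) = P(X <= x) for the fair die (hypothetical helper)."""
    if x < faces[0]:
        return 0.0
    # index of the largest face that is <= x
    idx = np.searchsorted(faces, x, side='right') - 1
    return float(fair_cdf[idx])

def prob_between(a, b):
    """Return P(a <= X <= b) for integers a <= b, using F(b) - F(a - 1)."""
    return cdf_at(b) - cdf_at(a - 1)

print('P(X <= 4) =', round(cdf_at(4), 4))
print('P(3 <= X <= 5) =', round(prob_between(3, 5), 4))
```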

What can I expect to get on average when I roll one die?

### Expected Value

In [None]:
expected_value = np.sum(np.multiply(list(outcomes),probs))
print('Expected Value:',expected_value)
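The expected value is a parameter of the distribution, while the mean of actual rolls is a statistic that estimates it. As a rough illustration, the sketch below simulates rolls of a fair die with NumPy's random generator. The seed and the sample size of 100,000 are arbitrary choices made here, not values from this notebook.

```python
import numpy as np

rng = np.random.default_rng(seed=0)        # fixed seed so the run is repeatable
rolls = rng.integers(1, 7, size=100_000)   # upper bound 7 is exclusive, so faces are 1..6

sample_mean = rolls.mean()                 # statistic computed from the sample
print('Sample mean of 100,000 rolls:', round(sample_mean, 3))
print('Theoretical expected value:', 3.5)
```

With more rolls the sample mean tends to settle closer to 3.5, even though no single roll can ever equal 3.5.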

How close to the expected value will each die roll be?

### Variance and Standard Deviation

In [None]:
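squared_differences = np.subtract(die,expected_value)**2
# weight each squared difference by its probability (for a fair die this equals dividing the sum by 6)
variance = np.sum(np.multiply(squared_differences,probs))
print('Variance:',variance)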

In [None]:
np.var(die)

In [None]:
std = np.sqrt(variance)
print('Standard Deviation:', std)
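The variance can also be computed with the shortcut Var(X) = E[X^2] - (E[X])^2, which gives the same number as the probability-weighted squared differences. A small sketch for a fair die follows. The variable names are illustrative and independent of the cells above.

```python
import numpy as np

faces = np.arange(1, 7)
fair_probs = np.full(6, 1 / 6)                     # assumption: fair die

mean = np.sum(faces * fair_probs)                  # E[X]
mean_of_squares = np.sum(faces**2 * fair_probs)    # E[X^2]

# Definition: probability-weighted squared distance from the mean
variance_definition = np.sum((faces - mean)**2 * fair_probs)
# Shortcut: E[X^2] - (E[X])^2
variance_shortcut = mean_of_squares - mean**2

print('Definition:', round(variance_definition, 4))
print('Shortcut:  ', round(variance_shortcut, 4))
```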

Bringing it all together in a table.

In [None]:
## takes in a list of outcome values and the corresponding probabilities
## outputs a probability distribution table
def distribution_stats(outcomes, probs):
    ev = np.sum(np.multiply(outcomes,probs))
    df = pd.DataFrame()
    for outcome,prob in zip(outcomes,probs):
        df[outcome]=[prob, outcome*prob, outcome-ev, (outcome-ev)**2, (outcome-ev)**2*prob]
    # the second row holds each term x*P(x); its 'Sum' column is the expected value E(X)
    df.set_index(pd.Series(['P(x)', 'x*P(x)', 'X-E(X)', '(X-E(X))^2', '(X-E(X))^2*P(x)']), inplace=True)
    df['Sum']=df.sum(axis=1)
    df_sum = pd.DataFrame()
    df_sum['Variance'] = [df.iloc[-1]['Sum']]
    df_sum['Standard Deviation'] = [np.sqrt(df_sum['Variance'][0])]
    df_sum.reset_index(drop=True, inplace=True)
    display(df.round(3),df_sum.round(3))

In [None]:
distribution_stats(die, probs)

What if the odds aren't all equal?

## Loaded Die

In [None]:
loaded_die_probs = [1/12, 1/12, 4/12, 4/12, 1/12, 1/12]

In [None]:
plot_pmf(die, loaded_die_probs, 'Die roll', 'Loaded Die Roll PMF')

In [None]:
loaded_cumulative_probs = np.cumsum(loaded_die_probs)
loaded_cumulative_probs

In [None]:
plot_cdf(die, loaded_cumulative_probs, 'Die roll', 'Loaded Die Roll CDF')

In [None]:
distribution_stats(die, loaded_die_probs)

Same expected value, different variance.
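The comparison can be confirmed exactly with fractions. The sketch below recomputes both distributions from scratch. `mean_and_variance` is a hypothetical helper written for this illustration, and the loaded probabilities are the twelfths used above.

```python
from fractions import Fraction

def mean_and_variance(values, probabilities):
    """Return (E[X], Var(X)) for a discrete distribution (hypothetical helper)."""
    mean = sum(v * p for v, p in zip(values, probabilities))
    variance = sum((v - mean) ** 2 * p for v, p in zip(values, probabilities))
    return mean, variance

faces = [1, 2, 3, 4, 5, 6]
fair = [Fraction(1, 6)] * 6
loaded = [Fraction(n, 12) for n in (1, 1, 4, 4, 1, 1)]

fair_mean, fair_var = mean_and_variance(faces, fair)
loaded_mean, loaded_var = mean_and_variance(faces, loaded)

print('Fair die:   mean', fair_mean, 'variance', fair_var)
print('Loaded die: mean', loaded_mean, 'variance', loaded_var)
```

The loaded die concentrates probability on 3 and 4, the faces nearest the mean, so its spread is smaller even though the balance point is unchanged.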

## Two Dice

In [None]:
list(product(die,repeat=2))

In [None]:
two_dice_sums = [sum(roll) for roll in product(die,repeat=2)]
two_dice_sums

In [None]:
two_dice_outcomes = dict(Counter(two_dice_sums))
two_dice_outcomes

In [None]:
two_dice_probs = np.divide(list(two_dice_outcomes.values()),len(two_dice_sums))
two_dice_probs

In [None]:
plot_pmf(two_dice_outcomes, two_dice_probs, 'Dice roll sum', 'Two Dice Roll PMF')

In [None]:
two_dice_cum_probs = np.cumsum(two_dice_probs)
print('Cumulative probabilities:', two_dice_cum_probs)

In [None]:
plot_cdf(two_dice_outcomes, two_dice_cum_probs, 'Dice roll sum', 'Two Dice Roll CDF')

In [None]:
distribution_stats(list(two_dice_outcomes), two_dice_probs)

## Three Dice

In [None]:
three_dice_sums = [sum(roll) for roll in product(die,repeat=3)]
three_dice_outcomes = dict(Counter(three_dice_sums))
three_dice_outcomes

In [None]:
three_dice_probs = np.divide(list(three_dice_outcomes.values()),len(three_dice_sums))
three_dice_probs

In [None]:
plot_pmf(three_dice_outcomes, three_dice_probs, 'Dice roll sum', 'Three Dice Roll PMF')

In [None]:
three_dice_cum_probs = np.cumsum(three_dice_probs)
print('Cumulative probabilities:', three_dice_cum_probs)

In [None]:
plot_cdf(three_dice_outcomes, three_dice_cum_probs, 'Dice roll sum', 'Three Dice Roll CDF')

In [None]:
distribution_stats(list(three_dice_outcomes), three_dice_probs)
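Enumerating every combination works for two or three dice, but the number of combinations grows as 6^n. An alternative, swapped in here as an illustration and not the method used above, is to convolve the single-die PMF with itself, since the PMF of a sum of independent variables is the convolution of their PMFs. `dice_sum_pmf` is a hypothetical helper and assumes fair, independent dice.

```python
import numpy as np

def dice_sum_pmf(n_dice):
    """Return (sums, probs) for the total of n_dice fair dice (hypothetical helper)."""
    single = np.full(6, 1 / 6)
    pmf = single.copy()
    # each convolution adds one more independent die to the total
    for _ in range(n_dice - 1):
        pmf = np.convolve(pmf, single)
    sums = np.arange(n_dice, 6 * n_dice + 1)   # smallest total is n, largest is 6n
    return sums, pmf

for n in (1, 2, 3):
    sums, pmf = dice_sum_pmf(n)
    mean = np.sum(sums * pmf)
    var = np.sum((sums - mean) ** 2 * pmf)
    print(n, 'dice: mean', round(mean, 4), 'variance', round(var, 4))
```

For independent dice, both the expected value and the variance of the total grow in proportion to the number of dice, which can be compared against the tables produced by `distribution_stats`.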

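Suppose now we roll a die. If we roll a one, we roll a second time and add the two rolls. What would this probability distribution look like?

By hand.

In [None]:
outcomes = [2,3,4,5,6,7]
# 2 through 6 can occur directly (1/6) or as a 1 plus a second roll (1/36): 1/6 + 1/36 = 7/36
# 7 can only occur as a 1 followed by a 6: 1/36
# the 11 paths are not equally likely, so they cannot simply be counted
probs = [7/36,7/36,7/36,7/36,7/36,1/36]
print(probs)
distribution_stats(outcomes,probs)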

In [None]:
plot_pmf(outcomes, probs, 'Roll', 'PMF')

In [None]:
cum_probs = np.cumsum(probs)
plot_cdf(outcomes, cum_probs, 'Roll', 'CDF')
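One way to check a hand-built table for a multi-stage experiment is to enumerate every path and weight it by its own probability, since paths of different lengths are not equally likely. The sketch below does this with exact fractions. It assumes fair dice and that a first roll of one is added to the second roll, and the name `reroll_pmf` is illustrative.

```python
from collections import defaultdict
from fractions import Fraction

faces = range(1, 7)
reroll_pmf = defaultdict(Fraction)   # missing keys start at Fraction(0)

for first in faces:
    if first == 1:
        # A one triggers a second roll; each (1, second) path has probability 1/36.
        # Assumption: the result is the sum of both rolls.
        for second in faces:
            reroll_pmf[first + second] += Fraction(1, 36)
    else:
        # Any other face ends the experiment immediately with probability 1/6.
        reroll_pmf[first] += Fraction(1, 6)

for total in sorted(reroll_pmf):
    print(total, reroll_pmf[total])
print('Total probability:', sum(reroll_pmf.values()))
```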

# Activity

You flip a coin. If it's heads, you pick a prize from box A. If it's tails, you pick a prize from box B. <br><br>
- Box A has two bags inside. One with 3 coins and one with 5 coins.<br>
- Box B has four bags inside. The bags have 1, 2, 3, and 4 coins. <br><br>

1) Build a distribution table by hand to calculate variance and standard deviation.
       Use the `distribution_stats` function to check your answer. <br> <br>
2) Graph the PMF and CDF for the outcomes.<br> <br>
3) What is the probability that you get between 2 and 4 (inclusive) coins? <br><br>

<b>Bonus:</b> You now get two chances to pick a prize. What is the distribution of the total coins you will receive, assuming you don't replace the first bag you select?