# Discrete Probability Distributions

**OBJECTIVES**
- Model with discrete probability distributions
- Use `scipy.stats` to create discrete distributions
- Use the `.pmf` and `.cdf` methods of distributions

### Widgets

In a terminal, please run the following:

```
conda install -c conda-forge nodejs
jupyter labextension install @jupyter-widgets/jupyterlab-manager
```

Restart your JupyterLab instance and run the cell below.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as stats
import seaborn as sns

from ipywidgets import interact
import ipywidgets as widgets

### Descriptive Statistics Review

**REVIEW**

Write a function that takes in a list and returns its arithmetic mean (no `numpy`!).

In [None]:
#function for mean


In [None]:
list_1 = [5, 5, 5, 5, 5]
list_2 = [3, 4, 5, 6, 7]
list_3 = [1, 3, 5, 7, 9]

In [None]:
#list comprehension to apply your function


### Variance

**IN WORDS**: 

<center>
    Find the difference between each data point and the mean, square each difference, then average the squared values.
</center>

**IN SYMBOLS**: $$\frac{1}{n}\sum_{i = 1}^{n} (x_i - \mu)^2$$
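
One possible way to turn the formula into code is sketched below. The function names `mean` and `variance` are illustrative choices, not names the notebook requires, and the sketch divides by $n$ (population variance) to match the formula above.

```python
def mean(values):
    """Arithmetic mean of a non-empty list, without numpy."""
    return sum(values) / len(values)


def variance(values):
    """Population variance: the average squared deviation from the mean."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


# Same values as list_2: deviations are -2, -1, 0, 1, 2
example = [3, 4, 5, 6, 7]
example_variance = variance(example)
print(example_variance)  # 2.0
```

Dividing by $n - 1$ instead would give the sample variance, which is a different quantity from the one defined here.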

In [None]:
list_1

In [None]:
#mean of list 1


In [None]:
#variance of list 1


In [None]:
#function for variance


In [None]:
#find the variance of our lists above


In [None]:
#interpret these values


### Standard Deviation

The standard deviation is the square root of the variance. Taking the root puts the measure of spread back in the original units of the data.
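
A minimal sketch of this idea, assuming the population variance (dividing by $n$) and using `math.sqrt` from the standard library. The name `std_dev` is a hypothetical choice for illustration.

```python
import math


def std_dev(values):
    """Population standard deviation: square root of the population variance."""
    mu = sum(values) / len(values)
    var = sum((x - mu) ** 2 for x in values) / len(values)
    return math.sqrt(var)


# Same values as list_3: the variance is 8, so the result is sqrt(8)
spread = std_dev([1, 3, 5, 7, 9])
print(spread)
```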

In [None]:
#function for standard deviation


In [None]:
#evaluate on our lists


**PROBLEMS**

1. Use the list of player ages below to compute the mean and standard deviation of the data.
2. Determine the age range within 1.5 standard deviations of the mean.

In [None]:
player_ages = [21, 21, 22, 23, 24, 24, 25, 25, 28, 29, 29, 31, 32, 33, 33, 34, 35, 36, 36, 36, 36, 38, 38, 38, 40]

### Probability Mass Functions

$$f(\text{some outcome}) = \text{probability of that outcome}$$

Our goal is to match the right probability distribution to a given scenario. Today we introduce some of the primary distributions whose inputs are discrete values.

### Example I: Bernoulli Trial

A single trial with a binary outcome: success with probability $p$ and failure with probability $1 - p$.

| outcome | probability |
| ------- | ----------- |
| Heads   | 0.3         |
| Tails   | 0.7         |
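
The table above can be modeled with `scipy.stats.bernoulli`, as in the sketch below. It assumes Heads is coded as the success (`1`) and Tails as the failure (`0`). The variable name `bernoulli_dist` is chosen to match the commented-out plotting cell further down.

```python
import scipy.stats as stats

# Assumption: Heads = 1 (success) with p = 0.3, Tails = 0 (failure)
bernoulli_dist = stats.bernoulli(p=0.3)

p_failure = bernoulli_dist.pmf(0)   # probability of Tails
p_success = bernoulli_dist.pmf(1)   # probability of Heads
trial_var = bernoulli_dist.var()    # p * (1 - p)
trial_std = bernoulli_dist.std()    # square root of the variance

print(p_failure, p_success, trial_var, trial_std)
```

For a Bernoulli trial the variance is $p(1 - p)$, so here it should come out close to $0.3 \times 0.7 = 0.21$.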

In [None]:
import scipy.stats as stats

In [None]:
#distribution to model


In [None]:
#probability of failure


In [None]:
#probability of success 


In [None]:
#variance of the trial


In [None]:
#standard deviation of the trial


In [None]:
# #plot
# x = np.arange(2)
# plt.plot(x, bernoulli_dist.pmf(x), 'o')
# plt.vlines(0, 0, .7)
# plt.vlines(1, 0, .3)

### An Old Game: Sennet

We have some number of popsicle sticks, each colored blue on one side and red on the other. We drop them and explore the possible outcomes. Assuming each outcome is equally likely, please determine the following:

- Drop 1 stick, $P(R)$
- Drop 1 stick, $P(B)$
- Drop 2 sticks, what are all possible outcomes? $P(\text{one red one blue})$?
- Drop 3 sticks, what are all the possible outcomes? $P(\text{BBB})$?
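
One way to check the counting by brute force is sketched below: it enumerates every ordered outcome with `itertools.product` and compares the count with `scipy.special.comb`. The variable names are illustrative, and the sketch assumes each ordered outcome is equally likely.

```python
from itertools import product

from scipy.special import comb

# Every ordered outcome for 3 sticks, each landing Red or Blue
outcomes = list(product("RB", repeat=3))
n_outcomes = len(outcomes)  # 2 ** 3

# Number of ways to choose which 3 of the 3 sticks land blue
ways_all_blue = comb(3, 3, exact=True)
p_all_blue = ways_all_blue / n_outcomes

# Cross-check by counting the enumerated outcomes directly
counted_all_blue = sum(1 for o in outcomes if o.count("B") == 3)

print(n_outcomes, ways_all_blue, p_all_blue)
```

Passing `exact=True` makes `comb` return a Python integer rather than a float.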

In [None]:
#define combinations
#from 3 sticks, how many ways are there
#to land all blue
from scipy.special import comb
#from 3 sticks how many ways are there for 3 "successes"


In [None]:
#examine outcomes for 3 coins


In [None]:
#determine probabilities for each


In [None]:
#make a bar plot of probabilities


### Binomial Distribution

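The binomial distribution models the number of successes in $n$ independent Bernoulli trials, each with the same success probability $p$. For example, toss a coin four times and count the heads. Its probability mass function is given by: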

$$\displaystyle f(k,n,p)=\Pr(k;n,p)=\Pr(X=k)={\binom {n}{k}}p^{k}(1-p)^{n-k}$$
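
As a rough check on the formula, the sketch below evaluates it by hand and compares the result with `scipy.stats.binom`. It assumes a fair coin ($p = 0.5$) tossed $n = 4$ times. The helper `binomial_pmf` is a hypothetical name, and `binom` here is a frozen distribution object.

```python
from math import comb

import scipy.stats as stats

n, p = 4, 0.5  # assumption: four tosses of a fair coin

# Frozen binomial distribution with n and p fixed
binom = stats.binom(n=n, p=p)


def binomial_pmf(k, n, p):
    """Binomial PMF written directly from the formula."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


manual = binomial_pmf(2, n, p)  # probability of exactly 2 heads
from_scipy = binom.pmf(2)

print(manual, from_scipy)
```

With these values there are $\binom{4}{2} = 6$ ways to get two heads out of $2^4 = 16$ equally likely sequences, so both numbers should be $6/16 = 0.375$.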



In [None]:
#define binomial


In [None]:
#probability of 2 heads


In [None]:
#probability of 3 heads


In [None]:
#define range of all possible outcomes


In [None]:
#plot pmf


In [None]:
#probability of no more than 2 heads
###p(0)
p0 = binom.pmf(0)
###p(1)
p1 = binom.pmf(1)
###p(2)
p2 = binom.pmf(2)
p0 + p1 + p2

### Cumulative Distribution Function

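Evaluates the cumulative probability up to and including a given value. Formally, $F(x) = P(X \le x)$: for a discrete distribution this is the sum of the probabilities of all outcomes $x_i \le x$, and for a continuous distribution it is the integral of the density up to $x$.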
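
To illustrate the discrete case, the sketch below compares a running total of PMF values with the `.cdf` method. It assumes the same four-toss fair coin as before; `binom` and `x` are redefined here so the cell stands on its own.

```python
import numpy as np
import scipy.stats as stats

# Assumption: four tosses of a fair coin
binom = stats.binom(n=4, p=0.5)
x = np.arange(0, 5)  # all possible numbers of heads: 0 through 4

running_total = np.cumsum(binom.pmf(x))  # p(0), p(0)+p(1), ...
cdf_values = binom.cdf(x)

print(running_total)
print(cdf_values)
```

The two arrays should agree up to floating-point error, and the entry at `x = 2` is the "no more than 2 heads" probability.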

In [None]:
###Cumulative distribution function
binom.cdf(2)

In [None]:
###plot side by side pmf and cdf
fig, ax = plt.subplots(nrows = 1, ncols = 2, figsize = (15, 4))
ax[0].bar(x, binom.pmf(x))
ax[0].set_title('PMF')

ax[1].plot(x, binom.cdf(x))
ax[1].set_title('CDF')

**PROBLEMS**

[source](https://openstax.org/books/introductory-statistics/pages/4-3-binomial-distribution)

Here are some good old-fashioned math problems. Use our `scipy` distributions to solve each one below. As a bonus, add a plot and highlight the area or areas of interest.

1. A trainer is teaching a student to do tricks. The probability that the student successfully performs the trick is 35%, and the probability that the student does not successfully perform the trick is 65%. Out of 20 attempts, you want to find the probability that the student succeeds 12 times.

2. A fair, six-sided die is rolled ten times. Each roll is independent. You want to find the probability of rolling a one more than three times.

3. Approximately 70% of statistics students do their homework in time for it to be collected and graded. Each student does homework independently. In a statistics class of 50 students, what is the probability that at least 40 will do their homework on time?
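
Questions phrased as "more than" or "at least" ask for an upper-tail probability. The sketch below shows one way to compute such tails on a separate, made-up example (ten tosses of a fair coin), using the `.sf` (survival function) method, which returns $P(X > k)$. The setup is an illustration only and is not one of the problems above.

```python
import scipy.stats as stats

# Hypothetical example: ten tosses of a fair coin
tosses = stats.binom(n=10, p=0.5)

p_more_than_7 = tosses.sf(7)           # P(X > 7)
p_at_least_8 = 1 - tosses.cdf(7)       # P(X >= 8), the same event
p_by_summing = tosses.pmf([8, 9, 10]).sum()

print(p_more_than_7, p_at_least_8, p_by_summing)
```

Because the outcomes are whole numbers, "at least 8" and "more than 7" describe the same event, so the boundary value passed to `.sf` or `.cdf` is one less than the "at least" threshold.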



### Problems with Data

Return to the Titanic data.

1. How many men were on the Titanic?
2. Plot a binomial distribution with $n = \text{number of men on the Titanic}$ and probability $\frac{1}{2}$.
3. How many men died on the Titanic?
4. Locate this outcome on your probability distribution. Is it unlikely? Why?