# Probability distributions: a quick refresher

<div class="alert alert-warning">
<h3>Goal of this session:</h3>

The following tasks aim to refresh your memory of some of the key statistical distributions that you will encounter in this course. The insight gained from this session will help you understand:
- the concept of probability distributions
- how distributions are changed by different parameters
- how some distributions are related to each other

Your observations here will help you have a smooth learning experience when you get to Bayesian statistics in the next session.
</div>

You will use the [SciPy statistical functions](https://docs.scipy.org/doc/scipy/reference/stats.html) module (`scipy.stats`) to work with several probability distributions through the following steps:

- Computing the theoretical mean, variance, skewness, and kurtosis of a distribution
- Drawing samples from a distribution and plotting them

Let's start by importing the necessary libraries.

In [None]:
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import uniform, beta, bernoulli, norm, t, gamma, expon, binom

In [None]:
# define some style for the plots
pvars = dict(histtype='step', bins='auto', density=True, linewidth=1.5)

## Example: Normal distribution
> Use the `norm` distribution object. Learn more about it [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html#scipy.stats.norm)

In [None]:
# set the mean and standard deviation
m, s = 0, 1 

# find the theoretical properties (scipy's 'k' moment is Fisher's excess kurtosis, which is 0 for a normal distribution)
mean, var, skew, kurt = norm.stats(m, s, moments='mvsk')
display(mean, var, skew, kurt)

In [None]:
# drawing 10000 samples from normal(m,s) distribution 
sample = norm.rvs(m, s, size = (10000,1)) 

# plot probability distribution
plt.hist(sample, **pvars)
plt.xlabel('sample values')
plt.title('Normal PDF')
plt.show()
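As an optional sanity check, the sample can be compared with the theory. The sketch below uses `scipy.stats.describe` for the sample moments and overlays the exact `norm.pdf` curve on the histogram. The fixed `random_state=0` and the 1-D `size=10000` are choices made here for reproducibility and simplicity, not part of the exercise.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, describe

m, s = 0, 1
sample = norm.rvs(m, s, size=10000, random_state=0)

# sample moments; describe() reports excess (Fisher) kurtosis by default, like norm.stats
summary = describe(sample)
theory = norm.stats(m, s, moments='mvsk')
print('sample:', summary.mean, summary.variance, summary.skewness, summary.kurtosis)
print('theory:', *theory)

# overlay the exact PDF on the density-normalized histogram
x = np.linspace(sample.min(), sample.max(), 200)
plt.hist(sample, histtype='step', bins='auto', density=True, linewidth=1.5, label='samples')
plt.plot(x, norm.pdf(x, m, s), label='norm.pdf')
plt.xlabel('sample values')
plt.title('Normal PDF: samples vs. theory')
plt.legend()
plt.show()
```

The same pattern (`describe` plus a `.pdf` overlay) should carry over to the other continuous distributions in this notebook by swapping in the matching object and parameters.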

## Exercises

Fill in the blanks marked `< >` in the following code cells, run them, and then answer the questions in each Task.

### Student's t-distribution
> Use the `t` distribution object. Learn more about it [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html#scipy.stats.t)

In [None]:
# set the degrees of freedom (d>0)
d = < >

# find the theoretical properties
mean, var, skew, kurt = < >.stats(< >, moments='mvsk')
display(mean, var, skew, kurt)

In [None]:
# drawing 10000 samples from Student's t-distribution
sample = < >.rvs(< >, size = (10000,1))

# plot probability distribution
plt.hist(sample, **pvars)
plt.xlabel('sample values')
plt.title("Student's t PDF")
plt.show()
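When comparing with the normal distribution, it can help to look at the exact densities as well as the samples. This sketch overlays `t.pdf` for a few degrees of freedom (1, 3 and 30 are arbitrary illustrative choices) on the standard normal PDF. It also touches on a subtlety: for the t-distribution the variance is d/(d-2) only when d > 2, and the excess kurtosis is 6/(d-4) only when d > 4; for smaller d, scipy reports `inf` or `nan` for the moments that do not exist.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, t

x = np.linspace(-5, 5, 400)

# overlay t PDFs for a few degrees of freedom on the standard normal PDF
for d in (1, 3, 30):
    plt.plot(x, t.pdf(x, d), label=f't, d={d}')
plt.plot(x, norm.pdf(x), 'k--', label='standard normal')
plt.xlabel('x')
plt.title("Student's t vs. normal PDF")
plt.legend()
plt.show()

# variance d/(d-2) requires d > 2; excess kurtosis 6/(d-4) requires d > 4
var10, kurt10 = t.stats(10, moments='vk')
print(var10, kurt10)
```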

<div class="alert alert-info">
<h4>Task 1</h4>

Let's vary the degrees of freedom `d` and see how the distribution changes.
In particular, compare the result with the normal distribution.<br>
</div>

**Answer**

write your answer here

### Uniform distribution
> Use the `uniform` distribution object. Learn more about it [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.uniform.html#scipy.stats.uniform)

In [None]:
# set the boundaries (-inf < a < b < inf)
a, b = < >, < >

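# find the theoretical properties
# note: scipy parameterizes the uniform by loc=a and scale=b-a, so its support is [loc, loc+scale]
mean, var, skew, kurt = < >.stats(loc=< >, scale=< >, moments='mvsk')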
display(mean, var, skew, kurt)

In [None]:
# drawing 10000 samples from the Uniform(a, b) distribution (loc=a, scale=b-a)
sample = < >.rvs(loc=< >, scale=< >, size = (10000,1))

# plot probability distribution
plt.hist(sample, **pvars)
plt.xlabel('sample values')
plt.title('Uniform PDF')
plt.show()
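scipy's `uniform` is parameterized by `loc` and `scale` rather than by its two endpoints: `uniform(loc, scale)` covers [loc, loc + scale]. The sketch below, with arbitrary example endpoints a = 2 and b = 5, freezes Uniform(a, b) as `uniform(loc=a, scale=b - a)` and checks its mean (a + b)/2 and variance (b - a)²/12 against the closed forms.

```python
from scipy.stats import uniform

a, b = 2.0, 5.0  # example endpoints, chosen arbitrarily
dist = uniform(loc=a, scale=b - a)  # frozen Uniform(a, b)

mean, var = dist.stats(moments='mv')
print(float(mean), (a + b) / 2)        # mean = (a + b) / 2
print(float(var), (b - a) ** 2 / 12)   # variance = (b - a)^2 / 12
print(dist.support())                  # endpoints a and b

# passing the endpoints directly means loc=a, scale=b, i.e. Uniform(a, a + b)
print(uniform(a, b).support())
```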

### Beta distribution
> Use the `beta` distribution object. Learn more about it [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.beta.html#scipy.stats.beta)

In [None]:
# set the parameters (a>0, b>0)
a, b = < >, < >

# find the theoretical properties
mean, var, skew, kurt = < >.stats(< >, < >, moments='mvsk')
display(mean, var, skew, kurt)

In [None]:
# drawing 10000 samples from beta(a,b) distribution 
sample = < >.rvs(< >, < >, size = (10000,1)) 

# plot probability distribution
plt.hist(sample, **pvars)
plt.xlabel('sample values')
plt.title('Beta PDF')
plt.show()
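The mean and variance of Beta(a, b) have closed forms, $a/(a+b)$ and $ab/((a+b)^2(a+b+1))$, which can help when interpreting the histograms. A short check against `beta.stats`, using arbitrary example values a = 2 and b = 5:

```python
from scipy.stats import beta

a, b = 2.0, 5.0  # example shape parameters, chosen arbitrarily

# closed-form mean and variance of Beta(a, b)
mean_formula = a / (a + b)
var_formula = a * b / ((a + b) ** 2 * (a + b + 1))

mean, var = beta.stats(a, b, moments='mv')
print(float(mean), mean_formula)
print(float(var), var_formula)
```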

<div class="alert alert-info">
<h4>Task 2</h4>
Let's vary the `a` and `b` parameters and see how the distribution changes:

1. Try `a=b=1`, and compare the distribution with the Uniform distribution.
2. Try `a > b`. What do you learn from the distribution?
3. Try `a < b`. What do you learn from the distribution?
4. Try `a = b` with values other than 1. What do you learn from the distribution?
<br>
</div>

**Answer**

write your answer here

<div class="alert alert-info">
<h4>Task 3</h4>

Rerun the uniform distribution cells with `a = 0, b = 1`. What range of values on the x-axis can both the Uniform(0, 1) and Beta distributions take?
<br>
</div>

**Answer**

write your answer here

### Exponential distribution
> Use the `expon` distribution object. Learn more about it [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.expon.html#scipy.stats.expon)

In [None]:
# set the scale parameter (s>0); in scipy, scale = 1/rate, the inverse of the rate parameter
s = < >

# find the theoretical properties
mean, var, skew, kurt = < >.stats(loc=0, scale=< >, moments='mvsk')
display(mean, var, skew, kurt)

In [None]:
# drawing 10000 samples from Exponential(s) distribution 
sample = < >.rvs(loc=0, scale=< >, size = (10000,1)) 

# plot probability distribution
plt.hist(sample, **pvars)
plt.xlabel('sample values')
plt.title('Exponential PDF')
plt.show()
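scipy writes the exponential distribution with a `scale` parameter, which is the inverse of the rate λ used in many textbooks. The sketch below (rate λ = 2 is an arbitrary example) checks that `expon(scale=1/λ)` has mean 1/λ and variance 1/λ², and that its PDF equals λe^(-λx) for x ≥ 0.

```python
import numpy as np
from scipy.stats import expon

rate = 2.0       # example rate λ, chosen arbitrarily
s = 1 / rate     # scipy's scale parameter is 1/λ

mean, var = expon.stats(loc=0, scale=s, moments='mv')
print(float(mean), float(var))   # 1/λ and 1/λ²

# the PDF matches λ·exp(-λx) on x >= 0
x = np.linspace(0, 3, 7)
pdf_matches = np.allclose(expon.pdf(x, loc=0, scale=s), rate * np.exp(-rate * x))
print(pdf_matches)
```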

### Gamma distribution
> Use the `gamma` distribution object. Learn more about it [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gamma.html#scipy.stats.gamma)

In [None]:
# set the shape and scale parameters (a>0, s>0)
a, s = < >, < >

# find the theoretical properties
mean, var, skew, kurt = < >.stats(< >, loc=0, scale=< >, moments='mvsk')
display(mean, var, skew, kurt)

In [None]:
# drawing 10000 samples from gamma(a,s) distribution 
sample = < >.rvs(< >, loc=0, scale=< >, size = (10000,1)) 

# plot probability distribution
plt.hist(sample, **pvars)
plt.xlabel('sample values')
plt.title('Gamma PDF')
plt.show()
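With shape a and scale s, the Gamma distribution has mean a·s, variance a·s² and skewness 2/√a, so the skewness depends on the shape alone. A quick check against `gamma.stats`, using arbitrary example values a = 3 and s = 2:

```python
import numpy as np
from scipy.stats import gamma

a, s = 3.0, 2.0  # example shape and scale, chosen arbitrarily

mean, var, skew = gamma.stats(a, loc=0, scale=s, moments='mvs')
print(float(mean), a * s)             # mean = a·s
print(float(var), a * s ** 2)         # variance = a·s²
print(float(skew), 2 / np.sqrt(a))    # skewness = 2/√a, independent of s
```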

<div class="alert alert-info">
<h4>Task 4</h4>

1. Vary the `a` and `s` parameters and see how the Gamma distribution changes.

2. Set `a=1` in the Gamma distribution and use the same value of `s`, chosen by you, in both the Exponential and Gamma distributions. Do the two distributions look similar?
<br>
</div>

**Answer**

write your answer here

<div class="alert alert-info">
<h4>Task 5</h4>

What range of values on the x-axis can both the Exponential and Gamma distributions take?
<br>
</div>

**Answer**

write your answer here

### Bernoulli distribution
> Use the `bernoulli` distribution object. Learn more about it [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.bernoulli.html#scipy.stats.bernoulli)

In [None]:
# set the probability of success (0<=p<=1)
p = < >

# find the theoretical properties
mean, var, skew, kurt = < >.stats(< >, moments='mvsk')
display(mean, var, skew, kurt)

In [None]:
# drawing 10000 samples from bernoulli(p) distribution 
sample = < >.rvs(< >,size = (10000,1))

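# plot the probability mass function: unit-width bins centred on 0 and 1
# make the bar heights equal the empirical probabilities
plt.hist(sample, **dict(pvars, bins=np.arange(-0.5, 2)))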
plt.xlabel('sample values')
plt.title('Bernoulli PMF')
plt.show()
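For a discrete distribution, another way to display the PMF is to tally the sample directly with `np.unique` and draw bars of the empirical frequencies next to the exact `bernoulli.pmf` values, which avoids any dependence on histogram bin edges. A sketch with an arbitrary example p = 0.3 and a fixed `random_state` for reproducibility:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import bernoulli

p = 0.3  # example success probability, chosen arbitrarily
sample = bernoulli.rvs(p, size=10000, random_state=0)

# empirical probability of each observed value (values come back sorted)
values, counts = np.unique(sample, return_counts=True)
freqs = counts / counts.sum()

plt.bar(values, freqs, width=0.4, alpha=0.5, label='sample frequency')
plt.plot(values, bernoulli.pmf(values, p), 'ko', label='bernoulli.pmf')
plt.xticks([0, 1])
plt.xlabel('sample values')
plt.title('Bernoulli PMF')
plt.legend()
plt.show()
```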

### Binomial distribution
> Use the `binom` distribution object. Learn more about it [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom.html#scipy.stats.binom)

In [None]:
# set the number of trials and probability of success (integer n>0, 0<=p<=1)
n, p = < >, < >

# find the theoretical properties
mean, var, skew, kurt = < >.stats(< >,< >, moments='mvsk')
display(mean, var, skew, kurt)

In [None]:
# drawing 10000 samples from Binomial(n,p) distribution 
sample = < >.rvs(< >,< >,size = (10000,1))

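# plot the probability mass function: unit-width bins centred on 0, 1, ..., n
# make the bar heights equal the empirical probabilities
plt.hist(sample, **dict(pvars, bins=np.arange(-0.5, n + 1.5)))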
plt.xlabel('sample values')
plt.title('Binomial PMF')
plt.show()
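The binomial PMF has the closed form $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ for $k = 0, \dots, n$, with mean $np$ and variance $np(1-p)$. The sketch below (n = 6 and p = 0.4 are arbitrary example values) evaluates the formula with Python's `math.comb` and compares it with `binom.pmf`.

```python
from math import comb

import numpy as np
from scipy.stats import binom

n, p = 6, 0.4  # example parameters, chosen arbitrarily
k = np.arange(n + 1)

# P(X = k) = C(n, k) * p**k * (1 - p)**(n - k), evaluated term by term
manual = np.array([comb(n, int(i)) * p ** int(i) * (1 - p) ** (n - int(i)) for i in k])
pmf_matches = np.allclose(manual, binom.pmf(k, n, p))
print(pmf_matches)

mean, var = binom.stats(n, p, moments='mv')
print(float(mean), n * p)            # mean = n·p
print(float(var), n * p * (1 - p))   # variance = n·p·(1 - p)
```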

<div class="alert alert-info">
<h4>Task 6</h4>

1. Vary the `n` and `p` parameters and see how the Binomial distribution changes.

2. Set `n=1` in the Binomial distribution and use the same value of `p`, chosen by you, in both the Bernoulli and Binomial distributions. Do the two distributions look similar?
<br>
</div>

**Answer**

write your answer here

<div class="alert alert-info">
<h4>Task 7</h4>

What range of values on the x-axis can both the Bernoulli and Binomial distributions take?
<br>
</div>

**Answer**

write your answer here

<div class="alert alert-info">
<h4>Bonus Question</h4>

1. Write code that empirically shows that the sum of five independent and identically distributed Bernoulli random variables with `p=0.5` is a Binomial random variable with `n=5` and `p=0.5`.

<br>
</div>

<div class="alert alert-info">

2. Write code that returns the parameters $\alpha$ and $\beta$ of a Beta distribution given its mean and variance.

<br>
</div>