In [None]:
# Import some basic libraries
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_context('paper')

# Hands-On Activity 3: Discrete Random Variables

## Objectives
+ To develop intuition about the probability mass function.
+ To learn about the Binomial distribution.
+ To learn about the Poisson distribution.

## The Bernoulli Distribution

The Bernoulli distribution generalizes the concept of a coin toss.
You can think of it as the result of an experiment with two possible outcomes, $0$ and $1$.
One just needs to specify the probability of one of the outcomes, typically the probability of $1$.
So, how do we mathematically denote a Bernoulli random variable $X$ that takes the value $1$ with probability $\theta \in [0,1]$?
We can write:
$$
X = \begin{cases} 1,\;\text{with probability}\;\theta,\\
0,\;\text{otherwise}.
\end{cases}
$$
Notice that in defining this random variable we are ignoring the mechanism that gives rise to it.
This is fine; it just means that we have chosen not to model that mechanism.
The other way we can write this is as follows:
$$
X \sim \operatorname{Bernoulli}(\theta).
$$
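One way to picture a mechanism that produces a Bernoulli sample: draw $U$ uniformly on $[0,1)$ and set $X=1$ when $U<\theta$. This works because $p(U<\theta)=\theta$. A minimal sketch with NumPy's random generator (the helper name `sample_bernoulli` and the seed are illustrative choices, not part of the original activity):

```python
import numpy as np

def sample_bernoulli(theta, size, rng):
    """Hypothetical helper: returns `size` Bernoulli(theta) draws as 0/1 integers."""
    u = rng.uniform(0.0, 1.0, size=size)  # U ~ Uniform[0, 1)
    return (u < theta).astype(int)         # 1 with probability theta, 0 otherwise

rng = np.random.default_rng(12345)
samples = sample_bernoulli(0.6, 100_000, rng)
print('fraction of ones:', samples.mean())  # should be close to theta = 0.6
```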
Let's use the functionality of ``scipy.stats`` to define a Bernoulli random variable and sample from it.

In [None]:
# Import the scipy.stats library
import scipy.stats as st
# This is the probability of 1:
theta = 0.6
# Define the random variable, Bernoulli(theta)
X = st.bernoulli(theta)

In [None]:
# Here is the **support** of the random variable, i.e., the values it can take.
# support() returns its lower and upper ends; for the Bernoulli these are its only two values, 0 and 1:
print('X takes values in', X.support())

In [None]:
# Evaluate the probability mass function at every point of the support
for x in X.support():
    print('p(X={0:d}) = {1:1.2f}'.format(x, X.pmf(x)))

In [None]:
# The expectation of the Bernoulli:
print('E[X] = {0:1.2f}'.format(X.expect()))

In [None]:
# The variance of the Bernoulli:
print('V[X] = {0:1.2f}'.format(X.var()))
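
The values printed above can be checked by hand. Summing over the support $\{0,1\}$ gives $\mathbb{E}[X] = 0\cdot(1-\theta) + 1\cdot\theta = \theta$ and $\mathbb{E}[X^2] = \theta$, so $\mathbb{V}[X] = \theta - \theta^2 = \theta(1-\theta)$. A short sketch that performs the same sums numerically and compares them with `scipy.stats`:

```python
import numpy as np
import scipy.stats as st

theta = 0.6
X = st.bernoulli(theta)
support = np.array([0, 1])
p = X.pmf(support)  # [1 - theta, theta]

mean = np.sum(support * p)                  # E[X] = theta
var = np.sum(support ** 2 * p) - mean ** 2  # E[X^2] - E[X]^2 = theta (1 - theta)
print('E[X] = {0:1.2f}, theta = {1:1.2f}'.format(mean, theta))
print('V[X] = {0:1.2f}, theta(1 - theta) = {1:1.2f}'.format(var, theta * (1 - theta)))
```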

In [None]:
# Sample the random variable 10 times:
xs = X.rvs(size=10)
print(xs)

In [None]:
# Let's plot the histogram of these samples (it simply counts how many samples are zero or one)
fig, ax = plt.subplots(dpi=150)
ax.hist(xs, bins=[-0.5, 0.5, 1.5])  # one bin centered on each value, 0 and 1
ax.set_xlabel('$x$')
ax.set_ylabel('Counts')

In [None]:
# Compare the histogram to the graph of the probability mass function:
fig, ax = plt.subplots(dpi=150)
ax.bar(X.support(), X.pmf(X.support()))
ax.set_xlabel('$x$')
ax.set_ylabel('$p(x)$')
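
With many more samples, the relative frequency of each outcome should approach the pmf. A sketch that counts outcomes with `np.bincount`, assuming $\theta=0.6$ and $10{,}000$ samples (the seed is fixed only for reproducibility):

```python
import numpy as np
import scipy.stats as st

theta = 0.6
X = st.bernoulli(theta)
xs = X.rvs(size=10_000, random_state=0)

# bincount counts occurrences of 0 and 1; minlength=2 guarantees both entries exist
freqs = np.bincount(xs, minlength=2) / xs.size
for x in (0, 1):
    print('x = {0:d}: frequency = {1:1.3f}, pmf = {2:1.3f}'.format(x, freqs[x], X.pmf(x)))
```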

### Questions
+ Modify the code above to take $1000$ samples from the random variable instead of just 10.
+ Rerun the code above for $\theta = 0.8$.

## The Categorical Distribution

This is a generalization of the Bernoulli, also known as the *multinoulli* distribution.
It is the probability distribution of a random variable that takes one of $K$ different values, each with its own probability.
Writing $p_k \ge 0$ for the probability of the $k$-th value, with $\sum_{k=1}^K p_k = 1$, the pmf is:
$$
p(X=k) = p_k.
$$
For example, if all the values are equally probable, then we have:
$$
p(X=k) = \frac{1}{K}.
$$
Let's see how we can sample from it.

In [None]:
# Just pick some probabilities
ps = [0.1, 0.3, 0.4, 0.2] # this has to sum to 1
# And here are the corresponding values
xs = np.array([1, 2, 3, 4])
# Here is how you can define a categorical rv:
X = st.rv_discrete(name='Custom Categorical', values=(xs, ps))

In [None]:
# You can sample from it
print(X.rvs(size=10))
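
Sampling from a categorical distribution amounts to picking one of the $K$ values with probabilities $p_1,\dots,p_K$. Besides `scipy.stats`, NumPy's `Generator.choice` does this directly through its `p` argument. A minimal sketch using the same values and probabilities as above (the seed and sample size are arbitrary):

```python
import numpy as np

values = np.array([1, 2, 3, 4])
probs = np.array([0.1, 0.3, 0.4, 0.2])  # must be non-negative and sum to 1
rng = np.random.default_rng(0)

draws = rng.choice(values, size=100_000, p=probs)
# The relative frequency of each value should be close to its probability
for v, p in zip(values, probs):
    print('p(X={0:d}) = {1:1.1f}, frequency = {2:1.3f}'.format(v, p, np.mean(draws == v)))
```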

In [None]:
# You can get the pmf:
print(X.pmf(2))

In [None]:
# You can get expectations:
print(X.expect())

In [None]:
# You can get the variance:
print(X.var())

In [None]:
# Let's plot the PMF
fig, ax = plt.subplots(dpi=150)
ax.bar(xs, X.pmf(xs))
ax.set_xlabel('$x$')
ax.set_ylabel('$p(x)$')

Let's now compute the expectation of a function of $X$.
Say, $Y = e^X$.
Of course, theoretically we have:
$$
\mathbb{E}[e^X] = \sum_x e^x p(x).
$$
Here is how you can do this summation manually:

In [None]:
E_eX = np.sum(np.exp(xs) * X.pmf(xs))
print('E[exp(X)] = {0:1.2f}'.format(E_eX))
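
The same expectation can also be approximated by Monte Carlo: draw many samples of $X$, apply $e^x$ to each, and average. By the law of large numbers the average approaches $\mathbb{E}[e^X]$. A sketch that redefines the categorical `X` from above so it runs on its own (the seed and sample size are arbitrary):

```python
import numpy as np
import scipy.stats as st

xs = np.array([1, 2, 3, 4])
ps = [0.1, 0.3, 0.4, 0.2]
X = st.rv_discrete(name='Custom Categorical', values=(xs, ps))

E_eX = np.sum(np.exp(xs) * X.pmf(xs))          # exact value via the sum
samples = X.rvs(size=100_000, random_state=1)  # draws of X
E_eX_mc = np.mean(np.exp(samples))             # Monte Carlo estimate
print('exact = {0:1.2f}, Monte Carlo = {1:1.2f}'.format(E_eX, E_eX_mc))
```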

What if we want the variance of $Y = e^X$?
We use the formula:
$$
\mathbb{V}[e^X] = \mathbb{E}[e^{2X}] - (\mathbb{E}[e^X])^2.
$$
Let's do it:

In [None]:
E_e2X = np.sum(np.exp(2.0 * xs) * X.pmf(xs))
V_eX = E_e2X - E_eX ** 2
print('V[exp(X)] = {0:1.2f}'.format(V_eX))
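
`scipy.stats` can also carry out these sums for you: the `expect` method of a discrete distribution accepts a function to average over the pmf. A sketch that recomputes $\mathbb{E}[e^X]$ and $\mathbb{V}[e^X]$ this way for the same categorical distribution:

```python
import numpy as np
import scipy.stats as st

xs = np.array([1, 2, 3, 4])
ps = [0.1, 0.3, 0.4, 0.2]
X = st.rv_discrete(name='Custom Categorical', values=(xs, ps))

E_eX = X.expect(np.exp)                      # E[e^X]
E_e2X = X.expect(lambda x: np.exp(2.0 * x))  # E[e^{2X}]
V_eX = E_e2X - E_eX ** 2                     # V[e^X] = E[e^{2X}] - (E[e^X])^2
print('E[exp(X)] = {0:1.2f}'.format(E_eX))
print('V[exp(X)] = {0:1.2f}'.format(V_eX))
```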

### Questions

+ Rerun all code segments above for the Categorical $X\sim \operatorname{Categorical}(0.1, 0.1, 0.4, 0.2, 0.2)$ taking values $1, 2, 3, 4$ and $5$.
+ Write code that finds the expectation of $\sin(X)$.
+ Write code that finds the variance of $\sin(X)$.

## The Binomial Distribution

Suppose that you toss a coin $n$ times, with probability of heads $\theta$, and let $X$ be the number of heads.
The random variable $X$ is called a binomial random variable.
We write:
$$
X\sim B(n, \theta).
$$
It is easy to show that its pmf is:
$$
p(X = k) = {n\choose{k}}\theta^k(1-\theta)^{n-k},
$$
where ${n\choose{k}}$ is the number of ways to choose $k$ elements out of $n$:
$$
{n\choose{k}} = \frac{n!}{k!(n-k)!}.
$$
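The formula can be evaluated directly with Python's `math.comb` (available since Python 3.8) and compared with `scipy.stats.binom`. A sketch with $n=5$ and $\theta=0.6$, the values used below; note that $k$ ranges over $0,1,\dots,n$, so there are $n+1$ terms, and they must sum to one:

```python
import math
import numpy as np
import scipy.stats as st

def binomial_pmf(k, n, theta):
    """p(X = k) = C(n, k) theta^k (1 - theta)^(n - k)."""
    return math.comb(n, k) * theta ** k * (1.0 - theta) ** (n - k)

n, theta = 5, 0.6
manual = np.array([binomial_pmf(k, n, theta) for k in range(n + 1)])  # support {0, ..., n}
print(manual)
print('sum of pmf =', manual.sum())
print('max difference from scipy:', np.max(np.abs(manual - st.binom.pmf(np.arange(n + 1), n, theta))))
```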

In [None]:
# Here is how to define the binomial in scipy.stats:
n = 5       # Number of trials (coin tosses)
theta = 0.6 # Probability of success in each trial
X = st.binom(n, theta) # Number of successes

In [None]:
# Here are some samples
print(X.rvs(100))

In [None]:
# Here is the expectation
print('E[X] = {0:1.2f}'.format(X.expect()))

In [None]:
# Here is the variance
print('V[X] = {0:1.2f}'.format(X.var()))
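
Since $X$ counts the heads in $n$ independent tosses, it is the sum of $n$ independent $\operatorname{Bernoulli}(\theta)$ variables. Because the tosses are independent, their means and variances add, giving $\mathbb{E}[X]=n\theta$ and $\mathbb{V}[X]=n\theta(1-\theta)$. A sketch that simulates the tosses directly and compares with these formulas (the seed and number of repetitions are arbitrary):

```python
import numpy as np

n, theta = 5, 0.6
rng = np.random.default_rng(2)

# Each row is one experiment of n tosses; 1 = heads (probability theta)
tosses = (rng.random(size=(200_000, n)) < theta).astype(int)
heads = tosses.sum(axis=1)  # number of heads per experiment, distributed as B(n, theta)

print('sample mean = {0:1.3f}, n*theta = {1:1.3f}'.format(heads.mean(), n * theta))
print('sample var  = {0:1.3f}, n*theta*(1-theta) = {1:1.3f}'.format(heads.var(), n * theta * (1 - theta)))
```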

In [None]:
# Let's draw the pmf
fig, ax = plt.subplots(dpi=100)
xs = np.arange(n + 1)  # the support is {0, 1, ..., n}
ax.bar(xs, X.pmf(xs))
ax.set_xlabel('$x$')
ax.set_ylabel('$p(x)$')

### Questions

+ Start increasing the number of trials $n$. Gradually take it up to $n=100$. What does the resulting pmf look like?
It starts to look like a bell curve, and indeed, for large $n$ the binomial is well approximated by a normal distribution.
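
One way to make "bell curve" precise: for large $n$, $B(n,\theta)$ is close to a normal distribution with the same mean $n\theta$ and variance $n\theta(1-\theta)$ (the de Moivre-Laplace theorem). A sketch comparing the binomial pmf with the normal density at the integers for $n=100$, $\theta=0.6$:

```python
import numpy as np
import scipy.stats as st

n, theta = 100, 0.6
X = st.binom(n, theta)
mu, sigma = n * theta, np.sqrt(n * theta * (1 - theta))
ks = np.arange(n + 1)

pmf = X.pmf(ks)
normal = st.norm.pdf(ks, loc=mu, scale=sigma)  # normal density evaluated at the integers
print('largest pointwise difference:', np.max(np.abs(pmf - normal)))
```

Overlaying `normal` as a line on the bar plot of `pmf` shows the two curves almost on top of each other.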

## The Poisson Distribution

The Poisson distribution models the number of times an event occurs in an interval of space or time.
For example, a Poisson random variable $X$ may be:

+ The number of earthquakes with magnitude greater than 6 on the Richter scale occurring over the next 100 years.
+ The number of major floods over the next 100 years.
+ The number of patients arriving at the emergency room during the night shift.
+ The number of electrons hitting a detector in a specific time interval.

The Poisson is a good model when the following assumptions are true:
+ The number of times an event occurs in an interval takes values $0,1,2,\dots$.
+ Events occur independently.
+ The probability that an event occurs is constant per unit of time.
+ The average rate at which events occur is constant.
+ Events cannot occur at the same time.
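
These assumptions connect the Poisson to the binomial: split the interval into $n$ small sub-intervals, each containing an event independently with probability $\lambda/n$. As $n\to\infty$, $B(n, \lambda/n)$ approaches the Poisson distribution with rate $\lambda$. A numerical sketch with $\lambda=1$ (the values of $n$ are arbitrary):

```python
import numpy as np
import scipy.stats as st

lam = 1.0           # rate: expected number of events in the interval
ks = np.arange(11)  # compare the first few probabilities
poisson_pmf = st.poisson.pmf(ks, lam)

diffs = []
for n in [10, 100, 1000]:
    # n sub-intervals, each with an event with probability lam / n
    binom_pmf = st.binom.pmf(ks, n, lam / n)
    diffs.append(np.max(np.abs(binom_pmf - poisson_pmf)))
    print('n = {0:4d}: max |binomial - Poisson| = {1:.2e}'.format(n, diffs[-1]))
```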

When these assumptions are valid, we can write:
$$
X\sim \operatorname{Pois}(\lambda),
$$
where $\lambda>0$ is the rate at which events occur, i.e., the expected number of events in the interval.
The pmf of the Poisson is:
$$
p(X=k) = \frac{\lambda^ke^{-\lambda}}{k!}.
$$
The expectation of the Poisson is:
$$
\mathbb{E}[X] = \sum_{k=0}^\infty k p(X=k) = \lambda.
$$
The variance is:
$$
\mathbb{V}[X] = \dots = \lambda.
$$
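The elided algebra behind these two results can be checked numerically: evaluate the pmf for $k=0,1,\dots,K$ with $K$ large enough that the neglected tail is negligible, then compute $\sum_k k\,p(X=k)$ and $\sum_k k^2\,p(X=k) - (\mathbb{E}[X])^2$. A sketch with the illustrative values $\lambda=3.5$ and $K=100$ (both arbitrary choices):

```python
import math
import numpy as np
import scipy.stats as st

lam = 3.5  # illustrative rate
K = 100    # truncation point; terms beyond it are negligible for this lambda
ks = np.arange(K + 1)

# p(X = k) = lambda^k e^{-lambda} / k!
pmf = np.array([lam ** k * math.exp(-lam) / math.factorial(k) for k in range(K + 1)])

mean = np.sum(ks * pmf)
var = np.sum(ks ** 2 * pmf) - mean ** 2
print('sum of pmf = {0:1.6f}'.format(pmf.sum()))
print('E[X] = {0:1.6f}, V[X] = {1:1.6f}, lambda = {2}'.format(mean, var, lam))
```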

Let's look at a specific example.
Historical data show that in a given region a major earthquake occurs once every 100 years on average.
What is the probability that $k$ such earthquakes will occur within the next 100 years?
Let $X$ be the random variable corresponding to the number of earthquakes over the next 100 years.
Assuming the Poisson model is valid, the rate parameter is $\lambda = 1$ and we have:
$$
X\sim \operatorname{Pois}(1).
$$
The probabilities are:

In [None]:
X = st.poisson(1.0)
ks = range(6)
fig, ax = plt.subplots(dpi=150)
ax.bar(ks, X.pmf(ks))
ax.set_xlabel('Number of major earthquakes in next 100 years')
ax.set_ylabel('Probability of occurrence');
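
From the pmf we can also answer questions such as "what is the probability of at least one major earthquake in the next 100 years?" Since $p(X\ge 1) = 1 - p(X=0) = 1 - e^{-\lambda}$, this is $1 - e^{-1}\approx 0.63$. A sketch using the survival function `sf`, which for a discrete distribution returns $p(X > k)$, and the cumulative distribution function `cdf`, which returns $p(X \le k)$:

```python
import numpy as np
import scipy.stats as st

X = st.poisson(1.0)
p_none = X.pmf(0)         # p(X = 0) = e^{-1}
p_at_least_one = X.sf(0)  # p(X > 0) = 1 - p(X = 0)
p_at_most_two = X.cdf(2)  # p(X <= 2)
print('p(X = 0)  = {0:1.3f}'.format(p_none))
print('p(X >= 1) = {0:1.3f}'.format(p_at_least_one))
print('p(X <= 2) = {0:1.3f}'.format(p_at_most_two))
```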

### Questions

+ How would the rate parameter $\lambda$ change if major earthquakes had occurred in the past at a rate of 2 every 100 years? Plot the pmf of the new Poisson random variable. You may have to add more points on the x-axis.