In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

In [None]:
# reset default plotting values
plt.rcParams['figure.figsize'] = (15, 5)
plt.rc('font', family='sans-serif')
plt.rc('axes', labelsize=14)
plt.rc('axes', labelweight='bold')
plt.rc('axes', titlesize=16)
plt.rc('axes', titleweight='bold')
plt.rc('axes', linewidth=2)
plt.rc('xtick', labelsize=14)
plt.rc('ytick', labelsize=14)

# Statistical Distributions

![](https://upload.wikimedia.org/wikipedia/commons/thumb/c/c4/2-Dice-Icon.svg/200px-2-Dice-Icon.svg.png)

### Prof. Robert Quimby
&copy; 2020 Robert Quimby

## In this tutorial you will...

- Consider how the randomness of nature can lead identical experiments to different outcomes
- Calculate the odds for different outcomes of random trials
- Compare the binomial, Poisson, and Gaussian distributions

For introductions to the basic statistical concepts used in astronomy, see:
- "Data Reduction and Error Analysis for the Physical Sciences," (Bevington & Robinson 1992)
- "An Introduction to Error Analysis," (Taylor 1997)

## How Many Grains of Sand Are There in a Teaspoon?

![](media/sand.png)

In [None]:
samples = np.genfromtxt('media/sand.dat')
plt.plot(samples, 'ro')
plt.xlabel('Sample Number')
plt.ylabel('Grains of Sand');

In [None]:
m = ????
std = ????
print("a teaspoon of sand contains {:.0f} +/- {:.0f} grains".format(round(m, -1), round(std, -1)))
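
One way to fill in the two blanks above is with the sample mean and the sample standard deviation. The sketch below is only an illustration: `media/sand.dat` is not reproduced here, so `fake_samples` is a synthetic stand-in (a hypothetical name holding simulated values), and the choice of `ddof=1` is an assumption rather than something the tutorial specifies.

```python
import numpy as np

# Synthetic stand-in for the contents of media/sand.dat (hypothetical values,
# NOT the tutorial's data): 25 simulated counts scattered around 5000 grains.
rng = np.random.default_rng(seed=42)
fake_samples = rng.normal(loc=5000, scale=300, size=25).round()

# Sample mean and sample standard deviation (ddof=1 divides by N - 1).
m = fake_samples.mean()
std = fake_samples.std(ddof=1)

print("a teaspoon of sand contains {:.0f} +/- {:.0f} grains".format(round(m, -1), round(std, -1)))
```

With the real data, the same two calls would be applied to `samples` instead of `fake_samples`.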

## Measurement Uncertainty

 * Experiments do not produce exact results
 * If you repeat the same experiment over and over, you are bound to get different results
 * Every result must therefore be reported with an uncertainty, so that we can judge whether two measurements agree with each other, or whether a measurement agrees with a theoretical prediction
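
As a rough illustration of what "consistent" can mean in practice, the sketch below expresses the difference between two measurements in units of their combined uncertainty. The helper name `sigma_difference` and the numbers are hypothetical, and the calculation assumes independent, roughly Gaussian uncertainties that add in quadrature.

```python
import math

def sigma_difference(a, sigma_a, b, sigma_b):
    """Return |a - b| in units of the combined uncertainty.

    Assumes independent uncertainties that add in quadrature.
    """
    return abs(a - b) / math.hypot(sigma_a, sigma_b)

# Hypothetical measurements (illustrative numbers only).
n_sigma = sigma_difference(5020, 310, 4700, 280)
print("measurements differ by {:.2f} sigma".format(n_sigma))
```

A difference well below one combined standard deviation, as in this made-up case, gives no reason to think the two measurements disagree.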

## Let's Play Dice

![](https://upload.wikimedia.org/wikipedia/commons/7/77/Nuvola_apps_atlantik.png)

If I have a normal 6-sided die, what is the probability that I will roll a three?

![](https://upload.wikimedia.org/wikipedia/commons/thumb/c/c4/2-Dice-Icon.svg/200px-2-Dice-Icon.svg.png)

If I roll two regular, 6-sided dice, what is the probability that I will roll exactly **two** threes (one on each die)?

What is the probability that I will roll exactly **one** three?

In [None]:
prob = ????
print("probability of rolling exactly one three is {:0.3}".format(prob))
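
One possible way to check an answer to the two-dice question is to enumerate all 36 equally likely outcomes and count those with exactly one three. This is a brute-force sketch rather than the intended derivation; the names `rolls` and `one_three` are illustrative.

```python
from itertools import product

faces = range(1, 7)
# All 36 equally likely outcomes for two six-sided dice.
rolls = list(product(faces, repeat=2))
# Count the outcomes in which exactly one of the two dice shows a three.
one_three = sum(1 for roll in rolls if roll.count(3) == 1)
prob = one_three / len(rolls)
print("probability of rolling exactly one three is {:0.3}".format(prob))
```

The count agrees with the direct argument: either die can be the three (2 ways), with probability $1/6$ for the three and $5/6$ for the other die.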

If I roll $n$ dice, what is the probability that I will roll exactly $k$ threes?

## Binomial Distribution

The probability of $k$ successes in $n$ trials is

$$b_{n, p}(k) = \left({ n! \over k!(n-k)!}\right)p^{k}(1-p)^{n-k}$$

where:
 * $p$ is the probability of success in 1 trial
 * $n$ and $k$ are integers
 * $n \geq k \geq 0$
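
The formula can be transcribed almost directly into code. The sketch below uses only the standard library (`math.comb` supplies the binomial coefficient) and applies it to the two-dice question asked earlier; `binomial_pmf` is a hypothetical helper name, and `scipy.stats.binom.pmf` should return the same values.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Two dice, where "success" means rolling a three.
n, p = 2, 1 / 6
for k in range(n + 1):
    print("P({} threes) = {:.4f}".format(k, binomial_pmf(k, n, p)))
```

Because the outcomes $k = 0, \ldots, n$ are exhaustive, the probabilities sum to one.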

### Odds of rolling 0, 1, 2, or 3 sixes on 3 dice

In [None]:
# number of trials (dice)
n = ???
# probability of success for one trial (on each die)
p = ???
# binomial probability
from scipy.stats import binom
k = ???
b_np = ???

In [None]:
# plot the distribution
plt.bar(k, b_np, width=0.5)
plt.title("n={}; p={:.4f}".format(n, p))
plt.xlabel('Number of Sixes')
plt.ylabel('Fraction of Times Rolled')
for x, y in zip(k, b_np):
    plt.annotate("{:.1f}%".format(100 * y), xy=(x, y), va='bottom', ha='center')

## Expectation Value

#### If you run an experiment a large number of times, the average result is the expectation value.

The expectation value, $E[f(x)]$, is a weighted average of all possible outcomes.

$$E[f(x)] = \sum f(x) P(x)$$

where:
 * $f(x)$ gives the value of the function at $x$
 * $P(x)$ gives the probability for each value of $x$

For the binomial distribution, the outcome $x$ is the number of successes $k$:
 * $f(x) = k$
 * $P(x) = \left({ n! \over k!(n-k)!}\right)p^{k}(1-p)^{n-k}$

**Example**: If you have a one in six chance of rolling a six ($p=1/6$) and you roll three dice ($n=3$), how many sixes do you expect to get?

In [None]:
# expectation value
print("Expected number of sixes = {}".format(????))

In [None]:
# actual average from 10,000 tests
bdist = ????
print("Average number of sixes rolled: {:.3f}".format(bdist.mean()))
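
A possible way to approach both cells, sketched with the standard library only: evaluate the weighted sum $\sum k\,P(k)$ explicitly, then compare it with a simulated average. The seed and the names `expected`, `counts`, and `average` are illustrative choices; the tutorial may intend a NumPy or SciPy random generator instead.

```python
import random
from math import comb

n, p = 3, 1 / 6

# Expectation value as a probability-weighted sum over k = 0..n.
expected = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

# Monte Carlo check: roll three dice 10,000 times and count the sixes each time.
rng = random.Random(1)  # fixed seed so the run is repeatable
trials = 10_000
counts = [sum(rng.randint(1, 6) == 6 for _ in range(n)) for _ in range(trials)]
average = sum(counts) / trials

print("Expected number of sixes = {:.3f}".format(expected))
print("Average number of sixes rolled: {:.3f}".format(average))
```

For the binomial distribution the weighted sum reduces to $E[k] = np$, so the simulated average should scatter around that value.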

## Binomial Distribution for Large $n$ (and Small $p$)

### Consider a warm cloud of gas

In [None]:
# number of excited atoms in a gas cloud
n = 1e6
# probability that any given atom will decay during some time interval
p = 2.5e-6 
k = np.arange(n + 1)
b_np = ????

In [None]:
# plot the distribution
w = b_np > 1e-4
plt.bar(k[w], b_np[w], width=0.5)
plt.title("n={:.2e}; p={:.2e}".format(n, p))
plt.xlabel('Total Number of Photons Detected in Time Interval')
plt.ylabel('Probability')
for x, y in zip(k[w], b_np[w]):
    plt.annotate("{:.1f}%".format(100 * y), xy=(x, y), va='bottom', ha='center')

### Expected number of photons

Recall, the number of excited atoms in a gas cloud is $n = 10^6$ and the probability that any given atom will decay during some time interval is $p = 2.5 \times 10^{-6}$.

So what is the expected number of photons in the given time interval?

In [None]:
print("Expected number of photons = {}".format(????))
bdist = ????
print("Average number of photons detected in 10,000 trials: {:.3f}".format(bdist.mean()))
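
A minimal sketch of one way to fill in these blanks, assuming NumPy's `Generator.binomial` is an acceptable source of random draws (the tutorial may intend `scipy.stats.binom.rvs` instead). The name `bdist_sketch` and the seed are illustrative.

```python
import numpy as np

n = 1_000_000  # number of excited atoms in the gas cloud
p = 2.5e-6     # probability that a given atom decays during the interval

# The binomial expectation value is n * p.
expected = n * p

# Simulate 10,000 repetitions of the experiment (fixed seed for repeatability).
rng = np.random.default_rng(seed=0)
bdist_sketch = rng.binomial(n, p, size=10_000)

print("Expected number of photons = {}".format(expected))
print("Average number of photons detected in 10,000 trials: {:.3f}".format(bdist_sketch.mean()))
```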

## Poisson Distribution

* Limit of the binomial distribution with
  * $n \rightarrow \infty$
  * $p \rightarrow 0$
  * $np = \mu$ held fixed

$$P_{\mu}(k) = {e^{-\mu} \mu^k \over k! } $$

where
 * $\mu = np$ (a positive, real number) is the expected mean number of counts in the given time interval
 * $k$ is an integer
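
To see how close the two distributions are for the gas-cloud numbers used earlier ($n = 10^6$, $p = 2.5 \times 10^{-6}$), the sketch below evaluates both formulas side by side with the standard library. The helper names `binomial_pmf` and `poisson_pmf` are hypothetical; `scipy.stats.binom.pmf` and `scipy.stats.poisson.pmf` should give matching values.

```python
from math import comb, exp, factorial

n, p = 1_000_000, 2.5e-6
mu = n * p

def binomial_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mu):
    """Poisson probability of exactly k counts for a mean of mu."""
    return exp(-mu) * mu**k / factorial(k)

for k in range(8):
    print("k={}: binomial={:.6f}  Poisson={:.6f}".format(
        k, binomial_pmf(k, n, p), poisson_pmf(k, mu)))

# Largest absolute difference over the range where the probabilities matter.
max_diff = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(20))
print("largest difference: {:.2e}".format(max_diff))
```

The Poisson form needs only $\mu$, which is convenient when $n$ and $p$ are not known separately.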

In [None]:
# re-plot the distribution
plt.bar(k[w], b_np[w], width=0.5)
plt.ylabel('Probability')
plt.xlabel('Total Number of Photons Detected in Time Interval')
plt.title("n={:.2e}; p={:.2e}".format(n, p))

# show a Poisson distribution
from scipy.stats import poisson
p_np = ????
plt.plot(p_np, 'r--', linewidth=5, label='Poisson Distribution')
plt.legend();

## Poisson Distribution for Large $\mu$

In [None]:
# increase the expectation value by a factor of 100
mu = 100 * n * p
p_mu = ????
plt.plot(p_mu, 'r--', linewidth=5, label='Poisson Distribution')
plt.title(r"$\mu$ = {:.2f}".format(mu))
plt.ylabel('Probability')
plt.xlabel('Total Number of Photons Detected in Time Interval')
plt.legend();

In [None]:
plt.plot(p_mu, 'r--', linewidth=5, label='Poisson Distribution')
plt.title(r"$\mu$ = {:.2f}".format(mu))
plt.ylabel('Probability')
plt.xlabel('Total Number of Photons Detected in Time Interval')

from scipy.stats import norm
x = np.linspace(1, 500, 1000)
y = ????
plt.plot(x, y, 'g', linewidth=3, label='Gaussian Distribution')
plt.legend();

## Gaussian Distribution

$$ G_{\mu \sigma}(x) = { 1 \over \sigma \sqrt{2\pi}} e^{-{(x - \mu)^2 \over 2\sigma^2}}$$

where:
  * $x$ is a real number (positive or negative)
  * $\mu$ is the expectation value (center of the distribution)
  * $\sigma$ is the standard deviation of the distribution
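
The comparison plotted above can also be checked numerically. The sketch below assumes the Gaussian is given the same mean as the Poisson distribution and $\sigma = \sqrt{\mu}$ (a Poisson distribution has variance $\mu$); whether the tutorial's blank uses exactly these parameters is an assumption. The helper names are hypothetical, and the Poisson probability is computed in log space to avoid overflowing $k!$.

```python
from math import exp, lgamma, log, pi, sqrt

def poisson_pmf(k, mu):
    """Poisson probability, evaluated in log space so large k does not overflow."""
    return exp(-mu + k * log(mu) - lgamma(k + 1))

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density with mean mu and standard deviation sigma."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

mu = 250.0
sigma = sqrt(mu)  # assumption: match the Poisson variance, which equals mu

for k in (220, 250, 280):
    print("k={}: Poisson={:.5f}  Gaussian={:.5f}".format(
        k, poisson_pmf(k, mu), gaussian_pdf(k, mu, sigma)))
```

The agreement is closest near the peak; in the tails the Poisson distribution remains slightly asymmetric, while the Gaussian is symmetric about $\mu$.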