#### Programming notes
The option `edgecolor='k'` defines the edge of each bar with a black line; the default behavior is to omit this. Delete this option and run the cell again to see how the appearance changes.
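
As a minimal sketch of the difference, the cell below draws the same histogram twice, once with the default settings and once with `edgecolor='k'`. The data array and the variable names are hypothetical and chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
data = np.random.randn(1000)  # hypothetical sample data

fig, (ax_default, ax_edged) = plt.subplots(1, 2, figsize=(8, 3))

# Left panel: default settings, no outline requested for the bars.
_, _, bars_default = ax_default.hist(data)
ax_default.set_title("default")

# Right panel: edgecolor='k' outlines each bar in black.
_, _, bars_edged = ax_edged.hist(data, edgecolor='k')
ax_edged.set_title("edgecolor='k'")

# Each bar is a Rectangle; its edge color is stored as an RGBA tuple.
edge_edged = bars_edged[0].get_edgecolor()
print(edge_edged)
```

The printed RGBA tuple for the outlined bars corresponds to opaque black.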

### The standard normal distribution
NumPy offers random number generators that produce numbers with many other distributions, but the most important of these is [`randn`](https://www.numpy.org/doc/1.16/reference/generated/numpy.random.randn.html), which produces samples from the *standard normal distribution*,
$$\mathcal{N}(x;\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right],$$
in which $\mu = 0$ and $\sigma^2=1$. The cell below uses `randn` to create an array of 1000 normally distributed random numbers and plot them as a histogram.

In [1]:
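from numpy import random
import matplotlib.pyplot as plt

N = 1000
random.seed(0)       # fix the seed so the result is reproducible
x = random.randn(N)  # N samples from the standard normal distribution

plt.hist(x)
plt.xlabel('x')
plt.ylabel('Occurrence');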
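
Samples from a normal distribution with other parameters can be obtained by shifting and scaling standard normal samples, since $x = \mu + \sigma z$ has mean $\mu$ and variance $\sigma^2$ when $z$ is standard normal. The sketch below checks this numerically; the values of `mu` and `sigma` are hypothetical.

```python
import numpy as np

np.random.seed(0)
z = np.random.randn(100_000)  # standard normal: mu = 0, sigma^2 = 1

mu, sigma = 5.0, 2.0          # hypothetical parameters for illustration
x = mu + sigma * z            # samples from a normal distribution N(mu, sigma^2)

print("standard normal: mean, std =", z.mean(), z.std())
print("shifted/scaled:  mean, std =", x.mean(), x.std())
```

The sample statistics should land close to $(0, 1)$ and $(5, 2)$, with small deviations that shrink as the number of samples grows.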

## Python basics

Review the following documentation pages to familiarize yourself with the basic elements of the Python language, integrated development environment (IDE) and Python libraries for data analysis.

* [Python Lectures](https://sfu.syzygy.ca/jupyter/hub/user-redirect/git-pull?repo=https://github.com/nleehone/PythonLectures.git&branch=master)

Additional resources: 
* [Getting Started With Jupyter Notebook for Python](https://medium.com/codingthesmartway-com-blog/getting-started-with-jupyter-notebook-for-python-4e7082bd5d46)

For MATLAB users:

* [NumPy for Matlab users](https://www.numpy.org/devdocs/user/numpy-for-matlab-users.html)

Python online documentation: 
* [Python 3.7.4 documentation](https://docs.python.org/3/tutorial/)

Sections in the worksheet are numbered to be consistent with the text. Please read the corresponding section of the book before beginning the associated section of the worksheet and then do the questions, using the book as a reference as you work. 

The *standard error*, denoted by $\alpha$ in *Measurements and their uncertainties*, provides an estimate of the standard deviation of the sample mean $\bar{x}$.

### Expectation and consistency
For consistency, we should expect that as the number of measurements $N$ increases, $\lim_{N\rightarrow\infty}\bar{x}_N = \mu$ (including the subscript $N$ to make its role explicit) and $\lim_{N\rightarrow\infty}\sigma_{N-1} = \sigma$, since the sample distribution should converge to the parent distribution in this limit.

Starting with $\bar{x}_N$, we wish to evaluate

$$ \lim_{N\rightarrow\infty}\bar{x}_N = \lim_{N\rightarrow\infty}\frac{1}{N}\sum_{i=1}^{N}x_i. $$

To accomplish this, we note that in the $N\rightarrow\infty$ limit we expect $x_i$ to take on *all possible values* for $x$, with a frequency given by the probability density $P_\text{DF}(x)$. This allows us to replace the sum over random $x_i$ (in this limit) with an integral over the domain of $P_\text{DF}(x)$ that we can readily evaluate.

\begin{align}
\lim_{N\rightarrow\infty}\frac{1}{N}\sum_{i=1}^{N}x_i
&= \int_{-\infty}^{\infty}\text{d}x\,x P_\text{DF}(x)\\
&= \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty}\text{d}x\,x\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]\\
&= \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty}\text{d}u\,u\exp\left(-\frac{u^2}{2\sigma^2}\right) + \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty}\text{d}x\,\mu\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]\quad(\text{with}\ u = x - \mu)\\
&= \frac{\mu}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty}\text{d}x\,\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]\\
&= \mu.
\end{align}

As expected, we find that the sample mean converges to the mean of the parent distribution as the number of samples increases.
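
This convergence can be illustrated numerically. The sketch below draws one long sequence of normally distributed numbers and computes the sample mean of its first $N$ entries for several values of $N$; the parameters `mu` and `sigma` and the chosen sample sizes are hypothetical.

```python
import numpy as np

np.random.seed(0)
mu, sigma = 3.0, 2.0  # hypothetical parent-distribution parameters
x = mu + sigma * np.random.randn(1_000_000)

# Distance between the sample mean of the first N values and mu.
sample_sizes = [10, 1_000, 100_000, 1_000_000]
errors = [abs(x[:n].mean() - mu) for n in sample_sizes]

for n, err in zip(sample_sizes, errors):
    print(f"N = {n:>9,d}   |mean - mu| = {err:.5f}")
```

The deviations fluctuate from one seed to the next, but their typical size decreases as $N$ grows.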

We call the integral $\int_{-\infty}^{\infty}\text{d}x\,x P_\text{DF}(x)$ the *expectation* of $x$ and denote it by $\text{E}[X]$, where again the upper-case $X$ refers to the set of all values that $x$ can have.
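
The expectation integral can also be approximated on a grid. The following sketch, which assumes a normal $P_\text{DF}(x)$ with hypothetical parameters, replaces the integral with a simple Riemann sum over a range wide enough that the tails are negligible.

```python
import numpy as np

mu, sigma = 1.5, 0.7  # hypothetical parameters

# Grid covering mu +/- 10 sigma; dx is the grid spacing.
x, dx = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001, retstep=True)
pdf = np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

norm = np.sum(pdf) * dx             # integral of P_DF(x), should be close to 1
expectation = np.sum(x * pdf) * dx  # integral of x P_DF(x), should be close to mu

print(norm, expectation)
```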

Next we check that $\lim_{N\rightarrow\infty}\sigma^2_{N-1} = \sigma^2$, where we focus on the *variance* $\sigma^2$ instead of $\sigma$ to avoid the square root in the [definition](#std-pop) of $\sigma_{N-1}$. Converting the finite sum into an expectation, we have

$$\lim_{N\rightarrow\infty}\sigma^2_{N-1} = \lim_{N\rightarrow\infty}\sigma^2_{N} = \lim_{N\rightarrow\infty}\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x}_N)^2 = \text{E}[(X - \bar{X}_N)^2] = \text{E}[(X - \mu)^2],$$

where $\bar{X}_N$ denotes the set of all averages $\bar{x}_N$ of $N$ numbers, and $\text{E}[\bar{X}_N] = \mu$. Evaluating the final expectation in the above expression yields the desired result,

\begin{align}
\text{E}[(X - \mu)^2] &= \int_{-\infty}^{\infty}\text{d}x\,(x-\mu)^2 P_\text{DF}(x)\\
&= \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty}\text{d}x\,(x-\mu)^2 \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]\\
&= \frac{\sigma^2}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\text{d}u\,u^2 e^{-u^2/2}\quad[\text{with}\ u = (x - \mu)/\sigma]\\
&= \frac{\sigma^2}{\sqrt{2\pi}}\left\{\left[-u\,e^{-u^2/2}\right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty}\text{d}u\,e^{-u^2/2}\right\}\quad(\text{integrating by parts})\\
&= \sigma^2.
\end{align}

This equivalence is of course no accident—in general, we *define* the variance as $\text{Var}(X) = \text{E}[(X-\mu)^2]$ for an arbitrary probability distribution, so we also associate the variance with the parameter $\sigma^2$ in the normal distribution.

These expectation values justify our use of $\bar{x}_N$ as an estimate for $\mu$ and $\sigma_{N-1}$ for $\sigma$.
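
In NumPy, the two normalizations of the sample variance are selected with the `ddof` argument of `np.var`: the default `ddof=0` divides by $N$, while `ddof=1` divides by $N-1$. The sketch below compares them for increasing $N$, using hypothetical values for `mu` and `sigma`.

```python
import numpy as np

np.random.seed(0)
mu, sigma = 0.0, 1.5  # hypothetical parameters; sigma^2 = 2.25

results = []
for n in (10, 1_000, 100_000):
    x = mu + sigma * np.random.randn(n)
    var_n = np.var(x)            # divides by N
    var_n1 = np.var(x, ddof=1)   # divides by N - 1
    results.append((n, var_n, var_n1))
    print(f"N = {n:>7,d}   var_N = {var_n:.4f}   var_(N-1) = {var_n1:.4f}")
```

The two estimates differ by the factor $N/(N-1)$, which matters for small $N$ and becomes negligible as $N$ grows.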

To determine how much variation we *expect* to see in $\bar{x}$, we can calculate the expectation $\text{Var}(\bar{X}_N) = \text{E}[(\bar{X}_N-\mu)^2]$ (now for fixed, finite $N$) just as we calculated expectations for $X$ and $(X-\mu)^2$. We'll do this in steps, demonstrating some useful properties of the variance along the way. 

First, we derive a common alternative expression for the variance,

\begin{align}
\text{Var}(X) &= \text{E}[(X-\mu)^2] \\
&= \text{E}[X^2 - 2\mu X + \mu^2] \\
&= \text{E}[X^2] - 2\mu\,\text{E}[X] + \mu^2 \\
&= \text{E}[X^2] - 2\mu^2 + \mu^2 \\
&= \text{E}[X^2] - \mu^2.
\end{align}

The difference between $\text{E}[X^2]$ and $\mu^2 = (\text{E}[X])^2$ arises because, for $\mu > 0$, the contributions to $\text{E}[X^2]$ from $x>\mu$ pull the expectation up more than the contributions from $x<\mu$ pull it down. Since the difference equals $\text{E}[(X-\mu)^2]$, it is never negative.
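
The same identity holds for a finite sample when the sample mean is used in place of $\mu$ and the averages are taken with a factor of $1/N$. The sketch below checks this on hypothetical data.

```python
import numpy as np

np.random.seed(0)
x = 2.0 + 0.5 * np.random.randn(10_000)  # hypothetical sample

lhs = np.mean((x - x.mean())**2)      # mean squared deviation from the mean
rhs = np.mean(x**2) - x.mean()**2     # mean of squares minus square of mean

print(lhs, rhs, np.var(x))
```

All three numbers agree to within floating-point rounding, since `np.var` with its default `ddof=0` computes the same quantity.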

Next, we derive the important result that the variance of a sum $x_1 + x_2$ is equal to the sum of their variances, $\text{Var}(X_1) + \text{Var}(X_2)$, so long as $x_1$ and $x_2$ are uncorrelated.

\begin{align}
\text{Var}(X_1 + X_2) &= \text{E}\{[(X_1 + X_2) - (\mu_1 + \mu_2)]^2\}\\
&=\text{E}[(X_1 + X_2)^2] - (\mu_1 + \mu_2)^2\\
&=\text{E}[X_1^2 + 2X_1X_2 + X_2^2] - \mu_1^2 -2\mu_1\mu_2 - \mu_2^2\\
&=\text{E}[X_1^2] - \mu_1^2 + 2\text{E}[X_1X_2] - 2\mu_1\mu_2 + \text{E}[X_2^2] - \mu_2^2\\
&=\text{E}[X_1^2] - \mu_1^2 + 2\mu_1\mu_2 - 2\mu_1\mu_2 + \text{E}[X_2^2] - \mu_2^2\\
&=\text{E}[X_1^2] - \mu_1^2 + \text{E}[X_2^2] - \mu_2^2\\
&=\text{Var}(X_1) + \text{Var}(X_2).
\end{align}
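
A quick numerical check illustrates both the result and the role of the assumption $\text{E}[X_1X_2] = \mu_1\mu_2$. In the sketch below, the parameters are hypothetical; the first comparison uses independently generated samples, and the second uses a perfectly correlated pair, for which the rule fails.

```python
import numpy as np

np.random.seed(0)
n = 200_000
x1 = 1.0 + 2.0 * np.random.randn(n)   # variance close to 4
x2 = -3.0 + 1.0 * np.random.randn(n)  # variance close to 1, independent of x1

var_sum_indep = np.var(x1 + x2)
sum_var = np.var(x1) + np.var(x2)
print("uncorrelated:", var_sum_indep, "vs", sum_var)

# Counterexample: adding x1 to itself gives Var(2 x1) = 4 Var(x1),
# not Var(x1) + Var(x1) = 2 Var(x1).
var_sum_corr = np.var(x1 + x1)
print("correlated:  ", var_sum_corr, "vs", 2 * np.var(x1))
```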

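Finally, we apply these results to the sample mean $\bar{x}_N$ of $N$ uncorrelated measurements, each drawn from a parent distribution with mean $\mu$ and variance $\sigma^2$.

\begin{align}
\text{Var}(\bar{X}_N) &= \text{E}\left\{\left[\left(\frac{1}{N}\sum_{i=1}^NX_i\right) - \mu\right]^2\right\}\\
&= \text{E}\left\{\frac{1}{N^2}\left[\left(\sum_{i=1}^N X_i\right) - N\mu\right]^2\right\}\\
&= \frac{1}{N^2}\text{E}\left\{\left[\left(\sum_{i=1}^N X_i\right) - N\mu\right]^2\right\}\\
&= \frac{1}{N^2}\text{E}\left\{\left[\sum_{i=1}^N \left(X_i - \mu\right)\right]^2\right\}\\
&= \frac{1}{N^2}\text{Var}\left(\sum_{i=1}^N X_i\right)\\
&= \frac{1}{N^2}\sum_{i=1}^N \text{Var}(X_i)\\
&= \frac{\sigma^2}{N}.
\end{align}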
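
The variance of the sample mean can also be estimated by simulation. For independent samples with variance $\sigma^2$, the expected value is $\sigma^2/N$. The sketch below generates many independent sets of $N$ normally distributed numbers, computes the mean of each set, and compares the variance of those means with $\sigma^2/N$. The values of `sigma`, `n_per_mean`, and `n_trials` are hypothetical.

```python
import numpy as np

np.random.seed(0)
sigma = 2.0        # hypothetical parent standard deviation
n_per_mean = 25    # N, the number of samples in each average
n_trials = 50_000  # number of independent averages

# Each row holds one set of N samples; average along each row.
samples = sigma * np.random.randn(n_trials, n_per_mean)
means = samples.mean(axis=1)

var_of_mean = means.var()
predicted = sigma**2 / n_per_mean

print("simulated Var(mean):", var_of_mean)
print("sigma^2 / N:        ", predicted)
```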

In [1]:
import numpy as np
np.log10(2)

0.3010299956639812