In [1]:
from datascience import *
from prob140 import *
import numpy as np
from scipy import stats

import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('fivethirtyeight')

# Week 10 Part 9 #

This covers Sections 18.1 and 18.2. For this term, though, I'm making some of the derivations in 18.1 optional, so this notebook just has the basics that everyone should know. There's no video: most of the material is a string of facts that you have already been using without derivation, followed by a simulation that used to be run in class just as you will run it here.

Recall that the normal $(\mu, \sigma^2)$ density function is given by

$$
f(x) ~ = ~ \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{1}{2}\big{(}\frac{x-\mu}{\sigma}\big{)}^2}, ~~~ -\infty < x < \infty
$$

The most important member of this family is the standard normal, for which $\mu = 0$ and $\sigma = 1$. The standard normal density is denoted by $\phi$:

$$
\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}, ~~~ -\infty < z < \infty
$$

The standard normal cdf is denoted by $\Phi$:

$$
\Phi(x) ~ = ~ \int_{-\infty}^x \phi(z)dz, ~~~~ -\infty < x < \infty
$$
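
If you want to see these two functions in code, here is a minimal sketch. It assumes only `numpy` and `scipy.stats`; the helper name `phi` and the grid of `z` values are made up for illustration. It checks the formula for $\phi$ against `stats.norm.pdf` and evaluates $\Phi$ with `stats.norm.cdf`.

```python
import numpy as np
from scipy import stats

def phi(z):
    """Standard normal density, typed out from the formula (hypothetical helper)."""
    return np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)

# Illustrative grid of points; any values would do
z = np.linspace(-4, 4, 9)

# Largest gap between the hand-typed formula and scipy's density
max_pdf_gap = np.max(np.abs(phi(z) - stats.norm.pdf(z)))

# Phi has no closed form; stats.norm.cdf evaluates it numerically
Phi_at_0 = stats.norm.cdf(0)   # half the area lies to the left of 0, by symmetry
Phi_at_1 = stats.norm.cdf(1)

max_pdf_gap, Phi_at_0, Phi_at_1
```

The gap should be at the level of floating point rounding, and $\Phi(0)$ should be 0.5 by symmetry.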

# <span style="color: darkblue">Normal Distribution Facts</span> #

## Total Integral = 1 ##
This needs a proof, but accept it for now. Some of you might have seen it proved in a calculus class using polar coordinates. A probabilistic derivation is [here](http://prob140.org/textbook/Chapter_18/01_Standard_Normal_Basics.html#The-Constant-of-Integration) but it's optional.
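
A numerical check is not a proof, but it can be reassuring. The sketch below assumes `scipy.integrate.quad` for the numerical integration; the variable names are just for illustration.

```python
import numpy as np
from scipy import integrate, stats

# Numerically integrate the standard normal density over the whole real line.
# quad returns the estimated integral and an estimate of the absolute error.
total, abs_err = integrate.quad(stats.norm.pdf, -np.inf, np.inf)

total, abs_err
```

The first value should be 1 up to numerical error.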

## Standard Units ##
Understanding the standard normal is important, because every other normal variable is a linear transformation of a standard normal one. We showed this earlier:

- If $X$ is normal $(\mu, \sigma^2)$ then $X = \sigma Z + \mu$ where $Z$ is $X$ measured in standard units and is standard normal.
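
Here is a small sketch of what the conversion to standard units buys you. The values of `mu`, `sigma`, and `x` are made up for illustration. The chance that $X \le x$ is the same as the chance that $Z \le (x - \mu)/\sigma$, so the two calculations below should agree.

```python
from scipy import stats

# Illustrative values, not from the text
mu, sigma = 10, 2
x = 13

# x measured in standard units
z = (x - mu) / sigma

# P(X <= x) computed directly; scipy's scale argument is the SD, not the variance
p_direct = stats.norm.cdf(x, mu, sigma)

# The same chance computed with the standard normal cdf
p_standard = stats.norm.cdf(z)

z, p_direct, p_standard
```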

## Expectation $= \mu$ ##

This seems obvious by symmetry but the existence of the expectation needs a proof, as you saw in the Cauchy distribution exercise in homework.

In another homework exercise, you found $E(\vert Z \vert)$. Since it's finite, $E(Z)$ exists, and then by symmetry it must be 0.

It follows (see Standard Units above) that if $X$ is normal $(\mu, \sigma^2)$ then $E(X) = \mu$.

## Variance $= \sigma^2$ ##

The only thing that remains to be proved is $E(Z^2) = 1$. It will follow that $Var(Z) = 1$ and then $Var(X) = \sigma^2$ by linear transformation.

The textbook has a [probabilistic argument](http://prob140.org/textbook/Chapter_18/01_Standard_Normal_Basics.html#The-Constant-of-Integration) that is helpful for understanding squares of normals, but it's optional.

Instead, write the integral below, substitute $u = z^2$, and use gamma facts.

$$
\begin{align*}
E(Z^2) ~ &= ~ 2 \int_0^\infty z^2 \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^2}dz \\
&= ~ \frac{2}{\sqrt{2\pi}} \int_0^\infty z^2 e^{-\frac{1}{2}z^2}dz \\
&= ~ \frac{2}{\sqrt{2\pi}} \cdot \frac{1}{2} \int_0^\infty u e^{-\frac{1}{2}u} \frac{1}{\sqrt{u}} du \\
&= ~ \frac{2}{\sqrt{2\pi}} \cdot \frac{1}{2} \cdot \frac{\Gamma(3/2)}{(1/2)^{3/2}}\\
&= ~ 1
\end{align*}
$$

because $\Gamma(3/2) = \Gamma(1/2 + 1) = (1/2)\Gamma(1/2) = (1/2)\sqrt{\pi}$.
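
If you would like to confirm the arithmetic, here is a sketch that assumes `scipy.special.gamma` and `scipy.integrate.quad`. It evaluates the last line of the calculation and also integrates $z^2\phi(z)$ numerically. The variable names are only for illustration.

```python
import numpy as np
from scipy import integrate, special, stats

# Gamma(3/2), which should equal sqrt(pi)/2
gamma_three_halves = special.gamma(1.5)

# The second-to-last line of the derivation, evaluated numerically
closed_form = (2 / np.sqrt(2 * np.pi)) * 0.5 * gamma_three_halves / 0.5**1.5

# E(Z^2) by direct numerical integration of z^2 * phi(z)
second_moment, _ = integrate.quad(lambda z: z**2 * stats.norm.pdf(z), -np.inf, np.inf)

gamma_three_halves, closed_form, second_moment
```

Both `closed_form` and `second_moment` should be 1 up to numerical error.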

## Sums of Independent Normals are Normal ##

This is hugely important both in probability and data science. It needs a proof, which we will do next week by moment generating functions. For now, run the code below to notice the following:

- If $X$ and $Y$ are independent normal variables, then $X+Y$ is also normal.

You already know the mean and variance of $X+Y$. The point of this result is that it's telling you the shape of the distribution.

### Simulation Study ###
The code below generates $(X,Y)$ pairs 10,000 times and draws the empirical histogram of $X+Y$. Assumptions:

- $X$ is normal $(\mu_X, \sigma_X^2)$
- $Y$ is normal $(\mu_Y, \sigma_Y^2)$
- $X$ and $Y$ are independent.

Note that the parameters you are setting at the top are means and standard deviations, not means and variances. Standard deviations are what you can see on the graphs. As you know, variance is measured in squared units, so it is on the wrong scale for reading off a graph.

In [None]:
# Change these parameters as you wish, a few times

mu_X = 10
sigma_X = 2
mu_Y = 15
sigma_Y = 3

# Don't edit beyond this point
x = stats.norm.rvs(mu_X, sigma_X, size=10000)
y = stats.norm.rvs(mu_Y, sigma_Y, size=10000)
s = x+y
Table().with_column('S = X+Y', s).hist(bins=20)
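
A histogram can look bell shaped without being normal, so here is one more rough check you can try. This is a sketch, not part of the original simulation: it regenerates the sums with a made-up seed, converts them to standard units using the sample mean and SD, and compares the proportions within 1 and 2 SDs of the mean with the corresponding standard normal areas.

```python
import numpy as np
from scipy import stats

# Hypothetical seed, only so that the run is reproducible
rng = np.random.default_rng(140)

# Same setup as the simulation: independent normals with the default parameters
x = stats.norm.rvs(10, 2, size=10000, random_state=rng)
y = stats.norm.rvs(15, 3, size=10000, random_state=rng)
s = x + y

# Simulated sums in standard units, based on the sample mean and SD
z = (s - np.mean(s)) / np.std(s)

# Empirical proportions within 1 and 2 SDs of the mean
within_1 = np.mean(np.abs(z) <= 1)
within_2 = np.mean(np.abs(z) <= 2)

# Corresponding areas under the standard normal curve
normal_within_1 = stats.norm.cdf(1) - stats.norm.cdf(-1)
normal_within_2 = stats.norm.cdf(2) - stats.norm.cdf(-2)

within_1, normal_within_1, within_2, normal_within_2
```

If the sum is normal, the empirical proportions should be close to the normal areas, which are about 68% and 95%. Agreement here is consistent with normality but doesn't prove it.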

Each time you run the cell, you should check that the center and points of inflection look right. For this, use properties of mean and variance to find $E(X+Y)$ and $SD(X+Y)$. Here's a code cell for you to do this, using the values of `mu_X`, `sigma_X`, `mu_Y`, and `sigma_Y` that you defined above.

In [None]:
ev_sum = ...
sd_sum = ...

points_of_inflection = ev_sum + make_array(-sd_sum, sd_sum)

ev_sum, sd_sum, points_of_inflection

## Reading: Applications ##

**Do not** skip this.

Two examples containing crucial moves: [here](http://prob140.org/textbook/Chapter_18/02_Sums_of_Independent_Normal_Variables.html#The-Difference-of-Two-Independent-Normal-Variables) and [here](http://prob140.org/textbook/Chapter_18/02_Sums_of_Independent_Normal_Variables.html#Comparing-Two-Sample-Proportions).

## Vitamins ##

**1.** $X$ is normal $(0, 10^2)$. Let $W = 8X^2$. Find $E(W)$.


<details>
    <summary>Answer</summary>
800

</details>

**2.** $X$ is normal $(20, 5^2)$ and $Y$ is normal $(75, 10^2)$. Assume $X$ and $Y$ are independent. 

Sketch the density of $Y - 2X$. Mark the numerical values of the expectation and SD appropriately on your sketch. You can use the code cell below to calculate them.

<details>
    <summary>Answer</summary>
normal curve, centered at 35, points of inflection at 35 $\pm$ 14.14

</details>

In [None]:
# Scratch work for Vitamin 2




## Break time. Take a long break. This section is really important; give it a chance to sink in. ##