# Normal Distribution

### Theory

Setting:
- Used to model a wide range of quantities, especially when the true distribution is unknown
- For instance, exam scores

Properties:
- Probability density function: $\displaystyle f(x)={\frac {1}{\sigma {\sqrt {2\pi }}}}e^{-{\frac12}(x-\mu)^2/\sigma^2}$
- Mean: $\mu$
- Variance: $\sigma^2$
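The PDF above can be written out directly in plain Python. The sketch below compares it against the standard library's `statistics.NormalDist` (Python 3.8+). The function name `normal_pdf` and the example parameters are illustrative choices and are not part of the original notebook.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written term by term as in the formula above."""
    return 1 / (sigma * math.sqrt(2 * math.pi)) * math.exp(-0.5 * (x - mu) ** 2 / sigma ** 2)

mu, sigma = 22.0, 2.887          # arbitrary example parameters
dist = NormalDist(mu, sigma)     # reference implementation from the standard library

for x in (16.0, 20.0, 22.0, 25.0):
    print(f"x={x:5.1f}  formula={normal_pdf(x, mu, sigma):.6f}  NormalDist={dist.pdf(x):.6f}")

# the density peaks at the mean with height 1 / (sigma * sqrt(2*pi))
print(normal_pdf(mu, mu, sigma), 1 / (sigma * math.sqrt(2 * math.pi)))

# mean and variance of the distribution object match the parameters
print(dist.mean, dist.variance)
```
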

Notation: $X \sim \displaystyle \mathcal{N}(\mu, \sigma^2)$

The "famous" values (68-95-99.7 rule): probability that $X$ lies within ...
- $\mu \pm \sigma$: 68.27%
- $\mu \pm 2 \sigma$: 95.45%
- $\mu \pm 3 \sigma$: 99.73%
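These percentages follow from $P(|X-\mu| \le k\sigma) = \operatorname{erf}(k/\sqrt{2})$, which depends on neither $\mu$ nor $\sigma$. A short check using only Python's `math` module (the helper name `within_k_sigma` is illustrative):

```python
import math

def within_k_sigma(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal X, i.e. erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mu ± {k} sigma: {100 * within_k_sigma(k):.2f}%")
# mu ± 1 sigma: 68.27%
# mu ± 2 sigma: 95.45%
# mu ± 3 sigma: 99.73%
```
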

For an illustration, see e.g. [Wikipedia](https://en.wikipedia.org/wiki/Normal_distribution).

Arguably the **most important distribution** of all, for several reasons:

1. **Central Limit Theorem**: The (suitably normalized) sum of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed, regardless of the original distribution of the variables

2. **Ubiquity in Nature**: Many natural phenomena are approximately normally distributed: heights, blood pressure, measurement errors, etc.

3. **Mathematical Properties**: Symmetric; mean = median = mode; the sum of independent Gaussian random variables is again Gaussian; etc.

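4. **Parameter Estimation**: Simple, efficient estimators exist, e.g. the sample mean is the maximum-likelihood estimator of $\mu$, and for linear regression with Gaussian errors, least squares coincides with maximum likelihood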
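The "sum of Gaussians" and symmetry properties can be checked with the standard library's `statistics.NormalDist`, which supports adding two *independent* normal variables (means add, variances add). A minimal sketch; the parameters and seeds are arbitrary example values:

```python
import statistics
from statistics import NormalDist

# two independent normal random variables (example parameters)
X = NormalDist(mu=2.0, sigma=3.0)
Y = NormalDist(mu=-1.0, sigma=4.0)

# exact result: X + Y ~ N(2 + (-1), 3^2 + 4^2) = N(1, 5^2)
S = X + Y
print(S.mean, S.stdev)  # 1.0 5.0

# Monte Carlo check: draw X and Y independently and add the draws
N = 100_000
sums = [x + y for x, y in zip(X.samples(N, seed=1), Y.samples(N, seed=2))]
print(statistics.fmean(sums), statistics.stdev(sums))

# symmetry: the 50% quantile (median) coincides with the mean
print(S.inv_cdf(0.5))
```
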

Often a good approximation of reality:

1. **Predictive Distribution**: Residuals are often assumed to be normal $\Rightarrow$ simplifies making predictions and computing prediction and confidence intervals

2. **Control Processes**: Quality control in industrial processes often models variations as normal, which makes it easy to set control limits (typically $\mu \pm 3\sigma$)

3. **Signal Processing**: Gaussian noise is a common and often reasonable assumption (e.g. for thermal noise)
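Both the prediction-interval and the control-limit use cases reduce to quantiles of the normal distribution. The sketch below uses the standard library's `statistics.NormalDist`; all numbers are made up, and the control-limit part is a simplified version (sigma estimated by the sample standard deviation) rather than a full SPC procedure.

```python
from statistics import NormalDist

# 1) Prediction interval, assuming residuals ~ N(0, s^2):
#    a 95% interval around a point prediction y_hat is y_hat ± z * s,
#    where z is the 97.5% quantile of the standard normal distribution.
z = NormalDist().inv_cdf(0.975)   # approximately 1.96
y_hat, s = 10.0, 2.0              # hypothetical prediction and residual std
print(f"z = {z:.4f}, 95% interval: [{y_hat - z * s:.2f}, {y_hat + z * s:.2f}]")

# 2) Simplified control limits: estimate mu and sigma from in-control data
#    and flag new measurements outside mu ± 3 sigma.
in_control = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]  # made-up data
process = NormalDist.from_samples(in_control)  # sample mean and sample std (n - 1)
lcl = process.mean - 3 * process.stdev
ucl = process.mean + 3 * process.stdev
new_measurements = [10.0, 10.4, 11.2, 9.5]      # made-up new data
out_of_control = [x for x in new_measurements if not lcl <= x <= ucl]
print(f"control limits: [{lcl:.3f}, {ucl:.3f}], out of control: {out_of_control}")
```
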

But not always:

1. **Finance**: Asset returns are often modeled as normal, yet real-world returns typically exhibit heavier tails than the Gaussian distribution, i.e. extreme moves occur far more often than the normal model predicts
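To make "heavier tails" concrete, one can compute how rare large moves would be if daily returns really were normal: $P(|X-\mu| > k\sigma) = \operatorname{erfc}(k/\sqrt{2})$. The sketch assumes 252 trading days per year (a common convention) and illustrates the normal model only; it does not use any market data.

```python
import math

def tail_prob(k):
    """Two-sided tail probability P(|X - mu| > k*sigma) under a normal model."""
    return math.erfc(k / math.sqrt(2))

trading_days_per_year = 252  # assumed convention

for k in (3, 4, 5):
    p = tail_prob(k)
    years = 1 / p / trading_days_per_year
    print(f"|move| > {k} sigma: p = {p:.2e}, about once every {years:,.1f} years")
```

Under this model a 5-sigma day should occur only about once every several thousand years; moves of that size in real markets are observed much more often, which is exactly the heavy-tail effect.
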




### Central limit theorem (CLT)

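CLT (informally): For a sufficiently large sample size, the sampling distribution of the mean is approximately normal, whatever the shape of the population distribution (as long as its variance is finite).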

More accurately:
- Let $X_{1},X_{2},\dots ,X_{n}$ be $n$ i.i.d. random variables (the samples) with expected value $\mu$ and variance $\sigma ^{2} < \infty$
- Let $\bar{X}_{n} = \frac 1 n \sum_i X_i$ be their sample mean
- Then, as $n\to \infty$: $\;\;$ $\displaystyle
  \frac{\bar{X}_{n}-\mu}{\sigma_{{\bar{X}}_{n}}} \xrightarrow{d} \mathcal{N}(0, 1)$ $\;\;$ (convergence in distribution) with
  $\;\;$ $\displaystyle \sigma_{{\bar{X}}_{n}}={\frac {\sigma }{\sqrt {n}}}$,
  i.e. for large $n$, approximately $\bar{X}_{n} \sim \mathcal{N}(\mu, \sigma^2/n)$
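The statement can be checked by simulation with a clearly non-normal population. The sketch below assumes an exponential population with rate 1 (mean 1, variance 1) and $n = 30$, standardizes each sample mean as in the formula above, and compares the result with $\mathcal{N}(0, 1)$. It uses only Python's standard library; the seed and number of repetitions are arbitrary.

```python
import math
import random
import statistics

# population: exponential distribution with rate 1 (mean 1, variance 1), strongly skewed
mu, sigma = 1.0, 1.0
n, reps = 30, 10_000
rng = random.Random(42)

# standardized sample means (X̄_n - mu) / (sigma / sqrt(n))
z = []
for _ in range(reps):
    xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    z.append((xbar - mu) / (sigma / math.sqrt(n)))

# if the CLT approximation is good, z should look like N(0, 1)
coverage = sum(abs(v) <= 1.96 for v in z) / reps
print(f"mean = {statistics.fmean(z):.3f}, std = {statistics.stdev(z):.3f}")
print(f"share within ±1.96: {coverage:.3f}  (N(0, 1): 0.950)")
```
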

Assumptions:
- Sample size is sufficiently large; a common rule of thumb is $n \geq 30$, though strongly skewed populations may need more; and
- Samples are independent and identically distributed (i.i.d.) random variables; and
- Population distribution has finite variance, which holds for most common distributions (counter-example: the Cauchy distribution, whose mean and variance are undefined)
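The Cauchy counter-example can be seen numerically: the mean of $n$ standard Cauchy variables is again standard Cauchy, so its spread does not shrink as $n$ grows. The sketch below compares the interquartile range (IQR) of simulated Cauchy sample means with that of uniform sample means, using only Python's standard library; the sample sizes and seed are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(0)

def cauchy():
    # standard Cauchy via the inverse CDF: tan(pi * (U - 1/2)), U ~ Uniform(0, 1)
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr(data):
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

n, reps = 500, 1000
cauchy_means = [statistics.fmean(cauchy() for _ in range(n)) for _ in range(reps)]
uniform_means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]

# Cauchy sample means keep the IQR of a single standard Cauchy draw (about 2),
# while the spread of uniform sample means shrinks like 1/sqrt(n)
print(f"IQR of Cauchy means:  {iqr(cauchy_means):.3f}")
print(f"IQR of uniform means: {iqr(uniform_means):.4f}")
```
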


### Analytical approach


Setting:
- $X \sim \text{U}(a, b)$ with $a = 17$, $b = 27$
- $n = 30$
- $\mu = (a + b) / 2 = 22$
- $\sigma = \sqrt{(b - a)^2 / 12} = 10 / \sqrt{12} \approx 2.887$

Hence:
- $\mu_{\bar{X}_n} = \mu = 22$
- $\sigma_{\bar{X}_n} = \sigma / \sqrt{n} = 10 / \sqrt{12} / \sqrt{30} \approx 0.527$
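The same numbers in Python, plus one example of what the approximation is useful for: the probability that the sample mean lands in $[21, 23]$. The interval is an arbitrary illustrative choice, and the sketch uses the standard library's `statistics.NormalDist` rather than the NumPy code of the simulation below.

```python
import math
from statistics import NormalDist

a, b, n = 17, 27, 30

# population parameters of U(a, b)
mu = (a + b) / 2
sigma = math.sqrt((b - a) ** 2 / 12)

# CLT: the sample mean is approximately N(mu, (sigma / sqrt(n))^2)
se = sigma / math.sqrt(n)
print(f"mu = {mu}, sigma = {sigma:.3f}, sigma_mean = {se:.3f}")
# mu = 22.0, sigma = 2.887, sigma_mean = 0.527

# example use of the approximation: P(21 <= sample mean <= 23)
mean_dist = NormalDist(mu, se)
p = mean_dist.cdf(23) - mean_dist.cdf(21)
print(f"P(21 <= sample mean <= 23) ≈ {p:.3f}")
```
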


### Simulation

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
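n = 30                 # sample size per experiment
n_experiments = 1000   # number of repeated experiments
rng = np.random.default_rng(seed=0)  # seeded for reproducibility

# population: uniform distribution U(a, b)
a, b = 17, 27
mu = (a + b) / 2
std = np.sqrt((b - a)**2 / 12)

print(f"population std: {std:7.5f}")

# draw one sample of size n from the population and return its mean
def run_experiment():
    xs = rng.uniform(a, b, n)
    return np.mean(xs)

mu_xs = np.array([run_experiment() for _ in range(n_experiments)])

# prediction of the CLT for the distribution of the sample mean
mu_xs_clt = mu
std_xs_clt = std / np.sqrt(n)

print(f"mu_xs  (sampled): {np.mean(mu_xs):7.5f}")
print(f"mu_xs  (CLT):     {mu_xs_clt:7.5f}")
print(f"std_xs (sampled): {np.std(mu_xs, ddof=1):7.5f}")
print(f"std_xs (CLT):     {std_xs_clt:7.5f}")

# normal PDF with mean mu and standard deviation std
def p_normal(x, mu, std):
    return 1 / (std * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((x - mu) / std)**2)

# plot: CLT prediction vs. kernel density estimate of the simulated sample means
zs = np.linspace(mu_xs.min(), mu_xs.max(), 1000)
plt.plot(zs, p_normal(zs, mu_xs_clt, std_xs_clt), label='CLT distr', color='lightgreen', lw=6)
sns.kdeplot(mu_xs, label='Sample mean distr', color='blue')
# rug of the individual sample means just above the x-axis
plt.plot(mu_xs, [.03] * len(mu_xs), '.', color='blue', markersize=10, alpha=0.03)
plt.xlabel('sample mean')
plt.ylabel('density')
plt.legend()
plt.grid(False)
plt.show()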
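Beyond the visual comparison, the simulated sample means can be checked numerically against the 68-95-99.7 rule from the Theory section. The sketch below re-creates the simulation with Python's standard library (`random`, `statistics`) instead of NumPy so that it runs on its own; the seed and variable names are illustrative.

```python
import math
import random
import statistics

# same setup as the simulation above, re-created with the standard library
a, b, n, n_experiments = 17, 27, 30, 1000
rng = random.Random(0)
mu = (a + b) / 2
se = (b - a) / math.sqrt(12) / math.sqrt(n)  # CLT prediction for the std of the sample mean

sample_means = [statistics.fmean(rng.uniform(a, b) for _ in range(n)) for _ in range(n_experiments)]

# share of simulated sample means within mu ± k * se, versus the normal prediction
shares = {k: sum(abs(m - mu) <= k * se for m in sample_means) / n_experiments for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within ±{k} se: {share:.3f}  (normal: {math.erf(k / math.sqrt(2)):.3f})")
```
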
