### Statistics: CDFs & Percentiles

In [22]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
np.random.seed(0)  # seeds NumPy's global generator, which norm.rvs uses when no random_state is given

In [23]:
mu = 170  # mean height (cm)
sd = 7    # standard deviation (cm)

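#### We'll generate a dataset of $n = 100$ heights from $N(\mu, \sigma^2)$ with $\mu = 170$ cm and $\sigma = 7$ cm. Under the hood, standard normal draws $Z \sim N(0, 1)$ are produced from uniform random numbers by a **Box-Muller**-type **sampling method** (NumPy's legacy generator, seeded above, uses the polar variant) and then shifted and scaled: $X = \mu + \sigma Z$.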
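- ###### e.g. our sample of $n = 100$ heights (cm), $X_1, \dots, X_{100}$, each drawn from the random variable $X \sim N(170, 7^2)$
- ###### $\{X_1=169, X_2=172, \dots, X_{100}=168\}$ (illustrative values, not the actual draws)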

In [24]:
# 100 simulated heights in cm, X ~ N(mu, sd^2)
x = norm.rvs(loc=mu, scale=sd, size=100)
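For reference, the basic (trigonometric) Box-Muller transform mentioned above maps two independent uniforms $U_1, U_2$ on $(0, 1]$ to two independent standard normals: $Z_1 = \sqrt{-2 \ln U_1}\cos(2\pi U_2)$ and $Z_2 = \sqrt{-2 \ln U_1}\sin(2\pi U_2)$. The sketch below implements it with NumPy and then applies $X = \mu + \sigma Z$. The helper `box_muller` is hypothetical, and this is not necessarily the exact routine `norm.rvs` runs internally.

```python
import numpy as np

def box_muller(n, rng):
    """Hypothetical helper: n standard normal draws via the basic Box-Muller transform."""
    m = (n + 1) // 2                 # each pair of uniforms yields two normals
    u1 = 1.0 - rng.random(m)         # shift [0, 1) to (0, 1] so log(u1) is finite
    u2 = rng.random(m)
    r = np.sqrt(-2.0 * np.log(u1))   # radius
    theta = 2.0 * np.pi * u2         # angle
    z = np.concatenate([r * np.cos(theta), r * np.sin(theta)])
    return z[:n]                     # drop the spare value when n is odd

rng = np.random.default_rng(0)
mu, sd = 170, 7
z = box_muller(10_000, rng)          # Z ~ N(0, 1)
heights = mu + sd * z                # X = mu + sd * Z ~ N(mu, sd^2)
print(heights.mean(), heights.std())
```

With 10,000 draws the sample mean and standard deviation should land close to 170 and 7.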

#### Sample Mean $\bar{X}$
###### $$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

In [25]:
x.mean()

np.float64(170.41865610874137)
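The summation can be written out literally as a loop and checked against NumPy's `mean()`. The data here are stand-in draws from `np.random.default_rng(0)` (an assumption for a self-contained example), so they differ from the notebook's `x`.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=170, scale=7, size=100)   # stand-in data, not the notebook's x

total = 0.0
for value in sample:            # sum_{i=1}^{n} X_i
    total += value
x_bar = total / len(sample)     # divide by n

print(x_bar, sample.mean())
```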

#### Sample Variance $\hat{\sigma}^2$ (biased, divides by $n$)
###### $$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$$

In [26]:
x.var()  # NumPy's default ddof=0: divides by n

np.float64(49.77550434153163)

In [27]:
((x - x.mean())**2).mean()

np.float64(49.77550434153163)

In [28]:
((x - x.mean())**2).sum() / len(x)

np.float64(49.77550434153163)

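#### Sample Standard Deviation $\hat{\sigma}$
###### $$\hat{\sigma} = \sqrt{\hat{\sigma}^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2}$$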

In [29]:
x.std()

np.float64(7.055175713016057)
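Since $\hat{\sigma}$ is just the square root of the divide-by-$n$ variance, the two computations should agree. A short check, again on stand-in data from `np.random.default_rng(0)` rather than the notebook's `x`:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=170, scale=7, size=100)   # stand-in data

biased_var = ((sample - sample.mean()) ** 2).mean()   # sigma_hat^2, divides by n
sigma_hat = np.sqrt(biased_var)                        # sigma_hat = sqrt(sigma_hat^2)

print(sigma_hat, sample.std())   # std() also uses ddof=0 by default
```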

#### Unbiased Sample Variance $s^2$ (Bessel's correction, divides by $n - 1$)
###### $$s^{2} = \frac{1}{n-1}\sum_{i=1}^{n}(X_{i} - \bar{X})^{2}$$

In [30]:
x.var(ddof=1)  # ddof=1: divides by n - 1

np.float64(50.278287213668314)

In [31]:
((x - x.mean())**2).sum() / (len(x) - 1)

np.float64(50.278287213668314)

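#### Sample Standard Deviation with Bessel's Correction $s$
###### $$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
###### Note: $s^2$ is unbiased for $\sigma^2$, but its square root $s$ is still slightly biased low as an estimate of $\sigma$.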

In [32]:
x.std(ddof=1)

np.float64(7.090718384879512)
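"Unbiased" is a statement about the average over many repeated samples, so a quick simulation makes it concrete. Assuming many independent samples of size $n = 5$ from $N(170, 7^2)$, the divide-by-$n$ variance $\hat{\sigma}^2$ should average about $\frac{n-1}{n}\sigma^2 = 39.2$, the divide-by-$(n-1)$ version (written $s^2$ here) about $\sigma^2 = 49$, and $s$ itself a bit below $\sigma = 7$. The sample size and trial count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 7.0
n = 5                 # small samples make the bias easy to see
trials = 20_000

# Each row is one independent sample of n heights from N(170, 7^2)
samples = rng.normal(loc=170, scale=sigma, size=(trials, n))

mean_biased = samples.var(axis=1, ddof=0).mean()    # average of sigma_hat^2 (divide by n)
mean_unbiased = samples.var(axis=1, ddof=1).mean()  # average of s^2 (divide by n - 1)
mean_s = samples.std(axis=1, ddof=1).mean()         # average of s

print(f"true variance   : {sigma**2:.2f}")
print(f"E[sigma_hat^2] ~: {mean_biased:.2f} (theory: {(n - 1) / n * sigma**2:.2f})")
print(f"E[s^2] ~        : {mean_unbiased:.2f}")
print(f"E[s] ~          : {mean_s:.2f} (below sigma = {sigma})")
```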

#### CDFs & Percentiles
###### 1. Normal Distribution (Probability Density Function, PDF):
###### $$f(x | \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$
###### 2. Cumulative Distribution Function (CDF):
###### $$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t | \mu, \sigma) dt$$
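The normal CDF integral has no elementary antiderivative, but it can be written with the error function: $F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]$. The sketch below codes the PDF formula directly, evaluates the CDF via `math.erf`, and cross-checks it by integrating the PDF numerically with the trapezoidal rule (starting $10\sigma$ below $\mu$, where the missing mass is negligible). The function names are hypothetical helpers, not SciPy APIs.

```python
import math
import numpy as np

def normal_pdf(x, mu, sigma):
    """f(x | mu, sigma) written out from the PDF formula."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))

def normal_cdf(x, mu, sigma):
    """Closed form of the CDF integral via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cdf_by_integration(x, mu, sigma, points=200_001):
    """Trapezoidal-rule approximation of the integral of f from mu - 10*sigma up to x."""
    t = np.linspace(mu - 10 * sigma, x, points)
    y = normal_pdf(t, mu, sigma)
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(t)))

mu, sigma = 170, 7
print(normal_cdf(165, mu, sigma), cdf_by_integration(165, mu, sigma))
```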

###### Examples:

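- ###### What height puts you at the 95th percentile? Invert the CDF: find $x$ with $F(x) = 0.95$, which `norm.ppf` (the percent point function, $F^{-1}$) returns.
###### $$x_{0.95} = F^{-1}(0.95)$$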

In [33]:
norm.ppf(0.95, loc=mu, scale=sd)

np.float64(181.5139753886603)
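Because the CDF is continuous and strictly increasing, the percentile can also be found by bisection: keep halving an interval until $F(x) = p$ is pinned down. This is a hypothetical `normal_ppf` helper illustrating the inversion, not how SciPy computes `ppf` internally.

```python
import math

def normal_cdf(x, mu, sigma):
    """F(x) for N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_ppf(p, mu, sigma, tol=1e-12):
    """Hypothetical helper: invert the CDF by bisection, returning x with F(x) = p."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma   # F(lo) is ~0 and F(hi) is ~1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if normal_cdf(mid, mu, sigma) < p:
            lo = mid    # the p-th quantile lies to the right of mid
        else:
            hi = mid    # the p-th quantile lies at or to the left of mid
    return (lo + hi) / 2

height_95 = normal_ppf(0.95, mu=170, sigma=7)
print(height_95)
```

The result should match `norm.ppf(0.95, loc=mu, scale=sd)` above to many decimal places.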

- ###### If you are 165 cm tall, what percentile are you in?
###### $$F(165) = P(X \le 165)$$

In [34]:
norm.cdf(165, loc=mu, scale=sd)

np.float64(0.2375252620269765)
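`norm.cdf` gives the theoretical percentile. With simulated data, the same quantities can be estimated empirically: $F(165)$ as the fraction of draws at or below 165 cm, and the 95th percentile with `np.percentile`. The sketch assumes a large stand-in sample from `np.random.default_rng(0)`; the estimates should land close to 0.2375 and 181.51 without matching exactly (with only 100 draws, as in the notebook's `x`, they would be much rougher).

```python
import numpy as np

rng = np.random.default_rng(0)
heights = rng.normal(loc=170, scale=7, size=100_000)   # large simulated sample (cm)

share_below_165 = np.mean(heights <= 165)   # empirical F(165): fraction at or below 165 cm
height_95 = np.percentile(heights, 95)      # empirical 95th percentile

print(share_below_165, height_95)
```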

- ###### If you are 175 cm tall, what is the probability that a randomly chosen person is taller than you?
###### $$P(X > 175) = 1 - P(X \le 175) = 1 - F(175)$$

In [35]:
1 - norm.cdf(175, loc=mu, scale=sd)

np.float64(0.23752526202697655)

In [36]:
norm.sf(175, loc=mu, scale=sd)  # survival function: sf(x) = 1 - cdf(x)

np.float64(0.2375252620269765)
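Two observations on these outputs. First, $P(X \le 165)$ and $P(X > 175)$ agree because the normal distribution is symmetric about $\mu = 170$ and both cut-offs are 5 cm from the mean; the last-digit difference in the `1 - norm.cdf` result is floating-point rounding. Second, computing an upper tail as $1 - F(x)$ loses precision far from the mean, which is why SciPy offers `norm.sf` (its documentation notes `sf` is sometimes more accurate than `1 - cdf`). The sketch below reproduces both effects with the standard library only, using `math.erfc` for the upper tail; `lower_tail` and `upper_tail` are hypothetical helper names.

```python
import math

mu, sigma = 170, 7

def lower_tail(x):
    """P(X <= x), i.e. F(x), via erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def upper_tail(x):
    """P(X > x) via erfc, avoiding the subtraction 1 - F(x)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

# Symmetry: 165 and 175 are both 5 cm from the mean, so the two tail areas match
print(lower_tail(165), upper_tail(175))

# Far tail, 10 sigma above the mean: 1 - F(x) cancels to 0.0, erfc keeps the tiny value
far = mu + 10 * sigma
print(1.0 - lower_tail(far), upper_tail(far))
```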