# Random number generation and statistics

## Random number generators

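There are three ways to generate random numbers in Python:

1.  The **new** programming interface implemented in NumPy
2.  The **legacy** programming interface implemented in NumPy
3.  The `random` module in the Python standard library (not recommended for numerical work)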

***
### Simple random data generation with NumPy

- Get PRNG instance with `default_rng()`
- Set `seed` for reproducibility
- [`random()`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.random.html): random floats on the half-open interval `[0,1)`
- [`integers()`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.integers.html): random integers (the upper bound is excluded by default)
- Use `size` argument to control shape of output
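
The steps above can be sketched as follows. The seed value `123` and the array shapes are arbitrary choices for illustration.

```python
import numpy as np

# Seeded generator: the same seed always reproduces the same sequence
rng = np.random.default_rng(seed=123)

# Single float on [0, 1)
x = rng.random()

# Array of 5 floats on [0, 1)
a = rng.random(size=5)

# 2x3 array of integers from {0, 1, ..., 9}; the upper bound is exclusive
k = rng.integers(0, 10, size=(2, 3))

# A second generator created with the same seed yields identical draws
rng2 = np.random.default_rng(seed=123)
same = (rng2.random() == x)     # True
```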

***
### Legacy interface in NumPy

- `seed()`: seed the global PRNG
- `random_sample()`: random floats on the half-open interval `[0,1)`
- `randint()`: random integers (the upper bound is excluded)
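
For comparison, a minimal sketch of the same draws with the legacy functions. These operate on a single global generator, so seeding affects all subsequent calls to `np.random.*` functions in the program. The seed and sizes are again arbitrary.

```python
import numpy as np

# Seed NumPy's global legacy generator
np.random.seed(123)

# 5 floats on [0, 1)
a = np.random.random_sample(size=5)

# 5 integers from {0, 1, ..., 9}; the upper bound is exclusive
k = np.random.randint(0, 10, size=5)

# Re-seeding restarts the sequence, so the first draws repeat
np.random.seed(123)
a2 = np.random.random_sample(size=5)
```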

***
### Drawing random numbers from distributions

[`numpy.random`](https://numpy.org/doc/stable/reference/random/generator.html#distributions)
supports numerous distributions, e.g.:

-   [`binomial()`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.binomial.html)
-   [`exponential()`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.exponential.html)
-   [`normal()`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.normal.html)
-   [`lognormal()`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.lognormal.html)
-   [`multivariate_normal()`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.multivariate_normal.html)
-   [`uniform()`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.uniform.html)
-   Many others, see [official documentation](https://numpy.org/doc/stable/reference/random/generator.html#distributions)

#### Example: Drawing from a normal distribution

- Use `loc` and `scale` arguments to set mean and standard deviation
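
A short sketch of such a draw; the mean `1.0`, standard deviation `0.5`, and sample size are illustrative values, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=123)

# 10,000 draws from a normal distribution with mean 1.0 and std. dev. 0.5
x = rng.normal(loc=1.0, scale=0.5, size=10_000)

# Sample moments should be close to the chosen parameters
print(x.mean(), x.std())
```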

#### Example: Drawing from a bivariate normal distribution

$$
\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad 
\Sigma=\begin{bmatrix} \sigma_1^2 & \rho \sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{bmatrix}
$$
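
One way to draw from this distribution is `multivariate_normal()`, with the variance-covariance matrix assembled from $\sigma_1$, $\sigma_2$ and $\rho$ as in the formula above. The parameter values below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=123)

# Illustrative parameter values (assumptions, not from the text)
mu = np.array([0.0, 1.0])
sigma1, sigma2, rho = 1.0, 2.0, 0.5

# Variance-covariance matrix with off-diagonal element rho*sigma1*sigma2
cov = rho * sigma1 * sigma2
Sigma = np.array([[sigma1**2, cov],
                  [cov, sigma2**2]])

# Each row of X is one draw of (x1, x2)
X = rng.multivariate_normal(mean=mu, cov=Sigma, size=10_000)

# Sample correlation should be close to rho
r = np.corrcoef(X[:, 0], X[:, 1])[0, 1]
print(r)
```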

***
## SciPy functions for probability distributions

[`scipy.stats`](https://docs.scipy.org/doc/scipy/reference/stats.html) 
contains numerous distributions. Useful functions include:

-   `pdf()`: probability density function
-   `cdf()`: cumulative distribution function
-   `ppf()`: percent point function (inverse of `cdf`)
-   `moment()`: non-central moment of some order $n$
-   `expect()`: expected value of a function (of one argument) with
    respect to the distribution
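
As a rough illustration, the functions listed above can be called on `scipy.stats.norm`. The evaluation points and the parameters `loc=1.0`, `scale=2.0` are arbitrary choices.

```python
from scipy.stats import norm

# Without loc/scale arguments, norm is the standard normal distribution
p = norm.pdf(0.0)       # density at 0, equal to 1/sqrt(2*pi)
c = norm.cdf(1.96)      # P(X <= 1.96), approximately 0.975
q = norm.ppf(0.975)     # inverse of the CDF, approximately 1.96

# Second non-central moment E[X^2] for mean 1 and std. dev. 2:
# mean^2 + variance = 1 + 4 = 5
m2 = norm.moment(2, loc=1.0, scale=2.0)

# E[f(X)] for f(x) = x^2 under the same distribution, computed by
# numerical integration, so it matches m2 only approximately
e = norm.expect(lambda x: x**2, loc=1.0, scale=2.0)
```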

#### Example: Plotting a normal probability density function (PDF)

1. Generate draws from normal distribution
2. Create histogram using [`plt.hist()`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html) 
3. Add PDF of normal distribution
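
The three steps might look as follows. The parameters, number of bins, and styling are assumptions for illustration; `density=True` rescales the histogram so that it is comparable to the PDF.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(seed=123)

# Illustrative parameters (assumptions)
mu, sigma = 1.0, 0.5

# 1. Generate draws from the normal distribution
x = rng.normal(loc=mu, scale=sigma, size=10_000)

# 2. Histogram, normalized so that the bar areas sum to one
counts, edges, _ = plt.hist(x, bins=50, density=True, alpha=0.5, label='Sample')

# 3. Overlay the true PDF on a grid covering the sample range
xgrid = np.linspace(x.min(), x.max(), 200)
plt.plot(xgrid, norm.pdf(xgrid, loc=mu, scale=sigma), label='PDF')
plt.legend()
```

When run as a script rather than in a notebook, a final `plt.show()` is needed to display the figure.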

***
## Statistics functions in NumPy and SciPy

In NumPy, the most useful routines include:

-   [`np.mean()`](https://numpy.org/doc/stable/reference/generated/numpy.mean.html),
    [`np.average()`](https://numpy.org/doc/stable/reference/generated/numpy.average.html): sample mean;
    the latter variant can also compute weighted means.
-   [`np.std()`](https://numpy.org/doc/stable/reference/generated/numpy.std.html), 
    [`np.var()`](https://numpy.org/doc/stable/reference/generated/numpy.var.html): 
    standard deviation and variance (these divide by $n$ by default;
    pass `ddof=1` for the $n-1$ sample estimators)
-   [`np.percentile()`](https://numpy.org/doc/stable/reference/generated/numpy.percentile.html), 
    [`np.quantile()`](https://numpy.org/doc/stable/reference/generated/numpy.quantile.html): 
    percentiles or quantiles of a given array
-   [`np.corrcoef()`](https://numpy.org/doc/stable/reference/generated/numpy.corrcoef.html): 
    Pearson correlation coefficient
-   [`np.cov()`](https://numpy.org/doc/stable/reference/generated/numpy.cov.html): 
    sample variance-covariance matrix
-   [`np.histogram()`](https://numpy.org/doc/stable/reference/generated/numpy.histogram.html): 
    histogram of data. This only bins the data,
    as opposed to Matplotlib's `hist()` which plots it.
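
A few of these routines applied to a small hand-made array, as a quick sketch. The data and weights are made up for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

mean = np.mean(x)                                 # 3.0
wmean = np.average(x, weights=[5, 1, 1, 1, 1])    # 19/9, pulled towards 1.0

std0 = np.std(x)              # divides by n
std1 = np.std(x, ddof=1)      # divides by n - 1

med = np.percentile(x, 50)    # same as np.quantile(x, 0.5)

# Bin counts and bin edges only, nothing is plotted
counts, edges = np.histogram(x, bins=2)           # counts = [2, 3]
```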

In addition, there are the variants 
[`np.nanmean()`](https://numpy.org/doc/stable/reference/generated/numpy.nanmean.html), 
[`np.nanstd()`](https://numpy.org/doc/stable/reference/generated/numpy.nanstd.html), 
[`np.nanvar()`](https://numpy.org/doc/stable/reference/generated/numpy.nanvar.html), 
[`np.nanpercentile()`](https://numpy.org/doc/stable/reference/generated/numpy.nanpercentile.html) and 
[`np.nanquantile()`](https://numpy.org/doc/stable/reference/generated/numpy.nanquantile.html)
which ignore `NaN` values.
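
A small sketch of the difference, using a made-up array with one missing value:

```python
import numpy as np

x = np.array([1.0, 2.0, np.nan, 4.0])

# The regular routine propagates NaN
m = np.mean(x)                  # nan

# The NaN-aware variants drop missing values first
nm = np.nanmean(x)              # mean of [1, 2, 4]
nmed = np.nanquantile(x, 0.5)   # median of [1, 2, 4], i.e., 2.0
```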

#### Example: Pairwise correlations
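
A possible sketch using `np.corrcoef()` on simulated data. The three variables and their dependence structure are hypothetical: `y` is constructed to have a correlation of about 0.8 with `x`, while `z` is independent of both.

```python
import numpy as np

rng = np.random.default_rng(seed=123)

# Hypothetical data: y depends on x, z is independent of both
n = 1_000
x = rng.normal(size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)
z = rng.normal(size=n)

# With variables stacked as rows, corrcoef() returns the full matrix
# of pairwise Pearson correlations
data = np.vstack((x, y, z))
corr = np.corrcoef(data)
print(corr)
```

The result is a symmetric 3-by-3 matrix with ones on the diagonal; element `corr[i, j]` is the correlation between rows `i` and `j` of `data`.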

#### Example: Descriptive statistics

- Use [`scipy.stats.describe()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.describe.html) to compute several descriptive statistics in one call
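
A minimal sketch on simulated data; the distribution parameters and sample size are arbitrary.

```python
import numpy as np
from scipy.stats import describe

rng = np.random.default_rng(seed=123)
x = rng.normal(loc=1.0, scale=0.5, size=1_000)

# Returns a named tuple with fields nobs, minmax, mean, variance,
# skewness and kurtosis
res = describe(x)
print(res.nobs, res.mean, res.variance)
```

Note that `describe()` uses `ddof=1` by default, so `res.variance` corresponds to `np.var(x, ddof=1)` rather than `np.var(x)`.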