# Random Number Generation

#### Table of Contents

* Summary
* Bad Methods
* Linear Congruential Generators
* Tausworthe Generator
* Combined Generators
* RNG Theory
* Statistical Tests

## Summary

Key ideas of pseudo-random numbers:

1. Algorithms that produce PRNs are fully deterministic
2. We choose PRN algorithms whose output mimics true i.i.d. uniform random numbers well enough to pass statistical tests
3. PRNs are the building blocks of simulations; Unif(0,1) PRNs can be transformed into draws from any other distribution

Properties of a good PRN algorithm:

* Output appears to be i.i.d. uniform (e.g. Unif(0,1))
* Fast (a simulation may need to generate trillions of PRNs)
* Reproducible (e.g. via seeds)

__Goal__: Give an algorithm that produces a sequence of pseudo-random numbers that appear to be i.i.d. uniform.
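A minimal sketch of two of the ideas above: reproducibility via seeds, and transforming Unif(0,1) PRNs into another distribution. It uses Python's standard `random` module; the helper name `exponential_from_uniform`, the seed 12345, and the rate 2.0 are illustrative assumptions, and the inverse-transform formula for the exponential distribution is a standard result rather than something stated in these notes.

```python
import math
import random

def exponential_from_uniform(u, rate):
    """Inverse-transform a Unif(0,1) value u into an Exp(rate) value."""
    return -math.log(1.0 - u) / rate

# Same seed, same stream: this is what makes a simulation run repeatable.
rng = random.Random(12345)          # 12345 is an arbitrary illustrative seed
uniforms = [rng.random() for _ in range(5)]
replay = random.Random(12345)
assert uniforms == [replay.random() for _ in range(5)]

# Each uniform PRN is mapped to an exponential draw (assumed rate = 2.0).
exp_samples = [exponential_from_uniform(u, rate=2.0) for u in uniforms]
print(uniforms)
print(exp_samples)
```

Re-running the cell prints the same numbers every time, which is the reproducibility property in action.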

In [5]:
import pandas as pd
import numpy as np

## Bad Methods

* Random Devices - not reproducible
* Random Number Tables - slow and tedious to use, no longer random once seen
* Mid-Square Method - serial correlation
* Fibonacci & Additive congruential generators - small numbers follow small numbers
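To make the mid-square method concrete, here is a small sketch. The function name `mid_square` and the 4-digit setup are assumptions for illustration: square the current value, pad the square to 8 digits, and keep the middle 4 digits as the next value.

```python
def mid_square(seed, count, digits=4):
    """Return `count` successive mid-square values starting from `seed`."""
    values = []
    x = seed
    for _ in range(count):
        squared = str(x * x).zfill(2 * digits)   # pad the square to 2*digits characters
        start = digits // 2
        x = int(squared[start:start + digits])   # keep the middle `digits` digits
        values.append(x)
    return values

print(mid_square(6632, 3))   # [9834, 7075, 556]
print(mid_square(2500, 3))   # [2500, 2500, 2500]
```

Dividing each value by 10^4 gives the PRN. The second call shows another weakness of the method beyond serial correlation: some seeds get stuck, since 2500 squared is 06250000 and its middle digits are 2500 again.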

In [71]:
# BAD: Random devices, not reproducible
import psutil
psutil.cpu_percent()

6.9

## Linear Congruential Generator (LCG)

LCGs are the most popular generators. They are defined as:

$$
X_i = (a X_{i-1} + c) \bmod m \\
R_i = \frac{X_i}{m}
$$

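The constants a, c, and m must be chosen carefully to ensure a long period (cycle length) before the sequence starts to repeat. Some notes:
* With c=0, the LCG is called a multiplicative generator
* A full period generator is one with cycle length = m (a multiplicative generator can reach at most m-1, since X=0 maps to itself)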
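The recurrence can be sketched as a Python generator. The function name `lcg` and the toy parameters (a=5, c=3, m=16, seed 7) are assumptions chosen so the whole cycle is small enough to inspect by eye; they are not recommended values.

```python
from itertools import islice

def lcg(a, c, m, seed):
    """Yield (X_i, R_i) pairs from X_i = (a*X_{i-1} + c) mod m, R_i = X_i / m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x, x / m

# Toy parameters (assumed for illustration): a=5, c=3, m=16, X_0=7.
pairs = list(islice(lcg(5, 3, 16, 7), 16))
xs = [x for x, _ in pairs]
print(xs)             # starts 6, 1, 8, 11 and ends back at the seed, 7
print(len(set(xs)))   # 16 distinct values, so this toy LCG has full period
```

Because all 16 possible values appear before the seed recurs, this toy generator is a full period generator in the sense described above.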

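Some well-known LCGs, of varying quality:

* Desert Island Generator ($a = 16807$, $c = 0$, $m = 2^{31} - 1$) - usually judged by its serial correlation (see RNG Theory)
* RANDU ($a = 65539$, $c = 0$, $m = 2^{31}$) - issue is that consecutive triples $(R_i, R_{i+1}, R_{i+2})$ all land on the same 15 hyperplanes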
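The RANDU defect can be checked numerically. The sketch below assumes the usual RANDU definition, $X_i = 65539 X_{i-1} \bmod 2^{31}$, which is not spelled out in these notes. Since $65539 = 2^{16} + 3$, squaring the multiplier gives $a^2 \equiv 6a - 9 \pmod{2^{31}}$, so every consecutive triple satisfies a fixed linear relation; that relation is what confines the triples to a few parallel planes.

```python
def randu(seed, count):
    """Return `count` RANDU integers: X_i = 65539 * X_{i-1} mod 2**31."""
    m = 2 ** 31
    xs = []
    x = seed
    for _ in range(count):
        x = (65539 * x) % m
        xs.append(x)
    return xs

m = 2 ** 31
xs = randu(seed=1, count=1000)   # odd seed, chosen arbitrarily

# Count triples that break x[i+2] = 6*x[i+1] - 9*x[i] (mod 2**31).
violations = sum(
    (xs[i + 2] - 6 * xs[i + 1] + 9 * xs[i]) % m != 0
    for i in range(len(xs) - 2)
)
print(xs[:3])       # [65539, 393225, 1769499]
print(violations)   # 0
```

A generator with good 3-dimensional structure would not satisfy any such short integer relation across all of its triples.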

## Tausworthe Generator (LFSR)

Uses binary digits to generate PRNs. Define a sequence of bits $B_1, B_2, \ldots$ using $B_i = (B_{i-r} + B_{i-q}) \bmod 2$, where $0 < r < q$. This is an exclusive-or: if $B_{i-r} = B_{i-q}$ then $B_i=0$, else $B_i=1$.

Notes:

* With a suitable choice of r and q, the period of the bit sequence is $2^q-1$
* To get a PRN, read L consecutive bits as a base-2 integer, convert it to base 10, and divide by $2^L$
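A small sketch of the bit recurrence and the bits-to-PRN conversion. The function names and the parameters r=3, q=5 with all-ones starting bits and L=4 are assumptions chosen for illustration, small enough to trace by hand.

```python
def tausworthe_bits(r, q, seed_bits, count):
    """Return the first `count` bits of B_i = (B_{i-r} + B_{i-q}) mod 2."""
    bits = list(seed_bits)            # B_1 .. B_q
    while len(bits) < count:
        bits.append((bits[-r] + bits[-q]) % 2)
    return bits[:count]

def bits_to_uniforms(bits, L):
    """Turn each block of L bits into (base-2 integer) / 2**L."""
    uniforms = []
    for start in range(0, len(bits) - L + 1, L):
        block = bits[start:start + L]
        value = int("".join(str(b) for b in block), 2)
        uniforms.append(value / 2 ** L)
    return uniforms

bits = tausworthe_bits(r=3, q=5, seed_bits=[1, 1, 1, 1, 1], count=62)
print(bits_to_uniforms(bits[:16], L=4))   # [0.9375, 0.5, 0.8125, 0.8125]
print(bits[:31] == bits[31:62])           # True: the bits repeat after 2**5 - 1 = 31
```

The first 16 bits are 1111 1000 1101 1101, which gives 15/16, 8/16, 13/16, 13/16. In practice q and L are much larger, so the period and the resolution of the PRNs are far greater than in this toy.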

## Combined Generators

Very good modern generators:

* L'Ecuyer (1999), a combined generator with cycle length of about $2^{191}$
* Mersenne Twister, with period length of $2^{19937}-1$
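The combining idea can be illustrated with a much smaller generator than the ones listed. The sketch below is not the L'Ecuyer (1999) generator; it follows the simpler scheme usually attributed to L'Ecuyer (1988), which runs two multiplicative LCGs and takes the difference of their outputs. The constants are quoted from memory and should be treated as assumptions, as should the function name and seeds.

```python
def combined_lcg(seed1, seed2, count):
    """Combine two multiplicative LCGs by differencing their outputs."""
    m1, a1 = 2147483563, 40014   # assumed constants for the first LCG
    m2, a2 = 2147483399, 40692   # assumed constants for the second LCG
    x, y = seed1, seed2
    out = []
    for _ in range(count):
        x = (a1 * x) % m1
        y = (a2 * y) % m2
        z = (x - y) % (m1 - 1)   # Python's % is non-negative for a positive modulus
        if z == 0:
            z = m1 - 1           # avoid returning exactly 0
        out.append(z / m1)
    return out

sample = combined_lcg(seed1=12345, seed2=67890, count=5)
print(sample)
```

The two component streams have different periods, so the combined stream only repeats when both components realign, which is why combining lengthens the cycle.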

## RNG Theory

For an LCG with $c \neq 0$ to be full cycle (period m), all of the following must be true:
* c and m are relatively prime
* (a-1) is a multiple of every prime that divides m
* (a-1) is a multiple of 4 if 4 divides m
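The three conditions can be checked mechanically and compared against a brute-force cycle count. This is a sketch for small m only; the function names are assumptions, and the factorization uses plain trial division.

```python
from math import gcd

def prime_factors(n):
    """Return the set of primes dividing n (trial division)."""
    factors = set()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def satisfies_full_cycle_conditions(a, c, m):
    """Check the three full-cycle conditions for an LCG with c != 0."""
    coprime = gcd(c, m) == 1
    primes_ok = all((a - 1) % p == 0 for p in prime_factors(m))
    four_ok = (m % 4 != 0) or ((a - 1) % 4 == 0)
    return coprime and primes_ok and four_ok

def cycle_length(a, c, m, seed=0):
    """Brute-force length of the cycle the sequence eventually enters."""
    seen = {}
    x, step = seed, 0
    while x not in seen:
        seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - seen[x]

print(satisfies_full_cycle_conditions(5, 3, 16), cycle_length(5, 3, 16))   # True 16
print(satisfies_full_cycle_conditions(3, 3, 16), cycle_length(3, 3, 16))   # False 8
```

With a=3 the third condition fails (a-1 = 2 is not a multiple of 4), and the brute-force count confirms the cycle is only 8 long instead of 16.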

#### Geometric Considerations

* Minimum number of hyperplanes (over all directions) that cover the tuples: find the multiplier that maximizes this
* Maximum distance between adjacent parallel hyperplanes: find the multiplier that minimizes this
* Minimum Euclidean distance between tuples of n adjacent PRNs: find the multiplier that maximizes this

RANDU fails on these criteria: its triples lie on only 15 hyperplanes.

#### Serial Correlation

$$
\text{Corr}(R_1, R_2) \leq \frac{1}{a}\left(1 - \frac{6c}{m} + 6\left(\frac{c}{m}\right)^2\right) + \frac{a+6}{m}
$$

The desert island generator does well here: with a = 16807 and m around 2 billion, the upper bound on the correlation is very small.
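To put a number on the bound, the sketch below evaluates it for a = 16807, c = 0, and m = $2^{31}-1$ (taking "2B" to mean this modulus, which is an assumption). It assumes the bound has the standard form $\frac{1}{a}(1 - 6c/m + 6(c/m)^2) + (a+6)/m$; the function name is illustrative.

```python
def serial_correlation_bound(a, c, m):
    """Upper bound on Corr(R_1, R_2): (1/a)*(1 - 6c/m + 6(c/m)**2) + (a + 6)/m."""
    ratio = c / m
    return (1 - 6 * ratio + 6 * ratio ** 2) / a + (a + 6) / m

bound = serial_correlation_bound(a=16807, c=0, m=2 ** 31 - 1)
print(bound)   # roughly 6.7e-05
```

With c = 0 the bound reduces to 1/a + (a+6)/m, so it is driven by the size of the multiplier and the modulus.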

## Statistical Tests

* Goodness-of-fit tests - Are the PRNs approximately Unif(0,1)?
* Independence tests - Are the PRNs approximately independent?

### Revisiting hypothesis testing

* Innocent until proven guilty: unless we have ample evidence for $H_a$, we keep $H_0$ (e.g. that the fit is good or that the PRNs are independent)
* Type 1 error: reject $H_0$ when $H_0$ is true
* Type 2 error: fail to reject $H_0$ when $H_0$ is false

### Chi-Squared Goodness of Fit

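* Test $H_0$: $R_1, R_2, \ldots, R_n \sim$ Unif(0,1)
* Divide [0,1] into k equiprobable cells
* Tally how many observations actually fall into each cell ($O_i$) and compare with the expected count ($E_i = n/k$)
* A large $X_0^2$ value indicates a bad fit
* If $X_0^2 \leq \chi^2_{\alpha,k-1}$, we fail to reject $H_0$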

The goodness of fit statistic is as follows:

$$
X_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
$$
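A sketch of the test in code, assuming the statistic sums $(O_i - E_i)^2 / E_i$ over k equal-width cells with $E_i = n/k$. The function name, the seed, n = 1000, and k = 10 are illustrative choices; the critical value 16.919 is the tabulated chi-squared quantile for $\alpha = 0.05$ with 9 degrees of freedom.

```python
import random

def chi_squared_uniform(sample, k):
    """Return X_0^2 for `sample` against Unif(0,1) using k equal-width cells."""
    n = len(sample)
    observed = [0] * k
    for r in sample:
        cell = min(int(r * k), k - 1)   # guard against r == 1.0
        observed[cell] += 1
    expected = n / k
    return sum((o - expected) ** 2 / expected for o in observed)

CRITICAL_05_DF9 = 16.919   # tabulated chi-squared quantile, alpha = 0.05, 9 d.o.f.

rng = random.Random(2024)   # arbitrary illustrative seed
sample = [rng.random() for _ in range(1000)]
statistic = chi_squared_uniform(sample, k=10)
verdict = "fail to reject H0" if statistic <= CRITICAL_05_DF9 else "reject H0"
print(statistic, verdict)
```

A perfectly even spread across the cells gives a statistic of 0, while a sample piled into one cell gives a very large value.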

### Run Tests for Independence

* Runs Tests "Up and Down"
* Runs Tests "Above and Below the Mean"
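A sketch of the "up and down" runs test. It assumes the usual normal approximation for the number of runs A in n observations, with $E[A] = (2n-1)/3$ and $Var(A) = (16n-29)/90$; these formulas are standard but are not stated in these notes. The function names are illustrative, and ties between neighbours are counted as "down" for simplicity.

```python
import math

def count_runs_up_down(values):
    """Count runs in the sequence of up/down moves between neighbours."""
    ups = [b > a for a, b in zip(values, values[1:])]   # True = up, False = down
    runs = 1
    for prev, cur in zip(ups, ups[1:]):
        if cur != prev:
            runs += 1
    return runs

def runs_up_down_z(num_runs, n):
    """Z-score using E[A] = (2n-1)/3 and Var(A) = (16n-29)/90."""
    mean = (2 * n - 1) / 3
    var = (16 * n - 29) / 90
    return (num_runs - mean) / math.sqrt(var)

print(count_runs_up_down([1, 3, 2, 4, 3]))   # 4 (up, down, up, down)
print(runs_up_down_z(55, 100))               # about -2.71
```

At the 5% level, independence is rejected when |Z| > 1.96, so observing 55 runs in 100 observations would be too few runs to be consistent with independence.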