In [1]:
import numpy as np
import pandas as pd
import scipy.stats

import bokeh_catplot

import bokeh.io
import bokeh.plotting

bokeh.io.output_notebook()

# Random number generation

You cannot draw a random number uniformly from the entire (infinite) number line; every draw has to come from a specified probability distribution.

The most common choice is the uniform distribution between 0 and 1.

Random number generators produce pseudo-random numbers: given a seed, they deterministically compute a sequence of numbers that has the statistical properties of a sequence of random numbers.

In [2]:
np.random.uniform(low=0, high=1, size=10)

array([0.57072397, 0.75047958, 0.14639906, 0.57860186, 0.51337113,
       0.81466708, 0.42449937, 0.64970819, 0.41995732, 0.43420534])

In [3]:
np.random.seed(3252)
np.random.uniform(low=0, high=1, size=10)

array([0.72363619, 0.80625687, 0.75507222, 0.47529264, 0.21808614,
       0.10797734, 0.8419304 , 0.26319505, 0.56976174, 0.51605265])

Seed your RNG if you need reproducibility, e.g. for testing purposes.

NumPy has several RNGs built in. Normally, you create a generator with `np.random.default_rng()` and call its methods:

In [11]:
rg = np.random.default_rng()

rg.uniform(low=0, high=1, size=10)

array([0.34675445, 0.79809972, 0.06253396, 0.19696656, 0.643456  ,
       0.33601516, 0.4674799 , 0.59888747, 0.82852352, 0.17377475])

In [5]:
# If you want to seed it

rg = np.random.default_rng(3252)

rg.uniform(low=0, high=1, size=10)

array([0.18866535, 0.04418857, 0.02961285, 0.22083971, 0.43341773,
       0.13166813, 0.42112164, 0.43507845, 0.61380912, 0.30627603])
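
As a quick check of what seeding buys you, the sketch below builds two generators from the same seed and confirms that they produce identical draws. The names `rg_a` and `rg_b` are introduced here only for illustration.

```python
import numpy as np

# Two separate generators built from the same seed
rg_a = np.random.default_rng(3252)
rg_b = np.random.default_rng(3252)

draws_a = rg_a.uniform(low=0, high=1, size=10)
draws_b = rg_b.uniform(low=0, high=1, size=10)

# Same seed, same sequence of pseudo-random numbers
same = np.array_equal(draws_a, draws_b)
print(same)
```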

Let's draw samples from a normal distribution and compare their ECDF with the theoretical CDF.

In [14]:
mu = 10
sigma = 1

x = rg.normal(mu, sigma, size=100000)

p = bokeh_catplot.ecdf(
    x,
    y_axis_label='approximate CDF'
)

x_theor = np.linspace(6, 14, 400)
y_theor = scipy.stats.norm.cdf(x_theor, mu, sigma)

p.line(
    x=x_theor,
    y=y_theor,
    line_width=2,
    line_color='orange',
)

bokeh.io.show(p)
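
The agreement between the ECDF and the theoretical CDF can also be checked numerically. The following is a minimal sketch, assuming a freshly seeded generator and computing the ECDF by hand with NumPy rather than through `bokeh_catplot`; `max_gap` is a name chosen here for illustration.

```python
import numpy as np
import scipy.stats

rg = np.random.default_rng(3252)
mu, sigma = 10, 1
x = rg.normal(mu, sigma, size=100000)

# ECDF: sorted samples on the x-axis, fraction of samples <= x on the y-axis
x_sorted = np.sort(x)
ecdf = np.arange(1, len(x_sorted) + 1) / len(x_sorted)

# Largest vertical gap between the ECDF values and the theoretical CDF,
# evaluated at the sample points
max_gap = np.max(np.abs(ecdf - scipy.stats.norm.cdf(x_sorted, mu, sigma)))
print(max_gap)
```

With 100,000 samples the gap should be small, which is why the two curves are hard to tell apart in the plot.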

The uniform and normal distributions are continuous.

What about discrete distributions, e.g. Poisson or binomial?

How many coin flips will land heads in 10 flips?

In [18]:
rg.binomial(10, 0.5)

4
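
A single draw says little about the distribution. As a sketch, assuming a freshly seeded generator, the experiment can be repeated many times and the observed frequencies compared with the binomial PMF from `scipy.stats`. The variable names are illustrative.

```python
import numpy as np
import scipy.stats

rg = np.random.default_rng(3252)

n_flips, p_heads = 10, 0.5
heads = rg.binomial(n_flips, p_heads, size=100000)

# Empirical frequency of each possible outcome 0..10
freq = np.bincount(heads, minlength=n_flips + 1) / len(heads)

# Theoretical binomial PMF for comparison
pmf = scipy.stats.binom.pmf(np.arange(n_flips + 1), n_flips, p_heads)

print(heads.mean())                 # should be close to n_flips * p_heads = 5
print(np.max(np.abs(freq - pmf)))   # should be small
```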

Drawing cards:

In [19]:
rg = np.random.default_rng(3252)
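# Note: `high` is exclusive by default, so this draws from 0 through 50.
# Use rg.integers(0, 52, size=20) to cover all 52 cards.
rg.integers(0, 51, size=20)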

array([38,  9,  2,  2, 37,  1, 33, 11,  7, 22, 23,  6, 13, 21, 10, 22, 11,
       31, 17, 15])

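We get repeats (2, 11, and 22 each appear twice) because `rg.integers` draws each number independently, i.e. with replacement.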
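
One detail worth checking: `rg.integers` treats `high` as exclusive by default, so `rg.integers(0, 51, ...)` only produces values from 0 through 50. The sketch below shows two ways to cover all 52 card indices, assuming a freshly seeded generator; `cards_a` and `cards_b` are illustrative names.

```python
import numpy as np

rg = np.random.default_rng(3252)

# Exclusive upper bound: valid values are 0, 1, ..., 51
cards_a = rg.integers(0, 52, size=100000)

# Inclusive upper bound via endpoint=True: same range of values
cards_b = rg.integers(0, 51, endpoint=True, size=100000)

print(cards_a.min(), cards_a.max())
print(cards_b.min(), cards_b.max())
```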

In [25]:
np.sort(rg.choice(np.arange(52), size=20, replace=False))

array([ 2,  3,  4,  5,  7,  8,  9, 13, 14, 22, 23, 26, 27, 28, 30, 37, 40,
       43, 46, 47])

`rg.choice` with the `replace=False` keyword argument lets us draw cards without replacement, so no card appears twice.
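
`rg.choice` also works directly on labels instead of integer indices. As a sketch, assuming a deck encoded as rank-plus-suit strings (an encoding chosen here for illustration), a five-card hand could be dealt like this:

```python
import numpy as np

rg = np.random.default_rng(3252)

# Hypothetical deck encoding: rank followed by suit letter, e.g. 'QH'
ranks = ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K', 'A']
suits = ['C', 'D', 'H', 'S']
deck = np.array([rank + suit for suit in suits for rank in ranks])

# Deal five distinct cards
hand = rg.choice(deck, size=5, replace=False)
print(hand)
```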

In [26]:
''.join(rg.choice(list('ATGC'), replace=True, size=70))

'TGCGCTTATGGATCGCGTCCGCGGACAAAGTACGGGGAACGAAGTCTCGGATAATGACGAAGATGGATCC'
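
By default `rg.choice` picks each element with equal probability. Its `p` argument accepts unequal probabilities; the sketch below uses made-up weights that give an expected GC content of 60%. The weights and variable names are assumptions for illustration only.

```python
import numpy as np

rg = np.random.default_rng(3252)

bases = list('ATGC')
# Hypothetical base probabilities (must sum to 1), in the order A, T, G, C
probs = [0.2, 0.2, 0.3, 0.3]

seq = ''.join(rg.choice(bases, size=70, replace=True, p=probs))

# Fraction of G and C in this particular sequence
gc_content = (seq.count('G') + seq.count('C')) / len(seq)
print(seq)
print(gc_content)
```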

In [28]:
# Shuffle a deck of cards.

rg.permutation(np.arange(52))

array([14, 39, 29, 10, 18, 33, 50, 28, 45,  3, 51, 47,  5, 48, 27,  2, 11,
       30, 36, 20, 19, 32, 21, 22, 43, 42,  4, 26,  9, 12, 23, 25, 16, 24,
       46, 31, 49,  6, 38,  0, 44, 35, 13, 34, 17, 40, 41,  1, 15,  7, 37,
        8])
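
`rg.permutation` returns a shuffled copy and leaves its input untouched, whereas `rg.shuffle` reorders an array in place. A minimal sketch of the difference, with illustrative variable names:

```python
import numpy as np

rg = np.random.default_rng(3252)

deck = np.arange(52)

# permutation returns a shuffled copy; `deck` itself is unchanged
shuffled = rg.permutation(deck)

# shuffle reorders its argument in place and returns None
deck_in_place = deck.copy()
rg.shuffle(deck_in_place)

print(deck[:5])
print(shuffled[:5])
print(deck_in_place[:5])
```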

In [29]:
# Draw from a gamma distribution (shape=2, scale=1.1)

rg.gamma(2, 1.1, size=10)

array([6.78146141, 4.68526369, 1.58538564, 1.31926412, 2.959668  ,
       2.50788533, 1.41484837, 2.3016223 , 4.38140308, 1.17119687])
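
Note that `rg.gamma` is parametrized by shape and scale, not shape and rate. As a sanity check, the sketch below draws many samples and compares the sample mean and variance with the theoretical values `shape * scale` and `shape * scale**2`; it assumes a freshly seeded generator.

```python
import numpy as np

rg = np.random.default_rng(3252)

shape, scale = 2, 1.1
samples = rg.gamma(shape, scale, size=100000)

print(samples.mean())  # theoretical mean: shape * scale = 2.2
print(samples.var())   # theoretical variance: shape * scale**2 = 2.42
```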

In [30]:
%load_ext watermark
%watermark -v -p numpy,scipy,pandas,bokeh,bokeh_catplot,jupyterlab

CPython 3.7.7
IPython 7.13.0

numpy 1.18.1
scipy 1.4.1
pandas 0.24.2
bokeh 2.0.2
bokeh_catplot 0.1.7
jupyterlab 1.2.6
