# Randomness and Reproducibility
---

This notebook covers the following:

- Randomness and its uses in Python
- Utilizing Python seeds to reproduce analysis
- Generating random variables from a probability distribution
- Random sampling from a population

### What is Randomness?

In Python, randomness refers to generating numbers at random, along with data built from them, such as random strings or random samples.

This is done with pseudo-random number generators (PRNGs). A PRNG starts from an initial value, known as the seed, and applies a deterministic algorithm to produce a sequence of numbers that looks random. Python's `random` module uses the Mersenne Twister algorithm.

Because the algorithm is deterministic, the output of a random number generator in Python can be reproduced exactly by reusing the same seed with the same generator.
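A minimal sketch of that idea using separate `random.Random` instances rather than the module-level generator. The seed values and the helper name `draw_five` are arbitrary, hypothetical choices for illustration. Two generators created with the same seed produce identical sequences, while a different seed produces a different one.

```python
import random

def draw_five(seed):
    """Hypothetical helper: five floats in [0.0, 1.0) from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent PRNG; does not touch the module-level state
    return [rng.random() for _ in range(5)]

run_1 = draw_five(42)
run_2 = draw_five(42)
run_3 = draw_five(7)

print(run_1 == run_2)  # same seed, same sequence
print(run_1 == run_3)  # different seed, (almost certainly) a different sequence
```

Using a dedicated `random.Random` object keeps one analysis's random stream separate from any other code that calls the module-level functions.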



In [2]:
import random

In [5]:
random.seed(1234)

random.random()

0.9664535356921388

In [6]:
random.random()

0.4407325991753527

In [7]:
random.seed(1234)

random.random()

0.9664535356921388

Calling `random.seed()` again with a value used previously resets the generator to the same internal state, so the sequence starts over from the beginning.
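Reseeding always rewinds to the start of the sequence. To replay from an arbitrary point instead, the `random` module also provides `random.getstate()` and `random.setstate()`. A short sketch, where the seed and the number of draws are arbitrary:

```python
import random

random.seed(1234)
first = random.random()            # consume one value from the sequence

saved = random.getstate()          # snapshot the generator's internal state at this point
after_snapshot = [random.random() for _ in range(3)]

random.setstate(saved)             # restore the snapshot
replayed = [random.random() for _ in range(3)]

print(after_snapshot == replayed)  # the three draws after the snapshot are reproduced exactly
```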

### Random Numbers from Real-Valued Distributions

Uniform Distribution

In [11]:
random.uniform(10,20)

19.39268997363764
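`random.uniform(a, b)` returns a float between `a` and `b`. The Python docs describe it in terms of the expression `a + (b - a) * random()`, which is why the upper end point `b` may or may not be included depending on floating-point rounding. The sketch below checks that relationship under a fixed seed. The exact equality relies on the standard implementation of `uniform`, so treat it as an illustration rather than a guarantee.

```python
import random

a, b = 10, 20

random.seed(1234)
u = random.uniform(a, b)                 # library call

random.seed(1234)
manual = a + (b - a) * random.random()   # same expression applied by hand to the same underlying draw

print(u, manual, u == manual)
print(a <= u <= b)                       # the result lies in [a, b]
```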

In [17]:
uniform_list = [random.uniform(25, 50) for _ in range(10)]   # list comprehension: 10 draws from Uniform(25, 50)

uniform_list

[28.71386596770719,
 29.57726618524829,
 27.860324242217192,
 25.365469512172726,
 37.16878851511896,
 49.12253902290539,
 26.614057024429652,
 38.52720463877826,
 36.64746397520774,
 40.036586239026285]
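Ten draws are too few to see the shape of the distribution. For Uniform(a, b) the theoretical mean is (a + b)/2 and the variance is (b - a)^2/12. With a = 25 and b = 50, the sample statistics of many draws should therefore approach 37.5 and about 52.08. A quick check, where the seed and the count of 100,000 draws are arbitrary:

```python
import random
import statistics

a, b = 25, 50
rng = random.Random(2024)                    # dedicated, seeded generator so the check is repeatable
draws = [rng.uniform(a, b) for _ in range(100_000)]

theory_mean = (a + b) / 2                    # 37.5
theory_var = (b - a) ** 2 / 12               # about 52.08

print(statistics.fmean(draws), theory_mean)
print(statistics.pvariance(draws), theory_var)
```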

Normal Distribution

In [18]:
mu = 5
sigma = 2

random.normalvariate(mu,sigma)

1.6498408211746494

In [20]:
normal_list = [random.normalvariate(mu, sigma) for _ in range(25)]   # 25 draws from Normal(mu, sigma)

normal_list

[4.063554351507818,
 9.174061148656357,
 6.82030631532424,
 2.564809069928037,
 2.826280783952868,
 1.786511456266934,
 6.001603028867782,
 4.102184864847474,
 4.683181103583636,
 3.798498920495022,
 7.66081717238724,
 6.711009247790164,
 7.537647229940512,
 5.841500375245619,
 5.4152895848833955,
 7.305141591810461,
 4.993753243580292,
 5.252501289083341,
 1.6276696253817264,
 2.812094245437756,
 6.959785035905158,
 3.4586927036689508,
 4.12387588357314,
 1.757680532025887,
 8.412601658731251]
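For a normal distribution, roughly 68.3% of values fall within one standard deviation of the mean and roughly 95.4% fall within two. The sketch below checks this empirically for the same `mu = 5` and `sigma = 2`, with an arbitrary seed and sample size. The standard library also offers `random.gauss(mu, sigma)`, which the docs describe as slightly faster than `normalvariate()` but not safe to call from several threads at once. The sketch sticks with `normalvariate()` to match the cells above.

```python
import random
import statistics

mu, sigma = 5, 2
rng = random.Random(99)                                  # seeded so the check is repeatable
draws = [rng.normalvariate(mu, sigma) for _ in range(100_000)]

within_1sd = sum(abs(x - mu) <= sigma for x in draws) / len(draws)
within_2sd = sum(abs(x - mu) <= 2 * sigma for x in draws) / len(draws)

print(f"mean  ~ {statistics.fmean(draws):.3f}  (mu = {mu})")
print(f"stdev ~ {statistics.stdev(draws):.3f}  (sigma = {sigma})")
print(f"within 1 sd: {within_1sd:.3f}  (theory ~0.683)")
print(f"within 2 sd: {within_2sd:.3f}  (theory ~0.954)")
```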

### Random Sampling from a Population

Simple Random Sampling (SRS) has the following properties:

* Start with a known list of N population units and randomly select n of them, without replacement
* Every unit has the same probability of selection, n/N
* All possible samples of size n are equally likely
* Estimates of means, proportions, and totals based on SRS are unbiased
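`random.sample(population, k)` draws `k` distinct positions without replacement, which is what SRS requires. The small simulation below checks the second property. It uses N = 10 labelled units, n = 3, and 100,000 repetitions, all arbitrary values. Each unit should be selected in roughly n/N = 30% of the samples.

```python
import random
from collections import Counter

N, n = 10, 3
trials = 100_000
rng = random.Random(7)                     # seeded for repeatability

counts = Counter()
for _ in range(trials):
    chosen = rng.sample(range(N), n)       # one SRS of size n, without replacement
    counts.update(chosen)

inclusion = {unit: counts[unit] / trials for unit in range(N)}
for unit, p in inclusion.items():
    print(unit, round(p, 3))               # each should be close to n / N = 0.3
```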

In [21]:
import numpy as np

In [22]:
mu = 0
sigma = 1

population = [random.normalvariate(mu, sigma) for _ in range(10000)]   # synthetic population of 10,000 draws from N(0, 1)

In [23]:
sampleA = random.sample(population, 500)   # two independent simple random samples of 500 units,
sampleB = random.sample(population, 500)   # each drawn without replacement
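`sampleA` and `sampleB` are two independent SRS draws, so their contents differ even though both come from `population`. Because `random.sample()` works without replacement, no position of the population is picked twice within one sample. `random.choices(population, k=...)` is the with-replacement counterpart. The sketch below samples positions rather than values, so repeats are easy to count. The population size and seed are arbitrary.

```python
import random

rng = random.Random(11)
positions = range(10_000)                   # stand-in for the indices of `population`

without_repl = rng.sample(positions, 500)   # SRS: 500 distinct positions
with_repl = rng.choices(positions, k=500)   # sampling with replacement: repeats are allowed

print(len(set(without_repl)))   # always 500
print(len(set(with_repl)))      # almost always below 500, because some positions repeat
```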

In [25]:
meanA = np.mean(sampleA)   # sample mean of sampleA

print(meanA)
mu - meanA                 # sampling error relative to the generating mean mu = 0

-0.024612741061785848


0.024612741061785848
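A single sample mean misses its target by some sampling error, as `mu - meanA` shows. The claim that SRS estimates are unbiased concerns the average over many samples: across repeated draws, the sample means should centre on the mean of the population list itself. The sketch below rebuilds a population like the one above and draws 1,000 samples of 500. It also compares the spread of the sample means with the standard error of an SRS mean, including the finite-population correction (N - n)/(N - 1). The seed and repetition count are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(2023)
mu, sigma = 0, 1
N, n, reps = 10_000, 500, 1_000

population = [rng.normalvariate(mu, sigma) for _ in range(N)]
pop_mean = statistics.fmean(population)         # the quantity an SRS mean estimates without bias

sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

avg_of_means = statistics.fmean(sample_means)
observed_se = statistics.pstdev(sample_means)
# Standard error of an SRS mean, with the finite-population correction (N - n) / (N - 1)
theory_se = statistics.pstdev(population) / math.sqrt(n) * math.sqrt((N - n) / (N - 1))

print(f"population mean:            {pop_mean:.4f}")
print(f"average of sample means:    {avg_of_means:.4f}")
print(f"spread of sample means:     {observed_se:.4f}")
print(f"theoretical standard error: {theory_se:.4f}")
```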

In [26]:
sdA = np.std(sampleA)   # ddof=0 by default: divides by n, not n - 1

sdA

1.0109562852834748
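`np.std` divides by n by default (`ddof=0`), which gives the population standard deviation of the values passed in. When those values are a sample and the goal is to estimate the population's spread, the usual choice is `ddof=1`, which divides by n - 1. The two differ by a factor of sqrt(n / (n - 1)), which is small for n = 500. The sketch below uses a freshly drawn stand-in sample with an arbitrary seed and cross-checks NumPy against the standard library's `statistics.pstdev` and `statistics.stdev`.

```python
import math
import random
import statistics

import numpy as np

rng = random.Random(5)
sample = [rng.normalvariate(0, 1) for _ in range(500)]   # stand-in for sampleA
n = len(sample)

sd_pop = np.std(sample)            # ddof=0: divide by n (NumPy default)
sd_samp = np.std(sample, ddof=1)   # ddof=1: divide by n - 1

print(sd_pop, statistics.pstdev(sample))          # same quantity, two libraries
print(sd_samp, statistics.stdev(sample))
print(sd_samp / sd_pop, math.sqrt(n / (n - 1)))   # ratio is sqrt(n / (n - 1)), about 1.001 here
```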