
Randomness and Reproducibility
As we learned at the beginning of this week, randomness is a cornerstone of statistical inference when drawing samples from larger populations.

In this tutorial, we are going to cover the following:

Randomness and its uses in Python.

Using Python seeds to reproduce an analysis.

Generating random variables from a probability distribution.

Random sampling from a population.

What is Randomness?
At the beginning of this week's lectures, we touched on the significance of randomness for statistical inference on population samples. If units are selected completely at random, our estimates of means, proportions, and totals are unbiased. This means our estimates are equal to the population values on average.

In Python, we refer to randomness as the ability to generate data, strings, or, more generally, numbers at random.

However, when conducting analysis it is important to consider reproducibility. If we are creating random data, how can we enable reproducible analysis?

We do this by using pseudo-random number generators (PRNGs). A PRNG starts from an initial value, known as the seed, and then uses a deterministic algorithm to generate a pseudo-random sequence from it.

This means that we can replicate the output of a random number generator in Python simply by knowing which seed was used.
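To make the idea of "seed plus algorithm" concrete, here is a minimal sketch of a toy linear congruential generator. The function name `lcg` and its constants are illustrative assumptions, not what Python uses internally (the `random` module is based on the Mersenne Twister), but the principle is the same: the whole sequence is determined by the seed.

```python
# Toy linear congruential generator (LCG), for illustration only.
# Python's random module uses the Mersenne Twister, not this algorithm.
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Return n pseudo-random floats in [0, 1) derived from `seed`."""
    values = []
    state = seed
    for _ in range(n):
        state = (a * state + c) % m  # deterministic update rule
        values.append(state / m)     # scale the integer state into [0, 1)
    return values

first_run = lcg(seed=1234, n=5)
second_run = lcg(seed=1234, n=5)
other_seed = lcg(seed=4321, n=5)

print(first_run == second_run)  # same seed, same sequence
print(first_run == other_seed)  # different seed, different sequence
```

Nothing in the update rule is random; the numbers only look random, which is why knowing the seed is enough to regenerate them.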

We can demonstrate this with the functions in random, a module in the Python standard library.

Setting a Seed and Generating Random Numbers

In [1]:
import random
random.seed(1234)

random.random()


0.9664535356921388

In [2]:
random.seed(1234)

random.random()

0.9664535356921388
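The cells above reseed the module-level generator, which is shared by all code that calls `random`. As a hedged alternative, the sketch below uses separate `random.Random` instances, each with its own seed and state. The variable names are illustrative.

```python
import random

# Two independent generators created with the same seed.
rng_a = random.Random(1234)
rng_b = random.Random(1234)

draws_a = [rng_a.random() for _ in range(3)]
draws_b = [rng_b.random() for _ in range(3)]
print(draws_a == draws_b)  # True: same seed, same sequence

# Saving and restoring the state replays a draw from the middle of a sequence.
state = rng_a.getstate()
next_draw = rng_a.random()
rng_a.setstate(state)
replayed = rng_a.random()
print(replayed == next_draw)  # True
```

Keeping a dedicated generator per analysis means that other code calling `random.seed` or `random.random` cannot shift the sequence you rely on.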

Random Numbers from Real-Valued Distributions
Uniform

In [3]:
random.uniform(25,50)

36.01831497938382

In [5]:
unifNumbers = [random.uniform(0,1) for _ in range(1000)]

In [6]:
unifNumbers

[0.29568688861378534,
 0.3154662976424718,
 0.4495636346286318,
 0.23788198954553252,
 0.760782064096187,
 0.5451979628358582,
 0.43542427937193884,
 0.13391215823727187,
 0.31798334545694995,
 0.1384662575556831,
 0.8052689019764517,
 0.33091378468188515,
 0.17408067158027618,
 0.20924753778922134,
 0.2597532013462456,
 0.3608174225378401,
 0.8055003693807395,
 0.6374736865961922,
 0.2844211325609758,
 0.9793307379875307,
 0.5338735535084044,
 0.0806373235735206,
 0.39550468816698325,
 0.3975027456035918,
 0.2252087011381787,
 0.8396728631824533,
 0.35872302835435266,
 0.25564989570132823,
 0.1361596776723818,
 0.13677599605328916,
 0.965484270832286,
 0.8594233061813019,
 0.45697878504608436,
 0.43557667171906733,
 0.39343253492529473,
 0.39185351936711377,
 0.06978900219801909,
 0.30761283428998143,
 0.6030641842819198,
 0.05462321609862608,
 0.04604763607678508,
 0.6983096303231778,
 0.06800493213575587,
 0.7178099174727228,
 0.23023028117334798,
 0.18628794772279178,
 0.6964716063
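A long list of draws is hard to check by eye. As a rough sketch, a few summary statistics can be compared against what a Uniform(0, 1) distribution implies: a mean of 0.5 and a standard deviation of 1/sqrt(12), about 0.289. The local generator `rng` and its seed are assumptions made here so the summary is repeatable.

```python
import random
import statistics

rng = random.Random(1234)  # local generator so the summary is repeatable

low, high = 0, 1
draws = [rng.uniform(low, high) for _ in range(1000)]

sample_mean = statistics.mean(draws)
sample_sd = statistics.pstdev(draws)

print(min(draws), max(draws))  # both should lie within [low, high]
print(sample_mean)             # theoretical mean: (low + high) / 2
print(sample_sd)               # theoretical sd: (high - low) / sqrt(12)
```

The sample values will not match the theoretical ones exactly, but with 1000 draws they should be close.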

Normal

In [7]:
mu = 0

sigma = 1

random.normalvariate(mu, sigma)

0.46578087175894056

In [8]:
mu = 5

sigma = 2

random.normalvariate(mu, sigma)

6.428652097904887

In [9]:
mu = 0

sigma = 1

[random.normalvariate(mu, sigma) for _ in range(10000)]

[-0.0075793554743260145,
 -1.7807179609521795,
 0.16515502868251528,
 0.6667542748123468,
 -0.6871529346099333,
 -0.03458684910565041,
 0.39075815923976365,
 -1.1609001886275179,
 1.0025600145002982,
 0.10050874426390079,
 -0.49531795660270417,
 0.7357329904214688,
 0.06683848685074721,
 -0.0978032613327617,
 -1.2400790705425764,
 0.32255405829301553,
 0.008258283674373043,
 -0.33021128919414117,
 -0.09174351087706682,
 -2.655598644947513,
 1.4761373139803071,
 -0.8434756141716114,
 0.03773941334934189,
 -0.880569277035666,
 1.193147230777026,
 0.13065182219648597,
 0.40582650067317944,
 0.9389429616646158,
 -1.392332141877377,
 0.5902413266000184,
 -0.4426132614988029,
 -0.9509477439944335,
 0.1387268040367014,
 -1.2004604844365834,
 -0.7595003090582371,
 -0.17136436586455298,
 -1.6855929221166237,
 0.3197279133914426,
 1.2734800317815236,
 0.03219808693196488,
 -0.6561705969921644,
 -0.7248424488022353,
 1.0269063982140023,
 0.5274696245724194,
 1.3480796308385818,
 -0.01635060864147
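The same kind of check works for normal draws. The sketch below, which assumes a locally seeded generator and the illustrative values mu = 5 and sigma = 2, compares the sample mean and standard deviation with the parameters passed to `normalvariate`, and also counts the share of draws within one standard deviation of the mean (about 68% for a normal distribution).

```python
import random
import statistics

rng = random.Random(1234)  # local generator so the summary is repeatable

mu, sigma = 5, 2
draws = [rng.normalvariate(mu, sigma) for _ in range(10000)]

sample_mean = statistics.fmean(draws)
sample_sd = statistics.pstdev(draws)

# Share of draws within one standard deviation of the mean.
within_one_sd = sum(abs(x - mu) <= sigma for x in draws) / len(draws)

print(sample_mean)    # should be close to mu
print(sample_sd)      # should be close to sigma
print(within_one_sd)  # roughly 0.68 for a normal distribution
```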

Random Sampling from a Population
From lecture, we know that Simple Random Sampling (SRS) has the following properties:

Start with a known list of N population units, and randomly select n units from the list
Every unit has an equal probability of selection, n/N
All possible samples of size n are equally likely
Estimates of means, proportions, and totals based on SRS are UNBIASED (meaning they are equal to the population values on average)
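The equal-probability property can be checked empirically. The following sketch assumes a small hypothetical population of N = 10 labelled units and repeatedly draws samples of size n = 3 with `random.sample`, which samples without replacement. Each unit should appear in roughly n/N = 0.3 of the samples.

```python
import random
from collections import Counter

rng = random.Random(1234)  # local generator so the simulation is repeatable

N, n = 10, 3
population = list(range(N))  # hypothetical units labelled 0..9
repeats = 20000

counts = Counter()
for _ in range(repeats):
    counts.update(rng.sample(population, n))  # one simple random sample

# Fraction of samples in which each unit was selected.
inclusion_rates = {unit: counts[unit] / repeats for unit in population}
print(inclusion_rates)  # each value should be near n / N = 0.3
```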

In [10]:
import random
import numpy as np

In [11]:
mu = 0
    
sigma = 1

Population = [random.normalvariate(mu, sigma) for _ in range(10000)]

In [12]:
SampleA = random.sample(Population, 500)
SampleB = random.sample(Population, 500)

In [13]:
np.mean(SampleA)

0.00021485623845816893

In [14]:
np.std(SampleA)

1.0556311850708089

In [15]:
np.mean(SampleB)

-0.03747882824157548

In [16]:
np.std(SampleB)

1.0645385147696727
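One detail worth noting: `np.std` divides by n by default (`ddof=0`), whereas the usual sample standard deviation divides by n - 1 (`np.std(SampleA, ddof=1)`). The sketch below illustrates the difference using the standard library's `statistics` module instead of NumPy, a substitution made here only to keep the example dependency-free. The population and sample are regenerated with an assumed seed, so the values differ from the outputs above.

```python
import random
import statistics

rng = random.Random(1234)  # local generator so the comparison is repeatable

population = [rng.normalvariate(0, 1) for _ in range(10000)]
sample = rng.sample(population, 500)

# Divides by n, matching the default behaviour of np.std(sample).
sd_divide_by_n = statistics.pstdev(sample)

# Divides by n - 1, matching np.std(sample, ddof=1).
sd_divide_by_n_minus_1 = statistics.stdev(sample)

print(sd_divide_by_n, sd_divide_by_n_minus_1)
```

With 500 observations the two values differ only slightly, but the gap grows as the sample size shrinks.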

In [17]:
means = [np.mean(random.sample(Population, 1000)) for _ in range(100)]

np.mean(means)

0.005790170822912183

In [21]:
standarddevs = [np.std(random.sample(Population, 1000)) for _ in range(100)]

np.mean(standarddevs)

1.0017095132263747
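The average of the sample means lands close to the population mean, which is the unbiasedness property in action. As a further hedged sketch, the spread of those sample means can be compared with the theoretical standard error under simple random sampling without replacement, sigma / sqrt(n) multiplied by the finite population correction sqrt((N - n) / (N - 1)). The seed, the number of repeats, and the use of the `statistics` module are assumptions made for this illustration.

```python
import random
import statistics

rng = random.Random(1234)  # local generator so the simulation is repeatable

N, n, repeats = 10000, 1000, 200
population = [rng.normalvariate(0, 1) for _ in range(N)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# Mean of each of `repeats` simple random samples of size n.
means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]

observed_se = statistics.stdev(means)

# Standard error of the mean with the finite population correction.
expected_se = (pop_sd / n ** 0.5) * ((N - n) / (N - 1)) ** 0.5

print(statistics.fmean(means), pop_mean)  # should be close to each other
print(observed_se, expected_se)           # should be of similar size
```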