<h1>Topic 4.2 Law of Large Numbers Simulation</h1>

The probability of an event occurring does not guarantee a perfect distribution of outcomes in a random process. Instead, probability describes the proportion of trials on which we expect a specific outcome to occur over a large number of trials. Probability does not tell us what will occur on any one trial, only what we can expect over many trials. As such, when we repeat a random process many times, we expect the proportion of each outcome to be approximately equal to the probability of that outcome.

This idea is called the Law of Large Numbers: the more times a random process is repeated, the closer the relative frequency of each outcome tends to get to the true probability of that outcome. This allows us to approximate the probabilities of outcomes in a random process using simulation.
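A minimal sketch of the Law of Large Numbers for a single outcome: it tracks the running relative frequency of rolling a 6 and prints it at a few checkpoints. The seed value 7 is an arbitrary assumption. With a different seed the early values change, but the later ones should still land close to 1/6 ≈ 0.1667.

```python
import random

rng = random.Random(7)          # fixed, arbitrary seed so the run is repeatable
p_six = 1 / 6                   # theoretical probability of rolling a 6
checkpoints = {6, 60, 600, 6_000, 60_000, 600_000}

sixes = 0
running = {}                    # n -> relative frequency of a 6 after n rolls
for n in range(1, 600_001):
    if rng.randint(1, 6) == 6:  # randint includes both endpoints: 1..6
        sixes += 1
    if n in checkpoints:
        running[n] = sixes / n
        print(f"n = {n:>7,}  relative frequency of 6 = {running[n]:.4f}  (1/6 = {p_six:.4f})")
```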

This notebook runs a simulation to demonstrate the Law of Large Numbers.

With a fair die (a six-sided number cube), the probability of any particular number, 1 through 6, landing face up on a roll is 1/6. However, rolling the die six times does not guarantee that each number will occur exactly once. Six rolls only give us a simulated (experimental) probability for each number. Roll a die six times and record the number that lands face up each time. Did each number appear exactly once? If not, why not?
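One way to answer the question is to count exactly. Each of the 6^6 = 46,656 possible sequences of six rolls is equally likely, and 6! = 720 of them show every number exactly once, so the probability is 720/46,656, about 0.0154. In other words, roughly 98.5% of the time six rolls will not show each number once. The sketch below computes that value and confirms it by brute force over every sequence.

```python
from itertools import product
from math import factorial

# Exact count: 6! orderings show each face once, out of 6**6 equally likely sequences
p_exact = factorial(6) / 6**6

# Brute-force check: enumerate every possible sequence of six rolls
favorable = sum(1 for seq in product(range(1, 7), repeat=6) if len(set(seq)) == 6)
p_brute = favorable / 6**6

print(p_exact, p_brute)  # both about 0.0154
```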

In [1]:
# import the libraries used below
import pandas as pd     # tabulating the simulated rolls
import random as rnd    # random number generation for the die

In [5]:
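# define die roll function
def roll():
    """Simulate one roll of a fair six-sided die."""
    # randint includes both endpoints, so each of 1-6 is equally likely
    return rnd.randint(1, 6)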
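Because roll() draws from the random module's shared generator, rerunning the cells below produces different tables from the ones shown. A minimal sketch of making a run repeatable, assuming an arbitrary seed value of 2024 (any integer works): calling rnd.seed with the same value before rolling reproduces the same sequence of rolls.

```python
import random as rnd

def roll():
    """Simulate one roll of a fair six-sided die."""
    return rnd.randint(1, 6)

rnd.seed(2024)                      # arbitrary seed chosen for illustration
first = [roll() for _ in range(6)]

rnd.seed(2024)                      # same seed again
second = [roll() for _ in range(6)]

print(first)
print(second)
print(first == second)              # True: same seed, same rolls
```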

In [27]:
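t = 6                   # number of rolls (trials)
rolls = [0]*t
for k in range(0,t):
    rolls[k] = roll()   # record the result of each roll
DIE_ROLLS = pd.DataFrame(rolls)
# count how often each face came up (faces that never came up are omitted)
RESULTS = DIE_ROLLS.value_counts().rename_axis('ROLL').to_frame('FREQUENCY')
RESULTS = RESULTS.reset_index()
# relative frequency = frequency / number of trials
RESULTS['PROBABILITY'] = RESULTS['FREQUENCY']/t
RESULTS = RESULTS.sort_values('ROLL')
RESULTS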

   ROLL  FREQUENCY  PROBABILITY
1     1          1     0.166667
2     2          1     0.166667
3     3          1     0.166667
0     5          2     0.333333
4     6          1     0.166667
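Notice that the 6-roll table has no row for 4: value_counts only counts values that actually occur, so a face that never came up is dropped rather than shown with frequency 0. A sketch of one way to keep all six faces, using hypothetical helper names tabulate and simulate that wrap the cell's logic and reindex the counts over the faces 1 through 6:

```python
import random

import pandas as pd

FACES = [1, 2, 3, 4, 5, 6]  # faces of a fair six-sided die

def tabulate(rolls):
    """Frequency table for a list of die rolls, including faces that never came up."""
    t = len(rolls)
    freq = (pd.Series(rolls)
            .value_counts()
            .reindex(FACES, fill_value=0))  # missing faces get a count of 0
    results = pd.DataFrame({'ROLL': FACES, 'FREQUENCY': freq.to_numpy()})
    results['PROBABILITY'] = results['FREQUENCY'] / t  # relative frequency
    return results

def simulate(t, seed=None):
    """Roll a fair die t times (hypothetical helper) and tabulate the outcomes."""
    rng = random.Random(seed)
    return tabulate([rng.randint(1, 6) for _ in range(t)])

# The six rolls behind the 6-roll table (their order is not recorded there)
print(tabulate([1, 2, 3, 5, 5, 6]))
```

With these rolls the table gains a row for 4 with FREQUENCY 0 and PROBABILITY 0.0, and the PROBABILITY column always sums to 1.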


In [28]:
t = 60
rolls = [0]*t
for k in range(0,t):
    rolls[k] = roll()
DIE_ROLLS = pd.DataFrame(rolls)
RESULTS = DIE_ROLLS.value_counts().rename_axis('ROLL').to_frame('FREQUENCY')
RESULTS = RESULTS.reset_index()
RESULTS['PROBABILITY'] = RESULTS['FREQUENCY']/t
RESULTS = RESULTS.sort_values('ROLL')
RESULTS

   ROLL  FREQUENCY  PROBABILITY
2     1          9     0.150000
5     2          7     0.116667
3     3          9     0.150000
4     4          8     0.133333
1     5         12     0.200000
0     6         15     0.250000


In [29]:
t = 600
rolls = [0]*t
for k in range(0,t):
    rolls[k] = roll()
DIE_ROLLS = pd.DataFrame(rolls)
RESULTS = DIE_ROLLS.value_counts().rename_axis('ROLL').to_frame('FREQUENCY')
RESULTS = RESULTS.reset_index()
RESULTS['PROBABILITY'] = RESULTS['FREQUENCY']/t
RESULTS = RESULTS.sort_values('ROLL')
RESULTS

   ROLL  FREQUENCY  PROBABILITY
3     1         98     0.163333
1     2        106     0.176667
4     3         93     0.155000
2     4         99     0.165000
5     5         83     0.138333
0     6        121     0.201667


In [30]:
t = 6000
rolls = [0]*t
for k in range(0,t):
    rolls[k] = roll()
DIE_ROLLS = pd.DataFrame(rolls)
RESULTS = DIE_ROLLS.value_counts().rename_axis('ROLL').to_frame('FREQUENCY')
RESULTS = RESULTS.reset_index()
RESULTS['PROBABILITY'] = RESULTS['FREQUENCY']/t
RESULTS = RESULTS.sort_values('ROLL')
RESULTS

   ROLL  FREQUENCY  PROBABILITY
2     1        993     0.165500
5     2        960     0.160000
4     3        988     0.164667
1     4        995     0.165833
3     5        993     0.165500
0     6       1071     0.178500


In [31]:
t = 60000
rolls = [0]*t
for k in range(0,t):
    rolls[k] = roll()
DIE_ROLLS = pd.DataFrame(rolls)
RESULTS = DIE_ROLLS.value_counts().rename_axis('ROLL').to_frame('FREQUENCY')
RESULTS = RESULTS.reset_index()
RESULTS['PROBABILITY'] = RESULTS['FREQUENCY']/t
RESULTS = RESULTS.sort_values('ROLL')
RESULTS

   ROLL  FREQUENCY  PROBABILITY
1     1      10038     0.167300
4     2       9996     0.166600
5     3       9849     0.164150
3     4      10001     0.166683
0     5      10098     0.168300
2     6      10018     0.166967


In [32]:
t = 600000
rolls = [0]*t
for k in range(0,t):
    rolls[k] = roll()
DIE_ROLLS = pd.DataFrame(rolls)
RESULTS = DIE_ROLLS.value_counts().rename_axis('ROLL').to_frame('FREQUENCY')
RESULTS = RESULTS.reset_index()
RESULTS['PROBABILITY'] = RESULTS['FREQUENCY']/t
RESULTS = RESULTS.sort_values('ROLL')
RESULTS

   ROLL  FREQUENCY  PROBABILITY
2     1      99938     0.166563
5     2      99659     0.166098
1     3     100136     0.166893
3     4      99840     0.166400
4     5      99830     0.166383
0     6     100597     0.167662


In [33]:
t = 6000000
rolls = [0]*t
for k in range(0,t):
    rolls[k] = roll()
DIE_ROLLS = pd.DataFrame(rolls)
RESULTS = DIE_ROLLS.value_counts().rename_axis('ROLL').to_frame('FREQUENCY')
RESULTS = RESULTS.reset_index()
RESULTS['PROBABILITY'] = RESULTS['FREQUENCY']/t
RESULTS = RESULTS.sort_values('ROLL')
RESULTS

   ROLL  FREQUENCY  PROBABILITY
3     1     999832     0.166639
1     2    1000881     0.166814
2     3    1000651     0.166775
5     4     998274     0.166379
4     5     999277     0.166546
0     6    1001085     0.166848
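The seven cells above differ only in t, and the pure-Python loop becomes slow at 6,000,000 rolls. A sketch of the same experiment as a single loop, assuming NumPy's random Generator in place of the random module the notebook uses, with an arbitrary seed of 0. Note that Generator.integers excludes its upper bound, so integers(1, 7) yields 1 through 6, unlike random.randint(1, 6).

```python
import numpy as np

rng = np.random.default_rng(seed=0)          # arbitrary seed for repeatability
sizes = [6 * 10**k for k in range(7)]        # 6, 60, ..., 6,000,000 trials

table = {}                                   # t -> relative frequency of faces 1..6
for t in sizes:
    rolls = rng.integers(1, 7, size=t)       # upper bound is exclusive: values 1..6
    counts = np.bincount(rolls, minlength=7)[1:]  # counts[i] = rolls equal to i + 1
    table[t] = counts / t

for t, props in table.items():
    print(f"{t:>9,}", np.round(props, 4))
```

Because bincount is given minlength=7, a face that never comes up still gets a count of 0, so every row has six entries.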


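The theoretical probability of rolling a 1, 2, 3, 4, 5, or 6 on any one roll of a fair die is 1/6, which appears in the dataframe output as 0.166667 (rounded to six decimal places). The tables show the simulated probabilities (relative frequencies) moving toward this theoretical probability as the number of trials increases. With 6 rolls they range from 0 (a 4 never came up, so it has no row in the table) to 0.333333; with 6,000,000 rolls every value lies between 0.166379 and 0.166848.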
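To put a number on "approaching", the sketch below takes the frequencies from the seven tables above and computes, for each t, the largest gap between a relative frequency and 1/6. For comparison it also prints the standard error sqrt(p(1 - p)/t) of a sample proportion with p = 1/6, which shrinks by a factor of sqrt(10), about 3.16, for each tenfold increase in t.

```python
# Frequencies of faces 1-6 copied from the tables above
# (face 4 never came up in the 6-roll run, so its frequency is 0)
observed = {
    6:       [1, 1, 1, 0, 2, 1],
    60:      [9, 7, 9, 8, 12, 15],
    600:     [98, 106, 93, 99, 83, 121],
    6000:    [993, 960, 988, 995, 993, 1071],
    60000:   [10038, 9996, 9849, 10001, 10098, 10018],
    600000:  [99938, 99659, 100136, 99840, 99830, 100597],
    6000000: [999832, 1000881, 1000651, 998274, 999277, 1001085],
}

p = 1 / 6
max_dev = {}
for t, freqs in observed.items():
    max_dev[t] = max(abs(f / t - p) for f in freqs)  # largest gap from 1/6
    se = (p * (1 - p) / t) ** 0.5                    # standard error of a sample proportion
    print(f"t = {t:>9,}  largest |rel. freq - 1/6| = {max_dev[t]:.6f}  standard error = {se:.6f}")
```

For these runs the largest gap falls at every step, from about 0.167 at 6 rolls to about 0.0003 at 6,000,000 rolls.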