<a href="https://colab.research.google.com/github/erinetaylor/AChemLecture/blob/main/CHEM211_RollingDice_NormalDistribution_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🎲 Rolling Dice Normal Distribution

In this Colab Notebook, we will explore the relationship between sample and population normal distributions.


## Let's get started! 😀

In [None]:
#Import all necessary packages

import pandas as pd
import io
import scipy as sc
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt

In the following cell, you will enter how many times each sum of the dice rolls was observed, and the results are stored as `data`.
The cell after that calculates the average and standard deviation, which will print after running the code.

In [None]:
# Initialize an empty list for the datasets
data = []
rolls=0

# Ask the user for the smallest value they want to enter
min_value = int(input("What is the smallest number you want to input? "))

# Ask the user how many different values they want to enter
num_values = int(input("How many different values do you want to input? "))

# Calculate the largest value from the smallest value and the number of values
max_value = min_value+num_values-1

# Iterate to get each value and its count
for i in range(num_values):
    value = int(min_value+i)
    count = int(input(f"Enter the count for {value}: "))
    data.extend([value] * count)  # Add the value 'count' times to the dataset
    rolls=rolls+count

# Print the resulting dataset
print("The resulting dataset is:", data)

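If you would like to test the rest of the notebook without typing counts in by hand, the same dataset can be built from a dictionary of counts. This is only a sketch: the names `example_counts`, `example_data`, and `example_rolls` and the counts themselves are hypothetical placeholders, not real roll data.

```python
# Hypothetical counts for each sum (placeholder values, not real class data)
example_counts = {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6,
                  8: 5, 9: 4, 10: 3, 11: 2, 12: 1}

example_data = []
example_rolls = 0

# Same logic as the input loop: add each value 'count' times
for value, count in example_counts.items():
    example_data.extend([value] * count)
    example_rolls += count

print("Total rolls:", example_rolls)
print("First ten entries:", example_data[:10])
```

To use it with the cells below, you would assign `example_data` to `data` and `example_rolls` to `rolls`, and set `min_value`, `num_values`, and `max_value` to match.
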
In [None]:
# This converts our data into a pandas DataFrame so that the statistics are easy to calculate
data = pd.DataFrame(data)
sum_of_rolls=data[0]

#Here we are going to calculate the average sum of rolls
average=sum_of_rolls.mean()

#Here we are going to calculate the standard deviation for the sum of rolls
standard_deviation=sum_of_rolls.std()

# This prints the average and standard deviation in a readable format.
print('After {:d} total rolls, the sample average is {:.4f} with a standard deviation of {:.4f}.'.format(rolls, average, standard_deviation))

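Note that the pandas `.std()` method returns the sample standard deviation (it divides by N-1) by default. The short sketch below, which uses a small made-up series called `example`, compares it with the population form (dividing by N) and with the formula written out by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical sums, used only to illustrate the calculation
example = pd.Series([2, 7, 7, 8, 12])

# pandas default: sample standard deviation (ddof=1, divides by N-1)
sample_std = example.std()

# Population standard deviation (ddof=0, divides by N)
population_std = example.std(ddof=0)

# The sample formula written out by hand
manual_std = np.sqrt(((example - example.mean()) ** 2).sum() / (len(example) - 1))

print("Sample std (pandas default):", sample_std)
print("Population std (ddof=0):    ", population_std)
print("Sample std (by hand):       ", manual_std)
```
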
In the next cell, a histogram will be generated which shows the number of observations of each sum of two dice.

The `bins` argument has been set so that each sum we expect gets its own bar. For adding together two six-sided dice, we expect 11 different possibilities (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12).

In [None]:
# Now we are going to plot a histogram. align='left' centers each bar on its sum.
plt.hist(sum_of_rolls, bins=range(min_value-1,max_value+2), align='left', color = "teal", edgecolor='black')

# This sets the tick marks range and interval
plt.xticks(range(0, 13, 1))

# This sets the figure title
plt.title("Sum of Rolling Two Six-Sided Dice")

# This sets the x-axis label
plt.xlabel("Sum of Dice")

# This sets the y-axis label
plt.ylabel("Number of Observations");

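To see where the 11 possible sums come from, and why 7 should be the most common one, we can list every combination of two six-sided dice. This is an illustrative sketch; `sides` and `sum_counts` are names chosen here and are not used elsewhere in the notebook.

```python
from collections import Counter
from itertools import product

sides = 6  # number of sides on each die

# Every ordered pair of faces, e.g. (1, 1), (1, 2), ..., (6, 6)
outcomes = product(range(1, sides + 1), repeat=2)

# Count how many pairs give each sum
sum_counts = Counter(a + b for a, b in outcomes)

for total in sorted(sum_counts):
    print(total, sum_counts[total])
```

There are 36 equally likely pairs, so the count for each sum divided by 36 is its theoretical probability.
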
Next, we can plot a normal distribution on top of the histogram (as a red line) by assuming the sample average and standard deviation are equal to the population average and standard deviation.

In [None]:
# align='left' centers each bar on its sum so that the bars line up with the curve
counts, bins, _ = plt.hist(sum_of_rolls, bins=range(0,14), align='left', color = "teal", edgecolor='black')

# This creates a list of regularly spaced x-values so that we can generate a normal distribution based on our roll data
x=np.linspace(min_value-1,min_value+num_values+1,100)

# This is the function for a normal distribution that includes our roll data average and standard deviation
normal_distribution= 1.0/(standard_deviation*(np.sqrt(2*np.pi)))*np.exp((-0.5*((x-average)/(standard_deviation))**2))

# This scales our normal distribution so that its peak matches the tallest histogram bar, which lets us overlay the two
normal_distribution_scaled=normal_distribution * max(counts) / max(normal_distribution)

plt.xticks(range(0, 13, 1))
plt.title("Sum of Rolling Two Six-Sided Dice")
plt.xlabel("Sum of Dice")
plt.ylabel("Number of Observations")

# This plots the normal distribution based on our roll data
plt.plot(x,normal_distribution_scaled,'r')

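The cell above writes the normal distribution out by hand and scales it to the tallest bar. As a rough sketch of an alternative (not the method used in this notebook), `scipy.stats.norm.pdf` gives the same curve, and multiplying it by the number of rolls and the bin width converts it to expected counts. The values `example_average`, `example_std`, and `example_rolls` below are hypothetical stand-ins for your own results.

```python
import numpy as np
from scipy import stats

# Hypothetical sample statistics (replace with your own values)
example_average = 7.0
example_std = 2.4
example_rolls = 36
bin_width = 1  # each histogram bar covers one sum

x_example = np.linspace(1, 14, 100)

# Normal distribution written out by hand, as in the cell above
pdf_manual = 1.0 / (example_std * np.sqrt(2 * np.pi)) * np.exp(
    -0.5 * ((x_example - example_average) / example_std) ** 2
)

# The same curve from SciPy
pdf_scipy = stats.norm.pdf(x_example, loc=example_average, scale=example_std)

# Expected number of observations per bin instead of peak matching
expected_counts = pdf_scipy * example_rolls * bin_width

print("Curves agree:", np.allclose(pdf_manual, pdf_scipy))
```

With this scaling the area under the curve equals the total number of rolls, so the curve and the bars are on the same footing even when the tallest bar is unusually high.
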
For taking the sum of rolling dice, we actually know what the average and standard deviation of the population should be, using the following equations.

<font color='green'>$$\text{Population Average}=\mu=n\times\left(\frac{y+1}{2}\right)$$</font>

<font color='green'>$$\text{Population Standard Deviation}=\sigma=\sqrt{n\times\left(\frac{(y+1)(2y+1)}{6}-\left(\frac{y+1}{2}\right)^2\right)}$$</font>

<font color='green'>Where $y$ is the number of sides on each die and $n$ is the number of dice that are being summed together. In this example we summed together 2 six-sided dice, so `y=6` and `n=2`.</font>

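As a worked example, substituting $y=6$ and $n=2$ into the two equations gives:

$$\mu=2\times\left(\frac{6+1}{2}\right)=7$$

$$\sigma=\sqrt{2\times\left(\frac{(7)(13)}{6}-\left(\frac{7}{2}\right)^2\right)}=\sqrt{2\times\left(\frac{91}{6}-\frac{49}{4}\right)}=\sqrt{\frac{35}{6}}\approx 2.4152$$
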
The following cell will calculate the theoretical average and standard deviation of the population.

In [None]:
# This is a calculator for the theoretical average and standard deviation of the sum from rolling a set of dice
y=6 # number of sides on each die (6 = six-sided dice)
n=2 # number of dice being rolled

theoretical_average=((y+1)/2)*n
theoretical_standard_deviation=np.sqrt(n*(((y+1)*(2*y+1)/6)-((y+1)/2)**2))
print('The theoretical population average is {:3.4f} with a standard deviation of {:3.4f}.'.format(theoretical_average, theoretical_standard_deviation))

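One way to check these formulas is to list every possible outcome and compute the population average and standard deviation directly. The sketch below does this by brute force; the names `num_sides`, `num_dice`, and `all_sums` are chosen here for illustration, and the approach is only practical for a small number of dice.

```python
from itertools import product

import numpy as np

num_sides = 6  # same role as y
num_dice = 2   # same role as n

# Sum of the faces for every possible combination of dice
all_sums = np.array(
    [sum(faces) for faces in product(range(1, num_sides + 1), repeat=num_dice)]
)

# Population statistics (np.std divides by N by default)
brute_force_average = all_sums.mean()
brute_force_std = all_sums.std()

# The formulas from the text
formula_average = num_dice * (num_sides + 1) / 2
formula_std = np.sqrt(
    num_dice * ((num_sides + 1) * (2 * num_sides + 1) / 6 - ((num_sides + 1) / 2) ** 2)
)

print("Averages agree:", np.isclose(brute_force_average, formula_average))
print("Standard deviations agree:", np.isclose(brute_force_std, formula_std))
```
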
This next cell will plot the theoretical population normal distribution (in blue) over our histogram and sample normal distribution.

In [None]:
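# bins=range(0,14) keeps 11 and 12 in separate bins; align='left' centers each bar on its sum
counts, bins, _ = plt.hist(sum_of_rolls, bins=range(0,14), align='left', color = "teal", edgecolor='black')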

# This is the function for the theoretical normal distribution that uses the theoretical average and standard deviation from the calculator above
theoretical_normal_distribution= 1.0/(theoretical_standard_deviation*(np.sqrt(2*np.pi)))*np.exp((-0.5*((x-theoretical_average)/(theoretical_standard_deviation))**2))

# This scales the theoretical normal distribution function so that we can overlay the histogram and normal distribution
theoretical_normal_distribution_scaled=theoretical_normal_distribution * max(counts) / max(theoretical_normal_distribution)

plt.xticks(range(0, 13, 1))
plt.title("Sum of Rolling Two Six-Sided Dice")
plt.xlabel("Sum of Dice")
plt.ylabel("Number of Observations")
plt.plot(x,normal_distribution_scaled,'r',label='Sample Normal Distribution')
plt.plot(x,theoretical_normal_distribution_scaled,'b', label='Theoretical Normal Distribution')

# This adds a legend to our graph. Change loc to 0-best, 1-upper right, 2-upper left, 3-lower left, and 4-lower right.
plt.legend(loc=0);