# Central Limit Theorem (CLT)

This notebook demonstrates the effect of the Central Limit Theorem (CLT) as more data points are averaged. The CLT states that when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally, a "bell curve"), even if the original variables themselves are not normally distributed. (https://en.wikipedia.org/wiki/Central_limit_theorem)
<br> Run the code cell below for setup. In this applet, we sample data points from an exponential distribution.
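
For the exponential case used here, the statement can be written out explicitly. This is the standard form of the theorem for independent, identically distributed samples; the symbols $\bar{X}_N$, $\mu$, and $\lambda$ are notation introduced for this note, with $\lambda$ corresponding to `ld` in the code below.

```latex
X_1, \dots, X_N \overset{\text{i.i.d.}}{\sim} \operatorname{Exp}(\lambda),
\qquad \mu = \mathbb{E}[X_i] = \frac{1}{\lambda},
\qquad \operatorname{Var}(X_i) = \frac{1}{\lambda^2} = \mu^2

\bar{X}_N = \frac{1}{N} \sum_{i=1}^{N} X_i,
\qquad \mathbb{E}[\bar{X}_N] = \mu,
\qquad \operatorname{Var}(\bar{X}_N) = \frac{\mu^2}{N}

\frac{\sqrt{N}\,\bar{X}_N - \sqrt{N}\,\mu}{\mu}
\;\xrightarrow{\;d\;}\; \mathcal{N}(0, 1)
\quad \text{as } N \to \infty
```

In other words, the averages stay centered on the chosen mean while their spread shrinks like $\mu / \sqrt{N}$, which is why the histogram of averages narrows and looks more symmetric as N grows.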

In [1]:
import numpy as np
import math 
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual

def CLT(mean, N):
    
    ld = 1.0/mean
    
    print ("First, the data are sampled from the following exponential distribution.")
    print ("The mean of your choice is shown with the vertical green line.")
    x = np.linspace(0, 7, 100)
    y = ld * np.exp(-ld * x)
    plt.plot(x, y)
    plt.xlabel("x")
    plt.ylabel("P(x)")
    plt.title("exponential distribution with lambda: " + str(round(ld, 2)))
    plt.axvline(x=1 / float(ld), color='g', label='mean(1/lambda) = ' + str(round(1/float(ld), 2)))
    plt.legend()
    plt.show()    
    
    rds = np.random.exponential(mean, size=400)
    avg5s = []
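    # Average consecutive, non-overlapping groups of 5 samples.
    # Integer division keeps range() valid in Python 3; the slice covers exactly 5 values.
    for i in range(400 // 5):
        avg5s.append(np.mean(rds[5*i:5*(i+1)]))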

    print ("The histogram of individual samples and the histogram of their averages (by groups of 5) will be compared.")
    print ("400 data points were sampled in total, and each group of 5 produced an average, for example:")
    for i in range(5):
        print ("samples", str(5*i+1), "~", str(5*(i+1)), ": ", rds[5*i:5*(i+1)], "-> average:", avg5s[i])
    print ("...")
    print ("samples 396 ~ 400 :", rds[395:], "-> average:", avg5s[-1])
    print ("\nThe histograms of individual samples and the 5-averages are shown below.")    
    print ("Observe that while the individual samples closely resemble the true exponential distribution,")
    print ("the distribution of averages looks closer to a 'bell curve'.")
    print ("You can adjust N with the slider to see that the 'bell curve' becomes more apparent as N gets larger.")
    
    rds = np.random.exponential(mean, size=400*N)
    avgNs = []
    for i in range(400):
        avgNs.append(np.mean(rds[N*i:N*(i+1)])) 
    
    plt.figure(figsize=(12, 5))
    plt.hist([rds, avg5s, avgNs], 
             label=["individual samples", "averages of 5 samples", "averages of N=" + str(N) + " samples"], 
             bins=60, density=True)  # `normed` was removed from newer Matplotlib releases
    plt.legend()
    plt.xlim(0, 7)
    plt.xlabel("values of samples/averages")
    plt.ylabel("relative frequency")
    plt.show()

Run the code cell below, set the parameters with the sliders, and press "Run Interact" to plot graphs that display the effect of the CLT.
<br>
mean: the true mean of the exponential distribution
<br>
N: the number of points averaged to produce each value (e.g. N=5 averages groups of 5 points each)

In [2]:
interact_manual(CLT, mean=(0.5, 3), N=(1, 50))

<function __main__.CLT>
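
The narrowing of the histogram can also be checked numerically, without plots. The following is a minimal sketch rather than part of the applet: `group_means` is a hypothetical helper introduced here, and it uses only the standard library (`random`, `statistics`) instead of the NumPy calls above, so it runs independently of the setup cell. It compares the observed spread of the averages with the value `mean / sqrt(N)` that the CLT setting suggests for an exponential distribution.

```python
import math
import random
import statistics


def group_means(mean, n, groups=400, seed=0):
    """Return `groups` averages, each taken over `n` exponential draws.

    Hypothetical helper for illustration; `seed` only makes runs repeatable.
    """
    rng = random.Random(seed)
    rate = 1.0 / mean  # expovariate expects the rate lambda = 1 / mean
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(groups)
    ]


if __name__ == "__main__":
    mu = 2.0
    print("N, mean of averages, std of averages, mu / sqrt(N)")
    for n in (1, 5, 50):
        avgs = group_means(mu, n, groups=4000)
        print(
            n,
            round(statistics.fmean(avgs), 3),
            round(statistics.pstdev(avgs), 3),
            round(mu / math.sqrt(n), 3),
        )
```

The last two columns should roughly agree for each N, up to sampling noise, while the mean of the averages stays near the chosen `mu`.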