# Central Limit Theorem

The Central Limit Theorem (CLT) lets us describe the sampling distribution of the mean precisely.

The CLT states that the sampling distribution of the mean is approximately normal as long as the population is not too skewed or the sample size is large enough. A sample size of n > 30 is a common rule of thumb for most population distributions, although a heavily skewed population can need larger samples. If the population itself is normally distributed, the sample size can be smaller than that.
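To see the rule of thumb at work, here is a minimal sketch that draws samples from a strongly right-skewed population and measures how skewed the resulting sample means are. The exponential population, the seed, the sample sizes, and the name `skew_by_n` are assumptions chosen for illustration; they are not part of the workspace.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, assumed for reproducibility

skew_by_n = {}
for n in (2, 10, 50):
    # Draw 2000 samples of size n from a hypothetical right-skewed
    # (exponential, mean 10) population
    samples = rng.exponential(scale=10, size=(2000, n))
    # One sample mean per row
    means = samples.mean(axis=1)
    # Skewness near 0 indicates a roughly symmetric distribution
    skew_by_n[n] = stats.skew(means)

for n, s in skew_by_n.items():
    print(f"n = {n:>2}: skewness of sample means = {s:.2f}")
```

The skewness of the sample means should shrink toward 0 as n grows, which is the sense in which a larger sample size compensates for a skewed population.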

The CLT not only establishes that the sampling distribution will be normally distributed, but it also allows us to describe that normal distribution quantitatively. Normal distributions are described by their mean μ (mu) and standard deviation σ (sigma).
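As a small illustration of what describing a normal distribution quantitatively means: once its mean and standard deviation are known, any probability can be computed from them. The sketch below uses `scipy.stats.norm` with illustrative values of μ = 10 and σ = 10 (an assumption; any values would do).

```python
from scipy import stats

mu, sigma = 10, 10  # illustrative values, assumed for this example

# A normal distribution is fully specified by its mean and standard deviation
dist = stats.norm(loc=mu, scale=sigma)

# Probability of falling within 1 and 2 standard deviations of the mean
p_within_1 = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)
p_within_2 = dist.cdf(mu + 2 * sigma) - dist.cdf(mu - 2 * sigma)

print(f"P(within 1 sd) = {p_within_1:.3f}")
print(f"P(within 2 sd) = {p_within_2:.3f}")
```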

Let’s break this up:

- We take samples of size n from a population (that has a true population mean μ and standard deviation σ) and calculate the sample mean x̄ of each one.

- Given that n is sufficiently large (n > 30), the sampling distribution of the means will be approximately normally distributed with:
  - a mean approximately equal to the population mean μ
  - a standard deviation equal to the population standard deviation divided by the square root of the sample size. We can write this out as:
  `Sampling Distribution St. Dev. = σ / √n`
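The two claims above can be checked numerically. The following is a rough sketch, assuming the same population parameters as the workspace (mean 10, standard deviation 10, sample size 50); the seed, the number of samples, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, assumed for reproducibility

population_mean, population_std_dev, samp_size = 10, 10, 50

# Standard deviation of the sampling distribution predicted by the CLT:
# population standard deviation divided by the square root of the sample size
predicted_se = population_std_dev / np.sqrt(samp_size)

# Draw 5000 samples of size 50 and compute the mean of each one
samples = rng.normal(population_mean, population_std_dev, size=(5000, samp_size))
sample_means = samples.mean(axis=1)

print(f"Predicted st. dev.:   {predicted_se:.3f}")
print(f"Observed st. dev.:    {sample_means.std():.3f}")
print(f"Mean of sample means: {sample_means.mean():.3f}")
```

The observed standard deviation of the sample means should land close to the predicted value, and their mean should land close to the population mean of 10.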


We’ll focus on the first point in this exercise and the second point in the next exercise.

#### In the workspace, we’ve set up a simulation of a population that has a mean of 10 and a standard deviation of 10. We’ve set a sample size of 50. According to the CLT, we should have a sampling distribution of the mean that is normally distributed and has a mean that is close to the population mean.

Run the code once. Does what you see align with the CLT?

In [2]:
! pip install seaborn



In [5]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import seaborn as sns

# Set the population mean & standard deviation:
population_mean = 10
population_std_dev = 10
# Set the sample size:
samp_size = 50

# Create the population
population = np.random.normal(population_mean, population_std_dev, size = 100000)

# Simulate the samples and calculate the sampling distribution
sample_means = []
for i in range(500):
    samp = np.random.choice(population, samp_size, replace = False)
    sample_means.append(np.mean(samp))

mean_sampling_distribution = round(np.mean(sample_means),3)

# Plot the original population
sns.histplot(population, stat = 'density')
plt.title(f"Population Mean: {population_mean} ")
plt.xlabel("")
plt.show()
plt.clf()

## Plot the sampling distribution
sns.histplot(sample_means, stat='density')
# calculate the mean and SE for the probability distribution
mu = np.mean(population)
sigma = np.std(population)/(samp_size**.5)
# plot the normal distribution with mu=popmean, sd=sd(pop)/sqrt(samp_size) on top
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)

plt.plot(x, stats.norm.pdf(x, mu, sigma), color='k', label = 'normal PDF')
plt.legend()  # display the 'normal PDF' label set above
plt.title(f"Sampling Dist Mean: {mean_sampling_distribution}")
plt.xlabel("")
plt.show()
