## Codio Activity 2.3: The Law of Large Numbers

**Expected Time: 45 Minutes**

**Total Points: 10**

In this activity, you will draw samples of increasing size from a given distribution.  The sample means will be tracked and compared to the known mean of the underlying distribution.  Use the `scipy.stats` module and its distribution objects to produce these distributions and their samples.

## Index:

- [Problem 1](#Problem-1:-A-Uniform-Distribution)
- [Problem 2](#Problem-2:-Loop-of-Samples)
- [Problem 3](#Problem-3:-Comparing-the-sample-means-to-actual)
- [Problem 4](#Problem-4:-Distribution-of-Sample-Means)
- [Problem 5](#Problem-5:-Repeat-with-Gaussian-Distribution)


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import uniform, norm

[Back to top](#Index:) 

### Problem 1: A Uniform Distribution

**2 Points**


Create a uniform distribution with `loc = 5` and `scale = 10`.  Assign the resulting distribution object to `dist1` below. Done correctly, this code will produce uniformly distributed points between 5 and 15 (that is, between `loc` and `loc + scale`) with a mean of 10.
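
As a quick illustration of how `scipy.stats.uniform` interprets its parameters, the sketch below freezes a distribution with different, purely illustrative values (`loc = 0`, `scale = 2`; the name `demo_dist` is hypothetical and not part of the activity). In SciPy's parameterization the support is `[loc, loc + scale]`, so `scale` is the width of the interval rather than its upper bound.

```python
from scipy.stats import uniform

# Illustrative parameters only (not the ones the problem asks for).
demo_dist = uniform(loc=0, scale=2)

# The support runs from loc to loc + scale.
lower, upper = demo_dist.support()
print(lower, upper)

# The mean of a uniform distribution is the midpoint of its support.
print(demo_dist.mean())
```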

In [2]:
### GRADED

dist1 = ''

### BEGIN SOLUTION
dist1 = uniform(loc = 5, scale = 10)
### END SOLUTION

# Answer check
print(type(dist1))

<class 'scipy.stats._distn_infrastructure.rv_frozen'>


In [3]:
### BEGIN HIDDEN TESTS
dist1_ = uniform(loc = 5, scale = 10)
#
#
#
assert type(dist1_) == type(dist1)
assert dist1.mean() == dist1_.mean(), 'Make sure the mean is 10.'
### END HIDDEN TESTS

### Problem 2: Loop of Samples

**2 Points**

Now generate samples of sizes 1 through 500 using the `.rvs` method of `dist1`, passing `random_state = 22` on every call.  For each sample, append the sample mean to the list `sample_means`.
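
To see what `random_state` does, here is a minimal sketch with illustrative values (the name `demo_dist` and the sizes 3 and 5 are assumptions made for the example). Passing the same integer seed to `.rvs` reproduces the same draws, and the second check compares a smaller sample against the start of a larger one drawn with the same seed. If that check holds, the sample means in this activity behave like a running mean over one random stream rather than means of independent samples.

```python
import numpy as np
from scipy.stats import uniform

demo_dist = uniform(loc=0, scale=1)  # illustrative distribution

# Same seed, same size: identical draws.
first = demo_dist.rvs(5, random_state=22)
second = demo_dist.rvs(5, random_state=22)
print(np.array_equal(first, second))

# Same seed, smaller size: compare with the start of the larger sample.
small = demo_dist.rvs(3, random_state=22)
print(np.allclose(small, first[:3]))
```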

In [4]:
### GRADED

sample_means = []

#loop over values 1 - 500

    #generate samples 
    #remember the random_state
    
    #find sample mean
    
    #append mean to sample_means

### BEGIN SOLUTION
dist1 = uniform(loc = 5, scale = 10)
sample_means = []
for i in range(1, 501):
    sample = dist1.rvs(i, random_state = 22)
    sample_mean = np.mean(sample)
    sample_means.append(sample_mean)
### END SOLUTION

# Answer check
print(type(sample_means))
print(sample_means[:5], '\n', sample_means[-5:])

<class 'list'>
[7.084605373588426, 8.450707995611044, 8.702265448121944, 9.924654082394818, 9.2820463731392] 
 [9.917152492762177, 9.92702320427582, 9.923037839990696, 9.921343577252099, 9.926403777871194]


In [5]:
### BEGIN HIDDEN TESTS
dist1_ = uniform(loc = 5, scale = 10)
sample_means_ = []
for i in range(1, 501):
    sample_ = dist1_.rvs(i, random_state = 22)
    sample_mean_ = np.mean(sample_)
    sample_means_.append(sample_mean_)
#
#
#
assert type(dist1_) == type(dist1)
assert len(sample_means) == len(sample_means_), 'Make sure you have 500 sample means'
assert sample_means == sample_means_, 'Make sure you have determined the mean of the samples.'
### END HIDDEN TESTS

### Problem 3: Comparing the sample means to actual

**2 Points**

Note that the true mean of the distribution is 10.  The code below generates a plot comparing the true mean to the sample means.  Does the sample mean approximate the true mean with error less than 0.1 when the sample size is 400?  Assign your answer as a boolean value to `ans3` below: `True` for yes, `False` for no.

```python
plt.plot(range(1, 501), sample_means, label = 'sample mean', color = 'purple')
plt.axhline(10, label = 'true mean', color = 'green')
plt.xlabel('Sample Size')
plt.legend();
```
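
For a rough sense of how much error to expect at a given sample size, the standard error of the mean, $\sigma/\sqrt{n}$, can be computed from the distribution object. The sketch below assumes the uniform distribution from Problem 1 and rebuilds it under the hypothetical name `demo_dist`. It does not answer the question, which depends on the particular sample drawn.

```python
import numpy as np
from scipy.stats import uniform

demo_dist = uniform(loc=5, scale=10)  # same parameters as Problem 1

n = 400
# Standard deviation of a uniform distribution is scale / sqrt(12).
sigma = demo_dist.std()
standard_error = sigma / np.sqrt(n)
print(sigma, standard_error)
```

At n = 400 the standard error is roughly 0.14, so an error below 0.1 is plausible for a single sample but not guaranteed.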

In [6]:
### GRADED

ans3 = ''

### BEGIN SOLUTION
ans3 = abs(sample_means[399] - 10) < .1
### END SOLUTION

# Answer check
print(type(ans3))

<class 'numpy.bool_'>


In [7]:
### BEGIN HIDDEN TESTS
ans3_ = abs(sample_means[399] - 10) < .1
#
#
#
assert ans3_ == ans3, 'Check sample_means[399] against 10.'
### END HIDDEN TESTS

### Problem 4: Distribution of Sample Means

**2 Points**

As you can see in the plot above, there is larger variation among the means of small samples.  As the lectures suggest, consider only the samples of size 30 or more.  Assign the means of these samples as a list to `samples_30_or_more` below, then compute the mean and standard deviation of that list and assign them to `samples_mean` and `samples_std`.  Uncomment the code to see a histogram of these sample means.

HINT: Remember that Python starts counting at 0, so `sample_means[0]` is the mean of the sample of size 1. Slicing with `sample_means[30:]` would start at sample size 31, which is incorrect.
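
The off-by-one in the hint can be checked with a small stand-in list. In the sketch below, `sizes` is a hypothetical list holding the sample sizes 1 through 500 in the same order as `sample_means`, so each element shows which sample size sits at that position.

```python
# Stand-in for sample_means: element k holds the sample size it came from.
sizes = list(range(1, 501))

print(sizes[30:][0])    # prints 31: first sample size kept by [30:]
print(sizes[29:][0])    # prints 30: first sample size kept by [29:]
print(len(sizes[29:]))  # prints 471: number of samples of size 30 or more
```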

In [8]:
### GRADED

samples_30_or_more = ''
samples_mean = ''
samples_std = ''

### BEGIN SOLUTION
samples_30_or_more = sample_means[29:]
samples_mean = np.mean(samples_30_or_more)
samples_std = np.std(samples_30_or_more)
### END SOLUTION


#plt.hist(samples_30_or_more, edgecolor = 'black', alpha = 0.3)
#plt.title('Distribution of Sample Means');

# Answer check
print(type(samples_30_or_more))
print(samples_mean)
print(samples_std)

<class 'list'>
9.954879356818662
0.07586422746453239


In [9]:
### BEGIN HIDDEN TESTS
samples_30_or_more_ = sample_means_[29:]
samples_mean_ = np.mean(samples_30_or_more_)
samples_std_ = np.std(samples_30_or_more_)
#
#
#
assert samples_30_or_more == samples_30_or_more_, 'Samples of 30 or more.'
assert samples_mean == samples_mean_
assert samples_std_ == samples_std
### END HIDDEN TESTS

### Problem 5: Repeat with Gaussian Distribution

**2 Points**

Now repeat the above exercise using samples from a Normal distribution centered at 5 with standard deviation 10.  Draw samples of sizes 30 through 500 and examine the mean and standard deviation of their sample means.  Assign the distribution object to `gauss_dist`, the sample means to the list `sample_means_gauss`, the mean of those sample means to `gauss_mean`, and their standard deviation to `gauss_standard_deviation` below. Use `random_state = 22`.
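
In `scipy.stats.norm`, `loc` is the mean and `scale` is the standard deviation (not the variance). A minimal sketch with illustrative values (the name `demo_norm` and the choices `loc = 0`, `scale = 2` are assumptions for the example) checks this and prints the theoretical standard error, $\sigma/\sqrt{n}$, at the smallest and largest sample sizes used in this problem.

```python
import numpy as np
from scipy.stats import norm

demo_norm = norm(loc=0, scale=2)  # illustrative values only

# mean equals loc, std equals scale, var equals scale squared
print(demo_norm.mean(), demo_norm.std(), demo_norm.var())

# Theoretical standard error of the mean, sigma / sqrt(n).
for n in (30, 500):
    print(n, demo_norm.std() / np.sqrt(n))
```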

In [10]:
### GRADED

gauss_dist = ''
sample_means_gauss = []

#create the distribution object

#loop over values 30 - 500

    #generate samples 
    #remember the random_state
    
    #find sample mean
    
    #append mean to sample_means_gauss

gauss_mean = ''
gauss_standard_deviation = ''


### BEGIN SOLUTION
gauss_dist = norm(loc = 5, scale = 10)
sample_means_gauss = []
for i in range(30, 501):
    sample = gauss_dist.rvs(i, random_state = 22)
    sample_mean = np.mean(sample)
    sample_means_gauss.append(sample_mean)
    
gauss_mean = np.mean(sample_means_gauss)
gauss_standard_deviation = np.std(sample_means_gauss)
### END SOLUTION

# Answer check
print(type(sample_means_gauss))
print(sample_means_gauss[:5], '\n', sample_means_gauss[-5:])
print(gauss_mean)
print(gauss_standard_deviation)

<class 'list'>
[5.711989769815023, 6.495208371007466, 5.670348611610735, 5.578864596890512, 5.5331440323908465] 
 [5.85212242634483, 5.842782955969858, 5.840843359217326, 5.834153785572032, 5.841178380882507]
5.568508473194311
0.5371819419572758


In [11]:
### BEGIN HIDDEN TESTS
gauss_dist_ = norm(loc = 5, scale = 10)
sample_means_gauss_ = []
for i in range(30, 501):
    sample_ = gauss_dist_.rvs(i, random_state = 22)
    sample_mean_ = np.mean(sample_)
    sample_means_gauss_.append(sample_mean_)
    
gauss_mean_ = np.mean(sample_means_gauss_)
gauss_standard_deviation_ = np.std(sample_means_gauss_)
#
#
#
assert gauss_mean_ == gauss_mean, 'Check that you have computed the sample means'
assert gauss_standard_deviation == gauss_standard_deviation_, 'Standard Deviations do not match'
assert type(gauss_dist) == type(gauss_dist_)
### END HIDDEN TESTS