# Tutorial 11 - Introduction to Statistical Inference

### Lecture and Tutorial Learning Goals:

After completing this week's lecture and tutorial work, you will be able to:
- Describe real-world examples of questions that can be answered with statistical inference methods.
- Name common population parameters (e.g., mean, proportion, median, variance, standard deviation) that are often estimated using sample data, and use computation to estimate these.
- Define the following statistical sampling terms (population, sample, population parameter, point estimate, sampling distribution).
- Explain the difference between a population parameter and sample point estimate.
- Use computation to draw random samples from a finite population.
- Use computation to create a sampling distribution from a finite population.
- Describe how sample size influences the sampling distribution.

In [None]:
### Run this cell before continuing.
import altair as alt
import numpy as np
import pandas as pd

alt.data_transformers.disable_max_rows()

### Virtual sampling simulation

In this tutorial you will study samples and sample means generated from different distributions. In real life, we rarely, if ever, have measurements for our entire population. Here, however, we will make simulated datasets so we can understand the behaviour of sample means.
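To make the distinction between a population parameter and a point estimate concrete, here is a minimal sketch on a made-up population. The names `toy_pop`, `toy_sample`, and the column `value` are illustrative assumptions and are not part of this tutorial's data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy population (not the tutorial's data): 1,000 simulated values.
rng = np.random.default_rng(1)
toy_pop = pd.DataFrame({"value": rng.normal(loc=50, scale=10, size=1000)})

# Population parameter: computed from every member of the population.
pop_mean = toy_pop["value"].mean()

# Point estimate: computed from one random sample of 20 observations
# (pandas samples without replacement by default).
toy_sample = toy_pop.sample(n=20, random_state=1)
point_estimate = toy_sample["value"].mean()

print(f"population mean: {pop_mean:.2f}")
print(f"sample mean:     {point_estimate:.2f}")
```

The two printed numbers will usually be close but not identical, and a different random sample would give a different estimate.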

Suppose we had the data science final grades for a large population of students. 

In [None]:
# run this cell to simulate a finite population
np.random.seed(20201)  # DO NOT CHANGE
students_pop = pd.DataFrame({"grade": (np.random.normal(size=10000, loc=70, scale=8))})
students_pop.head()

**Question 1.0** 
<br> {points: 1}

Visualize the distribution of the population (`students_pop`) that was just created by plotting a histogram with a bin width of 1. In Altair, the bin width is set with `step=1` inside `alt.Bin` in the x encoding rather than in `mark_bar`. Name the plot `pop_dist` and give the x axis a descriptive label.
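For intuition about what a bin width of 1 means, the sketch below counts values in unit-wide bins by hand with NumPy. It is only an illustration on made-up numbers (`toy_grades` is a hypothetical array); Altair computes its own bins when it draws the histogram.

```python
import numpy as np

# Hypothetical toy grades (not the tutorial's data).
toy_grades = np.array([61.2, 61.9, 62.5, 64.0, 64.7])

# Bin edges spaced 1 unit apart, covering the full range of the data.
edges = np.arange(np.floor(toy_grades.min()), np.ceil(toy_grades.max()) + 1, 1.0)

# Count how many values fall into each bin of width 1.
counts, _ = np.histogram(toy_grades, bins=edges)

print(edges)   # bin edges 61 through 65
print(counts)  # one count per bin
```

Each bar in a histogram with a bin width of 1 corresponds to one of these counts.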

In [None]:
# ___ = (
#     alt.Chart(___, title="Population distribution")
#     .mark___()
#     .encode(x=___, y=___)
# )

# your code here
raise NotImplementedError
pop_dist

In [None]:
from hashlib import sha1
assert sha1(str(type(pop_dist.encoding.x.field)).encode("utf-8")+b"57c4484ffec133ac").hexdigest() == "beef8c15c2757c2aad64b00358ded3284c3470bc", "type of pop_dist.encoding.x.field is not str. pop_dist.encoding.x.field should be an str"
assert sha1(str(len(pop_dist.encoding.x.field)).encode("utf-8")+b"57c4484ffec133ac").hexdigest() == "3bcef8126d57603d83f1596eaa8c792cf7522b26", "length of pop_dist.encoding.x.field is not correct"
assert sha1(str(pop_dist.encoding.x.field.lower()).encode("utf-8")+b"57c4484ffec133ac").hexdigest() == "09a18fdcb525cd71143982f7e7e97d0e540c72ef", "value of pop_dist.encoding.x.field is not correct"
assert sha1(str(pop_dist.encoding.x.field).encode("utf-8")+b"57c4484ffec133ac").hexdigest() == "09a18fdcb525cd71143982f7e7e97d0e540c72ef", "correct string value of pop_dist.encoding.x.field but incorrect case of letters"

assert sha1(str(type(pop_dist.mark)).encode("utf-8")+b"572bef2d3f36d2ce").hexdigest() == "d8a356d2f19beed8384d10b8b48e2e9af81ae23b", "type of pop_dist.mark is not str. pop_dist.mark should be an str"
assert sha1(str(len(pop_dist.mark)).encode("utf-8")+b"572bef2d3f36d2ce").hexdigest() == "a768691ade5d574536bc0f7a48b63d0b02d6a0f6", "length of pop_dist.mark is not correct"
assert sha1(str(pop_dist.mark.lower()).encode("utf-8")+b"572bef2d3f36d2ce").hexdigest() == "b0b46a701554007f3c475c3dbb7b184529d372f8", "value of pop_dist.mark is not correct"
assert sha1(str(pop_dist.mark).encode("utf-8")+b"572bef2d3f36d2ce").hexdigest() == "b0b46a701554007f3c475c3dbb7b184529d372f8", "correct string value of pop_dist.mark but incorrect case of letters"

assert sha1(str(type(pop_dist.data.shape[0])).encode("utf-8")+b"aafe2e9d22271fae").hexdigest() == "d6dd408d52f429b0f976dd970ba857ea3204ad47", "type of pop_dist.data.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(pop_dist.data.shape[0]).encode("utf-8")+b"aafe2e9d22271fae").hexdigest() == "839547aeadead6bb14a3eace74c3d2594656f679", "value of pop_dist.data.shape[0] is not correct"

assert sha1(str(type(round(sum(pop_dist.data.grade), 2))).encode("utf-8")+b"03275eca1e690966").hexdigest() == "0c12e6f657c935fe0424996792b1aca438c361fc", "type of round(sum(pop_dist.data.grade), 2) is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(round(sum(pop_dist.data.grade), 2), 2)).encode("utf-8")+b"03275eca1e690966").hexdigest() == "584ee92b5a58334b904bf59a1dc05d308d7d096b", "value of round(sum(pop_dist.data.grade), 2) is not correct (rounded to 2 decimal places)"

print('Success!')

**Question 1.1** 
<br> {points: 3}

Describe the distribution above in words. Comment on its shape, center, and spread.

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 1.2** 
<br> {points: 1}

Use `describe` to calculate the following population parameters from the `students_pop` population:
- mean (`mean`)
- median (`50%`)
- standard deviation (`std`)

Name this data frame `pop_parameters` and rename the columns to `pop_mean`, `pop_med` and `pop_sd`.
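As a rough illustration of how the output of `describe` can be reshaped, here is a minimal sketch on a made-up data frame. The names `toy`, `height`, `toy_mean`, `toy_med`, and `toy_sd` are hypothetical, and this is one possible approach rather than the required one.

```python
import pandas as pd

# Hypothetical toy data (not the tutorial's population).
toy = pd.DataFrame({"height": [1.0, 2.0, 3.0, 4.0, 10.0]})

toy_summary = (
    toy.describe()                  # rows: count, mean, std, min, 25%, 50%, 75%, max
    .loc[["mean", "50%", "std"]]    # keep only the three statistics of interest
    .T                              # one row per variable, one column per statistic
    .rename(columns={"mean": "toy_mean", "50%": "toy_med", "std": "toy_sd"})
)
print(toy_summary)
```

Note that `describe` reports `std` with pandas' default `ddof=1` (dividing by n - 1). For a population of 10,000 values this is very close to, but not exactly, the standard deviation computed by dividing by n.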

In [None]:
# your code here
raise NotImplementedError
pop_parameters

In [None]:
from hashlib import sha1
assert sha1(str(type(pop_parameters.shape[0])).encode("utf-8")+b"900b74ef134fc9b3").hexdigest() == "da9d338850733213015e7a3c5a162005a8523dc3", "type of pop_parameters.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(pop_parameters.shape[0]).encode("utf-8")+b"900b74ef134fc9b3").hexdigest() == "5debbeed4819251a3db9931136be10f48b37227e", "value of pop_parameters.shape[0] is not correct"

assert sha1(str(type(pop_parameters.shape[1])).encode("utf-8")+b"38fedc1cb8fd757b").hexdigest() == "8e12576b9cbd6c33a2cb7d0d38339d6fe0002411", "type of pop_parameters.shape[1] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(pop_parameters.shape[1]).encode("utf-8")+b"38fedc1cb8fd757b").hexdigest() == "43a5bdd184845d5c0a86777ac5c461f84d01f1df", "value of pop_parameters.shape[1] is not correct"

assert sha1(str(type(pop_parameters.pop_mean)).encode("utf-8")+b"de99ed8e81600650").hexdigest() == "a5a6e32dc198df9dc28be2551fd0efc786ed143a", "type of pop_parameters.pop_mean is not correct"
assert sha1(str(pop_parameters.pop_mean).encode("utf-8")+b"de99ed8e81600650").hexdigest() == "ec8301f60c64f61f997e629fdbd1d54988ed6a4f", "value of pop_parameters.pop_mean is not correct"

assert sha1(str(type(pop_parameters.pop_sd)).encode("utf-8")+b"06ad16ccef301650").hexdigest() == "d7a165875d264be01f6441e47ad23f00da8364ee", "type of pop_parameters.pop_sd is not correct"
assert sha1(str(pop_parameters.pop_sd).encode("utf-8")+b"06ad16ccef301650").hexdigest() == "1b6eb8fa065b931f87ef0243adb585b63f7ecffb", "value of pop_parameters.pop_sd is not correct"

print('Success!')

### Exploring the sampling distribution of the sample mean for different populations
We will create the sampling distribution of the sample mean by taking 1500 random samples of size 5 from this population and visualizing the distribution of the sample means.


**Question 1.3** 
<br> {points: 1}

Draw 1500 random samples from our population of students (`students_pop`). Each sample should have 5 observations. Name the data frame `samples`.

In [None]:
np.random.seed(4321)  # DO NOT CHANGE
# samples = []
# for rep in range(___):
#     sample = students_pop.sample(___)
#     sample = sample.assign(replicate=rep)
#     samples.append(___)
# samples = pd.concat([___[i] for i in range(len(___))])

# your code here
raise NotImplementedError
print(samples.head())
print(samples.tail())
print(samples.shape)

In [None]:
from hashlib import sha1
assert sha1(str(type(samples.shape[0])).encode("utf-8")+b"5d4c73c9bd42ecdc").hexdigest() == "b491807382dd97feb46cafe29d675e1a62936a52", "type of samples.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(samples.shape[0]).encode("utf-8")+b"5d4c73c9bd42ecdc").hexdigest() == "18b2fabb9918932a9b7640c534bcdf8215df6e6f", "value of samples.shape[0] is not correct"

assert sha1(str(type(samples.shape[1])).encode("utf-8")+b"77e71465c5aa88cd").hexdigest() == "bcd3f866eb4bfb54351ce35d9c2bbff82d3465fc", "type of samples.shape[1] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(samples.shape[1]).encode("utf-8")+b"77e71465c5aa88cd").hexdigest() == "179b09bb3173598a1e6fd90c000395590ddc7ac2", "value of samples.shape[1] is not correct"

assert sha1(str(type("".join(samples.columns))).encode("utf-8")+b"e7f7b3e09f3e46be").hexdigest() == "50e07f0a72c1748e3149e21c598c023aaf15191f", "type of \"\".join(samples.columns) is not str. \"\".join(samples.columns) should be an str"
assert sha1(str(len("".join(samples.columns))).encode("utf-8")+b"e7f7b3e09f3e46be").hexdigest() == "a58534ef6f64af8ad09ba5efa9533e3d44c831ac", "length of \"\".join(samples.columns) is not correct"
assert sha1(str("".join(samples.columns).lower()).encode("utf-8")+b"e7f7b3e09f3e46be").hexdigest() == "98e9fe18648562688b305b0eee6d91284a66b909", "value of \"\".join(samples.columns) is not correct"
assert sha1(str("".join(samples.columns)).encode("utf-8")+b"e7f7b3e09f3e46be").hexdigest() == "98e9fe18648562688b305b0eee6d91284a66b909", "correct string value of \"\".join(samples.columns) but incorrect case of letters"

assert sha1(str(type(sum(samples.replicate.unique()))).encode("utf-8")+b"5df01b90ac3cd985").hexdigest() == "4ec6166ec0cb5626be34f1fada266aef6e7fd947", "type of sum(samples.replicate.unique()) is not correct"
assert sha1(str(sum(samples.replicate.unique())).encode("utf-8")+b"5df01b90ac3cd985").hexdigest() == "ee04cd2e68c4f249d79a61be1c682c7454938d09", "value of sum(samples.replicate.unique()) is not correct"

print('Success!')

**Question 1.4** 
<br> {points: 1}

Group by the sample replicate number, and then for each sample, calculate the mean. Name the data frame `sample_estimates`. The data frame should have the column names `replicate` and `sample_mean`.
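The grouping step can be sketched on a tiny made-up data frame as follows. The names `toy_samples`, `toy_estimates`, and the column `value` are hypothetical; only the column names `replicate` and `sample_mean` come from the question.

```python
import pandas as pd

# Hypothetical stacked samples (not the tutorial's data): two samples of two rows each.
toy_samples = pd.DataFrame(
    {"replicate": [0, 0, 1, 1], "value": [1.0, 3.0, 10.0, 20.0]}
)

toy_estimates = (
    toy_samples.groupby("replicate")["value"]
    .mean()                                    # one mean per replicate
    .reset_index()                             # turn the group label back into a column
    .rename(columns={"value": "sample_mean"})  # give the estimate a descriptive name
)
print(toy_estimates)
```

Each row of the result is one point estimate, so the data frame has as many rows as there are replicates.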

In [None]:
# your code here
raise NotImplementedError
print(sample_estimates.head())
print(sample_estimates.tail())

In [None]:
from hashlib import sha1
assert sha1(str(type(sample_estimates.shape[0])).encode("utf-8")+b"31b3565242e372ca").hexdigest() == "5b45b96b4e5cdcb81820854fd3e56e8eb6860cd3", "type of sample_estimates.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sample_estimates.shape[0]).encode("utf-8")+b"31b3565242e372ca").hexdigest() == "e32b82187b40e8117b12f5cb335acf17d275635a", "value of sample_estimates.shape[0] is not correct"

assert sha1(str(type(sample_estimates.shape[1])).encode("utf-8")+b"746168d76930a427").hexdigest() == "12ebcd562654f00ebca192b3e3ff3f0d76d6d011", "type of sample_estimates.shape[1] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sample_estimates.shape[1]).encode("utf-8")+b"746168d76930a427").hexdigest() == "db0c694f5c80b3fc3af41f7ccef3787431074477", "value of sample_estimates.shape[1] is not correct"

assert sha1(str(type("".join(sample_estimates.columns))).encode("utf-8")+b"eb9c669af6147550").hexdigest() == "6eabe1ca3fde5c7706632db4f01482bfc81b4924", "type of \"\".join(sample_estimates.columns) is not str. \"\".join(sample_estimates.columns) should be an str"
assert sha1(str(len("".join(sample_estimates.columns))).encode("utf-8")+b"eb9c669af6147550").hexdigest() == "00c2232bdf1a68d77e174cf103ef877de1818387", "length of \"\".join(sample_estimates.columns) is not correct"
assert sha1(str("".join(sample_estimates.columns).lower()).encode("utf-8")+b"eb9c669af6147550").hexdigest() == "d1e0970160b78aa1a194dd144f00aa96eb950d75", "value of \"\".join(sample_estimates.columns) is not correct"
assert sha1(str("".join(sample_estimates.columns)).encode("utf-8")+b"eb9c669af6147550").hexdigest() == "d1e0970160b78aa1a194dd144f00aa96eb950d75", "correct string value of \"\".join(sample_estimates.columns) but incorrect case of letters"

print('Success!')

**Question 1.5** 
<br> {points: 1}

Visualize the distribution of the sample estimates (`sample_estimates`) you just calculated by plotting a histogram with a bin width of 1 (set `step=1` inside `alt.Bin` in the x encoding). Name the plot `sampling_distribution_5`, and give the plot a descriptive title and the x axis a descriptive label.

In [None]:
# your code here
raise NotImplementedError
sampling_distribution_5

In [None]:
from hashlib import sha1
assert sha1(str(type(sampling_distribution_5.encoding.x.field)).encode("utf-8")+b"34c43fc6a9428569").hexdigest() == "a1acdd3c9ddc7507cc3360ac27740742dc8ee7e7", "type of sampling_distribution_5.encoding.x.field is not str. sampling_distribution_5.encoding.x.field should be an str"
assert sha1(str(len(sampling_distribution_5.encoding.x.field)).encode("utf-8")+b"34c43fc6a9428569").hexdigest() == "7052e6895ff9410d50afccd88f635a8a6d7ec674", "length of sampling_distribution_5.encoding.x.field is not correct"
assert sha1(str(sampling_distribution_5.encoding.x.field.lower()).encode("utf-8")+b"34c43fc6a9428569").hexdigest() == "83692ed0c80392e4c674967e0307a83bdf7de8cf", "value of sampling_distribution_5.encoding.x.field is not correct"
assert sha1(str(sampling_distribution_5.encoding.x.field).encode("utf-8")+b"34c43fc6a9428569").hexdigest() == "83692ed0c80392e4c674967e0307a83bdf7de8cf", "correct string value of sampling_distribution_5.encoding.x.field but incorrect case of letters"

assert sha1(str(type(sampling_distribution_5.mark)).encode("utf-8")+b"b99f07b879f18706").hexdigest() == "4f0b0c67718a9735447addbb97545d488ae1e467", "type of sampling_distribution_5.mark is not str. sampling_distribution_5.mark should be an str"
assert sha1(str(len(sampling_distribution_5.mark)).encode("utf-8")+b"b99f07b879f18706").hexdigest() == "2c453d241613a2bea16baf04797b46ed8c00a57f", "length of sampling_distribution_5.mark is not correct"
assert sha1(str(sampling_distribution_5.mark.lower()).encode("utf-8")+b"b99f07b879f18706").hexdigest() == "eee88699156021b082403fdf5fa3fa3e83ea3e43", "value of sampling_distribution_5.mark is not correct"
assert sha1(str(sampling_distribution_5.mark).encode("utf-8")+b"b99f07b879f18706").hexdigest() == "eee88699156021b082403fdf5fa3fa3e83ea3e43", "correct string value of sampling_distribution_5.mark but incorrect case of letters"

assert sha1(str(type(sampling_distribution_5.data.shape[0])).encode("utf-8")+b"d78c7a9cb8fc1200").hexdigest() == "417ed684e7884864ac35461fbc05fc2872d2d6d4", "type of sampling_distribution_5.data.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sampling_distribution_5.data.shape[0]).encode("utf-8")+b"d78c7a9cb8fc1200").hexdigest() == "303653266a2d2a9697c6d402a73d462f063cfb47", "value of sampling_distribution_5.data.shape[0] is not correct"

assert sha1(str(type(round(sum(sampling_distribution_5.data.sum()), 2))).encode("utf-8")+b"897f9565406de362").hexdigest() == "8f278a4d9c420cb5f356a4000e19003ec463ce29", "type of round(sum(sampling_distribution_5.data.sum()), 2) is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(round(sum(sampling_distribution_5.data.sum()), 2), 2)).encode("utf-8")+b"897f9565406de362").hexdigest() == "81cb9189988785b12ffa811a5de5c7bbc6592de7", "value of round(sum(sampling_distribution_5.data.sum()), 2) is not correct (rounded to 2 decimal places)"

assert sha1(str(type(sampling_distribution_5.encoding.x.field != sampling_distribution_5.encoding.x.title)).encode("utf-8")+b"866a41451ad522d3").hexdigest() == "9e839471173d4c23af36c19fbe928fff6b3a8591", "type of sampling_distribution_5.encoding.x.field != sampling_distribution_5.encoding.x.title is not bool. sampling_distribution_5.encoding.x.field != sampling_distribution_5.encoding.x.title should be a bool"
assert sha1(str(sampling_distribution_5.encoding.x.field != sampling_distribution_5.encoding.x.title).encode("utf-8")+b"866a41451ad522d3").hexdigest() == "184734ad64070c4cca66fb94c855ac14ee75d02c", "boolean value of sampling_distribution_5.encoding.x.field != sampling_distribution_5.encoding.x.title is not correct"

assert sha1(str(type(sampling_distribution_5.title != None)).encode("utf-8")+b"352ea5fb36701bea").hexdigest() == "f39fb7b7da45a01b84e24d8b803a67fcbf8da923", "type of sampling_distribution_5.title != None is not bool. sampling_distribution_5.title != None should be a bool"
assert sha1(str(sampling_distribution_5.title != None).encode("utf-8")+b"352ea5fb36701bea").hexdigest() == "334f47d663ec1476a7a0cc231dadd8cb006f740d", "boolean value of sampling_distribution_5.title != None is not correct"

print('Success!')

**Question 1.6** 
<br> {points: 3}

Describe the distribution above in words. Comment on its shape, center, and spread. Compare this sampling distribution to the population distribution of students' grades above.

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 1.7**
<br> {points: 1}

Let's create a simulated dataset of the number of cups of coffee drunk per week for our population of students.
Run the cell below, then describe the distribution in words. Comment on its shape, center, and spread.

In [None]:
# run this cell to simulate a finite population
coffee_data = pd.DataFrame({"cups": np.random.exponential(size=2000, scale=1 / 0.34)})

pop_dist = (
    alt.Chart(coffee_data, title="Population distribution")
    .mark_bar()
    .encode(
        x=alt.X("cups:Q", title="Cups of coffee per week", bin=alt.Bin(maxbins=50)),
        y="count()",
    )
    .properties(width=400, height=400)
    .configure_title(fontSize=20)
    .configure_axis(labelFontSize=18, titleFontSize=15)
)
pop_dist

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 1.8**
<br> {points: 1}

Repeat the steps in questions 1.3 to 1.5 with a sample size of 5 for this population, reusing the names from those questions (such as `samples`). Store the mean number of cups for each sample in a column called `sample_mean`. You should end up with a plot of the sampling distribution called `sampling_distribution_5`. Set `maxbins` for this plot to 10.

In [None]:
np.random.seed(4321)  # DO NOT CHANGE!

# your code here
raise NotImplementedError
sampling_distribution_5

In [None]:
from hashlib import sha1
assert sha1(str(type(sampling_distribution_5.encoding.x.field)).encode("utf-8")+b"70ee81992454fb62").hexdigest() == "c6da408805eb3deee1589a66b093d4b701ec9a6e", "type of sampling_distribution_5.encoding.x.field is not str. sampling_distribution_5.encoding.x.field should be an str"
assert sha1(str(len(sampling_distribution_5.encoding.x.field)).encode("utf-8")+b"70ee81992454fb62").hexdigest() == "beea607e35da98d88a2610d0ae5ab7424e653394", "length of sampling_distribution_5.encoding.x.field is not correct"
assert sha1(str(sampling_distribution_5.encoding.x.field.lower()).encode("utf-8")+b"70ee81992454fb62").hexdigest() == "056612c52c7bd627aa4967c5a718e5ebc32d5f06", "value of sampling_distribution_5.encoding.x.field is not correct"
assert sha1(str(sampling_distribution_5.encoding.x.field).encode("utf-8")+b"70ee81992454fb62").hexdigest() == "056612c52c7bd627aa4967c5a718e5ebc32d5f06", "correct string value of sampling_distribution_5.encoding.x.field but incorrect case of letters"

assert sha1(str(type(sampling_distribution_5.mark)).encode("utf-8")+b"e793de420f0f1dac").hexdigest() == "79d4f5f8e1f8a02735f2a15a4062b386ccc66d84", "type of sampling_distribution_5.mark is not str. sampling_distribution_5.mark should be an str"
assert sha1(str(len(sampling_distribution_5.mark)).encode("utf-8")+b"e793de420f0f1dac").hexdigest() == "e7b0ee95e237c8a755fa30630fe4625840b58d49", "length of sampling_distribution_5.mark is not correct"
assert sha1(str(sampling_distribution_5.mark.lower()).encode("utf-8")+b"e793de420f0f1dac").hexdigest() == "313dd83a422fa1dcc02635db07eb0559ba0d9cc5", "value of sampling_distribution_5.mark is not correct"
assert sha1(str(sampling_distribution_5.mark).encode("utf-8")+b"e793de420f0f1dac").hexdigest() == "313dd83a422fa1dcc02635db07eb0559ba0d9cc5", "correct string value of sampling_distribution_5.mark but incorrect case of letters"

assert sha1(str(type(sampling_distribution_5.data.shape[0])).encode("utf-8")+b"71e1551b63ca315c").hexdigest() == "793f89bb41d26d60b0184da879305d4d5cde8302", "type of sampling_distribution_5.data.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sampling_distribution_5.data.shape[0]).encode("utf-8")+b"71e1551b63ca315c").hexdigest() == "0e88bd78bf7b3c7ca6bb3ec88928121766908ba5", "value of sampling_distribution_5.data.shape[0] is not correct"

assert sha1(str(type(round(sum(sampling_distribution_5.data.sum()), 2))).encode("utf-8")+b"87fe814526c522bc").hexdigest() == "c2310b5aa5256a9f2b1fd1a00074b95daa999d91", "type of round(sum(sampling_distribution_5.data.sum()), 2) is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(round(sum(sampling_distribution_5.data.sum()), 2), 2)).encode("utf-8")+b"87fe814526c522bc").hexdigest() == "76abe295c11b22d4ec3c2f8a36339a3ba5010084", "value of round(sum(sampling_distribution_5.data.sum()), 2) is not correct (rounded to 2 decimal places)"

assert sha1(str(type(sampling_distribution_5.encoding.x.field != sampling_distribution_5.encoding.x.title)).encode("utf-8")+b"900704accdd7b1d6").hexdigest() == "ad927e34d083f39a967f6e41cf30e915c655bd8e", "type of sampling_distribution_5.encoding.x.field != sampling_distribution_5.encoding.x.title is not bool. sampling_distribution_5.encoding.x.field != sampling_distribution_5.encoding.x.title should be a bool"
assert sha1(str(sampling_distribution_5.encoding.x.field != sampling_distribution_5.encoding.x.title).encode("utf-8")+b"900704accdd7b1d6").hexdigest() == "fbc2526027d98cfce88c10080d59485c206a9c10", "boolean value of sampling_distribution_5.encoding.x.field != sampling_distribution_5.encoding.x.title is not correct"

assert sha1(str(type(sampling_distribution_5.title is not None)).encode("utf-8")+b"ed717c793339bf97").hexdigest() == "0bc3afe9ed1f2348fddca364e8cf0c01e27c06b0", "type of sampling_distribution_5.title is not None is not bool. sampling_distribution_5.title is not None should be a bool"
assert sha1(str(sampling_distribution_5.title is not None).encode("utf-8")+b"ed717c793339bf97").hexdigest() == "6177541dc56a234a9afb3dc32032907223b0ea35", "boolean value of sampling_distribution_5.title is not None is not correct"

assert sha1(str(type(sum(samples.replicate.unique()))).encode("utf-8")+b"ebcdb7d62b6d45a8").hexdigest() == "2a492c9488136ce8af23e8bb99afbfd55fa098c4", "type of sum(samples.replicate.unique()) is not correct"
assert sha1(str(sum(samples.replicate.unique())).encode("utf-8")+b"ebcdb7d62b6d45a8").hexdigest() == "49b55505f5fbdf4ac6ab4be263f986c522050b83", "value of sum(samples.replicate.unique()) is not correct"

print('Success!')

**Question 1.9** 
<br> {points: 3}

Describe the distribution above in words. Comment on its shape, center, and spread. Compare this sampling distribution to the population distribution above.

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 2.0** 
<br> {points: 1}

Repeat the steps in questions 1.3 to 1.5 using a sample size of 30 for this coffee population. Remember to store the mean number of cups for each sample in a column called `sample_mean`. You should end up with a plot of the sampling distribution called `sampling_distribution_30`. Set `maxbins` for this plot to 10.
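Because the same steps are repeated for each sample size, one option is to wrap them in a small helper. The sketch below assumes a hypothetical function `simulate_sample_means` and a made-up skewed population `toy_pop`. Neither is part of this tutorial, and the seeds used here differ from the one the graded cell requires.

```python
import numpy as np
import pandas as pd


def simulate_sample_means(population, column, n, reps, seed):
    """Draw `reps` samples of size `n` and return one sample mean per replicate."""
    rows = []
    for rep in range(reps):
        # A different integer seed per replicate keeps the run reproducible.
        sample = population.sample(n=n, random_state=seed + rep)
        rows.append({"replicate": rep, "sample_mean": sample[column].mean()})
    return pd.DataFrame(rows)


# Hypothetical skewed toy population (not the tutorial's coffee data).
toy_pop = pd.DataFrame(
    {"value": np.random.default_rng(0).exponential(scale=3.0, size=2000)}
)

means_small = simulate_sample_means(toy_pop, "value", n=5, reps=500, seed=1000)
means_large = simulate_sample_means(toy_pop, "value", n=30, reps=500, seed=5000)

# Spread of the two simulated sampling distributions.
print(means_small["sample_mean"].std())
print(means_large["sample_mean"].std())
```

Comparing the two printed standard deviations is one numeric way to check what the histograms for the two sample sizes show.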

In [None]:
np.random.seed(4321)  # DO NOT CHANGE!

# your code here
raise NotImplementedError
sampling_distribution_30

In [None]:
from hashlib import sha1
assert sha1(str(type(sampling_distribution_30.encoding.x.field)).encode("utf-8")+b"c64a5f443fef11d0").hexdigest() == "89a399be917c7734a5541d049939ad7ea220b26f", "type of sampling_distribution_30.encoding.x.field is not str. sampling_distribution_30.encoding.x.field should be an str"
assert sha1(str(len(sampling_distribution_30.encoding.x.field)).encode("utf-8")+b"c64a5f443fef11d0").hexdigest() == "65c891eb722a37d4bd32b8b464906a2686bbb079", "length of sampling_distribution_30.encoding.x.field is not correct"
assert sha1(str(sampling_distribution_30.encoding.x.field.lower()).encode("utf-8")+b"c64a5f443fef11d0").hexdigest() == "b6bf4adf8d1ef6c6fd919ac9ae2462c4a54e6e5b", "value of sampling_distribution_30.encoding.x.field is not correct"
assert sha1(str(sampling_distribution_30.encoding.x.field).encode("utf-8")+b"c64a5f443fef11d0").hexdigest() == "b6bf4adf8d1ef6c6fd919ac9ae2462c4a54e6e5b", "correct string value of sampling_distribution_30.encoding.x.field but incorrect case of letters"

assert sha1(str(type(sampling_distribution_30.mark)).encode("utf-8")+b"7e05dde0aab27dc7").hexdigest() == "c99c600aee837a4cff9272ba7f200d7e4b7ae8a8", "type of sampling_distribution_30.mark is not str. sampling_distribution_30.mark should be an str"
assert sha1(str(len(sampling_distribution_30.mark)).encode("utf-8")+b"7e05dde0aab27dc7").hexdigest() == "4d0b22fcd44c82eab24e59bd4cbf52745840b5d8", "length of sampling_distribution_30.mark is not correct"
assert sha1(str(sampling_distribution_30.mark.lower()).encode("utf-8")+b"7e05dde0aab27dc7").hexdigest() == "53917ebd51b7874e1aa3d5934262bda5f4fea6e8", "value of sampling_distribution_30.mark is not correct"
assert sha1(str(sampling_distribution_30.mark).encode("utf-8")+b"7e05dde0aab27dc7").hexdigest() == "53917ebd51b7874e1aa3d5934262bda5f4fea6e8", "correct string value of sampling_distribution_30.mark but incorrect case of letters"

assert sha1(str(type(sampling_distribution_30.data.shape[0])).encode("utf-8")+b"4678c94648c91544").hexdigest() == "e9f99eb8251b3d0229402b7975904b4e1c2ebba8", "type of sampling_distribution_30.data.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sampling_distribution_30.data.shape[0]).encode("utf-8")+b"4678c94648c91544").hexdigest() == "9f3d4e77b50b9985d3d66c4fe878b44db6eb6d4e", "value of sampling_distribution_30.data.shape[0] is not correct"

assert sha1(str(type(round(sum(sampling_distribution_30.data.sum()), 2))).encode("utf-8")+b"94cacc9a79c1afd3").hexdigest() == "079f748c30ee0d5b612053cf28f6b87fded8f171", "type of round(sum(sampling_distribution_30.data.sum()), 2) is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(round(sum(sampling_distribution_30.data.sum()), 2), 2)).encode("utf-8")+b"94cacc9a79c1afd3").hexdigest() == "7016ddd701d1dcc927bc41ef29a4e344baef4018", "value of round(sum(sampling_distribution_30.data.sum()), 2) is not correct (rounded to 2 decimal places)"

assert sha1(str(type(sampling_distribution_30.encoding.x.field != sampling_distribution_30.encoding.x.title)).encode("utf-8")+b"b344b0f27ebaf1b2").hexdigest() == "7cb1a4d760ce9b2c86ed595fb911eed593750373", "type of sampling_distribution_30.encoding.x.field != sampling_distribution_30.encoding.x.title is not bool. sampling_distribution_30.encoding.x.field != sampling_distribution_30.encoding.x.title should be a bool"
assert sha1(str(sampling_distribution_30.encoding.x.field != sampling_distribution_30.encoding.x.title).encode("utf-8")+b"b344b0f27ebaf1b2").hexdigest() == "1e5b14c699f4fcce48769a458f4db3c3af355f2a", "boolean value of sampling_distribution_30.encoding.x.field != sampling_distribution_30.encoding.x.title is not correct"

assert sha1(str(type(sampling_distribution_30.title is not None)).encode("utf-8")+b"cde7bd676d7c6472").hexdigest() == "9f34714081d183cdb719cdf550c400fffc8ee361", "type of sampling_distribution_30.title is not None is not bool. sampling_distribution_30.title is not None should be a bool"
assert sha1(str(sampling_distribution_30.title is not None).encode("utf-8")+b"cde7bd676d7c6472").hexdigest() == "07e66f097a85cfa52c3601846b0a395a7bc20bdc", "boolean value of sampling_distribution_30.title is not None is not correct"

print('Success!')

**Question 2.1** 
<br> {points: 3}

Describe the distribution above in words. Comment on its shape, center, and spread. Compare this sampling distribution, built from samples of size 30, to the sampling distribution built from samples of size 5.

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.