# Chapter 18 - Sampling Distribution Models

## The Central Limit Theorem for Sample Proportions

* **sampling distribution**: a representation of all values of a summary statistic obtained from _all possible samples_ of a population
* a **sampling distribution model** for how a proportion (summary statistic) varies from sample to sample allows us to quantify that variation and to talk about how likely it is that we'd observe a sample proportion in any particular interval.
* **sampling error**: variability that you would expect to see from one sample to another; aka "sampling variability" (not really _error_)

## How Good is the Normal Model?

* The sampling distribution of a sample proportion can be modeled well by a Normal model.
* This representation improves as sample size increases.
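
A small simulation can make this concrete. The sketch below is illustrative only: the population proportion `p = 0.3`, the sample sizes, and the helper names are assumptions, not values from the chapter. It draws many random samples, records each sample proportion, and reports the fraction that fall within one model standard deviation, $\sqrt{p(1-p)/n}$, of $p$. Under a Normal model that fraction is roughly 68%.

```python
import random
from math import sqrt


def simulate_proportions(p, n, num_samples, seed=0):
    """Return the sample proportion from each of num_samples samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(num_samples)
    ]


def fraction_within_one_sd(p, n, num_samples=2000, seed=0):
    """Fraction of simulated proportions within one model SD of p."""
    sd = sqrt(p * (1 - p) / n)
    props = simulate_proportions(p, n, num_samples, seed)
    return sum(abs(p_hat - p) <= sd for p_hat in props) / num_samples


if __name__ == "__main__":
    # Hypothetical settings: p = 0.3 with three different sample sizes.
    for n in (10, 50, 500):
        print(n, round(fraction_within_one_sd(0.3, n), 3))
```

For small $n$ the sample proportion can take only a few distinct values, so the fraction tends to sit further from 68% than it does for large $n$.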

## Assumptions and Conditions

* Two assumptions must be made to use the sampling distribution model for sample proportions:
    * Independence: The sampled values must be independent of each other.
    * Sample Size: The sample size, $n$, must be large enough.
    
* Check the following conditions that provide evidence to support the assumptions:
    * Randomization Condition:
      - for an experiment: random assignment to treatments,
      - for a survey: SRS,
      - etc.
    * 10% Condition: Sample size, $n$, must be no larger than 10% of the population.
    * Success / Failure Condition: The sample size must be large enough that we expect at least 10 successes and at least 10 failures, that is, $np \ge 10$ and $n(1 - p) \ge 10$.
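
The two numeric conditions are easy to check mechanically. The following is a minimal sketch; the function name, its arguments, and the example numbers are hypothetical. The Randomization Condition depends on how the data were collected, so it cannot be checked from the numbers and is left out.

```python
def check_proportion_conditions(n, p, population_size=None):
    """Check the numeric conditions for the Normal model of a sample proportion.

    Returns a dict mapping each condition name to True or False.
    The 10% Condition is only checked when a population size is supplied.
    """
    results = {
        # Expect at least 10 successes and at least 10 failures.
        "success_failure": n * p >= 10 and n * (1 - p) >= 10,
    }
    if population_size is not None:
        # Sample should be no larger than 10% of the population.
        results["ten_percent"] = n <= 0.10 * population_size
    return results


if __name__ == "__main__":
    # Hypothetical survey: 200 respondents from a population of 10,000.
    print(check_proportion_conditions(n=200, p=0.5, population_size=10_000))
```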

## A Sampling Distribution Model for a Proportion

* We can think of the sample proportion as a random variable taking on a different value in each random sample.
* As a random variable, the sample proportion has a probability distribution. The model for that distribution is the **sampling distribution model** for the proportion:

> Provided that the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with mean $\mu(\hat{p}) = p$ and standard deviation $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $p$ is the population proportion and $q = 1 - p$.

## Step-by-Step Example: Working with Sampling Distribution Models for Proportions

* Plan: State what we want to know
* Model: Think about the assumptions and check the conditions
* State the parameters and the sampling distribution model
* Plot: make a picture; sketch the model and shade the area we're interested in
* Mechanics: use the standard deviation as a ruler to find the z-score of the cutoff proportion
* Find the resulting probability
* Conclusion: interpret the probability in the context of the question
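
The mechanics of these steps can be sketched in a few lines. The numbers here are hypothetical and chosen only for illustration: a population proportion of 0.40, a sample of 100, and a cutoff proportion of 0.50. The function name is also an assumption. It relies on `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from math import sqrt
from statistics import NormalDist


def proportion_tail_probability(p, n, cutoff):
    """Return (sd, z, probability that p-hat exceeds cutoff) under the Normal model."""
    q = 1 - p
    # Model: the Success/Failure Condition should hold before using the Normal model.
    if n * p < 10 or n * q < 10:
        raise ValueError("Success/Failure Condition not met")
    sd = sqrt(p * q / n)            # SD(p-hat) = sqrt(pq/n)
    z = (cutoff - p) / sd           # standard deviation as a ruler
    prob = 1 - NormalDist().cdf(z)  # upper-tail area beyond the cutoff
    return sd, z, prob


if __name__ == "__main__":
    sd, z, prob = proportion_tail_probability(p=0.40, n=100, cutoff=0.50)
    print(f"SD = {sd:.4f}, z = {z:.2f}, P(p-hat > 0.50) = {prob:.4f}")
```

With these inputs the cutoff sits about 2 standard deviations above the mean, so the probability is small (roughly 2%). The conclusion step would restate that in the context of the original question.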

## What About Quantitative Data?

### Simulating the Sampling Distribution of a Mean

## The Central Limit Theorem: The Fundamental Theorem of Statistics

The sampling distribution of _any_ mean becomes more nearly Normal as the sample size grows. All we need is for the observations to be independent and collected with randomization. The shape of the population distribution does not matter.


> **The Central Limit Theorem (CLT)**
> The mean of a random sample is a random variable whose sampling distribution can be approximated by a Normal model.  The larger the sample, the better the approximation will be.
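
The claim can be explored by simulation. The sketch below assumes a strongly right-skewed population (Exponential with mean 1), which is an illustrative choice and not from the chapter, and the helper names are hypothetical. It draws many samples of each size, computes their means, and summarizes the shape of those means with a simple skewness measure (the average cubed z-score, which is near 0 for a symmetric, Normal-like shape).

```python
import random
from statistics import fmean, pstdev


def sample_means(n, num_samples, seed=0):
    """Means of num_samples random samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


def skewness(values):
    """Average cubed z-score of the values."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])


if __name__ == "__main__":
    # Columns: sample size, mean of the sample means, skewness of the sample means.
    for n in (1, 5, 50):
        means = sample_means(n, num_samples=5000)
        print(n, round(fmean(means), 3), round(skewness(means), 3))
```

The skewness of the sample means should shrink toward 0 as $n$ grows, even though the population itself stays heavily skewed.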

## Assumptions and Conditions

Essentially the same as those for modelling proportions:

* Independence
* Sample Size

## But Which Normal?

> **The Sampling Distribution Model for a Mean (CLT)**
> When a random sample is drawn from any population with mean $\mu$ and standard deviation $\sigma$, its sample mean, $\bar{y}$, has a sampling distribution with the same mean $\mu$ but whose standard deviation is $\frac{\sigma}{\sqrt{n}}$ (and we write $\sigma(\bar{y}) = SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$).  No matter what population the random sample comes from, the _shape_ of the sampling distribution is approximately Normal as long as the sample size is large enough.  The larger the sample used, the more closely the Normal approximates the sampling distribution for the mean.

## Step-by-Step Example: Working with the Sampling Distribution Model for the Mean

* Plan: State what we want to know.
* Model: Think about the assumptions and check the conditions.
* State the parameters and the sampling model.
* Plot: Make a picture.  Sketch the model and shade the area we're interested in.
* Mechanics: Use the standard deviation as a ruler to find the z-score of the cutoff mean.
* Find the resulting probability
* Conclusion: interpret your result in the proper context
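
As with proportions, the mechanics reduce to a z-score and a Normal tail area. This sketch uses hypothetical numbers (population mean 100, population standard deviation 15, sample size 36, cutoff mean 105) and a hypothetical function name. It assumes the Independence and Sample Size assumptions have already been judged reasonable.

```python
from math import sqrt
from statistics import NormalDist


def mean_tail_probability(mu, sigma, n, cutoff):
    """Return (sd, z, probability that y-bar exceeds cutoff) under the Normal model."""
    sd = sigma / sqrt(n)            # SD(y-bar) = sigma / sqrt(n)
    z = (cutoff - mu) / sd          # standard deviation as a ruler
    prob = 1 - NormalDist().cdf(z)  # upper-tail area beyond the cutoff
    return sd, z, prob


if __name__ == "__main__":
    sd, z, prob = mean_tail_probability(mu=100, sigma=15, n=36, cutoff=105)
    print(f"SD = {sd:.2f}, z = {z:.2f}, P(y-bar > 105) = {prob:.4f}")
```

Here $SD(\bar{y}) = 15/\sqrt{36} = 2.5$, so a sample mean of 105 is 2 standard deviations above $\mu$, and the tail probability is about 2.3%.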

## About Variation

* The $\sqrt{n}$ in the denominator highlights the intuition that the variability of sample means will decrease as the sample size increases.
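
A quick simulation can compare the observed spread of sample means with $\frac{\sigma}{\sqrt{n}}$. This is a sketch under assumed values: a Normal population with mean 50 and standard deviation 10, both hypothetical, as is the function name.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def sd_of_sample_means(n, num_samples=4000, mu=50.0, sigma=10.0, seed=0):
    """Observed standard deviation of the means of num_samples samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return pstdev(means)


if __name__ == "__main__":
    # Columns: sample size, observed SD of sample means, sigma / sqrt(n).
    for n in (4, 16, 64):
        print(n, round(sd_of_sample_means(n), 3), round(10.0 / sqrt(n), 3))
```

Because the sample size appears under a square root, quadrupling $n$ only halves the standard deviation of the sample mean.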

## The Real World and the Model World

* The Central Limit Theorem doesn't talk about the distribution of the data from the sample.  It talks about the sample _means_ and sample _proportions_ of many different random samples drawn from the same population.

## Sampling Distribution Models

## What Can Go Wrong?

* Don't confuse the sampling distribution with the distribution of the sample.
* Beware of observations that are not independent.
* Watch out for small samples from skewed populations.

## What Have We Learned?

* [p. 449]