# Chapter 23 - Inferences About Means

## Getting Started

### The Central Limit Theorem

When a random sample is drawn from any population with mean $\mu$ and standard deviation $\sigma$, its sample mean, $\bar{y}$, has a sampling distribution with the same _mean_ $\mu$ but with _standard deviation_ $\frac{\sigma}{\sqrt{n}}$.  We write

\begin{equation}
\sigma(\bar{y}) = SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}.
\end{equation}

No matter what population the random sample comes from, the _shape_ of the sampling distribution is approximately Normal as long as the sample size is large enough.  The larger the sample used, the more closely the Normal approximates the sampling distribution for the mean.
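
A quick simulation can make both claims concrete. The sketch below is only an illustration: the Exponential(1) population (for which $\mu = 1$ and $\sigma = 1$), the function name `simulate_sample_means`, and the seed are all assumptions chosen for the example, not part of the chapter.

```python
import random
import statistics

def simulate_sample_means(n, reps=5000, seed=1):
    """Draw `reps` samples of size n from a right-skewed Exponential(1)
    population (mu = 1, sigma = 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

# Compare the simulated SD of ybar with the theoretical sigma / sqrt(n).
for n in (5, 30, 100):
    means = simulate_sample_means(n)
    print(f"n={n:3d}  mean of ybar={statistics.fmean(means):.3f}  "
          f"SD of ybar={statistics.stdev(means):.3f}  "
          f"sigma/sqrt(n)={1 / n ** 0.5:.3f}")
```

The simulated standard deviation of $\bar{y}$ should track $\frac{\sigma}{\sqrt{n}}$ closely for each $n$, even though the population itself is far from Normal.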

## Gosset's _t_

Student's _t_-models form a whole _family_ of related distributions that depend on a parameter known as **degrees of freedom**.
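
One way to see why the degrees of freedom matter is to simulate $t$-statistics for different sample sizes. The sketch below assumes a Normal(0, 1) population and uses a made-up helper, `tail_rate`, to estimate how often $|t|$ exceeds 1.96, the cutoff that leaves about 5% in the tails of a Normal model.

```python
import math
import random

def tail_rate(n, cutoff=1.96, reps=10000, seed=2):
    """Fraction of simulated t-statistics with |t| > cutoff when samples
    of size n are drawn from a Normal(0, 1) population (so mu = 0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ybar = sum(sample) / n
        s = math.sqrt(sum((y - ybar) ** 2 for y in sample) / (n - 1))
        t = (ybar - 0.0) / (s / math.sqrt(n))
        if abs(t) > cutoff:
            hits += 1
    return hits / reps

for n in (5, 15, 50):
    print(f"df={n - 1:2d}  P(|t| > 1.96) ~ {tail_rate(n):.3f}")
```

With few degrees of freedom the tail rate is noticeably larger than 5%, and it shrinks toward the Normal value as the degrees of freedom grow.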

## A Confidence Interval for Means

### A Practical Sampling Distribution Model for Means

When certain assumptions and conditions are met, the standardized sample mean, 

\begin{equation}
t = \frac{\bar{y} - \mu}{SE(\bar{y})},
\end{equation}

follows a Student's _t_-model with $n - 1$ degrees of freedom.  Because $\sigma$ is unknown, we estimate the standard deviation of $\bar{y}$ with the standard error,

\begin{equation}
SE(\bar{y}) = \frac{s}{\sqrt{n}}.
\end{equation}

### One-Sample _t_-Interval for the Mean

When the assumptions and conditions are met, we are ready to find the confidence interval for the population mean, $\mu$.  The confidence interval is 

\begin{equation}
\bar{y} \pm t^*_{n - 1} \times SE(\bar{y}),
\end{equation}

where the standard error of the mean is $SE(\bar{y}) = \frac{s}{\sqrt{n}}$.

The critical value $t^*_{n-1}$ depends on the particular confidence level, $C$, that you specify and on the number of degrees of freedom, $n-1$, which we get from the sample size.
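
The interval formula can be sketched in a few lines of Python. This is a minimal illustration using only the standard library: the helpers `t_pdf`, `t_cdf`, `t_critical`, and `t_interval` are names invented here, the critical value is found numerically (Simpson's rule plus bisection) rather than from a table, and the data are made up.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, integrating the density from 0 to x
    with Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Find t* with P(-t* < T < t*) = confidence by bisection."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:   # widen the bracket until it contains t*
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(data, confidence=0.95):
    """One-sample t-interval: ybar +/- t*_{n-1} * s / sqrt(n)."""
    n = len(data)
    ybar = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    margin = t_critical(confidence, n - 1) * se
    return ybar - margin, ybar + margin

# Hypothetical measurements, made up for illustration only.
data = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0]
low, high = t_interval(data)
print(f"95% CI for mu: ({low:.2f}, {high:.2f})")
```

In practice a statistics package or a $t$-table supplies $t^*_{n-1}$ directly; the numerical search here only keeps the example self-contained.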

## Assumptions and Conditions

### Independence Assumption

* the data values should be independent
  * **randomization condition**: the data arise from a random sample or suitably randomized experiment
  * **10% condition**: the sample is no more than 10% of the population

### Normal Population Assumption

* **Nearly Normal Condition**: the data come from a distribution that is unimodal and symmetric

## Step-by-Step Example: A One-Sample _t_-Interval for the Mean

* Plan: 
  - State what we want to know.  Identify the parameter of interest
  - Identify the variables and review the W's
  - Make a picture.  Check the distribution shape and look for skewness, multiple modes, and outliers
* Model
  - Think about the assumptions and check the conditions
  - State the sampling distribution model for the statistic
  - Choose your method
* Mechanics
  - Construct the confidence interval
  - Include units along with the statistics
* Conclusion
  - Interpret the confidence interval in the proper context

## More Cautions About Interpreting Confidence Intervals

* don't say:
  * "95% of x are between i and j" -- the confidence interval is about the _mean_, not about individual values
  * "We are 95% confident that a randomly selected entity will have a value between i and j" -- as above, this is a statement about individuals while our confidence interval is about the _mean_
  * "The mean value of the entities is i 95% of the time." -- This implies that the mean varies, when in fact the confidence interval is what varies and the mean stays the same.
  * "95% of all samples will have a mean value between i and j." -- This implies that the current sample sets the standard for all other samples.
* do say:
  * "95% of intervals that could be found in this way would cover the true value.", OR
  * "I am 95% confident that the true mean is between i and j."


## Make a Picture

* the only reasonable way to check the Nearly Normal Condition is with graphs of the data
* make a histogram of the data and verify that:
  * its distribution is unimodal,
  * its distribution is symmetric, and
  * it has no outliers

## A Test for the Mean

* to test a hypothesis about a population mean, we use the **one-sample $t$-test for the mean**

### One-Sample $t$-Test for the Mean

The assumptions and conditions for the one-sample $t$-test for the mean are the same as for the one-sample $t$-interval.  We test the hypothesis $H_0: \mu = \mu_0$ using the statistic

\begin{equation}
t_{n-1} = \frac{\bar{y} - \mu_0}{SE(\bar{y})}
\end{equation}

The standard error of $\bar{y}$ is $SE(\bar{y}) = \frac{s}{\sqrt{n}}$.

When the conditions are met and the null hypothesis is true, this statistic follows a Student's $t$-model with $n - 1$ degrees of freedom.  We use that model to obtain a P-value.
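
The mechanics of the test can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `t_upper_tail` and `one_sample_t_test` are hypothetical names, the tail area is computed by numerically integrating the $t$ density with Simpson's rule, the P-value is two-sided, and the data and $\mu_0$ are made up.

```python
import math
import statistics

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0 under Student's t with df degrees of freedom,
    using Simpson's rule on the density between 0 and t."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    area = pdf(0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - area * h / 3

def one_sample_t_test(data, mu0):
    """Return (t, two-sided P-value) for H0: mu = mu0."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)
    t = (statistics.fmean(data) - mu0) / se
    return t, 2 * t_upper_tail(abs(t), n - 1)

# Hypothetical measurements and null value, made up for illustration only.
data = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0]
t_stat, p_value = one_sample_t_test(data, mu0=10.5)
print(f"t = {t_stat:.3f}, two-sided P-value = {p_value:.4f}")
```

For a one-sided alternative, the P-value would instead be the single tail area in the direction of the alternative.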

## Step-by-Step Example:  A One-Sample $t$-Test for the Mean

* Plan:
  * State what we want to know.
  * Make clear what the population and parameter are.
  * Identify the variables and review the W's.
* Identify the hypotheses
* Make a picture
  * check the distribution for skewness, multiple modes, and outliers
* Model
  * Think about the assumptions and check the conditions
  * State the sampling distribution model.
  * Choose your method.
* Mechanics
  * Be sure to include the units
* Conclusion
  * Link the P-value to your decision about $H_0$, and state your conclusion in context

## Finding $t$-Values by Hand

## Significance and Importance

## Intervals and Tests

* a level $C$ confidence interval contains _all_ of the plausible null hypothesis values that would _not_ be rejected by a two-sided hypothesis test at alpha level $1 - C$.

## Sample Size

## Degrees of Freedom

## *The Sign Test -- Back to Yes and No

## Step-by-Step Example: *A Sign Test

* Plan 
  - state what we want to know
  - identify the parameter of interest
  - identify the variables and review the W's
* Hypothesis
  - write the null and alternative hypotheses
* Model
  - think about the assumptions and check the conditions
  - choose your method
    - the sign test is just a one-proportion $z$-test of $H_0: p = 0.5$, where $p$ is the proportion of values above the hypothesized median
* Mechanics
  - use the null model to find the P-value
* Conclusion
  - link the P-value to your decision, then state your conclusions in the proper context
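
The mechanics step above can be sketched as a one-proportion $z$-test with $p_0 = 0.5$. The function name `sign_test`, the choice to drop values exactly equal to the hypothesized median, the two-sided alternative, and the data are all assumptions made for this illustration.

```python
import math
from statistics import NormalDist

def sign_test(data, median0):
    """Two-sided sign test of H0: median = median0, carried out as a
    one-proportion z-test with p0 = 0.5.

    Assumption of this sketch: values equal to median0 are dropped.
    """
    above = sum(1 for y in data if y > median0)
    below = sum(1 for y in data if y < median0)
    n = above + below
    p_hat = above / n
    # Under H0 the standard deviation of p_hat is sqrt(p0 * q0 / n).
    z = (p_hat - 0.5) / math.sqrt(0.5 * 0.5 / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data, made up for illustration only.
data = [12, 15, 9, 14, 18, 11, 16, 13, 17, 19, 8, 14, 15, 16, 20,
        13, 12, 17, 7, 18, 16, 15, 6, 14]
z, p_value = sign_test(data, median0=11)
print(f"z = {z:.3f}, P-value = {p_value:.4f}")
```

The Normal model is only reasonable when the expected counts above and below the hypothesized median are large enough, so the usual conditions for a one-proportion $z$-test still need to be checked.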

## What Can Go Wrong?

* Don't confuse proportions and means.
* Beware of multimodality.
  - if the data appear to have multiple modes, try to separate them into groups and analyze each group separately
* Beware of skewed data.
  - one way of dealing with skewed data is attempting to re-express the data, possibly leading to a more symmetric distribution
* Set outliers aside.
  - consider doing the analysis twice (with and without the outliers) to see how they affect the results
* Watch out for bias.
  - be sure to think about possible sources of bias in your measurements
* Make sure cases are independent.
* Make sure that data are from an appropriately randomized sample.
* Interpret your confidence interval correctly.

## What Have We Learned?

* [p. 570]