# Chapter 27 - Inferences for Regression

## The Population and the Sample

* imagine an idealized regression line for the whole population
* the model assumes that the _means_ of the distributions of the $y$ variable for each $x$ value fall along the line, even though the individuals are scattered around it
* we write the idealized line with Greek letters and consider its coefficients (the slope and intercept) to be _parameters_: $\beta_0$ is the intercept and $\beta_1$ is the slope.  Corresponding to our fitted line $\hat{y} = b_0 + b_1 x$, we write

\begin{equation}
\mu_y = \beta_0 + \beta_1 x
\end{equation}
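To make the population model concrete, here is a minimal Python simulation under assumed, hypothetical parameter values (`BETA0 = 2`, `BETA1 = 0.5`, Normal scatter with standard deviation `SIGMA = 1`). At each fixed $x$, the individual $y$-values scatter, but their mean should land close to $\mu_y = \beta_0 + \beta_1 x$.

```python
import random
import statistics

# Hypothetical population parameters, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0


def simulate_y(x, n, rng):
    """Draw n individual y-values at a fixed x: the line's value plus Normal scatter."""
    return [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for _ in range(n)]


rng = random.Random(27)
for x in (0, 5, 10):
    ys = simulate_y(x, 10_000, rng)
    mu_y = BETA0 + BETA1 * x  # the model's mean of y at this x
    print(f"x = {x:2d}: mu_y = {mu_y:.2f}, "
          f"sample mean = {statistics.fmean(ys):.2f}, "
          f"sample SD = {statistics.stdev(ys):.2f}")
```

The sample means track the line, while the sample SD stays near `SIGMA` at every $x$, which is also what the equal variance assumption asks for.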

## Assumptions and Conditions

1. Linearity Assumption
    * **straight enough condition**
2. Independence Assumption
    * **randomization condition**
3. Equal Variance Assumption
    * **does the plot thicken? condition**
4. Normal Population Assumption
    * **nearly normal condition**
    * **outlier condition**

## Which Come First: The Conditions or the Residuals?

* Work through the following steps, moving to the subsequent step only when the prior steps "pass":


1. Make a scatterplot of the data; check the straight enough condition; re-express if needed
2. Fit a regression and find the residuals, $e$, and predicted values, $\hat{y}$
3. Make a scatterplot of the residuals against $x$ or against the predicted values
4. If data are measured over time, plot the residuals over time to check for evidence of patterns that might suggest they are not independent.
5. Make a histogram and Normal probability plot of the residuals to check the nearly normal condition.
6. Continue with inference.
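Steps 2, 3 and 5 rest on simple arithmetic that can be sketched in plain Python. This is a minimal sketch on a small, made-up dataset; the helper names are hypothetical, and it stops short of drawing plots: the returned lists are what you would hand to a plotting library for the residual plot and the Normal probability plot. The plotting positions $(i - 0.5)/n$ used for the Normal scores are one common convention, assumed here.

```python
from statistics import NormalDist, fmean


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals_and_predictions(x, y):
    """Step 2: predicted values y_hat and residuals e = y - y_hat."""
    b0, b1 = fit_line(x, y)
    y_hat = [b0 + b1 * xi for xi in x]
    e = [yi - yh for yi, yh in zip(y, y_hat)]
    return y_hat, e


def normal_scores(values):
    """Step 5: (Normal score, sorted value) pairs for a Normal probability plot.
    Uses plotting positions (i - 0.5) / n, one common convention (an assumption)."""
    n = len(values)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, sorted(values)))


# Toy data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat, e = residuals_and_predictions(x, y)
# Step 3: plot e against x (or against y_hat) and look for bends or fanning.
for xi, yh, ei in zip(x, y_hat, e):
    print(f"x = {xi}, y_hat = {yh:.1f}, e = {ei:+.1f}")
# Step 5: a roughly straight pattern in these pairs supports the nearly normal condition.
print(normal_scores(e))
```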

## Step-by-Step Example: Regression Inference

* Plan:
  * specify the question of interest
  * name the variables and report the W's
  * identify the parameters you want to estimate
* Model
  * think about the assumptions and check the conditions
  * make pictures: scatterplot, residual plot, and either a histogram or a Normal probability plot of the residuals
  * choose your method
* Mechanics
  * use tech to generate the regression
  * write the regression equation
* Conclusion:
  * interpret your results in context

## Intuition About Regression Inference

What aspects of the data affect how much the slope (and intercept) vary from sample to sample?

* the spread around the line, measured with the **residual standard deviation** $s_e$: the less scatter around the line, the less the slope varies from sample to sample

\begin{equation}
s_e = \sqrt{
  \frac{
    \sum{
    (y - \hat{y})^2
    }
  }{n - 2}
}
\end{equation}

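* the spread of the $x$'s: the more spread out the $x$-values, the more stable the slope
* the sample size: larger samples give less variable slopes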
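A small simulation can make these three effects visible. This is a sketch under assumed conditions: a hypothetical population with $\mu_y = 1 + 2x$ and Normal scatter of standard deviation `sigma`, sampled repeatedly at fixed $x$-values. It reports how much the fitted slope $b_1$ varies from sample to sample; the helper names are illustrative.

```python
import random
import statistics


def fitted_slope(x, y):
    """Least-squares slope b1."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def slope_spread(x, sigma, reps=2000, seed=0):
    """Standard deviation of b1 across many simulated samples.
    Hypothetical population: mu_y = 1 + 2x with Normal scatter of SD sigma."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        y = [1 + 2 * xi + rng.gauss(0, sigma) for xi in x]
        slopes.append(fitted_slope(x, y))
    return statistics.stdev(slopes)


narrow_x = [i / 9 for i in range(10)]             # x from 0 to 1, n = 10
wide_x = [10 * i / 9 for i in range(10)]          # x from 0 to 10, n = 10
wide_x_big_n = [10 * i / 39 for i in range(40)]   # x from 0 to 10, n = 40

print("baseline      :", slope_spread(narrow_x, sigma=1))
print("less scatter  :", slope_spread(narrow_x, sigma=0.5))
print("wider x spread:", slope_spread(wide_x, sigma=1))
print("larger n      :", slope_spread(wide_x_big_n, sigma=1))
```

With these settings, the spread of $b_1$ should shrink when the scatter (`sigma`) is halved, when the $x$-values cover a wider range, and when more points are used.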

## Standard Error for the Slope

* formula for the standard error of the slope:

\begin{equation}
SE(b_1) = \frac{
  s_e
}{
  \sqrt{n - 1}\, s_x
}
\end{equation}
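As a check on the formula, here is a minimal Python sketch that computes $s_e$, $s_x$ and $SE(b_1)$ for a small made-up dataset. The function name is hypothetical. Note that `statistics.stdev` uses the $n-1$ divisor, which is the ordinary sample standard deviation $s_x$ the formula calls for.

```python
import math
import statistics


def slope_standard_error(x, y):
    """Return (b1, s_e, SE(b1)) for a least-squares line."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual standard deviation: divide the sum of squared residuals by n - 2
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    # statistics.stdev uses the n - 1 divisor, matching s_x in the formula
    s_x = statistics.stdev(x)
    se_b1 = s_e / (math.sqrt(n - 1) * s_x)
    return b1, s_e, se_b1


b1, s_e, se_b1 = slope_standard_error([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b1 = {b1:.3f}, s_e = {s_e:.3f}, SE(b1) = {se_b1:.3f}")
# b1 = 0.600, s_e = 0.894, SE(b1) = 0.283
```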

* when we standardize the slopes by subtracting the model mean and dividing by their standard error, we get a Student's $t$-model, this time with $n-2$ degrees of freedom:

\begin{equation}
\frac{
b_1 - \beta_1 
}
{
SE(b_1)
}
\sim t_{n - 2}
\end{equation}
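One way to see this claim at work is by simulation. The sketch below assumes hypothetical conditions (true line $\mu_y = 3 + 0.7x$, Normal scatter with SD 2, $n = 10$), standardizes each simulated slope, and counts how often $|t|$ exceeds 2.306, the two-sided 95% critical value of a $t$-model with $n - 2 = 8$ degrees of freedom taken from a $t$-table. If the $t_{n-2}$ model is right, that should happen about 5% of the time. It uses $\sqrt{\sum (x - \bar{x})^2}$, which equals $\sqrt{n-1}\, s_x$.

```python
import math
import random
import statistics

T_STAR_DF8 = 2.306  # two-sided 95% critical value of t with 8 df, from a t-table


def standardized_slope(x, y, beta1):
    """(b1 - beta1) / SE(b1) for one sample."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s_e = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    se_b1 = s_e / math.sqrt(sxx)  # same as s_e / (sqrt(n - 1) * s_x)
    return (b1 - beta1) / se_b1


def tail_fraction(reps=4000, seed=1):
    """Fraction of simulated |t| beyond t*_8; near 0.05 if t follows t_{n-2}."""
    rng = random.Random(seed)
    x = list(range(10))          # n = 10, so df = n - 2 = 8
    beta0, beta1 = 3.0, 0.7      # hypothetical true parameters
    hits = 0
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0, 2) for xi in x]
        if abs(standardized_slope(x, y, beta1)) > T_STAR_DF8:
            hits += 1
    return hits / reps


print(tail_fraction())
```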

### A Sampling Distribution for Regression Slopes

When the conditions are met, the standardized estimated regression slope,

\begin{equation}
t = \frac{
b_1 - \beta_1 
}
{
SE(b_1)
}
\end{equation}

follows a Student's $t$-model with $n-2$ degrees of freedom.  We estimate the standard error with

\begin{equation}
SE(b_1) = \frac{
  s_e
}{
  \sqrt{n - 1}\, s_x
}
\end{equation}

where

\begin{equation}
s_e = \sqrt{
  \frac{
    \sum{
    (y - \hat{y})^2
    }
  }{n - 2}
}
\end{equation}

$n$ is the number of data values, and $s_x$ is the ordinary standard deviation of the $x$-values.

## What About the Intercept?

## Regression Inference

To test $H_0: \beta_1 = 0$, we find

\begin{equation}
t_{n-2} = \frac {b_1 - 0}  {SE(b_1)}
\end{equation}

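A 95% **confidence interval for $\beta_1$** is

\begin{equation}
b_1 \pm t^*_{n-2} \times SE(b_1)
\end{equation}

where $t^*_{n-2}$ is the critical value from the $t$-model with $n-2$ degrees of freedom for 95% confidence.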
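The test statistic and interval translate directly into code. This is a minimal sketch on a made-up dataset; the function name is hypothetical. The Python standard library has no $t$ distribution, so the critical value `t_star` is passed in (3.182 is the tabled 95% value for 3 degrees of freedom); if SciPy is available, `scipy.stats.t.ppf(0.975, df)` gives $t^*$ and `2 * scipy.stats.t.sf(abs(t), df)` gives the two-sided P-value.

```python
import math
import statistics


def slope_inference(x, y, t_star):
    """Return b1, SE(b1), the t-statistic for H0: beta1 = 0, and the interval
    b1 +/- t_star * SE(b1). t_star must be the t critical value with n - 2 df."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s_e = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    se_b1 = s_e / math.sqrt(sxx)  # equals s_e / (sqrt(n - 1) * s_x)
    t = (b1 - 0) / se_b1          # hypothesized slope is 0
    margin = t_star * se_b1
    return b1, se_b1, t, (b1 - margin, b1 + margin)


# Toy data (n = 5, so df = 3); t* = 3.182 is the 95% value for 3 df from a t-table.
b1, se_b1, t, (lo, hi) = slope_inference([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_star=3.182)
print(f"t = {t:.2f}, 95% CI for beta1: ({lo:.3f}, {hi:.3f})")
# t = 2.12, 95% CI for beta1: (-0.300, 1.500)
```

Here the interval includes 0, which agrees with the test: $|t| \approx 2.12$ is smaller than $t^* = 3.182$, so at the 5% level these five points give no convincing evidence of a nonzero slope.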

## Another Example

## Step-by-Step Example: A Regression Slope $t$-Test

* Plan
  * state what you want to know
  * identify the _parameter_ you wish to estimate 
  * identify the variables and review the W's
* Hypotheses
  * write your null and alternative hypotheses
* Model
  * think about the assumptions and check the conditions
  * make pictures; plot the residuals against predicted values; for time series, plot residuals against time
  * state the sampling distribution model
  * choose your method
* Mechanics
  * run regression via tech
  * find the P-value for the slope in the regression output
* Conclusion
  * link the P-value to your decision and state your conclusion in the proper context
* Show
  * create a confidence interval for the true slope
* Tell
  * interpret the interval

## Standard Errors for Predicted Values

* for predicting the mean response at $x_\nu$

\begin{equation}
SE(\hat{\mu}_{\nu}) =
\sqrt{
SE^2(b_1)\cdot(x_\nu - \bar{x})^2 + \frac{s^2_e}{n}
}
\end{equation}

* for predicting an individual's value at $x_\nu$

\begin{equation}
SE(\hat{y}_{\nu}) =
\sqrt{
SE^2(b_1)\cdot(x_\nu - \bar{x})^2 + \frac{s^2_e}{n} + s^2_e
}
\end{equation}
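Both standard errors can be computed from the same ingredients as $SE(b_1)$. This is a minimal sketch on a small made-up dataset with a hypothetical function name; it also forms intervals of the usual form $\hat{y}_\nu \pm t^*_{n-2} \times SE$, taking $t^* = 3.182$ (95%, 3 degrees of freedom) from a $t$-table.

```python
import math
import statistics


def prediction_standard_errors(x, y, x_new):
    """Return (y_hat, SE for the mean response, SE for an individual) at x_new."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s_e2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)  # s_e^2
    se_b1_sq = s_e2 / sxx                                                  # SE^2(b1)
    y_hat = b0 + b1 * x_new
    se_mean = math.sqrt(se_b1_sq * (x_new - x_bar) ** 2 + s_e2 / n)
    se_indiv = math.sqrt(se_b1_sq * (x_new - x_bar) ** 2 + s_e2 / n + s_e2)
    return y_hat, se_mean, se_indiv


y_hat, se_mean, se_indiv = prediction_standard_errors(
    [1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x_new=3)
t_star = 3.182  # 95%, df = n - 2 = 3
print(f"mean response: {y_hat:.2f} +/- {t_star * se_mean:.2f}")
print(f"individual:    {y_hat:.2f} +/- {t_star * se_indiv:.2f}")
```

The individual interval is much wider because it adds $s_e^2$, the scatter of single points around the line, and both standard errors grow as $x_\nu$ moves away from $\bar{x}$.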
