**SM339 ▪ Applied Statistics ▪ Spring 2023 ▪ Uhan**

# Lesson 15. Coefficient of Determination for Multiple Linear Regression

## The coefficient of (multiple) determination $R^2$

- Question: __How much of the variation in $Y$ is explained by the model?__

- Formula:

$$ R^2 = \frac{\text{variability explained by model}}{\text{total variability in $Y$}} = \frac{\mathit{SSModel}}{\mathit{SSTotal}} = 1 - \frac{\mathit{SSE}}{\mathit{SSTotal}} $$

- Interpretation:

    - Suppose we fit a multiple linear regression model and the $R^2$ value is 0.75
    
    - Then we would say
        > The linear regression model with explanatory variables __[list of explanatory variables]__ explains 75% of the variability in the response variable __[response variable]__.

- Things to note:

    - We call $R^2$ the coefficient of _multiple_ determination because we now have multiple predictors
    
    - For simple linear regression, we use $r^2$ (with lowercase $r$) to denote the coefficient of determination because it is the square of the sample correlation $r$ between $X$ and $Y$
    
    - That interpretation does not translate directly to the case with multiple predictors, since each predictor has its own correlation with the response variable
    
    - However, in the multiple linear regression setting, if we calculate the correlation between $y$ (observed) and $\hat{y}$ (fitted) and then square that correlation, we obtain $R^2$
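
The formula and the last note above can be checked numerically. The following is a minimal sketch in R. It assumes the built-in `mtcars` data set, with `mpg` as the response and `wt` and `hp` as predictors; these are chosen only for illustration and are not part of this lesson's `RailsTrails` example.

```r
# Illustration only: mtcars ships with base R; it is not the lesson's data set
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Sums of squares
sse     <- sum(residuals(fit)^2)                    # variability left unexplained
sstotal <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # total variability in Y
ssmodel <- sstotal - sse                            # variability explained by the model

# R^2 computed three ways
r2_ratio   <- ssmodel / sstotal                     # SSModel / SSTotal
r2_one_sse <- 1 - sse / sstotal                     # 1 - SSE / SSTotal
r2_cor     <- cor(mtcars$mpg, fitted(fit))^2        # squared correlation of y and y-hat

# R^2 as reported by summary()
r2_summary <- summary(fit)$r.squared

print(c(r2_ratio, r2_one_sse, r2_cor, r2_summary))
```

All four printed values should agree up to floating-point rounding, since the model includes an intercept.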

### Example 1

Continuing with the `RailsTrails` data and model from the previous lesson...

In [None]:
library(Stat2Data)
data(RailsTrails)

#### a.
How much of the variability in $\mathit{Price2014}$ is explained by the _simple_ linear regression model with _only_ $\mathit{SquareFeet}$ as the predictor?

*Write your notes here. Double-click to edit.*

#### b.
How much of the variability in $\mathit{Price2014}$ is explained by the _simple_ linear regression model with _only_ $\mathit{Distance}$ as the predictor?

*Write your notes here. Double-click to edit.*

#### c.
How much of the variability in $\mathit{Price2014}$ is explained by the _multiple_ linear regression model with both $\mathit{SquareFeet}$ and $\mathit{Distance}$ as predictors?

*Write your notes here. Double-click to edit.*

## The adjusted coefficient of determination (adjusted $R^2$) 

- Adding a predictor to a multiple linear regression model can _never_ decrease the percentage of variability explained by that model (assuming we are using the same data)

- __But: is the increase in explained variability due to important new information in the new predictor, or is it just due to chance?__

- One way to answer this question is to do a $t$-test for the new predictor's coefficient, like we did in the previous lesson

- Another, less formal option is to look at the __adjusted $R^2$__, denoted by $R_{adj}^2$

- The __adjusted coefficient of determination $R_{adj}^2$__ accounts for
    
    1. the amount of variability explained by the model, __and__
    
    2. the number of predictors
    
- It includes a _penalty_ for additional predictors, so an extra predictor has to help _enough_ in order to raise $R_{adj}^2$

- $R_{adj}^2$ can go _up or down_ when a new predictor is included

- Therefore, $R_{adj}^2$ is a good way to compare models with different numbers of predictors
    - It's most useful for quickly comparing many models

- Formula:
    $$R_{adj}^2 = 1 - \frac{\hat{\sigma}_{\varepsilon}^2}{s_{Y}^2} = 1 - \frac{ \mathit{SSE} \,/\, (n - (k + 1)) }{ \mathit{SSTotal} \,/\, (n - 1) } $$
    
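    - $\hat{\sigma}_{\varepsilon}^2 = \mathit{SSE} \,/\, (n - (k + 1))$ is the estimated variance of the error term, where $n$ is the number of observations and $k$ is the number of predictors
    
    - $s_{Y}^2 = \mathit{SSTotal} \,/\, (n - 1)$ is the sample variance of $Y$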
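
To see the penalty at work, here is a minimal sketch in R that computes $R_{adj}^2$ directly from the formula above and compares it with the value reported by `summary()`. It assumes the built-in `mtcars` data set rather than this lesson's `RailsTrails` data. The `noise` column, the seed, and the helper `adj_r2_by_hand()` are hypothetical names introduced only for this illustration.

```r
# Illustration only: mtcars ships with base R; it is not the lesson's data set
set.seed(339)                       # arbitrary seed so the noise column is reproducible
cars <- mtcars
cars$noise <- rnorm(nrow(cars))     # hypothetical predictor unrelated to mpg

fit_small <- lm(mpg ~ wt + hp, data = cars)
fit_big   <- lm(mpg ~ wt + hp + noise, data = cars)

# Adjusted R^2 by hand: 1 - [SSE / (n - (k + 1))] / [SSTotal / (n - 1)]
adj_r2_by_hand <- function(fit) {
  y       <- model.response(model.frame(fit))   # observed response values
  n       <- length(y)                          # number of observations
  k       <- length(coef(fit)) - 1              # number of predictors (excludes intercept)
  sse     <- sum(residuals(fit)^2)
  sstotal <- sum((y - mean(y))^2)
  1 - (sse / (n - (k + 1))) / (sstotal / (n - 1))
}

# Side-by-side comparison of the two models
comparison <- data.frame(
  model       = c("wt + hp", "wt + hp + noise"),
  r2          = c(summary(fit_small)$r.squared, summary(fit_big)$r.squared),
  adj_r2      = c(summary(fit_small)$adj.r.squared, summary(fit_big)$adj.r.squared),
  adj_r2_hand = c(adj_r2_by_hand(fit_small), adj_r2_by_hand(fit_big))
)
print(comparison)
```

In the printed table, the `r2` column cannot decrease from the first row to the second, while `adj_r2` may move in either direction depending on the random draw. The `adj_r2` and `adj_r2_hand` columns should agree up to rounding.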

### Example 2

Continuing with Example 1...

#### a.
For our two-predictor model of house prices, what is $R^2_{adj}$? Look for "Adjusted R-squared" in the Example 1 output.

*Write your answer here. Double-click to edit.*

#### b.
What is $R_{adj}^2$ for the simple linear regression model using only $\mathit{SquareFeet}$ as a predictor of $\mathit{Price2014}$?

*Write your answer here. Double-click to edit.*

#### c.
Based on your answers above, which model seems to be better?

*Write your answer here. Double-click to edit.*