book/05-regression-b.Rmd

# Simple Linear Regression

Simple Linear Regression is an analytical method that looks to model the relationship between an outcome variable and **one** explanatory predictor variables. For example, thinking about the data that we used for the Pearson Correlation analysis in this book, say instead of asking is HeadSize and IQ related, we could ask can you reliably predict IQ scores (our outcome or dependent variable) from HeadSize measurements (our predictor or independent variable), with the hypothesis of, "We predict that a linear model based on head size, as measured in cms, will significantly predict IQ scores as measured on a standard IQ test". We are going to look at this example to walk through some of the analysis but lets first remind ourselves of the data.

```{r we, message=FALSE, warning=FALSE, echo=FALSE}
dat_o <- tibble(Participants = 1:10,
              HeadSize = c(50.8, 63.5, 45.7, 25.4, 29.2, 49.5, 38.1, 30.5, 35.6, 58.4),
              IQ = c(107, 121, 106, 72, 85, 105, 93, 88, 97, 123))

m_x <- mean(dat_o$HeadSize) %>% round2(2)
m_y <- mean(dat_o$IQ) %>% round2(2)
sd_x <- sd(dat_o$HeadSize) %>% round2(2)
sd_y <- sd(dat_o$IQ) %>% round2(2)
cov_xy <- cov(dat_o$HeadSize, dat_o$IQ) %>% round2(2)
r_xy <- cor(dat_o$HeadSize, dat_o$IQ) %>% round2(3)
slope <- (r_xy * (sd_y/sd_x)) %>% round2(3)
intercept <- (m_y - slope * m_x) %>% round2(2)
HSp <- 60.1
IQp <- (intercept + slope * HSp) %>% round2(2)
HSp2 <- 59.4
IQp2 <- (intercept + slope * HSp2) %>% round2(2)
R2 <- (r_xy^2) %>% round2(3)

mod <- lm(IQ ~ HeadSize, data = dat_o) %>% anova() %>% tidy()
mod_df1 <- mod %>% filter(term == "HeadSize") %>% pull(df)
mod_df2 <- mod %>% filter(term == "Residuals") %>% pull(df)
mod_F <- mod %>% filter(term == "HeadSize") %>% pull(statistic) %>% round2(2)
mod_p <- ifelse(mod %>% filter(term == "HeadSize") %>% pull(p.value) < .001, 
                "< .001", 
                paste0("= ", mod %>% filter(term == "HeadSize") %>% pull(p.value) %>% round2(3)))


knitr::kable(dat_o, align = "c")
```

Now in order to determine how well Head Size predicts IQ we need to know the underlying formula for this type of analysis and use that to calculate elements such as the slope of the relationship and the intercept. You might remember from previous years that a line in a two-dimensional coordinate system (i.e. two dimensional space made of variables X and Y) can be described by a linear equation (model) of the form:

$$Y = b_{0} + b_{1}X + error$$
 
Where:

* $Y$ is the value on the outcome variable Y 
* $X$ is the value on the predictor variable X
* $b_{0}$ is the intercept (or regression constant) – pronounced beta-zero - and stated as the value of $Y$ when $X = 0$ when X = 0
* $b_{1}$ is the slope (or regression coefficient) – pronounced beta-one - and stated as the change in $Y$ associated with a one-unit increase in $X$
* $error$ is a measure of your residuals which are the difference between an actual value and a predicted value

Before getting into the full simple linear regression however we will look at making predictions from this linear model formula using the slope and the intercept, based on the following equations.

**the slope:**

$$b_{1} = \frac{cov_{(x, y)}}{s^2_{x}}$$

Or alternatively it can be written as,

$$b_{1} = r_{(x,y)} \times \frac{s_{y}}{s_{x}}$$

**the intercept**

$$b_{0} = \overline{y} - b_{1} \times \overline{x}$$

So lets spend some time looking at those two equations in turn.

## The slope:

As above, the formula for the slope is:

$$b_{1} = \frac{cov_{(x, y)}}{s^2_{x}}$$

which is stated as the covariance of x and y ($cov_{(x,y)}$) divided by the variance of x ($s^2_{x}$). If we translate that into outcome and predictors then, given our Head Size is the predictor ($x$) and the IQ is the outcome ($y$), that becomes:

$$b_{1} = \frac{cov_{(HeadSize, IQ)}}{s^2_{HeadSize}}$$

But this can also be written in terms of the correlation between the two variables as opposed to the covariance, and this would look like:

$$b_{1} = r_{(x,y)} \times \frac{s_{y}}{s_{x}}$$

Which is stated as the correlation between x and y ($r_{(x,y)}$) multiplied by the standard deviation of y ($s_{y}$) divided by the standard deviation of x ($s_{x}$). [Remember that $s$ is the standard deviation of a variable, and $s^2$ is the variance of the variable.]

And again we can translate that back into terms of HeadSize ($x$) and IQ ($y$) as:

$$b_{1} = r \times \frac{s_{IQ}}{s_{HeadSize}}$$

Now if we think back to the correlation chapter we actually do know all the values we need, or at least can calculate them. Here is what we know from the correlation chapter:

* $cov_{(x,y)}$ = `r cov_xy`
* $s_{HeadSize}$ = `r sd_x`
* $s_{IQ}$ = `r sd_y`
* $r_{(x,y)}$ = `r r_xy`

And from that we can calculate the variance of HeadSize ($s^2_{HeadSize}$) and IQ ($s^2_{IQ}$), based on $s^2 = s \times s$, meaning:

* $s^2_{HeadSize}$ = $`r sd_x` \times `r sd_x`$ = `r sd_x^2`
* $s^2_{IQ}$ = $`r sd_y` \times `r sd_y`$ = `r sd_y^2`

So we could really use either formula for the slope as we have all the information, but for now let's use:

$$b_{1} = r \times \frac{s_{IQ}}{s_{HeadSize}}$$

And if we start to feed in the values, remembering that the IQ is ($y$ - the outcome) and HeadSize is ($x$ - the predictor) we get:

$$b_{1} = `r r_xy` \times \frac{`r sd_y`}{`r sd_x`}$$

And then if we sort out the fraction first, that becomes:

$$b_{1} = `r r_xy` \times `r sd_y/sd_x`$$

Which leads down to:

$$b_{1} = `r r_xy * (sd_y/sd_x)`$$

Giving a slope of $b_{1}$ = `r slope`, to three decimal places, meaning that for a 1 unit change in $x$ we get a `r slope` unit change in $y$. Or in other words, for a 1 unit change in HeadSize we get a `r slope` unit change in IQ.

## The intercept

Great. So now we know the slope of the regression line between HeadSize and IQ, what we need next is the intercept.  The formula for the intercept, as above, is:

$$b_{0} = \overline{y} - b_{1} \times \overline{x}$$

Which is stated as the mean of y ($\overline{y}$ - in this case IQ, our outcome) minus the slope ($b_{1}$) multiplied by the mean of x ($\overline{x}$ - in this case HeadSize, our predictor).

Looking back at the correlation chapter, and above, we know:

* $b_{1}$ = `r slope`
* $\overline{x}$ = `r m_x`
* $\overline{y}$ = `r m_y`

And if we substitute those values into the formula we get:

$$b_{0} = `r m_y` - `r slope` \times `r m_x`$$

Remembering BODMAS we need to do the multiplication first, which gives us:

$$b_{0} = `r m_y` - `r slope * m_x`$$

And we then complete the formula by doing the subtraction giving us:

$$b_{0} = `r m_y - slope * m_x`$$

Meaning that we have an intercept of $b_{0}$ = `r intercept`, to two decimal places. And if we remember that the intercept is the value of $y$ when $x$ is zero, then this tells us that when $x$ = 0, the value of $y$ is $y$ = `r intercept`. Or stated in terms of HeadSize and IQ, when Headsize is 0, then IQ is `r intercept`.  

## Making a Prediction

Now based on the information we have gained above we can actually use that to make predictions about a person we have not measured based on the formula:

$$\hat{Y} = b_{0} + b_{1}X$$

Well technically it is 

$$\hat{Y} = b_{0} + b_{1}X + error$$

But we will disregard the error for now. Also you might have noticed the small hat about the $Y$ making it $\hat{Y}$ (pronounced Y-hat). This means we are making a prediction of Y ($\hat{Y}$) as opposed to an actually measured (i.e. observed) value ($Y$).

Now let's say we want to make a prediction about someone who has a HeadSize of 60.1. Then, using the information above, we know:

* the slope, $b_{1}$ = `r slope`
* the intercept, $b_{0}$ = `r intercept`
* and the HeadSize, $X$ = `r HSp` 

If we put all that into the formula we would get:

$$\hat{Y} = `r intercept` + `r slope` \times `r HSp`$$

And if we start to work that through, dealing with the multiplication first, we see:

$$\hat{Y} = `r intercept` + `r slope * HSp`$$

Which then becomes:

$$\hat{Y} = `r intercept + slope * HSp`$$

Giving a predicted value of $\hat{Y}$ = `r IQp`, to two decimal places. Meaning that for a participant with a HeadSize of `r HSp` we would predict an IQ of `r IQp`. And you can use the same approach for any number of predictions with this model. Say you measured someone with a HeadSize of `r HSp2` cm, then they would be:

$$\hat{Y} = `r intercept` + `r slope` \times `r HSp2` = `r intercept + slope * HSp2`$$

Meaning that for a participant with a HeadSize of `r HSp2` we would predict an IQ of `r IQp2`.

## The R-squared 

Obviously the question now is how good is our model at making these predictions, and one measure of that is $R^2$ (pronounced R-squared). This relates to the $error$ term that we saw in the original formula, i.e. $\hat{Y} = b_{0} + b_{1}X + error$, and as we said at the start it relates to the residuals. The residuals you might know are the difference between predicted values and observed values. For example, if we look at Participant 1 in our data we see they had a HeadSize of `r dat_o %>% slice(1) %>% pull(HeadSize)` and an IQ of `r dat_o %>% slice(1) %>% pull(IQ)`.  Those are our observed values of $X$ and $Y$ for that participant, but what if we predicted a value for that participant instead ($\hat{Y}$). Using the above formula we would see:

$$\hat{Y} = `r intercept` + `r slope` \times `r dat_o %>% slice(1) %>% pull(HeadSize)` = `r intercept + slope * (dat_o %>% slice(1) %>% pull(HeadSize))`$$

So we would predict for Participant 1, with a HeadSize of `r dat_o %>% slice(1) %>% pull(HeadSize)`, to have an IQ of `r intercept + slope * (dat_o %>% slice(1) %>% pull(HeadSize))`. And if we compare that to the actually observed value for that participant ($Y = `r dat_o %>% slice(1) %>% pull(IQ)`$) we see a difference of `r abs((intercept + slope * (dat_o %>% slice(1) %>% pull(HeadSize))) - (dat_o %>% slice(1) %>% pull(IQ)))` IQ points (i.e. $\hat{Y} - Y = `r abs((intercept + slope * (dat_o %>% slice(1) %>% pull(HeadSize))) - (dat_o %>% slice(1) %>% pull(IQ)))`$). And that difference is our residuals for that person - the left over part between the observed and the predicted. If we were then to check that for all the observed participants we would get an overall value that summarises how big or small our residuals are, and the smaller the residuals are, the better the model is at predicting values of $Y$ from $X$, because there is little difference between our predicted values and our observed values. And that is what our $R^2$ tells us. It is a measure of how good our model is at prediction, and a measure of how small our residuals are. $R^2$ can take values between 0 and 1 and simply put, the closer $R^2$ is to 1, the smaller our residuals and the better our model is at prediction.

Now there is a full approach to calculating the $R^2$ which we will show you in a little bit, but when you are dealing with simple linear regression as we are here, there is a quick approach that gives the same answer, and it is:

$$R^2 = r_{(x,y)} \times r_{(x,y)}$$

where $r_{(x,y)}$ is the correlation between the two variables, in this case Headsize ($x$) and IQ ($y$). And if we look above we see we already know the correlation between HeadSize and IQ, as we calculated it in the correlation chapter. The correlation between HeadSize and IQ is $r_{(HeadSize, IQ)} = `r r_xy`$. And if we put that into our formula for $R^2$ we see:

$$R^2 = `r r_xy` \times `r r_xy` = `r r_xy * r_xy`$$

Meaning that we have an $R^2$ of $R^2 = `r R2`$, to three decimal places. And given that that is quite close to an $R^2$ of 1, it would suggest that our model is very good at predicting IQ from HeadSize (Y from X), that our residuals are small, and that there is a strong positive relationship between the two variables.

## The Write-up

Ok great, so we have now got a lot of information on our model including the slope, the intercept, and the $R^2$. The one thing we don't actually know is where our model is significant. We will go into what that means and how to determine that in a later part, but in the meantime we will just tell you that the model is significant, (F(`r mod_df1`, `r mod_df2`) = `r mod_F`, p `r mod_p`). So now we have everything we need for the write up:

* The model is significant - F(`r mod_df1`, `r mod_df2`) = `r mod_F`, p `r mod_p`
* The $R^2$ = `r R2` is very high

The only other element we need is the coefficent - the relationship between the two variables - which can take two versions; unstandardised and standardised.  But, we actually do already know these. We just called them something else:

* what we call **the slope**, above, is the **unstandardised coefficient** and is written as $b$ = `r slope`
* what we called **the correlation** between the variables is actually the **standardised coefficient** and is written as $\beta$ = `r r_xy`

And it is totally acceptable to write-up either the standardised coefficient ($\beta$) or the unstandardised coefficient ($b$). The main thing is to make sure you use the right symbol with the right value and don't mix them up as that will mislead a reader.

Now if we take all the information we have, including some information from the correlation chapter, we can put it into a blurb as such:

**A team of researchers were interested in the relationship between Head Size and IQ, and specifically whether they could predict IQ from Head Size. The researchers set out the hypothesis that a linear model based on head size, as measured in cms, will significantly predict IQ scores as measured on a standard IQ test. Descriptive analysis suggested a strong positive relationship between Head Size (M = `r m_x`, SD = `r sd_x`) and IQ (M = `r m_y`, SD = `r sd_y`), (r(`r mod_df2`) = `r r_xy`). On analysis, a linear regression model revealed that head size (measured in cm) significantly predicted participant scores on an IQ test ($\beta$ = `r r_xy`, F(`r mod_df1`, `r mod_df2`) = `r mod_F`, p `r mod_p`, $R^2$ = `r R2`), with head size explaining `r R2*100`% of the variance in IQ scores. As such the alternative hypothesis was accepted suggesting that...**

But I guess it might look a bit odd as we give the standardised coeffiencient twice, just in different ways, so we could write it with the unstandardised coefficient as:

**A team of researchers were interested in the relationship between Head Size and IQ, and specifically whether they could predict IQ from Head Size. The researchers set out the hypothesis that a linear model based on head size, as measured in cms, will significantly predict IQ scores as measured on a standard IQ test. Descriptive analysis suggested a strong positive relationship between Head Size (M = `r m_x`, SD = `r sd_x`) and IQ (M = `r m_y`, SD = `r sd_y`), (r(`r mod_df2`) = `r r_xy`), with head size explaining `r R2*100`% of the variance in IQ scores. On analysis, a linear regression model revealed that head size (measured in cm) significantly predicted participant scores on an IQ test ($b$ = `r slope`, F(`r mod_df1`, `r mod_df2`) = `r mod_F`, p `r mod_p`, $R^2$ = `r R2`). As such the alternative hypothesis was accepted suggesting that...**

## Test Yourself

Here we are going to try out a few examples based on the above. First we will use the above numbers we have calculated to make some predictions to check our understanding. So, assuming the values of `r slope` for the slope, and `r intercept` for the intercept, try to answer the following questions:

```{r ty1, message=FALSE, warning=FALSE, echo=FALSE}
dat_o <- tibble(Participants = 1:10,
              HeadSize = c(50.8, 63.5, 45.7, 25.4, 29.2, 49.5, 38.1, 30.5, 35.6, 58.4),
              IQ = c(107, 121, 106, 72, 85, 105, 93, 88, 97, 123))

m_x <- mean(dat_o$HeadSize) %>% round2(2)
m_y <- mean(dat_o$IQ) %>% round2(2)
sd_x <- sd(dat_o$HeadSize) %>% round2(2)
sd_y <- sd(dat_o$IQ) %>% round2(2)
cov_xy <- cov(dat_o$HeadSize, dat_o$IQ) %>% round2(2)
r_xy <- cor(dat_o$HeadSize, dat_o$IQ) %>% round2(3)
slope <- (r_xy * (sd_y/sd_x)) %>% round2(3)
intercept <- (m_y - slope * m_x) %>% round2(2)
HSp <- 60.1
IQp <- (intercept + slope * HSp) %>% round2(2)
HSp2 <- 59.4
IQp2 <- (intercept + slope * HSp2) %>% round2(2)
HSp3 <- 37.8
IQp3 <- (intercept + slope * HSp3) %>% round2(2)
R2 <- (r_xy^2) %>% round2(3)

mod <- lm(IQ ~ HeadSize, data = dat_o) %>% anova() %>% tidy()
mod_df1 <- mod %>% filter(term == "HeadSize") %>% pull(df)
mod_df2 <- mod %>% filter(term == "Residuals") %>% pull(df)
mod_F <- mod %>% filter(term == "HeadSize") %>% pull(statistic) %>% round2(2)
mod_p <- ifelse(mod %>% filter(term == "HeadSize") %>% pull(p.value) < .001, 
                "< .001", 
                paste0("= ", mod %>% filter(term == "HeadSize") %>% pull(p.value) %>% round2(3)))
```

* To two decimal places, what would be the predicted value of IQ if the HeadSize was `r HSp3`? `r longmcq(sample(c(answer = IQp3, IQp2, IQp)))`

`r hide("Show the working")`

If the HeadSize is `r HSp3` and the slope is `r slope` and intercept is `r intercept`, then using the formula:

$$\hat{Y} = b_{0} + b_{1}X + error$$
If we fill in the details:

$$\hat{Y} = `r intercept` + `r slope` \times `r HSp3`$$

And if we start to work that through, dealing with the multiplication first, we see:

$$\hat{Y} = `r intercept` + `r slope * HSp3`$$

Which then becomes:

$$\hat{Y} = `r intercept + slope * HSp3`$$

Giving a predicted value of $\hat{Y}$ = `r IQp3`, to two decimal places.

`r unhide()`

```{r ty2, message=FALSE, warning=FALSE, echo=FALSE}
HSp4 <- 24.2
IQp4 <- (intercept + slope * HSp4) %>% round2(2)
HSp5 <- 52.9
IQp5 <- (intercept + slope * HSp5) %>% round2(2)
HSp6 <- 48.4
IQp6 <- (intercept + slope * HSp6) %>% round2(2)

```

* To two decimal places, what would be the predicted value of IQ if the HeadSize was `r HSp4`? `r longmcq(sample(c(answer = IQp4, IQp2, IQp3)))`

`r hide("Show the working")`

If the HeadSize is `r HSp4` and the slope is `r slope` and intercept is `r intercept`, then using the formula:

$$\hat{Y} = b_{0} + b_{1}X + error$$
If we fill in the details:

$$\hat{Y} = `r intercept` + `r slope` \times `r HSp4`$$

And if we start to work that through, dealing with the multiplication first, we see:

$$\hat{Y} = `r intercept` + `r slope * HSp4`$$

Which then becomes:

$$\hat{Y} = `r intercept + slope * HSp4`$$

Giving a predicted value of $\hat{Y}$ = `r IQp4`, to two decimal places.

`r unhide()`


* To two decimal places, what would be the predicted value of IQ if the HeadSize was `r HSp5`? `r longmcq(sample(c(answer = IQp5, IQp3, IQp2)))`

`r hide("Show the working")`

If the HeadSize is `r HSp5` and the slope is `r slope` and intercept is `r intercept`, then using the formula:

$$\hat{Y} = b_{0} + b_{1}X + error$$
If we fill in the details:

$$\hat{Y} = `r intercept` + `r slope` \times `r HSp5`$$

And if we start to work that through, dealing with the multiplication first, we see:

$$\hat{Y} = `r intercept` + `r slope * HSp5`$$

Which then becomes:

$$\hat{Y} = `r intercept + slope * HSp5`$$

Giving a predicted value of $\hat{Y}$ = `r IQp5`, to two decimal places.

`r unhide()`

* To two decimal places, what would be the predicted value of IQ if the HeadSize was `r HSp6`? `r longmcq(sample(c(answer = IQp6, IQp2, IQp4)))`

`r hide("Show the working")`

If the HeadSize is `r HSp6` and the slope is `r slope` and intercept is `r intercept`, then using the formula:

$$\hat{Y} = b_{0} + b_{1}X + error$$
If we fill in the details:

$$\hat{Y} = `r intercept` + `r slope` \times `r HSp6`$$

And if we start to work that through, dealing with the multiplication first, we see:

$$\hat{Y} = `r intercept` + `r slope * HSp6`$$

Which then becomes:

$$\hat{Y} = `r intercept + slope * HSp6`$$

Giving a predicted value of $\hat{Y}$ = `r IQp6`, to two decimal places.

`r unhide()`