# Simple Linear Regression


The most common reason to produce a simple linear regression model is to represent the relationship between some outcome variable and input variable(s). We may do this to predict the outcome variable from the input variables (prediction) or to understand how the outcome variable relates to the input variables (inference).

We assume that the $i$-th observation can be represented as: 

$$y_{i} = \beta_{0} + \beta_{1} x_{i} + \epsilon_{i}$$

We can see that this relationship is linear with respect to the coefficients ($\beta_{j}$ terms).

The first part, $\beta_{0} + \beta_{1} x_{i}$, describes the linear trend relating $y$ and $x$ across all observations. The error term $\epsilon_{i}$ allows individual observations to deviate from that trend. For a given choice of $\beta_{0}$ and $\beta_{1}$, our prediction for the $i$-th observation is ${\hat y}_{i} = \beta_{0} + \beta_{1} x_{i}$.

The most basic assumptions of linear regression are that the $\epsilon_{i}$ are random and the $x_{i}$ are fixed.

Generally, we also assume that the $\epsilon_{i}$ are IID with $\mathbb{E} (\epsilon_{i})=0$ and $Var(\epsilon_{i}) = \sigma^{2}$ for all observations $i$.
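To make these assumptions concrete, here is a minimal Python sketch that simulates data from the model with fixed $x_{i}$ and random errors. The Normal error distribution, the seed, and the parameter values ($\beta_{0} = 1$, $\beta_{1} = 2$, $\sigma = 1$) are illustrative assumptions; the text only requires IID errors with mean 0 and variance $\sigma^{2}$.

```python
import random

def simulate(x, beta0, beta1, sigma, seed=0):
    """Draw y_i = beta0 + beta1 * x_i + eps_i with IID eps_i ~ Normal(0, sigma^2).

    The x values are treated as fixed; only the errors are random.
    Normality is an extra assumption for illustration only.
    """
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, sigma) for _ in x]
    y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, eps)]
    return y, eps

x = [i / 100 for i in range(10_000)]  # fixed design points
y, eps = simulate(x, beta0=1.0, beta1=2.0, sigma=1.0)

# Sample mean and variance of the errors should be close to 0 and sigma^2 = 1
mean_eps = sum(eps) / len(eps)
var_eps = sum((e - mean_eps) ** 2 for e in eps) / (len(eps) - 1)
print(round(mean_eps, 3), round(var_eps, 3))
```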

Notice that in simple linear regression, we only have one predictor variable of interest.

Our prediction ${\hat y}_{i}$ will usually be higher or lower than the actual value $y_{i}$ (and occasionally equal to it); the difference $y_{i} - {\hat y}_{i}$ is our error. Given our input and output variables, we want to find the $\beta_{0}$ and $\beta_{1}$ that minimize some cost (loss) function, which quantifies how far our predictions are from the actual values. The two most common choices are the sum (or mean) of squared errors and the sum (or mean) of absolute errors.

Sum of squared errors:

$$\sum_{i=1}^{n} (y_{i} - {\hat y}_{i} )^{2} = \sum_{i=1}^{n} (y_{i} - (\beta_{0} + \beta_{1} x_{i} ) )^{2}$$


Sum of absolute errors:

$$\sum_{i=1}^{n} | y_{i} - {\hat y}_{i} | = \sum_{i=1}^{n} | y_{i} - (\beta_{0} + \beta_{1} x_{i} ) |$$

Again, the goal is to choose $\beta_{0}$ and $\beta_{1}$ to minimize the loss function (SSE or SAE).
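A small sketch of the two loss functions, evaluated for two candidate pairs of coefficients. The helper names `sse` and `sae` and the toy data are our own, for illustration only.

```python
def sse(x, y, b0, b1):
    """Sum of squared errors for the line y_hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def sae(x, y, b0, b1):
    """Sum of absolute errors for the line y_hat = b0 + b1 * x."""
    return sum(abs(yi - (b0 + b1 * xi)) for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 7.0]

print(sse(x, y, 0.0, 2.0), sae(x, y, 0.0, 2.0))  # residuals 0, 0, 1 -> 1.0 1.0
print(sse(x, y, 0.0, 1.5), sae(x, y, 0.0, 1.5))  # residuals 0.5, 1, 2.5 -> 7.5 4.0
```

The candidate with the smaller loss is preferred; least squares and least absolute errors differ only in how they weight large residuals.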

If our cost function is the sum of squared errors, the estimates minimize it jointly over both coefficients:

$$({\hat \beta}_{0}, {\hat \beta}_{1}) = \arg\min_{\beta_{0}, \beta_{1}} \sum_{i=1}^{n} (y_{i} - (\beta_{0} + \beta_{1} x_{i} ) )^{2}$$

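Using calculus, we can show that ${\hat \beta}_{0} = {\bar y} - {\hat \beta}_{1} {\bar x}$ and ${\hat \beta}_{1} = \frac{ \sum_{i} (x_{i} - {\bar x})(y_{i} - {\bar y}) }{ \sum_{i} (x_{i} - {\bar x})^{2} } = \frac{Cov(x, y)}{ Var(x)}$, where $Cov$ and $Var$ are the sample covariance and sample variance (their common $\frac{1}{n-1}$ factors cancel).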

### Example 1 

Show that ${\hat \beta}_{0} = {\bar y} - {\hat \beta}_{1} {\bar x}$ and ${\hat \beta}_{1} = \frac{ \sum_{i} (x_{i} - {\bar x})(y_{i} - {\bar y}) }{ \sum_{i} (x_{i} - {\bar x})^{2} } = \frac{Cov(x, y)}{ Var(x)}$


**Answer:** 

Let $S = \sum_{i=1}^{n} (y_{i} - (\beta_{0} + \beta_{1} x_{i} ) )^{2}$

Take the partial derivatives of $S$ with respect to $\beta_{0}$ and $\beta_{1}$, set each to 0, and solve for $\beta_{0}$ and $\beta_{1}$.

$$\frac{\partial S}{\partial \beta_{0}} = -2 \sum_{i=1}^{n} (y_{i} - \beta_{0} - \beta_{1} x_{i}) \overset{set}{=} 0$$

Dividing by $-2n$ gives ${\bar y} - \beta_{0} - \beta_{1} {\bar x} = 0$, so ${\beta}_{0} = {\bar y} - \beta_{1} {\bar x}$. At the minimizer, $\beta_{1}$ takes the value ${\hat \beta}_{1}$, so ${\hat \beta}_{0} = {\bar y} - {\hat \beta}_{1} {\bar x}$.

$$\frac{\partial S}{\partial \beta_{1}} = -2 \sum_{i=1}^{n} x_{i}(y_{i} - \beta_{0} - \beta_{1} x_{i}) \overset{set}{=} 0$$

$$0=\sum_{i=1}^{n} x_{i} y_{i} - {\beta}_{0} \sum_{i=1}^{n} x_{i} - \beta_{1} \sum_{i=1}^{n} x_{i}^{2}$$

Substitute the previous condition, ${\beta}_{0} = {\bar y} - \beta_{1} {\bar x}$, for $\beta_{0}$:

$$0=\sum_{i=1}^{n} x_{i} y_{i} -({\bar y} - \beta_{1} {\bar x}) \sum_{i=1}^{n} x_{i} - \beta_{1} \sum_{i=1}^{n} x_{i}^{2}$$

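Collecting the $\beta_{1}$ terms gives $\beta_{1} \left( \sum_{i=1}^{n} x_{i}^{2} - {\bar x} \sum_{i=1}^{n} x_{i} \right) = \sum_{i=1}^{n} x_{i} y_{i} - {\bar y} \sum_{i=1}^{n} x_{i}$. Solving, and writing ${\hat \beta}_{1}$ for the solution, we find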

$${\hat \beta}_{1} = \frac{ \sum_{i=1}^{n} x_{i} y_{i} - {\bar y} \sum_{i=1}^{n} x_{i} }{ \sum_{i=1}^{n} x_{i}^{2} - {\bar x} \sum_{i=1}^{n} x_{i} } = \frac{ \sum_{ i=1 }^{n} ( x_{i} - {\bar x} )(y_{i} - {\bar y}) }{ \sum_{i=1}^{n} ( x_{i}- {\bar x})^{2}}$$
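As a numerical sanity check (a sketch on made-up data, with hypothetical helper names), the estimates computed from the uncentered sums in the first form above should make both partial derivatives, $\frac{\partial S}{\partial \beta_{0}}$ and $\frac{\partial S}{\partial \beta_{1}}$, vanish up to floating-point rounding.

```python
def ols_fit_raw(x, y):
    """Least squares estimates using the uncentered sums."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sum_x = sum(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    b1_hat = (sum_xy - y_bar * sum_x) / (sum_xx - x_bar * sum_x)
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat

def gradient(x, y, b0, b1):
    """Partial derivatives of S = sum (y_i - b0 - b1 x_i)^2 w.r.t. b0 and b1."""
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    d_b0 = -2 * sum(resid)
    d_b1 = -2 * sum(xi * ri for xi, ri in zip(x, resid))
    return d_b0, d_b1

x = [0.5, 1.5, 2.0, 3.5, 4.0]
y = [1.1, 2.9, 3.2, 6.8, 7.1]
b0_hat, b1_hat = ols_fit_raw(x, y)
print(gradient(x, y, b0_hat, b1_hat))  # both entries ~0, up to rounding
```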


---

To see that the numerators of the two expressions agree, note that $\sum y_{i} = n{\bar y}$ and $\sum x_{i} = n{\bar x}$:

$$\sum_{ i=1 }^{n} ( x_{i} - {\bar x} )(y_{i} - {\bar y}) = \sum_{ i=1 }^{n} x_{i} y_{i} - \sum_{ i=1 }^{n} {\bar x} y_{i} - \sum_{ i=1 }^{n} {\bar y} x_{i} + n {\bar x} {\bar y} = \sum_{ i=1 }^{n} x_{i} y_{i}  - \sum_{ i=1 }^{n} {\bar y} x_{i} - n{\bar y} {\bar x}+n{\bar y} {\bar x} = \sum_{ i=1 }^{n} x_{i} y_{i} - {\bar y} \sum_{ i=1 }^{n} x_{i}$$

You can also verify that $\sum_{i=1}^{n} x_{i}^{2} - {\bar x} \sum_{i=1}^{n} x_{i} = \sum_{i=1}^{n} ( x_{i}- {\bar x})^{2}$
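Both identities above can also be spot-checked numerically on random data. This is a sketch with an arbitrary seed, not a proof.

```python
import random

# Arbitrary seeded random data for a spot check
rng = random.Random(1)
x = [rng.uniform(-5.0, 5.0) for _ in range(50)]
y = [rng.uniform(-5.0, 5.0) for _ in range(50)]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Numerator: centered form vs. uncentered form
num_centered = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
num_raw = sum(xi * yi for xi, yi in zip(x, y)) - y_bar * sum(x)

# Denominator: centered form vs. uncentered form
den_centered = sum((xi - x_bar) ** 2 for xi in x)
den_raw = sum(xi * xi for xi in x) - x_bar * sum(x)

print(num_centered, num_raw)
print(den_centered, den_raw)
```

The two forms are algebraically equal, but the uncentered form subtracts two large, nearly equal quantities when the data have a large mean relative to their spread, so the centered form is usually the safer one to compute in floating point.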


---


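Note: Minimizing the sum of absolute errors (also called L1 loss) is harder because $|u|$ is not differentiable at $u = 0$, so the L1 loss has kinks wherever a residual is zero, and its minimum typically occurs at such a kink. Setting derivatives to zero therefore does not work, and there is generally no closed-form solution; the minimizer is found numerically instead. The sum of squared errors (also called L2 loss) is differentiable everywhere, so we can use calculus to find the minimizing $\beta_{0}$ and $\beta_{1}$ in closed form.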
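Because there is no closed form for the L1 fit, one simple (and slow) numerical approach is brute force. This sketch relies on the standard fact that, when the $x_{i}$ are not all equal, some line minimizing the sum of absolute errors passes through at least two of the data points, so it suffices to check every such line. The function name `lad_fit_bruteforce` and the data are hypothetical; practical implementations typically use linear programming instead.

```python
from itertools import combinations

def sae(x, y, b0, b1):
    """Sum of absolute errors for the line y_hat = b0 + b1 * x."""
    return sum(abs(yi - (b0 + b1 * xi)) for xi, yi in zip(x, y))

def lad_fit_bruteforce(x, y):
    """Minimize SAE by checking every line through two data points.

    O(n^3) overall; for illustration only.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not of the form b0 + b1 * x
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        loss = sae(x, y, b0, b1)
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    return best[1], best[2]

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 100.0]  # last point lies far off the line y = 1 + 2x
b0_l1, b1_l1 = lad_fit_bruteforce(x, y)
print(b0_l1, b1_l1)  # 1.0 2.0 (the line through the four collinear points)
```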

### Example 2 


Show that ${\hat \beta}_{1} = {\hat \rho}_{x,y} \frac{ {\hat \sigma_{y}} }{ \hat \sigma_x }$ where ${\hat \rho}_{x,y}$ is the sample correlation, ${\hat \sigma_{y}}$ is the sample standard deviation of the $y$ values, and $\hat \sigma_x$ is the sample standard deviation of the $x$ values.

**Answer:** 

Recall that ${\hat \sigma}_{x}^{2} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{i} - {\bar x} )^{2}$ and, from Example 1, $\sum_{i=1}^{n} (x_{i} - {\bar x} )( y_{i} - {\bar y}) = {\hat \beta}_{1} \sum_{i=1}^{n} (x_{i} - {\bar x} )^{2}$. Substituting both into the definition of the sample correlation gives

$${\hat \rho}_{x,y} =\frac{ \sum_{i=1}^{n}  (x_{i} - {\bar x} )( y_{i} - {\bar y}) }{ (n-1) {\hat \sigma}_{x} {\hat \sigma}_{y} } = \frac{ {\hat \beta}_{1} \sum_{i=1}^{n} (x_{i} - {\bar x} )^{2} }{ (n-1) {\hat \sigma}_{x} {\hat \sigma}_{y} } = \frac{ {\hat \beta}_{1} (n-1) {\hat \sigma}_{x}^{2}  }{ (n-1) {\hat \sigma}_{x} {\hat \sigma}_{y} } = {\hat \beta}_{1} \frac{ {\hat \sigma}_{x} }{ {\hat \sigma}_{y} }$$

Solving for ${\hat \beta}_{1}$:

$${\hat \beta}_{1} = {\hat \rho}_{x,y} \frac{\hat \sigma_{y}}{\hat \sigma_{x}}$$

### Example 3

a. Suppose that we have $y_{i} = \alpha + \epsilon_{i}$ where $i=1,..., n$ and $\epsilon_{i}$ are independent errors with $E[\epsilon_{i}] = 0$ and $Var(\epsilon_{i}) = \sigma^{2}$. We are estimating our $y$ values with no input variables. Find the least squares estimate of $\alpha$.


**Answer:** 

To find the least squares estimate, we want to find the following: $\hat \alpha = \arg\min_{\alpha} \sum_{i=1}^{n}  (y_{i} - \alpha)^{2}$

Simply differentiate with respect to $\alpha$, set to $0$, and solve for $\alpha$ to get ${\hat \alpha}$: 

$$-2 \sum_{i=1}^{n} (y_{i} - {\hat \alpha} ) = 0$$

$$\sum_{i=1}^{n} y_{i} = n {\hat \alpha}$$

$${\hat \alpha} = \frac{1}{n} \sum_{i=1}^{n} y_{i} = {\bar y}$$
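A sketch confirming on toy data that ${\hat \alpha} = {\bar y}$ gives a smaller sum of squared errors than nearby values. In fact, since $\sum (y_{i} - {\bar y}) = 0$, we have $\sum (y_{i} - ({\bar y} + \delta))^{2} = \sum (y_{i} - {\bar y})^{2} + n\delta^{2}$. The helper name is our own.

```python
import statistics

def sse_constant(y, alpha):
    """Sum of squared errors when every y_i is predicted by the constant alpha."""
    return sum((yi - alpha) ** 2 for yi in y)

y = [3.0, 7.0, 4.0, 10.0]
alpha_hat = statistics.fmean(y)  # sample mean

# Compare the loss at alpha_hat with values 0.5 on either side
for alpha in (alpha_hat - 0.5, alpha_hat, alpha_hat + 0.5):
    print(alpha, sse_constant(y, alpha))
# 5.5 31.0
# 6.0 30.0
# 6.5 31.0
```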


b. Suppose that we have $y_{i} = \beta x_{i} + \epsilon_{i}$ where $i=1,..., n$ and $\epsilon_{i}$ are independent errors with $E[\epsilon_{i}] = 0$ and $Var(\epsilon_{i}) = \sigma^{2}$. We are estimating our $y$ values with an input variable and assuming that the intercept is $0$. Find the least squares estimate of $\beta$.


**Answer:** 

To find the least squares estimate, we want to find the following: $\hat \beta = \arg\min_{\beta} \sum_{i=1}^{n}  (y_{i} - \beta x_{i})^{2}$

Simply differentiate with respect to $\beta$, set to $0$, and solve for $\beta$ to get ${\hat \beta}$: 

$$-2 \sum_{i=1}^{n} x_{i} (y_{i} - {\hat \beta} x_{i} ) = 0$$

$$\sum_{i=1}^{n} x_{i} (y_{i} - {\hat \beta} x_{i} ) = 0$$

$${\hat \beta} = \frac{ \sum_{i=1}^{n} x_{i} y_{i} }{ \sum_{i=1}^{n} x_{i}^2 }$$
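A sketch of the no-intercept estimate on made-up data, checking that it satisfies the derivative condition $\sum x_{i}(y_{i} - {\hat \beta} x_{i}) = 0$ up to rounding. The function name is our own.

```python
def slope_through_origin(x, y):
    """Least squares slope for the model y = beta * x (intercept fixed at 0)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

x = [1.0, 2.0, 3.0]
y = [2.0, 3.0, 7.0]
beta_hat = slope_through_origin(x, y)  # 29 / 14

# Normal equation from setting the derivative to zero
residual_check = sum(xi * (yi - beta_hat * xi) for xi, yi in zip(x, y))
print(beta_hat, residual_check)  # residual_check is ~0
```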



