
# Linear Regression
## Simple Linear Regression
**Linear Model**. Simple linear regression assumes that there is approximately a linear relationship between $X$ and $Y$. We describe this as *regressing* $Y$ *on* $X$ (or $Y$ *onto* $X$) and write:
$$Y \approx \beta_0+\beta_1X, \qquad (1)$$
where $\beta_0$ and $\beta_1$ are two unknown constants that represent the *intercept* and *slope*, together known as the model *coefficients* or *parameters*.

**Least Squares Line**. Once we have used our training data to produce estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ for the model coefficients, we can compute:
$$\hat{y} = \hat{\beta}_0+\hat{\beta}_1x, \qquad (2)$$
where $\hat{y}$ indicates a prediction of $Y$ on the basis of $X=x$.

**Residual Sum of Squares (RSS)**. Let $\hat{y}_i = \hat{\beta}_0+\hat{\beta}_1x_i$ be the prediction for $Y$ based on the $i$th value of $X$. Then $e_i = y_i - \hat{y}_i$ represents the $i$th *residual*: the difference between the $i$th observed response value and the $i$th response value predicted by our linear model. We define the *residual sum of squares* (RSS) as
$$RSS=e_1^2+e_2^2+...+e_n^2,$$
or equivalently as
$$RSS=(y_1-\hat{\beta}_0-\hat{\beta}_1x_1)^2+(y_2-\hat{\beta}_0-\hat{\beta}_1x_2)^2+...+(y_n-\hat{\beta}_0-\hat{\beta}_1x_n)^2. \qquad (3)$$

**Least Squares Regression Coefficient Estimates**. The least squares approach chooses $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimize the RSS. Using some calculus, one can show that the minimizers are

$$\hat{\beta}_1=\frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n(x_i-\bar{x})^2}, \quad \hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}, \qquad (4)$$

where $\bar{y}\equiv\frac{1}{n}\sum_{i=1}^ny_i$ and $\bar{x}\equiv\frac{1}{n}\sum_{i=1}^nx_i$ are the sample means. In other words, $(4)$ defines the *least squares coefficient estimates* for simple linear regression.
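The closed-form estimates in $(4)$ translate directly into a few lines of NumPy. The sketch below uses small made-up data (`x` and `y` are illustrative, not from the source) and cross-checks the result against `np.polyfit`, which fits the same least squares line.

```python
import numpy as np

def least_squares_fit(x, y):
    """Return (beta0_hat, beta1_hat) using the closed-form estimates in (4)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: the fitted line passes through the point (x_bar, y_bar)
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

# Toy data, made up for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b0, b1 = least_squares_fit(x, y)

# Cross-check: np.polyfit with deg=1 returns (slope, intercept)
slope, intercept = np.polyfit(x, y, deg=1)
print(b0, b1)  # approximately 0.05 and 1.99
```

For these toy values $\bar{x}=3$, $\bar{y}=6.02$, so $\hat{\beta}_1=19.9/10=1.99$ and $\hat{\beta}_0=6.02-1.99\cdot3=0.05$.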



### Assessing the Accuracy of the Coefficient Estimates
**Population Regression Line**. Suppose the true relationship between $X$ and $Y$ takes the form $Y=f(X)+\epsilon$ for some unknown function $f$, where $\epsilon$ is a mean-zero random error term. If $f$ is to be approximated by a linear function, then we can write this relationship as
$$Y = \beta_0+\beta_1X+\epsilon. \qquad (5)$$
Here $\beta_0$ is the intercept term, that is, the expected value of $Y$ when $X=0$, and $\beta_1$ is the slope, the average increase in $Y$ associated with a one-unit increase in $X$. The error term is a catch-all for what we miss with this simple model: the true relationship is probably not linear, there may be other variables that cause variation in $Y$, and there may be measurement error. We typically assume that the error term is independent of $X$.

**Example**. A population regression line with intercept $\beta_0=2$ and slope $\beta_1=3$:
$$Y=2 + 3X + \epsilon \qquad (6)$$
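One way to see what the population regression line $(6)$ means is to simulate from it. The sketch below is a minimal illustration under assumed settings that are not in the source: $X$ drawn uniformly on $[-2, 2]$, $\epsilon \sim N(0, 2^2)$, $n=100$, and 1,000 simulated data sets. Each data set yields a slightly different least squares line, but the estimates average out close to the true $\beta_0=2$ and $\beta_1=3$.

```python
import numpy as np

rng = np.random.default_rng(0)     # fixed seed for reproducibility
beta0, beta1 = 2.0, 3.0            # true coefficients from (6)
n, n_sims, sigma = 100, 1000, 2.0  # assumed sample size, number of data sets, noise sd

def fit(x, y):
    """Least squares estimates (4) for one data set."""
    x_bar, y_bar = x.mean(), y.mean()
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    return y_bar - b1 * x_bar, b1

estimates = np.empty((n_sims, 2))
for s in range(n_sims):
    x = rng.uniform(-2, 2, size=n)        # assumed distribution of X
    eps = rng.normal(0.0, sigma, size=n)  # assumed N(0, sigma^2) errors
    y = beta0 + beta1 * x + eps           # draw from the population line (6)
    estimates[s] = fit(x, y)

# Column means: average intercept and slope estimates across simulations
print(estimates.mean(axis=0))
```

Individual fits scatter around the true line; their average lands close to it because the least squares estimates are unbiased.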

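**Standard Error of $\hat{\mu}$**. Suppose we estimate the population mean $\mu$ of $Y$ by the sample mean $\hat{\mu}=\bar{y}$. Its variance and standard error satisfy
$$\text{Var}(\hat{\mu})=\text{SE}(\hat{\mu})^2=\frac{\sigma^2}{n}, \qquad (7)$$
where $\sigma$ is the standard deviation of each of the realizations $y_i$ of $Y$ (provided that the $n$ observations are uncorrelated). Roughly speaking, the standard error tells us the average amount by which $\hat{\mu}$ differs from the actual value of $\mu$, and $(7)$ shows that this deviation shrinks as $n$ grows.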
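Equation $(7)$ can be checked by simulation. This sketch assumes normally distributed observations with $\mu=5$, $\sigma=2$ and $n=50$ (all illustrative choices); the empirical spread of $\hat{\mu}$ across many repeated samples should be close to $\sigma/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, n_sims = 5.0, 2.0, 50, 20000  # assumed mean, sd, sample size, repetitions

# Each row is one sample of n uncorrelated observations
samples = rng.normal(mu, sigma, size=(n_sims, n))
mu_hats = samples.mean(axis=1)  # one sample mean per row

empirical_se = mu_hats.std(ddof=1)   # spread of mu_hat across repeated samples
theoretical_se = sigma / np.sqrt(n)  # square root of (7)
print(empirical_se, theoretical_se)
```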

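**Standard Errors associated with $\hat{\beta}_0$ and $\hat{\beta}_1$**.
$$\text{SE}(\hat{\beta}_0)^2=\sigma^2\Big[\frac{1}{n}+\frac{\bar{x}^2}{\sum_{i=1}^n(x_i-\bar{x})^2}\Big], \quad \text{SE}(\hat{\beta}_1)^2=\frac{\sigma^2}{\sum_{i=1}^n(x_i-\bar{x})^2}, \qquad (8)$$
where $\sigma^2=\text{Var}(\epsilon)$. For these formulas to be strictly valid, the errors $\epsilon_i$ must be uncorrelated with common variance $\sigma^2$. In general $\sigma$ is unknown and is estimated from the data by the *residual standard error* (RSE).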
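A Monte Carlo sketch can also be used to check $(8)$. It assumes the $x_i$ are held fixed while only the errors are redrawn, with true values $\beta_0=2$, $\beta_1=3$, $\sigma=2$ and $n=30$ chosen purely for illustration; $\sigma$ is treated as known here.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma = 2.0, 3.0, 2.0  # assumed true values
n, n_sims = 30, 20000
x = rng.uniform(0, 10, size=n)       # design held fixed across simulations (assumption)
x_bar = x.mean()
sxx = np.sum((x - x_bar) ** 2)

# Standard errors from (8), with sigma known
se_b0 = np.sqrt(sigma**2 * (1 / n + x_bar**2 / sxx))
se_b1 = np.sqrt(sigma**2 / sxx)

# Monte Carlo: many data sets that differ only in their error draws
eps = rng.normal(0.0, sigma, size=(n_sims, n))
Y = beta0 + beta1 * x + eps  # shape (n_sims, n), one data set per row
b1_hats = ((x - x_bar) * (Y - Y.mean(axis=1, keepdims=True))).sum(axis=1) / sxx
b0_hats = Y.mean(axis=1) - b1_hats * x_bar

print(se_b0, b0_hats.std(ddof=1))  # formula vs. empirical spread
print(se_b1, b1_hats.std(ddof=1))
```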

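Standard errors can be used to compute *confidence intervals*. For linear regression, a 95% confidence interval for $\beta_1$ approximately takes the form
$$\hat{\beta}_1 \pm 2 \cdot \text{SE}(\hat{\beta}_1). \qquad (9)$$
That is, there is approximately a 95% chance that the interval
$$\big[\hat{\beta}_1-2\cdot\text{SE}(\hat{\beta}_1), \hat{\beta}_1+2\cdot\text{SE}(\hat{\beta}_1) \big] \qquad (10)$$
will contain the true value of $\beta_1$. Similarly, an approximate 95% confidence interval for $\beta_0$ is
$$\hat{\beta}_0 \pm 2 \cdot \text{SE}(\hat{\beta}_0). \qquad (11)$$
The intervals are approximate because they rely on Gaussian errors, and the factor 2 stands in for the 97.5% quantile of a $t$-distribution with $n-2$ degrees of freedom.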
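In practice $\sigma$ in $(8)$ is replaced by its estimate from the data before forming the intervals $(9)$ to $(11)$. The function below sketches that procedure, assuming the usual estimate $\hat\sigma=\sqrt{\text{RSS}/(n-2)}$. `approx_95_ci` is a hypothetical helper name, and the data are simulated from $(6)$ with an assumed noise standard deviation of 2.

```python
import numpy as np

def approx_95_ci(x, y):
    """Approximate 95% intervals: estimate +/- 2 standard errors.

    sigma is unknown, so it is estimated by sqrt(RSS / (n - 2)).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b1 = np.sum((x - x_bar) * (y - y_bar)) / sxx  # slope from (4)
    b0 = y_bar - b1 * x_bar                       # intercept from (4)
    rss = np.sum((y - b0 - b1 * x) ** 2)
    sigma_hat = np.sqrt(rss / (n - 2))
    se_b1 = sigma_hat / np.sqrt(sxx)              # (8) with sigma_hat plugged in
    se_b0 = sigma_hat * np.sqrt(1 / n + x_bar**2 / sxx)
    return (b0 - 2 * se_b0, b0 + 2 * se_b0), (b1 - 2 * se_b1, b1 + 2 * se_b1)

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=100)
y = 2 + 3 * x + rng.normal(0, 2, size=100)  # simulated from (6), assumed sigma = 2
ci_b0, ci_b1 = approx_95_ci(x, y)
print(ci_b0, ci_b1)
```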

**Null Hypothesis**
$$H_0 : \text{There is no relationship between } X \text{ and } Y. \qquad (12)$$

**Alternative Hypothesis**
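$$H_1 : \text{There is some relationship between } X \text{ and } Y. \qquad (13)$$
Mathematically, this corresponds to testing $H_0:\beta_1=0$ versus $H_1:\beta_1\neq0$: if $\beta_1=0$, model $(5)$ reduces to $Y=\beta_0+\epsilon$, and $X$ is not associated with $Y$.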

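**t-statistic**
$$t=\frac{\hat{\beta}_1-0}{\text{SE}(\hat{\beta}_1)}, \qquad (14)$$
which measures the number of standard deviations that $\hat{\beta}_1$ is away from $0$. If there really is no relationship between $X$ and $Y$, we expect $(14)$ to follow a $t$-distribution with $n-2$ degrees of freedom.

**p-value**. The probability of observing a value equal to $|t|$ or larger in absolute value, assuming $\beta_1=0$. A small p-value indicates that such an association is unlikely to arise by chance alone, so we reject the null hypothesis and conclude that there is a relationship between $X$ and $Y$.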
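The t-statistic $(14)$ and its p-value can be computed by hand and compared with `scipy.stats.linregress`, whose `pvalue` reports the same two-sided test of a zero slope. The sketch assumes normally distributed errors (so that $t$ follows a $t$-distribution with $n-2$ degrees of freedom under $H_0$) and uses simulated data with an arbitrary true slope of 0.5.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 50
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)  # simulated data; true slope 0.5 is an assumption

x_bar, y_bar = x.mean(), y.mean()
sxx = np.sum((x - x_bar) ** 2)
b1 = np.sum((x - x_bar) * (y - y_bar)) / sxx
b0 = y_bar - b1 * x_bar
rse = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))  # estimate of sigma
se_b1 = rse / np.sqrt(sxx)

t = (b1 - 0) / se_b1                         # (14): distance of b1 from 0 in SEs
p_value = 2 * stats.t.sf(abs(t), df=n - 2)   # two-sided p-value under H0: beta1 = 0

# scipy's linregress performs the same two-sided slope test
res = stats.linregress(x, y)
print(t, p_value, res.pvalue)
```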

### Assessing the Accuracy of the Model

**Residual Standard Error**
$$\text{RSE}=... \qquad(15)$$
$$\text{RSS}=... \qquad(16)$$
**$R^2$ Statistic**
$$R^2=... \qquad (17)$$
$$\text{Cor}(X,Y)=... \qquad(18)$$
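A minimal sketch of the quantities in this subsection, using their standard textbook definitions (written out in the code comments). The data are simulated with an assumed noise standard deviation of 4. In simple linear regression, $R^2$ equals the squared sample correlation between $X$ and $Y$, which the final line compares.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(0, 10, size=n)
y = 2 + 3 * x + rng.normal(0, 4, size=n)  # simulated data, assumed noise sd = 4

b1, b0 = np.polyfit(x, y, deg=1)  # least squares slope and intercept
y_hat = b0 + b1 * x

rss = np.sum((y - y_hat) ** 2)     # RSS = sum of squared residuals
tss = np.sum((y - y.mean()) ** 2)  # TSS = total sum of squares
rse = np.sqrt(rss / (n - 2))       # RSE = sqrt(RSS / (n - 2))
r2 = 1 - rss / tss                 # R^2 = 1 - RSS / TSS
r = np.corrcoef(x, y)[0, 1]        # sample correlation Cor(X, Y)

print(rse, r2, r ** 2)             # R^2 and r^2 agree in simple regression
```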


## Multiple Linear Regression
Instead of fitting a separate simple linear regression model for each predictor, a better approach is to extend the simple linear regression model $(5)$ so that it can directly accommodate multiple predictors. We can do this by giving each predictor a separate slope coefficient in a single model. In general, suppose that we have $p$ distinct predictors. Then the multiple linear regression model takes the form
$$Y=\beta_0 +\beta_1X_1 +\beta_2X_2 +...+ \beta_pX_p + \epsilon, \qquad (19)$$
where $X_j$ represents the $j$th predictor and $\beta_j$ quantifies the association between that variable and the response. We interpret $\beta_j$ as the average effect on $Y$ of a one-unit increase in $X_j$, *holding all other predictors fixed*.


**Example**
$$\text{Sales}= \beta_0 + \beta_{TV}\times\text{TV} + \beta_{Radio}\times\text{Radio} + \beta_{Newspaper}\times\text{Newspaper} + \epsilon\qquad(20)$$

### Estimating the Regression Coefficients
As was the case in the simple linear regression setting, the regression coefficients $\beta_0,\beta_1,...,\beta_p$ in $(19)$ are unknown and must be estimated. Given estimates $\hat{\beta}_0,\hat{\beta}_1,...,\hat{\beta}_p$, we can make predictions using the formula
$$\hat{y}=\hat{\beta}_0 + \hat{\beta}_1x_1+\hat{\beta}_2x_2+...+\hat{\beta}_px_p.\qquad (21)$$

The parameters are estimated using the same least squares approach that we saw in the context of simple linear regression. We choose $\hat{\beta}_0,\hat{\beta}_1,...,\hat{\beta}_p$ to minimize the sum of squared residuals
$$\text{RSS}= \sum_{i=1}^n(y_i-\hat{y}_i)^2 = \sum_{i=1}^n(y_i-\hat{\beta}_0 -\hat{\beta}_1x_{i1} -\hat{\beta}_2x_{i2} - ... -\hat{\beta}_px_{ip} )^2.\qquad (22)$$
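Minimizing $(22)$ is an ordinary least squares problem, which NumPy solves with `np.linalg.lstsq`. In this sketch the three predictors and the true coefficients are hypothetical; a column of ones is prepended so that the first estimated coefficient plays the role of $\hat{\beta}_0$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 200, 3
X = rng.normal(size=(n, p))                   # three hypothetical predictors
beta_true = np.array([1.0, 2.0, -0.5, 0.0])   # assumed (beta0, beta1, beta2, beta3)
y = beta_true[0] + X @ beta_true[1:] + rng.normal(0, 0.1, size=n)

# Prepend a column of ones so the first coefficient is the intercept
X_design = np.column_stack([np.ones(n), X])

# lstsq minimizes ||y - X_design @ beta||^2, i.e. the RSS in (22);
# its second return value holds that minimized RSS
beta_hat, rss, rank, _ = np.linalg.lstsq(X_design, y, rcond=None)

y_hat = X_design @ beta_hat  # predictions as in (21)
print(beta_hat)
```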

### Some Important Questions
#### One: Is There a Relationship Between the Response and Predictors?
**F-statistic**
$$F=...\qquad (23)$$
$$F=...\qquad (24)$$
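As an assumption-labelled sketch, the code below uses the usual overall F-statistic for $H_0:\beta_1=\dots=\beta_p=0$, namely $F=\frac{(\text{TSS}-\text{RSS})/p}{\text{RSS}/(n-p-1)}$, on hypothetical data in which only the first of three predictors matters. The p-value is taken from the $F(p,\,n-p-1)$ distribution, which assumes normal errors.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = 0.5 + X @ np.array([1.0, 0.0, 0.0]) + rng.normal(size=n)  # hypothetical data

X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
rss = np.sum((y - X_design @ beta_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)

# Overall F-statistic for H0: beta_1 = ... = beta_p = 0
F = ((tss - rss) / p) / (rss / (n - p - 1))
p_value = stats.f.sf(F, p, n - p - 1)  # upper tail of F(p, n - p - 1)
print(F, p_value)
```

A large $F$ (far above 1) is evidence that at least one predictor is related to the response.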

#### Two: Deciding on Important Variables

#### Three: Model Fit
$$\text{RSE}= \qquad (25)$$

#### Four: Prediction

## Other Considerations in the Regression Model
### Qualitative Predictors
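**Predictors with Only Two Levels**. If a qualitative predictor (also known as a *factor*) has only two levels, or possible values, we create an indicator or *dummy variable* that takes on two possible numerical values, for example
$$x_i=
  \begin{cases}
  1 & \text{if the } i\text{th person owns a house} \\
  0 & \text{if the } i\text{th person does not own a house}
  \end{cases}, \qquad (26)$$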
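With the dummy variable $(26)$, the model is fit exactly like any other least squares regression. The sketch below uses made-up balances for ten hypothetical people. It illustrates that, with a single 0/1 predictor, the intercept estimate equals the mean response of the $x_i=0$ group and the slope estimate equals the difference between the two group means.

```python
import numpy as np

# Hypothetical balances: five house owners followed by five non-owners
balance = np.array([520., 610., 480., 700., 590., 450., 500., 430., 560., 410.])
owns_house = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])  # dummy variable x_i as in (26)

X_design = np.column_stack([np.ones(len(balance)), owns_house])
(b0, b1), *_ = np.linalg.lstsq(X_design, balance, rcond=None)

# b0 matches the mean of the x_i = 0 group; b1 matches the difference in group means
print(b0, balance[owns_house == 0].mean())
print(b1, balance[owns_house == 1].mean() - balance[owns_house == 0].mean())
```

Here the non-owners average 470 and the owners 580, so $\hat{\beta}_0=470$ and $\hat{\beta}_1=110$.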

$$y_i= \qquad (27)$$

**Qualitative Predictors with More than Two Levels**.
$$x_{i1}=...\qquad (28)$$
$$x_{i2}=...\qquad (29)$$
$$y_i= \qquad (30)$$

### Extensions of the Linear Model
#### Removing the Additive Assumption
$$Y=...\qquad (31)$$
$$Y=...\qquad (32)$$
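A common way to remove the additive assumption is to include an *interaction term*, the product of two predictors, as an extra column in the design matrix; the model is still fit by ordinary least squares. The sketch below simulates hypothetical data from $Y=1+0.5X_1+0.2X_2+0.3X_1X_2+\epsilon$ (coefficients chosen arbitrarily) and recovers the coefficients.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 300
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 10, size=n)
# Hypothetical data with a genuine interaction: the effect of x1 depends on x2
y = 1.0 + 0.5 * x1 + 0.2 * x2 + 0.3 * x1 * x2 + rng.normal(0, 1, size=n)

# The product column x1 * x2 is what relaxes the additive assumption
X_design = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta_hat)  # estimates of (beta0, beta1, beta2, beta3)
```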

**Example**
$$\text{sales}=...\qquad (33)$$

**Example**
$$\text{balance}_i\approx...\qquad (34)$$

**Example**
$$\text{balance}_i\approx...\qquad (35)$$

#### Non-linear Relationships
**Example**
$$\text{mpg}=...\qquad (36)$$
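Non-linear relationships can often be handled by adding transformed predictors, such as a squared term, to a linear model. The model remains linear in the coefficients, so least squares still applies. The sketch below uses *hypothetical* horsepower and mpg values generated from an assumed quadratic curve, not real vehicle data.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 300
horsepower = rng.uniform(50, 200, size=n)
# Hypothetical curved relationship (assumed coefficients)
mpg = 55.0 - 0.45 * horsepower + 0.0012 * horsepower**2 + rng.normal(0, 1.5, size=n)

# horsepower^2 enters as just another predictor column
X_design = np.column_stack([np.ones(n), horsepower, horsepower**2])
beta_hat, *_ = np.linalg.lstsq(X_design, mpg, rcond=None)
print(beta_hat)  # estimates of (intercept, linear term, quadratic term)
```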


### Potential Problems
1. Non-linearity of the Data
2. Correlation of Error Terms
3. Non-constant Variance of Error Terms
4. Outliers
5. High Leverage Points
$$h_i=...\qquad (37)$$
6. Collinearity
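For simple linear regression, the leverage statistic is commonly written as $h_i=\frac{1}{n}+\frac{(x_i-\bar{x})^2}{\sum_{i'=1}^n(x_{i'}-\bar{x})^2}$, which equals the $i$th diagonal entry of the hat matrix $X(X^\top X)^{-1}X^\top$. The sketch below assumes simulated data with one deliberately unusual $x$ value and checks both facts, along with the property that the leverages sum to $p+1=2$ (so their average is $(p+1)/n$).

```python
import numpy as np

rng = np.random.default_rng(10)
x = np.append(rng.normal(0, 1, size=49), 6.0)  # last point has an unusual x value
n = len(x)

# Leverage statistic for simple linear regression
h = 1 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)

# The same values appear on the diagonal of the hat matrix H = X (X^T X)^{-1} X^T
X_design = np.column_stack([np.ones(n), x])
H = X_design @ np.linalg.inv(X_design.T @ X_design) @ X_design.T

print(h[-1], h.mean())  # high-leverage point vs. the average leverage 2 / n
```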

## Comparison of Linear Regression with $K$-Nearest Neighbors