# Digging into Linear Regression - Part II
##### Author: Naveen Kaveti
##### Email: kaveti.naveenkumar@gmail.com

[Digging into Linear Regression - Part I](https://kavetinaveen.github.io/Learning.github.io/) covered Simple Linear Regression with an example, the math behind parameter estimation, the statistics behind generalizing results to the population (variance of the estimates and significance tests for them), and goodness-of-fit measures for validating the model. If you're already familiar with these topics, please read on; otherwise I suggest having a look at Part I first.

## Multiple Linear Regression (MLR)

In multiple linear regression we consider more than one predictor and a single dependent variable. Most of the explanation in Part I is valid for MLR too.

### Example: Car's MPG (Miles Per Gallon) prediction

Our interest is to model the MPG of a car based on the other variables.

Variable Description:

*	VOL = cubic feet of cab space 
*	HP = engine horsepower 
*	MPG = average miles per gallon 
*	SP = top speed, miles per hour 
*	WT = vehicle weight, hundreds of pounds

In [3]:
# Reading the cars data
car = read.csv("Cars.csv")
cat("Number of rows: ", nrow(car), "\n", "Number of variables: ", ncol(car), "\n")
head(car)

Number of rows:  81 
 Number of variables:  5 


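| HP | MPG | VOL | SP | WT |
|----|-----|-----|----|----|
| 49 | 53.70068 | 89 | 104.1854 | 28.76206 |
| 55 | 50.0134 | 92 | 105.4613 | 30.46683 |
| 55 | 50.0134 | 92 | 105.4613 | 30.1936 |
| 70 | 45.69632 | 92 | 113.4613 | 30.63211 |
| 53 | 50.50423 | 92 | 104.4613 | 29.88915 |
| 70 | 45.69632 | 89 | 113.1854 | 29.59177 |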


Our objective is to model the variation in `MPG` using other independent variables. That is,

$$MPG = \beta_0 + \beta_1 VOL + \beta_2 HP + \beta_3 SP + \beta_4 WT + \epsilon$$

where $\beta_1$ represents the change in `MPG` per one-unit change in `VOL`, provided the other variables are held fixed. Consider the two hypothetical cases below (the `MPG` values are purely illustrative, not observed data or model output):

**Case 1:** HP = 49; VOL = 89; SP = 104.1854; WT = 28.76206 => MPG = 104.1854

**Case 2:** HP = 49; VOL = 90; SP = 104.1854; WT = 28.76206 => MPG = 105.2453

Then $\beta_1 = 105.2453 - 104.1854 = 1.0599$. The coefficients $\beta_2, \beta_3, \beta_4$ are interpreted similarly.

The above effect is also called the [`Ceteris Paribus Effect`](https://en.wikipedia.org/wiki/Ceteris_paribus).
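The cases above can be mimicked in a few lines of R. This is only a sketch on made-up data (the data frame `toy`, its coefficients and the two cases are assumptions for illustration, not the car data): when two records differ by one unit in a single predictor, the difference in the predictions is exactly the fitted coefficient of that predictor.

```r
# Synthetic data standing in for the car data (all values are made up).
set.seed(1)
n   <- 100
toy <- data.frame(VOL = runif(n, 50, 150), HP = runif(n, 50, 300))
toy$MPG <- 60 - 0.2 * toy$VOL - 0.1 * toy$HP + rnorm(n)

fit <- lm(MPG ~ VOL + HP, data = toy)

# Two hypothetical cars that differ only in VOL, by exactly one unit.
case1 <- data.frame(VOL = 89, HP = 49)
case2 <- data.frame(VOL = 90, HP = 49)

# Difference in predicted MPG between the two cases.
diff_pred <- unname(predict(fit, case2) - predict(fit, case1))

# It matches the fitted coefficient of VOL.
beta_vol <- unname(coef(fit)["VOL"])
all.equal(diff_pred, beta_vol)
```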

**Intuitive example:** Assume 4 people are working on a project and you want to model the project's performance metric based on their individual competency scores (assume all the metrics are numeric). The objective is to find out each individual's contribution to the project.

But in the real world it is very difficult to collect records in the above manner, where only one variable changes at a time. That's why we compute partial correlation coefficients to quantify the effect of one variable while keeping the others constant.
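The idea of "removing" the other variables can be made concrete with a small sketch. The data below are synthetic and the names `x1`, `x2`, `y` are placeholders, not columns of the car data. It illustrates a standard result: the coefficient of `x1` in a multiple regression equals the slope obtained after partialling `x2` out of both `y` and `x1`.

```r
set.seed(2)
n  <- 200
x2 <- rnorm(n)
x1 <- 0.8 * x2 + rnorm(n)          # x1 is correlated with x2
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)

# Coefficient of x1 from the full multiple regression.
b_full <- unname(coef(lm(y ~ x1 + x2))["x1"])

# Partial x2 out of both y and x1, then regress residuals on residuals.
ry  <- residuals(lm(y ~ x2))
rx1 <- residuals(lm(x1 ~ x2))
b_partial <- unname(coef(lm(ry ~ rx1))["rx1"])

all.equal(b_full, b_partial)
```

So even though we never observe records where only `x1` changes, the regression isolates the part of `x1` that moves independently of `x2`.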

In [4]:
# Let's build an MLR model to predict MPG using the other variables
fit_mlr_actual = lm(MPG ~ ., data = car)
summary(fit_mlr_actual)


Call:
lm(formula = MPG ~ ., data = car)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.6320 -2.9944 -0.3705  2.2149 15.6179 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.67734   14.90030   2.059   0.0429 *  
HP          -0.20544    0.03922  -5.239  1.4e-06 ***
VOL         -0.33605    0.56864  -0.591   0.5563    
SP           0.39563    0.15826   2.500   0.0146 *  
WT           0.40057    1.69346   0.237   0.8136    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.488 on 76 degrees of freedom
Multiple R-squared:  0.7705,	Adjusted R-squared:  0.7585 
F-statistic:  63.8 on 4 and 76 DF,  p-value: < 2.2e-16


One key observation from the above output is that the Std. Error for `VOL` and `WT` is very large relative to their estimates. This shrinks their `t values` towards zero and inflates their `p values`. Hence, these two variables appear insignificant in the model.
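The `t value` in the output is simply the estimate divided by its standard error, so a large standard error pulls it towards zero. Using the numbers reported above for `VOL` and `WT`:

$$t_{VOL} = \frac{-0.33605}{0.56864} \approx -0.591, \qquad t_{WT} = \frac{0.40057}{1.69346} \approx 0.237$$

Both are far from the values of roughly $\pm 2$ usually needed for significance at the 5% level.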

Let's dig deeper: what happened to $Var(\hat{\beta_{VOL}})$ and $Var(\hat{\beta_{WT}})$?

The analogue of $Var(\hat{\beta})$ in MLR is as follows (the variance of the coefficients for the simple case was derived in [part I](https://kavetinaveen.github.io/Learning.github.io/)):

$$Var(\hat{\beta_{VOL}}) = \frac{\sigma^2}{\sum_{i=1}^n (VOL_i - \overline{VOL})^2 \, (1 - R_{VOL}^2)}$$

where $R_{VOL}^2$ is the Multiple R-squared value obtained by regressing `VOL` on all the other independent variables.
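As a sanity check, the variance expression can be compared with what `lm` reports. The sketch below uses synthetic data (the variables `x1`, `x2`, `y` are assumptions for illustration) and plugs the estimated $\sigma^2$, the total variation in `x1`, and the R-squared of `x1` on the other predictor into $\sigma^2 / \left[\sum_i (x_{1i} - \bar{x}_1)^2 (1 - R_{x_1}^2)\right]$.

```r
set.seed(3)
n  <- 150
x2 <- rnorm(n)
x1 <- 0.9 * x2 + rnorm(n, sd = 0.3)   # strongly correlated with x2
y  <- 1 + 2 * x1 - x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)

sigma2_hat <- summary(fit)$sigma^2             # estimate of sigma^2
sst_x1     <- sum((x1 - mean(x1))^2)           # total variation in x1
r2_x1      <- summary(lm(x1 ~ x2))$r.squared   # R^2 of x1 on the other predictor

# Standard error from the variance expression.
se_formula <- sqrt(sigma2_hat / (sst_x1 * (1 - r2_x1)))

# Standard error reported by lm.
se_lm <- coef(summary(fit))["x1", "Std. Error"]

all.equal(se_formula, se_lm)
```

The closer `r2_x1` gets to 1, the smaller the denominator and the larger the standard error.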

**Intuitive example:** Let's continue the above example. Assume two individuals (say, $X_1$ and $X_2$) have almost the same competency scores (for instance, both are good at `Python` and `Data Science`). Finding out each individual's contribution then becomes very difficult, because both are trying to explain the same part of the project's performance. In other words, changing $X_1$ by one unit while keeping the others constant is not possible, because changing $X_1$ by one unit automatically changes $X_2$ as well.

In [5]:
# Let's regress VOL on all other independent variables
fit_mlr = lm(VOL ~ HP + SP + WT, data = car)
summary(fit_mlr)


Call:
lm(formula = VOL ~ HP + SP + WT, data = car)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.5374 -0.7056 -0.1961  0.7140  1.7380 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.631671   2.916358   1.931   0.0572 .  
HP           0.009102   0.007791   1.168   0.2463    
SP          -0.036083   0.031449  -1.147   0.2548    
WT           2.975701   0.013563 219.396   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8994 on 77 degrees of freedom
Multiple R-squared:  0.9984,	Adjusted R-squared:  0.9984 
F-statistic: 1.637e+04 on 3 and 77 DF,  p-value: < 2.2e-16


Surprisingly, $R_{VOL}^2$ is 0.9984, and only `WT` is significant in this regression. That is, the two predictors `VOL` and `WT` are highly correlated. This inflates $Var(\hat{\beta_{VOL}})$ and therefore shrinks the `t value` of `VOL`. Because of the high correlation between predictors, we may wrongly conclude that an important variable has no effect. This problem is called [Multicollinearity](https://en.wikipedia.org/wiki/Multicollinearity).
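The size of the inflation can be read off the $(1 - R_{VOL}^2)$ term in the variance expression. Its reciprocal is commonly called the variance inflation factor (VIF); that name is introduced here only as a label. Using the reported $R_{VOL}^2 = 0.9984$:

$$\frac{1}{1 - R_{VOL}^2} = \frac{1}{1 - 0.9984} = \frac{1}{0.0016} = 625, \qquad \sqrt{625} = 25$$

So, roughly, the standard error of the `VOL` coefficient is about 25 times larger than it would be if `VOL` were uncorrelated with the other predictors.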

One quick solution to this problem is to remove either `VOL` or `WT` from the model. Let's compute the partial correlation coefficient between `MPG` and `VOL` after removing the effect of `WT` (say, $r_{MV.W}$) and the partial correlation coefficient between `MPG` and `WT` after removing the effect of `VOL` (say, $r_{MW.V}$).

To compute $r_{MV.W}$ we need the correlation between (a) the part of `VOL` which cannot be explained by `WT` (regress `VOL` on `WT` and take the residuals) and (b) the part of `MPG` which cannot be explained by `WT` (regress `MPG` on `WT` and take the residuals).

In [6]:
# Part of VOL and part of MPG that WT cannot explain
fit_partial = lm(VOL ~ WT, data = car)
fit_partial2 = lm(MPG ~ WT, data = car)
res1 = residuals(fit_partial)
res2 = residuals(fit_partial2)
cat("Partial correlation coefficient between MPG and VOL by removing the effect of WT is: ", cor(res1, res2))

Partial correlation coefficient between MPG and VOL by removing the effect of WT is:  -0.08008873

In [7]:
# Part of WT and part of MPG that VOL cannot explain
fit_partial3 = lm(WT ~ VOL, data = car)
fit_partial4 = lm(MPG ~ VOL, data = car)
res1 = residuals(fit_partial3)
res2 = residuals(fit_partial4)
cat("Partial correlation coefficient between MPG and WT by removing the effect of VOL is: ", cor(res1, res2))

Partial correlation coefficient between MPG and WT by removing the effect of VOL is:  0.05538241
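The residual-based computation in the cells above agrees with the usual closed-form expression for a first-order partial correlation, $r_{MV.W} = (r_{MV} - r_{MW} r_{VW}) / \sqrt{(1 - r_{MW}^2)(1 - r_{VW}^2)}$. A minimal check on synthetic data follows; the vectors `m`, `v`, `w` are made-up stand-ins for `MPG`, `VOL` and `WT`, not the real columns.

```r
set.seed(5)
n <- 120
w <- rnorm(n)
v <- 3 * w + rnorm(n, sd = 0.5)    # v is highly correlated with w
m <- 50 - 2 * w + rnorm(n)

# Residual approach: correlate the parts of m and v that w cannot explain.
r_resid <- cor(residuals(lm(m ~ w)), residuals(lm(v ~ w)))

# Closed form built from the three pairwise correlations.
r_mv <- cor(m, v)
r_mw <- cor(m, w)
r_vw <- cor(v, w)
r_formula <- (r_mv - r_mw * r_vw) / sqrt((1 - r_mw^2) * (1 - r_vw^2))

all.equal(r_resid, r_formula)
```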

Since $|r_{MV.W}| \geq |r_{MW.V}|$ (0.080 versus 0.055), `VOL` retains the stronger partial association with `MPG`, so we may remove `WT` from the model.

In [9]:
# Remove WT and rerun the model
fit_mlr_actual2 = lm(MPG ~ .-WT, data = car)
summary(fit_mlr_actual2)


Call:
lm(formula = MPG ~ . - WT, data = car)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.5869 -2.8942 -0.3157  2.1291 15.6669 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 29.92339   14.46589   2.069   0.0419 *  
HP          -0.20670    0.03861  -5.353 8.64e-07 ***
VOL         -0.20165    0.02259  -8.928 1.65e-13 ***
SP           0.40066    0.15586   2.571   0.0121 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.46 on 77 degrees of freedom
Multiple R-squared:  0.7704,	Adjusted R-squared:  0.7614 
F-statistic: 86.11 on 3 and 77 DF,  p-value: < 2.2e-16


After eliminating `WT` from the model, Adjusted R-squared rises by about 0.3 percentage points (0.7585 to 0.7614). More importantly, the standard error of `VOL` drops from 0.56864 to 0.02259 and `VOL` becomes significant at the 0.001 [level of significance](https://en.wikipedia.org/wiki/Statistical_significance).
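The change in Adjusted R-squared can be reproduced by hand from the standard formula, with $n = 81$ observations and $p$ predictors:

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$

With `WT` in the model ($p = 4$, $R^2 = 0.7705$):

$$\bar{R}^2 = 1 - (1 - 0.7705)\,\frac{80}{76} \approx 0.758$$

Without `WT` ($p = 3$, $R^2 = 0.7704$):

$$\bar{R}^2 = 1 - (1 - 0.7704)\,\frac{80}{77} \approx 0.761$$

R-squared barely moves, but the penalty for the number of predictors is smaller, so the adjusted value rises. Small differences from the reported 0.7585 and 0.7614 come from plugging in the rounded R-squared values.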

## Assumptions

**Linear in Parameters:** We assume that the dependent variable is a linear function of the parameters ($\beta$'s) applied to the set of independent variables, plus an error term

**Zero conditional mean:** $E(\epsilon \mid X) = 0$

**Homoskedasticity:** $Var(\epsilon \mid X) = \sigma^2$ (Constant)

**No perfect Collinearity:** No predictor may be an exact linear combination of the other predictors. Predictors may be correlated with each other, but not perfectly.

**No serial correlation in errors:** Errors must be uncorrelated among themselves. In other words, observations or records must be independent of each other.
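To see what a violation of the "no perfect collinearity" assumption looks like in practice, here is a small sketch on synthetic data (the variables `x1`, `x2`, `x3` are made up). When one predictor is an exact linear combination of the others, `lm` cannot estimate a separate coefficient for it and reports `NA`.

```r
set.seed(7)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- 2 * x1 - x2                  # exact linear combination of x1 and x2
y  <- 1 + x1 + x2 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3)
coef(fit)                          # the coefficient of x3 is reported as NA

# alias() lists the exact linear dependencies among the model terms.
alias(fit)
```

The car data are a milder case: `VOL` and `WT` are highly but not perfectly correlated, so all coefficients are estimated, only very imprecisely.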

I will try to cover *Heteroscedasticity* and *Multicollinearity* in more detail in my next blog post.
