# Introduction to Multiple Linear Regression

__Linear Regression Learning Objectives__ 
* Multiple Linear Regression Models
* Estimating Multiple Parameters

__Salmon Learning Objectives__ 
* Multiple Linear Regression Models

In [1]:
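import pandas as pd
import matplotlib.pyplot as plt  # explicit import so `plt` does not depend on the star import below
from salmon import *
%matplotlib inline
# On Matplotlib 3.6 and later this style is named 'seaborn-v0_8'.
plt.style.use('seaborn')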

## Multiple Linear Regression Model
Multiple linear regression models are very similar to our simple linear regression models. The only difference is that we now model the effects of two or more explanatory variables on a response variable. Let's take a look at the multiple linear regression model to solidify this idea:
$$y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \ldots + \beta_nx_{in} + \varepsilon_i$$
where $n$ is the number of explanatory variables and $x_{ij}$ is the value of the $j$-th explanatory variable for observation $i$.

### Estimating Parameters
To estimate our parameters, $\beta_0, \beta_1, \ldots, \beta_n$, we will use least-squares regression. The equation is almost identical to the one for simple linear regression, but it accounts for the new parameters. Remember, in least-squares regression, the aim is to minimize the _sum of squared errors_:
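$$
\begin{align*}
\sum_{i=1}^m (\textrm{errors}_i)^2 &= \sum_{i=1}^m (y_i - \hat y_i)^2 \\
&= \sum_{i=1}^m (y_i - (b_0 + b_1 x_{i1} + b_2 x_{i2} + \ldots + b_n x_{in}))^2
\end{align*}
$$
Here $m$ is the number of observations, $x_{ij}$ is the value of the $j$-th explanatory variable for observation $i$, and $b_0, b_1, \ldots, b_n$ are our estimates of the parameters.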

Now, we take the partial derivatives of the sum of squared errors above with respect to each of $b_0, b_1, \ldots, b_n$, set them equal to zero, and then solve for $b_0, b_1, \ldots, b_n$. The solutions are our estimates of $\beta_0, \beta_1, \ldots, \beta_n$. While it's good to know the underlying math behind this, we will use SALMON to calculate these estimates.
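To make the minimization concrete, here is a minimal sketch that solves a small least-squares problem with NumPy instead of SALMON. The data are synthetic and the coefficients in `true_b` are made up purely for illustration. The sketch compares `np.linalg.lstsq` with the solution obtained by setting the derivatives to zero, which gives the so-called normal equations $X^TXb = X^Ty$.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200  # number of synthetic observations (an assumption for this sketch)

# Two synthetic explanatory variables.
x1 = rng.normal(size=m)
x2 = rng.normal(size=m)

# Hypothetical "true" parameters: intercept, slope for x1, slope for x2.
true_b = np.array([2.0, 3.0, -1.5])
y = true_b[0] + true_b[1] * x1 + true_b[2] * x2 + rng.normal(scale=0.1, size=m)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(m), x1, x2])

# Least-squares estimates from NumPy's solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The same estimates from the normal equations (derivatives set to zero).
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Sum of squared errors at the estimated coefficients.
sse = np.sum((y - X @ b_lstsq) ** 2)

print("lstsq estimates:          ", b_lstsq)
print("normal-equation estimates:", b_normal)
print("sum of squared errors:    ", sse)
```

Both approaches should agree up to floating-point error, and both should land close to the made-up values in `true_b`.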

#### Estimating Housing Prices with SALMON
For this chapter, we will be looking at the Ames housing dataset. Let's find out how the total living area and the number of bedrooms affect the price of a house. First, let's regress separately on the number of bedrooms and on the total living area.

In [16]:
ames = pd.read_csv("data/chapter4/AmesHousing.txt", sep="\t")
explanatory = Q("Bedroom AbvGr")
response = Q("SalePrice")
model = LinearModel(explanatory, response)
model.fit(ames)["Coefficients"]

Bedroom AbvGr     13889.495269
Intercept        141151.743026
Name: Coefficients, dtype: float64

In [18]:
explanatory = Q("Gr Liv Area")
response = Q("SalePrice")
model = LinearModel(explanatory, response)
model.fit(ames)["Coefficients"]

Gr Liv Area      111.694001
Intercept      13289.634365
Name: Coefficients, dtype: float64

From our simple linear regression chapters, we know that these models imply that each additional bedroom is associated with an increase of roughly \$13,889 in the sale price, and each additional square foot of living area with an increase of roughly \$112. But by looking at these models independently, we are not taking into account the joint effects of both variables. Now, let's regress on both living area and number of bedrooms.

In [26]:
bedrooms = Q("Bedroom AbvGr")
living_area = Q("Gr Liv Area")
explanatory = bedrooms + living_area
response = Q("SalePrice")
model = LinearModel(explanatory, response)
model.fit(ames)["Coefficients"]

Gr Liv Area        136.360903
Bedroom AbvGr   -29149.109660
Intercept        59496.235786
Name: Coefficients, dtype: float64

Interestingly, once living area is included in the model, the number of bedrooms has a negative effect on the housing price, while the living area still has a positive effect. To understand why, we need to take a deeper look at the model and interpret it carefully.

### Interpreting Multiple Linear Regression Models
The coefficients of a multiple linear regression model, $\beta_0, \beta_1, \ldots, \beta_n$, are sometimes called _partial regression coefficients_ because each coefficient is responsible for part of the effect on our response variable. It is important to note that each coefficient describes the effect of its variable on $y$ when holding all other variables constant. This is why the number of bedrooms has a negative effect on the housing price in the two-variable model.
 <img src='images/chapter4/bed_houses.png'>
If we keep the living area constant but continually increase the number of bedrooms, the house would lose value because there wouldn't be any room for other spaces like a kitchen or a dining room. Conversely, if we keep the number of bedrooms constant but increase the living area, the price increases.
<img src='images/chapter4/size_house.png'>
Intuitively, this makes sense because as the home becomes larger and the number of bedrooms is constant, there is more space for the homeowners. 
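The sign change can be reproduced on synthetic data. The sketch below does not use the Ames dataset or SALMON. It generates made-up houses in which larger houses tend to have more bedrooms, and it assumes a hypothetical price formula in which area adds value and bedrooms subtract value. Regressing on bedrooms alone then gives a positive slope, because bedrooms act as a stand-in for size, while regressing on both variables recovers the negative bedroom coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 2000  # number of synthetic houses (an assumption for this sketch)

# Synthetic houses: bigger houses tend to have more bedrooms.
area = rng.normal(1500, 500, size=m)
bedrooms = area / 500 + rng.normal(0, 0.5, size=m)

# Hypothetical price formula, chosen only for illustration.
price = 50000 + 130 * area - 25000 * bedrooms + rng.normal(0, 10000, size=m)

def least_squares(columns, y):
    """Return [intercept, coefficients...] minimizing the sum of squared errors."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

simple = least_squares([bedrooms], price)          # bedrooms only
multiple = least_squares([bedrooms, area], price)  # bedrooms and area together

print("bedroom coefficient, bedrooms alone:       ", simple[1])
print("bedroom coefficient, holding area constant:", multiple[1])
```

In this made-up setting the first coefficient comes out positive and the second negative, mirroring the pattern seen in the housing models.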

Formally, we can interpret this model in the following way:

$\beta_1$: Holding the number of bedrooms constant, the price of a house increases by \$136 for each additional square foot of living area.

$\beta_2$: Holding the living area constant, the price of a house decreases by \$29,149 for each additional bedroom.
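These interpretations can be checked with a few lines of plain Python. The sketch below copies the coefficients from the fitted two-variable model and compares predictions for a hypothetical house (1,500 square feet and 3 bedrooms, values chosen only for illustration) after changing one variable at a time. The helper `predict_price` is not part of SALMON. It is just the fitted equation written out.

```python
# Coefficients copied from the fitted two-variable model above.
intercept = 59496.235786
b_living_area = 136.360903
b_bedrooms = -29149.109660

def predict_price(living_area, bedrooms):
    """Evaluate the fitted equation for one house (hypothetical helper)."""
    return intercept + b_living_area * living_area + b_bedrooms * bedrooms

# A hypothetical baseline house.
base = predict_price(1500, 3)

# One more bedroom, same living area.
more_bedrooms = predict_price(1500, 4)

# One more square foot, same number of bedrooms.
more_area = predict_price(1501, 3)

print("change from one extra bedroom:    ", round(more_bedrooms - base, 2))
print("change from one extra square foot:", round(more_area - base, 2))
```

Each difference equals the corresponding coefficient, which is what "holding the other variable constant" means in practice.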