# Linear Regression


## Linear Regression in Scikit-learn
In this section, you'll use linear regression to predict life expectancy from __body mass index (BMI)__. Before doing that, let's go over the tools required to build this model.
For the linear regression model, you'll use scikit-learn's `LinearRegression` class. This class provides the method `fit()` to fit the model to the data.
```
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(x_values, y_values)
```
After this call, `model` is a linear regression model fitted to the data `x_values` and `y_values`. Use its `predict()` method to make predictions for new inputs:
```
print(model.predict([[127], [248]]))
# Output:
# [[ 438.94308857, 127.14839521]]
```
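
As a self-contained illustration of the `fit()` and `predict()` calls above, here is a minimal sketch on synthetic data. The data (points lying exactly on the line y = 3x + 2) and the names `slope`, `intercept`, and `predictions` are assumptions made for this example, not part of the lesson's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data lying exactly on y = 3x + 2 (an illustrative assumption).
x_values = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (n_samples, n_features)
y_values = np.array([5.0, 8.0, 11.0, 14.0])        # shape (n_samples,)

model = LinearRegression()
model.fit(x_values, y_values)

# The fitted line is described by one coefficient per feature plus an intercept.
slope = model.coef_[0]
intercept = model.intercept_

# predict() expects a 2-D input: one row per sample, one column per feature.
predictions = model.predict([[5.0], [10.0]])

print(slope, intercept)
print(predictions)
```

Note that `predict()` takes a list of rows even for a single feature, which is why each input value is wrapped in its own list.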

### Quiz
Use the data in `bmi_and_life_expectancy.csv`, which includes the following columns:
* **Country** – The country the person was born in.
* **Life expectancy** – The average life expectancy at birth for a person in that country.
* **BMI** – The mean BMI of males in that country.

In [1]:
# TODO: Add import statements
import pandas as pd
from sklearn.linear_model import LinearRegression

In [3]:
# Assign the dataframe to this variable.
# TODO: Load the data
bmi_life_data = pd.read_csv("bmi_and_life_expectancy.csv")
bmi_life_data.head()

Unnamed: 0,Country,Life expectancy,BMI
0,Afghanistan,52.8,20.62058
1,Albania,76.8,26.44657
2,Algeria,75.5,24.5962
3,Andorra,84.6,27.63048
4,Angola,56.7,22.25083


In [11]:
x_values = bmi_life_data[['BMI']]
y_values = bmi_life_data[['Life expectancy']]

In [12]:
# Make and fit the linear regression model
# TODO: Fit the model and assign it to bmi_life_model
bmi_life_model = LinearRegression()
bmi_life_model.fit(x_values, y_values)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

In [18]:
# Make a prediction using the model
# TODO: Predict life expectancy for a BMI value of 21.07931
laos_life_exp = bmi_life_model.predict([[21.07931]])
print(laos_life_exp)

[[60.31564716]]


## Multiple Linear Regression
For $n$ predictor variables, the linear regression equation is:
\begin{equation}
\hat{y} = m_1 x_1 + m_2 x_2 + \cdots + m_n x_n + b
\end{equation}
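
To make the equation concrete, the sketch below checks that a fitted model's prediction is the weighted sum of the predictor values plus an intercept. The synthetic data and the names `true_weights`, `true_intercept`, `sample`, and `manual` are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data: y = 2*x1 - 1*x2 + 0.5*x3 + 4 (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_weights = np.array([2.0, -1.0, 0.5])
true_intercept = 4.0
y = X @ true_weights + true_intercept

model = LinearRegression().fit(X, y)

# Evaluate the regression equation by hand: one weight per predictor, plus the intercept.
sample = np.array([[1.0, 2.0, 3.0]])
manual = sample @ model.coef_ + model.intercept_

# predict() should return the same value as the hand-computed equation.
predicted = model.predict(sample)

print(model.coef_, model.intercept_)
print(manual, predicted)
```

Here `model.coef_` holds one weight per predictor and `model.intercept_` holds the constant term.
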
### Quiz
The following dataset has 13 features as predictors. Use them to predict the house price.

In [25]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_boston

In [27]:
# Load the data from the Boston house-prices dataset.
# Note: load_boston was removed in scikit-learn 1.2, so this cell requires an older version.
boston_data = load_boston()
x = boston_data['data']
y = boston_data['target']

In [28]:
# Make and fit the linear regression model
# TODO: Fit the model and assign it to the model variable
model = LinearRegression()
model.fit(x, y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

In [29]:
# Make a prediction using the model
sample_house = [[2.29690000e-01, 0.00000000e+00, 1.05900000e+01, 0.00000000e+00, 4.89000000e-01,
                6.32600000e+00, 5.25000000e+01, 4.35490000e+00, 4.00000000e+00, 2.77000000e+02,
                1.86000000e+01, 3.94870000e+02, 1.09700000e+01]]

In [31]:
# TODO: Predict housing price for the sample_house
prediction = model.predict(sample_house)
print(prediction)

[23.68284712]


## Issues With Linear Regression
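* Linear regression works best when the data is linear. If the relationship between the predictors and the outcome is not linear, either transform the data so that it becomes linear or use another model.
* Linear regression is sensitive to outliers. Because the fit minimizes squared error, a few extreme points can pull the line far away from the trend of the remaining data.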
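
The sensitivity to outliers can be seen in a small experiment. The sketch below fits the same synthetic, perfectly linear data twice, once as-is and once with a single corrupted point. The data and the size of the corruption are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Ten points lying exactly on y = 2x + 1 (illustrative assumption).
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0

clean_slope = LinearRegression().fit(x, y).coef_[0]

# Corrupt only the last point by adding a large offset.
y_outlier = y.copy()
y_outlier[-1] += 50.0

outlier_slope = LinearRegression().fit(x, y_outlier).coef_[0]

print("slope without outlier:", clean_slope)
print("slope with one outlier:", outlier_slope)
```

In this setup a single bad point out of ten more than doubles the fitted slope, so it is worth inspecting the data for outliers before trusting the fit.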

## Polynomial Regression
```
# TODO: Add import statements
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Assign the data to predictor and outcome variables
# TODO: Load the data
train_data = pd.read_csv('data.csv')
X = train_data['Var_X'].values.reshape(-1, 1)
y = train_data['Var_Y'].values

# Create polynomial features
# TODO: Create a PolynomialFeatures object, then fit and transform the
# predictor feature
poly_feat = PolynomialFeatures(degree = 4)
X_poly = poly_feat.fit_transform(X)

# Make and fit the polynomial regression model
# TODO: Create a LinearRegression object and fit it to the polynomial predictor
# features
poly_model = LinearRegression(fit_intercept = False).fit(X_poly, y)

```
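
Since `data.csv` is not included here, the following sketch repeats the same steps on synthetic data so that it runs on its own. The quadratic relationship, the choice of `degree=2`, and the names `X_new` and `y_new` are assumptions for this example. It also shows that new inputs must pass through the same `PolynomialFeatures` object before prediction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free data following y = 1 + 2x + 3x^2 (illustrative assumption).
X = np.linspace(-2, 2, 20).reshape(-1, 1)
y = 1.0 + 2.0 * X.ravel() + 3.0 * X.ravel() ** 2

# Expand the single predictor into the columns [1, x, x^2].
poly_feat = PolynomialFeatures(degree=2)
X_poly = poly_feat.fit_transform(X)

# The expanded features already contain a constant column,
# so the model is fitted without a separate intercept.
poly_model = LinearRegression(fit_intercept=False).fit(X_poly, y)

# New inputs need the same feature expansion before calling predict().
X_new = np.array([[3.0]])
y_new = poly_model.predict(poly_feat.transform(X_new))

print(poly_model.coef_)
print(y_new)
```

Because the constant column produced by `PolynomialFeatures` plays the role of the intercept, the coefficients line up with the constant, linear, and quadratic terms in that order.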

## Regularization
```
# TODO: Add import statements
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Assign the data to predictor and outcome variables
# TODO: Load the data
train_data = pd.read_csv('data.csv', header = None)
X = train_data.iloc[:,:-1]
y = train_data.iloc[:,-1]

# TODO: Create the linear regression model with lasso regularization.
lasso_reg = Lasso()

# TODO: Fit the model.
lasso_reg.fit(X, y)

# TODO: Retrieve and print out the coefficients from the regression model.
reg_coef = lasso_reg.coef_
print(reg_coef)
```
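
To see what lasso regularization does to the coefficients, the sketch below compares it with plain linear regression on synthetic data in which only two of five features affect the outcome. The data and the value `alpha=0.5` are assumptions for illustration; `alpha` is the `Lasso` parameter that sets the strength of the penalty.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data: only the first two of five features matter (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Plain least squares assigns a small non-zero weight to every feature.
ols = LinearRegression().fit(X, y)

# The L1 penalty shrinks all weights and sets the unhelpful ones to exactly zero.
lasso = Lasso(alpha=0.5).fit(X, y)

print("least squares:", ols.coef_)
print("lasso:        ", lasso.coef_)
```

The lasso fit acts as a form of feature selection here: the three irrelevant features drop out, while the two relevant ones keep smaller but non-zero weights.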

## Recap
In this lesson, you were introduced to linear models. Specifically, you saw:

* Gradient descent as a method to optimize your linear models.
* Multiple Linear Regression as a technique for when you are comparing more than two variables.
* Polynomial Regression for relationships between variables that aren't linear.
* Regularization as a technique to ensure that your models not only fit the available data, but also generalize to new situations.
