## CHAPTER 13
---
# LINEAR REGRESSION

---
- Linear regression is one of the simplest supervised learning algorithms in our toolkit
- It is so simple that it is sometimes not considered machine learning at all!
- The fact is that linear regression—and its extensions—continues to be a common and useful method of making predictions when the target vector is a quantitative value (e.g., home price, age).

## 13.1 Fitting a Line

**Problem:** You want to train a model that represents a linear relationship between the feature and target vector.

**Solution:** Use a linear regression (in scikit-learn, `LinearRegression`).

```python
# Load libraries
# Note: load_boston was removed in scikit-learn 1.2, so this code needs an earlier version
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_boston

# Load data with only two features
boston = load_boston()
features = boston.data[:,0:2]
target = boston.target

# Create linear regression
regression = LinearRegression()

# Fit the linear regression
model = regression.fit(features, target)

# View the intercept
print('Intercept:', model.intercept_)

# View the feature coefficients
print('Coefficients:', model.coef_)

# First value in the target vector multiplied by 1000
print('First Target Value:', target[0]*1000)

# Predict the target value of the first observation, multiplied by 1000
print('First Observation:', model.predict(features)[0]*1000)

# First coefficient multiplied by 1000
print('First Coefficient:', model.coef_[0]*1000)
```

```
Intercept: 22.485628113468223
Coefficients: [-0.35207832  0.11610909]
First Target Value: 24000.0
First Observation: 24573.366631705547
First Coefficient: -352.07831564026765
```


#### Discussion:
- In our dataset, the target value is the median value of a Boston home (in the 1970s) in thousands of dollars.
- The major advantage of linear regression is its interpretability, in large part because each coefficient is the effect on the target of a one-unit change in the corresponding feature, holding the other features constant.
- For example, the first feature in our solution is the per capita crime rate.
    - Our model’s coefficient for this feature was about -0.35. Because the target is in thousands of dollars, multiplying this coefficient by 1,000 gives the change in house price in dollars for each additional crime per capita.
- This says that every additional crime per capita is associated with a decrease in the house price of approximately $350!
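
The interpretation of a coefficient as the effect of a one-unit change can be checked directly. The following is a minimal sketch on synthetic data (the data, the true coefficients 20, -0.5 and 0.1, and the variable names are assumptions for illustration, not the Boston dataset). It rebuilds a prediction by hand as the intercept plus the sum of coefficient times feature value, then confirms that raising the first feature by one unit changes the prediction by exactly the first coefficient.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration, not the Boston data):
# target = 20 - 0.5 * x1 + 0.1 * x2 + small noise
rng = np.random.default_rng(0)
features = rng.uniform(0, 10, size=(200, 2))
noise = rng.normal(0, 0.1, size=200)
target = 20 - 0.5 * features[:, 0] + 0.1 * features[:, 1] + noise

# Fit the linear regression
model = LinearRegression().fit(features, target)

# Rebuild the first prediction by hand: intercept + sum(coefficient * feature value)
manual_prediction = model.intercept_ + np.dot(model.coef_, features[0])
sklearn_prediction = model.predict(features[:1])[0]

# Increase the first feature by one unit while holding the second constant
shifted = features[:1].copy()
shifted[0, 0] += 1.0
change = model.predict(shifted)[0] - sklearn_prediction

print('Manual prediction:', manual_prediction)
print('predict() result:', sklearn_prediction)
print('Change from a one-unit increase:', change)
print('First coefficient:', model.coef_[0])
```

The last two printed values should agree, which is the sense in which a coefficient "is" the effect of a one-unit change.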

## 13.2 Handling Interactive Effects

**Problem:** You have a feature whose effect on the target variable depends on another feature.

**Solution:** Create an interaction term to capture that dependence using scikit-learn’s `PolynomialFeatures`.

```python
# Load libraries
# Note: load_boston was removed in scikit-learn 1.2, so this code needs an earlier version
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_boston
from sklearn.preprocessing import PolynomialFeatures

# Load data with only two features
boston = load_boston()
features = boston.data[:,0:2]
target = boston.target

# Create interaction term
interaction = PolynomialFeatures(
    degree=3, include_bias=False, interaction_only=True)
features_interaction = interaction.fit_transform(features)

# Create linear regression
regression = LinearRegression()

# Fit the linear regression
model = regression.fit(features_interaction, target)

# View the feature values for first observation
print('First Observation Feature Values:', features[0])

# For each observation, multiply the values of the first and second feature
interaction_term = np.multiply(features[:, 0], features[:, 1])

# View interaction term for first observation
print('First Observation Interaction Term:', interaction_term[0])

# View the values of the first observation
print('First Observation Values:', features_interaction[0])
```

```
First Observation Feature Values: [6.32e-03 1.80e+01]
First Observation Interaction Term: 0.11376
First Observation Values: [6.3200e-03 1.8000e+01 1.1376e-01]
```


#### Discussion:
- Sometimes a feature’s effect on our target variable is at least partially dependent on another feature
- We can account for interaction effects by including a new feature comprising the product of corresponding values from the interacting features
- In our solution, we used a dataset containing only two features. 
    - We printed the first observation’s values for each of those features above
    - An interaction term was created by simply multiplying those two values together for every observation
    - We printed the interaction term for the first observation above
- However, while often we will have a substantive reason for believing there is an interaction between two features, sometimes we will not. 
- In those cases it can be useful to use scikit-learn’s PolynomialFeatures to create interaction terms for all combinations of features. 
- We can then use model selection strategies to identify the combination of features and interaction terms that produce the best model. 
- To create interaction terms using PolynomialFeatures, there are three important parameters we must set.
    - Most important, `interaction_only=True` tells PolynomialFeatures to only return interaction terms (and not polynomial features, which we will discuss in Section 13.3).
    - By default, PolynomialFeatures will add a feature containing ones called a bias. We can prevent that with `include_bias=False`.
    - Finally, the `degree` parameter determines the maximum number of features to create interaction terms from. A degree of 3 allows interaction terms that combine up to three features, but because our data has only two features, only their pairwise product is created.
- We can verify the output of PolynomialFeatures from our solution by checking that the first observation’s feature values and interaction term value match our manually calculated version
    - Printed above as the 'First Observation Values'
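
To make the role of the three parameters more concrete, here is a small sketch using one hypothetical observation with three features (the values 2, 3 and 5 are made up for illustration). It compares `degree=2` with `degree=3` under `interaction_only=True`, and shows the bias column that appears when `include_bias` is left at its default.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One hypothetical observation with three features
features = np.array([[2.0, 3.0, 5.0]])

# degree=2: the original features plus all pairwise products
pairwise = PolynomialFeatures(
    degree=2, include_bias=False, interaction_only=True)
features_pairwise = pairwise.fit_transform(features)

# degree=3: additionally includes the three-way product
three_way = PolynomialFeatures(
    degree=3, include_bias=False, interaction_only=True)
features_three_way = three_way.fit_transform(features)

# Default include_bias=True: a leading column of ones is added
with_bias = PolynomialFeatures(degree=2, interaction_only=True)
features_with_bias = with_bias.fit_transform(features)

print('Pairwise interactions:', features_pairwise[0])
print('Up to three-way interactions:', features_three_way[0])
print('With bias column:', features_with_bias[0])
```

With three features, `degree=2` yields six columns (three originals and three pairwise products), `degree=3` adds a seventh column holding the product of all three, and the default bias setting prepends a column of ones.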

## 13.3 Fitting a Nonlinear Relationship

**Problem:** You want to model a nonlinear relationship.

**Solution:** Create a polynomial regression by including polynomial features in a linear regression model

```python
# Load libraries
# Note: load_boston was removed in scikit-learn 1.2, so this code needs an earlier version
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_boston
from sklearn.preprocessing import PolynomialFeatures

# Load data with one feature
boston = load_boston()
features = boston.data[:,0:1]
target = boston.target

# Create polynomial features x^2 and x^3
polynomial = PolynomialFeatures(degree=3, include_bias=False)
features_polynomial = polynomial.fit_transform(features)

# Create linear regression
regression = LinearRegression()

# Fit the linear regression
model = regression.fit(features_polynomial, target)

# View first observation
print('First_Observation:', features[0])

# View first observation raised to the second power, x^2
print('First_Observation^2:', features[0]**2)

# View first observation raised to the third power, x^3
print('First_Observation^3:', features[0]**3)

# View the first observation's values for x, x^2, and x^3
print('First_Observation_Values:', features_polynomial[0])
```

```
First_Observation: [0.00632]
First_Observation^2: [3.99424e-05]
First_Observation^3: [2.52435968e-07]
First_Observation_Values: [6.32000000e-03 3.99424000e-05 2.52435968e-07]
```


#### Discussion:
- Polynomial regression is an extension of linear regression that allows us to model nonlinear relationships by adding polynomial features to the model
- The linear regression does not “know” that the $x^2$ is a quadratic transformation of $x$. It just considers it one more variable.
- The more of these new features we add, the more flexible the “line” created by our model
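
The gain in flexibility can be seen on data with a known curved relationship. The sketch below uses synthetic data (an assumption for illustration: the target is generated as 1 + 2x² plus noise) and compares the R² of a plain linear fit with that of a fit that also includes x². The model is still an ordinary `LinearRegression`; only the feature matrix changes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with a quadratic relationship (an assumption for illustration)
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 1.0 + 2.0 * x[:, 0] ** 2 + rng.normal(0, 0.5, size=100)

# Fit a straight line using x only
linear_model = LinearRegression().fit(x, y)
r2_linear = linear_model.score(x, y)

# Add x^2 as an extra column, then fit the same kind of model
x_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
poly_model = LinearRegression().fit(x_poly, y)
r2_poly = poly_model.score(x_poly, y)

print('R^2 with x only:', r2_linear)
print('R^2 with x and x^2:', r2_poly)
print('Coefficients for x and x^2:', poly_model.coef_)
```

On this data the straight line explains very little of the variation, while the model with the added x² column recovers a coefficient close to the true value of 2.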

## 13.4 Reducing Variance with Regularization

**Problem:** You want to reduce the variance of your linear regression model.

**Solution:** Use a learning algorithm that includes a shrinkage penalty (also called regularization), such as ridge regression or lasso regression.

```python
# Load libraries
# Note: load_boston was removed in scikit-learn 1.2, so this code needs an earlier version
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler

# Load data
boston = load_boston()
features = boston.data
target = boston.target

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create ridge regression with an alpha value
regression = Ridge(alpha=0.5)

# Fit the ridge regression
model = regression.fit(features_standardized, target)

# Create ridge regression with three candidate alpha values
regr_cv = RidgeCV(alphas=[0.1, 1.0, 10.0])

# Fit the ridge regression, selecting alpha by cross-validation
model_cv = regr_cv.fit(features_standardized, target)

# View coefficients
print('Coefficients:','\n', model_cv.coef_)

# View alpha
print('Alpha:', model_cv.alpha_)
```

```
Coefficients: 
 [-0.91987132  1.06646104  0.11738487  0.68512693 -2.02901013  2.68275376
  0.01315848 -3.07733968  2.59153764 -2.0105579  -2.05238455  0.84884839
 -3.73066646]
Alpha: 1.0
```
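
What "shrinkage" means in the solution can be illustrated by watching the size of the coefficient vector as `alpha` grows. The following is a rough sketch on synthetic data (the data, the deliberately correlated pair of features, and the list of alpha values are assumptions for illustration). It compares the L2 norm of the ordinary least squares coefficients with the norm of the ridge coefficients at several penalty strengths.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 5))

# Make the second feature nearly a copy of the first, which tends to
# make unpenalized coefficients unstable
features[:, 1] = features[:, 0] + rng.normal(0, 0.05, size=50)
true_coefficients = np.array([3.0, -2.0, 1.0, 0.0, 0.5])
target = features @ true_coefficients + rng.normal(0, 1.0, size=50)

# Standardize features so the penalty treats them on the same scale
features_standardized = StandardScaler().fit_transform(features)

# Size of the coefficient vector without any penalty
ols = LinearRegression().fit(features_standardized, target)
ols_norm = np.linalg.norm(ols.coef_)
print('No penalty:', ols_norm)

# Size of the coefficient vector as the penalty grows
ridge_norms = []
for alpha in [0.1, 1.0, 10.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(features_standardized, target)
    ridge_norms.append(np.linalg.norm(ridge.coef_))
    print('alpha =', alpha, 'norm =', ridge_norms[-1])
```

The norm falls as `alpha` increases: a larger penalty pulls the coefficients further toward zero, trading some bias for lower variance.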
