# Regularization and Feature Scaling

## Regularization

we'll find data for a bunch of points including six predictor variables and one outcome variable. Use sklearn's Lasso class to fit a linear regression model to the data, while also using L1 regularization to control for model complexity.

### Import libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import Lasso

### Load in the data

    The data is in the file called 'data.csv'. Note that there's no header row on this file.
    Split the data so that the six predictor features (first six columns) are stored in X, and the outcome feature (last column) is stored in y.


In [3]:
train_data = pd.read_csv("data.csv")
X = train_data.iloc[:,0:6]
y = train_data.iloc[:,-1]

### Fit data using linear regression with Lasso regularization

    Create an instance of sklearn's Lasso class and assign it to the variable lasso_reg. You don't need to set any parameter values: use the default values for the quiz.
    Use the Lasso object's .fit() method to fit the regression model onto the data.


In [4]:
lasso_reg = Lasso()
lasso_reg.fit(X,y)

Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

### Inspect the coefficients of the regression model

    Obtain the coefficients of the fit regression model using the .coef_ attribute of the Lasso object. Store this in the reg_coef variable: the coefficients will be printed out.

In [6]:
reg_coef = lasso_reg.coef_
print(reg_coef)

[ 0.          2.33659619  2.0140086  -0.05753445 -3.91583673  0.        ]


## Feature_Scaling

Previously, we saw how regularization will remove features from a model (by setting their coefficients to zero) if the penalty for removing them is small. we'll revisit the same dataset as before and see how scaling the features changes which features are favored in a regularization step.

we will use sklearn's StandardScaler to standardize the data before you fit a linear regression model to the data with L1 (Lasso) regularization.

### Import Libraries

In [8]:
from sklearn.preprocessing import StandardScaler

### (NEW) Perform feature scaling on data via standardization

    Create an instance of sklearn's StandardScaler and assign it to the variable scaler.
    Compute the scaling parameters by using the .fit_transform() method on the predictor feature array, which also returns the predictor variables in their standardized values. Store those standardized values in X_scaled.


In [9]:
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

### Fit data using linear regression with Lasso regularization

    Create an instance of sklearn's Lasso class and assign it to the variable lasso_reg. You don't need to set any parameter values: use the default values for the quiz.
    Use the Lasso object's .fit() method to fit the regression model onto the data. Make sure that you apply the fit to the standardized data from the previous step (X_scaled), not the original data.


In [10]:
lasso_reg = Lasso()
lasso_reg.fit(X_scaled,y)

Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

### Inspect the coefficients of the regression model

    Obtain the coefficients of the fit regression model using the .coef_ attribute of the Lasso object. Store this in the reg_coef variable: the coefficients will be printed out.


In [11]:
reg_coef = lasso_reg.coef_
print(reg_coef)

[  0.           3.8596924    9.05021225  -0.         -11.72692976
   0.41040086]
