## Data Processing: Linear, Ridge, And Lasso Regression
----


In [1]:
import numpy as np
import pandas as pd


### Regression Models 
---
This section covers the models below and how to measure their fit.

The bias-variance trade-off and overfitted models come into play here.

* Lasso
* Ridge
* OLS
* Support Vector Regression


###  Regression Setup
---
The goal is to predict an output value from a continuous range:
I have _causes_ and I want to predict _effects_.

With a single predictor, the data can be plotted on a 2D graph, with a best-fit line drawn through the data using the linear equation: $ y = mx + b $

The best-fit line minimizes the sum of the squared residuals.

With more than one predictor the same idea applies: instead of searching over all possible lines, the fit searches over all possible planes for the one that best fits the data.

The _residuals_ of a regression are the differences between the actual values and the fitted values on the line.

Ordinary Least Squares (OLS) uses the sum of squared residuals as its objective and minimizes it.
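
A minimal sketch of the least-squares idea above, assuming a small made-up data set (the `x` and `y` values are illustrative and not taken from the notebook's data). It fits $ y = mx + b $ with NumPy and computes the residuals that the fit minimizes.

```python
import numpy as np

# Hypothetical toy data: y is exactly 2x + 1, so the fit should recover m=2, b=1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones so the intercept b is estimated too
A = np.column_stack([x, np.ones_like(x)])

# Solve min ||A @ [m, b] - y||^2 (ordinary least squares)
(m, b), *_ = np.linalg.lstsq(A, y, rcond=None)

# Residuals: observed values minus fitted values
residuals = y - (m * x + b)
rss = np.sum(residuals ** 2)
```

With noisy data the residuals would not vanish, but `rss` would still be the smallest value any straight line can achieve on that data.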


### Good Models Can Be Found by Testing Fit
---
* The best fit minimizes the errors between the fit and the data
* Minimizing the Mean Squared Error (MSE) is the goal
* Calculate the fitted value for each observation, then take the difference between it and the observed value


### What to Do with All my Errors
---

* MSE: the mean of the squared residuals (the variance of the residuals when their mean is zero)
* MSE can be extended to multiple linear regression models as well
* _BLUE_: Best Linear Unbiased Estimator

* _R2_ is calculated by dividing the ESS by the TSS

( explained sum of squares / total sum of squares of the actual values ) => how well the regression represents the data; the higher the number, the better the representation

* Other types of regression minimize objectives other than plain MSE:
    * Ridge
    * Lasso
    * Elastic Net
    * Support Vector 
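
The MSE and R2 definitions above can be checked by hand. The sketch below assumes made-up observed and fitted values (`y_true` and `y_fit` are hypothetical) and computes R2 as 1 - RSS/TSS, which equals ESS/TSS for an OLS fit that includes an intercept.

```python
import numpy as np

# Hypothetical observed and fitted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_fit = np.array([2.5, 5.5, 6.5, 9.5])

residuals = y_true - y_fit
mse = np.mean(residuals ** 2)                  # mean squared error
rss = np.sum(residuals ** 2)                   # residual sum of squares
tss = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares

# Share of the variation in y_true that the fit accounts for
r2 = 1.0 - rss / tss
```

Here every residual is 0.5 in magnitude, so `mse` is 0.25, and with a total sum of squares of 20 the resulting `r2` is 0.95.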


### Results Interpretation: T-Statistics and p-values
----
* Fairly "out of vogue"
* A t-statistic measures how many standard errors the estimated coefficient (multiplier) is away from 0.
* In practical terms, a p-value of 0.0501 is the same as a p-value of 0.0499
* Yet with the conventional 0.05 cutoff, one is a cause for celebration, the other not so much
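
As a rough illustration of the t-statistic described above, the sketch below computes it by hand for a simple regression. The data is made up, and the variable names (`std_err`, `t_stats`) are illustrative assumptions rather than anything from the notebook.

```python
import numpy as np

# Hypothetical data: a line y = 2x + 1 plus a small alternating disturbance
x = np.arange(10, dtype=float)
noise = np.where(np.arange(10) % 2 == 0, 0.5, -0.5)
y = 2.0 * x + 1.0 + noise

X = np.column_stack([np.ones_like(x), x])      # columns: intercept, slope
beta = np.linalg.solve(X.T @ X, X.T @ y)       # OLS coefficients

residuals = y - X @ beta
n, p = X.shape
sigma2 = residuals @ residuals / (n - p)       # unbiased residual variance
cov_beta = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the estimates
std_err = np.sqrt(np.diag(cov_beta))           # standard error per coefficient

t_stats = beta / std_err                       # standard errors away from 0
```

A large absolute t-statistic for the slope means the estimate sits many standard errors away from zero; the p-value is then read off the t distribution with `n - p` degrees of freedom.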

In [2]:
# Load some data to test with
df = pd.read_csv(filepath_or_buffer = "../data/available_data.csv")

FileNotFoundError: [Errno 2] No such file or directory: '../data/available_data.csv'

In [None]:
df.head(2)

In [None]:
X = df.iloc[ : , 1:-1].values
y = df.iloc[ : , -1].values

In [None]:
X

In [None]:
y

### Handling Missing Values
----
Handling missing values is among the first steps of any data project.
In fact, most of the steps in any data project are

*Data Cleaning*

#### Generic Steps
----
1. Create an instance of the Imputer/Scaler
2. `fit` that instance on the data
3. `transform` the data 
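
The three generic steps can be sketched on a tiny array. The data here (`X_demo`) is hypothetical and only serves to show the create / `fit` / `transform` pattern with `SimpleImputer`.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical numeric data with one missing value in the second column
X_demo = np.array([[1.0, np.nan],
                   [3.0, 4.0],
                   [5.0, 6.0]])

imputer = SimpleImputer(missing_values=np.nan, strategy="median")  # 1. create
imputer.fit(X_demo)                                                # 2. fit
X_filled = imputer.transform(X_demo)                               # 3. transform
```

After `fit`, the learned per-column medians are stored in `imputer.statistics_`, and `transform` uses them to fill the gaps.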

In [None]:
from sklearn.impute import SimpleImputer
#Imputation transformer for completing missing values.

In [None]:
numerical_imputer = SimpleImputer(missing_values=np.nan, strategy="median")
categorical_imputer = SimpleImputer(missing_values=np.nan, strategy="most_frequent")

In [None]:
# fit the imputer to the X vars
numerical_imputer.fit(X[ : , 1:3])
categorical_imputer.fit(X [ : , 0:1])

In [None]:
X[ : , 1:3] = numerical_imputer.transform(X[ : , 1:3])
X[ : , 0:1] = categorical_imputer.transform(X [ : , 0:1])

### Handling Categorical Value
----
Nearly the same process applies to categorical data as well

In [None]:
# grab the OneHotEncoder and ColumnTransformer from sklearn
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

In [None]:
# create an instance of the encoder
# (`sparse_output` replaced `sparse` in scikit-learn 1.2; use sparse=False on older versions)
gender_encoder = OneHotEncoder(sparse_output=False)

In [None]:
# Build a column transformer; the `transformers` list can be expanded to
# encode more columns. Columns that are not listed in `transformers` are
# passed through unchanged (remainder='passthrough') and concatenated with
# the output of the transformers.
clm = ColumnTransformer(transformers=[
    ('cat', gender_encoder, [0])],
                        remainder='passthrough')

In [None]:
clm.fit(X)
X = clm.transform(X)

In [None]:
# [ M, F, Age, Salary]
X

### Splitting the data into training and test
----
- Splits can take many forms
- most people opt for 80-20 or 60-40 splits
- There are software packages out there to allow `git` - like versioning of data science models

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .20)

### Scaling your data
----
Remember that scaling the data gives each feature a mean of (or close to) `0` and a standard deviation of `1`
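
A small sketch of what standardization does, written in plain NumPy on made-up numbers (`train` and `test` are hypothetical). It uses the population standard deviation, which is also what `StandardScaler` uses, and applies the training statistics to the test data.

```python
import numpy as np

# Hypothetical training and test columns
train = np.array([10.0, 20.0, 30.0, 40.0])
test = np.array([25.0, 50.0])

# The statistics come from the training data only
mean, std = train.mean(), train.std()

train_scaled = (train - mean) / std     # mean 0, standard deviation 1
test_scaled = (test - mean) / std       # reuses the training statistics
```

Reusing the training mean and standard deviation on the test set keeps information about the test data from leaking into the preprocessing step.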

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
sc_train = StandardScaler()

In [None]:
sc_train.fit(X_train)

In [None]:
# Run the transformation
X_train = sc_train.transform(X_train)
X_test = sc_train.transform(X_test)

In [None]:
X_train

## Linear Regression
----



$ y = mx + b $


A simple approach to machine learning.

It is where most models begin, and it can be expanded to multiple dimensions

* Reflexive Statistics

In [None]:
df = pd.read_csv("../data/Salary_Data.csv")

In [None]:
df.head()

#### Checking the Data
----

In [None]:
df.isnull().sum()

### Assigning X and Y
----

In [None]:
X = df[["YearsExperience"]]

In [None]:
y = df.Salary

#### Split the Data into Training and Test Sets
----

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .20)

### Import the LinearRegression model
----
1. Import the model
2. Create an Instance of it
3. Feed the data to the instance

In [None]:
from sklearn.linear_model import LinearRegression
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

In [None]:
# Create your Predictions for Y and store them in a Variable
y_pred = lr_model.predict(X_test)
y_pred

In [None]:
# Import mean_absolute_error to measure your model's average absolute error
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_test, y_pred)

In [None]:
# Import mean_squared_error to measure your model's average squared error
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, y_pred)

In [None]:
# Taking the square root of the Mean Squared Error gives the RMSE,
# which is roughly the standard deviation of the residuals
from sklearn.metrics import mean_squared_error
np.sqrt(mean_squared_error(y_test, y_pred))

In [None]:
# Import and calculate R^2 to measure how well X
# explains y, according to the model
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

## Linear Regression with Advanced Preprocessing
----
Smaller data cleansing exercises can be done in small scripts, or with a few lines inside a larger context.

`sklearn.pipeline`, together with `sklearn.preprocessing` and `sklearn.compose`, will help you build out _pipelines_ that transform your data along its way to the end user

In [None]:
df = pd.read_csv("../data/50_Startups.csv")

In [None]:
df.head()

In [None]:
# Create X and y: drop the prediction target from X and assign it to y
X = df.drop('Profit', axis=1)
y = df['Profit']

In [None]:
X.head()

In [None]:
y.head()

### Create Lists Representing the features you need to process
----

In [None]:
categorical_features = ['State']
numeric_features = ['R&D Spend', 'Administration', 'Marketing Spend']

In [None]:
# import the required toolset
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

### `sklearn.pipeline`: creating reproducible transformations
----
Sequentially apply a list of transforms and a final estimator.

* Intermediate steps of the pipeline must be 'transforms'; that is, they must implement `fit` and `transform` methods.
* The final estimator only needs to implement `fit`.
* The transformers in the pipeline can be cached using the `memory` argument

The purpose of the pipeline is to assemble several steps that can be
cross-validated together while setting different parameters.
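
To illustrate cross-validating a whole pipeline while varying a parameter, here is a hedged sketch on synthetic data. The data, the step names (`scaler`, `ridge`) and the alpha grid are assumptions chosen for the example, not part of the notebook.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data: 60 rows, 3 features, linear signal plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.0]) + rng.normal(scale=0.1, size=60)

pipe = Pipeline(steps=[("scaler", StandardScaler()), ("ridge", Ridge())])

# Step parameters are addressed as <step name>__<parameter name>
search = GridSearchCV(pipe, param_grid={"ridge__alpha": [0.01, 1.0, 100.0]}, cv=5)
search.fit(X_demo, y_demo)

best_alpha = search.best_params_["ridge__alpha"]
```

Because the scaler lives inside the pipeline, it is refit on each training fold, so the held-out fold never influences the preprocessing.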

In [None]:
# Create a numeric transformer PIPELINE to handle missing numbers
# Use the strategy 'median' to impute the median value
# np.nan is the placeholder that marks the missing values to be imputed
numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(missing_values=np.nan, strategy='median')),
        ('scaler', StandardScaler())])

In [None]:
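# categorical transformers look similar
# (`sparse_output` replaced `sparse` in scikit-learn 1.2; use sparse=False on older versions)
categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(missing_values=np.nan, strategy='most_frequent')),
        ('onehot' , OneHotEncoder(sparse_output=False))])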

In [None]:
# Finally, create an instance of a column transformer to be invoked
# during the model implementation steps
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

### Building the Model Pipeline
----


In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
# create the instance of the regression class
lr_model = LinearRegression()

In [None]:
# instantiate a model variable that represents the linear pipeline
model = Pipeline(steps= [ ("preprocess", preprocessor), ("linear_model", lr_model)])

In [None]:
# Split your data to be fed into the model instance
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .20, random_state = 123)

In [None]:
# Fit your model to your data
model.fit(X_train, y_train)

### Use the Model to make predictions on Y
----

In [None]:
y_pred = model.predict(X_test)

In [None]:
# Check how well your model explains Y according to X
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

In [None]:
# Check for your Root Mean Squared Errors
# to see how spread out the errors are from the fit
from sklearn.metrics import mean_squared_error
np.sqrt(mean_squared_error(y_test, y_pred))

## Lasso and Ridge Regression
----
Why do we need regularized regressions?

Large numbers of features can cause models to overestimate their impact, lead to overfitting, or cause computational issues

#### It might be a significant complexity issue as you try to model every feature of everything

##### Create some bounds around your regressions

### Model Complexity Increases as Features are added in
----
Along with Complexity comes overfitting 

In [None]:
from sklearn.linear_model import LinearRegression
import random
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 12, 10

#Define input array with angles from 60deg to 300deg converted to radians
x = np.array([i*np.pi/180 for i in range(60,300,4)])
np.random.seed(10)  #Setting seed for reproducibility
y = np.sin(x) + np.random.normal(0,0.15,len(x))
data = pd.DataFrame(np.column_stack([x,y]),columns=['x','y'])


In [None]:
#Demo Model Complexity Increased with N-number of features
# https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/
plt.plot(data['x'],data['y'],'.')
for i in range(2,16):  #power of 1 is already there
    colname = 'x_%d'%i      #new var will be x_power
    data[colname] = data['x']**i


In [None]:

def linear_regression(data, power, models_to_plot):
    #initialize predictors:
    predictors=['x']
    if power>=2:
        predictors.extend(['x_%d'%i for i in range(2,power+1)])
    
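    #Fit the model
    # (the `normalize` argument was removed in scikit-learn 1.2; mathematically,
    # plain OLS predictions do not depend on feature scaling, so it is dropped)
    linreg = LinearRegression()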
    linreg.fit(data[predictors],data['y'])
    y_pred = linreg.predict(data[predictors])
    
    #Check if a plot is to be made for the entered power
    if power in models_to_plot:
        plt.subplot(models_to_plot[power])
        plt.tight_layout()
        plt.plot(data['x'],y_pred)
        plt.plot(data['x'],data['y'],'.')
        plt.title('Plot for power: %d'%power)
    
    #Return the result in pre-defined format
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([linreg.intercept_])
    ret.extend(linreg.coef_)
    return ret

#Initialize a dataframe to store the results:
col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,16)]
ind = ['model_pow_%d'%i for i in range(1,16)]
coef_matrix_simple = pd.DataFrame(index=ind, columns=col)

#Define the powers for which a plot is required:
models_to_plot = {1:231,3:232,6:233,9:234,12:235,15:236}


In [None]:

#Iterate through all powers and assimilate results
for i in range(1,16):
    coef_matrix_simple.iloc[i-1,0:i+2] = linear_regression(data, power=i, models_to_plot=models_to_plot)

In [None]:
# Coefficient magnitudes grow roughly exponentially with model complexity;
# this is the motivation for constraining coefficient magnitudes
pd.options.display.float_format = '{:,.2g}'.format
coef_matrix_simple

If a feature becomes too much of the model's core emphasis in predictions, the model focuses on estimating the output according to that relation and overfits itself to the training data

In [None]:
from IPython.display import Image
Image("../src/photos/holy_grail.png")

### Least *Absolute* Shrinkage & *Selection* Operator
----
* Performs an L1 regularization on the model: the *sum of absolute values of the magnitude of the coefficients* is added in as a penalty
* Minimization objective: least squares objective + $ \alpha $ * ( sum of absolute values of coefficients )

### What is $ \alpha $:
----
A parameter that balances the amount of emphasis given to minimizing the RSS vs. minimizing the sum of absolute values of the coefficients
###### As $ \alpha $ increases, model complexity decreases



#####  $ \alpha $  = 0 :
----
same as OLS, same coefficients

##### $ \alpha $ equal to $ \infty $ : coefficients = 0

##### As $ \alpha $ is greater than 0 and less than $ \infty $ : coefficients will be between 0 and those of OLS
----
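
The effect of $ \alpha $ on a Lasso fit can be sketched on synthetic data. Everything below (the data, the chosen alphas, the `zero_counts` dictionary) is an illustrative assumption; it counts how many coefficients are driven exactly to zero as $ \alpha $ grows.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical synthetic data: only the first of five features drives y
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=100)

# Count the coefficients that are exactly zero at each alpha
zero_counts = {}
for alpha in [0.01, 0.5, 100.0]:
    model = Lasso(alpha=alpha).fit(X_demo, y_demo)
    zero_counts[alpha] = int(np.sum(model.coef_ == 0))
```

With a very large $ \alpha $ the penalty dominates and every coefficient ends up at zero, which leaves only the intercept as a baseline model.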


In [None]:
from sklearn.linear_model import Lasso
def lasso_regression(data, predictors, alpha, models_to_plot={}):
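    #Fit the model
    # Note: `normalize=True` is only accepted by scikit-learn < 1.2; on newer
    # versions, drop it and standardize the predictors beforehand instead
    lassoreg = Lasso(alpha=alpha,normalize=True, max_iter=int(1e5))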
    lassoreg.fit(data[predictors],data['y'])
    y_pred = lassoreg.predict(data[predictors])
    
    #Check if a plot is to be made for the entered alpha
    if alpha in models_to_plot:
        plt.subplot(models_to_plot[alpha])
        plt.tight_layout()
        plt.plot(data['x'],y_pred)
        plt.plot(data['x'],data['y'],'.')
        plt.title('Plot for alpha: %.3g'%alpha)
    
    #Return the result in pre-defined format
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([lassoreg.intercept_])
    ret.extend(lassoreg.coef_)
    return ret
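
The `normalize=True` argument used above is only accepted by scikit-learn releases before 1.2. One possible replacement, sketched here under assumptions, is to standardize the predictors with `StandardScaler` inside a pipeline. The helper name `scaled_lasso` and the demo data are hypothetical, and because the old option divided by the L2 norm rather than the standard deviation, alpha values are not directly comparable between the two approaches.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def scaled_lasso(alpha):
    """Hypothetical helper: standardize the predictors, then fit a Lasso."""
    return make_pipeline(StandardScaler(), Lasso(alpha=alpha, max_iter=100_000))


# Hypothetical data: powers of x on very different scales
x = np.linspace(1.0, 5.0, 60)
X_demo = np.column_stack([x, x ** 2, x ** 3])
y_demo = np.sin(x)

model = scaled_lasso(alpha=1e-3).fit(X_demo, y_demo)

# The fitted Lasso is the last pipeline step; its coefficients are on the
# standardized scale, not the original feature scale
coef = model.named_steps["lasso"].coef_
y_fit = model.predict(X_demo)
```

`make_pipeline` names each step after its lowercased class name, which is why the Lasso step is reachable as `named_steps["lasso"]`.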

In [None]:
#Initialize predictors to all 15 powers of x
predictors=['x']
predictors.extend(['x_%d'%i for i in range(2,16)])

#Define the alpha values to test
alpha_lasso = [1e-15, 1e-10, 1e-8, 1e-5,1e-4, 1e-3,1e-2, 1, 5, 10]

#Initialize the dataframe to store coefficients
col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,16)]
ind = ['alpha_%.2g'%alpha_lasso[i] for i in range(0,10)]
coef_matrix_lasso = pd.DataFrame(index=ind, columns=col)

#Define the models to plot
models_to_plot = {1e-10:231, 1e-5:232,1e-4:233, 1e-3:234, 1e-2:235, 1:236}


In [None]:

#Iterate over the 10 alpha values:
for i in range(10):
    coef_matrix_lasso.iloc[i,] = lasso_regression(data, predictors, alpha_lasso[i], models_to_plot)

In [None]:
from IPython.display import Image
Image("../src/photos/johnnyD_huh.png")

In [None]:
# many Coefficients are 0, even for very small values of alpha
#      
# coef_matrix_lasso
coef_matrix_lasso.apply(lambda x: sum(x.values==0), axis=1)

##### Observe that even for a small value of alpha, a significant number of coefficients are zero.

This also explains the horizontal line fit for alpha=1 in the lasso plots: it's just a baseline model!

#### This phenomenon of most of the coefficients being zero is called _'sparsity'_.

### Ridge Regression
----
* Performs an L2 regularization on the model: the *sum of squared values of the magnitude of the coefficients* is added in as a penalty
* Minimization objective: least squares objective + $ \alpha $ * ( sum of squared values of coefficients )
    * A.K.A: RSS + $ \alpha $ * ( sum of squared values of coefficients )

### What is $ \alpha $:
----
A parameter that balances the amount of emphasis given to minimizing the RSS vs. minimizing the sum of squared coefficients
* As $ \alpha $ increases, model complexity decreases



#####  $ \alpha $  = 0 :
----
same as OLS, same coefficients

##### $ \alpha $ equal to $ \infty $ : coefficients = 0

##### As $ \alpha $ is greater than 0 and less than $ \infty $ : coefficients will be between 0 and those of OLS
----
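
The three $ \alpha $ regimes above can be checked with the closed-form ridge solution $ (X^T X + \alpha I)^{-1} X^T y $. The sketch below assumes synthetic data and omits the intercept for simplicity; `ridge_coefficients` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np

# Hypothetical synthetic data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)


def ridge_coefficients(X, y, alpha):
    """Closed-form ridge solution without an intercept term."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# alpha = 0 reproduces the OLS coefficients
ols = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]

# The size of the coefficient vector shrinks as alpha grows
norms = [np.linalg.norm(ridge_coefficients(X_demo, y_demo, a))
         for a in [0.0, 1.0, 100.0, 1e6]]
```

The coefficient norm shrinks steadily toward zero as $ \alpha $ increases, yet for any finite $ \alpha $ the coefficients are generally not exactly zero.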


In [None]:
from sklearn.linear_model import Ridge
def ridge_regression(data, predictors, alpha, models_to_plot={}):
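    #Fit the model
    # Note: `normalize=True` is only accepted by scikit-learn < 1.2; on newer
    # versions, drop it and standardize the predictors beforehand instead
    ridgereg = Ridge(alpha=alpha,normalize=True)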
    ridgereg.fit(data[predictors],data['y'])
    y_pred = ridgereg.predict(data[predictors])
    
    #Check if a plot is to be made for the entered alpha
    if alpha in models_to_plot:
        plt.subplot(models_to_plot[alpha])
        plt.tight_layout()
        plt.plot(data['x'],y_pred)
        plt.plot(data['x'],data['y'],'.')
        plt.title('Plot for alpha: %.3g'%alpha)
    
    #Return the result in pre-defined format
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([ridgereg.intercept_])
    ret.extend(ridgereg.coef_)
    return ret

predictors=['x']
predictors.extend(['x_%d'%i for i in range(2,16)])

#Set the different values of alpha to be tested
alpha_ridge = [1e-15, 1e-10, 1e-8, 1e-4, 1e-3,1e-2, 1, 5, 10, 20]

#Initialize the dataframe for storing coefficients.
col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,16)]
ind = ['alpha_%.2g'%alpha_ridge[i] for i in range(0,10)]
coef_matrix_ridge = pd.DataFrame(index=ind, columns=col)

models_to_plot = {1e-15:231, 1e-10:232, 1e-4:233, 1e-3:234, 1e-2:235, 5:236}


In [None]:
for i in range(10):
    coef_matrix_ridge.iloc[i,] = ridge_regression(data, predictors, alpha_ridge[i], models_to_plot)

In [None]:
pd.options.display.float_format = '{:,.2g}'.format
coef_matrix_ridge

#### Though the coefficients are very small, they aren't zero

#### High alpha can lead to significant underfitting

#### Small Alpha, can perform very well

### What about Elastic Net
----
Elastic Nets combine the L1 and L2 penalties

1. Key Difference

    * Ridge: It includes all (or none) of the features in the model. Thus, the major advantage of ridge regression is coefficient shrinkage and reducing model complexity.
    * Lasso: Along with shrinking coefficients, lasso performs feature selection as well. (Remember the 'selection' in the lasso full form?) As we observed earlier, some of the coefficients become exactly zero, which is equivalent to the particular feature being excluded from the model.

Traditionally, techniques like stepwise regression were used to perform feature selection and make parsimonious models. But with advancements in Machine Learning, ridge and lasso regression provide very good alternatives as they give much better output, require fewer tuning parameters and can be automated to a large extent.


2. Typical Use Cases

    * Ridge: It is mainly used to prevent overfitting. Since it includes all the features, it is not very useful when the number of features is exorbitantly high, say in the millions, as it will pose computational challenges.
    * Lasso: Since it provides sparse solutions, it is generally the model of choice (or some variant of this concept) for modelling cases where the number of features is in the millions or more. In such a case, getting a sparse solution is of great computational advantage as the features with zero coefficients can simply be ignored.

It's not hard to see why the stepwise selection techniques become practically very cumbersome to implement in high dimensionality cases. Thus, lasso provides a significant advantage.


3. Presence of Highly Correlated Features

    * Ridge: It generally works well even in the presence of highly correlated features as it will include all of them in the model, but the coefficients will be distributed among them depending on the correlation.
    * Lasso: It arbitrarily selects any one feature among the highly correlated ones and reduces the coefficients of the rest to zero. Also, the chosen variable changes randomly with changes in model parameters. This generally doesn't work that well as compared to ridge regression.
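
A hedged sketch of the correlated-features point, using two identical columns as an extreme, made-up case. It shows Ridge sharing the weight between the duplicates and fits scikit-learn's `ElasticNet`, whose `l1_ratio` parameter sets the mix between the L1 and L2 penalties. The data and parameter values are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Ridge

# Hypothetical data: two identical (perfectly correlated) feature columns
rng = np.random.default_rng(0)
x = rng.normal(size=100)
X_demo = np.column_stack([x, x])
y_demo = 4.0 * x + rng.normal(scale=0.1, size=100)

# Ridge spreads the weight evenly across the duplicated columns
ridge_coef = Ridge(alpha=1.0).fit(X_demo, y_demo).coef_

# ElasticNet applies both penalties; l1_ratio sets the mix between L1 and L2
enet_coef = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X_demo, y_demo).coef_
```

With `l1_ratio=1.0` the Elastic Net penalty reduces to the Lasso penalty, and values closer to 0 move it toward a Ridge-style penalty.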


#### Summary: Predict the price of a vehicle given other information about it


In [None]:
auto_data = pd.read_csv('../data/imports-85.data', sep=r'\s*,\s*', engine='python')
auto_data.head(5)

### 20+ Features for each car
* Missing values represented with `?`

#### Fill missing values with NaN

In [None]:
import numpy as np

auto_data = auto_data.replace('?', np.nan)
auto_data.head(2)

### Using `describe()` to generate information about the data
#### Information about numeric fields in our dataframe
Note that the automobile price is not present

In [None]:
auto_data.describe()

#### Information about all fields in our dataframe
----
You can see the data fields that were left out by passing in the keyword argument `include='all'`

In [None]:
auto_data.describe(include='all')

### Data Cleaning
Also called data cleansing. Involves identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.

#### What data type is price?

In [None]:
auto_data['price'].describe()

#### Convert the values in the price column to numeric values
If conversion throws an error set to NaN (by setting errors='coerce')

In [None]:
auto_data['price'] = pd.to_numeric(auto_data['price'], errors='coerce') 

In [None]:
auto_data['price'].describe()

#### Dropping a column which we deem unnecessary
----
Dropping an insurance-related measure

In [None]:
auto_data = auto_data.drop('normalized-losses', axis=1)
auto_data.head()

#### Checking for other processing

In [None]:
auto_data.describe()

#### Horsepower is also non-numeric...

In [None]:
auto_data['horsepower'].describe()

#### ...so this is also converted to a numeric value

In [None]:
auto_data['horsepower'] = pd.to_numeric(auto_data['horsepower'], errors='coerce') 

In [None]:
auto_data['horsepower'].describe()

### Dealing with Categorical Values 
----
Mapping categorical values to numeric values that make sense.
We aren't encoding here;
we are _just_ assigning the numbers that will allow calculation

In [None]:
auto_data['num-of-cylinders'].describe()

#### Since there are only 7 unique values, we can explicitly set the corresponding numeric values

In [None]:
cylinders_dict = {'two': 2, 
                  'three': 3, 'four': 4, 'five': 5, 'six': 6, 'eight': 8, 'twelve': 12}
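# assign the result back instead of calling replace(..., inplace=True) on the
# column selection, which may not update auto_data in newer pandas versions
auto_data['num-of-cylinders'] = auto_data['num-of-cylinders'].replace(cylinders_dict)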

auto_data.head()

#### All other non-numeric fields can be made into usable features by applying one-hot-encoding

In [None]:
auto_data = pd.get_dummies(auto_data, 
                           columns=['make', 
                                    'fuel-type', 
                                    'aspiration', 
                                    'num-of-doors', 
                                    'body-style', 
                                    'drive-wheels', 
                                    'engine-location', 
                                    'engine-type', 
                                    'fuel-system'])
auto_data.head()

#### Drop rows containing missing values
----
Save dropping `na` values until nearly the last step

In [None]:
auto_data = auto_data.dropna()
auto_data.tail(3)

#### Verify that there are no null values in the data set

In [None]:
auto_data[auto_data.isnull().any(axis=1)]

### Data Cleaning is now complete
We can now use our data to build our models

#### Create training and test data using `train_test_split`

In [None]:
# Import the function that splits the data into training and test sets for us
# Splitting the data will allow us to tune the model and check how it performs
# against unseen data
from sklearn.model_selection import train_test_split

# price is the label, so it is not part of the X variables
X = auto_data.drop('price', axis=1)

# Taking the labels (price)
Y = auto_data['price']

# Splitting into 80% for the training set and 20% for the testing set so we can measure performance on unseen data
X_train, x_test, Y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

#### Create a LinearRegression model with our training data
----
Instantiate `sklearn.linear_model.LinearRegression()` and then `fit()` your model with your data.
* Estimators are high-level objects in sklearn that offer `fit()`

In [None]:
from sklearn.linear_model import LinearRegression

linear_model = LinearRegression()
linear_model.fit(X_train, Y_train)

#### `copy_X=True`: X values will not be overwritten

#### `fit_intercept=True`: an intercept is fitted because the data is not centered around 0

#### No normalization is applied; we do not need to normalize this data set.

#### Check R-square on training data
----
calling the `score()` on the linear model

In [None]:
linear_model.score(X_train, Y_train)

About `96%` of the variance in the training prices is explained by the model

#### View coefficients for each feature
----
weights for each feature in the data set

In [None]:
linear_model.coef_

#### A better view of the coefficients
List of features and their coefficients, ordered by coefficient value

Coefficients that are small or negative have little or negative effect on the price, while other features have large effects on the price of the car as we predict it

In [3]:
predictors = X_train.columns
coef = pd.Series(linear_model.coef_,predictors).sort_values()

print(coef)

NameError: name 'X_train' is not defined

#### Make predictions on test data
----
All estimators have a `predict()` attached to them that allows you to predict values using the trained model

In [None]:
y_predict = linear_model.predict(x_test)

#### Compare predicted and actual values of Price

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (15, 6)

plt.plot(y_predict, label='Predicted')
plt.plot(y_test.values, label='Actual')
plt.ylabel('Price')

plt.legend()
plt.show()

#### R-square score
For our model, how well do the features describe the price?

In [None]:
r_square = linear_model.score(x_test, y_test)
r_square

#### Testing the Model itself using MSE and RootMSE

#### Calculate Mean Square Error

In [None]:
from sklearn.metrics import mean_squared_error

linear_model_mse = mean_squared_error(y_test, y_predict)
linear_model_mse

#### Root of Mean Square Error

In [None]:
import math

math.sqrt(linear_model_mse)

The individual residuals can be in the positive or negative direction, but `linear_model_mse` and its root are never negative because the errors are squared

### Lasso Regression
Cost Function: RSS + $ \alpha $ * (sum of absolute values of coefficients)

RSS = Residual Sum of Squares

Larger values of $ \alpha $ should result in smaller coefficients as the cost function needs to be minimized

In [None]:
from sklearn.linear_model import Lasso

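# Note: `normalize=True` is only accepted by scikit-learn < 1.2; on newer
# versions, drop it and standardize X_train and x_test beforehand instead
lasso_model = Lasso(alpha=0.5, normalize=True)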
lasso_model.fit(X_train, Y_train)

#### Check R-square on training data

In [None]:
lasso_model.score(X_train, Y_train)

#### Coefficients when using Lasso

In [None]:
coef = pd.Series(lasso_model.coef_,predictors).sort_values()
print(coef)

#### Make predictions on test data

In [None]:
y_predict = lasso_model.predict(x_test)

#### Compare predicted and actual values of Price

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (15, 6)

plt.plot(y_predict, label='Predicted')
plt.plot(y_test.values, label='Actual')
plt.ylabel('Price')

plt.legend()
plt.show()

#### Check R-square value on test data

In [None]:
r_square = lasso_model.score(x_test, y_test)
r_square

#### Is the root mean square error any better?

In [None]:
lasso_model_mse = mean_squared_error(y_test, y_predict)
math.sqrt(lasso_model_mse)

### Ridge Regression
Cost Function: RSS + $ \alpha $ * (sum of squares of coefficients)

RSS = Residual Sum of Squares

Larger values of α should result in smaller coefficients as the cost function needs to be minimized

Ridge Regression penalizes large coefficients even more than Lasso as coefficients are squared in cost function

In [None]:
from sklearn.linear_model import Ridge

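# Note: `normalize=True` is only accepted by scikit-learn < 1.2; on newer
# versions, drop it and standardize X_train and x_test beforehand instead
ridge_model = Ridge(alpha=0.05, normalize=True)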
ridge_model.fit(X_train, Y_train)

#### Check R-square on training  data

In [None]:
ridge_model.score(X_train, Y_train)

#### Coefficients when using Ridge

In [None]:
coef = pd.Series(ridge_model.coef_,predictors).sort_values()
print(coef)

#### Make predictions on test data

In [None]:
y_predict = ridge_model.predict(x_test)

#### Compare predicted and actual values of Price

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (15, 6)

plt.plot(y_predict, label='Predicted')
plt.plot(y_test.values, label='Actual')
plt.ylabel('Price')

plt.legend()
plt.show()

#### Get R-square value for test data

In [None]:
r_square = ridge_model.score(x_test, y_test)
r_square

In [None]:
ridge_model_mse = mean_squared_error(y_test, y_predict)
math.sqrt(ridge_model_mse)