# Surprise Housing : Advanced Regression
### author : Jesal P.

## Problem Statement
* A US-based housing company named Surprise Housing has decided to enter the Australian market. The company uses data analytics to purchase houses at a price below their actual values and flip them on at a higher price.
* The company is looking at prospective properties to buy to enter the market. You are required to build a regression model using regularisation in order to predict the actual value of the prospective properties and decide whether to invest in them or not.
* The company wants to know:
    * Which variables are significant in predicting the price of a house, and
    * How well those variables describe the price of a house.

In [1]:
# importing required libraries and setting defaults for environment
import numpy as np
import pandas as pd
pd.set_option('display.max_columns',250)
pd.set_option('display.max_rows',300)

import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

# model building and evaluation libraries

* ### User defined functions

In [2]:
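# some user-defined functions
def get_corr(data, cut_off, fig_x, fig_y):
    """Plot a correlation heatmap; with a non-zero cut_off, flag pairs above it."""
    # numeric_only=True skips object columns, which corr() rejects in pandas >= 2.0
    corr = data.corr(numeric_only=True)
    plt.figure(figsize=(fig_x, fig_y))
    if cut_off == 0:
        sns.heatmap(corr.round(2), annot=True, cmap="coolwarm")
    else:
        # boolean mask (1 = correlation above cut_off, 0 = otherwise)
        sns.heatmap((corr > cut_off).astype(int), annot=True, cmap="coolwarm")
    plt.show()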


* ##  Step:1 EDA and Data Cleaning

In [3]:
# Reading the data
housing = pd.read_csv("train.csv")
house_cpy = housing.copy()
print("housing dataframe size = ",housing.shape)
house_cpy.head()

housing dataframe size =  (1460, 81)


Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQC,CentralAir,Electrical,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,2Story,7,5,2003,2003,Gable,CompShg,VinylSd,VinylSd,BrkFace,196.0,Gd,TA,PConc,Gd,TA,No,GLQ,706,Unf,0,150,856,GasA,Ex,Y,SBrkr,856,854,0,1710,1,0,2,1,3,1,Gd,8,Typ,0,,Attchd,2003.0,RFn,2,548,TA,TA,Y,0,61,0,0,0,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,1Story,6,8,1976,1976,Gable,CompShg,MetalSd,MetalSd,,0.0,TA,TA,CBlock,Gd,TA,Gd,ALQ,978,Unf,0,284,1262,GasA,Ex,Y,SBrkr,1262,0,0,1262,0,1,2,0,3,1,TA,6,Typ,1,TA,Attchd,1976.0,RFn,2,460,TA,TA,Y,298,0,0,0,0,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,2Story,7,5,2001,2002,Gable,CompShg,VinylSd,VinylSd,BrkFace,162.0,Gd,TA,PConc,Gd,TA,Mn,GLQ,486,Unf,0,434,920,GasA,Ex,Y,SBrkr,920,866,0,1786,1,0,2,1,3,1,Gd,6,Typ,1,TA,Attchd,2001.0,RFn,2,608,TA,TA,Y,0,42,0,0,0,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,Gtl,Crawfor,Norm,Norm,1Fam,2Story,7,5,1915,1970,Gable,CompShg,Wd Sdng,Wd Shng,,0.0,TA,TA,BrkTil,TA,Gd,No,ALQ,216,Unf,0,540,756,GasA,Gd,Y,SBrkr,961,756,0,1717,1,0,1,0,3,1,Gd,7,Typ,1,Gd,Detchd,1998.0,Unf,3,642,TA,TA,Y,0,35,272,0,0,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,Gtl,NoRidge,Norm,Norm,1Fam,2Story,8,5,2000,2000,Gable,CompShg,VinylSd,VinylSd,BrkFace,350.0,Gd,TA,PConc,Gd,TA,Av,GLQ,655,Unf,0,490,1145,GasA,Ex,Y,SBrkr,1145,1053,0,2198,1,0,2,1,4,1,Gd,9,Typ,1,TA,Attchd,2000.0,RFn,3,836,TA,TA,Y,192,84,0,0,0,0,,,,0,12,2008,WD,Normal,250000


In [4]:
house_cpy.describe()

Unnamed: 0,Id,MSSubClass,LotFrontage,LotArea,OverallQual,OverallCond,YearBuilt,YearRemodAdd,MasVnrArea,BsmtFinSF1,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,TotRmsAbvGrd,Fireplaces,GarageYrBlt,GarageCars,GarageArea,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold,SalePrice
count,1460.0,1460.0,1201.0,1460.0,1460.0,1460.0,1460.0,1460.0,1452.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1379.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0,1460.0
mean,730.5,56.89726,70.049958,10516.828082,6.099315,5.575342,1971.267808,1984.865753,103.685262,443.639726,46.549315,567.240411,1057.429452,1162.626712,346.992466,5.844521,1515.463699,0.425342,0.057534,1.565068,0.382877,2.866438,1.046575,6.517808,0.613014,1978.506164,1.767123,472.980137,94.244521,46.660274,21.95411,3.409589,15.060959,2.758904,43.489041,6.321918,2007.815753,180921.19589
std,421.610009,42.300571,24.284752,9981.264932,1.382997,1.112799,30.202904,20.645407,181.066207,456.098091,161.319273,441.866955,438.705324,386.587738,436.528436,48.623081,525.480383,0.518911,0.238753,0.550916,0.502885,0.815778,0.220338,1.625393,0.644666,24.689725,0.747315,213.804841,125.338794,66.256028,61.119149,29.317331,55.757415,40.177307,496.123024,2.703626,1.328095,79442.502883
min,1.0,20.0,21.0,1300.0,1.0,1.0,1872.0,1950.0,0.0,0.0,0.0,0.0,0.0,334.0,0.0,0.0,334.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,1900.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2006.0,34900.0
25%,365.75,20.0,59.0,7553.5,5.0,5.0,1954.0,1967.0,0.0,0.0,0.0,223.0,795.75,882.0,0.0,0.0,1129.5,0.0,0.0,1.0,0.0,2.0,1.0,5.0,0.0,1961.0,1.0,334.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,2007.0,129975.0
50%,730.5,50.0,69.0,9478.5,6.0,5.0,1973.0,1994.0,0.0,383.5,0.0,477.5,991.5,1087.0,0.0,0.0,1464.0,0.0,0.0,2.0,0.0,3.0,1.0,6.0,1.0,1980.0,2.0,480.0,0.0,25.0,0.0,0.0,0.0,0.0,0.0,6.0,2008.0,163000.0
75%,1095.25,70.0,80.0,11601.5,7.0,6.0,2000.0,2004.0,166.0,712.25,0.0,808.0,1298.25,1391.25,728.0,0.0,1776.75,1.0,0.0,2.0,1.0,3.0,1.0,7.0,1.0,2002.0,2.0,576.0,168.0,68.0,0.0,0.0,0.0,0.0,0.0,8.0,2009.0,214000.0
max,1460.0,190.0,313.0,215245.0,10.0,9.0,2010.0,2010.0,1600.0,5644.0,1474.0,2336.0,6110.0,4692.0,2065.0,572.0,5642.0,3.0,2.0,3.0,2.0,8.0,3.0,14.0,3.0,2010.0,4.0,1418.0,857.0,547.0,552.0,508.0,480.0,738.0,15500.0,12.0,2010.0,755000.0


In [5]:
house_cpy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   Alley          91 non-null     object 
 7   LotShape       1460 non-null   object 
 8   LandContour    1460 non-null   object 
 9   Utilities      1460 non-null   object 
 10  LotConfig      1460 non-null   object 
 11  LandSlope      1460 non-null   object 
 12  Neighborhood   1460 non-null   object 
 13  Condition1     1460 non-null   object 
 14  Condition2     1460 non-null   object 
 15  BldgType       1460 non-null   object 
 16  HouseStyle     1460 non-null   object 
 17  OverallQual    1460 non-null   int64  
 18  OverallC

* ### Checking Missing Data

In [6]:
# calculating the null percentage of each column
cols = round(100* (house_cpy.isnull().sum()/house_cpy.shape[0]),2)
# keep only the columns with missing values, sorted by percentage
# (boolean masking avoids positional indexing on a label-indexed Series)
missing = cols[cols > 0].sort_values(kind="stable")
list(zip(missing.index, missing))


[('Electrical', 0.07),
 ('MasVnrType', 0.55),
 ('MasVnrArea', 0.55),
 ('BsmtQual', 2.53),
 ('BsmtCond', 2.53),
 ('BsmtFinType1', 2.53),
 ('BsmtExposure', 2.6),
 ('BsmtFinType2', 2.6),
 ('GarageType', 5.55),
 ('GarageYrBlt', 5.55),
 ('GarageFinish', 5.55),
 ('GarageQual', 5.55),
 ('GarageCond', 5.55),
 ('LotFrontage', 17.74),
 ('FireplaceQu', 47.26),
 ('Fence', 80.75),
 ('Alley', 93.77),
 ('MiscFeature', 96.3),
 ('PoolQC', 99.52)]

In [None]:
# Listing columns having 50% or more NA data
set_null_percent = 50
null_cols = list(cols[cols >= set_null_percent].index)
null_cols

In [None]:
# Dropping columns with >50% null values
house_cpy = house_cpy.drop(columns = null_cols)

# Dropping column ID as it is of no use for model building
house_cpy = house_cpy.drop(columns = ['Id'])
house_cpy.info()

In [None]:
# recalculating the null percentage of each column (rounded to a whole percent)
cols = round(100* (house_cpy.isnull().sum()/house_cpy.shape[0]))
col = list(cols[cols > 0].index)
col_val = list(cols[cols > 0])
list(zip(col,col_val))


In [None]:
# Handling categorical variables with meaningful NA
meaning_NA_cols = ['MasVnrType','BsmtQual','BsmtCond','BsmtExposure','BsmtFinType1','BsmtFinType2','FireplaceQu','GarageType','GarageFinish','GarageQual','GarageCond']
house_cpy[meaning_NA_cols] = house_cpy[meaning_NA_cols].fillna(0)

In [None]:
# calculating the null percentage of each column after handling NA for categorical variables
# note: rounding to a whole percent means columns below 0.5% missing (e.g. Electrical) are not listed
cols = round(100* (house_cpy.isnull().sum()/house_cpy.shape[0]))
col = list(cols[cols > 0].index)
col_val = list(cols[cols > 0])
na_num_cols = list(zip(col,col_val))
na_num_cols

In [None]:
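# visualising the distributions of the numerical variables that still need NA imputation
# (sns.distplot is deprecated; histplot with a KDE overlay is the closest replacement)
plt.figure(figsize=(25,7))
for var in range(len(col)):
    plt.subplot(1,3,var+1)
    sns.histplot(house_cpy[col[var]], kde=True, stat="density")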

In [None]:
# visualising scatter plots of the same numerical variables against log(SalePrice)
plt.figure(figsize=(25,7))
for var in range(len(col)):
    plt.subplot(1,3,var+1)
    # recent seaborn versions require x and y as keyword arguments
    sns.scatterplot(x=house_cpy[col[var]], y=np.log(house_cpy['SalePrice']))
    plt.grid(True)

In [None]:
# Getting details of Garage year based on Garage type = NA
no_garage = house_cpy.loc[house_cpy['GarageType']==0]
no_garage['GarageYrBlt'].shape

In [None]:
# No garage -> no garage year present, hence dropping such rows
# (assigning Series.dropna() back to the column realigns on the index and restores the NaNs)
house_cpy = house_cpy.dropna(subset=['GarageYrBlt'])
house_cpy['GarageYrBlt'].describe()

In [None]:
# this categorical variable had None and 0 as duplicate entries
print(house_cpy['MasVnrType'].unique())
house_cpy['MasVnrType'] = house_cpy['MasVnrType'].apply(lambda x: 0 if x=="None" else x)
house_cpy['MasVnrType'].unique()

In [None]:
# MSSubClass is a coded category, not a quantity, so treat it as categorical
house_cpy['MSSubClass'] = house_cpy['MSSubClass'].astype('object')
cat_cols = list(house_cpy.select_dtypes(include = ['object']))
num_cols = list(house_cpy.select_dtypes(include = ['int64','float64']))
print("categorical_columns = ",cat_cols,len(cat_cols))
print("\nnumerical_columns = ",num_cols, len(num_cols))

In [None]:
for var in range(len(cat_cols)):
    print("\n ",cat_cols[var],house_cpy[cat_cols[var]].unique(),house_cpy[cat_cols[var]].nunique())

In [None]:
exterior_same = house_cpy.loc[house_cpy['Exterior1st']==house_cpy['Exterior2nd']]
print(exterior_same.shape)
exterior_same.head()

In [None]:
exterior_diff = house_cpy.loc[house_cpy['Exterior1st']!=house_cpy['Exterior2nd']]
print(exterior_diff.shape)
exterior_diff.head()

### > Exterior2nd contains `Wd Shng` where `Wd Sdng` is expected; this is treated as a misspelling and corrected below

In [None]:
house_cpy.loc[house_cpy['Exterior2nd'] == "Wd Shng", 'Exterior2nd'] = 'Wd Sdng'

In [None]:
exterior_diff = house_cpy.loc[house_cpy['Exterior1st']!=house_cpy['Exterior2nd']]
print(exterior_diff.shape)
exterior_diff.head()

In [None]:
house_cpy.shape

In [None]:
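# count plot for every categorical variable
plt.figure(figsize=(25,150))
for var in range(len(cat_cols)):
    plt.subplot(20,2,var+1)
    # recent seaborn versions require x as a keyword argument
    sns.countplot(x=house_cpy[cat_cols[var]])
    plt.xticks(rotation=30)
    plt.grid(True)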

In [None]:
# Visualizing the scatter plot for MasVnrType 0 and SalePrice
Mas_Vnr_none = house_cpy.loc[house_cpy['MasVnrType']==0]
sns.scatterplot(x=Mas_Vnr_none['MasVnrType'], y=Mas_Vnr_none['SalePrice'])
print(Mas_Vnr_none.shape)
print("Percentage of MasVnrType -> 0 =  ",round(100*(Mas_Vnr_none.shape[0]/house_cpy.shape[0]),2))

In [None]:
get_corr(house_cpy,0,25,25)

### Looking at the correlation heatmap : 
* #### LotFrontage has corr_value > 0.4 with `1stFlrSF` ,`LotArea` ,`GrLivArea`

In [None]:
house_cpy[['1stFlrSF','LotFrontage']].describe()

In [None]:
get_mean_val_1stFlrSF = house_cpy.loc[(house_cpy['1stFlrSF'] >= 1000) & (house_cpy['1stFlrSF'] < 1162)]
get_mean_val_1stFlrSF['LotFrontage'].describe()

* ### Looking at the above statistics, we can impute the missing LotFrontage values with the column mean

In [None]:
house_cpy['LotFrontage'] = house_cpy['LotFrontage'].fillna(house_cpy['LotFrontage'].mean())
house_cpy['LotFrontage'].describe()
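
Mean imputation leaves the column mean unchanged but narrows its spread, which is worth keeping in mind when reading the `describe()` output above. A minimal sketch on made-up values (the numbers are illustrative, not taken from `train.csv`):

```python
import numpy as np
import pandas as pd

# Made-up LotFrontage-like values with two gaps (illustrative only)
s = pd.Series([60.0, 80.0, np.nan, 70.0, np.nan], name="LotFrontage")

# Fill the gaps with the mean of the observed values
filled = s.fillna(s.mean())

print("mean before/after:", s.mean(), filled.mean())
print("std  before/after:", round(s.std(), 2), round(filled.std(), 2))
```

The standard deviation drops because every imputed row sits exactly at the centre of the distribution.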

In [None]:
house_cpy = house_cpy.dropna()

In [None]:
round(100* (house_cpy.isnull().sum()/house_cpy.shape[0]))

In [None]:
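print("house_cpy dataframe size = ",house_cpy.shape)
# computing the share of rows kept from the actual frames instead of a hard-coded count
data_left = round((house_cpy.shape[0]/housing.shape[0])*100,2)
print("\nWe still have {}% of the data remaining!\n".format(data_left))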
house_cpy.info()

In [None]:
plt.figure(figsize=(25,150))
for var in range(len(cat_cols)):
    plt.subplot(20,2,var+1)
    sns.boxplot(data = house_cpy,x=house_cpy[cat_cols[var]],y= house_cpy['SalePrice'])
    plt.xticks(rotation=30)
    plt.grid(1)

In [None]:
sns.pairplot(house_cpy,corner=False)

In [None]:
# Visualizing the correlation between sale price and the other numeric columns
plt.figure(figsize=(3,10))
sns.heatmap(house_cpy.corr(numeric_only=True)[['SalePrice']].sort_values('SalePrice',ascending=False),annot=True,cmap='coolwarm')


In [None]:
corr = house_cpy.corr(numeric_only=True)[['SalePrice']].sort_values('SalePrice',ascending=False)
# columns whose correlation with SalePrice is below 0.3 (this also includes negative correlations)
to_drop = corr.loc[corr['SalePrice']<0.3]
to_drop.index


In [None]:
# sns.pairplot(house_cpy,corner=True)
# plt.show()

In [None]:
# getting counts of category levels in each categorical variables
for category in cat_cols:
    print("\n",house_cpy[category].astype('category').value_counts())


In [None]:
# plt.figure(figsize=(25,50))
# for i in range(len(num_cols)):
#     plt.subplot(11,4,i+1)
#     sns.distplot(house_cpy[num_cols[i]],label=True)

In [None]:
house_cpy.describe(percentiles=[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.95,0.99])

In [None]:
house_cpy.shape

In [None]:
# finding 2-level categories to convert to 0 and 1
cat_cols_dummy=[]
for i in range(len(cat_cols)):
    if house_cpy[cat_cols[i]].nunique()<=2:
        print(house_cpy[cat_cols[i]].nunique(),cat_cols[i])
    else:
        cat_cols_dummy.append(cat_cols[i])
cat_cols_dummy

In [None]:
# As all the houses have the basic utilities (electricity, gas, water, septic tank),
# the variable Utilities can be dropped from the main dataframe
house_cpy = house_cpy.drop(columns='Utilities')

In [None]:
# creating dummies for Street
house_cpy['Street'] = house_cpy['Street'].apply(lambda x: 1 if x=='Pave' else 0)
house_cpy['Street'].describe()

In [None]:
# creating dummies for CentralAir
house_cpy['CentralAir'] = house_cpy['CentralAir'].apply(lambda x: 1 if x=='Y' else 0)
house_cpy['CentralAir'].describe()

* ### Handling the skewness of the target variable

In [None]:
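plt.figure(figsize=(25,7))
plt.subplot(1,2,1)
# distribution of the raw target (histplot replaces the deprecated distplot)
sns.histplot(house_cpy['SalePrice'], kde=True, stat="density")

plt.subplot(1,2,2)
# Normalizing the target variable with a log transform
house_cpy['SalePrice'] = np.log(house_cpy['SalePrice'])
sns.histplot(house_cpy['SalePrice'], kde=True, stat="density")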
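
Because the target is now on a log scale, anything predicted from it has to be passed through `np.exp` to get back to a price, and an error measured in log units reads roughly as a relative error. A small sketch using the minimum, median and maximum `SalePrice` from the `describe()` output above (the 0.1 log-unit error is a hypothetical value chosen for illustration):

```python
import numpy as np

# min, median and max SalePrice as reported by describe() earlier
prices = np.array([34900.0, 163000.0, 755000.0])

log_prices = np.log(prices)      # what the model is trained on
recovered = np.exp(log_prices)   # inverse transform back to the price scale

# A hypothetical absolute error of 0.1 in log units corresponds to
# a multiplicative error of exp(0.1), i.e. roughly 10.5% of the price
relative_error = np.exp(0.1) - 1

print(recovered)
print(round(relative_error, 4))
```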


In [None]:
# Looking at the correlation data
get_corr(house_cpy,0,25,25)

In [None]:
# Looking at the correlation data with a cutoff
# get_corr(house_cpy,0.5,25,25)

In [None]:
# # # Visualising the categorical variables
# cat_update_cols = house_cpy.select_dtypes('object').columns
# print(cat_update_cols)
# plt.figure(figsize=(60, 200))
# for i in range(1,len(cat_update_cols)):
#     plt.subplot(18,2,i)
#     house_cpy = house_cpy.sort_values(by='SalePrice',ascending=False)
#     sns.boxplot(x = cat_update_cols[i], y = 'SalePrice', data = house_cpy)
#     plt.grid(1)
# plt.show()

In [None]:
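# cat_update_cols is only assigned inside the commented-out cell above, so define it here
cat_update_cols = house_cpy.select_dtypes('object').columns
house_cpy[cat_update_cols].nunique()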

In [None]:
# Creating dummies for categorical columns
for i in range(len(cat_update_cols)):
    dummys = pd.get_dummies(house_cpy[cat_update_cols[i]],drop_first=True,prefix=cat_update_cols[i])
    house_cpy = pd.concat([house_cpy,dummys],axis=1)
    house_cpy.drop([cat_update_cols[i]],axis=1,inplace=True)
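
To make the effect of `drop_first=True` concrete, here is a sketch on a toy frame. The column names are borrowed from the dataset, but the rows are made up, and `dtype=int` is an added assumption so the dummies show up as 0/1 rather than booleans:

```python
import pandas as pd

# Toy frame with one three-level categorical column (rows are illustrative)
toy = pd.DataFrame({
    "LotShape": ["Reg", "IR1", "IR2", "Reg"],
    "LotArea": [8450, 11250, 9000, 9600],
})

# One call encodes every listed column; the first level (IR1) is dropped
# and becomes the baseline that the remaining dummies are compared against
encoded = pd.get_dummies(toy, columns=["LotShape"], drop_first=True, dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

A row with all dummies equal to 0 therefore belongs to the dropped baseline level, so no information is lost while one redundant (perfectly collinear) column is avoided.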

In [None]:
house_cpy.describe()

In [None]:
print(house_cpy.shape)
house_cpy.info()

In [None]:
# get_corr(house_cpy,0,50,50)

In [None]:
# Importing libraries and modules for regression model building
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
scalar = MinMaxScaler()
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV


# > Splitting the data into a train set and a test set

In [None]:
y = house_cpy.pop('SalePrice')
X = house_cpy

In [None]:
y.describe()

In [None]:
print(X.info())
X.describe()

In [None]:
# train-test data split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=19)

In [None]:
print("X_train =",X_train.shape)
print("y_train =",y_train.shape)
print("X_test =",X_test.shape)
print("y_test =",y_test.shape)

In [None]:
X_train[X_train.columns] = scalar.fit_transform(X_train[X_train.columns])
X_train.describe()
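
The scaler above is fitted on the training split only. When the test split is eventually scored, it would normally go through `transform` (not `fit_transform`) so that it reuses the training minimum and maximum. A minimal sketch on made-up values; the names `scaler`, `train` and `test` are illustrative and not part of the notebook:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up living-area values standing in for the real splits
train = pd.DataFrame({"GrLivArea": [1000.0, 1500.0, 2000.0]})
test = pd.DataFrame({"GrLivArea": [1250.0, 2500.0]})

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)  # learns min and max from train
test_scaled = scaler.transform(test)        # reuses them, no refit on test

print(train_scaled.ravel())
print(test_scaled.ravel())
```

Test values can fall outside [0, 1] when they lie beyond the training range, which is expected and avoids leaking test information into the scaling.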

## 3. Model Building and Evaluation

## Ridge and Lasso Regression

Let's now predict house prices with ridge and lasso regression, tuning the regularisation strength `alpha` by cross-validation.
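
For reference, the two penalised objectives in their textbook form, with $\alpha$ the regularisation strength, $\beta$ the coefficient vector, $p$ the number of features and $n$ the number of training rows. The symbols are chosen here for illustration, and scikit-learn scales the loss term slightly differently for `Ridge` and `Lasso`:

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 + \alpha \sum_{j=1}^{p} \beta_j^2$$

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 + \alpha \sum_{j=1}^{p} \lvert \beta_j \rvert$$

Ridge shrinks coefficients toward zero without eliminating them, while the lasso penalty can set some coefficients exactly to zero, which is why it also acts as a feature selector.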

## Ridge Regression

In [None]:
# list of alphas to tune
params = {'alpha': [0.0001, 0.001, 0.01, 0.05, 0.1, 
 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.0, 3.0, 
 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 20, 50, 100, 500, 1000 ]}


ridge = Ridge()

# cross validation
folds = 5
model_cv = GridSearchCV(estimator = ridge, 
                        param_grid = params, 
                         scoring= 'neg_mean_absolute_error', 
                        cv = folds, 
                        return_train_score=True,
                        verbose = 1)            
model_cv.fit(X_train, y_train) 

In [None]:
cv_results = pd.DataFrame(model_cv.cv_results_)
cv_results = cv_results[cv_results['param_alpha']<=900]
cv_results.head()

In [None]:
# plotting mean test and train scores against alpha
# cast to float (an integer cast would collapse every alpha below 1 to 0)
cv_results['param_alpha'] = cv_results['param_alpha'].astype('float32')

# plotting
plt.figure(figsize=(20,10))
plt.plot(cv_results['param_alpha'], cv_results['mean_train_score'])
plt.plot(cv_results['param_alpha'], cv_results['mean_test_score'])
plt.xlabel('alpha')
plt.ylabel('Negative Mean Absolute Error')
plt.title("Negative Mean Absolute Error and alpha")
plt.legend(['train score', 'test score'], loc='upper left')
plt.grid(1)
plt.show()
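
Besides reading the plot, the fitted search object reports the alpha with the best mean validation score directly. A sketch on synthetic data (the arrays `X_demo` and `y_demo` and the shortened alpha grid are assumptions for illustration; in the notebook the same attributes would be read from `model_cv`):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression problem standing in for X_train / y_train
rng = np.random.default_rng(19)
X_demo = rng.normal(size=(80, 5))
true_coef = np.array([1.5, 0.0, -2.0, 0.0, 0.5])
y_demo = X_demo @ true_coef + rng.normal(scale=0.1, size=80)

search = GridSearchCV(
    estimator=Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    scoring="neg_mean_absolute_error",
    cv=5,
)
search.fit(X_demo, y_demo)

best_alpha = search.best_params_["alpha"]  # alpha with the highest mean CV score
print("best alpha:", best_alpha)
print("best mean CV score:", search.best_score_)
```

Since the score is a negated error, values closer to zero are better.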


In [None]:
alpha = 100
ridge = Ridge(alpha=alpha)

ridge.fit(X_train, y_train)
ridge.coef_
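
The raw `coef_` array is hard to read on its own. Pairing it with the column names and sorting by absolute value gives a first answer to which variables matter most. A sketch on synthetic data (the feature names are borrowed from the dataset, but the values and the relationship are made up):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Synthetic features on a [0, 1] scale, mimicking min-max scaled inputs
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    rng.uniform(size=(100, 3)),
    columns=["GrLivArea", "OverallQual", "MoSold"],
)
# Made-up target: strong effect of GrLivArea, weaker of OverallQual, none of MoSold
y_demo = 2.0 * X_demo["GrLivArea"] + 1.0 * X_demo["OverallQual"] \
    + rng.normal(scale=0.01, size=100)

model = Ridge(alpha=0.1).fit(X_demo, y_demo)

# Label each coefficient, then order by magnitude (largest first)
coefs = pd.Series(model.coef_, index=X_demo.columns)
ranked = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
print(ranked)
```

Comparing magnitudes this way only makes sense when the features share a common scale, which the min-max scaling step provides.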