### Lasso and Ridge Regression

A linear regression model can overfit: it fits the training data well but gives a large error on test data. To address this, we can penalize the regression coefficients using techniques such as Lasso and Ridge regression. This is known as regularizing the model, and it helps the model generalize.

The idea is to minimize the cost function, i.e. the sum of squared differences between the actual and predicted values.

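In Lasso, we add a penalty to the cost function: lambda/alpha (default value of 1 in scikit-learn) multiplied by the sum of the absolute values of the slopes (coefficients). In Ridge, we do the same thing but multiply alpha by the sum of the squared slopes. These are the L1 and L2 regularization techniques, respectively.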
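Written out, the penalties described above look roughly as follows. The symbols are notation chosen here for illustration: $\beta_j$ are the slopes, $\hat{y}_i$ the predictions, and $\alpha$ the penalty strength. Note that scikit-learn's `Lasso` additionally scales the squared-error term by $1/(2n)$, so the same $\alpha$ does not mean the same penalty strength in `Ridge` and `Lasso`.

```latex
\begin{aligned}
\text{Linear regression:} \quad & J(\beta) = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \\
\text{Lasso (L1):} \quad & J(\beta) = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \alpha \sum_{j=1}^{p} \lvert \beta_j \rvert \\
\text{Ridge (L2):} \quad & J(\beta) = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \alpha \sum_{j=1}^{p} \beta_j^{2}
\end{aligned}
```

With $\alpha = 0$ both penalized forms reduce to ordinary linear regression.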

##### Reusing the linear regression code developed for house price prediction
The cleaned dataset from that exercise was saved to a file, which is loaded here.

In [1]:
# importing resources
import pandas as pd
import numpy as np

In [19]:
# loading the cleaned dataset
path = r"C:\DS\house_price\df_clean.csv"
df = pd.read_csv(path)
df.head()

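| Unnamed: 0.1 | Unnamed: 0 | bath | balcony | price | area_sqft | bedrooms | area_type_Carpet Area | area_type_Plot Area | area_type_Super built-up Area | availability_Ready To Move |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2.0 | 1.0 | 39.07 | 1056.0 | 2 | 0 | 0 | 1 | 0 |
| 1 | 1 | 5.0 | 3.0 | 120.0 | 2600.0 | 4 | 0 | 1 | 0 | 1 |
| 2 | 2 | 2.0 | 3.0 | 62.0 | 1440.0 | 3 | 0 | 0 | 0 | 1 |
| 3 | 3 | 3.0 | 1.0 | 95.0 | 1521.0 | 3 | 0 | 0 | 1 | 1 |
| 4 | 4 | 2.0 | 1.0 | 51.0 | 1200.0 | 2 | 0 | 0 | 1 | 1 |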


In [21]:
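# performing regression with this dataset as in the previous case
# splitting into features (x) and response (y)
# note: the leftover index columns ('Unnamed: 0.1', 'Unnamed: 0') stay in x as features here
x = df.drop(['price'], axis=1)
y = df['price']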

# train test split
from sklearn.model_selection import train_test_split as tts
xtr, xt, ytr, yt = tts(x, y, test_size=0.2, random_state=108)

# feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(xtr)
xtr = sc.transform(xtr)
xt = sc.transform(xt) # we do not transform the response

from sklearn.linear_model import LinearRegression
lr = LinearRegression() # for both simple and multiple linear regression
lr.fit(xtr, ytr) # training

lr.score(xt, yt)

0.4781208054019941

In [22]:
# the test score is about 0.48 (for regressors, score returns R-squared, not accuracy)
# let us assume this is an overfit model
# and try lasso / ridge on it
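Rather than only assuming overfitting, one common check is to compare the score on the training split with the score on the test split: a much higher training score suggests overfitting. The following is a minimal sketch on synthetic stand-in data (`X_demo`, `y_demo` and the coefficients are made up for illustration, not taken from the house price file).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 200 rows, 5 features, noisy linear target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
true_coef = np.array([3.0, -2.0, 0.0, 1.5, 0.0])
y_demo = X_demo @ true_coef + rng.normal(scale=2.0, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=108
)

model = LinearRegression().fit(X_tr, y_tr)

# score() returns R-squared for regressors
train_r2 = model.score(X_tr, y_tr)
test_r2 = model.score(X_te, y_te)
gap = train_r2 - test_r2

print(f"train R^2: {train_r2:.4f}")
print(f"test  R^2: {test_r2:.4f}")
print(f"gap      : {gap:.4f}")
```

In this notebook the equivalent comparison would be `lr.score(xtr, ytr)` against `lr.score(xt, yt)`.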

### Implementing ridge / lasso regression

In [23]:
# importing
from sklearn.linear_model import Ridge, Lasso

In [24]:
rd = Ridge() # default value of alpha = 1, which is main parameter

rd.fit(xtr, ytr)

rd.score(xt, yt)

0.478115451822966

In [None]:
# the score is only slightly different from plain linear regression
# now we use Lasso

In [25]:
ls = Lasso() # default value of alpha = 1, which is main parameter

ls.fit(xtr, ytr)

ls.score(xt, yt)

0.47770662723758206
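The scores alone hide how the two penalties act on the slopes. A small sketch on synthetic data (all names and values below are illustrative assumptions) compares the fitted coefficients: Ridge shrinks them, while Lasso can set some of them to exactly zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic stand-in data (assumption): two of the five true slopes are zero
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
true_coef = np.array([3.0, -2.0, 0.0, 1.5, 0.0])
y_demo = X_demo @ true_coef + rng.normal(scale=2.0, size=200)

ols = LinearRegression().fit(X_demo, y_demo)
ridge = Ridge(alpha=1.0).fit(X_demo, y_demo)
lasso = Lasso(alpha=1.0).fit(X_demo, y_demo)

print("OLS   coefficients:", np.round(ols.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))

# Count the slopes that each model set to exactly zero
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
print("zeroed by Ridge:", n_zero_ridge)
print("zeroed by Lasso:", n_zero_lasso)
```

The same inspection can be done on the models above through `rd.coef_` and `ls.coef_`.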

In [None]:
# again only slightly different
# let us try other alpha values

In [26]:
rd2 = Ridge(alpha=2) # alpha raised from the default of 1 to 2

rd2.fit(xtr, ytr)

rd2.score(xt, yt)

0.4781100580089759

In [None]:
# very slight impact on the score

In [27]:
ls2 = Lasso(alpha=2) # alpha raised from the default of 1 to 2

ls2.fit(xtr, ytr)

ls2.score(xt, yt)

0.47517106197802605

In [None]:
# we can keep varying alpha to see which value gives the best score

In [28]:
ls3 = Lasso(alpha=3) # alpha raised to 3

ls3.fit(xtr, ytr)

ls3.score(xt, yt)

0.4724183673130964
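Trying alpha values one cell at a time can be replaced by a loop. The sketch below is only an illustration on synthetic data, with a made-up list of alpha values; it uses cross-validation on the data rather than the test score, so that the test set is not used to choose alpha. It is not a full hyperparameter tuning workflow.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
true_coef = np.array([3.0, -2.0, 0.0, 1.5, 0.0])
y_demo = X_demo @ true_coef + rng.normal(scale=2.0, size=200)

alphas = [0.01, 0.1, 1, 2, 3]  # candidate values, chosen arbitrarily
cv_scores = {}
for a in alphas:
    # 5-fold cross-validated R-squared for this alpha
    scores = cross_val_score(Lasso(alpha=a), X_demo, y_demo, cv=5, scoring="r2")
    cv_scores[a] = scores.mean()
    print(f"alpha={a}: mean CV R^2 = {cv_scores[a]:.4f}")

# Candidate with the highest mean cross-validated score
best_alpha = max(cv_scores, key=cv_scores.get)
print("best alpha among those tried:", best_alpha)
```

The same loop works for `Ridge` by swapping the estimator.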

In [29]:
# hyperparameter tuning can be used to find the alpha value that gives the best score (covered later)
# next: RMSE

In [30]:
# testing RMSE
ypred = lr.predict(xt)
ypred

array([ 60.30704955,  70.39469808,  79.07350641, ...,  44.27842228,
        69.95494891, 200.07829378])

In [36]:
from sklearn.metrics import mean_squared_error as mse
# MSE is one of the most common cost functions for regression; many other cost functions are also used

In [38]:
ms_error = mse(yt, ypred)
rmse = np.sqrt(ms_error)
print('MSE: ', ms_error)
print('RMSE: ', rmse)

MSE:  10240.543804325935
RMSE:  101.19557205888968
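To make the two numbers concrete, MSE is the mean of the squared prediction errors and RMSE is its square root, which puts the error back in the same units as the response. A small check on made-up values (not from the dataset) shows the manual computation agreeing with `mean_squared_error`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted values, for illustration only
y_true = np.array([50.0, 120.0, 62.0, 95.0])
y_hat = np.array([60.0, 100.0, 70.0, 90.0])

# Errors are -10, 20, -8, 5, so the squared errors are 100, 400, 64, 25
manual_mse = np.mean((y_true - y_hat) ** 2)   # 589 / 4 = 147.25
manual_rmse = np.sqrt(manual_mse)             # about 12.13

sk_mse = mean_squared_error(y_true, y_hat)

print("manual MSE :", manual_mse)
print("sklearn MSE:", sk_mse)
print("RMSE       :", manual_rmse)
```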


In [39]:
# this is a very large error compared with the price values seen above
# let us find R-squared
from sklearn.metrics import r2_score

In [40]:
r2_score(yt, ypred)

0.4781208054019941

In [None]:
# this is the same value that lr.score(xt, yt) returned
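The match is expected, because `score` on a scikit-learn regressor returns R-squared, defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch on synthetic data (names and values are assumptions for illustration) computes it by hand and compares the three routes:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic stand-in data (assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=100)

model = LinearRegression().fit(X_demo, y_demo)
y_hat = model.predict(X_demo)

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_demo - y_hat) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

sk_r2 = r2_score(y_demo, y_hat)
model_r2 = model.score(X_demo, y_demo)

print("manual  :", manual_r2)
print("r2_score:", sk_r2)
print("score() :", model_r2)
```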