
Ridge regression takes an alternative approach: it adds a penalty on large weights to the training loss, namely the sum of the squared coefficients scaled by a strength parameter, alpha. As a result, the optimization process shrinks the coefficients toward zero without eliminating them completely.
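
A minimal sketch of this shrinkage effect, run on synthetic data rather than the dataset used below. The names X_demo, y_demo, and true_w and the value alpha=10.0 are illustrative assumptions, not part of the exercise. The ridge coefficient vector should come out with a smaller norm than the ordinary least-squares one, while its entries stay nonzero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic regression problem (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 5))
true_w = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y_demo = X_demo @ true_w + rng.normal(scale=0.5, size=50)

# Same data, with and without the squared-coefficient penalty.
ols = LinearRegression().fit(X_demo, y_demo)
ridge = Ridge(alpha=10.0).fit(X_demo, y_demo)

# Compare the overall size of the two coefficient vectors.
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
print('OLS coefficient norm:   {:.4f}'.format(ols_norm))
print('Ridge coefficient norm: {:.4f}'.format(ridge_norm))
```

Larger values of alpha shrink the coefficients further; alpha=0 would remove the penalty altogether.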

In [0]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

In [0]:
_df = pd.read_csv('https://raw.githubusercontent.com/PacktWorkshops/The-Data-Science-Workshop/master/Chapter07/Dataset/ccpp.csv')

In [3]:
_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9568 entries, 0 to 9567
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   AT      9568 non-null   float64
 1   V       9568 non-null   float64
 2   AP      9568 non-null   float64
 3   RH      9568 non-null   float64
 4   PE      9568 non-null   float64
dtypes: float64(5)
memory usage: 373.9 KB


In [0]:
X = _df.drop(['PE'], axis=1).values

In [0]:
y = _df['PE'].values

In [0]:
train_X, eval_X, train_y, eval_y = train_test_split(X, y, train_size=0.8, random_state=0)

In [0]:
lr_model_1 = LinearRegression()

In [8]:
lr_model_1.fit(train_X, train_y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [0]:
lr_model_1_preds = lr_model_1.predict(eval_X)

In [10]:
print('lr_model_1 R2 Score: {}'.format(lr_model_1.score(eval_X, eval_y)))

lr_model_1 R2 Score: 0.9325315554761302


In [11]:
print('lr_model_1 MSE: {}'.format(mean_squared_error(eval_y, lr_model_1_preds)))

lr_model_1 MSE: 19.733699303497648


In [0]:
steps = [
    ('scaler', MinMaxScaler()),
    ('poly', PolynomialFeatures(degree=3)),
    ('lr', LinearRegression())
]

In [0]:
lr_model_2 = Pipeline(steps)

In [15]:
lr_model_2.fit(train_X, train_y)

Pipeline(memory=None,
         steps=[('scaler', MinMaxScaler(copy=True, feature_range=(0, 1))),
                ('poly',
                 PolynomialFeatures(degree=3, include_bias=True,
                                    interaction_only=False, order='C')),
                ('lr',
                 LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
                                  normalize=False))],
         verbose=False)

In [16]:
print('lr_model_2 R2 Score: {}'.format(lr_model_2.score(eval_X, eval_y)))

lr_model_2 R2 Score: 0.9443678654045208


In [0]:
lr_model_2_preds = lr_model_2.predict(eval_X)

In [18]:
print('lr_model_2 MSE: {}'.format(mean_squared_error(eval_y, lr_model_2_preds)))

lr_model_2 MSE: 16.271722632207666


You can see from the output that the MSE of the second model is 16.272, which is lower than the 19.734 of the first model. On this evaluation split, the second model is the better of the two.
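
When comparing models by evaluation MSE, it can also help to look at the training MSE, since a large gap between the two is a common sign of overfitting. The sketch below is one way to do that. The helper train_eval_mse and the synthetic one-feature data are assumptions made for illustration and are not part of the exercise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures


def train_eval_mse(model, X_tr, y_tr, X_ev, y_ev):
    """Fit `model` on the training split and return (train MSE, eval MSE)."""
    model.fit(X_tr, y_tr)
    return (
        mean_squared_error(y_tr, model.predict(X_tr)),
        mean_squared_error(y_ev, model.predict(X_ev)),
    )


# Synthetic one-feature data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X_syn = rng.uniform(-1, 1, size=(200, 1))
y_syn = np.sin(3 * X_syn[:, 0]) + rng.normal(scale=0.1, size=200)
X_tr, X_ev, y_tr, y_ev = train_test_split(
    X_syn, y_syn, train_size=0.8, random_state=0)

# Compare a linear fit with a degree-5 polynomial fit.
results = {}
for degree in (1, 5):
    pipe = Pipeline([
        ('scaler', MinMaxScaler()),
        ('poly', PolynomialFeatures(degree=degree)),
        ('lr', LinearRegression()),
    ])
    results[degree] = train_eval_mse(pipe, X_tr, y_tr, X_ev, y_ev)
    print('degree {}: train MSE {:.4f}, eval MSE {:.4f}'.format(
        degree, *results[degree]))
```

In this notebook, the same helper could be called with train_X, train_y, eval_X, and eval_y and any of the pipelines defined here.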

In [19]:
print(lr_model_2[-1].coef_)

[ 7.72661789e-14 -1.77278028e+02 -4.60337188e+01 -1.60520675e+02
 -1.23076123e+02  6.23358210e+00  8.19655844e+00  1.45478576e+02
  1.88658651e+02  2.43740192e+01  1.80553150e+02 -1.08058561e+02
  1.09713294e+02  1.79121906e+02  1.06460596e+02  2.67290613e+01
  7.79833654e+01  3.69241324e+01 -1.13863997e+02 -1.42673215e+02
 -9.69606773e+01  1.90706809e+02 -5.56429546e+01 -1.32595225e+02
 -9.41682917e+01  9.40112729e+01 -1.18732510e+02 -7.64871610e+01
 -4.18714081e+01  6.36772260e+01  4.42340977e+01 -3.81114691e+01
 -4.71547759e+01 -9.16797074e+01 -2.52346805e+01]


In [20]:
print(len(lr_model_2[-1].coef_))

35


In [0]:
steps = [
    ('scaler', MinMaxScaler()),
    ('poly', PolynomialFeatures(degree=10)),
    ('lr', LinearRegression())
]

In [0]:
lr_model_3 = Pipeline(steps)

In [23]:
lr_model_3.fit(train_X, train_y)

Pipeline(memory=None,
         steps=[('scaler', MinMaxScaler(copy=True, feature_range=(0, 1))),
                ('poly',
                 PolynomialFeatures(degree=10, include_bias=True,
                                    interaction_only=False, order='C')),
                ('lr',
                 LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
                                  normalize=False))],
         verbose=False)

In [24]:
print('lr_model_3 R2 Score: {}'.format(lr_model_3.score(eval_X, eval_y)))

lr_model_3 R2 Score: 0.5683459493202576


In [0]:
lr_model_3_preds = lr_model_3.predict(eval_X)

In [26]:
print('lr_model_3 MSE: {}'.format(mean_squared_error(eval_y, lr_model_3_preds)))

lr_model_3 MSE: 126.25355896914343


In [27]:
print(len(lr_model_3[-1].coef_))

1001
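
The coefficient counts of 35 (degree 3) and 1001 (degree 10) follow from the number of monomials of total degree at most d in n variables, which is C(n + d, d) when the constant (bias) column is included. A quick check of that arithmetic, assuming the four input features of this dataset; the names n_features and counts are illustrative:

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_features = 4  # AT, V, AP, RH

counts = {}
for degree in (3, 10):
    # Closed-form count of monomials of total degree <= `degree`.
    expected = comb(n_features + degree, degree)
    # Count reported by scikit-learn for the same setup (bias column included).
    poly = PolynomialFeatures(degree=degree).fit(np.zeros((1, n_features)))
    counts[degree] = (expected, poly.n_output_features_)
    print(degree, expected, poly.n_output_features_)
```

Going from degree 3 to degree 10 multiplies the number of coefficients to fit by almost 30, while the training set stays at about 7,650 rows.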


In [28]:
print(lr_model_3[-1].coef_[:35])

[ 3.92417572e+05 -6.90884957e+07 -4.12728037e+07  2.27928559e+07
 -4.76788092e+07  2.96663457e+08  2.73267484e+08  1.07844757e+08
  3.73718995e+08  8.79698968e+07 -2.35367959e+07  2.46251982e+08
 -2.61104947e+08  1.86087661e+07  1.41131196e+08 -6.53886924e+08
 -8.90633353e+08 -1.06074814e+09 -1.29264510e+09 -4.28435046e+08
  5.31626843e+07 -1.30408977e+09  4.41024830e+08 -8.86228713e+08
 -8.78160515e+08 -1.97377311e+06 -5.39375259e+08 -3.68352714e+08
  9.82113559e+08 -2.76718687e+08 -6.28824872e+08  8.14257203e+08
  5.43205856e+08 -2.03045074e+08 -2.42929048e+08]


In [0]:
steps = [
    ('scaler', MinMaxScaler()),
    ('poly', PolynomialFeatures(degree=10)),
    ('lr', Ridge(alpha=0.9))
]

In [0]:
ridge_model = Pipeline(steps)

In [31]:
ridge_model.fit(train_X, train_y)

Pipeline(memory=None,
         steps=[('scaler', MinMaxScaler(copy=True, feature_range=(0, 1))),
                ('poly',
                 PolynomialFeatures(degree=10, include_bias=True,
                                    interaction_only=False, order='C')),
                ('lr',
                 Ridge(alpha=0.9, copy_X=True, fit_intercept=True,
                       max_iter=None, normalize=False, random_state=None,
                       solver='auto', tol=0.001))],
         verbose=False)

In [32]:
print('ridge_model R2 Score: {}'.format(ridge_model.score(eval_X, eval_y)))

ridge_model R2 Score: 0.9451949082623449


In [0]:
ridge_model_preds = ridge_model.predict(eval_X)

In [34]:
print('ridge_model MSE: {}'.format(mean_squared_error(eval_y, ridge_model_preds)))

ridge_model MSE: 16.02982265685497


In [35]:
print(len(ridge_model[-1].coef_))

1001


In [36]:
print(ridge_model[-1].coef_[:35])

[  0.         -39.79803902  -7.77413135   6.07694837   3.10326786
 -18.17945028  -9.45440071  -7.4037462  -16.97192766  -9.10799691
   6.96959155  -1.55574911   4.49242992   0.31127893   5.27565009
  -4.07568831  -0.95958324   2.38995687  -6.1583696   -2.05510604
   2.3741985   -1.30281151  -1.7837005   -4.53024264  -8.30749466
  -3.42801698   0.65288784  -2.74767783   5.47711767   4.68241474
  -2.1214614   -0.47331885   0.43221968  -0.28909998   4.64549348]


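You can see from the preceding output that the coefficient values no longer have large magnitudes. The first 35 coefficients of lr_model_3 reached magnitudes as large as 10^9, whereas most of the ridge coefficients shown here have a magnitude below 10 and none of them exceeds 100. Together with the drop in evaluation MSE from 126.254 to 16.030, this indicates that the model is no longer overfitting.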
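
The magnitude check described above can be sketched as a small helper instead of reading the printed array by eye. The function summarize_coefs and its threshold parameter are hypothetical names introduced for this illustration; the sample values are the first ten ridge coefficients printed above.

```python
import numpy as np


def summarize_coefs(coefs, threshold=10.0):
    """Return the largest magnitude and how many magnitudes fall below `threshold`."""
    abs_coefs = np.abs(np.asarray(coefs, dtype=float))
    return {
        'max_abs': float(abs_coefs.max()),
        'n_below_threshold': int((abs_coefs < threshold).sum()),
        'n_total': int(abs_coefs.size),
    }


# First ten ridge coefficients copied from the output above.
sample = [0.0, -39.79803902, -7.77413135, 6.07694837, 3.10326786,
          -18.17945028, -9.45440071, -7.4037462, -16.97192766, -9.10799691]

summary = summarize_coefs(sample)
print(summary)
```

In the notebook, passing ridge_model[-1].coef_ or lr_model_3[-1].coef_ to the helper would summarize all 1001 coefficients of each model rather than only the first few.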

This exercise showed you how to reduce overfitting by training a new model with ridge regression, using scikit-learn's Ridge class.