## Fixing Model Overfitting Using Lasso Regression
This exercise shows how to identify when a model starts overfitting and how to use lasso regression to correct it.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

In [2]:
_df = pd.read_csv('https://raw.githubusercontent.com/'\
                 'PacktWorkshops/The-Data-Science-Workshop/'\
                 'master/Chapter07/Dataset/ccpp.csv')
_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9568 entries, 0 to 9567
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   AT      9568 non-null   float64
 1   V       9568 non-null   float64
 2   AP      9568 non-null   float64
 3   RH      9568 non-null   float64
 4   PE      9568 non-null   float64
dtypes: float64(5)
memory usage: 373.9 KB


The columns of the dataset are:

- Temperature (AT) in the range 1.81°C to 37.11°C
- Ambient Pressure (AP) in the range 992.89 to 1033.30 millibar
- Relative Humidity (RH) in the range 25.56% to 100.16%
- Exhaust Vacuum (V) in the range 25.36 to 81.56 cm Hg
- Net hourly electrical energy output (PE) in the range 420.26 to 495.76 MW

In [3]:
# features and labels
X = _df.drop(['PE'], axis=1).values
y = _df['PE'].values

In [4]:
# split data into training and evaluation sets
train_X, eval_X, train_y, eval_y = train_test_split(X, y, train_size=0.8, random_state=0)
#val_X, test_X, val_y, test_y = train_test_split(eval_X, eval_y, random_state=0)

In [5]:
# instantiate LinearRegression
lr_model_1 = LinearRegression()

# fit model
lr_model_1.fit(train_X, train_y)

LinearRegression()

In [6]:
# make predictions on the evaluation dataset
lr_model_1_preds = lr_model_1.predict(eval_X)

In [7]:
# R2 of the model
print('lr_model_1 Score: {}'.format(lr_model_1.score(eval_X, eval_y)))

lr_model_1 Score: 0.9325315554761303
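
The `score` method of `LinearRegression` returns the coefficient of determination, R2. As a rough sketch of what that number measures, the snippet below recomputes it by hand and compares it with scikit-learn. The data is synthetic and the names `X_demo`, `y_demo` and `demo_model` are made up for illustration; none of it comes from the CCPP dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# synthetic data for illustration only (not the CCPP dataset)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(0, 1, size=200)

demo_model = LinearRegression().fit(X_demo, y_demo)
demo_preds = demo_model.predict(X_demo)

# R2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_demo - demo_preds) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print('manual R2:   {}'.format(r2_manual))
print('model.score: {}'.format(demo_model.score(X_demo, y_demo)))
print('r2_score:    {}'.format(r2_score(y_demo, demo_preds)))
```

All three printed values should agree up to floating-point error. An R2 close to 1 means the model explains most of the variance in the target.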


In [8]:
# MSE
print('lr_model_1 MSE: {}'.format(mean_squared_error(eval_y, lr_model_1_preds)))

lr_model_1 MSE: 19.733699303497637
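
One common way to spot overfitting is to compare the error on the training set with the error on the evaluation set. The sketch below assumes a hypothetical helper, `train_eval_mse`, that is not part of the original notebook, and runs it on synthetic data so that it works without downloading anything.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def train_eval_mse(model, train_X, train_y, eval_X, eval_y):
    """Return (training MSE, evaluation MSE) for an already fitted model."""
    train_mse = mean_squared_error(train_y, model.predict(train_X))
    eval_mse = mean_squared_error(eval_y, model.predict(eval_X))
    return train_mse, eval_mse


# synthetic data for illustration only (not the CCPP dataset)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(500, 4))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.normal(0, 0.1, size=500)

# same 80/20 split as used in the notebook
tr_X, ev_X, tr_y, ev_y = train_test_split(
    X_demo, y_demo, train_size=0.8, random_state=0)

demo_model = LinearRegression().fit(tr_X, tr_y)
train_mse, eval_mse = train_eval_mse(demo_model, tr_X, tr_y, ev_X, ev_y)

print('train MSE: {}'.format(train_mse))
print('eval MSE:  {}'.format(eval_mse))
```

In the notebook itself, the same helper could be called as `train_eval_mse(lr_model_1, train_X, train_y, eval_X, eval_y)`. An evaluation MSE that is much larger than the training MSE suggests the model is overfitting. Two similar values, as expected for a plain linear model, suggest it is not.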
