# Polynomial Regression Example

This notebook demonstrates polynomial regression with scikit-learn on the UCI Energy efficiency dataset.

Dataset source: https://archive.ics.uci.edu/ml/machine-learning-databases/00242/ 
Metadata:  https://archive.ics.uci.edu/ml/datasets/Energy+efficiency

Both Y1 (heating load) and Y2 (cooling load) can be used as target variables.

In [17]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures 
from sklearn.model_selection import train_test_split

Note: in scikit-learn, polynomial regression is done in two steps. First you expand the original features with polynomial (and, optionally, interaction-only) terms; then you fit an ordinary linear model to the expanded features.

Note: the data is not explored here. That was done in other notebooks, and the file is used as is.
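To make the feature expansion concrete, here is a minimal sketch on a single toy row with two made-up values. The names `X_toy`, `full`, `inter`, and `n_out` are illustrative and not part of the notebook. It shows which columns `PolynomialFeatures` generates at degree 2, what `interaction_only=True` changes, and how many columns result from 8 input features, as in this dataset.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy input with two features, a=2 and b=3 (illustrative values, not from the dataset).
X_toy = np.array([[2.0, 3.0]])

# Degree 2 with default settings: columns are 1, a, b, a^2, a*b, b^2.
full = PolynomialFeatures(degree=2).fit_transform(X_toy)
print(full)

# interaction_only=True drops the pure powers: columns are 1, a, b, a*b.
inter = PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X_toy)
print(inter)

# With 8 input columns, degree 2 produces 45 output columns
# (1 bias + 8 linear + 8 squares + 28 pairwise products).
n_out = PolynomialFeatures(degree=2).fit(np.zeros((1, 8))).n_output_features_
print(n_out)
```

The count of 45 matches the number of coefficients per target in the printed model output.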

In [None]:
enb = pd.read_excel('ENB2012_data.xlsx')
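# Select DataFrame columns by regex: X1..X8 are the features.
X = enb.filter(regex="X.*")
# Y1 only. The outputs shown below have two rows of coefficients, so they
# were produced with both targets; regex="Y.*" selects Y1 and Y2 together.
y = enb.filter(regex="Y1")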

In [6]:
poly = PolynomialFeatures(degree=2)

In [8]:
X_poly = poly.fit_transform(X)

In [9]:
# No random_state is set, so the split (and the scores below) vary between runs.
X_train, X_test, y_train, y_test = train_test_split(X_poly, y)

In [10]:
regressor = LinearRegression().fit(X_train, y_train)

In [14]:
print('(poly deg 2) linear model coeff (w):\n{}'.format(regressor.coef_))
print('(poly deg 2) linear model intercept (b): {}'.format(regressor.intercept_))

(poly deg 2) linear model coeff (w):
[[-3.92564239e+07  3.95972358e+06  2.30953045e+05 -2.25517384e+05
  -4.39516047e+05 -8.48258380e+05  2.73540051e+00  4.99223593e+00
  -1.54350692e+00 -1.73532419e+05 -2.77698145e+04  2.74220116e+04
   4.30246638e+04 -2.45497103e+05 -1.34953689e+00  4.44865147e+01
   1.82529540e+00  4.32266177e+07 -5.10069068e+07 -8.73159434e+07
   7.40787193e+04 -1.18745133e+05 -8.85632601e+03  6.68053852e+04
   7.78028886e+06  1.64232685e+07 -7.44291600e+04  1.18745132e+05
   8.85638241e+03 -6.68053836e+04  1.72537245e+06 -1.47281898e+05
   2.37490261e+05  1.77125593e+04 -1.33610767e+05  8.57659406e+04
  -9.14943473e-02 -1.81209796e+00  2.70193773e-02  3.63729832e-03
  -3.77306157e-02  1.96225307e-02 -1.58078286e+01 -1.75292101e+00
  -1.20909621e-01]
 [ 1.68032205e+08 -3.64622723e+07 -2.13525461e+06  2.08285609e+06
   4.04455242e+06  7.81723618e+06 -1.30584487e+00  1.10676344e+02
  -2.62918909e+00 -2.45416966e+05  2.58641825e+05 -2.59135444e+05
  -3.91909581e+05  2

In [15]:
print('(poly deg 2) R-squared score (training): {}'
     .format(regressor.score(X_train, y_train)))
print('(poly deg 2) R-squared score (test): {}\n'
     .format(regressor.score(X_test, y_test)))

(poly deg 2) R-squared score (training): 0.9847802269841436
(poly deg 2) R-squared score (test): 0.9810493522210394
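For reference, the value returned by `score` is the coefficient of determination, R-squared = 1 - SS_res / SS_tot. The sketch below recomputes it by hand for a single target on a tiny made-up dataset (`X_demo`, `y_demo`, and the other names are illustrative, not from the notebook) and compares it with `LinearRegression.score`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny illustrative dataset (made-up numbers, not the ENB2012 data).
X_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
y_demo = np.array([1.1, 1.9, 3.2, 3.8])

model = LinearRegression().fit(X_demo, y_demo)
y_pred = model.predict(X_demo)

# Residual sum of squares and total sum of squares around the mean.
ss_res = np.sum((y_demo - y_pred) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# Both values should agree up to floating-point error.
print(r2_manual, model.score(X_demo, y_demo))
```

A score of 1.0 means perfect predictions, and 0.0 means the model does no better than always predicting the mean of the target.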



Polynomial features are frequently combined with regularization to limit overfitting, since the expanded feature set gives the model many more coefficients to fit.

In [18]:
# Ridge with its default regularization strength (alpha=1.0)
regressor_reg = Ridge().fit(X_train, y_train)

In [19]:
print('(poly deg 2) linear model coeff (w):\n{}'.format(regressor_reg.coef_))
print('(poly deg 2) linear model intercept (b): {}'.format(regressor_reg.intercept_))

(poly deg 2) linear model coeff (w):
[[ 0.00000000e+00  2.13203534e-01  1.80864755e+00 -7.86433017e+00
   4.83648885e+00 -1.53826042e-01 -6.11920942e-01  8.40085906e-01
   3.65262156e-02  6.41232910e-01 -6.47595628e-01  4.87816609e+00
  -2.76288086e+00  3.34596126e+00 -3.70560022e-01  1.61435836e+00
   9.05600337e-01 -1.05367627e-03  4.47814300e-03 -2.75442446e-03
   3.29058713e-03  1.13242903e-03  3.75864138e-03  7.73980049e-04
   2.49216762e-04  2.10677813e-03  1.29731358e-01 -1.97231879e-03
   3.15416614e-02 -1.14721431e-03 -2.43275704e-03 -6.32203836e-02
   1.55238186e-03 -1.38915102e-02  9.60564541e-04 -1.61517344e+00
   1.40247391e-01  2.55976836e+00  8.69946516e-02 -1.27564875e-02
  -1.21705179e-01 -2.43405901e-02 -7.15316241e-01 -2.14993462e+00
  -1.42497417e-01]
 [ 0.00000000e+00  2.37645929e-01  1.55235278e+00 -6.15957402e+00
   3.85596339e+00 -1.22768083e-01 -1.49246590e+00  8.86803420e-01
  -7.02361538e-02  7.11891003e-01 -4.32188429e-01  3.62977690e+00
  -2.03098267e+00  3

In [20]:
print('(poly deg 2) R-squared score (training): {}'
     .format(regressor_reg.score(X_train, y_train)))
print('(poly deg 2) R-squared score (test): {}\n'
     .format(regressor_reg.score(X_test, y_test)))

(poly deg 2) R-squared score (training): 0.9489003317600034
(poly deg 2) R-squared score (test): 0.9417996931889339



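In our case, regularization did not help: with Ridge at its default settings, R-squared dropped from about 0.985 to 0.949 on the training set and from about 0.981 to 0.942 on the test set. The unregularized model's training and test scores were already close, so there was little overfitting to correct.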
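One possible follow-up is to tune the regularization strength instead of relying on the default `alpha=1.0`, and to standardize the polynomial features so the penalty treats them on a comparable scale. The sketch below is only an illustration under stated assumptions: it uses synthetic stand-in data (`X_syn`, `y_syn`) rather than the ENB2012 file, and the alpha grid is an arbitrary choice. Whether this improves on the unregularized model for the real dataset is not shown here.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (hypothetical); swap in X and y from the notebook.
rng = np.random.default_rng(0)
X_syn = rng.uniform(-1.0, 1.0, size=(200, 3))
y_syn = 1.0 + 2.0 * X_syn[:, 0] - X_syn[:, 1] ** 2 + 0.5 * X_syn[:, 0] * X_syn[:, 2]
y_syn = y_syn + rng.normal(scale=0.05, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, random_state=0)

# Candidate regularization strengths (arbitrary illustrative grid).
alphas = np.logspace(-4, 2, 13)

# Expand features, standardize them, then let RidgeCV pick alpha by cross-validation.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    RidgeCV(alphas=alphas),
)
model.fit(X_tr, y_tr)

print('chosen alpha:', model.named_steps['ridgecv'].alpha_)
print('R-squared (training):', model.score(X_tr, y_tr))
print('R-squared (test):', model.score(X_te, y_te))
```

Putting the expansion and scaling inside a pipeline means both are fitted on the training split only, and the same transformations are reapplied automatically when scoring the test split.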