# Polynomial Regression

## Benefits
- Extends linear regression
- Can model higher-order relationships
- Captures non-linear relationships between the features and the label

If the relationship between a feature and the label is not linear, the relationship between a higher-order version of that feature (for example, its square) and the label might still be linear.

> Idea: Higher-order versions of the features may have a linear relationship with the label

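Points to note:
1. Features can act in synergy: the product of two features (an interaction term) can relate to the label even when each feature alone explains little
2. Not every higher-order feature will have a relationship with the label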
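The idea above can be illustrated with a small sketch. The data here is synthetic and hypothetical (a label that is exactly the square of a single feature), chosen only to make the effect obvious; it is not the advertising data used in the rest of this notebook.

```python
import numpy as np

# Hypothetical feature, symmetric around zero, and a purely quadratic label.
x = np.linspace(-3, 3, 61)
y = x ** 2

# Correlation of the label with the raw feature and with its square.
r_linear = np.corrcoef(x, y)[0, 1]
r_quadratic = np.corrcoef(x ** 2, y)[0, 1]

print(f"corr(x,   y) = {r_linear:.3f}")
print(f"corr(x^2, y) = {r_quadratic:.3f}")
```

The raw feature shows essentially no linear correlation with the label, while the squared feature is perfectly correlated with it, so a linear model on `x ** 2` can fit what a linear model on `x` cannot.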

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
df = pd.read_csv('Advertising.csv')

In [5]:
df.head()

|   | TV | radio | newspaper | sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 180.8 | 10.8 | 58.4 | 12.9 |


## Perform feature engineering to obtain polynomial features

### Separate Features and Label

In [6]:
X = df.drop('sales', axis=1)

In [7]:
y = df['sales']

### Polynomial features

We have to provide a `degree`, which sets the highest order of the polynomial features to generate. Besides the powers of each feature, scikit-learn also generates **interaction terms** (products of different features).
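As a minimal sketch of what these options do, the example below transforms a single hypothetical row with two features, `a = 2` and `b = 3`. The `interaction_only` and `include_bias` parameters are part of `PolynomialFeatures`; the toy values are an assumption made for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One hypothetical sample with two features: a = 2, b = 3.
sample = np.array([[2.0, 3.0]])

# Degree 2 without the bias column: a, b, a^2, a*b, b^2
full = PolynomialFeatures(degree=2, include_bias=False).fit_transform(sample)
print(full)            # [[2. 3. 4. 6. 9.]]

# Interaction terms only (no squares): a, b, a*b
inter = PolynomialFeatures(
    degree=2, interaction_only=True, include_bias=False
).fit_transform(sample)
print(inter)           # [[2. 3. 6.]]

# The default include_bias=True prepends a constant column of ones.
with_bias = PolynomialFeatures(degree=2).fit_transform(sample)
print(with_bias)       # [[1. 2. 3. 4. 6. 9.]]
```

Dropping the bias column is a reasonable choice here because `LinearRegression` fits its own intercept by default.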

In [8]:
from sklearn.preprocessing import PolynomialFeatures

In [9]:
polynomial_converter = PolynomialFeatures(degree=2, include_bias=False)

In [10]:
polynomial_converter.fit(X)

PolynomialFeatures(include_bias=False)

Transformed features

**Went from 3 features to 9**: the 3 original features, their 3 squares, and the 3 pairwise interaction terms

In [12]:
poly_features = polynomial_converter.transform(X)

In [14]:
poly_features.shape

(200, 9)

In [15]:
X.shape

(200, 3)

In [18]:
# Column order: X1, X2, X3, X1^2, X1*X2, X1*X3, X2^2, X2*X3, X3^2
poly_features[0]

array([2.301000e+02, 3.780000e+01, 6.920000e+01, 5.294601e+04,
       8.697780e+03, 1.592292e+04, 1.428840e+03, 2.615760e+03,
       4.788640e+03])

In [19]:
X.loc[0]

TV           230.1
radio         37.8
newspaper     69.2
Name: 0, dtype: float64
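One way to check which column holds which term is to rebuild the first transformed row by hand from the three original values and compare it with the array printed above. This is a small sketch using only NumPy; the numbers are copied from the outputs of `X.loc[0]` and `poly_features[0]`.

```python
import numpy as np

# First row of X, copied from the X.loc[0] output.
tv, radio, newspaper = 230.1, 37.8, 69.2

# Degree-2 terms in the order scikit-learn emits them:
# originals, then every product of two features in lexicographic order.
manual_row = np.array([
    tv, radio, newspaper,
    tv ** 2, tv * radio, tv * newspaper,
    radio ** 2, radio * newspaper,
    newspaper ** 2,
])

# Values copied from the poly_features[0] output.
printed_row = np.array([
    2.301000e+02, 3.780000e+01, 6.920000e+01, 5.294601e+04,
    8.697780e+03, 1.592292e+04, 1.428840e+03, 2.615760e+03,
    4.788640e+03,
])

print(np.allclose(manual_row, printed_row))  # True
```

The squares and the interaction terms are interleaved: the fifth and sixth columns are `TV*radio` and `TV*newspaper`, not `radio^2` and `newspaper^2`.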

## Training and Evaluating the model

In [20]:
from sklearn.model_selection import train_test_split

In [22]:
X_train, X_test, y_train, y_test = train_test_split(poly_features, y, test_size=0.3, random_state=101)

### Import Model

In [23]:
from sklearn.linear_model import LinearRegression

In [24]:
model = LinearRegression()

### Perform linear regression with polynomial features

In [26]:
model.fit(X_train, y_train)

LinearRegression()

### Evaluate Performance

In [28]:
test_predictions = model.predict(X_test)

In [29]:
model.coef_

array([ 5.17095811e-02,  1.30848864e-02,  1.20000085e-02, -1.10892474e-04,
        1.14212673e-03, -5.24100082e-05,  3.34919737e-05,  1.46380310e-04,
       -3.04715806e-05])

In [30]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [31]:
MAE = mean_absolute_error(y_test, test_predictions)

In [32]:
MSE = mean_squared_error(y_test, test_predictions)

In [33]:
RMSE = np.sqrt(MSE)

In [35]:
print(MAE, MSE, RMSE)

0.48967980448038373 0.4417505510403753 0.6646431757269274
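For reference, the three metrics can be computed by hand. The sketch below uses a tiny hypothetical set of true and predicted values (not the advertising data) to show the arithmetic behind `mean_absolute_error`, `mean_squared_error`, and the RMSE derived from it.

```python
import numpy as np

# Hypothetical true and predicted values, chosen for easy arithmetic.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 9.0])

errors = y_true - y_pred            # [ 0.5  0.  -1.   0. ]

mae = np.mean(np.abs(errors))       # mean absolute error: 1.5 / 4
mse = np.mean(errors ** 2)          # mean squared error: 1.25 / 4
rmse = np.sqrt(mse)                 # back in the units of the label

print(mae, mse, rmse)
```

MAE and RMSE are both expressed in the units of the label, while RMSE weights large errors more heavily because the errors are squared before averaging.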


When we ran linear regression on the same data using only the original (linear) features, we got **MAE: 1.213, RMSE: 1.516**.

> In this case polynomial regression works much better: both errors are less than half as large
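The same kind of comparison can be sketched on synthetic data. Everything in the block below is an assumption made for illustration: the two hypothetical features, the coefficients, and the noise level are invented so that the label contains an interaction term, and the helper `holdout_rmse` is not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption): the label depends on the product a * b.
rng = np.random.default_rng(0)
a = rng.uniform(0, 300, 200)
b = rng.uniform(0, 50, 200)
X = np.column_stack([a, b])
y = 3 + 0.05 * a + 0.1 * b + 0.001 * a * b + rng.normal(0, 0.5, 200)


def holdout_rmse(features):
    """Fit LinearRegression on a 70/30 split and return the test RMSE."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, y, test_size=0.3, random_state=101
    )
    model = LinearRegression().fit(X_train, y_train)
    return np.sqrt(mean_squared_error(y_test, model.predict(X_test)))


poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

rmse_linear = holdout_rmse(X)
rmse_poly = holdout_rmse(poly)
print(f"linear features:     RMSE = {rmse_linear:.3f}")
print(f"polynomial features: RMSE = {rmse_poly:.3f}")
```

Because the synthetic label is built with an interaction term, the polynomial features should give the lower error here; on data without such structure the extra features may not help.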

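## Deploying the model

For deployment, the final model is refit on the full dataset rather than on the training split alone.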

In [38]:
final_model = LinearRegression()
final_model.fit(poly_features, y)
final_model.coef_

array([ 5.16525487e-02,  2.10742970e-02,  6.88373531e-03, -1.09702663e-04,
        1.10525949e-03, -4.55155391e-05,  1.11997015e-04,  8.26605896e-05,
        1.19125650e-05])

In [37]:
from joblib import load, dump

In [40]:
dump(final_model, 'polynomial_regression_sales.joblib')

['polynomial_regression_sales.joblib']
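The saved model expects the 9 polynomial features as input, so new data has to pass through the same transformation before prediction. One possible approach, sketched below, is to save the fitted converter alongside the model. The synthetic data, the file names, and the `new_campaign` values are all hypothetical and stand in for the real advertising data.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the three features (assumption, noise-free).
rng = np.random.default_rng(42)
X = rng.uniform(0, 100, size=(50, 3))
y = 1.0 + 0.05 * X[:, 0] + 0.1 * X[:, 1] + 0.001 * X[:, 0] * X[:, 1]

# Fit the converter and the final model on all available data.
converter = PolynomialFeatures(degree=2, include_bias=False)
final_model = LinearRegression().fit(converter.fit_transform(X), y)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file names; both objects are needed at prediction time.
    model_path = os.path.join(tmp_dir, "model.joblib")
    converter_path = os.path.join(tmp_dir, "converter.joblib")
    dump(final_model, model_path)
    dump(converter, converter_path)

    loaded_model = load(model_path)
    loaded_converter = load(converter_path)

# New raw input with the 3 original features (hypothetical values).
new_campaign = np.array([[50.0, 20.0, 10.0]])

# Transform to 9 features first, then predict.
prediction = loaded_model.predict(loaded_converter.transform(new_campaign))
print(prediction)
```

Calling `predict` directly on the 3 raw features would fail, because the model was trained on 9 columns.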