# Polynomial Regression

What if your data doesn't look linear at all? Let's look at some more realistic-looking page speed / purchase data:

In [3]:
import altair as alt
import numpy as np
import pandas as pd

np.random.seed(2)
pageSpeeds = np.random.normal(3.0, 1.0, 1000)
purchaseAmount = np.random.normal(50.0, 10.0, 1000) / pageSpeeds

data = pd.DataFrame({'pageSpeeds':pageSpeeds, 'purchaseAmount':purchaseAmount})

scatter_chart = alt.Chart(data).mark_point().encode(x='pageSpeeds:Q',  y='purchaseAmount:Q')
scatter_chart

NumPy has a handy polyfit function that fits an nth-degree polynomial to our data by minimizing the squared error. Let's try it with a 4th-degree polynomial:

In [0]:
x = np.array(pageSpeeds)
y = np.array(purchaseAmount)

p4 = np.poly1d(np.polyfit(x, y, 4))
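To see what polyfit and poly1d return, here is a minimal sketch on toy data. The names x_demo, y_demo, and p2_demo are illustrative assumptions, not part of the notebook. The points are sampled exactly from y = 2x² - 3x + 1, so the fit should recover those coefficients up to floating-point error.

```python
import numpy as np

# Hypothetical toy data: points sampled exactly from y = 2x^2 - 3x + 1
x_demo = np.linspace(-2, 2, 50)
y_demo = 2 * x_demo**2 - 3 * x_demo + 1

# polyfit returns the least-squares coefficients, highest power first
coeffs = np.polyfit(x_demo, y_demo, 2)

# poly1d wraps the coefficients in a callable polynomial object
p2_demo = np.poly1d(coeffs)

print(coeffs)        # coefficients for x^2, x, and the constant term
print(p2_demo(3.0))  # evaluate the fitted polynomial at x = 3
```

The same pattern is used for p4: polyfit produces five coefficients for a 4th-degree fit, and poly1d turns them into a function you can call on a single value or a whole array.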

We'll visualize our original scatter plot together with the polynomial's predicted values for page speeds ranging from 0 to 7 seconds:

In [7]:
xp = np.linspace(0, 7, 100)
p4_xp = p4(xp)

pred_data = pd.DataFrame({'xp':xp, 'p4_xp': p4_xp})

line_chart = alt.Chart(pred_data).mark_line(color='red').encode(x='xp:Q', y='p4_xp:Q')
scatter_chart + line_chart

Looks pretty good! Let's measure the r-squared score, where 1.0 would be a perfect fit:

In [8]:
from sklearn.metrics import r2_score

r2 = r2_score(y, p4(x))

print(r2)


0.8293766396303073
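For reference, r-squared compares the model's squared error with the squared error of always predicting the mean: R² = 1 - SS_res / SS_tot. The sketch below computes it by hand with NumPy only. The helper name r_squared is an illustrative assumption, and for a single-output fit like this one it should agree with sklearn's r2_score.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error left after the fit
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of predicting the mean
    return 1.0 - ss_res / ss_tot

# Regenerate the notebook's data so this block runs on its own
np.random.seed(2)
pageSpeeds = np.random.normal(3.0, 1.0, 1000)
purchaseAmount = np.random.normal(50.0, 10.0, 1000) / pageSpeeds

p4 = np.poly1d(np.polyfit(pageSpeeds, purchaseAmount, 4))
r2_manual = r_squared(purchaseAmount, p4(pageSpeeds))
print(r2_manual)
```

A model that only ever predicts the mean scores 0, and a model that reproduces every point scores 1.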


## Activity

Try different polynomial orders. Can you get a better fit with higher orders? Do you start to see overfitting, even though the r-squared score looks good for this particular data set?
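
One way to approach this, sketched under assumptions: hold out part of the data and compare the r-squared score on the points used for fitting against the score on the held-out points. The 80/20 split, the range of degrees 1 to 8, and the names below are illustrative choices, not part of the original notebook.

```python
import numpy as np

def r_squared(y_true, y_pred):
    # 1 - SS_res / SS_tot, the same quantity r2_score reports
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Same synthetic data as the notebook
np.random.seed(2)
pageSpeeds = np.random.normal(3.0, 1.0, 1000)
purchaseAmount = np.random.normal(50.0, 10.0, 1000) / pageSpeeds

# Assumed split: first 80% for fitting, last 20% held out
train_x, test_x = pageSpeeds[:800], pageSpeeds[800:]
train_y, test_y = purchaseAmount[:800], purchaseAmount[800:]

results = {}
for degree in range(1, 9):
    model = np.poly1d(np.polyfit(train_x, train_y, degree))
    train_r2 = r_squared(train_y, model(train_x))
    test_r2 = r_squared(test_y, model(test_x))
    results[degree] = (train_r2, test_r2)
    print(f"degree {degree}: train r2 = {train_r2:.4f}, test r2 = {test_r2:.4f}")
```

If the training score keeps creeping up while the held-out score stalls or drops, the extra polynomial terms are likely fitting noise rather than the underlying trend.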