# What's up with polynomial regression?

Why do we have to use this `PolynomialFeatures` thing from scikit-learn? What does it do?

Let's imagine we have some data for which we already know the true function we want our model to learn. This function is:
\begin{align}
y = 3x + 1x^2 -2
\end{align}

<img src="polynomial-graph.png" alt="Graph of the polynomial" style="width: 600px;"/>

In [1]:
import numpy
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures 


x_values = numpy.array([[-2], [-1], [0], [1], [2]])
y_values = numpy.array([[-4], [-4], [-2], [2], [8]])

In [2]:
x_values

array([[-2],
       [-1],
       [ 0],
       [ 1],
       [ 2]])
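As a quick sanity check, the data points above can be compared against the true function. This is only a sketch: `true_function` is a helper name introduced here for illustration, not something from scikit-learn.

```python
import numpy


def true_function(x):
    # The known function from the text: y = 3x + x^2 - 2
    return 3 * x + x ** 2 - 2


x_values = numpy.array([[-2], [-1], [0], [1], [2]])
y_values = numpy.array([[-4], [-4], [-2], [2], [8]])

# Every y should equal the true function evaluated at the matching x
matches = numpy.array_equal(true_function(x_values), y_values)
print(matches)
```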

Polynomial regression extends the linear model by adding extra predictors, obtained by raising each of the original predictors to a power. Our original predictors were:
```
[-2, -1, 0, 1, 2]
```

In [3]:
transformer = PolynomialFeatures(degree=2)
x_values_transformed = transformer.fit_transform(x_values)

In [4]:
x_values_transformed

array([[ 1., -2.,  4.],
       [ 1., -1.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  1.,  1.],
       [ 1.,  2.,  4.]])

For each value of x, the transformer has expanded a single number into an array of three numbers:

1. The bias (always 1.0; the feature in which all polynomial powers are zero)
2. The original value
3. The value, squared

It has *extended the linear model* by adding extra predictors. 
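To make the expansion less mysterious, here is a minimal sketch that builds the same three columns by hand with plain NumPy and compares them with the transformer's output. The variable names `manual` and `transformed` are illustrative only.

```python
import numpy
from sklearn.preprocessing import PolynomialFeatures

x_values = numpy.array([[-2], [-1], [0], [1], [2]])

# Build the three columns by hand: x^0 (the bias), x^1 (the original value), x^2
manual = numpy.hstack([x_values ** 0, x_values ** 1, x_values ** 2])

# Let scikit-learn do the same expansion
transformed = PolynomialFeatures(degree=2).fit_transform(x_values)

# The two arrays should hold the same numbers
print(numpy.allclose(manual, transformed))
```

With a single input feature, `PolynomialFeatures(degree=2)` amounts to stacking powers of x side by side. With several input features it would also add interaction terms, which this sketch does not cover.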

We can now hand the transformed values to the linear regression model's ``fit`` method. The model still fits a linear combination of its inputs, but because those inputs now include the squared values, the result is a second-degree polynomial in x instead of a straight line.


In [5]:
model = LinearRegression()
model.fit(x_values_transformed,y_values)

LinearRegression()
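The fit itself is ordinary least squares on the three columns. As a rough illustration (not a description of how `LinearRegression` is implemented internally), the same kind of problem can be solved directly with `numpy.linalg.lstsq`. One difference to keep in mind: this sketch has no separate intercept term, so the weight on the bias column plays that role.

```python
import numpy

# The transformed inputs: columns are [1, x, x^2]
X = numpy.array([
    [1.0, -2.0, 4.0],
    [1.0, -1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 4.0],
])
y = numpy.array([-4.0, -4.0, -2.0, 2.0, 8.0])

# Find the weights w that minimise ||X @ w - y||^2
weights, residuals, rank, singular_values = numpy.linalg.lstsq(X, y, rcond=None)

# One weight per column: bias, x, x^2
print(weights)
```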

Now we can predict. Just as we transformed our `x` inputs with the `PolynomialFeatures` class before training the model, we need to transform any `x` values for which we want to predict a `y`:

In [6]:
# Values of x for which we want to predict a y
x_pred = [[3], [-3]]

# Reuse the already-fitted transformer; there is no need to fit it again
x_pred_transformed = transformer.transform(x_pred)

model.predict(x_pred_transformed)

array([[16.],
       [-2.]])
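Remembering to transform every new input by hand is easy to forget. One possible alternative, sketched here, is to chain the two steps with scikit-learn's `make_pipeline`, so that the transformation is applied automatically on both `fit` and `predict`. The variable name `pipeline` is just a choice for this example.

```python
import numpy
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

x_values = numpy.array([[-2], [-1], [0], [1], [2]])
y_values = numpy.array([[-4], [-4], [-2], [2], [8]])

# The pipeline applies PolynomialFeatures first, then LinearRegression
pipeline = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
pipeline.fit(x_values, y_values)

# Raw x values go straight in; the pipeline transforms them for us
predictions = pipeline.predict([[3], [-3]])
print(predictions)
```

The predictions should agree with the ones computed by hand-transforming the inputs above.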

Are these correct?

\begin{align}
y = (3 \times 3) + 3^2 - 2 = 16 \\
y = (3 \times (-3)) + (-3)^2 - 2 = -2
\end{align}

Yes, they are! Our model correctly learned the function! If two examples aren't enough to convince you, we can inspect the parameters the model learned:


In [7]:
intercept = model.intercept_[0]
# One coefficient per transformed column: bias, x, x^2
coefficients = model.coef_[0]
print(f"Intercept is: {intercept}")
print(f"Coefficients are: {coefficients}")

Intercept is: -2.000000000000004
Coefficients are: [0. 3. 1.]


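The three values in `coef_` line up with the three transformed columns: 0 for the bias column (the model fits its intercept separately, so that column carries no weight), 3 for x, and 1 for x squared. The intercept is -2, up to floating-point error. Compare that with our true function, again: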
\begin{align}
y = 3x + 1x^2 -2
\end{align}
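For a check that goes beyond a couple of hand-picked points, the following sketch rebuilds the polynomial from the learned intercept and coefficients and compares it with the true function over a range of x values. The grid range of -10 to 10 and the names `x_grid`, `learned`, and `expected` are arbitrary choices for illustration.

```python
import numpy
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x_values = numpy.array([[-2], [-1], [0], [1], [2]])
y_values = numpy.array([[-4], [-4], [-2], [2], [8]])

transformer = PolynomialFeatures(degree=2)
model = LinearRegression()
model.fit(transformer.fit_transform(x_values), y_values)

intercept = model.intercept_[0]
# Coefficients for the bias, x, and x^2 columns, in that order
bias_weight, linear_weight, quadratic_weight = model.coef_[0]

# Evaluate the learned polynomial and the true function on the same grid
x_grid = numpy.linspace(-10, 10, 41)
learned = intercept + linear_weight * x_grid + quadratic_weight * x_grid ** 2
expected = 3 * x_grid + x_grid ** 2 - 2

print(numpy.allclose(learned, expected))
```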