# PolynomialFeatures

How does polynomial regression work?

Suppose we have a dataset $X$ with predictor variables $A$ and $B$ and a response variable (the $y$ variable) $T$. An observation of the predictors is given by $(a_0, b_0)$, with response $t_0$.


* A linear approximation of $t_0$ (assuming the coefficients $\beta_i$ were found by some fitting method, such as least squares) is given by:

  $t^{(1)}_0=f^{(1)}(a_0, b_0)= \beta^{(1)}_0 + \beta^{(1)}_1 a_0 + \beta^{(1)}_2 b_0$

* A quadratic approximation (a polynomial of degree 2) of $t_0$ is given by:

  $t^{(2)}_0=f^{(2)}(a_0, b_0)= \beta^{(2)}_0 + \beta^{(2)}_1 a_0 + \beta^{(2)}_2 b_0 + \beta^{(2)}_3 a_0^2 + \beta^{(2)}_4 b_0^2 + \beta^{(2)}_5 a_0 b_0$

* Note that this quadratic approximation can be viewed as a linear approximation if we treat $a_0^2$, $b_0^2$ and $a_0 b_0$ as new variables $c_0$, $d_0$ and $e_0$:

  $t^{(3)}_0=f^{(3)}(a_0, b_0, c_0, d_0, e_0)= \beta^{(3)}_0 + \beta^{(3)}_1 a_0 + \beta^{(3)}_2 b_0 + \beta^{(3)}_3 c_0 + \beta^{(3)}_4 d_0 + \beta^{(3)}_5 e_0$

* We just need to transform $(a, b)$ to $(a, b, c, d, e)$, with $c=a^2, d=b^2$ and $e=ab$.
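
The idea in the list above can be sketched with plain NumPy. This is only an illustration under stated assumptions: the synthetic data, the coefficient values in `true_beta`, and the use of `np.linalg.lstsq` as the least-squares solver are choices made here, not something taken from the original text. The columns $c=a^2$, $d=b^2$ and $e=ab$ are built by hand, and an ordinary linear least-squares fit is run on the expanded matrix.

```python
import numpy as np

# Synthetic data (an assumption for illustration): t is an exact quadratic in a and b.
rng = np.random.default_rng(0)
a = rng.uniform(-1.0, 1.0, size=50)
b = rng.uniform(-1.0, 1.0, size=50)
true_beta = np.array([1.0, 2.0, -3.0, 0.5, 4.0, -1.5])
t = (true_beta[0] + true_beta[1] * a + true_beta[2] * b
     + true_beta[3] * a**2 + true_beta[4] * b**2 + true_beta[5] * a * b)

# Expanded design matrix with columns 1, a, b, c = a^2, d = b^2, e = a*b.
design = np.column_stack([np.ones_like(a), a, b, a**2, b**2, a * b])

# Ordinary least squares on the expanded features is a purely linear fit.
beta_hat, *_ = np.linalg.lstsq(design, t, rcond=None)
print(np.round(beta_hat, 6))
```

Because the response here is noiseless, the fitted `beta_hat` should match `true_beta` up to floating-point error, even though the solver only ever sees a linear problem.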


---


Fortunately, scikit-learn's **PolynomialFeatures** class makes this transformation easy, using the same instantiate-and-fit syntax used to create a linear model.

We can instantiate the transformer with any degree and then transform the dataset:

```python
from sklearn.preprocessing import PolynomialFeatures

p = PolynomialFeatures(degree=2)
X_transf = p.fit_transform(X)  # X is the original dataset
```

Now, instead of training our model on the dataset `X`, we train it on `X_transf`.
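
As a minimal sketch of that workflow, the example below fits scikit-learn's `LinearRegression` on the transformed features. The synthetic data, the coefficient values, and the names `model`, `X_new` and `pred` are assumptions made for illustration. Passing `include_bias=False` is also a choice made here: it drops the constant column because `LinearRegression` fits its own intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption): two predictors and a noiseless quadratic response.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))
y = (1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]
     + 0.5 * X[:, 0] ** 2 + 4.0 * X[:, 0] * X[:, 1])

# Expand the two predictors into all polynomial terms up to degree 2.
p = PolynomialFeatures(degree=2, include_bias=False)
X_transf = p.fit_transform(X)

# Train a plain linear model on the expanded features.
model = LinearRegression().fit(X_transf, y)

# New observations must pass through the same fitted transformer before predicting.
X_new = np.array([[0.5, -0.5]])
pred = model.predict(p.transform(X_new))
print(pred)
```

For the point $(0.5, -0.5)$ the generating formula gives $2.625$, so the prediction should agree with that value up to floating-point error.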


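```python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# A single observation with one feature, x0 = 2
x = np.array([2]).reshape(1, -1)
p = PolynomialFeatures(degree=3)

X_Transf = p.fit_transform(x)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(p.get_feature_names_out())
print(X_Transf)
```

```
['1' 'x0' 'x0^2' 'x0^3']
[[1. 2. 4. 8.]]
```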
