# Polynomial Regression (theory)

**Author**: [Gilyoung Cheong](https://www.linkedin.com/in/gycheong/)

**Disclaimer**. This article explains how to create a reasonable model that involves polynomials of the input data. We focus on the theoretical aspects of an algorithm that uses [sklearn.preprocessing.PolynomialFeatures](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html) in conjunction with the linear regression algorithm. Our use of the term "polynomial regression" may not be standard in academia, but the algorithm introduced below seems to be quite standard in practice. It is a generalized version of the algorithm introduced in [this Wikipedia article](https://en.wikipedia.org/wiki/Polynomial_regression).

"Polynomial regression" is a [supervised machine learning](https://en.wikipedia.org/wiki/Supervised_learning) algorithm that extends linear regression. We are given input data $\boldsymbol{x}_1, \dots, \boldsymbol{x}_m \in \mathbb{R}^n$ and output data $\boldsymbol{y} = (y_1, \dots, y_n) \in \mathbb{R}^n$. Unlike in linear regression, we also fix a degree $d \in \mathbb{Z}_{\geq 1}$. Fix the $j$-th feature
$$\boldsymbol{x}_j = (x_{1j}, x_{2j}, \dots, x_{nj}) = \begin{bmatrix}
x_{1j} \\
x_{2j} \\
\vdots \\
x_{nj}
\end{bmatrix}$$ 
among the input data. We define the **$j$-th polynomial feature of degree $d$** of the given input data (or the **polynomial feature of the $j$-th feature $\boldsymbol{x}_j$ of degree $d$**) to be
$$\boldsymbol{x}_j^{(d)} = (1, x_{1j}, \dots, x_{nj}, x_{1j}^2, x_{1j}x_{2j}, \dots, x_{nj}^2, \dots, x_{nj}^d),$$
which consists of all monomials of degree $\leq d$ generated by $x_{1j}, \dots, x_{nj}$. We similarly define $\boldsymbol{y}^{(d)}$ for $\boldsymbol{y}$. Note that the length of the list $\boldsymbol{x}_j^{(d)}$ is equal to
$$N_{n,d} := \sum_{k=0}^d \binom{n - 1 + k}{k}.$$
Note that only the degree $1$ part of each $\boldsymbol{x}_j^{(d)} \in \mathbb{R}^{N_{n,d}}$ consists of the actual input data $\boldsymbol{x}_j \in \mathbb{R}^n$.
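To make the definition concrete, here is a minimal Python sketch that lists all monomials of degree $\leq d$ in the entries of a single vector and checks that their number matches $N_{n,d}$. The function names `num_monomials` and `polynomial_feature` and the sample vector are hypothetical, chosen only for illustration; the ordering (by degree, then lexicographically) is one reasonable convention and is an assumption of this sketch.

```python
from itertools import combinations_with_replacement
from math import comb, prod


def num_monomials(n: int, d: int) -> int:
    """N_{n,d}: the number of monomials of degree <= d in n variables."""
    return sum(comb(n - 1 + k, k) for k in range(d + 1))


def polynomial_feature(x, d):
    """Values of all monomials of degree <= d in the entries of x, ordered by degree."""
    values = []
    for degree in range(d + 1):
        # Each multiset of `degree` indices corresponds to exactly one monomial;
        # the empty multiset (degree 0) gives the constant monomial 1.
        for idx in combinations_with_replacement(range(len(x)), degree):
            values.append(prod(x[i] for i in idx))
    return values


x = [2.0, 3.0, 5.0]  # hypothetical feature with n = 3 entries
x_d = polynomial_feature(x, d=2)

print(x_d)                             # 1, the entries of x, then all degree-2 products
print(len(x_d), num_monomials(3, 2))   # both counts agree
```

By the hockey-stick identity, the sum defining $N_{n,d}$ also equals $\binom{n+d}{d}$, so the length grows quickly with both $n$ and $d$.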


**Polynomial regression of degree $d$** with respect to the data $((\boldsymbol{x}_1, \dots, \boldsymbol{x}_m), \boldsymbol{y})$ is linear regression with respect to the data $((\boldsymbol{x}_1^{(d)}, \dots, \boldsymbol{x}_m^{(d)}), \boldsymbol{y}^{(d)})$.

That is, it finds $\boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_m) \in \mathbb{R}^{m+1}$ such that $\boldsymbol{\hat{y}}^{(d),\boldsymbol{\beta}} = (\hat{y}^{(d),\boldsymbol{\beta}}_f)_{f \in M_{d}} \in \mathbb{R}^{N_{n,d}}$ defined by
$$\hat{y}^{(d), \boldsymbol{\beta}}_f := \beta_0 + \beta_1 f(\boldsymbol{x}_{1}) + \cdots + \beta_m f(\boldsymbol{x}_{m})$$
is the best possible approximation of $\boldsymbol{y}^{(d)}$, where $M_d$ is the set of all monomials of degree $\leq d$ in $n$ distinct indeterminates. As in linear regression, the "best approximation" means that $\|\boldsymbol{y}^{(d)} - \boldsymbol{\hat{y}}^{(d),\boldsymbol{\beta}}\|$ is minimized, which imposes more conditions than minimizing $\|\boldsymbol{y} - \boldsymbol{\hat{y}}^{\boldsymbol{\beta}}\|$ in linear regression.

The choices of $\boldsymbol{\beta}$ that give the polynomial regression of degree $d$ are precisely those that satisfy
$$(X^{(d)})^TX^{(d)}\boldsymbol{\beta} = (X^{(d)})^T\boldsymbol{y}^{(d)},$$
where $X^{(d)}$ is the $N_{n,d} \times (m+1)$ matrix defined by
$$X^{(d)} := \begin{bmatrix}
\boldsymbol{1} & \boldsymbol{x}_1^{(d)} & \boldsymbol{x}_2^{(d)} & \cdots & \boldsymbol{x}_m^{(d)}
\end{bmatrix},$$
where
$$\boldsymbol{1} := (1, 1, \dots, 1) = \begin{bmatrix}
1 \\
1 \\
\vdots \\
1
\end{bmatrix} \in \mathbb{R}^{N_{n,d}}.$$
In particular, if $(X^{(d)})^TX^{(d)}$ is invertible, then there is a unique such choice:
$$\boldsymbol{\beta} = ((X^{(d)})^TX^{(d)})^{-1}(X^{(d)})^T\boldsymbol{y}^{(d)}.$$
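The closed-form solution can be sketched numerically with NumPy. In the snippet below, the small matrix `A` and vector `b` are hypothetical stand-ins for $X^{(d)}$ and the target vector on the right-hand side; they are not taken from any real data set. The sketch solves the normal equations with `np.linalg.solve` rather than forming the inverse explicitly, and cross-checks the result against `np.linalg.lstsq`.

```python
import numpy as np

# Hypothetical stand-ins: A plays the role of X^{(d)} (first column all ones),
# and b plays the role of the target vector.
A = np.array([
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 5.0],
    [1.0, 4.0, 3.0],
    [1.0, 5.0, 8.0],
])
b = np.array([1.0, 2.0, 2.0, 4.0, 5.0])

# The closed form applies only when A^T A is invertible,
# which is equivalent to A having full column rank.
if np.linalg.matrix_rank(A) < A.shape[1]:
    raise ValueError("A^T A is singular; the solution is not unique")

# Solve (A^T A) beta = A^T b without computing the inverse explicitly.
beta = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(beta)
print(np.allclose(beta, beta_lstsq))
```

Solving the linear system is usually preferred over computing the inverse for numerical stability, although both express the same formula.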

**Remark**. The invertibility of $X^TX$ does not guarantee the invertibility of $(X^{(d)})^TX^{(d)}$. This can be observed by taking $m=1$ and any $d \geq 2$, recalling what [the determinant of a Vandermonde matrix looks like](https://en.wikipedia.org/wiki/Vandermonde_matrix).
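One way to see the phenomenon numerically is sketched below. It assumes the single-feature design matrix whose columns are $1, x, \dots, x^d$ evaluated at the sample values, as in the linked Vandermonde article; this reading of the remark, and the sample values themselves, are assumptions made for illustration. When the feature takes only two distinct values, the degree-$1$ matrix has full column rank, but the degree-$2$ matrix does not, so its Gram matrix is singular.

```python
import numpy as np

# Hypothetical single feature (m = 1) that takes only two distinct values.
x = np.array([0.0, 1.0, 0.0, 1.0])

# Vandermonde-style design matrices (assumed form, see the lead-in).
X_lin = np.vander(x, N=2, increasing=True)    # columns: 1, x
X_quad = np.vander(x, N=3, increasing=True)   # columns: 1, x, x^2

# A Gram matrix M^T M is invertible exactly when M has full column rank.
rank_lin = np.linalg.matrix_rank(X_lin)
rank_quad = np.linalg.matrix_rank(X_quad)

print(rank_lin == X_lin.shape[1])     # full column rank: X^T X is invertible
print(rank_quad == X_quad.shape[1])   # rank deficient: the degree-2 Gram matrix is singular
```

Here $x^2 = x$ at every sample, so the last two columns of the degree-$2$ matrix coincide. More generally, a square Vandermonde matrix has determinant $\prod_{i<j}(x_j - x_i)$, which vanishes as soon as two sample values agree.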