# Kernel Ridge Regression

## Theory
### 1. Linear regression
Given N input feature vectors $x_i \in \mathbb{R}^d$ (the rows of X), their labels Y, and a weight vector W:

$$ X \in \mathbb{R}^{N \times d}, W \in \mathbb{R}^{d \times 1}, Y \in \mathbb{R}^{N \times 1} $$

the goal of linear regression is to minimize the mean squared error between the labels $y_i$ and the predictions $W^Tx_i$:
$$\textrm{MSE}(X,Y,W) = \frac{1}{N} \sum_{i}\left(y_i - W^Tx_i\right)^2$$
That is,
$$\textrm{OBJ}(X,Y) = \min_W \textrm{MSE}(X,Y,W)$$


This problem is convex, so any W at which the gradient of the MSE with respect to W vanishes is a global minimizer. The minimizer is unique when the $d \times d$ matrix $\sum_{j}x_jx_j^T$ is invertible, which requires at least d linearly independent samples.

$$ \frac{\partial \textrm{MSE}(X,Y,W)}{\partial W} = 0$$
$$ \Leftrightarrow -\frac{2}{N} \sum_{i} (y_i-W^Tx_i)x_i = 0$$
$$ \Leftrightarrow \sum_{i} y_ix_i=  \sum_{j}(W^Tx_j)x_j $$
$$ \Leftrightarrow \sum_{i} y_ix_i=  \sum_{j}x_j(x_j^TW) $$
$$ \Leftrightarrow \sum_{i} y_ix_i=  \left(\sum_{j}x_jx_j^T\right)W $$
$$ \Leftrightarrow W =  \left(\sum_{j}x_jx_j^T\right)^{-1} \sum_{i} y_i x_i $$
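
The stationarity condition $\sum_{i} y_ix_i = \sum_{j}\left(x_jx_j^T\right)W$ can also be written in matrix form. With the $x_i^T$ as the rows of X, as in the definition above, and assuming $X^TX$ is invertible:

$$ \sum_{j}x_jx_j^T = X^TX, \qquad \sum_{i} y_ix_i = X^TY $$
$$ \Rightarrow X^TX\,W = X^TY \Leftrightarrow W = \left(X^TX\right)^{-1}X^TY $$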


### Features & Labels generation

In [26]:
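import numpy as np
N = 10   # number of samples
d = 100  # number of features
X = np.random.rand(d, N)  # one sample per column: the transpose of the N x d matrix used in the theory
Y = np.random.choice([-1, 1], size=N)  # random labels in {-1, +1}
W = np.random.rand(d)  # random, untrained weight vector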

### Definition of the objective

In [27]:
MSE = 1/N * ((Y - W.dot(X))**2).sum()  # W.dot(X) holds the N predictions W^T x_i
print(MSE)

684.7317325141285


### Optimization of the objective
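
As a minimal sketch of how the closed-form condition from the theory section could be applied to data shaped like the cells above (X of shape `(d, N)`, one sample per column): with N = 10 and d = 100 the matrix $\sum_{j}x_jx_j^T$ is singular, so it cannot be inverted directly. The sketch below instead uses `np.linalg.lstsq`, which returns the minimum-norm least-squares solution. The fixed seed and the names `rng`, `W_opt` and `mse_opt` are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only
N, d = 10, 100
X = rng.random((d, N))                              # one sample per column, shape (d, N)
Y = rng.choice([-1, 1], size=N).astype(float)       # random labels in {-1, +1}

# Normal equations in this layout: (X X^T) W = X Y.
# X X^T is d x d but has rank at most N < d, so it is not invertible.
# lstsq solves X^T W = Y in the least-squares sense and picks the
# minimum-norm W among all minimizers.
W_opt, *_ = np.linalg.lstsq(X.T, Y, rcond=None)

# Same MSE definition as in the objective cell above.
mse_opt = 1 / N * ((Y - W_opt.dot(X)) ** 2).sum()
print(mse_opt)
```

Because there are more features than samples here, the linear model can fit the ten labels exactly, so the printed MSE should be zero up to floating-point error. This also illustrates why the minimizer is not unique in this regime.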

### 2. Kernel ridge regression
The goal of linear ridge regression is 