Least Squares Support Vector Machines (LS-SVM), Kernel Ridge Regression (KRR), Radial Basis Function Networks (RBFN), Gaussian Processes (GP) and Kriging are regression models based on the same idea: decompose the target $y$ in the Radial Basis Function (RBF) basis generated by all observations $x_i \in \mathbb{R}^d$, $i=1,\ldots,N$. More specifically,

$$y(x) = \sum_{i=1}^{N} \alpha_i \, k(x, x_i) + b.$$
In other words, the value $y(x)$ is determined as a linear combination of kernel functions (in this case Gaussian kernels) centered at the observations $x_i$.
Given observed targets $y_i$ corresponding to the observations $x_i$, the dual weights $\alpha$ and bias $b$ can be learned from the least squares problem
$$\arg \min_{\alpha, b} \|K \cdot \alpha + b - y\|^2$$
where $K$ is called the kernel matrix and is defined as

$$K_{ij} = k(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2}.$$
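As a concrete illustration, here is a minimal numerical sketch (the data and the kernel parameter `gamma` are made up for the example) that builds the Gaussian kernel matrix and fits $\alpha$ and $b$ by ordinary least squares:

```python
import numpy as np

def gaussian_kernel(XA, XB, gamma=1.0):
    """Gaussian kernel matrix with entries exp(-gamma * ||x_i - x_j||^2)."""
    d2 = ((XA[:, None, :] - XB[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))              # observations x_i in R^2
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]       # targets y_i

K = gaussian_kernel(X, X)                 # kernel matrix

# Solve arg min_{alpha, b} ||K alpha + b - y||^2 by appending a ones
# column for the bias and calling an ordinary least squares solver.
A = np.hstack([K, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
alpha, b = coef[:-1], coef[-1]

# Without regularization the fit interpolates the targets (up to rounding).
print(np.allclose(K @ alpha + b, y, atol=1e-6))
```

The last line illustrates the interpolation issue discussed next: the unregularized fit reproduces every $y_i$ exactly.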
However, learning $\alpha$ this way leads to poor generalization because $y(x)$ interpolates the targets $y_i$. For a classification task, we want the solution to strike a good balance between the following three properties:
Low bias. This goal is captured by the least squares objective above.
Maximum margin. Among all classifiers, we prefer those whose decision boundary has a maximal margin, or distance, to the observed classes. Maximizing the margin is equivalent to adding a primal regularization term $\|w\|^2 \equiv \alpha^T\cdot K \cdot \alpha$.
Minimal complexity. Among all classifiers, we prefer those whose decision boundary is as smooth as possible. This can be captured elegantly in a dual regularization term of the form $\alpha^T \cdot \begin{bmatrix} \sqrt{k(x_i,x_j)} \left( 2 - \gamma\|x_i - x_j\|^2\right) \end{bmatrix}_{i,j} \cdot \alpha := \alpha^T \cdot L \cdot \alpha$.
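To make the two regularizers concrete, the following sketch (toy data; `gamma` is the Gaussian kernel parameter) builds $K$ and the matrix $L$ entrywise from the formula above and evaluates both quadratic forms for a given $\alpha$:

```python
import numpy as np

gamma = 0.5
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))               # observations x_i

# Pairwise squared distances and the Gaussian kernel matrix.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * d2)

# Dual regularization matrix:
# L_ij = sqrt(k(x_i, x_j)) * (2 - gamma * ||x_i - x_j||^2).
L = np.sqrt(K) * (2.0 - gamma * d2)

alpha = rng.normal(size=6)
primal = alpha @ K @ alpha                # ||w||^2, the margin term
dual = alpha @ L @ alpha                  # the smoothness term
print(primal, dual)
```

Both penalties are quadratic forms in the same dual weights $\alpha$, which is what lets them be combined into a single least squares objective below.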
Combining these into a single objective leads to the least squares problem
$$\arg \min_{\alpha, b} \frac{1-\rho}{2} \|K \cdot \alpha + b - y\|^2 + \frac{\rho}{2}\alpha^T\cdot ((1-\delta) K + \delta L)\cdot \alpha$$
where the hyperparameter $\rho \in (0,1)$ controls the amount of regularization and the hyperparameter $\delta \in (0,1)$ controls the balance between primal and dual regularization.
This problem can equivalently be written as the stacked least squares problem

$$\arg \min_{\alpha, b} \frac{1}{2} \left\| P \cdot \begin{bmatrix} K & \mathbb{1} \\ R & 0 \end{bmatrix} \cdot \begin{bmatrix} \alpha \\ b \end{bmatrix} - P \cdot \begin{bmatrix} y \\ 0 \end{bmatrix} \right\|^2$$

where $R$ is the matrix square root of the regularization term $(1-\delta)K + \delta L$ and $P$ is the diagonal matrix $\mathrm{diag}(\sqrt{1-\rho}\,\mathbb{I}_n, \sqrt{\rho}\,\mathbb{I}_n)$.
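The stacked formulation can be solved directly with an off-the-shelf least squares routine. The sketch below (toy data; small negative eigenvalues of $(1-\delta)K + \delta L$ are clipped to zero so that the matrix square root $R$ exists, which is an assumption of this example) assembles $P$, $R$ and the block system, then checks that the solution satisfies the first-order optimality conditions of the regularized objective:

```python
import numpy as np

rng = np.random.default_rng(2)
gamma, rho, delta = 0.5, 0.3, 0.5
n = 15
X = rng.normal(size=(n, 2))
y = np.sin(X[:, 0])

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * d2)                       # kernel matrix
L = np.sqrt(K) * (2.0 - gamma * d2)           # dual regularization matrix

# Matrix square root R of M = (1 - delta) K + delta L via eigendecomposition,
# clipping negative eigenvalues to zero (assumption: M is treated as PSD).
M = (1 - delta) * K + delta * L
w, V = np.linalg.eigh(M)
R = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
M_psd = R.T @ R                               # the (clipped) regularizer

# P = diag(sqrt(1-rho) I_n, sqrt(rho) I_n) weights the two blocks.
P = np.diag(np.concatenate([np.full(n, np.sqrt(1 - rho)),
                            np.full(n, np.sqrt(rho))]))
A = P @ np.block([[K, np.ones((n, 1))],
                  [R, np.zeros((n, 1))]])
t = P @ np.concatenate([y, np.zeros(n)])

coef, *_ = np.linalg.lstsq(A, t, rcond=None)
alpha, b = coef[:-1], coef[-1]

# First-order optimality: the gradients in alpha and b must vanish.
g_alpha = (1 - rho) * K @ (K @ alpha + b - y) + rho * M_psd @ alpha
g_b = (1 - rho) * np.sum(K @ alpha + b - y)
print(np.allclose(g_alpha, 0, atol=1e-8), abs(g_b) < 1e-8)
```

The eigendecomposition route avoids a SciPy dependency for the matrix square root; any symmetric square root of the regularizer gives the same objective value.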
Removing the hyperparameter $\rho$ for a moment, the normal equations associated with this least squares problem are given by
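As a sketch, assuming the stacked formulation with $R$ and no additional multiplier (the discussion below involves a multiplier $\mu$ that may come from a different parametrization): differentiating $\frac{1}{2}\|K\alpha + b - y\|^2 + \frac{1}{2}\|R\,\alpha\|^2$ with respect to $\alpha$ and $b$ and setting the gradients to zero gives the block system

$$\begin{bmatrix} K^T K + R^T R & K^T \mathbb{1} \\ \mathbb{1}^T K & n \end{bmatrix} \cdot \begin{bmatrix} \alpha \\ b \end{bmatrix} = \begin{bmatrix} K^T y \\ \mathbb{1}^T y \end{bmatrix}.$$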
Can we go from our system to a simpler system, similar to the LS-SVM one? Why is the last equation so different?
Marc VB: If you multiply the first block equation of the first system from the left by $K^{-T}$, you obtain the first block equation of the second system. If you then add all equations of the first block row and subtract the last (block) equation from the result, you obtain the second (block) equation of the second system (provided $\mu$ differs from 0).