# Support Vector Machine (SVM)

### Chapter 1 Foundation of SVM

Suppose we have an optimization problem:
$$
\begin{aligned}
\min_{\boldsymbol{x}} \quad & f(\boldsymbol{x}) \\
s.t. \quad & h_{i}(\boldsymbol{x})\leq 0, &i=1,2,\cdots,m \\
     \quad & g_{j}(\boldsymbol{x})=0,     &j=1,2,\cdots,n
\end{aligned}
$$
where $f(\boldsymbol{x})$ is the objective function, $h_{i}(\boldsymbol{x})\leq 0$ are the inequality constraints, and $g_{j}(\boldsymbol{x})=0$ are the equality constraints.<br>
This optimization problem is called the **primal problem**. Its Lagrange function is:
$$
L(\boldsymbol{x},\boldsymbol{\lambda},\boldsymbol{\mu})=f(\boldsymbol{x})+\sum_{i=1}^{m}\lambda_{i}h_{i}(\boldsymbol{x})+\sum_{j=1}^{n}\mu_{j}g_{j}(\boldsymbol{x})
$$
where $\lambda_{i}$ and $\mu_{j}$ are Lagrange multipliers.<br>

Define the **Lagrange dual function** $\Gamma(\boldsymbol{\lambda},\boldsymbol{\mu})$ as:
$$\begin{aligned}
&\Gamma(\boldsymbol{\lambda},\boldsymbol{\mu})=\inf_{\boldsymbol{x}\in \Psi}L(\boldsymbol{x},\boldsymbol{\lambda},\boldsymbol{\mu}) \\
= &\inf_{\boldsymbol{x}\in \Psi}\{ f(\boldsymbol{x})+\sum_{i=1}^{m}\lambda_{i}h_{i}(\boldsymbol{x})+\sum_{j=1}^{n}\mu_{j}g_{j}(\boldsymbol{x}) \}
\end{aligned}$$
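where $\Psi$ is the domain of the primal problem, i.e. the set on which $f$, $h_{i}$ and $g_{j}$ are all defined (the infimum is not restricted to the feasible set), and $\boldsymbol{\lambda}\geq 0$.<br>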
$$
\begin{aligned}
&\text{For any feasible } \boldsymbol{x}: \\
&\because \lambda_{i}\geq 0,\quad h_{i}(\boldsymbol{x})\leq 0, \quad g_{j}(\boldsymbol{x})=0 \\
&\therefore \sum_{i=1}^{m}\lambda_{i}h_{i}(\boldsymbol{x})+\sum_{j=1}^{n}\mu_{j}g_{j}(\boldsymbol{x})\leq 0\\
&\therefore \Gamma(\boldsymbol{\lambda},\boldsymbol{\mu})\leq L(\boldsymbol{x},\boldsymbol{\lambda},\boldsymbol{\mu}) \leq f(\boldsymbol{x})\\
&\therefore \forall \boldsymbol{\lambda}\geq 0,\boldsymbol{\mu}, \\
&\quad \quad\Gamma(\boldsymbol{\lambda},\boldsymbol{\mu})\leq p^{*}\equiv \min_{\boldsymbol{x}\ \text{feasible}}f(\boldsymbol{x})
\end{aligned}
$$
where $p^{*}$ is the optimal value of the primal problem.<br>
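
The lower bound can be checked numerically. The sketch below assumes an illustrative toy problem that does not appear in the text: minimize $x^{2}$ subject to $1-x\leq 0$, whose optimum is $x^{*}=1$ with $p^{*}=1$. The function names are hypothetical. The dual function is evaluated at the unconstrained minimizer of the Lagrangian and compared with $p^{*}$ on a grid of multipliers.

```python
# Toy problem (an illustrative assumption, not from the text):
#   minimize f(x) = x^2   subject to   h(x) = 1 - x <= 0
# The optimum is x* = 1, so p* = f(1) = 1.

def f(x):
    return x * x

def h(x):
    return 1.0 - x

def lagrangian(x, lam):
    return f(x) + lam * h(x)

def dual_function(lam):
    # L is a convex quadratic in x; dL/dx = 2x - lam = 0 gives x = lam / 2,
    # which is where the infimum over x is attained.
    x_min = lam / 2.0
    return lagrangian(x_min, lam)

p_star = f(1.0)

# Evaluate the dual function for lam = 0.0, 0.5, ..., 6.0 and check the bound.
lambdas = [0.5 * k for k in range(13)]
dual_values = [dual_function(lam) for lam in lambdas]
weak_duality_holds = all(value <= p_star + 1e-12 for value in dual_values)
print(weak_duality_holds)
```

Every multiplier $\lambda\geq 0$ gives some lower bound on $p^{*}$; how tight it is depends on the choice of $\lambda$.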
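Since every $\Gamma(\boldsymbol{\lambda},\boldsymbol{\mu})$ with $\boldsymbol{\lambda}\geq 0$ is a lower bound on $p^{*}$, seeking the tightest such bound gives the **Lagrange dual problem**: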
$$
\begin{aligned}
\max_{\boldsymbol{\lambda},\boldsymbol{\mu}}\quad &\Gamma(\boldsymbol{\lambda},\boldsymbol{\mu}) \\
s.t. \quad &\boldsymbol{\lambda}\geq 0
\end{aligned}
$$
The optimal value of the dual problem is denoted $d^{*}$, and $d^{*}\leq p^{*}$.

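If $d^{*}=p^{*}$, we say that strong duality holds. The inequality $d^{*}\leq p^{*}$ always holds and is known as weak duality; when $d^{*}<p^{*}$, the difference $p^{*}-d^{*}$ is called the duality gap.<br>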
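
To make the dual problem concrete, here is a minimal sketch under stated assumptions: the toy problem "minimize $x^{2}$ subject to $1-x\leq 0$" (illustrative, not from the text), for which $p^{*}=1$ and the dual function has the closed form $\Gamma(\lambda)=\lambda-\lambda^{2}/4$. A coarse grid search stands in for a real solver; the variable names are hypothetical.

```python
def dual_function(lam):
    # Closed form for the assumed toy problem:
    #   inf_x { x^2 + lam * (1 - x) } = lam - lam^2 / 4, attained at x = lam / 2
    return lam - lam * lam / 4.0

p_star = 1.0  # primal optimum of the toy problem, attained at x* = 1

# Solve the dual problem approximately: maximize over a grid of lam >= 0.
grid = [k / 100.0 for k in range(501)]  # 0.00, 0.01, ..., 5.00
lam_star = max(grid, key=dual_function)
d_star = dual_function(lam_star)

duality_gap = p_star - d_star
print(lam_star, d_star, duality_gap)
```

For this convex toy problem the gap should come out as zero, so strong duality holds; in general only $d^{*}\leq p^{*}$ is guaranteed.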
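When strong duality holds and $f$, $h_{i}$, $g_{j}$ are differentiable, the optimal solutions satisfy the KKT (Karush-Kuhn-Tucker) conditions: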
$$
\begin{aligned}
&\nabla_{\boldsymbol{x}}L(\boldsymbol{x}^{*},\boldsymbol{\lambda}^{*},\boldsymbol{\mu}^{*})=0 \\ 
& \lambda_{i}^{*}h_{i}(\boldsymbol{x}^{*})=0 \\
&h_{i}(\boldsymbol{x}^{*})\leq 0, g_{j}(\boldsymbol{x}^{*})=0 \\
&\lambda_{i}^{*}\geq 0
\end{aligned}
$$
where $\boldsymbol{x}^{*}$ is an optimal solution of the primal problem and $(\boldsymbol{\lambda}^{*},\boldsymbol{\mu}^{*})$ is an optimal solution of the dual problem.<br>
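
The four conditions can be checked mechanically for a candidate point. The sketch below assumes an illustrative two-variable problem that is not from the text: minimize $x_{1}^{2}+x_{2}^{2}$ subject to $h(\boldsymbol{x})=1-x_{1}\leq 0$ and $g(\boldsymbol{x})=x_{1}-x_{2}=0$. Solving the conditions by hand gives $\boldsymbol{x}^{*}=(1,1)$, $\lambda^{*}=4$, $\mu^{*}=2$. The helper names `kkt_residuals` and `satisfies_kkt` are hypothetical.

```python
def kkt_residuals(x, lam, mu):
    """Return how far (x, lam, mu) is from satisfying each KKT condition."""
    x1, x2 = x
    h = 1.0 - x1  # inequality constraint value, must be <= 0
    g = x1 - x2   # equality constraint value, must be 0
    # Gradient with respect to x of
    #   L = x1^2 + x2^2 + lam * (1 - x1) + mu * (x1 - x2)
    grad = (2.0 * x1 - lam + mu, 2.0 * x2 - mu)
    return {
        "stationarity": max(abs(grad[0]), abs(grad[1])),
        "complementary_slackness": abs(lam * h),
        "primal_feasibility": max(max(h, 0.0), abs(g)),
        "dual_feasibility": max(-lam, 0.0),
    }

def satisfies_kkt(x, lam, mu, tol=1e-9):
    return all(r <= tol for r in kkt_residuals(x, lam, mu).values())

print(satisfies_kkt((1.0, 1.0), 4.0, 2.0))  # hand-derived optimum
print(satisfies_kkt((2.0, 2.0), 0.0, 4.0))  # feasible, but the gradient is nonzero
```

Reporting a residual per condition, rather than a single boolean, makes it easier to see which condition a candidate point violates.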

Proof:
$$
\begin{aligned}
&\text{If strong duality holds, then} \\
&f(\boldsymbol{x}^{*})=p^{*}=d^{*}=\Gamma(\boldsymbol{\lambda}^{*},\boldsymbol{\mu}^{*}) \\
&=\inf_{\boldsymbol{x}\in \Psi}\{ f(\boldsymbol{x})+\sum_{i=1}^{m}\lambda_{i}^{*}h_{i}(\boldsymbol{x})+\sum_{j=1}^{n}\mu_{j}^{*}g_{j}(\boldsymbol{x}) \}\\
&\leq f(\boldsymbol{x}^{*})+\sum_{i=1}^{m}\lambda_{i}^{*}h_{i}(\boldsymbol{x}^{*})+\sum_{j=1}^{n}\mu_{j}^{*}g_{j}(\boldsymbol{x}^{*}) \\
&\leq f(\boldsymbol{x}^{*})
\end{aligned}
$$
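The chain starts and ends with $f(\boldsymbol{x}^{*})$, so both inequalities must hold with equality. Equality in the first means that $\boldsymbol{x}^{*}$ attains the infimum that defines $\Gamma(\boldsymbol{\lambda}^{*},\boldsymbol{\mu}^{*})$; assuming the functions are differentiable, the gradient of $L$ with respect to $\boldsymbol{x}$ therefore vanishes at $\boldsymbol{x}^{*}$, which gives the first KKT condition $\nabla_{\boldsymbol{x}}L(\boldsymbol{x}^{*},\boldsymbol{\lambda}^{*},\boldsymbol{\mu}^{*})=0$. Equality in the second means $\sum_{i=1}^{m}\lambda_{i}^{*}h_{i}(\boldsymbol{x}^{*})=0$ (the equality-constraint terms vanish because $g_{j}(\boldsymbol{x}^{*})=0$); since every term of this sum is non-positive, each term must be zero, which gives the second KKT condition $\lambda_{i}^{*}h_{i}(\boldsymbol{x}^{*})=0$, called complementary slackness. The third KKT condition is the feasibility of $\boldsymbol{x}^{*}$ for the primal problem. The fourth KKT condition comes from the constraint of the dual problem.<br>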
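
As a small illustration of complementary slackness, consider two one-dimensional toy problems (illustrative assumptions, not part of the derivation above). Both minimize $f(x)=x^{2}$ but differ in whether the constraint is active at the optimum:
$$
\begin{aligned}
&\text{Active: } h(x)=1-x\leq 0: && x^{*}=1,\quad h(x^{*})=0,\quad 2x^{*}-\lambda^{*}=0 \Rightarrow \lambda^{*}=2>0 \\
&\text{Inactive: } h(x)=x-1\leq 0: && x^{*}=0,\quad h(x^{*})=-1<0,\quad 2x^{*}+\lambda^{*}=0 \Rightarrow \lambda^{*}=0
\end{aligned}
$$
In both cases $\lambda^{*}h(x^{*})=0$: a multiplier can be positive only when its constraint holds with equality at the optimum.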