# Support Vector Machine (SVM)

### Chapter 1 Foundations of SVM

Suppose we have an optimization problem:
$$
\begin{aligned}
\min_{\bm{x}} \quad & f(\bm{x}) \\
s.t. \quad & h_{i}(\bm{x})\leq 0, &i=1,2,\cdots,m \\
     \quad & g_{j}(\bm{x})=0,     &j=1,2,\cdots,n
\end{aligned}
$$
where $f(\bm{x})$ is the objective function, $h_{i}(\bm{x})\leq 0$ are the inequality constraints, and $g_{j}(\bm{x})=0$ are the equality constraints.<br>
This optimization problem is called the **primal problem**. Its Lagrange function is:
$$
L(\bm{x},\bm{\lambda},\bm{\mu})=f(\bm{x})+\sum_{i=1}^{m}\lambda_{i}h_{i}(\bm{x})+\sum_{j=1}^{n}\mu_{j}g_{j}(\bm{x})
$$
where $\lambda_{i}$ and $\mu_{j}$ are Lagrange multipliers.<br>
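
As a small illustration (a toy problem assumed here, not part of the general setup above), consider minimizing $x_{1}^{2}+x_{2}^{2}$ subject to one inequality constraint $1-x_{1}\leq 0$ and one equality constraint $x_{1}-x_{2}=0$. Taking $h_{1}(\bm{x})=1-x_{1}$ and $g_{1}(\bm{x})=x_{1}-x_{2}$, the Lagrange function would read:
$$
L(\bm{x},\lambda_{1},\mu_{1})=x_{1}^{2}+x_{2}^{2}+\lambda_{1}(1-x_{1})+\mu_{1}(x_{1}-x_{2})
$$
Each constraint contributes one term, weighted by its own multiplier.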

Define the **Lagrange dual function** $\Gamma(\bm{\lambda},\bm{\mu})$ as:
$$\begin{aligned}
\Gamma(\bm{\lambda},\bm{\mu}) &= \inf_{\bm{x}\in \Psi}L(\bm{x},\bm{\lambda},\bm{\mu}) \\
&= \inf_{\bm{x}\in \Psi}\{ f(\bm{x})+\sum_{i=1}^{m}\lambda_{i}h_{i}(\bm{x})+\sum_{j=1}^{n}\mu_{j}g_{j}(\bm{x}) \}
\end{aligned}$$
where $\Psi$ is the domain of the problem (the set on which $f$, $h_{i}$, and $g_{j}$ are all defined, not only the feasible set) and $\bm{\lambda}\geq 0$.<br>
For any feasible point $\bm{x}$ and any $\bm{\lambda}\geq 0$:
$$
\begin{aligned}
&\because \lambda_{i}\geq 0,\quad h_{i}(\bm{x})\leq 0, \quad g_{j}(\bm{x})=0 \\
&\therefore \sum_{i=1}^{m}\lambda_{i}h_{i}(\bm{x})+\sum_{j=1}^{n}\mu_{j}g_{j}(\bm{x})\leq 0\\
&\therefore \Gamma(\bm{\lambda},\bm{\mu})\leq L(\bm{x},\bm{\lambda},\bm{\mu}) \leq f(\bm{x})\\
&\therefore \forall \bm{\lambda}\geq 0,\bm{\mu}, \\
&\quad \quad\Gamma(\bm{\lambda},\bm{\mu})\leq p^{*}\equiv \min_{\bm{x}\ \text{feasible}}f(\bm{x})
\end{aligned}
$$
where $p^{*}$ is the optimal value of the primal problem.<br>
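Since every $\Gamma(\bm{\lambda},\bm{\mu})$ with $\bm{\lambda}\geq 0$ is a lower bound on $p^{*}$, looking for the best such bound gives the **Lagrange dual problem**: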
$$
\begin{aligned}
\max_{\bm{\lambda},\bm{\mu}}\quad &\Gamma(\bm{\lambda},\bm{\mu}) \\
s.t. \quad &\bm{\lambda}\geq 0
\end{aligned}
$$
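Let $d^{*}$ denote the optimal value of the dual problem. The inequality $d^{*}\leq p^{*}$ always holds and is called weak duality; the difference $p^{*}-d^{*}\geq 0$ is the duality gap.

If $d^{*}=p^{*}$, we say that strong duality holds. If $d^{*}<p^{*}$, only weak duality holds and the duality gap is strictly positive.<br>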
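
The relation between $d^{*}$ and $p^{*}$ can be checked numerically on a toy problem. The sketch below assumes a one-dimensional problem chosen only for illustration (minimize $x^{2}$ subject to $1-x\leq 0$, so $p^{*}=1$ at $x^{*}=1$); the function names are hypothetical, and the dual function is evaluated using the closed-form minimizer $x=\lambda/2$ of the Lagrange function.

```python
# Toy problem (an assumption for illustration, not from the text):
#   minimize f(x) = x^2  subject to  h(x) = 1 - x <= 0
# Primal optimum: x* = 1, p* = 1.

def lagrangian(x, lam):
    """L(x, lambda) = f(x) + lambda * h(x) for the toy problem."""
    return x * x + lam * (1.0 - x)

def dual_function(lam):
    """Gamma(lambda) = inf_x L(x, lambda).

    L is a convex quadratic in x, so setting dL/dx = 2x - lambda = 0
    gives the unconstrained minimizer x = lambda / 2.
    """
    x_min = lam / 2.0
    return lagrangian(x_min, lam)

p_star = 1.0

# Evaluate the dual function on a grid of multipliers lambda in [0, 5].
lams = [k / 100.0 for k in range(0, 501)]
dual_values = [dual_function(lam) for lam in lams]

# Weak duality: every dual value is a lower bound on p*.
weak_duality_holds = all(v <= p_star + 1e-12 for v in dual_values)

# The best lower bound on the grid approximates d*.
d_star = max(dual_values)
lam_star = lams[dual_values.index(d_star)]

print(weak_duality_holds, lam_star, d_star)
```

For this particular convex problem the best lower bound on the grid coincides with $p^{*}$, so it is an instance of strong duality; a non-convex problem may leave a strictly positive gap.
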
The KKT (Karush-Kuhn-Tucker) conditions are:
$$
\begin{aligned}
&\nabla_{\bm{x}}L(\bm{x}^{*},\bm{\lambda}^{*},\bm{\mu}^{*})=0 \\ 
& \lambda_{i}^{*}h_{i}(\bm{x}^{*})=0 \\
&h_{i}(\bm{x}^{*})\leq 0, g_{j}(\bm{x}^{*})=0 \\
&\lambda_{i}^{*}\geq 0
\end{aligned}
$$
where $\bm{x}^{*}$ is an optimal solution of the primal problem and $(\bm{\lambda}^{*},\bm{\mu}^{*})$ is an optimal solution of the dual problem.<br>

Proof:
$$
\begin{aligned}
&\text{If strong duality holds, then} \\
&f(\bm{x}^{*})=d^{*}=\Gamma(\bm{\lambda}^{*},\bm{\mu}^{*}) \\
&=\inf_{\bm{x}\in \Psi}\{ f(\bm{x})+\sum_{i=1}^{m}\lambda_{i}^{*}h_{i}(\bm{x})+\sum_{j=1}^{n}\mu_{j}^{*}g_{j}(\bm{x}) \}\\
&\leq f(\bm{x}^{*})+\sum_{i=1}^{m}\lambda_{i}^{*}h_{i}(\bm{x}^{*})+\sum_{j=1}^{n}\mu_{j}^{*}g_{j}(\bm{x}^{*}) \\
&\leq f(\bm{x}^{*})
\end{aligned}
$$
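Because the chain starts and ends with $f(\bm{x}^{*})$, both inequalities must hold with equality. Equality in the first inequality means that $\bm{x}^{*}$ minimizes $L(\bm{x},\bm{\lambda}^{*},\bm{\mu}^{*})$ over $\bm{x}$, so, assuming $f$, $h_{i}$, and $g_{j}$ are differentiable, we obtain the first KKT condition $\nabla_{\bm{x}}L(\bm{x}^{*},\bm{\lambda}^{*},\bm{\mu}^{*})=0$. Equality in the second inequality gives $\sum_{i=1}^{m}\lambda_{i}^{*}h_{i}(\bm{x}^{*})=0$; since every term of this sum is nonpositive, each term must be zero, which is the second KKT condition $\lambda_{i}^{*}h_{i}(\bm{x}^{*})=0$, called complementary slackness. The third KKT condition comes from the constraints of the primal problem. The fourth KKT condition comes from the constraint of the dual problem.<br>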
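
The four conditions can be checked mechanically at a candidate point. The following is a minimal sketch on an assumed toy problem (minimize $x^{2}$ subject to $1-x\leq 0$, with no equality constraints); the helper name `kkt_check` and the tolerance are hypothetical choices for illustration.

```python
# Toy problem (an illustrative assumption, not from the text above):
#   minimize f(x) = x^2  subject to  h(x) = 1 - x <= 0
# There are no equality constraints, so no mu terms appear.

def kkt_check(x, lam, tol=1e-9):
    """Report which KKT conditions hold at the pair (x, lam)."""
    grad_L = 2.0 * x - lam   # d/dx of L(x, lam) = x^2 + lam * (1 - x)
    h = 1.0 - x              # value of the inequality constraint
    return {
        "stationarity": abs(grad_L) <= tol,
        "complementary_slackness": abs(lam * h) <= tol,
        "primal_feasibility": h <= tol,
        "dual_feasibility": lam >= -tol,
    }

# Candidate primal-dual pair x* = 1, lambda* = 2.
optimal = kkt_check(x=1.0, lam=2.0)

# A feasible but non-optimal point paired with lambda = 0.
non_optimal = kkt_check(x=2.0, lam=0.0)

print(optimal)
print(non_optimal)
```

At $(x,\lambda)=(1,2)$ all four checks pass, while at $(2,0)$ the point is feasible and complementary slackness holds, yet the gradient of the Lagrange function does not vanish, so the point is not a KKT point.
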