### Robust SVM

The robust SVM problem is an extension of the original SVM problem that accounts for measurement uncertainty, or variation, in the locations of the data points we wish to separate. A typical ($\ell_2$-regularized) SVM has the form:

\begin{equation*}
  \begin{aligned}
    &\text{minimize} && \lambda\|x\|^2 + \sum_{i = 1}^m \max(1 - b_i a_i^T x, 0) \\
  \end{aligned}
\end{equation*}

where $x$ is the variable, $a_1,...,a_m$ are the (zero-centered) data vectors, and $b_1,...,b_m$ are their respective labels in $\{-1, +1\}$.
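As a concrete reference point, the nominal objective can be evaluated directly with NumPy. This is a minimal sketch. It assumes the data vectors $a_i$ are stored as the rows of an $m \times n$ array `A` and the labels as a length-$m$ array `b`. The helper name `svm_objective` is hypothetical.

```python
import numpy as np

def svm_objective(x, A, b, lam):
    # Hypothetical helper: lam*||x||^2 + sum_i max(1 - b_i a_i^T x, 0).
    # Rows of A are the data vectors a_i; b holds labels in {-1, +1}.
    margins = b * (A @ x)               # b_i a_i^T x for every i
    hinge = np.maximum(1 - margins, 0)  # hinge loss per data point
    return lam * (x @ x) + hinge.sum()

# Two points on either side of the origin, both classified with margin >= 1,
# so only the regularization term lam*||x||^2 = 1 contributes.
A = np.array([[2.0, 0.0], [-2.0, 0.0]])
b = np.array([1.0, -1.0])
x = np.array([1.0, 0.0])
print(svm_objective(x, A, b, lam=1.0))  # 1.0
```

With `x = 0` every hinge term equals 1, so the same data gives an objective of 2.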

To make the model robust, we would like to choose $x$ so that it still performs well after the data vectors $a_i$ have been perturbed. For example, suppose each data vector is perturbed as follows:

\begin{equation}
a_i \to a_i + P\delta_i
\end{equation}

where $P$ is a (known) perturbation matrix and $\delta = (\delta_1,...,\delta_m)$ is an (unknown) collection of vectors constrained to be in some set $\mathcal{D}$.
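In matrix form, stacking the $a_i^T$ as the rows of `A` and the $\delta_i^T$ as the rows of `Delta`, the perturbation applies to all points at once as `A + Delta @ P.T`. The sketch below assumes that layout. It uses a non-square $P$ to show that each $\delta_i$ may live in a different space than $a_i$. The helper `perturb` is hypothetical, and the $\delta_i$ are random only for illustration.

```python
import numpy as np

def perturb(A, P, Delta):
    # Hypothetical helper. Rows of A are a_1^T, ..., a_m^T and rows of
    # Delta are delta_1^T, ..., delta_m^T; row i of the result is
    # (a_i + P delta_i)^T = a_i^T + delta_i^T P^T.
    return A + Delta @ P.T

rng = np.random.default_rng(0)
m, n, k = 5, 3, 2
A = rng.standard_normal((m, n))      # m data vectors in R^n
P = rng.standard_normal((n, k))      # known n x k perturbation matrix
Delta = rng.standard_normal((m, k))  # one delta_i in R^k per data point
A_pert = perturb(A, P, Delta)
print(A_pert.shape)                  # (5, 3)
```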

Taking $\mathcal{D}$ to be

\begin{equation}
\mathcal{D} = \{ (\delta_1,\dots,\delta_m) \;\mid\; \|\delta_i\|_\infty \leq 1,\; i = 1,\dots,m \}
\end{equation}

we produce the modified objective:

\begin{equation}
\max_{\delta \in \mathcal{D}}\left[ \lambda\|x\|^2 + \sum_{i = 1}^m \max(1 - b_i (a_i + P\delta_i)^T x, 0) \right]
\end{equation}
\begin{equation}
= \max_{\delta \in \mathcal{D}}\left[ \lambda\|x\|^2 + \sum_{i = 1}^m \max(1 - b_i a_i^Tx - b_i \delta_i^T P^Tx, 0) \right]
\end{equation}
\begin{equation}
=\lambda\|x\|^2 + \sum_{i = 1}^m \max(1 - b_i a_i^Tx + \|P^T x\|_1, 0)
\end{equation}

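The above derivation uses the definition of the dual norm (see *Convex Optimization*, Boyd and Vandenberghe, Appendix A.1.6) and the fact that the $\ell_\infty$ and $\ell_1$ norms are dual to each other (see also Hölder's inequality). Since $\lambda\|x\|^2$ does not depend on $\delta$, each hinge term depends only on its own $\delta_i$, and $\max(\cdot, 0)$ is nondecreasing, the maximization splits into $m$ independent problems, each with value $\max_{\|\delta_i\|_\infty \leq 1} \left(-b_i \delta_i^T P^T x\right) = \|P^T x\|_1$ (using $|b_i| = 1$). Because we maximize over $\mathcal{D}$, this is the **worst-case** robust SVM.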
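The dual-norm step can be checked numerically. For a fixed $x$ and label $b_i$, the maximizer is $\delta_i^\star = -b_i \,\mathrm{sign}(P^T x)$, which attains $\|P^T x\|_1$, and no randomly sampled feasible $\delta_i$ does better. This sketch uses arbitrary random data chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 4, 3
P = rng.standard_normal((n, k))
x = rng.standard_normal(n)
b_i = -1.0                          # a label in {-1, +1}

v = P.T @ x                         # the vector that delta_i is paired with
delta_star = -b_i * np.sign(v)      # closed-form maximizer, entries in [-1, 1]
worst = (-b_i * delta_star) @ v     # equals sum_j |v_j| = ||P^T x||_1

# Random feasible delta_i (||delta_i||_inf <= 1) never exceed the bound (Hölder).
samples = rng.uniform(-1, 1, size=(10000, k))
sampled = ((-b_i * samples) @ v).max()
print(worst, np.abs(v).sum(), sampled)
```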

Putting all this together, we form the new robust, regularized SVM problem:

\begin{equation*}
  \begin{aligned}
    &\text{minimize} && \lambda\|x\|^2 + \sum_{i = 1}^m \max(1 - b_i a_i^Tx + \|P^T x\|_1, 0) \\
  \end{aligned}
\end{equation*}

with variable $x$, data vectors $a_i$ with corresponding labels $b_i$, and perturbation matrix $P$.
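A quick way to sanity-check the closed form is to compare it with the nominal objective evaluated on explicitly perturbed data. Stacking the worst-case $\delta_i^\star = -b_i\,\mathrm{sign}(P^T x)$ as rows and plugging them into the nominal objective should reproduce the robust objective exactly. Setting $P = 0$ should recover the nominal objective. This is a minimal NumPy sketch. The helper names `nominal_objective` and `robust_objective` are hypothetical, and the data are random.

```python
import numpy as np

def nominal_objective(x, A, b, lam):
    # lam*||x||^2 + sum_i max(1 - b_i a_i^T x, 0)
    return lam * (x @ x) + np.maximum(1 - b * (A @ x), 0).sum()

def robust_objective(x, A, b, P, lam):
    # Closed-form worst case: every hinge term gains ||P^T x||_1.
    penalty = np.abs(P.T @ x).sum()
    return lam * (x @ x) + np.maximum(1 - b * (A @ x) + penalty, 0).sum()

rng = np.random.default_rng(3)
m, n = 8, 4
A = rng.standard_normal((m, n))
b = np.sign(rng.standard_normal(m))
P = rng.standard_normal((n, n))
x = rng.standard_normal(n)
lam = 0.5

# Worst-case perturbation from the derivation: delta_i = -b_i * sign(P^T x).
Delta_star = -np.outer(b, np.sign(P.T @ x))
A_worst = A + Delta_star @ P.T      # row i is (a_i + P delta_i)^T

r = robust_objective(x, A, b, P, lam)
print(r, nominal_objective(x, A_worst, b, lam), nominal_objective(x, A, b, lam))
```

The first two printed values agree, and the robust value is never below the nominal one because the added penalty is nonnegative.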

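We can also write this problem in epigraph form by introducing a scalar variable $t$ that upper-bounds the penalty $\|P^T x\|_1$. Because the objective is nondecreasing in $t$, we can always take $t = \|P^T x\|_1$ at the optimum without increasing the objective, so both forms have the same optimal value: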

\begin{equation*}
  \begin{aligned}
    &\text{minimize} && \lambda\|x\|^2 + \sum_{i = 1}^m \max(1 - b_i a_i^Tx + t, 0) \\
    &\text{subject to} && \|P^T x\|_1 \leq t 
  \end{aligned}
\end{equation*}
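The monotonicity argument behind the epigraph form can be seen numerically. For a fixed $x$, the epigraph objective only grows as $t$ increases above $\|P^T x\|_1$, so the smallest feasible $t$ is the best choice and recovers the robust objective. This sketch uses random data for illustration only. `epigraph_objective` is a hypothetical helper.

```python
import numpy as np

def epigraph_objective(x, t, A, b, lam):
    # Objective of the epigraph form for a given (x, t); feasibility
    # (||P^T x||_1 <= t) is ensured separately by the choice of t below.
    return lam * (x @ x) + np.maximum(1 - b * (A @ x) + t, 0).sum()

rng = np.random.default_rng(2)
m, n = 20, 5
A = rng.standard_normal((m, n))
b = np.sign(rng.standard_normal(m))
P = rng.standard_normal((n, n))
x = rng.standard_normal(n)
lam = 1.0

t_min = np.abs(P.T @ x).sum()              # smallest feasible t
ts = t_min + np.linspace(0, 5, 11)         # a range of feasible t values
vals = [epigraph_objective(x, t, A, b, lam) for t in ts]

# Robust objective with the penalty written explicitly as ||P^T x||_1.
robust = lam * (x @ x) + np.maximum(1 - b * (A @ x) + np.abs(P.T @ x).sum(), 0).sum()
print(vals[0], robust)
```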


References:

https://people.eecs.berkeley.edu/~elghaoui/Talks/talkNeyman2008.pdf


```python
import cvxpy as cp
import numpy as np
import scipy.sparse as sps
import scipy.linalg as la

# setup

problemID = "robust_svm_0"
prob = None
opt_val = None


# Variable declarations

def normalized_data_matrix(m, n, mu):
    """Random m x n data matrix with unit-norm columns.

    mu == 1 gives a dense matrix; otherwise mu is the density of a sparse one.
    """
    if mu == 1:
        # dense
        A = np.random.randn(m, n)
        A /= np.sqrt(np.sum(A**2, 0))
    else:
        # sparse
        A = sps.rand(m, n, mu)
        A.data = np.random.randn(A.nnz)
        N = A.copy()
        N.data = N.data**2
        A = A @ sps.diags([1 / np.sqrt(np.ravel(N.sum(axis=0)))], [0])

    return A

np.random.seed(0)
m = 200
n = 60
mu = 1
rho = 1
sigma = 0.1

A = normalized_data_matrix(m, n, mu)
x0 = sps.rand(n, 1, rho)
x0.data = np.random.randn(x0.nnz)
x0 = x0.toarray().ravel()

# Labels b_i = sign(a_i^T x0 + noise). Then shift each data vector along x0:
# points labeled +1 move further in the +x0 direction and points labeled -1
# further in the -x0 direction, which widens the gap between the classes.
b = np.sign(A.dot(x0) + sigma*np.random.randn(m))
A[b>0,:] += 0.7*np.tile([x0], (np.sum(b>0),1))
A[b<0,:] -= 0.7*np.tile([x0], (np.sum(b<0),1))

# Noise perturbation matrix P (n x n); its last row and column are zero,
# so the last coordinate of each a_i is never perturbed.
P = la.block_diag(np.random.randn(n-1,n-1), 0)
lam = 1

# Problem 1: unconstrained form
#   minimize lam*||x||^2 + sum_i max(1 - b_i a_i^T x + ||P^T x||_1, 0)
x1 = cp.Variable(n)
z1 = 1 - cp.multiply(b, A @ x1) + cp.norm1(P.T @ x1)
f = lam*cp.sum_squares(x1) + cp.sum(cp.maximum(z1, 0))
prob1 = cp.Problem(cp.Minimize(f))

# Problem 2: epigraph form with a scalar t bounding ||P^T x||_1
x2 = cp.Variable(n)
t = cp.Variable()
z2 = 1 - cp.multiply(b, A @ x2) + t
f = lam*cp.sum_squares(x2) + cp.sum(cp.maximum(z2, 0))
C = [cp.norm1(P.T @ x2) <= t]
prob2 = cp.Problem(cp.Minimize(f), C)


# Problem collection

# Single problem collection
problem1Dict = {
    "problemID" : problemID,
    "problem"   : prob1,
    "opt_val"   : opt_val
}
problem2Dict = {
    "problemID" : problemID+"_epigraph",
    "problem"   : prob2,
    "opt_val"   : opt_val
}

problems = [problem1Dict, problem2Dict]

# For debugging individual problems:
if __name__ == "__main__":
    def printResults(problemID = "", problem = None, opt_val = None):
        print(problemID)
        problem.solve()
        print("\tstatus: {}".format(problem.status))
        print("\toptimal value: {}".format(problem.value))
        print("\ttrue optimal value: {}".format(opt_val))
    printResults(**problems[0])
    printResults(**problems[1])
```

```
robust_svm_0
	status: optimal
	optimal value: 116.65257077276507
	true optimal value: None
robust_svm_0_epigraph
	status: optimal
	optimal value: 116.65257125050829
	true optimal value: None
```
