### One-class SVM

Given a set of in-group points known to belong to a particular class (e.g. points corresponding to machines operating within specified tolerances), the one-class SVM problem is to find a way of distinguishing points in this class from points outside it (e.g. machines operating outside tolerances). The difficulty is that out-group points are not necessarily available, so it is only possible to train a "one-sided" SVM.

One way of solving this problem is to find the center and radius of a sphere that best encapsulates the data. Kernel functions (e.g. polynomials) can be applied to the data first, which allows oddly shaped datasets to be handled by the same method. Given data $n$-vectors $a_1,\ldots,a_m$, the optimization problem is:

\begin{equation*}
  \begin{aligned}
    &\text{minimize} && \sum_{i=1}^m \max(\|a_i - x\|_2^2 - \rho, 0) + \lambda \max(\rho, 0)  \\
  \end{aligned}
\end{equation*}

where the variables are the sphere's center $x$ and $\rho$, which plays the role of the squared radius of the sphere, since it is compared against squared distances.
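To make the objective concrete, the sketch below evaluates it with NumPy for a fixed candidate center and threshold. The function name `sphere_objective` and the toy data are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def sphere_objective(A, x, rho, lam):
    """Evaluate sum_i max(||a_i - x||_2^2 - rho, 0) + lam * max(rho, 0).

    A is an (m, n) array whose rows are the data points a_i.
    """
    sq_dists = np.sum((A - x) ** 2, axis=1)   # ||a_i - x||_2^2 for each row
    hinge = np.maximum(sq_dists - rho, 0.0)   # nonzero only for points outside the sphere
    return hinge.sum() + lam * max(rho, 0.0)

# Toy data (hypothetical): three points near the origin and one far away.
A_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 0.0]])
x_toy = np.zeros(2)

print(sphere_objective(A_toy, x_toy, rho=1.0, lam=1.0))
```

Growing $\rho$ shrinks the hinge terms but increases the $\lambda \max(\rho, 0)$ penalty, which is the trade-off the optimization resolves.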

This problem can be equivalently represented in epigraph form as:

\begin{equation*}
  \begin{aligned}
    &\text{minimize} && \sum_{i=1}^m \max(\|a_i\|_2^2 - 2a_i^T x + t - \rho, 0) + \lambda \max(\rho, 0)  \\
    &\text{subject to} && \|x\|_2^2 \leq t 
  \end{aligned}
\end{equation*}

with variables $x$, $\rho$, and epigraph variable $t$.
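The equivalence can be checked by expanding the squared distance. For any $t$ satisfying $\|x\|_2^2 \leq t$,

\begin{equation*}
  \begin{aligned}
    \|a_i - x\|_2^2 &= \|a_i\|_2^2 - 2a_i^T x + \|x\|_2^2 \\
                    &\leq \|a_i\|_2^2 - 2a_i^T x + t,
  \end{aligned}
\end{equation*}

with equality when $t = \|x\|_2^2$. Each term $\max(\cdot, 0)$ in the objective is nondecreasing in $t$, so nothing is gained by taking $t$ larger than $\|x\|_2^2$, and the two problems have the same optimal value.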

Note that the original formulation of this problem is:

\begin{equation*}
  \begin{aligned}
    &\text{minimize} && R^2 + \lambda\sum_{i = 1}^m \xi_i \\
    &\text{subject to} && \|a_i - x\|_2^2 \leq R^2 + \xi_i, && i = 1,\ldots,m \\
    & && \xi_i \geq 0, && i = 1,\ldots,m
  \end{aligned}
\end{equation*}

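with variables $x$ (the sphere's center), $R^2$ (the squared radius of the sphere), and slack variables $\xi_i$ that allow individual points to fall outside the sphere. At the optimum $\xi_i = \max(\|a_i - x\|_2^2 - R^2, 0)$, so with $\rho = R^2 \geq 0$ (a constraint the code below imposes) this is the same problem as the one above when $\lambda = 1$; for other values the two agree after replacing $\lambda$ with $1/\lambda$ and rescaling the objective.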
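As a small numerical illustration of how the slack variables relate to the hinge terms of the first formulation, the sketch below computes, for a fixed center and squared radius, the smallest slacks that satisfy the constraints. The helper `optimal_slacks` and the toy values are hypothetical and chosen only for this example.

```python
import numpy as np

def optimal_slacks(A, x, R2):
    """Smallest xi_i >= 0 with ||a_i - x||_2^2 <= R2 + xi_i, for each row a_i of A."""
    sq_dists = np.sum((A - x) ** 2, axis=1)
    return np.maximum(sq_dists - R2, 0.0)

# Toy data (hypothetical): squared distances from the origin are 0, 1, and 4.
A_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
x_toy = np.zeros(2)
R2_toy = 1.0
lam_toy = 1.0

xi = optimal_slacks(A_toy, x_toy, R2_toy)
slack_objective = R2_toy + lam_toy * xi.sum()
print(xi, slack_objective)
```

Only the point outside the sphere receives a nonzero slack, equal to the amount by which its squared distance exceeds $R^2$.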

References:

http://www.jmlr.org/papers/volume2/tax01a/tax01a.pdf

http://rvlasveld.github.io/blog/2013/07/12/introduction-to-one-class-support-vector-machines/



In [12]:
import cvxpy as cp
import numpy as np
import scipy as sp

# setup

problemID = "oneclass_svm_0"
prob = None
opt_val = None

# Variable declarations

np.random.seed(0)
m = 5000
n = 200

# Generate random points uniform over hypersphere
A = np.random.randn(m, n)
A /= np.sqrt(np.sum(A**2, axis=1))[:,np.newaxis]
A *= (np.random.rand(m)**(1./n))[:,np.newaxis]

# Shift points and add some outliers
x0 = np.random.randn(n) # Random shift
A += x0

k = max(m//50, 1) # Number of outliers
idx = np.random.randint(0, m, k)
A[idx, :] += np.random.randn(k, n)
lam = 1


# Written against the CVXPY 1.x API: cp.sum, cp.maximum, and @ replace the
# older sum_entries, max_elemwise, and * (matrix product) from CVXPY 0.4.

# Problem 1: Unconstrained
x1 = cp.Variable(n)   # The center of the sphere
rho1 = cp.Variable()  # Squared-radius threshold (compared against squared distances)
# z_i = ||a_i - x||^2 = ||a_i||^2 - 2 a_i^T x + ||x||^2
z1 = np.sum(A**2, axis=1) - 2*(A @ x1) + cp.sum_squares(x1)
f = cp.sum(cp.maximum(z1 - rho1, 0)) + lam*cp.maximum(rho1, 0)

prob1 = cp.Problem(cp.Minimize(f))

# Problem 2: Epigraph form
x2 = cp.Variable(n)
rho2 = cp.Variable()
t = cp.Variable()     # Epigraph variable bounding ||x||^2
# z_i = ||a_i||^2 - 2 a_i^T x + t, an upper bound on ||a_i - x||^2
z2 = np.sum(A**2, axis=1) - 2*(A @ x2) + t
f2 = cp.sum(cp.maximum(z2 - rho2, 0)) + lam*cp.maximum(rho2, 0)
C = [cp.sum_squares(x2) <= t]

prob2 = cp.Problem(cp.Minimize(f2), C)

# Problem 3: Original formulation
x3 = cp.Variable(n)
rho3 = cp.Variable()  # Squared radius R^2
xi = cp.Variable(m)   # Slack variables, one per data point

f3 = rho3 + lam*cp.sum(xi)
C3 = [rho3 >= 0]
for i in range(m):
    C3 += [cp.sum_squares(A[i, :] - x3) <= rho3 + xi[i]]
    C3 += [xi[i] >= 0]
prob3 = cp.Problem(cp.Minimize(f3), C3)

# Problem collection

# Single problem collection
problem1Dict = {
    "problemID" : problemID,
    "problem"   : prob1,
    "opt_val"   : opt_val
}
problem2Dict = {
    "problemID" : problemID+"_epigraph",
    "problem"   : prob2,
    "opt_val"   : opt_val
}
problem3Dict = {
    "problemID" : problemID+"_original",
    "problem"   : prob3,
    "opt_val"   : opt_val
}
problems = [problem1Dict, problem2Dict, problem3Dict]



# For debugging individual problems:
if __name__ == "__main__":
    def printResults(problemID = "", problem = None, opt_val = None):
        print(problemID)
        problem.solve()
        print("\tstatus: {}".format(problem.status))
        print("\toptimal value: {}".format(problem.value))
        print("\ttrue optimal value: {}".format(opt_val))
    printResults(**problems[0])
    printResults(**problems[1])
    printResults(**problems[2])

oneclass_svm_0
	status: optimal
	optimal value: 221.6838844894541
	true optimal value: None
oneclass_svm_0_epigraph
	status: optimal
	optimal value: 221.6838844538567
	true optimal value: None
oneclass_svm_0_original
	status: optimal
	optimal value: 221.68385486554394
	true optimal value: None
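
Once a solver returns a center and a squared-radius threshold, new points can be labeled by comparing their squared distance to the center against that threshold. The sketch below is a minimal illustration with hypothetical values `center_toy` and `rho_toy` rather than the notebook's solution; with the problems above, one would presumably pass `x1.value` and `rho1.value` instead.

```python
import numpy as np

def in_class(points, center, rho):
    """Return a boolean array that is True where ||p - center||_2^2 <= rho."""
    points = np.atleast_2d(points)
    return np.sum((points - center) ** 2, axis=1) <= rho

# Hypothetical fitted sphere and two query points.
center_toy = np.array([1.0, 1.0])
rho_toy = 0.25
new_points = np.array([[1.0, 1.2], [2.0, 2.0]])

labels = in_class(new_points, center_toy, rho_toy)
print(labels)
```

Points labeled `False` fall outside the sphere and would be treated as out-of-class, such as a machine operating outside tolerances.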
