# Kernels

In some cases the problem is that we are looking for a linear boundary when the boundary in our space is non-linear. However is does not mean that there is no space in which this problem have no linear boundary. If we find the mapping to such space the problem could be solved. 

The question is if we can map the features space into more dimensions in such a way to make the problem linearly separable. Is is quite obvious that if we will use the function:
\begin{equation}
K(x) = (x_1^2, \sqrt(2) x_1 x_2, x_2^2)
\end{equation}
then we transfer our problem into $R^3$ space in which it is linearly separable.

Load the data set in the first place

In [1]:
from sklearn.datasets import load_iris
import numpy as np
from sklearn.model_selection import train_test_split
import cvxopt

iris = load_iris()
data_set = iris.data
labels = iris.target

data_set = data_set[labels!=2]
labels = labels[labels!=2]

train_data_set, test_data_set, train_labels, test_labels = train_test_split(
    data_set, labels, test_size=0.2, random_state=15)

train_labels[train_labels<1] = -1
test_labels[test_labels<1] = -1

objects_count = len(train_labels)

The ``build_kernel`` function for building more than just the linear kernel needs to be modifier. The RBF kernel can be written as:
\begin{equation}
K=\exp(-\frac{|x-x'|^{2}}{2\sigma^{2}})
\end{equation}

In [2]:
def build_kernel(data_set, kernel_type='linear'):
    kernel = np.dot(data_set, data_set.T)
    if kernel_type == 'rbf':
        sigma = 1.0
        objects_count = len(data_set)
        b = np.ones((len(data_set), 1))
        kernel -= 0.5 * (np.dot((np.diag(kernel)*np.ones((1, objects_count))).T, b.T)
                         + np.dot(b, (np.diag(kernel) * np.ones((1, objects_count))).T.T))
        kernel = np.exp(kernel / (2. * sigma ** 2))
    return kernel

The training part is the same as in the case of linear kernel:

In [3]:
def train(train_data_set, train_labels, kernel_type='linear', C=10, threshold=1e-5):
    kernel = build_kernel(train_data_set, kernel_type=kernel_type)

    P = train_labels * train_labels.transpose() * kernel
    q = -np.ones((objects_count, 1))
    G = np.concatenate((np.eye(objects_count), -np.eye(objects_count)))
    h = np.concatenate((C * np.ones((objects_count, 1)), np.zeros((objects_count, 1))))

    A = train_labels.reshape(1, objects_count)
    A = A.astype(float)
    b = 0.0

    sol = cvxopt.solvers.qp(cvxopt.matrix(P), cvxopt.matrix(q), cvxopt.matrix(G), cvxopt.matrix(h), cvxopt.matrix(A), cvxopt.matrix(b))

    lambdas = np.array(sol['x'])

    support_vectors_id = np.where(lambdas > threshold)[0]
    vector_number = len(support_vectors_id)
    support_vectors = train_data_set[support_vectors_id, :]

    lambdas = lambdas[support_vectors_id]
    targets = train_labels[support_vectors_id]

    b = np.sum(targets)
    for n in range(vector_number):
        b -= np.sum(lambdas * targets * np.reshape(kernel[support_vectors_id[n], support_vectors_id], (vector_number, 1)))
    b /= len(lambdas)

    return lambdas, support_vectors, support_vectors_id, b, targets, vector_number

The prediction part is a bit different as we need to take RBF kernel into consideration:

In [4]:
def classify_rbf(test_data_set, train_data_set, lambdas, targets, b, vector_number, support_vectors, support_vectors_id):
    kernel = np.dot(test_data_set, support_vectors.T)
    sigma = 1.0
    c = (1. / sigma * np.sum(test_data_set ** 2, axis=1) * np.ones((1, np.shape(test_data_set)[0]))).T
    c = np.dot(c, np.ones((1, np.shape(kernel)[1])))
    sv = (np.diag(np.dot(train_data_set, train_data_set.T))*np.ones((1,len(train_data_set)))).T[support_vectors_id]
    aa = np.dot(sv,np.ones((1,np.shape(kernel)[0]))).T
    kernel = kernel - 0.5 * c - 0.5 * aa
    kernel = np.exp(kernel / (2. * sigma ** 2))

    y = np.zeros((np.shape(test_data_set)[0], 1))
    for j in range(np.shape(test_data_set)[0]):
        for i in range(vector_number):
            y[j] += lambdas[i] * targets[i] * kernel[j, i]
        y[j] += b
    return np.sign(y)

The prediction and accuracy is usually higher than the linear kernel.

In [5]:
lambdas, support_vectors, support_vectors_id, b, targets, vector_number = train(train_data_set, train_labels, kernel_type='rbf')
predicted = classify_rbf(test_data_set, train_data_set, lambdas, targets, b, vector_number, support_vectors, support_vectors_id)
predicted = list(predicted.astype(int))

from sklearn.metrics import accuracy_score

print(accuracy_score(predicted, test_labels))

     pcost       dcost       gap    pres   dres
 0:  9.6305e+01 -1.2289e+03  2e+03  2e-01  2e-15
 1:  5.9143e+01 -1.2031e+02  2e+02  5e-03  2e-15
 2:  7.0898e+00 -1.6497e+01  2e+01  3e-16  2e-15
 3: -5.2057e-01 -3.7668e+00  3e+00  2e-16  8e-16
 4: -1.1712e+00 -1.8374e+00  7e-01  2e-16  3e-16
 5: -1.3952e+00 -1.6846e+00  3e-01  2e-16  2e-16
 6: -1.4671e+00 -1.5679e+00  1e-01  2e-16  2e-16
 7: -1.5060e+00 -1.5164e+00  1e-02  2e-16  2e-16
 8: -1.5105e+00 -1.5106e+00  1e-04  2e-16  2e-16
 9: -1.5105e+00 -1.5105e+00  1e-06  2e-16  2e-16
Optimal solution found.
0.85
