# Demo of kernel methods library

In this notebook we present the various important components of the `kernelmethods` library, and provide some example usage scenarios.

This library consists of a set of key classes such as `KernelMatrix`, a diverse library of kernel functions, as well as meta classes like `KernelSet` and `KernelBucket` to manage an array of kernel matrices. In addition, a library of kernel operations and related utilities is included.


## Table of Contents
- [Kernel functions](#kerfuncs)
- [Kernel matrix](#kernelmatrix)
- [Usage in kernel machines](#usage_kernel_machines)
- [Attributes for kernel matrix](#attr_km)


## Kernel functions <a name="kerfuncs"></a> 

Let's get started with some **kernel functions**!

A [kernel function](https://en.wikipedia.org/wiki/Positive-definite_kernel) takes two samples (each represented by an array of values in its raw input space) and computes their inner product in an implicit feature space. In a typical machine learning library, kernel functions are usually implemented directly as mathematical formulas that blindly compute and return the inner product. They almost always lack any structure, validation or representation, which can be a recipe for invalid or disastrous implementations. In this library, given that kernel functions are at the core of everything, we make a concerted effort to enforce a certain structure, uniform validation and a readable representation. We achieve this by defining a `BaseKernelFunction` abstract base class and making each kernel function inherit from it.

The `BaseKernelFunction` ensures that each derived kernel:
1. is callable, with two inputs
2. has a name and a str representation
3. has a method to check whether it is a valid kernel, i.e. whether the kernel matrix computed on a random sample is positive semi-definite (PSD)
4. is symmetric (verified via tests), as required.

These properties can be verified using the built-in kernel functions such as `PolyKernel` and `GaussianKernel`, for example:

In [16]:
from kernelmethods import PolyKernel, GaussianKernel

poly = PolyKernel(degree=4)
rbf = GaussianKernel()
# you can print/present them in many ways
print(poly)
print(rbf)
repr(rbf)

polynomial(degree=4,b=0)
gaussian(sigma=2.0)


'gaussian(sigma=2.0)'

Each of these is a "child" of `BaseKernelFunction`, and hence has the aforementioned desirable properties and relevant attributes such as *degree* and intercept (*b*), a name and validation of input data:

In [17]:
print([poly.degree, poly.b, poly.name])
# which can also be conveniently presented via its repr or str form
print(poly)

[4, 0, 'polynomial']
polynomial(degree=4,b=0)


They can be called with two vector inputs, and return their inner product:

In [18]:
import numpy as np
x = np.array([1, 2, 3])
y = np.array([2, 3, 4])
poly(x, y)

160000
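The value above is consistent with the usual polynomial kernel form `(x . y + b) ** degree`. That formula is an assumption inferred from the printed representation `polynomial(degree=4,b=0)` and the output shown, not taken from the library's source. The function `poly_kernel` below is a hypothetical NumPy-only stand-in that works through the arithmetic:

```python
import numpy as np

def poly_kernel(x, y, degree=4, b=0):
    """Hypothetical stand-in: assumes the form (x . y + b) ** degree."""
    return (np.dot(x, y) + b) ** degree

x = np.array([1, 2, 3])
y = np.array([2, 3, 4])

inner = np.dot(x, y)        # 1*2 + 2*3 + 3*4 = 20
value = poly_kernel(x, y)   # (20 + 0) ** 4 = 160000
print(inner, value)
```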

There is internal input validation, which raises a `ValueError` if the data is not valid or not of the right type:

In [19]:
poly([1, 2, 3], [4, 5, 'a'])

ValueError: input data type <U21 is not compatible with the required <class 'numpy.number'>

More importantly, every kernel function has a method to verify that it is valid, i.e. that the kernel matrix (KM) it induces is PSD:

In [20]:
poly.is_psd()

True

In addition to using the pre-defined classes, one can easily build new kernels either by defining new classes (starting from `BaseKernelFunction` or its derived classes), or by simply specifying a callable and using the `KernelFromCallable` convenience class. For example, if you want a new type of polynomial kernel without an intercept, that can be achieved via:

In [21]:
from kernelmethods.base import BaseKernelFunction, KernelFromCallable

# define that function
def poly_no_intercept(x, y, degree=2):
    return x.dot(y.T) ** degree

new_poly = KernelFromCallable(input_func=poly_no_intercept)
print(new_poly)

poly_no_intercept


You can check that `new_poly` is indeed a KernelFunction:

In [22]:
isinstance(new_poly, BaseKernelFunction)

True
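To make the relationship between a base class and a callable wrapper concrete, here is a minimal sketch of the pattern. The classes `SketchBaseKernel` and `SketchKernelFromCallable` are hypothetical names invented for illustration; they are not the library's implementation and only mimic the behaviour demonstrated in this notebook (a name, a str representation, numeric input validation, and being callable with two inputs).

```python
from abc import ABC, abstractmethod
import numpy as np

class SketchBaseKernel(ABC):
    """Hypothetical base class: forces subclasses to be callable and named."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def __call__(self, x, y):
        ...

    def __str__(self):
        return self.name

    __repr__ = __str__

class SketchKernelFromCallable(SketchBaseKernel):
    """Hypothetical wrapper turning a plain function into a kernel object."""

    def __init__(self, input_func, name=None, **func_params):
        super().__init__(name or input_func.__name__)
        self._func = input_func
        self._params = func_params

    def __call__(self, x, y):
        x, y = np.asarray(x), np.asarray(y)
        # reject non-numeric input before evaluating the function
        for arr in (x, y):
            if not np.issubdtype(arr.dtype, np.number):
                raise ValueError('input data type {} is not numeric'.format(arr.dtype))
        return self._func(x, y, **self._params)

def poly_no_intercept(x, y, degree=2):
    return x.dot(y.T) ** degree

sketch_poly = SketchKernelFromCallable(poly_no_intercept)
print(sketch_poly)
print(isinstance(sketch_poly, SketchBaseKernel))
print(sketch_poly(np.array([1, 2, 3]), np.array([2, 3, 4])))
```

Keeping validation in the wrapper rather than in each function means a user-supplied callable can stay a one-line formula.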

Now, we can check its properties and usability:

In [23]:
new_poly(x, y)

400

You can also quickly check whether this new function is a valid [Mercer kernel](https://en.wikipedia.org/wiki/Mercer%27s_theorem):

In [24]:
new_poly.is_psd()

True

In [25]:
# you will see the rbf is also PSD
rbf.is_psd()

True
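The idea behind such a check can be sketched with NumPy alone: build the Gram matrix on a random sample, then confirm it is symmetric and has no eigenvalue below a small negative tolerance. This is an illustrative sketch under those assumptions, not the library's `is_psd()` implementation; the helper names, sample size and tolerance are all hypothetical choices.

```python
import numpy as np

def gram_matrix(kernel, sample):
    """Pairwise kernel values for all rows of `sample`."""
    n = sample.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(sample[i], sample[j])
    return K

def looks_psd(kernel, num_samples=50, num_features=5, tol=1e-8, seed=0):
    """Empirical check on one random sample; passing is necessary, not sufficient."""
    rng = np.random.default_rng(seed)
    sample = rng.random((num_samples, num_features))
    K = gram_matrix(kernel, sample)
    if not np.allclose(K, K.T):
        return False
    eigvals = np.linalg.eigvalsh(K)  # eigenvalues of a symmetric matrix
    # allow tiny negative values caused by floating-point round-off
    return bool(eigvals.min() >= -tol * max(1.0, abs(eigvals.max())))

def poly_no_intercept(x, y, degree=2):
    return x.dot(y) ** degree

def neg_distance(x, y):
    # deliberately invalid: zero diagonal with negative off-diagonal entries
    return -np.linalg.norm(x - y)

print(looks_psd(poly_no_intercept))
print(looks_psd(neg_distance))
```

A check on a single random sample can only reject a function; it cannot prove that a function is a valid kernel for every possible input.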

## Kernel matrix <a name="kernelmatrix"></a>

The Gram matrix resulting from the pairwise application of the kernel function to a sample is called the kernel matrix. This is a key data structure for all kernel methods and learning algorithms. We designed `KernelMatrix` to be self-contained, efficient and yet generic.

You can import it simply by:

In [26]:
from kernelmethods import KernelMatrix

An instance can be created by specifying the function to be used as the kernel, and an optional name:

In [27]:
km = KernelMatrix(rbf)

You can inspect its properties easily, and get an easy to read representation anytime:

In [28]:
km

KernelMatrix: gaussian(sigma=2.0)

Specifying a kernel function is not enough - we usually need to attach and apply it to a sample. Let's create a simple dataset consisting of 10 points with 4 features each:

In [29]:
sample = np.random.rand(10, 4)

np.set_printoptions(precision=2)
print(sample)

[[0.99 0.81 0.75 0.58]
 [0.33 0.64 0.07 0.3 ]
 [0.34 0.04 0.33 0.78]
 [0.16 0.92 0.25 0.83]
 [0.86 0.87 0.7  0.56]
 [0.91 0.44 0.68 0.49]
 [0.72 0.01 0.64 0.95]
 [0.7  0.4  0.62 0.57]
 [0.26 0.89 0.63 0.24]
 [0.33 0.55 0.38 0.75]]


Attaching it is as easy as:

In [30]:
km.attach_to(sample)

You can then see it is ready for use, with a clear repr:

In [31]:
km

KernelMatrix: gaussian(sigma=2.0) (normed=True) on sample (10, 4)

You can display the full matrix simply with the `.full` attribute:

In [32]:
km.full

array([[1.  , 0.88, 0.86, 0.88, 1.  , 0.98, 0.9 , 0.97, 0.92, 0.92],
       [0.88, 1.  , 0.92, 0.95, 0.91, 0.91, 0.85, 0.93, 0.95, 0.96],
       [0.86, 0.92, 1.  , 0.9 , 0.87, 0.92, 0.97, 0.95, 0.87, 0.97],
       [0.88, 0.95, 0.9 , 1.  , 0.91, 0.87, 0.85, 0.91, 0.94, 0.98],
       [1.  , 0.91, 0.87, 0.91, 1.  , 0.98, 0.89, 0.97, 0.94, 0.94],
       [0.98, 0.91, 0.92, 0.87, 0.98, 1.  , 0.95, 0.99, 0.92, 0.94],
       [0.9 , 0.85, 0.97, 0.85, 0.89, 0.95, 1.  , 0.96, 0.83, 0.93],
       [0.97, 0.93, 0.95, 0.91, 0.97, 0.99, 0.96, 1.  , 0.94, 0.97],
       [0.92, 0.95, 0.87, 0.94, 0.94, 0.92, 0.83, 0.94, 1.  , 0.95],
       [0.92, 0.96, 0.97, 0.98, 0.94, 0.94, 0.93, 0.97, 0.95, 1.  ]])

Notice that kernel matrices are normalized by default, and hence all the diagonal elements are `1.0`. If you have a specific requirement not to normalize, you can pass `normalized=False` during instantiation.
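A common way to normalize a kernel matrix is to divide each entry by the square roots of the two matching diagonal entries, `K[i, j] / sqrt(K[i, i] * K[j, j])`. Whether the library uses exactly this scheme is an assumption; the hypothetical helper `normalize_km` below only shows why the diagonal becomes `1.0`:

```python
import numpy as np

def normalize_km(K):
    """Cosine-style normalization: K[i, j] / sqrt(K[i, i] * K[j, j])."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

rng = np.random.default_rng(0)
X = rng.random((10, 4))

# a polynomial kernel whose diagonal is not 1 before normalization
K = (X @ X.T + 1.0) ** 2
K_normed = normalize_km(K)

print(np.diag(K)[:3])         # unnormalized diagonal
print(np.diag(K_normed)[:3])  # all ones after normalization
```

For a Gaussian kernel the diagonal is already 1, since `k(x, x) = exp(0)`, so this normalization leaves it unchanged.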

Once attached to a sample, the kernel matrix allows convenient retrieval of various properties, including its size and the full kernel matrix or portions of it.

In [33]:
print('number of elements: {}, \n\tsamples: {} '.format(km.size, km.num_samples))
print('shape of KM: ', km.shape)

number of elements: 100, 
	samples: 10 
shape of KM:  (10, 10)


You can also easily evaluate its Frobenius norm:

In [34]:
km.frob_norm

9.345098083222108
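For reference, the Frobenius norm is the square root of the sum of squared entries. The snippet below checks this definition against `np.linalg.norm` on a small hand-made matrix; it does not call the library.

```python
import numpy as np

K = np.array([[1.0, 0.5],
              [0.5, 1.0]])

frob_manual = np.sqrt(np.sum(K ** 2))      # sqrt(1 + 0.25 + 0.25 + 1)
frob_numpy = np.linalg.norm(K, ord='fro')
print(frob_manual, frob_numpy)
```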

In addition, this class offers the following public attributes of KM (including methods):

In [35]:
[ a for a in dir(km)  if not a.startswith('_') ]

['attach_to',
 'attributes',
 'center',
 'centered',
 'diagonal',
 'frob_norm',
 'full',
 'full_sparse',
 'get_attr',
 'kernel',
 'name',
 'normalize',
 'normed_km',
 'num_samples',
 'set_attr',
 'shape',
 'size']

You can also perform common operations such as centering and normalization quite easily, via `km.center()` and `km.normalize()`, as well as query its diagonal with `km.diagonal()`.
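Centering a kernel matrix is conventionally done with the centering matrix `H = I - (1/n) * ones`, giving `H K H`. The sketch below assumes that standard definition (the library's `center()` may differ in details) and uses a hypothetical helper `center_km`. For a linear kernel, the result equals the kernel computed on mean-centered features, which makes the effect easy to verify:

```python
import numpy as np

def center_km(K):
    """Center a square kernel matrix in feature space: H K H."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

rng = np.random.default_rng(0)
X = rng.random((10, 4))

K = X @ X.T                 # linear kernel on the raw features
K_centered = center_km(K)

Xc = X - X.mean(axis=0)     # explicitly mean-centered features
print(np.allclose(K_centered, Xc @ Xc.T))
```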

## Usage in kernel machines <a name='usage_kernel_machines'></a>

Once a kernel matrix is computed, you can pass it to any kernel learning algorithm in place of the original sample, for example to `SVC` or `KernelRidge` in `scikit-learn`, by specifying `kernel='precomputed'` and supplying `km.full` instead of `X`. Let's first generate a toy sample and compute a kernel matrix for it:

In [73]:
from sklearn.datasets import make_classification
sample_data, labels = make_classification()

rbf = GaussianKernel()
toy_km = KernelMatrix(rbf)
toy_km.attach_to(sample_data)
toy_km

KernelMatrix: gaussian(sigma=2.0) (normed=True) on sample (100, 20)

Then prepare the kernel machine:

In [74]:
from sklearn.svm import SVC
svm = SVC(kernel='precomputed')
svm

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='precomputed', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)

Training the kernel machine is then as simple as passing the full kernel matrix (here with one row and one column per training sample) to `.fit()`:

In [75]:
toy_km.shape

(100, 100)

In [76]:
svm.fit(toy_km.full, labels)
svm.decision_function

<bound method BaseSVC.decision_function of SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='precomputed', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)>

The resulting classifier can be used like any other fitted `SVC`.

As the input to `svm.fit()` was a precomputed kernel matrix rather than the original samples, prediction also requires a kernel matrix: one holding the kernel values between each test point and each training point. This can easily be achieved by attaching the two samples to the same `KernelMatrix` instance. For example:

In [80]:
test_data, test_labels = make_classification(n_samples=40, n_features=20)

new_km = KernelMatrix(rbf)
new_km.attach_to(sample_one=test_data, sample_two=sample_data) # notice test_set goes first

Now `new_km.full` holds the kernel matrix obtained by applying the kernel function to every pair of points across the two samples: the first with 40 points and the second with 100 points, each in 20-dimensional space. You can verify this from the shape of the new kernel matrix:

In [81]:
new_km.shape

(40, 100)
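The shape follows directly from how a cross-sample kernel matrix is built: one row per point in the first sample and one column per point in the second. The NumPy-only sketch below illustrates this with random data. Both helper functions are hypothetical, and the Gaussian parametrization `exp(-||x - y||^2 / (2 * sigma^2))` is an assumption rather than the library's confirmed formula.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=2.0):
    """Assumed parametrization: exp(-||x - y||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def cross_km(kernel, sample_one, sample_two):
    """Rows index sample_one, columns index sample_two."""
    return np.array([[kernel(a, b) for b in sample_two] for a in sample_one])

rng = np.random.default_rng(0)
train = rng.random((100, 20))
test = rng.random((40, 20))

K_test_train = cross_km(gaussian_kernel, test, train)
print(K_test_train.shape)
```

This is why the test set goes first: `SVC.predict` with `kernel='precomputed'` expects an array of shape `(n_test_samples, n_train_samples)`.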

Now you can make predictions on the test set easily via:

In [83]:
pred_y = svm.predict(new_km.full)
pred_y

array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1,
       0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1])

Now, we can check the performance of this toy exercise on randomly generated data:

In [85]:
from sklearn.metrics import confusion_matrix
confusion_matrix(test_labels, pred_y)

array([[ 8, 12],
       [13,  7]])

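The performance is poor: only 15 of the 40 test points are classified correctly, which is no better than chance. This is not surprising, since the test set came from a separate `make_classification` call and so need not share the class structure of the training data. We intended this only as a proof-of-concept exercise!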

## Attributes for KernelMatrix <a name="attr_km"></a>

Another cool feature of the `KernelMatrix` class is the ability to attach arbitrary user-defined attributes, for easy identification of a given kernel matrix. This is especially handy when one has to traverse a large collection of kernel matrices, and programmatic identification is necessary!

For example, you can identify the KM with the source of the dataset:

In [None]:
km.set_attr('source', 'random')

or anything else you wish, like properties:

In [None]:
km.set_attr('properties', ['sigma', 4])

You can easily retrieve all the properties with `.attributes()` which returns the internal dictionary:

In [None]:
km.attributes()

Or just the one you like via `.get_attr()`:

In [None]:
km.get_attr('source')

The utility of these will be more obvious when dealing with [`KernelBucket` and `KernelSet`](#kmcollections).