# Demo of kernel methods library

In this notebook we present the various important components of the `kernelmethods` library, and provide some example usage scenarios.

This library consists of a set of key classes such as `KernelMatrix`, a diverse library of kernel functions, as well as meta classes like `KernelSet` and `KernelBucket` to manage an array of kernel matrices. In addition, a library of kernel operations and related utilities is included.


## Table of Contents <a name="toc"></a>
- [Kernel functions](#kerfuncs)
- [Kernel matrix](#kernelmatrix)
- [Usage in kernel machines](#usage_kernel_machines)
- [Attributes for kernel matrix](#attr_km)
- [Containers](#kmcollections)
- [Drop-in Estimator classes](#kernelmachine)
- [Advanced applications](#advanced)


Let's get some imports and setup out of our way for a smooth operation of this notebook:

In [1]:
import traceback
import warnings
import numpy as np
np.set_printoptions(precision=2)
warnings.filterwarnings("ignore")

## Kernel functions <a name="kerfuncs"></a> 

Let's get started with some **kernel functions**!

A [kernel function](https://en.wikipedia.org/wiki/Positive-definite_kernel) takes two samples (each represented by an array of values in their raw input space) as inputs and computes their inner product, implicitly in some feature space. In a typical machine learning library, kernel functions are implemented directly as mathematical formulas that simply compute and return this value. They rarely carry any structure, validation or representation, which can be a recipe for invalid or disastrous implementations. Given that kernel functions are at the core of everything in this library, we make a concerted effort to enforce a certain structure, uniform validation and a readable representation. We achieve this by defining a `BaseKernelFunction` abstract base class and making each kernel function inherit from it.

The `BaseKernelFunction` requires each derived kernel:
1. to be callable, with two inputs
2. to have a name and a str representation
3. to offer a method that checks whether it is a valid kernel, i.e. whether the kernel matrix derived on a random sample is positive semi-definite (PSD)
4. to be symmetric (verified via tests), as required.

These properties can be verified using the built-in kernel functions such as `PolyKernel` and `GaussianKernel` e.g. 

In [2]:
from kernelmethods import PolyKernel, GaussianKernel, LinearKernel

poly = PolyKernel(degree=4)
rbf = GaussianKernel()
# you can print/present them in many ways
print(poly)
print(rbf)
repr(rbf)

polynomial(degree=4,b=0)
gaussian(sigma=2.0)


'gaussian(sigma=2.0)'

Each of these is a "child" of `BaseKernelFunction`, and hence has the aforementioned desirable properties and relevant attributes such as *degree* and intercept (*b*), a name and validation on input data:

In [3]:
print([poly.degree, poly.b, poly.name])
# which can also be conveniently presented via its repr or str form
print(poly)

[4, 0, 'polynomial']
polynomial(degree=4,b=0)


They can be called with two vectorial inputs, which returns their inner product:

In [4]:
x = np.array([1, 2, 3])
y = np.array([2, 3, 4])
poly(x, y)

160000
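As a sanity check, the value above can be reproduced with plain NumPy. This sketch assumes the polynomial kernel has the common form `(x . y + b) ** degree`; that form is inferred from the printed repr and the output, not taken from the library source.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([2, 3, 4])

# Assumed form of the polynomial kernel: (x . y + b) ** degree
degree, b = 4, 0
inner = int(x.dot(y))                 # 1*2 + 2*3 + 3*4 = 20
manual_value = (inner + b) ** degree  # 20 ** 4
print(manual_value)                   # 160000, matching poly(x, y) above
```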

There is an internal input validation, which throws a `ValueError` if the data is not of the right type or is otherwise invalid:

In [5]:
try:
    poly([1, 2, 3], [4, 5, 'a'])
except ValueError:
    traceback.print_exc()

Traceback (most recent call last):
  File "<ipython-input-5-7c24f8b6cc80>", line 2, in <module>
    poly([1, 2, 3], [4, 5, 'a'])
  File "/Users/Reddy/dev/kernelmethods/kernelmethods/numeric_kernels.py", line 46, in __call__
    x, y = check_input_arrays(x, y, ensure_dtype=np.number)
  File "/Users/Reddy/dev/kernelmethods/kernelmethods/utils.py", line 31, in check_input_arrays
    y = ensure_ndarray_1D(y, ensure_dtype)
  File "/Users/Reddy/dev/kernelmethods/kernelmethods/utils.py", line 62, in ensure_ndarray_1D
    return ensure_ndarray_size(array, ensure_dtype=ensure_dtype, ensure_num_dim=1)
  File "/Users/Reddy/dev/kernelmethods/kernelmethods/utils.py", line 78, in ensure_ndarray_size
    ''.format(array.dtype, ensure_dtype))
ValueError: input data type <U21 is not compatible with the required <class 'numpy.number'>


More importantly, every kernel function has a method to verify that it is a valid kernel (i.e. the kernel matrix it induces is PSD):

In [6]:
poly.is_psd()

True
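To make the idea concrete, here is a minimal NumPy sketch of how such a check could work: build the Gram matrix on a random sample and inspect its smallest eigenvalue. The helper name `looks_psd`, the sample size and the tolerance are illustrative assumptions; the library's `is_psd()` may well be implemented differently.

```python
import numpy as np

def looks_psd(kernel_func, num_samples=50, num_features=5, tol=1e-8, seed=0):
    """Hypothetical helper: is the Gram matrix on a random sample PSD?"""
    rng = np.random.default_rng(seed)
    sample = rng.random((num_samples, num_features))
    gram = np.array([[kernel_func(a, b) for b in sample] for a in sample])
    # symmetrize to guard against round-off before the eigen-decomposition
    gram = (gram + gram.T) / 2.0
    min_eig = np.linalg.eigvalsh(gram).min()
    # allow tiny negative eigenvalues caused by floating-point error
    return bool(min_eig >= -tol * max(1.0, np.abs(gram).max()))

valid = looks_psd(lambda a, b: a.dot(b) ** 2)             # squared dot product: a valid kernel
invalid = looks_psd(lambda a, b: -np.linalg.norm(a - b))  # negative distance: not PSD
print(valid, invalid)                                     # True False
```

A check on a single random sample can only refute validity, never prove it, so it is best read as a quick sanity test.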

In addition to using the pre-defined classes, one can easily build new kernel functions either by defining new classes (starting from `BaseKernelFunction` or its derived classes), or by simply specifying a callable and using the `KernelFromCallable` convenience class. For example, if you want a new type of polynomial kernel without an intercept, that can be achieved via:

In [7]:
from kernelmethods.base import BaseKernelFunction, KernelFromCallable

# define that function
def poly_no_intercept(x, y, degree=2):
    return x.dot(y.T) ** degree

new_poly = KernelFromCallable(input_func=poly_no_intercept)
print(new_poly)

poly_no_intercept


You can check that `new_poly` is indeed a KernelFunction:

In [8]:
isinstance(new_poly, BaseKernelFunction)

True

Now, we can check its properties and usability:

In [9]:
new_poly(x, y)

400

You can also quickly check if this new function is a valid [Mercer kernel](https://en.wikipedia.org/wiki/Mercer%27s_theorem) or not:

In [10]:
new_poly.is_psd()

True

In [11]:
# you will see the rbf is also PSD
rbf.is_psd()

True

[Go back to table of contents](#toc)

## Kernel matrix <a name="kernelmatrix"></a>

The Gram matrix obtained by applying the kernel function to every pair of points in a sample is called the kernel matrix. It is the key data structure for all kernel methods and learning algorithms. We designed `KernelMatrix` to be self-contained, efficient and yet generic.

You can import it simply by:

In [12]:
from kernelmethods import KernelMatrix

An instance can be created by specifying which function is to be used as the kernel, and an optional name:

In [13]:
km = KernelMatrix(rbf)

You can inspect its properties easily, and get an easy to read representation anytime:

In [14]:
km

KernelMatrix: gaussian(sigma=2.0)

Specifying a kernel function is not enough - we usually need to attach and apply it to a sample. Let's create a simple dataset consisting of 10 points with 4 features each:

In [15]:
sample = np.random.rand(10, 4)

np.set_printoptions(precision=2)
print(sample)

[[0.28 0.82 0.66 0.43]
 [0.58 0.43 0.7  0.32]
 [0.91 0.41 0.79 0.77]
 [0.   0.32 1.   0.82]
 [0.85 0.53 0.37 0.76]
 [0.91 0.15 0.13 0.71]
 [0.43 0.9  0.51 0.81]
 [0.15 0.25 0.43 0.52]
 [0.05 0.15 0.63 0.97]
 [0.07 0.25 0.08 0.57]]


Attaching it is as easy as 

In [16]:
km.attach_to(sample)

You can then see it is ready for use, with a clear repr:

In [17]:
km

KernelMatrix: gaussian(sigma=2.0) (normed=True) on sample (10, 4)

You can display the full matrix simply with the `.full` attribute:

In [18]:
km.full

array([[1.  , 0.97, 0.92, 0.93, 0.93, 0.86, 0.98, 0.95, 0.91, 0.91],
       [0.97, 1.  , 0.96, 0.92, 0.95, 0.92, 0.94, 0.96, 0.91, 0.91],
       [0.92, 0.96, 1.  , 0.9 , 0.98, 0.94, 0.93, 0.91, 0.9 , 0.85],
       [0.93, 0.92, 0.9 , 1.  , 0.87, 0.82, 0.91, 0.95, 0.98, 0.89],
       [0.93, 0.95, 0.98, 0.87, 1.  , 0.97, 0.96, 0.92, 0.89, 0.9 ],
       [0.86, 0.92, 0.94, 0.82, 0.97, 1.  , 0.89, 0.91, 0.88, 0.91],
       [0.98, 0.94, 0.93, 0.91, 0.96, 0.89, 1.  , 0.93, 0.91, 0.91],
       [0.95, 0.96, 0.91, 0.95, 0.92, 0.91, 0.93, 1.  , 0.97, 0.98],
       [0.91, 0.91, 0.9 , 0.98, 0.89, 0.88, 0.91, 0.97, 1.  , 0.94],
       [0.91, 0.91, 0.85, 0.89, 0.9 , 0.91, 0.91, 0.98, 0.94, 1.  ]])

Notice that the kernel matrices are `normalized` by default, and hence all the diagonal elements are `1.0`. If you have a specific requirement not to normalize it, you can choose `normalized=False` during instantiation.

Once attached to a sample, the kernel matrix allows convenient retrieval of various properties, including its size and shape, the full kernel matrix or portions of it.

In [19]:
print('number of elements: {}, \n\tsamples: {} '.format(km.size, km.num_samples))
print('shape of KM: ', km.shape)

number of elements: 100, 
	samples: 10 
shape of KM:  (10, 10)


You can also easily evaluate its Frobenius norm:

In [20]:
km.frob_norm

9.309624805148443

In addition, this class offers the following public attributes of KM (including methods):

In [21]:
[ a for a in dir(km)  if not a.startswith('_') ]

['attach_to',
 'attributes',
 'center',
 'centered',
 'diagonal',
 'frob_norm',
 'full',
 'full_sparse',
 'get_attr',
 'kernel',
 'name',
 'normalize',
 'normed_km',
 'num_samples',
 'set_attr',
 'shape',
 'size']

You can also perform common operations such as centering, and normalization quite easily, via `km.center()` and  `km.normalize()`, as well as query its diagonal with `km.diagonal()`.
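For readers who want to see what these operations amount to numerically, here is a NumPy-only sketch. It assumes that normalization means the usual cosine normalization `K_ij / sqrt(K_ii * K_jj)` and that centering means `H K H` with `H = I - (1/n) * ones`; these are the standard definitions, and whether the library follows them exactly is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.random((10, 4))
raw_km = (sample @ sample.T) ** 2             # un-normalized polynomial kernel, degree 2

# cosine normalization: K_ij / sqrt(K_ii * K_jj)
diag = np.diag(raw_km)
normed_km = raw_km / np.sqrt(np.outer(diag, diag))

# centering: H K H with H = I - (1/n) * ones
n = raw_km.shape[0]
centering = np.eye(n) - np.ones((n, n)) / n
centered_km = centering @ raw_km @ centering

# Frobenius norm: square root of the sum of squared entries
frob = np.sqrt((raw_km ** 2).sum())

print(np.allclose(np.diag(normed_km), 1.0))        # unit diagonal after normalization
print(np.allclose(centered_km.sum(axis=0), 0.0))   # columns sum to zero after centering
print(np.isclose(frob, np.linalg.norm(raw_km)))    # agrees with NumPy's Frobenius norm
```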

[Go back to table of contents](#toc)

## Usage in kernel machines <a name='usage_kernel_machines'></a>

Once a kernel matrix is computed, you can pass it on to any kernel learning algorithm in place of the original sample, for example to `SVC` or `KernelRidge` in `scikit-learn`, by specifying `kernel='precomputed'` and supplying `km.full` instead of `X`. Let's first generate a toy sample and compute a kernel matrix on it:

In [22]:
from sklearn.datasets import make_classification
sample_data, labels = make_classification()

rbf = GaussianKernel()
toy_km = KernelMatrix(rbf)
toy_km.attach_to(sample_data)
toy_km

KernelMatrix: gaussian(sigma=2.0) (normed=True) on sample (100, 20)

then, prepare the kernel machine:

In [23]:
from sklearn.svm import SVC
svm = SVC(kernel='precomputed')
svm

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='precomputed', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)

And then, training the kernel machine is as simple as:

In [24]:
toy_km.shape

(100, 100)

In [25]:
svm.fit(toy_km.full, labels)
svm.decision_function

<bound method BaseSVC.decision_function of SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='precomputed', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)>

The resulting classifier can be used like any other fitted `SVC`.

As the input to the svm's `.fit()` was a precomputed kernel matrix, not the original samples, prediction also requires a kernel matrix: the one holding the kernel values between the test and training sets! This can easily be achieved by attaching these two samples to the same instance. For example:

In [26]:
test_data, test_labels = make_classification(n_samples=40, n_features=20)

new_km = KernelMatrix(rbf)
new_km.attach_to(sample_one=test_data, sample_two=sample_data) # notice test_set goes first

Now `new_km.full` has the full kernel matrix from the application of kernel function, with pair-wise dot products between points across the two samples - the first one with 40 points, and the second one with 100 points, each in the 20-dimensional space. You can verify that is indeed the case with the shape of the new kernel matrix! 

In [27]:
new_km.shape

(40, 100)
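The same shape logic can be illustrated without the library. The sketch below assumes the Gaussian kernel has the form `exp(-||x - y||^2 / (2 * sigma^2))`, which is one common parameterization and may differ from the one used by `GaussianKernel`; the helper `gaussian_cross_km` is hypothetical.

```python
import numpy as np

def gaussian_cross_km(sample_one, sample_two, sigma=2.0):
    """Hypothetical helper: Gaussian kernel between every row pair of two samples."""
    sq_dists = (
        (sample_one ** 2).sum(axis=1)[:, None]
        + (sample_two ** 2).sum(axis=1)[None, :]
        - 2.0 * sample_one @ sample_two.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)      # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
train = rng.random((100, 20))
test = rng.random((40, 20))

cross_km = gaussian_cross_km(test, train)     # test rows first, train columns second
print(cross_km.shape)                         # (40, 100)
```

Rows index test points and columns index training points, which is the layout `SVC(kernel='precomputed')` expects at prediction time.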

Now you can make predictions on the test set easily via:

In [28]:
pred_y = svm.predict(new_km.full)
pred_y

array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1,
       1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1])

Now, we can check the performance of this toy exercise on randomly generated data:

In [29]:
from sklearn.metrics import confusion_matrix
confusion_matrix(test_labels, pred_y)

array([[14,  6],
       [12,  8]])

which shows the performance is horrible - but we were expecting this to be only a proof of concept exercise!
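To put a number on it, the accuracy can be read off the confusion matrix above: the diagonal holds the correct predictions. With 20 test samples per class, the result is barely above the 50% chance level.

```python
import numpy as np

conf_mat = np.array([[14, 6],
                     [12, 8]])     # values copied from the output above

# correct predictions (diagonal) divided by all predictions
accuracy = np.trace(conf_mat) / conf_mat.sum()
print(accuracy)                    # 0.55
```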

[Go back to table of contents](#toc)

## Attributes for KernelMatrix <a name="attr_km"></a>

Another cool feature of the `KernelMatrix` class is the ability to attach arbitrary user-defined attributes, for easy identification of a given kernel matrix. This is especially handy when one has to traverse a large collection of kernel matrices, and programmatic identification is necessary!

For example, you can identify the KM with the source of the dataset:

In [30]:
km.set_attr('source', 'random')

or anything else you wish, like properties:

In [31]:
km.set_attr('properties', ['sigma', 4])

You can easily retrieve all the properties with `.attributes()` which returns the internal dictionary:

In [32]:
km.attributes()

{'source': 'random', 'properties': ['sigma', 4]}

Or just the one you like via `.get_attr()`:

In [33]:
km.get_attr('source')

'random'

The utility of these will be more obvious when dealing with container classes like [`KernelBucket` and `KernelSet`](#kmcollections)

[Go back to table of contents](#toc)

## Container classes: KernelSet and KernelBucket <a name="kmcollections"></a>

When dealing with multiple kernel matrices, e.g. as part of multiple kernel learning (MKL), a number of validation and sanity checks need to be performed. These include ensuring that the kernel matrices (KMs) have compatible sizes, and that they are generated from the same sample. We refer to such a collection of KMs as a `KernelSet`. Moreover, accessing a subset of these KMs, e.g. filtered by some metric, is often necessary while trying to optimize algorithms like MKL. To serve as candidates for optimization, it is often necessary to *sample* and generate a large number of KMs, here referred to as a *bucket*. The ``KernelSet`` and ``KernelBucket`` classes make these tasks easy and extensible while keeping their rich annotations and structure (meta-data etc).

Let's take a look at their utility:

In [34]:
from kernelmethods import KernelSet, KernelBucket

# let's build 3 kernel matrices
rbf = KernelMatrix(GaussianKernel(sigma=10))
lin = KernelMatrix(LinearKernel())
poly = KernelMatrix(PolyKernel(degree=2))

Now that we have 3 KMs (that are not attached to any sample yet), we can collect them together with a simple `KernelSet` instantiation:

In [35]:
kset = KernelSet()
kset.append(lin)
kset.append(poly)
kset.append(rbf)
kset

KernelSet(3 kernels, None samples):
	KernelMatrix: linear
	KernelMatrix: polynomial(degree=2,b=0)
	KernelMatrix: gaussian(sigma=10.0) 

Alternatively, this can also be achieved with a single call with a list input:

In [36]:
kset = KernelSet([lin, poly, rbf])
kset

KernelSet(3 kernels, None samples):
	KernelMatrix: linear
	KernelMatrix: polynomial(degree=2,b=0)
	KernelMatrix: gaussian(sigma=10.0) 

And you can see that its string repr clearly conveys its internal structure. Once we have such a kernel set, we can easily attach a sample to all of its members at once with:

In [37]:
kset.attach_to(sample_data)
kset

KernelSet(3 kernels, 100 samples):
	KernelMatrix: linear (normed=True) on sample (100, 20)
	KernelMatrix: polynomial(degree=2,b=0) (normed=True) on sample (100, 20)
	KernelMatrix: gaussian(sigma=10.0) (normed=True) on sample (100, 20) 

You can see that the sample is attached to each and every kernel matrix inside, and the compatibility for being in the set is determined by the number of samples (in this case 100). So if we were to try adding an incompatible KM, it would result in an error. This error is specifically identified as `KMSetAdditionError`. For example:

In [38]:
sample_data_n60 = np.random.rand(60, 10)
poly2 = KernelMatrix(PolyKernel())
poly2.attach_to(sample_data_n60)
try:
    kset.append(poly2)
except Exception:
    traceback.print_exc()

Traceback (most recent call last):
  File "<ipython-input-38-03a7e559bba5>", line 5, in <module>
    kset.append(poly2)
  File "/Users/Reddy/dev/kernelmethods/kernelmethods/base.py", line 1025, in append
    ''.format(KM.num_samples, self.num_samples))
kernelmethods.config.KMSetAdditionError: Dimension of this KM 60 is incompatible with KMSet of 100! 
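The size check behind this kind of error can be sketched in a few lines. The class below is a hypothetical, heavily simplified stand-in operating on plain NumPy arrays; it is not the library's `KernelSet`, and it raises a plain `ValueError` rather than `KMSetAdditionError`.

```python
import numpy as np

class TinyKernelSet:
    """Hypothetical, simplified stand-in for a validated collection of kernel matrices."""

    def __init__(self):
        self._kms = []
        self.num_samples = None   # fixed by the first matrix added

    def append(self, km):
        km = np.asarray(km)
        if km.ndim != 2 or km.shape[0] != km.shape[1]:
            raise ValueError('kernel matrix must be square')
        if self.num_samples is None:
            self.num_samples = km.shape[0]
        elif km.shape[0] != self.num_samples:
            raise ValueError('size {} is incompatible with set of {}'
                             ''.format(km.shape[0], self.num_samples))
        self._kms.append(km)

    def __len__(self):
        return len(self._kms)

tiny = TinyKernelSet()
tiny.append(np.eye(100))
try:
    tiny.append(np.eye(60))       # wrong size: rejected
    rejected = False
except ValueError:
    rejected = True
print(len(tiny), rejected)        # 1 True
```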


Such management of a collection of KMs helps you gain confidence in their usage, instead of constantly repeating validation checks all over the place and still be unable to trust the implementation!

Once you have constructed a `KernelSet`, you can access a single element directly with `[]`, or iterate through its elements quite easily as with any Iterable in Python:

In [39]:
print(kset[1])
print(kset[0])
print('Iterating through ..')
for km in kset:
    print(km)

KernelMatrix: polynomial(degree=2,b=0) (normed=True) on sample (100, 20)
KernelMatrix: linear (normed=True) on sample (100, 20)
Iterating through ..
KernelMatrix: linear (normed=True) on sample (100, 20)
KernelMatrix: polynomial(degree=2,b=0) (normed=True) on sample (100, 20)
KernelMatrix: gaussian(sigma=10.0) (normed=True) on sample (100, 20)


Accessing a subset of the elements is quite easy with the `.take()` method. For example:

In [40]:
subset = kset.take([1,2], name='subset')
subset

subset(2 kernels, 100 samples):
	KernelMatrix: polynomial(degree=2,b=0) (normed=True) on sample (100, 20)
	KernelMatrix: gaussian(sigma=10.0) (normed=True) on sample (100, 20) 

You can easily find the number of KMs in the set with `.size` attribute and the common number of samples with `.num_samples` attribute.

Also, KernelSet lets you apply a single attribute to its collections, making it convenient for programmatic comparison or query later on:

In [41]:
kset.set_attr('source', 'magic')

### KernelBucket

`KernelBucket` is a child class of `KernelSet` that helps populate the set with a chosen range of parameter values for different kernel functions. For example:

In [42]:
kb = KernelBucket()
kb

KernelBucket(13 kernels, None samples):
	KernelMatrix: linear
	KernelMatrix: polynomial(degree=2,b=0)
	KernelMatrix: polynomial(degree=3,b=0)
	KernelMatrix: polynomial(degree=4,b=0)
	KernelMatrix: gaussian(sigma=0.03125)
	KernelMatrix: gaussian(sigma=0.125)
	KernelMatrix: gaussian(sigma=0.5)
	KernelMatrix: gaussian(sigma=2.0)
	KernelMatrix: gaussian(sigma=8.0)
	KernelMatrix: gaussian(sigma=32.0)
	KernelMatrix: laplacian(gamma=2)
	KernelMatrix: laplacian(gamma=8)
	KernelMatrix: laplacian(gamma=32) 

By default, `KernelBucket` adds a linear kernel plus three parameterized families of kernel functions (polynomial, Gaussian and Laplacian), each with a range of values for their core parameter. This bucket has the same behaviour and properties as `KernelSet` (iteration, attributes, append, access etc). You can also choose to supply your own value ranges, e.g.:

In [43]:
kb = KernelBucket(rbf_sigma_values=[100, 400, 3435], 
                  laplacian_gamma_values=None,
                  poly_degree_values=None,
                  name='high_sigma_bucket')
kb

high_sigma_bucket(4 kernels, None samples):
	KernelMatrix: linear
	KernelMatrix: gaussian(sigma=100.0)
	KernelMatrix: gaussian(sigma=400.0)
	KernelMatrix: gaussian(sigma=3435.0) 
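The default Gaussian and Laplacian values printed earlier happen to be odd powers of two. If you want a similar logarithmic grid of your own, e.g. to pass as `rbf_sigma_values` or `laplacian_gamma_values`, it can be generated with NumPy. This is only one way to build such a list; how the library generates its defaults internally is not shown here.

```python
import numpy as np

# powers of two with odd exponents from -5 to 5
sigma_values = 2.0 ** np.arange(-5, 6, 2)
print(sigma_values.tolist())   # [0.03125, 0.125, 0.5, 2.0, 8.0, 32.0]

# powers of two with odd exponents from 1 to 5
gamma_values = 2.0 ** np.arange(1, 6, 2)
print(gamma_values.tolist())   # [2.0, 8.0, 32.0]
```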

This library also provides a convenient `make_kernel_bucket` function to populate a bucket with either `exhaustive` or `light` ranges of parameter value tuning:

In [44]:
from kernelmethods.sampling import make_kernel_bucket
kbl = make_kernel_bucket(strategy='light')
kbl

KBucketLight(7 kernels, None samples):
	KernelMatrix: linear
	KernelMatrix: polynomial(degree=2,b=0)
	KernelMatrix: polynomial(degree=3,b=0)
	KernelMatrix: gaussian(sigma=0.125)
	KernelMatrix: gaussian(sigma=0.5)
	KernelMatrix: gaussian(sigma=2.0)
	KernelMatrix: laplacian(gamma=2) 

where you can see a smaller, lighter set of parameter values.

[Go back to table of contents](#toc)

## Drop-in Estimator Classes via `KernelMachine` <a name="kernelmachine"></a>

Besides letting you use the aforementioned `KernelMatrix` in an SVM or another kernel machine, this library makes life even easier by providing drop-in Estimator classes directly. One such class is `KernelMachine`, which can be dropped in place of `sklearn.svm.SVC` anywhere. For example:


In [45]:
from kernelmethods import KernelMachine
km = KernelMachine(k_func=GaussianKernel())
km.fit(X=sample_data, y=labels)
km


KernelMachine(k_func=gaussian(sigma=2.0), learner_id='SVR')

And making predictions on new samples is as easy as:

In [46]:
predicted_y = km.predict(sample_data)
print(predicted_y)

[0.9 0.9 0.1 0.1 0.1 0.1 0.1 0.9 0.9 0.9 0.1 0.1 0.1 0.9 0.9 0.9 0.1 0.9
 0.9 0.1 0.9 0.1 0.1 0.9 0.9 0.1 0.1 0.1 0.1 0.1 0.9 0.1 0.9 0.1 0.9 0.1
 0.9 0.1 0.9 0.1 0.9 0.1 0.1 0.9 0.9 0.9 0.9 0.1 0.1 0.1 0.9 0.9 0.1 0.1
 0.1 0.9 0.9 0.9 0.9 0.1 0.1 0.9 0.1 0.9 0.1 0.9 0.9 0.9 0.1 0.9 0.1 0.9
 0.1 0.1 0.9 0.9 0.1 0.9 0.9 0.1 0.1 0.1 0.1 0.9 0.1 0.1 0.1 0.1 0.9 0.1
 0.1 0.9 0.9 0.9 0.9 0.1 0.9 0.1 0.1 0.9]


And if you're not sure which kernel function is optimal for your dataset, you can employ `OptimalKernelSVR`:


In [47]:
from kernelmethods import OptimalKernelSVR
opt_km = OptimalKernelSVR('exhaustive')
opt_km.fit(X=sample_data, y=labels)
print(opt_km)



OptimalKernelSVR(k_bucket=KBucketExhaustive(13 kernels, 100 samples):
	KernelMatrix: linear (normed=True) on sample (100, 20)
	KernelMatrix: polynomial(degree=2,b=0) (normed=True) on sample (100, 20)
	KernelMatrix: polynomial(degree=3,b=0) (normed=True) on sample (100, 20)
	KernelMatrix: polynomial(degree=4,b=0) (normed=True) on sample (100, 20)
	KernelMatrix: gaussian(sigma=0.03125) (n...
	KernelMatrix: gaussian(sigma=2.0) (normed=True) on sample (100, 20)
	KernelMatrix: gaussian(sigma=8.0) (normed=True) on sample (100, 20)
	KernelMatrix: gaussian(sigma=32.0) (normed=True) on sample (100, 20)
	KernelMatrix: laplacian(gamma=2) (normed=True) on sample (100, 20)
	KernelMatrix: laplacian(gamma=8) (normed=True) on sample (100, 20)
	KernelMatrix: laplacian(gamma=32) (normed=True) on sample (100, 20) )


From the result of `print()` you can clearly see that `OptimalKernelSVR` is indeed built on `KBucketExhaustive` which ranked 13 kernels (and detailed info such as parameters and values) to choose the best for this particular sample.

You can also easily query which kernel performed best for this sample via the `.opt_kernel` attribute:

In [48]:
opt_km.opt_kernel

KernelMatrix: linear (normed=True) on sample (100, 20)

which seems to be a `linear` kernel function! Occam's razor, ha!

Now we can easily make predictions on a sample (here, the same sample it was fit on), and see how it performs:

In [49]:
predicted_y = opt_km.predict(sample_data)
print(predicted_y)

[ 0.93  0.96 -0.06 -0.07 -0.07  0.04  0.12  0.62  0.77  1.1   0.27  0.21
  0.1   0.12  0.91  1.24  0.14  0.86  0.08 -0.29  0.4   0.29  0.21  0.8
  0.9  -0.12  0.1   0.23  0.1   0.1   0.85  0.07  0.73  0.12  0.68 -0.27
  0.65  0.54  1.1   0.1   0.81  0.7  -0.15  0.9   1.55  1.04  0.9  -0.01
  0.37  0.1   1.22  0.83 -0.36  0.06  0.26  1.13  1.    0.68  0.9  -0.1
  0.16  0.76  0.18  0.98  0.04  0.21  0.95  0.75  0.3   0.9   0.05  1.1
  0.24  0.03  0.9   0.94  0.21  0.41  0.94  0.18 -0.1   0.29 -0.55  0.9
  0.1   0.    0.02  0.45  1.   -0.01  0.1   1.02  0.91  0.75  0.75  0.28
  0.67  0.13  0.35  0.9 ]


[Go back to table of contents](#toc)

## Advanced applications (MKL etc) <a name="advanced"></a>

Now that you have a reasonable grasp of the various functions of this library and how to use them, I'd like to present one quick example of how to use this machinery for an advanced application like Multiple Kernel Learning (MKL). Briefly, the technique of MKL boils down to a weighted linear combination of kernel matrices. Most of the variations among MKL techniques are due to the way these weights are computed (optimization) and pruned (sparsity), in trying to optimize the performance of a kernel machine (often an SVM or SVR) for a given sample.

If you have a kernel set with `n` KMs whose weights are in `weight_vec`:

In [50]:
kb_light = make_kernel_bucket(strategy='light')
sample_data, labels = make_classification()
kb_light.attach_to(sample_data)
print(kb_light)

weight_vec = np.random.rand(kb_light.size) # toy example
weight_vec /= weight_vec.sum() # normalizing them!
print(weight_vec)

KBucketLight(7 kernels, 100 samples):
	KernelMatrix: linear (normed=True) on sample (100, 20)
	KernelMatrix: polynomial(degree=2,b=0) (normed=True) on sample (100, 20)
	KernelMatrix: polynomial(degree=3,b=0) (normed=True) on sample (100, 20)
	KernelMatrix: gaussian(sigma=0.125) (normed=True) on sample (100, 20)
	KernelMatrix: gaussian(sigma=0.5) (normed=True) on sample (100, 20)
	KernelMatrix: gaussian(sigma=2.0) (normed=True) on sample (100, 20)
	KernelMatrix: laplacian(gamma=2) (normed=True) on sample (100, 20) 
[0.09 0.03 0.17 0.25 0.19 0.17 0.12]


Then the corresponding composite kernel matrix can be computed easily with either the `linear_combination` function:

In [51]:
from kernelmethods.operations import linear_combination
lin_comb = linear_combination(kb_light, weight_vec)
lin_comb

array([[ 1.  ,  0.04,  0.07, ..., -0.03, -0.02,  0.07],
       [ 0.04,  1.  ,  0.01, ..., -0.02,  0.05,  0.01],
       [ 0.07,  0.01,  1.  , ..., -0.01, -0.03,  0.02],
       ...,
       [-0.03, -0.02, -0.01, ...,  1.  , -0.  , -0.  ],
       [-0.02,  0.05, -0.03, ..., -0.  ,  1.  ,  0.02],
       [ 0.07,  0.01,  0.02, ..., -0.  ,  0.02,  1.  ]])

or a special child class of `CompositeKernel` called `WeightedAverageKernel`:

In [52]:
from kernelmethods.base import WeightedAverageKernel
wak = WeightedAverageKernel(kb_light, weight_vec)
wak.fit()
wak

WeightedAverageKernel-->KBucketLight(7 kernels, 100 samples):
	KernelMatrix: linear (normed=True) on sample (100, 20)
	KernelMatrix: polynomial(degree=2,b=0) (normed=True) on sample (100, 20)
	KernelMatrix: polynomial(degree=3,b=0) (normed=True) on sample (100, 20)
	KernelMatrix: gaussian(sigma=0.125) (normed=True) on sample (100, 20)
	KernelMatrix: gaussian(sigma=0.5) (normed=True) on sample (100, 20)
	KernelMatrix: gaussian(sigma=2.0) (normed=True) on sample (100, 20)
	KernelMatrix: laplacian(gamma=2) (normed=True) on sample (100, 20) 

from which the KM can be accessed with `wak.composite_KM` or `wak.full`:

In [53]:
wak.composite_KM

array([[ 1.  ,  0.04,  0.07, ..., -0.03, -0.02,  0.07],
       [ 0.04,  1.  ,  0.01, ..., -0.02,  0.05,  0.01],
       [ 0.07,  0.01,  1.  , ..., -0.01, -0.03,  0.02],
       ...,
       [-0.03, -0.02, -0.01, ...,  1.  , -0.  , -0.  ],
       [-0.02,  0.05, -0.03, ..., -0.  ,  1.  ,  0.02],
       [ 0.07,  0.01,  0.02, ..., -0.  ,  0.02,  1.  ]])

You can verify that they both return the same kernel matrix:

In [54]:
np.isclose(wak.composite_KM, lin_comb).all()

True
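Conceptually, both routes compute the weighted sum of the individual kernel matrices. The NumPy-only sketch below illustrates that idea on three toy matrices; the helper `normalize` and the toy kernels are illustrative assumptions, not the library's code.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.random((100, 20))

def normalize(km):
    """Cosine-normalize a kernel matrix so that its diagonal is 1."""
    d = np.sqrt(np.diag(km))
    return km / np.outer(d, d)

# three toy kernel matrices on the same sample: polynomial degrees 1, 2 and 3
linear = sample @ sample.T
kms = [normalize(linear), normalize(linear ** 2), normalize(linear ** 3)]

weights = rng.random(len(kms))
weights /= weights.sum()                      # weights sum to 1

# composite = sum_i w_i * K_i
composite = sum(w * km for w, km in zip(weights, kms))

print(composite.shape)                        # (100, 100)
print(np.allclose(np.diag(composite), 1.0))   # True: unit diagonal is preserved
```

Because the weights are non-negative, the composite of PSD matrices is itself PSD, which is what makes it usable as a precomputed kernel.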

Composite kernels can be produced in many other ways e.g. using `AverageKernel` or `SumKernel` etc. 

Once you have a desired composite kernel, an MKL machine can easily be constructed by passing it as a precomputed kernel to any other toolbox, e.g. an Estimator in sklearn:

In [55]:
mkl = SVC(kernel='precomputed')
mkl.fit(X=wak.composite_KM, y=labels)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='precomputed', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)

That's it :).

Weights for different kernel matrices can also be computed via their *kernel target alignment* to the target values `y`. The `operations` and `ranking` submodules provide different metrics for this purpose such as `centered_alignment` and performance in cross-validation (CV) etc.
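As a rough illustration of the alignment idea, the sketch below computes the centered alignment between a kernel matrix and the "ideal" kernel `y y^T`, i.e. the Frobenius inner product of the two centered matrices divided by the product of their Frobenius norms, and turns the scores into weights. The helper names and the toy data are hypothetical, and the signature of the library's `centered_alignment` may differ.

```python
import numpy as np

def center_km(km):
    """Center a kernel matrix: H K H with H = I - (1/n) * ones."""
    n = km.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ km @ h

def alignment_to_target(km, y):
    """Hypothetical helper: centered alignment between km and the ideal kernel y y^T."""
    kc = center_km(km)
    tc = center_km(np.outer(y, y))
    return (kc * tc).sum() / (np.linalg.norm(kc) * np.linalg.norm(tc))

rng = np.random.default_rng(0)
y = np.repeat([-1.0, 1.0], 50)
informative = y[:, None] + 0.1 * rng.standard_normal((100, 3))   # features track the labels
noise = rng.standard_normal((100, 3))                             # features ignore the labels

# score a linear kernel matrix built from each feature set
scores = np.array([alignment_to_target(x @ x.T, y) for x in (informative, noise)])

# one simple weighting scheme: clip negatives, then normalize to sum to 1
weights = np.clip(scores, 0, None)
weights /= weights.sum()
print(scores, weights)
```

The kernel built from informative features receives a much higher alignment, and hence a much larger weight, than the one built from noise.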

Stay tuned for more tutorials, examples and comprehensive docs.

[Go back to table of contents](#toc)