# Overview

Kernel methods owe their name to the use of kernel functions, which enable them to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space. Instead, they only compute the inner products between the images of all pairs of data points in the feature space. Working in the higher-dimensional space without explicitly computing the projection is referred to as the **kernel trick**.

Kernel functions also provide a way to measure the similarity between vectors (data points). For example, the value of a kernel function $ \kappa(\textbf{x}_i, \textbf{x}_j) $ is relatively high when the two data points are similar and relatively low when they are dissimilar.
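As a rough illustration of this similarity behaviour, the sketch below uses a Gaussian (RBF) kernel, which is an assumption made here for the example and is not a kernel introduced in this notebook. The function name `rbf_kernel`, the value of `gamma` and the sample points are all hypothetical choices.

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.1):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2).

    gamma is an illustrative value, not a recommended setting.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

x = np.array([1.0, 2.0])
near = np.array([1.1, 2.1])  # a point close to x
far = np.array([8.0, 9.0])   # a point far from x

sim_near = rbf_kernel(x, near)  # close to 1 for similar points
sim_far = rbf_kernel(x, far)    # close to 0 for dissimilar points
print(sim_near, sim_far)
```

With this particular kernel the value is exactly 1 when both arguments are identical and decays towards 0 as the points move apart.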

### Instance based learners
Kernel methods belong to a subset of machine learning algorithms termed *instance based learners*. *Parametric methods* learn a set of parameters $ \theta $ that map inputs to some output space, $ f_\theta(x): x \rightarrow y $. Instance based learning instead learns a weight $ w_i $ for each training example $ (\textbf{x}_i, y_i) $ and recalls these weights at test/prediction time, where the similarity between each training input $\textbf{x}_i$ and the test/prediction point $\textbf{x'}$ is used to predict the output for $\textbf{x'}$. The prediction $ \hat y $ is calculated from four inputs:

1. Training input data (indexed by i) 
2. Training outputs (indexed by i) 
3. Training weights (indexed by i) 
4. Test/prediction vector 

A binary classifier is a concrete example of an *instance based learner*. Its prediction can be calculated using the following: 

$$ \hat y = \text{sgn} \left( \sum^n_{i=1} w_i y_i k(\textbf{x}_i, \textbf{x'}) \right) $$

Where parameters are defined by: 

$ k(\textbf{x}_i, \textbf{x'}) $ is the kernel function, which measures the similarity between training data point $i$ and the test/prediction point. 

$ y_i $ is the class label of training example $i$, either $-1$ or $+1$, so that the sign of the sum selects a class. 

$ w_i $ is the learnt weight for training example $i$ (a scalar, one per training example). 

$ \text{sgn} $ is the sign function, which determines the class: if its argument is > 0 the positive class is predicted, otherwise the negative class.

$ \hat y $ is the predicted class
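The prediction rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a trained model: the weights are set by hand instead of being learnt, labels are encoded as -1 and +1 so that the sign of the sum picks a class, and a plain dot product stands in for the kernel. The names `linear_kernel` and `predict` and the toy data are hypothetical.

```python
import numpy as np

def linear_kernel(x, z):
    """Simplest possible kernel: the ordinary dot product (an assumption for this sketch)."""
    return float(np.dot(x, z))

def predict(X_train, y_train, weights, x_new, kernel):
    """Return +1 if the weighted, label-signed similarity sum is > 0, else -1."""
    total = sum(w * y * kernel(x, x_new)
                for x, y, w in zip(X_train, y_train, weights))
    return 1 if total > 0 else -1

# Toy training set: two points per class, labels in {-1, +1}
X_train = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y_train = np.array([1, 1, -1, -1])
weights = np.ones(len(X_train))  # hand-set weights, not learnt

pred_pos = predict(X_train, y_train, weights, np.array([1.5, 1.0]), linear_kernel)
pred_neg = predict(X_train, y_train, weights, np.array([-1.0, -1.0]), linear_kernel)
print(pred_pos, pred_neg)
```

The function takes exactly the four inputs listed earlier (training inputs, training outputs, training weights and the test/prediction vector), plus the kernel used to compare them.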


In [2]:
# Import needed libraries
import numpy as np

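Squared kernel 

$$ \kappa(\textbf{x}_i, \textbf{x}_j) = \langle \phi(\textbf{x}_i), \phi(\textbf{x}_j) \rangle = (\textbf{x}_i^T \textbf{x}_j)^2 $$

For a two-dimensional input $ \textbf{x} = [x_1 \; x_2] $, one feature map that satisfies this is

$$ \phi(\textbf{x}) = [x_1^2, \; \sqrt{2} x_1 x_2, \; x_2^2] $$

The inner product inside the kernel is the ordinary dot product. For a single vector with itself, for example, $ [x_1 \; x_2] [x_1 \; x_2]^T = x_1^2 + x_2^2 $.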

In [9]:
# Create two example vectors
b = np.array([3,5])
a = np.array([8,9])

# Squared kernel: square of the dot product, computed in the original 2-D space
print(np.dot(a,b) ** 2)

4761
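To check the kernel trick on these two vectors, the sketch below compares the squared dot product with an inner product taken in an explicit feature space. The helper `phi` is a hypothetical name for the standard degree-2 feature map of a 2-D input, $ [x_1^2, \sqrt{2} x_1 x_2, x_2^2] $, which is one possible choice of feature map for this kernel.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input: [x1^2, sqrt(2)*x1*x2, x2^2]."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

a = np.array([8, 9])
b = np.array([3, 5])

explicit = np.dot(phi(a), phi(b))  # inner product in the 3-D feature space
implicit = np.dot(a, b) ** 2       # kernel evaluated in the original 2-D space

# Both routes should agree up to floating-point rounding
print(np.isclose(explicit, implicit))
```

The kernel route never builds the 3-D vectors, which is the saving the kernel trick refers to. For higher input dimensions or higher polynomial degrees the explicit feature space grows quickly, while the kernel evaluation stays a single dot product.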
