### Intro

* Utilities for computing pairwise distances or affinities between sets of samples.
* Distances: f(a,b)<f(a,c) means a,b are more similar than a,c.
* Kernels: s(a,b)>s(a,c) means a,b are more similar than a,c.

[metrics.pairwise](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics.pairwise)
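As a minimal sketch of the two conventions above, the following compares a distance and a kernel on three toy points. The points `a`, `b`, `c` and the `gamma` value are made up for illustration; `euclidean_distances` and `rbf_kernel` are just one distance and one kernel picked from the module.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances, rbf_kernel

# Toy points (illustrative only): a and b are close, c is far from both.
a, b, c = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
X = np.array([a, b, c])

D = euclidean_distances(X)    # distance: smaller value = more similar
S = rbf_kernel(X, gamma=0.5)  # kernel: larger value = more similar

print(D[0, 1] < D[0, 2])  # a is closer to b than to c
print(S[0, 1] > S[0, 2])  # a is more similar to b than to c
```

Both comparisons express the same ordering; only the direction of the inequality differs.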

### [Cosine Similarity](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html#sklearn.metrics.pairwise.cosine_similarity)

* L2-normalized dot product of two vectors
* Popular choice for finding document similarities as tf-idf vectors
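A small sketch of the tf-idf use case, assuming a made-up three-document corpus. Because `TfidfVectorizer` L2-normalizes each row by default, the cosine similarity here should match the plain dot product from `linear_kernel`.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical corpus: the first two documents share most of their words,
# the third shares none with the first.
docs = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stock prices fell sharply",
]

tfidf = TfidfVectorizer().fit_transform(docs)  # sparse, rows have unit L2 norm
sim = cosine_similarity(tfidf)                 # (3, 3) similarity matrix
lin = linear_kernel(tfidf)                     # plain dot products

print(np.round(sim, 3))
print(np.allclose(sim, lin))  # identical on already-normalized rows
```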

### [Linear Kernel](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.linear_kernel.html#sklearn.metrics.pairwise.linear_kernel)

* Special case of a polynomial kernel with degree=1, coef0=0.
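A quick check of that special case, on an arbitrary toy matrix. Note that `polynomial_kernel` scales the dot product by `gamma`, which defaults to `1 / n_features`, so this sketch sets `gamma=1.0` explicitly to recover the plain linear kernel.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel

# Arbitrary toy data, for illustration only.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

lin = linear_kernel(X)
# gamma must be 1 here; the default (1 / n_features) would rescale the result.
poly = polynomial_kernel(X, degree=1, gamma=1.0, coef0=0.0)

print(np.allclose(lin, poly))
```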

### [Polynomial Kernel](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.polynomial_kernel.html#sklearn.metrics.pairwise.polynomial_kernel)

* Computes the similarity between two vectors.
* Considers similarity not only within each dimension but also across dimensions, which lets a model account for feature interactions.
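The kernel value is `(gamma * <x, y> + coef0) ** degree`; expanding the power for `degree >= 2` produces cross terms between different dimensions, which is where the interaction effect comes from. The sketch below recomputes the kernel by hand on toy data, with parameter values chosen arbitrarily.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

# Toy data and arbitrary parameter values, for illustration only.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
gamma, coef0, degree = 0.5, 1.0, 2

K = polynomial_kernel(X, degree=degree, gamma=gamma, coef0=coef0)

# Same quantity computed directly from the formula.
manual = (gamma * (X @ X.T) + coef0) ** degree

print(np.allclose(K, manual))
```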



### [Sigmoid Kernel](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.sigmoid_kernel.html#sklearn.metrics.pairwise.sigmoid_kernel)

* Also known as the hyperbolic tangent or Multilayer Perceptron kernel.
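The "hyperbolic tangent" name comes from the formula `tanh(gamma * <x, y> + coef0)`. A minimal sketch that reproduces it by hand, using toy data and arbitrary `gamma` and `coef0` values:

```python
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

# Toy data and arbitrary parameter values, for illustration only.
X = np.array([[1.0, 2.0], [3.0, 4.0], [-1.0, 0.5]])
gamma, coef0 = 0.1, 1.0

K = sigmoid_kernel(X, gamma=gamma, coef0=coef0)

# Same quantity computed directly from the formula.
manual = np.tanh(gamma * (X @ X.T) + coef0)

print(np.allclose(K, manual))
```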

### [RBF Kernel](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.rbf_kernel.html#sklearn.metrics.pairwise.rbf_kernel)

### [Laplacian Kernel](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.laplacian_kernel.html#sklearn.metrics.pairwise.laplacian_kernel)

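* Variant of the RBF kernel that uses the L1 (Manhattan) distance between vectors instead of the squared Euclidean distance.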
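To make the relationship concrete, this sketch computes both kernels on toy data and rebuilds each from its distance: the RBF kernel is `exp(-gamma * ||x - y||^2)` and the Laplacian kernel is `exp(-gamma * ||x - y||_1)`. The data and `gamma` value are arbitrary.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    euclidean_distances,
    laplacian_kernel,
    manhattan_distances,
    rbf_kernel,
)

# Toy data and an arbitrary gamma, for illustration only.
X = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]])
gamma = 0.3

rbf = rbf_kernel(X, gamma=gamma)
lap = laplacian_kernel(X, gamma=gamma)

# Rebuild each kernel from the corresponding distance matrix.
rbf_manual = np.exp(-gamma * euclidean_distances(X, squared=True))
lap_manual = np.exp(-gamma * manhattan_distances(X))

print(np.allclose(rbf, rbf_manual), np.allclose(lap, lap_manual))
```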

### [Chi-Squared Kernel](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.chi2_kernel.html#sklearn.metrics.pairwise.chi2_kernel)

* Very popular for training nonlinear SVMs in computer vision applications.
* Also used on histograms (bags) of words.
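For reference, the kernel is `exp(-gamma * sum((x - y)^2 / (x + y)))` and expects non-negative inputs such as normalized histograms. The sketch below recomputes it by hand on made-up, strictly positive histograms, so that no `x + y` term is zero.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel

# Made-up, strictly positive "histograms" (each row sums to 1).
X = np.array([[0.2, 0.8], [0.7, 0.3], [0.5, 0.5]])
gamma = 0.5

K = chi2_kernel(X, gamma=gamma)

# Pairwise differences and sums via broadcasting: shape (n, n, n_features).
diff = X[:, None, :] - X[None, :, :]
total = X[:, None, :] + X[None, :, :]
manual = np.exp(-gamma * (diff ** 2 / total).sum(axis=-1))

print(np.allclose(K, manual))
```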

```python
# kernel computed & fed to SVC

from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel
X = [[0, 1], [1, 0], [.2, .8], [.7, .3]]
y = [0, 1, 0, 1]
K = chi2_kernel(X, gamma=.5)
print(K)

svm = SVC(kernel='precomputed').fit(K, y)
svm.predict(K)
```

```
[[ 1.          0.36787944  0.89483932  0.58364548]
 [ 0.36787944  1.          0.51341712  0.83822343]
 [ 0.89483932  0.51341712  1.          0.7768366 ]
 [ 0.58364548  0.83822343  0.7768366   1.        ]]

array([0, 1, 0, 1])
```
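The cell above predicts on the training kernel itself. With `kernel='precomputed'`, predicting on unseen samples requires the kernel between the new samples and the training samples, of shape `(n_new, n_train)`. A minimal sketch, where `X_new` is a hypothetical pair of test points not present in the original example:

```python
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

X_train = [[0, 1], [1, 0], [.2, .8], [.7, .3]]
y_train = [0, 1, 0, 1]
X_new = [[.1, .9], [.9, .1]]  # hypothetical unseen samples

# Square kernel between training samples, used for fitting.
K_train = chi2_kernel(X_train, gamma=.5)
clf = SVC(kernel='precomputed').fit(K_train, y_train)

# Rectangular kernel: rows are new samples, columns are training samples.
K_new = chi2_kernel(X_new, X_train, gamma=.5)
pred = clf.predict(K_new)
print(K_new.shape, pred)
```

The same `gamma` has to be used for both kernel computations; otherwise the values passed to `predict` are not comparable with those the model was fitted on.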