From 4b55255f213e5ff262b63684136f1ab3b43fd97a Mon Sep 17 00:00:00 2001
From: Heiko Strathmann
Date: Tue, 26 Apr 2016 20:47:37 +0100
Subject: [PATCH] initial sketch for kernel learning example

---
 .../linear_time_mmd_kernel_selection.rst      | 57 +++++++++++++++++++
 1 file changed, 57 insertions(+)
 create mode 100644 doc/cookbook/source/examples/statistical_testing/linear_time_mmd_kernel_selection.rst

diff --git a/doc/cookbook/source/examples/statistical_testing/linear_time_mmd_kernel_selection.rst b/doc/cookbook/source/examples/statistical_testing/linear_time_mmd_kernel_selection.rst
new file mode 100644
index 00000000000..436b40c2cf4
--- /dev/null
+++ b/doc/cookbook/source/examples/statistical_testing/linear_time_mmd_kernel_selection.rst
@@ -0,0 +1,57 @@
+==================================
+Linear Time MMD (kernel selection)
+==================================
+
+For the linear time MMD, it is possible to learn the kernel that maximizes the test power.
+That is, for a fixed type I error, say :math:`\alpha=0.05`, the type II error is minimized.
+Minimising the type II error here is equivalent to choosing the kernel
+
+.. math::
+
+    k^* = \argmax_k \frac{\text{MMD}_l}{\sigma_l},
+
+where :math:`\text{MMD}_l` is the linear time MMD estimator and :math:`\sigma_l` is its standard deviation, both of which can be estimated in an on-line fashion.
+
+This allows us to select a single kernel from a number of provided baseline kernels.
+Furthermore, it is possible to learn the coefficients of a convex combination of :math:`d` baseline kernels :math:`\sum_{i=1}^d \lambda_i k_i` by solving a convex program of the form
+
+.. math::
+
+    \argmin_\lambda \lambda^\top Q \lambda \qquad \text{subject to } \lambda^\top h=1, \quad \lambda_i\geq 0,
+
+where :math:`h` is the vector of linear time MMD statistics of the baseline kernels and :math:`Q` is its empirical covariance matrix.
+
+See :cite:`gretton2012optimal` for details.
+
+-------
+Example
+-------
+
+Imagine we have (streamed) samples from :math:`p` and :math:`q`.
+Note that the data used to learn the kernel must be *different* from the data used for the test in order to ensure correct calibration; see :cite:`gretton2012optimal` for details.
+
+We create an instance of :sgclass:`CLinearTimeMMD`, passing it the training data.
+
+.. sgexample:: linear_time_mmd_kernel_selection.sg:create_instance
+
+We then specify the desired baseline kernels to consider.
+
+.. sgexample:: linear_time_mmd_kernel_selection.sg:add_kernels
+
+The single kernel that maximizes the test power can be learned and extracted using
+
+.. sgexample:: linear_time_mmd_kernel_selection.sg:select_kernel_single
+
+Note that in order to extract particular kernel parameters, we need to cast the kernel to its actual type. Similarly, a convex combination of kernels, in the form of :sgclass:`CCombinedKernel`, that maximizes the test power can be learned and extracted using
+
+.. sgexample:: linear_time_mmd_kernel_selection.sg:select_kernel_combined
+
+We can then perform the test using the last learnt kernel (note again that this must be done on different data).
+
+.. sgexample:: linear_time_mmd_kernel_selection.sg:perform_test
+
+----------
+References
+----------
+.. bibliography:: ../../references.bib
+    :filter: docname in docnames
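
The single-kernel criterion :math:`\text{MMD}_l/\sigma_l` described in the page above is easy to sketch outside of Shogun. The following standalone NumPy snippet is an illustration only and is not part of the patched cookbook page or its .sg listing: the toy data, the Gaussian kernel parameterization, the candidate widths, and the helper name ``linear_time_mmd`` are assumptions made for this sketch. It estimates :math:`\text{MMD}_l` and :math:`\sigma_l` from disjoint sample pairs, as in :cite:`gretton2012optimal`, and picks the candidate kernel that maximizes their ratio.

.. code-block:: python

    import numpy as np

    def linear_time_mmd(x, y, log2_width):
        """Linear-time MMD estimate and the standard deviation of the estimator
        for a Gaussian kernel exp(-||a-b||^2 / 2^log2_width), using disjoint
        sample pairs (a simplified parameterization, chosen for this sketch)."""
        gamma = 1.0 / (2.0 ** log2_width)
        k = lambda a, b: np.exp(-gamma * np.sum((a - b) ** 2, axis=1))
        m = (min(len(x), len(y)) // 2) * 2          # use an even number of samples
        x1, x2, y1, y2 = x[0:m:2], x[1:m:2], y[0:m:2], y[1:m:2]
        # each h_i is an unbiased estimate of MMD^2 from one disjoint block
        h = k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1)
        return h.mean(), h.std(ddof=1) / np.sqrt(len(h))

    # toy "streamed" training samples from p and q (assumed data, illustration only)
    rng = np.random.default_rng(0)
    x_train = rng.normal(0.0, 1.0, size=(2000, 1))
    y_train = rng.normal(0.5, 1.0, size=(2000, 1))

    # pick the Gaussian kernel width that maximizes the power proxy MMD_l / sigma_l
    candidates = list(range(-3, 4))
    ratios = [np.divide(*linear_time_mmd(x_train, y_train, w)) for w in candidates]
    best = candidates[int(np.argmax(ratios))]
    print("selected Gaussian kernel width: 2^%d" % best)

This sketch only mirrors the single-kernel selection step of the page (the baseline kernels and the ratio criterion); learning a convex combination of kernels additionally requires solving the quadratic program given above, and the actual test must afterwards be run on data held out from this selection.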