Merge branch 'feature/bigtest' into develop
karlnapf committed Feb 19, 2017
2 parents 813575d + 2b09a5b commit 637743c
Showing 181 changed files with 15,402 additions and 11,041 deletions.
@@ -0,0 +1,80 @@
===============
Linear Time MMD
===============

The linear time MMD implements a nonparametric statistical hypothesis test to reject the null hypothesis that two distributions :math:`p` and :math:`q`, each only observed via :math:`n` samples, are the same, i.e. :math:`H_0:p=q`.

The (unbiased) statistic is given by

.. math::

    \frac{2}{n}\sum_{i=1}^{n/2} k(x_{2i},x_{2i+1}) + k(y_{2i},y_{2i+1}) - k(x_{2i},y_{2i+1}) - k(x_{2i+1},y_{2i}),

where :math:`x_i\sim p` and :math:`y_i\sim q`.
See :cite:`gretton2012kernel` for a detailed introduction.
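For intuition, the estimator can be sketched in plain NumPy. This is an illustrative re-implementation of the formula above, not Shogun's API; the Gaussian kernel and its bandwidth are arbitrary choices for the sketch.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((a - b) ** 2, axis=-1) / (2 * sigma ** 2))

def linear_time_mmd(x, y, kernel=gaussian_kernel):
    """Unbiased linear-time MMD^2 estimate from non-overlapping sample pairs.

    x, y: arrays of shape (n, d) with n even; x drawn from p, y from q.
    Each h-term touches each sample exactly once, hence linear time.
    """
    n = x.shape[0]
    assert n % 2 == 0 and n == y.shape[0]
    x1, x2 = x[0::2], x[1::2]   # the (x_{2i}, x_{2i+1}) pairs
    y1, y2 = y[0::2], y[1::2]
    h = (kernel(x1, x2) + kernel(y1, y2)
         - kernel(x1, y2) - kernel(x2, y1))
    return np.mean(h)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(1000, 1))  # samples from p
y = rng.normal(0.5, 1.0, size=(1000, 1))  # samples from q
stat = linear_time_mmd(x, y)
```

The statistic is clearly positive when the distributions differ, and fluctuates around zero when they coincide.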

-------
Example
-------

Imagine we have samples from :math:`p` and :math:`q`.
As the linear time MMD is a streaming statistic, we need to pass it :sgclass:`CStreamingFeatures`.
Here, we use synthetic data generators, but it is possible to construct :sgclass:`CStreamingFeatures` from (large) files.
We create an instance of :sgclass:`CLinearTimeMMD`, passing it data and the kernel to use,

.. sgexample:: linear_time_mmd.sg:create_instance

An important parameter for controlling the efficiency of the linear time MMD is the block size, i.e. the number of samples that is processed at once. As a guideline, set it as large as memory allows.
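The effect of blocking can be sketched as follows. This is hypothetical NumPy code, not Shogun's streaming interface: each iteration pulls one block from each stream, so memory usage is bounded by the block size while the estimate still uses all samples.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2, axis=-1) / (2 * sigma ** 2))

def linear_time_mmd_streaming(stream_p, stream_q, num_samples, block_size):
    """Accumulate the linear-time MMD block by block from two streams.

    stream_p / stream_q are callables returning `block_size` fresh samples;
    only one block per stream is held in memory at any time.
    """
    total, count = 0.0, 0
    for _ in range(num_samples // block_size):
        x = stream_p(block_size)  # block_size must be even
        y = stream_q(block_size)
        x1, x2, y1, y2 = x[0::2], x[1::2], y[0::2], y[1::2]
        h = (gaussian_kernel(x1, x2) + gaussian_kernel(y1, y2)
             - gaussian_kernel(x1, y2) - gaussian_kernel(x2, y1))
        total += h.sum()
        count += h.size
    return total / count

rng = np.random.default_rng(1)
stat = linear_time_mmd_streaming(
    lambda m: rng.normal(0.0, 1.0, size=(m, 1)),
    lambda m: rng.normal(1.0, 1.0, size=(m, 1)),
    num_samples=2000, block_size=200)
```

A larger block size amortises per-block overhead; the estimate itself is unchanged.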

.. sgexample:: linear_time_mmd.sg:set_burst

Computing the statistic is done as

.. sgexample:: linear_time_mmd.sg:estimate_mmd

We can perform the hypothesis test by computing a test threshold for a given :math:`\alpha`, or by directly computing a p-value.

.. sgexample:: linear_time_mmd.sg:perform_test_threshold

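Both variants can be sketched using the Gaussian approximation of the null distribution that the linear-time statistic admits. This is illustrative NumPy, not Shogun's API; the helper name and the fixed bandwidth are assumptions for the sketch.

```python
import numpy as np
from statistics import NormalDist

def gaussian_kernel(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2, axis=-1) / (2 * sigma ** 2))

def linear_mmd_test(x, y, alpha=0.05):
    """Linear-time MMD test via the Gaussian approximation of the null.

    Returns the statistic, the rejection threshold for level alpha,
    and the p-value.
    """
    x1, x2, y1, y2 = x[0::2], x[1::2], y[0::2], y[1::2]
    h = (gaussian_kernel(x1, x2) + gaussian_kernel(y1, y2)
         - gaussian_kernel(x1, y2) - gaussian_kernel(x2, y1))
    stat = h.mean()
    # under H0 the statistic is asymptotically N(0, var(h) / num_pairs)
    std = h.std(ddof=1) / np.sqrt(h.size)
    threshold = std * NormalDist().inv_cdf(1 - alpha)
    p_value = 1 - NormalDist().cdf(stat / std)
    return stat, threshold, p_value

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=(2000, 1))
y = rng.normal(1.0, 1.0, size=(2000, 1))
stat, threshold, p_value = linear_mmd_test(x, y)
```

Rejecting when the statistic exceeds the threshold is equivalent to rejecting when the p-value falls below :math:`\alpha`.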
---------------
Kernel learning
---------------

There are various options to learn a kernel.
All options allow learning a single kernel among a number of provided baseline kernels.
Furthermore, some of these criteria can be used to learn the coefficients of a convex combination of baseline kernels.

There are different strategies to learn the kernel, see :sgclass:`CKernelSelectionStrategy`.
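One simple strategy, maximising the estimated MMD on training data over a list of baseline Gaussian bandwidths, can be sketched as follows. This is illustrative NumPy, not Shogun's selection API; the bandwidth grid is an arbitrary choice.

```python
import numpy as np

def linear_time_mmd(x, y, sigma):
    """Unbiased linear-time MMD^2 with a Gaussian kernel of width sigma."""
    k = lambda a, b: np.exp(-np.sum((a - b) ** 2, axis=-1) / (2 * sigma ** 2))
    x1, x2, y1, y2 = x[0::2], x[1::2], y[0::2], y[1::2]
    return np.mean(k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1))

def select_kernel_max_mmd(x_train, y_train, sigmas):
    """Pick the baseline kernel with the largest training-set MMD estimate."""
    return max(sigmas, key=lambda s: linear_time_mmd(x_train, y_train, s))

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=(2000, 1))
y = rng.normal(1.0, 1.0, size=(2000, 1))
best = select_kernel_max_mmd(x, y, [2.0 ** e for e in range(-3, 4)])
```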

We specify the desired baseline kernels to consider. Note that the kernel defined above is not part of the selection.

.. sgexample:: linear_time_mmd.sg:add_kernels

IMPORTANT: when learning the kernel for statistical testing, this needs to be done on data different from the data used for performing the actual test.
One way to accomplish this is to manually provide a different set of features for testing.
In Shogun, it is also possible to automatically split the provided data by specifying the ratio between train and test data, by enabling the train-test mode.

.. sgexample:: linear_time_mmd.sg:enable_train_test_mode

A ratio of 1 means the data is split in half: the kernel is learned on the first half, and subsequent tests are performed on the second half.
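Assuming the ratio is interpreted as :math:`n_{\text{train}}/n_{\text{test}}`, the split sizes follow as in this small sketch (a hypothetical helper, not Shogun's API):

```python
def train_test_split_sizes(n, ratio):
    """Split n samples for a given train:test ratio (assumed n_train/n_test)."""
    n_train = int(n * ratio / (ratio + 1))
    return n_train, n - n_train

half = train_test_split_sizes(100, 1)            # ratio 1: split in half
three_quarters = train_test_split_sizes(100, 3)  # ratio 3: 3/4 for learning
```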

We learn the kernel and extract the result; again, see :sgclass:`CKernelSelectionStrategy` for more available strategies. Note that the kernel of the MMD instance itself is replaced.
If all kernels have the same type, we can convert the result into that type, for example to extract its parameters.

.. sgexample:: linear_time_mmd.sg:select_kernel_single

Note that in order to extract particular kernel parameters, we need to cast the kernel to its actual type.

Similarly, a convex combination of kernels, in the form of a :sgclass:`CCombinedKernel`, can be learned and extracted as

.. sgexample:: linear_time_mmd.sg:select_kernel_combined

We can perform the test on the last learnt kernel.
Since we enabled the train-test mode, this is automatically done on the held-out test data.

.. sgexample:: linear_time_mmd.sg:perform_test

----------
References
----------
.. bibliography:: ../../references.bib
    :filter: docname in docnames
@@ -0,0 +1,93 @@
==================
Quadratic Time MMD
==================

The quadratic time MMD implements a nonparametric statistical hypothesis test to reject the null hypothesis that two distributions :math:`p` and :math:`q`, only observed via :math:`n` and :math:`m` samples respectively, are the same, i.e. :math:`H_0:p=q`.

The (biased) test statistic is given by

.. math::

    \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n k(x_i,x_j) + \frac{1}{m^2}\sum_{i=1}^m\sum_{j=1}^m k(y_i,y_j) - \frac{2}{nm}\sum_{i=1}^n\sum_{j=1}^m k(x_i,y_j),

where :math:`x_i\sim p` and :math:`y_j\sim q`.
See :cite:`gretton2012kernel` for a detailed introduction.
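The biased estimator sums over all pairs of kernel evaluations, hence quadratic time. A NumPy sketch (illustrative, not Shogun's API; Gaussian kernel and bandwidth are arbitrary choices):

```python
import numpy as np

def gaussian_gram(a, b, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    d2 = np.sum(a ** 2, 1)[:, None] + np.sum(b ** 2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-d2 / (2 * sigma ** 2))

def quadratic_time_mmd_biased(x, y, sigma=1.0):
    """Biased quadratic-time MMD^2: means of all pairwise kernel values."""
    return (gaussian_gram(x, x, sigma).mean()
            + gaussian_gram(y, y, sigma).mean()
            - 2 * gaussian_gram(x, y, sigma).mean())

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, size=(200, 1))  # n samples from p
y = rng.normal(1.0, 1.0, size=(150, 1))  # m samples from q
stat = quadratic_time_mmd_biased(x, y)
stat_same = quadratic_time_mmd_biased(x, x)  # exactly zero for x vs x
```

Unlike the linear-time version, :math:`n` and :math:`m` need not be equal here.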

-------
Example
-------

Imagine we have samples from :math:`p` and :math:`q`, here in the form of :sgclass:`CDenseFeatures` (here 64-bit floats, aka RealFeatures).

.. sgexample:: quadratic_time_mmd.sg:create_features

We create an instance of :sgclass:`CQuadraticTimeMMD`, passing it the data and the kernel.

.. sgexample:: quadratic_time_mmd.sg:create_instance

There are multiple ways to compute the test statistic; see :sgclass:`CQuadraticTimeMMD` for details.
The biased statistic is computed as

.. sgexample:: quadratic_time_mmd.sg:estimate_mmd

There are multiple ways to perform the actual hypothesis test, see :sgclass:`CQuadraticTimeMMD` for details. The permutation version simulates from :math:`H_0` by repeatedly permuting the samples from :math:`p` and :math:`q`. We can perform the test by computing a test threshold for a given :math:`\alpha`, or by directly computing a p-value.
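The permutation idea can be sketched in NumPy: the pooled Gram matrix is computed once, and each permutation merely relabels which samples belong to :math:`p` and :math:`q`. Illustrative code, not Shogun's API; the number of permutations and the kernel are arbitrary choices.

```python
import numpy as np

def gaussian_gram(a, b, sigma=1.0):
    d2 = np.sum(a ** 2, 1)[:, None] + np.sum(b ** 2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_biased_from_gram(K, n):
    """Biased MMD^2 read off a pooled Gram matrix (first n rows from p)."""
    return K[:n, :n].mean() + K[n:, n:].mean() - 2 * K[:n, n:].mean()

def permutation_test(x, y, sigma=1.0, num_permutations=200, seed=0):
    """p-value by permuting pooled sample labels to simulate H0."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    K = gaussian_gram(z, z, sigma)
    n = len(x)
    observed = mmd2_biased_from_gram(K, n)
    null_stats = []
    for _ in range(num_permutations):
        perm = rng.permutation(len(z))
        # permuting rows and columns relabels which samples came from p or q
        null_stats.append(mmd2_biased_from_gram(K[np.ix_(perm, perm)], n))
    p_value = float(np.mean(np.array(null_stats) >= observed))
    return observed, p_value

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, size=(100, 1))
y = rng.normal(1.0, 1.0, size=(100, 1))
observed, p_value = permutation_test(x, y)
```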

.. sgexample:: quadratic_time_mmd.sg:perform_test

----------------
Multiple kernels
----------------

It is possible to perform all operations (computing statistics, performing the test, etc.) for multiple kernels at once, via the :sgclass:`CMultiKernelQuadraticTimeMMD` interface.

.. sgexample:: quadratic_time_mmd.sg:multi_kernel

Note that the results are now a vector with one entry per kernel.
Also note that the kernels for the single-kernel and multi-kernel interfaces are kept separately.
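The idea can be sketched in NumPy: the pairwise squared distances are computed once and reused for every kernel, which is why evaluating many kernels at once is cheap. Illustrative code, not Shogun's API.

```python
import numpy as np

def mmd2_biased_multi(x, y, sigmas):
    """Biased MMD^2 estimates for several Gaussian bandwidths at once.

    The pairwise squared distances are computed once and reused, so each
    additional kernel only costs one elementwise exponential.
    """
    z = np.vstack([x, y])
    sq = np.sum(z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * z @ z.T
    n = len(x)
    stats = []
    for s in sigmas:
        K = np.exp(-d2 / (2 * s ** 2))
        stats.append(K[:n, :n].mean() + K[n:, n:].mean()
                     - 2 * K[:n, n:].mean())
    return np.array(stats)

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, size=(150, 1))
y = rng.normal(1.0, 1.0, size=(150, 1))
stats = mmd2_biased_multi(x, y, [0.5, 1.0, 2.0])  # one entry per kernel
```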

---------------
Kernel learning
---------------

There are various options to learn a kernel.
All options allow learning a single kernel among a number of provided baseline kernels.
Furthermore, some of these criteria can be used to learn the coefficients of a convex combination of baseline kernels.

There are different strategies to learn the kernel, see :sgclass:`CKernelSelectionStrategy`.

We specify the desired baseline kernels to consider. Note that the kernel defined above is not part of the selection.

.. sgexample:: quadratic_time_mmd.sg:add_kernels

IMPORTANT: when learning the kernel for statistical testing, this needs to be done on data different from the data used for performing the actual test.
One way to accomplish this is to manually provide a different set of features for testing.
In Shogun, it is also possible to automatically split the provided data by specifying the ratio between train and test data, by enabling the train-test mode.

.. sgexample:: quadratic_time_mmd.sg:enable_train_test_mode

A ratio of 1 means the data is split in half: the kernel is learned on the first half, and subsequent tests are performed on the second half.

We learn the kernel and extract the result; again, see :sgclass:`CKernelSelectionStrategy` for more available strategies.
Note that the kernel of the MMD instance itself is replaced.
If all kernels have the same type, we can convert the result into that type, for example to extract its parameters.

.. sgexample:: quadratic_time_mmd.sg:select_kernel_single

Note that in order to extract particular kernel parameters, we need to cast the kernel to its actual type.
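A power-oriented selection criterion in the spirit of :cite:`gretton2012optimal` maximises the ratio of the estimated MMD to its standard deviation. A sketch using the streaming h-terms for simplicity (illustrative NumPy, not Shogun's API; the regulariser `lam` and the bandwidth grid are assumptions):

```python
import numpy as np

def h_terms(x, y, sigma):
    """Linear-time h-statistic terms for a Gaussian kernel of width sigma."""
    k = lambda a, b: np.exp(-np.sum((a - b) ** 2, axis=-1) / (2 * sigma ** 2))
    x1, x2, y1, y2 = x[0::2], x[1::2], y[0::2], y[1::2]
    return k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1)

def select_kernel_max_power(x_train, y_train, sigmas, lam=1e-4):
    """Pick the bandwidth maximizing mean(h)/std(h), a proxy for test power."""
    def criterion(sigma):
        h = h_terms(x_train, y_train, sigma)
        return h.mean() / (h.std(ddof=1) + lam)
    return max(sigmas, key=criterion)

rng = np.random.default_rng(6)
x = rng.normal(0.0, 1.0, size=(2000, 1))
y = rng.normal(1.0, 1.0, size=(2000, 1))
best = select_kernel_max_power(x, y, [2.0 ** e for e in range(-3, 4)])
```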

Similarly, a convex combination of kernels, in the form of a :sgclass:`CCombinedKernel`, can be learned and extracted as

.. sgexample:: quadratic_time_mmd.sg:select_kernel_combined

We can perform the test on the last learnt kernel.
Since we enabled the train-test mode, this is automatically done on the held-out test data.

.. sgexample:: quadratic_time_mmd.sg:perform_test_optimized

----------
References
----------
.. bibliography:: ../../references.bib
    :filter: docname in docnames

:wiki:`Statistical_hypothesis_testing`
9 changes: 9 additions & 0 deletions doc/cookbook/source/index.rst
@@ -47,6 +47,15 @@ Regression

examples/regression/**

Statistical Testing
-------------------

.. toctree::
    :maxdepth: 1
    :glob:

    examples/statistical_testing/**

Kernels
-------

22 changes: 20 additions & 2 deletions doc/cookbook/source/references.bib
@@ -25,7 +25,7 @@ @book{cristianini2000introduction
publisher={Cambridge University Press}
}
@article{fan2008liblinear,
title={{LIBLINEAR: A Library for Large Linear Classification}},
author={R.E. Fan and K.W. Chang and C.J. Hsieh and X.R. Wang and C.J. Lin},
journal={Journal of Machine Learning Research},
volume={9},
@@ -36,7 +36,18 @@ @book{Rasmussen2005GPM
author = {Rasmussen, C. E. and Williams, C. K. I.},
title = {Gaussian Processes for Machine Learning},
year = {2005},
publisher = {The MIT Press}
}

@article{gretton2012kernel,
title={A kernel two-sample test},
author={Gretton, A. and Borgwardt, K.M. and Rasch, M.J. and Sch{\"o}lkopf, B. and Smola, A.},
journal={Journal of Machine Learning Research},
volume={13},
number={1},
pages={723--773},
year={2012},
}
@article{ueda2000smem,
title={SMEM Algorithm for Mixture Models},
@@ -102,6 +113,13 @@ @inproceedings{shalev2011shareboost
pages={1179--1187},
year={2011}
}

@inproceedings{gretton2012optimal,
author={Gretton, A. and Sriperumbudur, B. and Sejdinovic, D. and Strathmann, H. and Balakrishnan, S. and Pontil, M. and Fukumizu, K.},
booktitle={Advances in Neural Information Processing Systems},
title={{Optimal kernel choice for large-scale two-sample tests}},
year={2012}
}
@article{sonnenburg2006large,
title={Large scale multiple kernel learning},
author={S. Sonnenburg and G. R{\"a}tsch and C. Sch{\"a}fer and B. Sch{\"o}lkopf},
