
Commit

updated meta examples
karlnapf committed Jul 1, 2016
1 parent 4450b58 commit 3b03d18
Showing 8 changed files with 101 additions and 115 deletions.
@@ -17,30 +17,32 @@ Example
-------

Imagine we have samples from :math:`p` and :math:`q`.
As the linear time MMD is a streaming statistic, we need to pass it :sgclass:`CStreamingFeatures`.
Here, we use a synthetic data generator to reproduce a dataset used in :cite:`gretton2012kernel`, but it is possible to construct :sgclass:`CStreamingFeatures` from (large) files.

.. sgexample:: linear_time_mmd.sg:create_features

We create an instance of :sgclass:`CLinearTimeMMD`, passing it the data and the kernel to use.
Note that in streaming scenarios, we assume the data stream is infinite; we therefore specify the number of samples to stream for the test.

.. sgexample:: linear_time_mmd.sg:create_instance

An important parameter for controlling the efficiency of the linear time MMD is the block size, i.e. the number of samples that is processed at once. As a guideline, set it as large as memory allows.

.. sgexample:: linear_time_mmd.sg:set_burst

The statistic is computed as

.. sgexample:: linear_time_mmd.sg:estimate_mmd

There are multiple ways to perform the actual hypothesis test, see :sgclass:`CLinearTimeMMD` for details. The default version is based on asymptotic normality of the test statistic.
We can perform the hypothesis test by computing the rejection threshold

.. sgexample:: linear_time_mmd.sg:perform_test_threshold

Alternatively, we can compute the p-value for the above value of the statistic

.. sgexample:: linear_time_mmd.sg:perform_test_p_value
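For intuition, the computations above can be sketched in plain NumPy rather than through the Shogun classes. This is an illustration of the underlying math only, not the Shogun API; the function name, kernel width, sample sizes, and mean shift are arbitrary choices for the sketch.

```python
import numpy as np
from scipy.stats import norm

def linear_time_mmd(x, y, sigma=1.0):
    """Linear-time MMD estimate and its standard error.

    Averages h((x_1, y_1), (x_2, y_2)) over non-overlapping sample
    pairs, so each streamed sample is touched exactly once.
    """
    n = (min(len(x), len(y)) // 2) * 2
    x1, x2 = x[0:n:2], x[1:n:2]
    y1, y2 = y[0:n:2], y[1:n:2]
    k = lambda a, b: np.exp(-(a - b) ** 2 / (2 * sigma ** 2))
    h = k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1)
    return h.mean(), h.std(ddof=1) / np.sqrt(len(h))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 5000)  # stream from p
y = rng.normal(0.5, 1.0, 5000)  # stream from q

statistic, se = linear_time_mmd(x, y)

# Under H0 the statistic is asymptotically normal, so both the
# rejection threshold and the p-value come from the Gaussian CDF.
alpha = 0.05
threshold = norm.ppf(1 - alpha) * se
p_value = 1 - norm.cdf(statistic / se)
```

Since the two distributions differ in their means, the statistic here clearly exceeds the threshold and the null hypothesis is rejected.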

----------
References
@@ -2,56 +2,54 @@
Linear Time MMD (kernel selection)
==================================

There are various options to learn a kernel for the linear time MMD.
All options allow to learn a single kernel among a number of provided baseline kernels.
Furthermore, some of these criteria can be used to learn the coefficients of a convex combination of baseline kernels.

For example, it is possible to learn the kernel that maximizes the test power.
That is, for a fixed type I error, say :math:`\alpha=0.05`, the type II error is minimized.
Minimizing the type II error here is equivalent to picking the kernel

.. math::

    k^* = \argmax_k \frac{\text{MMD}_l}{\sigma_l},

where :math:`\text{MMD}_l` is the linear time MMD estimator and :math:`\sigma_l` is its standard deviation, both of which can be estimated in an on-line fashion.

The coefficients of a convex combination of :math:`d` baseline kernels :math:`\sum_{i=1}^d \lambda_i k_i` are learned by solving a convex program of the form

.. math::

    \argmin_\lambda \lambda^\top Q \lambda \qquad \text{subject to } \lambda^\top h=1, \quad \lambda_i\geq 0,

where :math:`h` is the vector of linear time MMD statistics of the baseline kernels and :math:`Q` is their empirical covariance.

See the :doc:`linear_time_mmd` cookbook for details on the test itself.
See :cite:`gretton2012optimal` for details.

-------
Example
-------

Imagine we have (streamed) samples from :math:`p` and :math:`q`.
We create an instance of :sgclass:`CLinearTimeMMD`, passing it the data.

.. sgexample:: linear_time_mmd_kernel_selection:create_instance

We then specify the desired baseline kernels to consider.

.. sgexample:: linear_time_mmd_kernel_selection:add_kernels

Note that learning the kernel for statistical testing must be done on data different from the data used to perform the actual test; otherwise the test is not correctly calibrated.
One way to accomplish this is to manually provide a different set of features for testing.
In Shogun, it is also possible to automatically split the provided data by specifying the ratio between train and test data, by enabling the train-test mode.

.. sgexample:: linear_time_mmd_kernel_selection:enable_train_mode

A ratio of 1 means the data is split in half: the kernel is learnt on the first half, and subsequent tests are performed on the second half.

We learn the kernel and extract the result; see :sgclass:`CKernelSelectionStrategy` for more available strategies.
If all kernels have the same type, we can convert the result into that type, for example to extract its parameters.

.. sgexample:: linear_time_mmd_kernel_selection:select_kernel_single

Note that in order to extract particular kernel parameters, we need to cast the kernel to its actual type.

Similarly, a convex combination of kernels, in the form of :sgclass:`CCombinedKernel`, can be learned and extracted as

.. sgexample:: linear_time_mmd_kernel_selection:select_kernel_combined

We can perform the test on the last learnt kernel.
Since we enabled the train-test mode, this is automatically done on the held-out test data.

.. sgexample:: linear_time_mmd_kernel_selection:perform_test
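The whole flow above (split the data, score each baseline kernel by the power criterion :math:`\text{MMD}_l/\sigma_l`, test with the winner on held-out data) can be sketched in plain NumPy. This is a hedged illustration of the math, not the Shogun API; the helper function, candidate widths, and toy data are all arbitrary choices for the sketch.

```python
import numpy as np

def linear_time_mmd(x, y, sigma):
    """Linear-time MMD estimate and its standard error (paired samples)."""
    n = (min(len(x), len(y)) // 2) * 2
    x1, x2, y1, y2 = x[0:n:2], x[1:n:2], y[0:n:2], y[1:n:2]
    k = lambda a, b: np.exp(-(a - b) ** 2 / (2 * sigma ** 2))
    h = k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1)
    return h.mean(), h.std(ddof=1) / np.sqrt(len(h))

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 8000)
y = rng.normal(0.5, 1.0, 8000)

# Train-test split with ratio 1: learn the kernel on the first half,
# keep the second half untouched for the actual test.
x_train, x_test = x[:4000], x[4000:]
y_train, y_test = y[:4000], y[4000:]

# Score each baseline Gaussian width by the criterion MMD_l / sigma_l.
widths = [0.1, 1.0, 10.0]
scores = []
for w in widths:
    mmd, se = linear_time_mmd(x_train, y_train, w)
    scores.append(mmd / se)
best_width = widths[int(np.argmax(scores))]

# The test itself is then performed on the held-out half only.
statistic, se = linear_time_mmd(x_test, y_test, best_width)
```

Learning the coefficients of a convex combination additionally requires solving the quadratic program from the introduction, e.g. with an off-the-shelf QP solver.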

----------
References
----------
.. bibliography:: ../../references.bib
:filter: docname in docnames

:doc:`linear_time_mmd`
@@ -25,25 +25,14 @@ We create an instance of :sgclass:`CQuadraticTimeMMD`, passing it data and the kernel

.. sgexample:: quadratic_time_mmd.sg:create_instance

We can select multiple ways to compute the test statistic, see :sgclass:`CQuadraticTimeMMD` for details.
The biased statistic is computed as

.. sgexample:: quadratic_time_mmd.sg:estimate_mmd_biased
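Both quadratic-time estimators can be written down directly. The following NumPy sketch (Gaussian kernel on 1-D toy samples, mirroring the Gaussian-vs-Laplace data of the example; function name and parameter values are arbitrary) illustrates the math only, not the Shogun implementation.

```python
import numpy as np

def quadratic_time_mmd(x, y, sigma=1.0, biased=True):
    """Quadratic-time MMD with a Gaussian kernel on 1-D samples."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))
    kxx, kyy, kxy = k(x, x), k(y, y), k(x, y)
    n, m = len(x), len(y)
    if biased:
        # The biased version keeps the diagonal self-similarity terms
        # and is therefore always non-negative.
        return kxx.mean() + kyy.mean() - 2 * kxy.mean()
    # The unbiased version drops the diagonal terms.
    return ((kxx.sum() - np.trace(kxx)) / (n * (n - 1))
            + (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
            - 2 * kxy.mean())

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200)   # samples from p (Gaussian)
y = rng.laplace(0.0, 1.0, 200)  # samples from q (Laplace)

statistic_biased = quadratic_time_mmd(x, y, biased=True)
statistic_unbiased = quadratic_time_mmd(x, y, biased=False)
```

The biased estimate is always at least as large as the unbiased one, since it retains the positive :math:`k(x_i, x_i)` diagonal terms.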

There are multiple ways to perform the actual hypothesis test, see :sgclass:`CQuadraticTimeMMD` for details. The permutation version simulates from :math:`H_0` by repeatedly permuting the samples from :math:`p` and :math:`q`:

.. sgexample:: quadratic_time_mmd.sg:perform_test
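The permutation approach can be sketched in NumPy: pool the samples, shuffle repeatedly, and recompute the statistic on each shuffled split to approximate the null distribution. A hedged illustration (toy 1-D data, arbitrary kernel width, 200 permutations as in the example), not the Shogun internals.

```python
import numpy as np

def quadratic_time_mmd_biased(x, y, sigma=1.0):
    """Biased quadratic-time MMD with a Gaussian kernel (1-D data)."""
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100)
y = rng.normal(1.0, 1.0, 100)
statistic = quadratic_time_mmd_biased(x, y)

# Simulate from H0 by repeatedly permuting the pooled samples and
# recomputing the statistic on each shuffled split.
pooled = np.concatenate([x, y])
null_samples = []
for _ in range(200):
    perm = rng.permutation(pooled)
    null_samples.append(quadratic_time_mmd_biased(perm[:100], perm[100:]))

alpha = 0.05
threshold = np.quantile(null_samples, 1 - alpha)
p_value = np.mean(np.array(null_samples) >= statistic)
```

Because the two samples differ in mean, the observed statistic lands far in the tail of the permutation null, so the test rejects.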

----------
References
3 changes: 2 additions & 1 deletion doc/cookbook/source/references.bib
@@ -22,7 +22,7 @@ @book{Rasmussen2005GPM
author = {Rasmussen, C. E. and Williams, C. K. I.},
title = {Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning)},
year = {2005},
publisher = {The MIT Press},
}

@@ -96,6 +96,7 @@ @inproceedings{shalev2011shareboost
booktitle={Advances in Neural Information Processing Systems},
pages={1179--1187},
year={2011}
}

@inproceedings{gretton2012optimal,
author={Gretton, A. and Sriperumbudur, B. and Sejdinovic, D. and Strathmann, H. and Balakrishnan, S. and Pontil, M. and Fukumizu, K.},
14 changes: 7 additions & 7 deletions examples/meta/src/statistical_testing/linear_time_mmd.sg
@@ -9,24 +9,24 @@ GaussianKernel kernel()
mmd.set_kernel(kernel)
mmd.set_p(features_p)
mmd.set_q(features_q)
mmd.set_num_samples_p(1000)
mmd.set_num_samples_q(1000)
#![create_instance]

#![set_burst]
mmd.set_num_blocks_per_burst(100)
#![set_burst]

#![estimate_mmd]
real statistic = mmd.compute_statistic()
#![estimate_mmd]

#![perform_test_threshold]
real alpha = 0.05
real threshold = mmd.compute_threshold(alpha)
#![perform_test_threshold]

#![perform_test_p_value]
real p_value = mmd.compute_p_value(statistic)
#![perform_test_p_value]

@@ -1,14 +1,12 @@
GaussianBlobsDataGenerator features_p()
GaussianBlobsDataGenerator features_q()

#![create_instance]
LinearTimeMMD mmd()
mmd.set_p(features_p)
mmd.set_q(features_q)
mmd.set_num_samples_p(1000)
mmd.set_num_samples_q(1000)
mmd.set_num_blocks_per_burst(100)
#![create_instance]

@@ -21,23 +19,29 @@ mmd.add_kernel(kernel2)
mmd.add_kernel(kernel3)
#![add_kernels]

# remove once enums are included automatically
KernelSelectionStrategy temp()

#![enable_train_mode]
mmd.set_train_test_mode(true)
mmd.set_train_test_ratio(1)
#![enable_train_mode]

#![select_kernel_single]
mmd.set_kernel_selection_strategy(enum EKernelSelectionMethod.KSM_MAXIMIZE_POWER)
mmd.select_kernel()
GaussianKernel learnt_kernel_single = GaussianKernel:obtain_from_generic(mmd.get_kernel())
real width = learnt_kernel_single.get_width()
#![select_kernel_single]

#![select_kernel_combined]
mmd.set_kernel_selection_strategy(enum EKernelSelectionMethod.KSM_MAXIMIZE_POWER, true)
mmd.select_kernel()
CombinedKernel learnt_kernel_combined = CombinedKernel:obtain_from_generic(mmd.get_kernel())
RealVector learnt_weights = learnt_kernel_combined.get_subkernel_weights()
#![select_kernel_combined]

#![perform_test]
real statistic = mmd.compute_statistic()
real p_value = mmd.compute_p_value(statistic)
#![perform_test]
28 changes: 7 additions & 21 deletions examples/meta/src/statistical_testing/quadratic_time_mmd.sg
@@ -14,32 +14,18 @@ Real alpha = 0.05
#![create_instance]

#![estimate_mmd_unbiased]
mmd.set_statistic_type(enum EStatisticType.ST_UNBIASED_FULL)
Real statistic_unbiased = mmd.compute_statistic()
#![estimate_mmd_unbiased]

#![estimate_mmd_biased]
mmd.set_statistic_type(enum EStatisticType.ST_BIASED_FULL)
Real statistic_biased = mmd.compute_statistic()
#![estimate_mmd_biased]

#![perform_test]
mmd.set_null_approximation_method(enum ENullApproximationMethod.NAM_PERMUTATION)
mmd.set_num_null_samples(200)
Real threshold = mmd.compute_threshold(alpha)
Real p_value = mmd.compute_p_value(statistic_biased)
#![perform_test]
@@ -1,17 +1,15 @@
CSVFile f_features_p("../../data/two_sample_test_gaussian.dat")
CSVFile f_features_q("../../data/two_sample_test_laplace.dat")

#![create_features]
RealFeatures features_p(f_features_p)
RealFeatures features_q(f_features_q)
#![create_features]

#![create_instance]
QuadraticTimeMMD mmd()
mmd.set_p(features_p)
mmd.set_q(features_q)
#![create_instance]

#![add_kernels]
@@ -23,21 +21,29 @@ mmd.add_kernel(kernel2)
mmd.add_kernel(kernel3)
#![add_kernels]

#![enable_train_mode]
mmd.set_train_test_mode(true)
mmd.set_train_test_ratio(1)
#![enable_train_mode]

# remove once enums are included automatically
KernelSelectionStrategy temp()

#![select_kernel_single]
mmd.set_kernel_selection_strategy(enum EKernelSelectionMethod.KSM_MAXIMIZE_MMD)
mmd.select_kernel()
GaussianKernel learnt_kernel_single = GaussianKernel:obtain_from_generic(mmd.get_kernel())
real learnt_width = learnt_kernel_single.get_width()
#![select_kernel_single]

#![select_kernel_combined]
mmd.set_kernel_selection_strategy(enum EKernelSelectionMethod.KSM_MAXIMIZE_MMD, true)
mmd.select_kernel()
CombinedKernel learnt_kernel_combined = CombinedKernel:obtain_from_generic(mmd.get_kernel())
RealVector learnt_weights = learnt_kernel_combined.get_subkernel_weights()
#![select_kernel_combined]

#![perform_test]
real statistic = mmd.compute_statistic()
real p_value = mmd.compute_p_value(statistic)
#![perform_test]
