See B-test: A Non-parametric, Low Variance Kernel Two-sample Test
This is yet another kernel two-sample test that combines the strengths of the linear time MMD and the quadratic time MMD (both implemented in Shogun, see class documentation).
The idea is to compute the quadratic time MMD on blocks of the data and then average the block statistics, so that for a fixed block size the total cost stays linear in the number of samples.
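The block-averaging idea can be sketched in plain NumPy, independent of Shogun. This is a minimal illustration, not the proposed class: the function names, the Gaussian kernel, its `sigma` parameter, and the choice of unbiased estimator (sum over index pairs i != j within a block) are all assumptions made for the example and should be checked against the paper and its reference implementation.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))


def block_mmd2(x, y, sigma=1.0):
    """Unbiased quadratic time MMD^2 estimate on one block (equal sizes)."""
    b = x.shape[0]
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    # Sum over all index pairs i != j, i.e. drop the diagonals.
    total = (
        (kxx.sum() - np.trace(kxx))
        + (kyy.sum() - np.trace(kyy))
        - 2.0 * (kxy.sum() - np.trace(kxy))
    )
    return total / (b * (b - 1))


def block_statistics(x, y, block_size, sigma=1.0):
    """Quadratic time MMD^2 on consecutive, non-overlapping blocks."""
    n_blocks = min(len(x), len(y)) // block_size
    if block_size < 2 or n_blocks < 1:
        raise ValueError("need block_size >= 2 and at least one full block")
    stats = np.empty(n_blocks)
    for i in range(n_blocks):
        block = slice(i * block_size, (i + 1) * block_size)
        stats[i] = block_mmd2(x[block], y[block], sigma)
    return stats


def b_test_statistic(x, y, block_size, sigma=1.0):
    """Average of the per-block statistics; cost is O(n * block_size)."""
    return block_statistics(x, y, block_size, sigma).mean()
```

Samples that do not fill a complete block are simply dropped in this sketch. With `block_size=2` each block reduces to a single pair of points, which is the linear time end of the tradeoff, while one block covering all data gives the quadratic time estimate.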
This task is to add another kernel two-sample test class that implements this test statistic. (The code could possibly re-use CQuadraticTimeMMD for computing those blocks.) The null distribution can be approximated as a Gaussian, as for the linear time MMD, but permutation should also be possible.
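Both null approximations can be sketched as follows. This is a rough illustration under stated assumptions, not Shogun API: `block_stats` is a hypothetical array holding one statistic per block, `statistic` is a hypothetical callable taking two sample arrays, and the Gaussian variant assumes the averaged statistic has mean zero under the null with a variance estimated from the spread of the block statistics.

```python
import math

import numpy as np


def gaussian_p_value(block_stats):
    """p-value from a zero-mean Gaussian fitted to the averaged statistic."""
    block_stats = np.asarray(block_stats, dtype=float)
    n_blocks = block_stats.size
    if n_blocks < 2:
        raise ValueError("need at least two blocks to estimate the variance")
    statistic = block_stats.mean()
    # Standard error of the mean of the block statistics.
    std_err = math.sqrt(block_stats.var(ddof=1) / n_blocks)
    if std_err == 0.0:
        raise ValueError("block statistics have zero variance")
    z = statistic / std_err
    # Upper tail of the standard normal: P(Z >= z).
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def permutation_p_value(x, y, statistic, n_permutations=200, seed=0):
    """p-value from recomputing the statistic on shuffled pooled samples."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    pooled = np.concatenate([x, y])
    m = len(x)
    count = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))
        if statistic(pooled[perm[:m]], pooled[perm[m:]]) >= observed:
            count += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (count + 1) / (n_permutations + 1)
```

The Gaussian variant needs no extra kernel evaluations, while permutation recomputes the full statistic `n_permutations` times, so it is considerably more expensive.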
The idea is to re-use the existing linear and quadratic time MMD implementations. If you read the code of CLinearTimeMMD, you will see that we already stream data in blocks and then compute the statistic on those blocks. The change made here is that we use CQuadraticTimeMMD to compute the statistic on each block and then average the results. Therefore, the overall structure will look a lot like compute_statistic in CLinearTimeMMD, and will use CQuadraticTimeMMD as a tool to do that.
As this is slightly technical from the implementation point of view, there will probably be some discussion. Feel free to suggest ideas on how to do this nicely.
This is a medium to hard entrance task, as it involves creating a new class, implementing a new (though simple) algorithm, unit testing against MATLAB implementations (see paper), etc.
deliverables:
CBTestMMD class
the above should be fully documented (including math) with reference to the paper
two methods for approximating the null distribution: permutation and gaussian
unit tests for all methods against the reference implementation from the paper (MATLAB)
(optional: kernel selection for MMD should work with this statistic, that's slightly more advanced)
libshogun example for API usage
Python modular API usage
(optional: a few examples in the notebook showing the tradeoff of block size vs speed and accuracy, reproducing the plots in the paper)
This is an entrance task (and part of the project) of http://www.shogun-toolbox.org/page/Events/gsoc2014_ideas#variable_interactions
Ask @karlnapf or @sejdino if you have questions.