Implement B-test MMD #1928

Open
karlnapf opened this issue Mar 4, 2014 · 3 comments

Comments

@karlnapf
Member

karlnapf commented Mar 4, 2014

This is an entrance task (and part of the project) of http://www.shogun-toolbox.org/page/Events/gsoc2014_ideas#variable_interactions

See B-test: A Non-parametric, Low Variance Kernel Two-sample Test

This is yet another kernel two-sample test, one that combines the strengths of the linear-time MMD and the quadratic-time MMD (both implemented in Shogun; see the class documentation).
The idea is to compute the quadratic-time MMD on blocks of the data and then average the per-block statistics, as the linear-time MMD does.
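
Written out, the block-averaged statistic looks roughly as follows. This is a sketch in the notation of the B-test paper as best recalled here, so check it against the paper before relying on it. Assumptions: n samples from each distribution, block size B dividing n, kernel k, and z_a = (x_a, y_a).

```latex
\hat{\eta}_k = \frac{B}{n} \sum_{i=1}^{n/B} \hat{\eta}_k(i),
\qquad
\hat{\eta}_k(i) = \frac{1}{B(B-1)}
  \sum_{\substack{a, b \,\in\, \text{block } i \\ a \neq b}} h(z_a, z_b),
\qquad
h(z_a, z_b) = k(x_a, x_b) + k(y_a, y_b) - k(x_a, y_b) - k(x_b, y_a)
```

With B = 2 this reduces to the linear-time MMD, and with B = n it is the unbiased quadratic-time MMD on the full sample.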

The task is to add another kernel two-sample test class that implements this test statistic. (The code could possibly reuse CQuadraticTimeMMD for computing those blocks.) The null distribution can be approximated as a Gaussian, as for the linear-time MMD, but permutation should also be possible.
The idea is to reuse the existing linear-time and quadratic-time MMD implementations. If you read the code of CLinearTimeMMD, you will see that we already stream data in blocks and then compute the statistic on those blocks. The change here is that we use CQuadraticTimeMMD to compute the statistic on each block and then average the results. The overall structure will therefore look a lot like compute_statistic in CLinearTimeMMD, with CQuadraticTimeMMD as the tool for the per-block computation.
As this is slightly technical from an implementation point of view, there will probably be some discussion. Feel free to suggest ideas for how to do this nicely.
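
To make the block structure concrete, here is a minimal pure-Python sketch of the statistic. It is not Shogun code and does not use CLinearTimeMMD or CQuadraticTimeMMD. The names `gaussian_kernel`, `mmd2_block`, `btest_block_values` and `btest_statistic` are made up for illustration, the Gaussian kernel is just one possible choice, and dropping trailing samples that do not fill a block is an assumption rather than something the paper or Shogun prescribes.

```python
import math
from statistics import fmean


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_block(xs, ys, kernel=gaussian_kernel):
    """Unbiased quadratic-time MMD^2 estimate on one block (equal sizes)."""
    b = len(xs)
    if b != len(ys) or b < 2:
        raise ValueError("blocks must have equal size of at least 2")
    total = 0.0
    for i in range(b):
        for j in range(b):
            if i != j:
                total += (kernel(xs[i], xs[j]) + kernel(ys[i], ys[j])
                          - kernel(xs[i], ys[j]) - kernel(xs[j], ys[i]))
    return total / (b * (b - 1))


def btest_block_values(xs, ys, block_size, kernel=gaussian_kernel):
    """Per-block MMD^2 values; trailing samples are dropped (assumption)."""
    n_blocks = min(len(xs), len(ys)) // block_size
    if n_blocks < 1:
        raise ValueError("not enough samples for a single block")
    values = []
    for i in range(n_blocks):
        start = i * block_size
        stop = start + block_size
        values.append(mmd2_block(xs[start:stop], ys[start:stop], kernel))
    return values


def btest_statistic(xs, ys, block_size, kernel=gaussian_kernel):
    """Average of the per-block quadratic-time MMD^2 estimates."""
    return fmean(btest_block_values(xs, ys, block_size, kernel))
```

In a streaming implementation the loop in `btest_block_values` would pull one block at a time from the data source instead of slicing in-memory lists, which is where the similarity to compute_statistic in CLinearTimeMMD comes in.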

This is a medium-to-hard entrance task, as it involves creating a new class, implementing a new (though simple) algorithm, unit testing against the MATLAB implementation (see paper), etc.

Deliverables:

  • CBTestMMD class
  • the above should be fully documented (including math) with reference to the paper
  • two methods for approximating the null distribution: permutation and Gaussian
  • unit tests for all methods against the reference implementation from the paper (MATLAB)
  • (optional: kernel selection for MMD should work with this statistic; that's slightly more advanced)
  • libshogun example for API usage
  • python modular API usage
  • (optional: a few examples in the notebook showing the tradeoff of block size vs. speed and accuracy, reproducing the plots in the paper)
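
The two null approximations from the list above could look roughly like the sketch below. It is plain Python, not the Shogun API, and the function names are hypothetical. The Gaussian variant assumes the null is zero-mean normal with variance estimated from the empirical variance of the per-block values divided by the number of blocks. The permutation variant takes any statistic as a callable, and its add-one p-value correction is a common convention rather than something taken from the paper.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def gaussian_p_value(block_values):
    """One-sided p-value from per-block statistics under a zero-mean
    Gaussian null (variance estimated from the blocks; assumption)."""
    n_blocks = len(block_values)
    if n_blocks < 2:
        raise ValueError("need at least two blocks to estimate variance")
    mean = fmean(block_values)
    std_err = math.sqrt(variance(block_values) / n_blocks)
    if std_err == 0.0:
        # Degenerate case: all blocks agree exactly.
        return 0.0 if mean > 0.0 else 1.0
    return 1.0 - NormalDist().cdf(mean / std_err)


def permutation_p_value(xs, ys, statistic, n_permutations=200, seed=0):
    """p-value by shuffling the pooled samples and recomputing the
    statistic; `statistic` is any callable taking (xs, ys)."""
    rng = random.Random(seed)
    observed = statistic(xs, ys)
    pooled = list(xs) + list(ys)
    m = len(xs)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if statistic(pooled[:m], pooled[m:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_permutations + 1)
```

The Gaussian variant only needs the per-block values that the statistic computes anyway, so it is essentially free. The permutation variant recomputes the full statistic for every shuffle and is correspondingly more expensive.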

Ask @karlnapf or @sejdino if you have questions.

@lambday
Member

lambday commented Mar 5, 2014

Working on this one.

@lambday
Member

lambday commented Jun 29, 2014

Status check:

Done

TODO

  • python modular example
  • kernel selection with CBTestMMD
  • plots for the hypothesis testing notebook

@karlnapf
Member Author

Let's keep an eye on this one; it should be done soon!


2 participants