Lea Goetz edited this page Feb 23, 2016 · 4 revisions

Large-scale statistical testing


Difficulty & Requirements

Medium. You need to know

  • Applied Statistical Hypothesis Testing
  • Linear Algebra
  • Linear Algebra in C++ (helpful)
  • Gaussian Process basics, SVM basics


In this project, we aim to add large-scale statistical testing to SHOGUN. There is some prior work, mainly on kernel mean embeddings. We will implement standard statistical tests that are applied at large scale in genome-wide association studies. These include Fisher's exact test and mixed models (Lippert et al.). The major focus of the project is on multiple testing, so multiple testing threshold corrections will be implemented, starting with the family-wise error rate (FWER) and probably also the (augmented) false discovery rate (FDR). If there is enough time, extensions of multiple testing to GPs and SVMs can be tackled.

Waypoints and initial work

  • Check / debug current Fisher's exact test implementation
  • Implement the chi-square test and the trend test
  • Implement FWER computation using the Westfall-Young permutation procedure
  • Mixed model testing with a kernel matrix (Lippert et al.)
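
The permutation-based FWER waypoint above can be sketched as follows. This is a minimal single-step maxT variant in plain Python, under stated assumptions: the test statistic (absolute difference in group means) is only a stand-in for the actual per-variant test, and the names `mean_diff_stat` and `westfall_young_maxt` are hypothetical, not SHOGUN API.

```python
import random


def mean_diff_stat(values, labels):
    """Absolute difference in group means; a stand-in test statistic."""
    g1 = [v for v, y in zip(values, labels) if y == 1]
    g0 = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def westfall_young_maxt(features, labels, n_perm=1000, seed=0):
    """Single-step maxT adjusted p-values (illustrative sketch).

    features: list of per-feature value lists, one value per sample.
    labels:   list of 0/1 phenotype labels, one per sample.
    """
    rng = random.Random(seed)
    observed = [mean_diff_stat(f, labels) for f in features]
    exceed = [0] * len(features)
    perm = list(labels)

    for _ in range(n_perm):
        # Permute the labels once and reuse the permutation for every
        # feature, so the dependence between features is preserved.
        rng.shuffle(perm)
        max_stat = max(mean_diff_stat(f, perm) for f in features)
        for j, t in enumerate(observed):
            if max_stat >= t:
                exceed[j] += 1

    # Add-one correction keeps every adjusted p-value strictly positive.
    return [(1 + e) / (1 + n_perm) for e in exceed]
```

Rejecting every hypothesis whose adjusted p-value is at most alpha controls the FWER at level alpha. The cost is one full pass over all features per permutation, which is the part that needs to scale in a large-scale implementation.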


  • Implement FDR computation using the Westfall-Young permutation procedure
  • SVM-based testing, GP-based testing
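
For comparison with the permutation-based FDR waypoint above, the sketch below shows the standard Benjamini-Hochberg step-up procedure. Note that this is a different, simpler technique than the permutation procedure named in the list; it is included only as a baseline that a permutation-based implementation could be checked against. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one rejection flag per p-value (BH step-up, illustrative)."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank

    # Reject the k_max smallest p-values, even if some of them
    # individually missed their own threshold (step-up behaviour).
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected
```

Usage: `benjamini_hochberg([0.02, 0.03, 0.035, 0.04])` rejects all four hypotheses at alpha = 0.05, because the third and fourth p-values meet their thresholds and the step-up rule carries the smaller ones along.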

Useful resources

Get back to the main projects page.