A reliable leaderboard algorithm for machine learning competitions
Latest commit 6640f98 May 19, 2015 @mrtzh fixed typo
Failed to load latest commit information.
csvs initial commit May 19, 2015
examples initial commit May 19, 2015
src eliminating Losses module May 19, 2015
test initial commit May 19, 2015
LICENSE.md fixed typo May 19, 2015
README.md typos May 19, 2015



A realiable leaderboard for machine learning competitions

MIT Licensed. See LICENSE.md.


Open a Julia prompt and call: Pkg.clone("https://github.com/mrtzh/Ladder.jl.git")


See this blog post for a discussion on the problem of overfitting to the public leaderboard in a data science competition.

This is the code repository for this paper. Here's a bibtex reference:

  author    = {Avrim Blum and Moritz Hardt},
  title     = {The Ladder: {A} Reliable Leaderboard for Machine Learning Competitions},
  journal   = {CoRR},
  volume    = {abs/1502.04585},
  year      = {2015},
  url       = {http://arxiv.org/abs/1502.04585},
  timestamp = {Mon, 02 Mar 2015 14:17:34 +0100},
  biburl    = {http://dblp.uni-trier.de/rec/bib/journals/corr/BlumH15},
  bibsource = {dblp computer science bibliography, http://dblp.org}

If you use the code, we encourage you to cite our paper.


The basic usage is as follows:

using Ladder
# these are the labels corresponding to your holdout data set
holdoutlabels = [1.0, 0.0, 1.0, 1.0, 1.0, 0.0]
# create ladder instance around holdout labels
l = ladder(holdoutlabels)
# create submission
submission1 = Submission("sub1","teamA",[0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
score!(l,submission1,Ladder.loss01) # returns: 0.6666666666666666
# create another submission
submission2 = Submission("sub2","teamA",[1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
score!(l,submission2,Ladder.loss01) # returns: 0.6666666666666666
# Ladder judged that there was no significant improvement
# create another submission
submission3 = Submission("sub3","teamA",[1.0, 0.0, 1.0, 0.0, 0.0, 0.0])
score!(l,submission2,Ladder.loss01) # 0.3333333333333333

See examples/photo.jl for a comprehensive example on Kaggle's Photo Quality Prediction challenge. The data set is not yet available, but will most likely be released by Kaggle in the near future.

Other usage

You can also use the Ladder mechanism to keep track of your own progress in a data science project and avoid overfitting to your holdout set. This can be useful in situations where you repeatedly evaluate candidate models against a holdout set.