This is a Python implementation of the AdaRank algorithm (Xu and Li, 2007) with early stopping.
The code closely follows the scikit-learn style, but there are some differences in the
estimator/metrics API (e.g. the fit() method takes a third argument, qid, rather than just two).
Four ranking metrics are implemented: P@k, AP, DCG@k, and nDCG@k (both the trec_eval and the Burges et al. versions).
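The two DCG conventions mentioned above are commonly distinguished by their gain function: linear gain (rel_i) versus the exponential gain (2^rel_i - 1) of Burges et al. Below is a minimal sketch of that distinction, assuming this is the difference meant here; the function names are illustrative and not this package's API:

```python
import math

def dcg_linear(rel, k=10):
    # Linear gain: rel_i / log2(i + 1), ranks i = 1..k (assumed trec_eval-style)
    return sum(r / math.log2(i + 1) for i, r in enumerate(rel[:k], start=1))

def dcg_exponential(rel, k=10):
    # Burges et al. gain: (2**rel_i - 1) / log2(i + 1)
    return sum((2 ** r - 1) / math.log2(i + 1) for i, r in enumerate(rel[:k], start=1))

def ndcg(rel, k=10, dcg=dcg_exponential):
    # nDCG@k: DCG of the given ranking divided by the DCG of the ideal ordering
    ideal = dcg(sorted(rel, reverse=True), k)
    return dcg(rel, k) / ideal if ideal > 0 else 0.0
```

For a perfectly ordered list both versions give nDCG@k = 1; with graded relevance labels the exponential gain rewards highly relevant documents more strongly.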
The following code runs AdaRank for up to 100 iterations, optimizing for nDCG@10, and stops early when no improvement has been made within the previous 10 iterations.
```python
from adarank import AdaRank
from metrics import NDCGScorer

scorer = NDCGScorer(k=10)
model = AdaRank(max_iter=100, estop=10, scorer=scorer).fit(X, y, qid)
pred = model.predict(X_test, qid_test)
print(scorer(y_test, pred, qid_test).mean())
```
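The early-stopping rule described above can be sketched as follows. This is a generic illustration, not the library's internals; `evaluate` and the function name are hypothetical:

```python
def train_with_early_stopping(evaluate, max_iter=100, estop=10):
    """Run up to max_iter rounds; stop once the best validation score
    has not improved within the last `estop` rounds."""
    best_score, best_iter = float('-inf'), 0
    for t in range(1, max_iter + 1):
        score = evaluate(t)           # validation metric after round t (hypothetical)
        if score > best_score:
            best_score, best_iter = score, t
        elif t - best_iter >= estop:  # no improvement in `estop` rounds
            break
    return best_iter, best_score
```

With estop=10 and max_iter=100, as in the example above, training ends at the first round that lies 10 or more rounds past the best score seen so far.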
See test.py for more advanced examples.
Burges et al. Learning to rank using gradient descent. In Proceedings of ICML '05, pages 89–96. ACM, 2005.
Xu and Li. AdaRank: a boosting algorithm for information retrieval. In Proceedings of SIGIR '07, pages 391–398. ACM, 2007.