pyltr


pyltr is a Python learning-to-rank toolkit with ranking models, evaluation metrics, data wrangling helpers, and more.

This software is licensed under the BSD 3-clause license (see LICENSE.txt).

The author may be contacted at ma127jerry <@t> gmail with general feedback, questions, or bug reports.

Example

Import pyltr:

import pyltr

Load a LETOR dataset (e.g. MQ2007):

with open('train.txt') as trainfile, \
        open('vali.txt') as valifile, \
        open('test.txt') as evalfile:
    TX, Ty, Tqids, _ = pyltr.data.letor.read_dataset(trainfile)
    VX, Vy, Vqids, _ = pyltr.data.letor.read_dataset(valifile)
    EX, Ey, Eqids, _ = pyltr.data.letor.read_dataset(evalfile)

Train a LambdaMART model, using the validation set for early stopping and trimming:

metric = pyltr.metrics.NDCG(k=10)

# Only needed if you want to perform validation (early stopping & trimming)
monitor = pyltr.models.monitors.ValidationMonitor(
    VX, Vy, Vqids, metric=metric, stop_after=250)

model = pyltr.models.LambdaMART(
    metric=metric,
    n_estimators=1000,
    learning_rate=0.02,
    max_features=0.5,
    query_subsample=0.5,
    max_leaf_nodes=10,
    min_samples_leaf=64,
    verbose=1,
)

model.fit(TX, Ty, Tqids, monitor=monitor)

Evaluate model on test data:

Epred = model.predict(EX)
print('Random ranking:', metric.calc_mean_random(Eqids, Ey))
print('Our model:', metric.calc_mean(Eqids, Ey, Epred))

Features

Below are some of the features currently implemented in pyltr.

Models

  • LambdaMART (pyltr.models.LambdaMART)
    • Validation & early stopping
    • Query subsampling

Metrics

  • (N)DCG (pyltr.metrics.DCG, pyltr.metrics.NDCG)
    • pow2 and identity gain functions
  • ERR (pyltr.metrics.ERR)
    • pow2 and identity gain functions
  • (M)AP (pyltr.metrics.AP)
  • Kendall's Tau (pyltr.metrics.KendallTau)
  • AUC-ROC -- Area under the ROC curve (pyltr.metrics.AUCROC)
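All metrics share the calc_mean / calc_mean_random interface used in the example above. Here is a minimal sketch; the toy qids, relevance labels, and scores below are hypothetical:

import numpy as np
import pyltr

# Two queries with three documents each; rows for a given qid must be contiguous.
qids = np.array([1, 1, 1, 2, 2, 2])
y = np.array([2, 1, 0, 1, 0, 2])                    # relevance labels
scores = np.array([0.9, 0.3, 0.1, 0.2, 0.4, 0.8])   # model scores

metric = pyltr.metrics.NDCG(k=10)
print('NDCG@10:', metric.calc_mean(qids, y, scores))
print('Random baseline:', metric.calc_mean_random(qids, y))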

Data Wrangling

  • Data loaders (e.g. pyltr.data.letor.read_dataset)
  • Query groupers and validators (pyltr.util.group.check_qids, pyltr.util.group.get_groups)
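The loaders and groupers compose naturally. Below is a sketch under stated assumptions: the in-memory LETOR-format sample is hypothetical, and it assumes get_groups yields (qid, start, end) row ranges for each query block:

import io
import pyltr

# Hypothetical LETOR-format sample: '<label> qid:<id> <idx>:<val> ... # comment'
sample = io.StringIO(
    '2 qid:1 1:0.1 2:0.9 # doc A\n'
    '0 qid:1 1:0.4 2:0.2 # doc B\n'
    '1 qid:2 1:0.8 2:0.7 # doc C\n')

X, y, qids, comments = pyltr.data.letor.read_dataset(sample)

pyltr.util.group.check_qids(qids)  # raises if a qid's rows are not contiguous
for qid, start, end in pyltr.util.group.get_groups(qids):
    print(qid, y[start:end])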

Running Tests

Use the runtests_py2.sh and runtests_py3.sh scripts to run all unit tests under Python 2 and Python 3, respectively.

Building Docs

cd into the docs/ directory and run make html. Docs are generated in the docs/_build directory.

Contributing

Quality contributions and bugfixes are gratefully accepted. When submitting a pull request, please update AUTHORS.txt so you can be recognized for your work :).

By submitting a Github pull request, you consent to have your submitted code released under the terms of the project's license (see LICENSE.txt).