Interleaving

A Python library for conducting interleaving, which compares two or multiple rankers based on observed user clicks by interleaving their results.

Introduction

A/B testing is a well-known technique for comparing two or more systems based on user behavior in a production environment, and has been used to improve the quality of many services. Interleaving, an alternative to A/B testing for comparing rankings, has been shown to be roughly 100 times more efficient [1, 2]. Since efficiency matters greatly, especially when many alternatives are compared, interleaving is a promising technique for user-based ranking evaluation. This library aims to provide most of the algorithms that have been proposed in the literature.

Interleaving algorithms

Interleaving for two rankers

  • Balanced interleaving [3]
  • Team draft interleaving [4]
  • Probabilistic interleaving [5]
  • Optimized interleaving [6]

Interleaving for multiple rankers

  • Team draft multileaving [7]
  • Probabilistic multileaving [8]
  • Optimized multileaving [7]
  • Roughly optimized multileaving [9]
  • Pairwise preference multileaving [10]
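
The multileaving methods share the interface shown in the Usage section below; they simply take more than two rankings. A minimal sketch with team draft multileaving, assuming the TeamDraft class from the Usage section accepts an arbitrary number of rankings:

>>> import interleaving
>>> a = [1, 2, 3, 4, 5]  # Ranking 1
>>> b = [4, 3, 5, 1, 2]  # Ranking 2
>>> c = [2, 5, 3, 1, 4]  # Ranking 3
>>> method = interleaving.TeamDraft([a, b, c])  # three rankers at once
>>> ranking = method.interleave()
>>> clicks = [0]  # a click on the top document
>>> interleaving.TeamDraft.evaluate(ranking, clicks)  # pairs (i, j): ranker i beat ranker j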

Note that probabilistic interleaving and probabilistic multileaving use different strategies to select the ranker from which the next document is drawn. In the original papers, probabilistic interleaving samples a ranker with replacement, i.e., one of the two rankers is sampled at every document selection, whereas probabilistic multileaving samples rankers without replacement: let D be the set of all the rankers; a ranker is sampled from D without replacement, and when D becomes empty, all the rankers are put back into D. The Probabilistic class has a keyword argument replace by which either of these strategies can be selected, as the sketch below illustrates.
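
A minimal sketch of switching between the two strategies (assuming replace=True corresponds to sampling with replacement and replace=False to sampling without replacement):

>>> import interleaving
>>> a = [1, 2, 3, 4, 5]  # Ranking 1
>>> b = [4, 3, 5, 1, 2]  # Ranking 2
>>> # assumed semantics: sample a ranker with replacement at every document selection
>>> method = interleaving.Probabilistic([a, b], replace=True)
>>> # assumed semantics: sample rankers without replacement, refilling D when empty
>>> method = interleaving.Probabilistic([a, b], replace=False)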

Prerequisites

  • NumPy
  • SciPy
  • PuLP

Installation

interleaving and its prerequisites can be installed by

$ pip install git+https://github.com/mpkato/interleaving.git

An alternative is to clone the repository and install it from source:

$ git clone https://github.com/mpkato/interleaving.git
$ cd interleaving
$ python setup.py install

Usage

>>> import interleaving
>>>
>>> a = [1, 2, 3, 4, 5] # Ranking 1
>>> b = [4, 3, 5, 1, 2] # Ranking 2
>>> method = interleaving.TeamDraft([a, b]) # initialize an interleaving method
>>>
>>> ranking = method.interleave() # interleaving
>>> ranking
[1, 4, 2, 3, 5]
>>>
>>> clicks = [0, 2] # observed clicks, i.e. documents 1 and 2 are clicked
>>> result = interleaving.TeamDraft.evaluate(ranking, clicks)
>>> result # (0, 1) indicates Ranking 1 won Ranking 2.
[(0, 1)]
>>>
>>> clicks = [1, 3] # observed clicks, i.e. documents 4 and 3 are clicked
>>> result = interleaving.TeamDraft.evaluate(ranking, clicks)
>>> result # (1, 0) indicates Ranking 2 won Ranking 1.
[(1, 0)]
>>>
>>> clicks = [0, 1] # observed clicks, i.e. documents 1 and 4 are clicked
>>> result = interleaving.TeamDraft.evaluate(ranking, clicks)
>>> result # if neither (0, 1) nor (1, 0) appears, Rankings 1 and 2 tie
[]
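
A single impression rarely decides a comparison in practice; instead, the (winner, loser) pairs returned by evaluate are tallied over many interleaved impressions. A minimal sketch of such a tally, with clicks simulated uniformly at random purely for illustration:

>>> import random
>>> import interleaving
>>>
>>> a = [1, 2, 3, 4, 5]  # Ranking 1
>>> b = [4, 3, 5, 1, 2]  # Ranking 2
>>> method = interleaving.TeamDraft([a, b])
>>> wins = {0: 0, 1: 0}  # wins[i]: how many impressions ranker i won
>>> for _ in range(1000):
...     ranking = method.interleave()  # a fresh interleaved ranking per impression
...     clicks = random.sample(range(len(ranking)), 2)  # simulated clicks (illustration only)
...     for winner, loser in interleaving.TeamDraft.evaluate(ranking, clicks):
...         wins[winner] += 1
...
>>> # with uniformly random clicks, neither ranker should win much more often than the other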

Note

The ranking sampling algorithm of optimized multileaving [7] and roughly optimized multileaving [9] may take a long time, or may even run into an infinite loop. To work around this problem, this implementation supports a secure_sampling flag that limits the number of sampling attempts to sample_num.

>>> import interleaving
>>> interleaving.Optimized([[1, 2], [2, 3]], sample_num=4, secure_sampling=True)

References

  1. Chapelle et al. "Large-scale Validation and Analysis of Interleaved Search Evaluation." ACM TOIS 30.1 (2012): 6.
  2. Schuth, Hofmann, and Radlinski. "Predicting Search Satisfaction Metrics with Interleaved Comparisons." SIGIR 2015.
  3. Joachims. "Evaluating Retrieval Performance Using Clickthrough Data." Text Mining 2003.
  4. Radlinski, Kurup, and Joachims. "How Does Clickthrough Data Reflect Retrieval Quality?" CIKM 2008.
  5. Hofmann, Whiteson, and de Rijke. "A Probabilistic Method for Inferring Preferences from Clicks." CIKM 2011.
  6. Radlinski and Craswell. "Optimized Interleaving for Online Retrieval Evaluation." WSDM 2013.
  7. Schuth et al. "Multileaved Comparisons for Fast Online Evaluation." CIKM 2014.
  8. Schuth et al. "Probabilistic Multileave for Online Retrieval Evaluation." SIGIR 2015.
  9. Manabe et al. "A Comparative Live Evaluation of Multileaving Methods on a Commercial cQA Search." SIGIR 2017.
  10. Oosterhuis and de Rijke. "Sensitive and Scalable Online Evaluation with Theoretical Guarantees." CIKM 2017.

License

MIT License (see LICENSE file).
