autoqrels is a tool for automatically inferring query relevance assessments (qrels).
Currently, it supports the one-shot labeling approach (1SL) presented in MacAvaney and Soldaini, One-Shot Labeling for Automatic Relevance Estimation, SIGIR 2023.
This package adheres to the ir-measures API, which means it can be used directly by various tools, such as PyTerrier.
You can install autoqrels using pip:
pip install autoqrels
You can also work with the repository locally:
git clone https://github.com/seanmacavaney/autoqrels.git
cd autoqrels
python setup.py develop
The primary interface in autoqrels is autoqrels.Labeler. A Labeler exposes a method, infer_qrels(run, qrels), which returns a new set of qrels that covers the provided run:
- run is a Pandas DataFrame with the columns query_id (str), doc_id (str), and score (float)
- qrels is a Pandas DataFrame with the columns query_id (str), doc_id (str), and relevance (int)
- The return value is a Pandas DataFrame with the columns query_id (str), doc_id (str), and relevance (float)
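To make the column layout concrete, here is a minimal sketch of the infer_qrels(run, qrels) contract using toy data. The ConstantLabeler class is a hypothetical stand-in written for illustration only; it is not part of autoqrels and does no real relevance estimation. It simply keeps known labels and assigns a fixed value to unjudged documents, so that the shape of the inputs and the output is visible.

```python
import pandas as pd

# Toy inputs following the column layout described above.
run = pd.DataFrame({
    "query_id": ["q1", "q1", "q1"],
    "doc_id": ["d1", "d2", "d3"],
    "score": [12.5, 11.0, 9.3],
})
qrels = pd.DataFrame({
    "query_id": ["q1"],
    "doc_id": ["d1"],
    "relevance": [1],
})


class ConstantLabeler:
    """Hypothetical stand-in, NOT part of autoqrels.

    Keeps the known labels and assigns a fixed relevance to every
    unjudged (query_id, doc_id) pair that appears in the run.
    """

    def __init__(self, default=0.0):
        self.default = default

    def infer_qrels(self, run, qrels):
        # A left join keeps one row per run entry, so the result covers the run.
        merged = run[["query_id", "doc_id"]].merge(
            qrels, on=["query_id", "doc_id"], how="left")
        # Unjudged pairs come back as NaN; fill them and return float labels.
        merged["relevance"] = merged["relevance"].fillna(self.default).astype(float)
        return merged


inferred = ConstantLabeler(default=0.0).infer_qrels(run, qrels)
print(inferred)
```

A real Labeler would replace the constant with a model-based estimate, but the returned DataFrame is expected to have the same three columns.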
Labelers also expose several measure definitions compatible with ir_measures: labeler.SDCG@k, labeler.RBP(p=persistence), and labeler.P@k.
These measures calculate the corresponding effectiveness using the provided qrels supplemented with the labeler's inferred qrels. See the ir-measures documentation for more details.
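As a rough illustration of how a measure can consume float-valued inferred labels, the sketch below computes rank-biased precision (RBP) by hand with the standard formula (1 - p) * sum(rel_i * p^(i-1)). It is a standalone example under the assumption that relevance values lie in [0, 1]; it is not the implementation used by autoqrels or ir-measures, and the rbp function name is chosen here for illustration.

```python
def rbp(relevances, p=0.8):
    """Rank-biased precision over relevance values listed in ranked order.

    relevances: relevance of the document at each rank, assumed in [0, 1].
    p: persistence, the probability that the user moves on to the next rank.
    """
    # enumerate() starts at 0, so p ** rank matches p^(i-1) for 1-based rank i.
    return (1 - p) * sum(rel * p ** rank for rank, rel in enumerate(relevances))


# Relevance of the top three documents for one query, in ranked order:
# one known relevant document followed by two inferred (float) labels.
ranked_relevance = [1.0, 0.5, 0.0]
score = rbp(ranked_relevance, p=0.8)
print(round(score, 4))  # 0.28
```

Because the inferred labels are floats, a partially relevant document contributes a fractional gain rather than being rounded to 0 or 1.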
We'll now explore the available Labeler implementations.
Reproduction: See repro instructions in repro/oneshot.
One-shot labelers work over a single known relevant document per query. An error is raised if multiple relevant documents are provided.
Example:
import autoqrels
import ir_datasets
dataset = ir_datasets.load('msmarco-passage/trec-dl-2019')
duot5 = autoqrels.oneshot.DuoT5(dataset=dataset, cache_path='data/duot5.cache.json.gz')
# measures:
duot5.SDCG@10
duot5.P@10
duot5.RBP
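Since one-shot labelers expect a single known relevant document per query, it can help to check the qrels before passing them in. The sketch below shows one way to do that with Pandas. The check_one_shot helper and its min_rel threshold are hypothetical and not part of autoqrels; the library's own check and error type may differ.

```python
import pandas as pd


def check_one_shot(qrels, min_rel=1):
    """Hypothetical helper, NOT part of autoqrels.

    Raises ValueError if any query has more than one document with
    relevance >= min_rel; otherwise returns the relevant rows.
    """
    relevant = qrels[qrels["relevance"] >= min_rel]
    # Count distinct relevant documents per query.
    counts = relevant.groupby("query_id")["doc_id"].nunique()
    bad = counts[counts > 1]
    if len(bad) > 0:
        raise ValueError(
            f"multiple relevant documents for queries: {sorted(bad.index)}")
    return relevant


ok_qrels = pd.DataFrame({
    "query_id": ["q1", "q1", "q2"],
    "doc_id": ["d1", "d2", "d7"],
    "relevance": [1, 0, 1],
})
one_shot = check_one_shot(ok_qrels)  # passes: one relevant doc per query
```

If a collection has several relevant documents per query, one option is to keep only one of them per query before calling the labeler; which one to keep is a design choice left to the user.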
If you use this work, please cite:
@inproceedings{autoqrels,
author = {MacAvaney, Sean and Soldaini, Luca},
title = {One-Shot Labeling for Automatic Relevance Estimation},
booktitle = {Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval},
year = {2023},
url = {https://arxiv.org/abs/2302.11266}
}