Computes URisk, TRisk and p-values for risk-sensitive IR evaluation.
Switch branches/tags
Nothing to show
Clone or download
Latest commit 08883e6 Jan 22, 2018
Permalink
Failed to load latest commit information.
LICENSE.txt Added readme and license.txt Jan 9, 2018
README.md Fix p-value Jan 22, 2018
calc_pvalue.R Initial commit Jan 9, 2018
trisk.rb Fix p-value Jan 22, 2018
trisk_gdeval.sh Fix p-value Jan 22, 2018
trisk_trec.sh Fix p-value Jan 22, 2018

README.md

TRisk

This tool computes URisk, TRisk and p-values for either trec_eval outputs or gdeval outputs. It requires Ruby and R.

The TRisk measure was first introduced in the 2014 SIGIR paper Hypothesis Testing for the Risk-sensitive Evaluation of Retrieval Systems written by B. Taner Dinçer, Craig Macdonald and Iadh Ounis. This tool was built and used to report figures in the 2017 ADCS paper Risk-reward trade-offs in Rank Fusion by Rodger Benham and J. Shane Culpepper.

The tool supports outputting TRisk values in both inferential mode (across all topics), and exploratory mode (per-topic).

Changelog

22/01/2018 - Fixed issue where the absolute value of negative t-values was not taken, resulting in probabilities outside of [0,1]. Fixed issue where script would not work without executing inside the folder.

Usage

trec_eval

To calculate the TRisk scores for a run compared to a baseline using the AP value reported in trec_eval:

trec_eval -q -M 1000 $QRELS $RUN | grep -E "^map" > run.trec
trec_eval -q -M 1000 $QRELS $BASELINE | grep -E "^map" > baseline.trec
./trisk_trec.sh run.trec baseline.trec > out.txt

Where out.txt now appears as something similar:

alpha,urisk,trisk,pvalue
0.0,0.0767,8.8167,0.0
1.0,0.0615,5.8329,0.0
2.0,0.0462,3.6263,0.0003
3.0,0.031,2.0425,0.0422
4.0,0.0157,0.888,0.375
5.0,0.0005,0.0231,0.982
6.0,-0.0148,-0.643,1.48
7.0,-0.03,-1.1688,1.76
8.0,-0.0453,-1.5932,1.89
9.0,-0.0605,-1.942,1.95
10.0,-0.0758,-2.2334,1.97

The default mode is inferential mode, i.e. TRisk values are formulated across all topics. To get per-topic values supply --per-topic as the final argument:

./trisk_trec.sh run.trec baseline.trec --per-topic
alpha,topic,trisk,pvalue
0.0,301,0.988,0.324
0.0,302,0.484,0.629
0.0,303,-0.015,1.01
0.0,304,0.146,0.884
...

gdeval

To calculate the TRisk scores for a run compared to a baseline using the NDCG@10 evaluation metric:

perl gdeval.pl -k 10 -j 4 $QRELS $RUN > run.gdeval
perl gdeval.pl -k 10 -j 4 $QRELS $BASELINE > baseline.gdeval
./trisk_gdeval.sh run.gdeval baseline.gdeval ndcg@10 > out.txt

Similarly, supply --per-topic as the final argument as above to get exploratory values.

License

This tool is licensed under the MIT open-source. See LICENSE.txt for more details.