This tool computes URisk, TRisk and p-values for either trec_eval outputs or gdeval outputs. It requires Ruby and R.
The TRisk measure was first introduced in the 2014 SIGIR paper "Hypothesis Testing for the Risk-sensitive Evaluation of Retrieval Systems" by B. Taner Dinçer, Craig Macdonald and Iadh Ounis. This tool was built for, and used to report figures in, the 2017 ADCS paper "Risk-reward trade-offs in Rank Fusion" by Rodger Benham and J. Shane Culpepper.
The tool can output TRisk values in both inferential mode (across all topics) and exploratory mode (per topic).
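For reference, here is a minimal Ruby sketch of the inferential definitions, based on our reading of Dinçer et al. Each per-topic difference (run minus baseline) is kept unchanged when it is a win and multiplied by (1 + alpha) when it is a loss. URisk is the mean of these adjusted differences. TRisk divides URisk by its standard error, which is the sample standard deviation divided by the square root of the number of topics c. This is not the tool's own code: the method names and scores are hypothetical, and the sketch leaves out p-values and exploratory mode.

```ruby
# Hypothetical sketch of URisk/TRisk in inferential mode; not the tool's code.

# Per-topic differences, with losses weighted by (1 + alpha).
# Raises KeyError if a run topic is missing from the baseline.
def risk_adjusted_deltas(run, baseline, alpha)
  run.keys.map do |topic|
    delta = run.fetch(topic) - baseline.fetch(topic)
    delta >= 0 ? delta : (1 + alpha) * delta
  end
end

# URisk: mean of the risk-adjusted differences.
def urisk(run, baseline, alpha)
  xs = risk_adjusted_deltas(run, baseline, alpha)
  xs.sum / xs.size
end

# TRisk: URisk divided by its standard error s / sqrt(c),
# where s is the sample standard deviation (c - 1 denominator).
def trisk(run, baseline, alpha)
  xs = risk_adjusted_deltas(run, baseline, alpha)
  c = xs.size
  mean = xs.sum / c
  s = Math.sqrt(xs.sum { |x| (x - mean)**2 } / (c - 1))
  mean / (s / Math.sqrt(c))
end

# Made-up per-topic AP scores keyed by topic id.
run      = { "301" => 0.30, "302" => 0.25, "303" => 0.10, "304" => 0.40 }
baseline = { "301" => 0.20, "302" => 0.30, "303" => 0.10, "304" => 0.35 }

[0, 1].each do |alpha|
  puts format("%.1f,%.4f,%.4f", alpha, urisk(run, baseline, alpha), trisk(run, baseline, alpha))
end
```

With alpha = 0, TRisk reduces to an ordinary paired t statistic. Larger alpha values penalise losses more heavily, so both URisk and TRisk fall.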
22/01/2018 - Fixed an issue where the absolute value of negative t-values was not taken, which produced p-values outside [0, 1]. Fixed an issue where the scripts did not work unless run from inside the tool's folder.
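The first fix can be shown with a small sketch. A two-sided p-value is 2 * (1 - F(|t|)). If the absolute value is dropped, a negative t gives a number between 1 and 2, which is exactly 2 minus the correct p-value. To keep the example dependency-free, F here is the standard normal CDF computed with Ruby's `Math.erf`. This is an assumption: it is only a large-sample stand-in for the Student t distribution that a t-test would use. The method names are hypothetical.

```ruby
# Standard normal CDF via the error function (stand-in for the t CDF).
def normal_cdf(x)
  0.5 * (1 + Math.erf(x / Math.sqrt(2)))
end

# Pre-fix behaviour: a negative t yields a value in (1, 2).
def two_sided_p_unfixed(t)
  2 * (1 - normal_cdf(t))
end

# Fixed behaviour: use |t| so the result stays in [0, 1].
def two_sided_p(t)
  2 * (1 - normal_cdf(t.abs))
end

t = -0.643
puts format("unfixed: %.3f  fixed: %.3f", two_sided_p_unfixed(t), two_sided_p(t))
```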
To calculate the TRisk scores for a run compared to a baseline using the AP value reported in trec_eval:
    trec_eval -q -M 1000 $QRELS $RUN | grep -E "^map" > run.trec
    trec_eval -q -M 1000 $QRELS $BASELINE | grep -E "^map" > baseline.trec
    ./trisk_trec.sh run.trec baseline.trec > out.txt
out.txt will then contain output similar to the following:
    alpha,urisk,trisk,pvalue
    0.0,0.0767,8.8167,0.0
    1.0,0.0615,5.8329,0.0
    2.0,0.0462,3.6263,0.0003
    3.0,0.031,2.0425,0.0422
    4.0,0.0157,0.888,0.375
    5.0,0.0005,0.0231,0.982
    6.0,-0.0148,-0.643,0.52
    7.0,-0.03,-1.1688,0.24
    8.0,-0.0453,-1.5932,0.11
    9.0,-0.0605,-1.942,0.05
    10.0,-0.0758,-2.2334,0.03
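The CSV in out.txt is easy to post-process. As one hypothetical example, the Ruby sketch below finds the largest alpha at which the run still has a positive TRisk with a p-value below a chosen level. This gives one way to summarise how heavily losses can be weighted before the run stops being significantly better than the baseline. The method names and the 0.05 default are illustrative choices, not part of the tool. The sample rows are the alpha = 0.0 to 5.0 rows from the output above.

```ruby
# Parse the tool's CSV output into an array of header => value hashes.
def parse_csv_rows(text)
  lines = text.lines.map(&:strip).reject(&:empty?)
  header = lines.shift.split(",")
  lines.map { |line| header.zip(line.split(",")).to_h }
end

# Largest alpha with TRisk > 0 and p-value below `level`, or nil if none.
def largest_significant_alpha(text, level = 0.05)
  rows = parse_csv_rows(text)
  hits = rows.select { |r| Float(r["trisk"]) > 0 && Float(r["pvalue"]) < level }
  hits.map { |r| Float(r["alpha"]) }.max
end

sample = <<~CSV
  alpha,urisk,trisk,pvalue
  0.0,0.0767,8.8167,0.0
  1.0,0.0615,5.8329,0.0
  2.0,0.0462,3.6263,0.0003
  3.0,0.031,2.0425,0.0422
  4.0,0.0157,0.888,0.375
  5.0,0.0005,0.0231,0.982
CSV

puts largest_significant_alpha(sample)   # prints 3.0
```

On a real run, pass `File.read("out.txt")` instead of the inline sample.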
The default mode is inferential mode, i.e. TRisk values are computed across all topics. To get per-topic values, supply --per-topic as the final argument:
    ./trisk_trec.sh run.trec baseline.trec --per-topic

This produces one row per alpha and topic, for example:

    alpha,topic,trisk,pvalue
    0.0,301,0.988,0.324
    0.0,302,0.484,0.629
    0.0,303,-0.015,0.99
    0.0,304,0.146,0.884
    ...
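Exploratory output can be filtered in a similar way. The sketch below is a hypothetical helper, not part of the tool. For a given alpha, it lists the topics whose per-topic TRisk is negative and whose p-value is below a chosen level. These are candidate topics where the run loses to the baseline. All rows in the example are made up for illustration.

```ruby
# Topics with negative per-topic TRisk and p-value below `level` at `alpha`.
def significant_losses(text, alpha, level = 0.05)
  lines = text.lines.map(&:strip).reject(&:empty?)
  header = lines.shift.split(",")
  rows = lines.map { |line| header.zip(line.split(",")).to_h }
  matches = rows.select do |r|
    Float(r["alpha"]) == alpha &&
      Float(r["trisk"]) < 0 &&
      Float(r["pvalue"]) < level
  end
  matches.map { |r| r["topic"] }
end

# Made-up exploratory output in the alpha,topic,trisk,pvalue layout.
per_topic = <<~CSV
  alpha,topic,trisk,pvalue
  0.0,301,1.120,0.263
  0.0,302,-2.310,0.021
  1.0,301,0.950,0.342
  1.0,302,-3.050,0.002
  1.0,303,-1.200,0.230
CSV

p significant_losses(per_topic, 1.0)   # prints ["302"]
```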
To calculate the TRisk scores for a run compared to a baseline using the NDCG@10 evaluation metric:
    perl gdeval.pl -k 10 -j 4 $QRELS $RUN > run.gdeval
    perl gdeval.pl -k 10 -j 4 $QRELS $BASELINE > baseline.gdeval
    ./trisk_gdeval.sh run.gdeval baseline.gdeval ndcg@10 > out.txt
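The third argument to trisk_gdeval.sh names a metric column. We assume gdeval.pl's usual CSV layout: a `runid,topic,ndcg@10,err@10` header, one row per topic, and a final summary row whose topic field is `amean`. Under that assumption, a per-topic column can be read with the hypothetical Ruby helper below. It is not taken from the tool, and the scores are made up.

```ruby
# Read one metric column (e.g. "ndcg@10") from gdeval-style CSV into a
# topic => score hash, skipping the "amean" summary row (assumed layout).
def gdeval_column(text, metric)
  lines = text.lines.map(&:strip).reject(&:empty?)
  header = lines.shift.split(",")
  topic_col = header.index("topic") or raise ArgumentError, "no topic column"
  col = header.index(metric) or raise ArgumentError, "no column #{metric}"
  lines.each_with_object({}) do |line, scores|
    fields = line.split(",")
    next if fields[topic_col] == "amean"
    scores[fields[topic_col]] = Float(fields[col])
  end
end

# Made-up gdeval-style output for a single run.
run_gdeval = <<~CSV
  runid,topic,ndcg@10,err@10
  myrun,201,0.41230,0.20110
  myrun,202,0.10500,0.05020
  myrun,amean,0.25865,0.12565
CSV

p gdeval_column(run_gdeval, "ndcg@10")
```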
As with trisk_trec.sh, supply --per-topic as the final argument to get exploratory (per-topic) values.
This tool is licensed under the MIT License. See LICENSE.txt for more details.