Provides tools to measure the reliability of an Information Retrieval test collection. It allows users to estimate reliability using Generalizability Theory and map those estimates onto well-known indicators such as Kendall tau correlation or sensitivity.
For a full background please refer to Julián Urbano, Mónica Marrero and Diego Martín, "On the Measurement of Test Collection Reliability", ACM SIGIR, 2013.
You may install the stable release from CRAN
install.packages("gt4ireval")
or the latest development version from GitHub
devtools::install_github("julian-urbano/gt4ireval", ref = "develop")
A full user manual in available in the package vignette.
As a very simple example, we can analyze the TREC-3 Ad hoc data:
head(adhoc3)
# sys1 sys2 sys3 sys4 sys5 sys6 sys7 ...
# 1 0.2830 0.5163 0.4810 0.5737 0.5184 0.4945 0.5013 ...
# 2 0.0168 0.5442 0.3987 0.2964 0.6115 0.2354 0.1689 ...
# ...
We first run a G-study,
ah3.g <- gstudy(adhoc3, drop = 0.25)
ah3.g
# Summary of G-Study
#
# Systems Queries Interaction
# ----------- ----------- -----------
# Variance 0.0028117 0.028093 0.010152
# Variance(%) 6.8482 68.425 24.727
# ---
# Mean Sq. 0.15074 0.85296 0.010152
# Sample size 30 50 1500
then a D-study,
dstudy(ah3.g, queries = c(50, 100, 150))
# Summary of D-Study
#
# Call:
# queries = 50
# stability = 0.95
# alpha = 0.025
#
# Stability:
# Erho2 Phi
# ----------------------------------- -----------------------------------
# Queries Expected Lower Upper Expected Lower Upper
# ----------- ----------- ----------- ----------- ----------- ----------- -----------
# 50 0.93265 0.89311 0.96287 0.78613 0.66141 0.88039
# 100 0.96515 0.94354 0.98109 0.88026 0.79621 0.93639
# 150 0.97649 0.96164 0.98731 0.91686 0.85423 0.95668
#
# Required number of queries:
# Erho2 Phi
# ----------------------------------- -----------------------------------
# Stability Expected Lower Upper Expected Lower Upper
# ----------- ----------- ----------- ----------- ----------- ----------- -----------
# 0.95 69 37 114 259 130 487
and possibly map onto AP correlation, for instance,
gt2tauAP(Erho2 = c(0.93, 0.95, 0.98))
# [1] 0.7487836 0.8150692 0.9226192
gt4ireval
is released under the terms of the MIT License.
If you use this code in your work, please cite the following paper:
@inproceedings{Urbano2013measurement,
author = {Urbano, Juli\'{a}n and Marrero, M\'{o}nica and Mart\'{\i}n, Diego},
booktitle = {International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages = {393--402},
title = {{On the Measurement of Test Collection Reliability}},
year = {2013}
}
This work is supported by an A4U postdoctoral grant and a Juan de la Cierva postdoctoral fellowship.