Correlation Coefficients for Information Retrieval

ircor

ircor implements several correlation coefficients commonly used in Information Retrieval, such as the Kendall and AP correlation coefficients, with and without ties.

For details, see Julián Urbano and Mónica Marrero, "The Treatment of Ties in AP Correlation", ACM ICTIR, 2017.

Installation

You can install the stable release from CRAN:

install.packages("ircor")

or the latest development version from GitHub:

devtools::install_github("julian-urbano/ircor", ref = "develop")

Usage

tau and tauAP implement the Kendall tau and Yilmaz tauAP correlation coefficients, where no ties are allowed between items:

library(ircor)

x <- c(0.06, 0.2, 0.27, 0.37, 0.57, 0.63, 0.66, 0.9, 0.91, 0.94)
y <- c(0.37, 0.06, 0.2, 0.27, 0.57, 0.66, 0.63, 0.91, 0.9, 0.94)
tau(x,y)
# 0.7777778
tauAP(x,y)
# 0.7491182
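
To see where the tau value comes from, the following base R sketch counts concordant and discordant pairs by hand. It is only an illustration of the textbook Kendall tau definition, not the code ircor runs; the names `pairs`, `agree` and `tau_sketch` are made up for this example.

```r
x <- c(0.06, 0.2, 0.27, 0.37, 0.57, 0.63, 0.66, 0.9, 0.91, 0.94)
y <- c(0.37, 0.06, 0.2, 0.27, 0.57, 0.66, 0.63, 0.91, 0.9, 0.94)

# All 45 unordered pairs of items, one pair per column.
pairs <- combn(length(x), 2)
# +1 when x and y order the pair the same way, -1 when they disagree.
agree <- sign(x[pairs[1, ]] - x[pairs[2, ]]) * sign(y[pairs[1, ]] - y[pairs[2, ]])

concordant <- sum(agree > 0)  # 40
discordant <- sum(agree < 0)  # 5
tau_sketch <- (concordant - discordant) / ncol(pairs)
tau_sketch
# 0.7777778
```

Without ties this is the same number that base R returns for `cor(x, y, method = "kendall")`.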

With tauAP it is important to use the correct sorting order, because disagreements near the top of the ranking are penalized more than disagreements near the bottom. By default, items are sorted in decreasing order, which is appropriate when, for instance, the scores represent system effectiveness. If items should instead be sorted in increasing order, set decreasing to FALSE:

# these two calls are equivalent
tauAP(x,y)
# 0.7491182
tauAP(-x,-y, decreasing = FALSE)
# 0.7491182
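
The effect of the sorting order can be checked with a small base R sketch of the tauAP definition as it is usually stated: walk down the ranking given by y and, at each rank, take the fraction of items placed above that x also places above. The helper `tau_ap_sketch` is hypothetical and written only for this illustration; it assumes there are no ties and that x plays the role of the reference ranking. It is not the ircor source, although it gives the same value as above on this data.

```r
# Hypothetical helper for illustration only; not the ircor implementation.
tau_ap_sketch <- function(x, y, decreasing = TRUE) {
  n <- length(x)
  # Walk down the ranking induced by y, best item first.
  ord <- order(y, decreasing = decreasing)
  # Flip the sign for increasing order so that "better" always means "larger".
  truth <- if (decreasing) x[ord] else -x[ord]
  # For each rank i >= 2: fraction of items above i that x also ranks above it.
  frac <- sapply(2:n, function(i) mean(truth[seq_len(i - 1)] > truth[i]))
  2 / (n - 1) * sum(frac) - 1
}

x <- c(0.06, 0.2, 0.27, 0.37, 0.57, 0.63, 0.66, 0.9, 0.91, 0.94)
y <- c(0.37, 0.06, 0.2, 0.27, 0.57, 0.66, 0.63, 0.91, 0.9, 0.94)

tau_ap_sketch(x, y)                        # 0.7491182
tau_ap_sketch(-x, -y, decreasing = FALSE)  # 0.7491182, same ranking read the other way
tau_ap_sketch(x, y, decreasing = FALSE)    # 0.712963, wrong direction changes the result
```

The last call shows why the direction matters: the swaps among the low-scoring items count for little when those items sit at the bottom of the ranking, but for much more when the ranking is read from the other end.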

tau_a and tauAP_a are the versions to use when x represents a true ranking without ties, and y represents a ranking estimated by an observer who is allowed to produce ties. They can be used as a measure of the accuracy of the observer with respect to the true ranking:

y <- round(y*5)/5 # simulate ties in y
tau_a(x,y)
# 0.7111111
tauAP_a(x,y)
# 0.6074515
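
For the Kendall case, the value can be reproduced with a base R sketch of the standard tau-a formula, in which a pair tied in y counts as neither concordant nor discordant while the denominator remains the total number of pairs. The variable names are made up for this example, and the sketch does not cover tauAP_a, whose treatment of ties is described in the paper cited above.

```r
x <- c(0.06, 0.2, 0.27, 0.37, 0.57, 0.63, 0.66, 0.9, 0.91, 0.94)
y <- round(c(0.37, 0.06, 0.2, 0.27, 0.57, 0.66, 0.63, 0.91, 0.9, 0.94) * 5) / 5

pairs <- combn(length(x), 2)
agree <- sign(x[pairs[1, ]] - x[pairs[2, ]]) * sign(y[pairs[1, ]] - y[pairs[2, ]])

concordant <- sum(agree > 0)   # 36
discordant <- sum(agree < 0)   # 4
tied_in_y  <- sum(agree == 0)  # 5 (x has no ties, so every zero comes from y)

# Tied pairs earn no credit, but they still count in the denominator.
tau_a_sketch <- (concordant - discordant) / ncol(pairs)
tau_a_sketch
# 0.7111111
```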

tau_b and tauAP_b are versions to use under the assumption that both x and y represent rankings estimated by two observers who may produce ties. They can be used as a measure of agreement between the observers:

x <- round(x*5)/5 # simulate ties in x as well
tau_b(x,y)
# 0.75
tauAP_b(x,y)
# 0.6269841
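
The Kendall value can again be reproduced with a base R sketch, this time of the standard tau-b formula: the numerator is unchanged, but pairs tied in x and pairs tied in y are removed from the corresponding factor of the denominator. The variable names are made up for this example, and tauAP_b is not covered by the sketch.

```r
x <- round(c(0.06, 0.2, 0.27, 0.37, 0.57, 0.63, 0.66, 0.9, 0.91, 0.94) * 5) / 5
y <- round(c(0.37, 0.06, 0.2, 0.27, 0.57, 0.66, 0.63, 0.91, 0.9, 0.94) * 5) / 5

pairs <- combn(length(x), 2)
dx <- sign(x[pairs[1, ]] - x[pairs[2, ]])
dy <- sign(y[pairs[1, ]] - y[pairs[2, ]])

numerator <- sum(dx * dy)   # concordant minus discordant: 34 - 4 = 30
untied_x  <- sum(dx != 0)   # 40 pairs not tied in x
untied_y  <- sum(dy != 0)   # 40 pairs not tied in y

tau_b_sketch <- numerator / sqrt(untied_x * untied_y)
tau_b_sketch
# 0.75
```

Because neither observer is penalized for ties in the denominator, the coefficient is symmetric in x and y, which fits its use as a measure of agreement rather than accuracy.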

License

ircor is released under the terms of the MIT License.

If you use this code in your work, please cite the following paper:

@inproceedings{urbano2017ties,
  author = {Urbano, Juli{\'{a}}n and Marrero, M{\'{o}}nica},
  booktitle = {ACM SIGIR International Conference on the Theory of Information Retrieval},
  pages = {321--324},
  title = {{The Treatment of Ties in AP Correlation}},
  year = {2017}
}