Suite of universal indexes for Highly Repetitive Document Collections

uiHRDC

uiHRDC is a C/C++ reproducibility framework which comprises a varied set of techniques for indexing highly repetitive document collections, and all scripts required to replicate the experimental setup proposed in:

  • F. Claude, A. Fariña, M.A. Martinez-Prieto, and G. Navarro. Universal Indexes for Highly Repetitive Document Collections. Information Systems 61:1-23, 2016. (https://doi.org/10.1016/j.is.2016.04.002)

uiHRDC includes non-positional and positional inverted indexes, which apply multiple forms of compression, and three families of self-indexes. A more detailed description of all these techniques can be found in the aforementioned paper.

This repository contains a Dockerfile that builds the reproducibility environment, including all dependencies required to compile and run our (self-)indexes. The uiHRDC folder contains the corresponding sources and also provides some test collections and query patterns for evaluating different retrieval operations.

More information about each proposed technique can be found in its directory. If you have any questions or need further information, please feel free to contact us:


NOTE: We have also been invited to write a reproducibility companion paper in Information Systems. This second paper gives a brief summary of the techniques in the previous paper and includes further details on how our experiments can be reproduced using the uiHRDC framework. For more details, please see/cite:

  • A. Fariña, M.A. Martinez-Prieto, F. Claude, and G. Navarro. On the Reproducibility of Experiments of Indexing Repetitive Document Collections. Information Systems. To appear, 2019.