Suite of universal indexes for Highly Repetitive Document Collections

migumar2/uiHRDC

uiHRDC

uiHRDC is a C/C++ reproducibility framework which comprises a varied set of techniques for indexing highly repetitive document collections, and all scripts required to replicate the experimental setup proposed in:

  • F. Claude, A. Fariña, M.A. Martinez-Prieto, and G. Navarro. Universal Indexes for Highly Repetitive Document Collections. Information Systems. Volume 61, pages 1-23, 2016. (https://doi.org/10.1016/j.is.2016.04.002)

uiHRDC includes non-positional and positional inverted indexes, which apply multiple forms of compression, and three families of self-indexes. A more detailed description of all these techniques can be found in the aforementioned paper.
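
To make the distinction between the two inverted-index flavours concrete, here is a minimal C++ sketch. It is not taken from the uiHRDC sources: the type and function names (`NonPositionalIndex`, `PositionalIndex`, `build_non_positional`, `build_positional`, `to_gaps`) are hypothetical, documents are assumed to be whitespace-separated words, and gap encoding is shown only as one common first step before compressing posting lists, not necessarily the scheme used by any index in this repository.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Illustrative types only; these names do not come from the uiHRDC sources.
using DocId = std::uint32_t;
using Position = std::uint32_t;
using NonPositionalIndex = std::map<std::string, std::vector<DocId>>;
using PositionalIndex =
    std::map<std::string, std::vector<std::pair<DocId, Position>>>;

// Non-positional index: each word maps to the increasing list of
// documents that contain it (one entry per document).
NonPositionalIndex build_non_positional(const std::vector<std::string>& docs) {
    NonPositionalIndex index;
    for (DocId d = 0; d < docs.size(); ++d) {
        std::istringstream words(docs[d]);
        std::string w;
        while (words >> w) {
            auto& postings = index[w];
            if (postings.empty() || postings.back() != d) postings.push_back(d);
        }
    }
    return index;
}

// Positional index: each word maps to every (document, word offset)
// pair at which it occurs.
PositionalIndex build_positional(const std::vector<std::string>& docs) {
    PositionalIndex index;
    for (DocId d = 0; d < docs.size(); ++d) {
        std::istringstream words(docs[d]);
        std::string w;
        Position p = 0;
        while (words >> w) index[w].emplace_back(d, p++);
    }
    return index;
}

// Gap (delta) encoding of an increasing posting list: store each value as
// the difference from its predecessor (an assumed, generic preprocessing step).
std::vector<DocId> to_gaps(const std::vector<DocId>& postings) {
    std::vector<DocId> gaps;
    gaps.reserve(postings.size());
    DocId prev = 0;
    for (DocId id : postings) {
        gaps.push_back(id - prev);
        prev = id;
    }
    return gaps;
}
```

For the three toy documents "a b a", "b c", and "a c", the non-positional list for "a" holds documents 0 and 2, while the positional list additionally records that "a" appears at word offsets 0 and 2 of document 0 and at offset 0 of document 2. The small gaps produced by `to_gaps` are what a subsequent integer code would compress.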

This repository contains a Dockerfile that creates the reproducibility environment, including all dependencies required to compile and run our (self-)indexes. The folder uiHRDC holds the corresponding sources, along with some test collections and query patterns for evaluating different retrieval operations.

More information about each proposed technique can be found in its directory. If you have any questions or need more information, please feel free to contact us:


NOTE: We were also invited to write a reproducibility companion paper in Information Systems. This second paper gives a brief summary of the techniques in the previous paper and includes further details on how our experiments can be reproduced using the uiHRDC framework. For more details, please see and cite:

  • A. Fariña, M.A. Martinez-Prieto, F. Claude, G. Navarro, J.J. Lastra-Díaz, N. Prezza, and D. Seco. On the Reproducibility of Experiments of Indexing Repetitive Document Collections. Information Systems. Volume 83, pages 181-194, 2019. (https://doi.org/10.1016/j.is.2019.03.007)
