Sample dataset for use with HiCAL

Description of sampled test collection

The text collection in this repository is a sample of the Athome4 collection, which was used in the TREC 2016 Total Recall Track [1]. The original collection contains 290,000 emails from Jeb Bush and 34 topics.

For this sampled test collection we provide 9 topics (athome4.topics.sample), 50,000 sampled documents (athome4_sample.tgz), and the corresponding sampled relevance judgments (athome4.qrel.sample).
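The sketch below assumes that athome4.qrel.sample follows the standard four-column TREC qrels layout (topic, iteration, document ID, relevance). This layout has not been checked against the file itself. Under that assumption, a minimal Python loader could look like this. The helpers load_qrels and relevant_docs are hypothetical and are not part of this repository.

```python
from collections import defaultdict


def load_qrels(path):
    """Parse a TREC-style qrels file into {topic: {docno: relevance}}.

    Assumption: each line has four whitespace-separated columns:
    topic, iteration (ignored), docno, relevance.
    """
    qrels = defaultdict(dict)
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if len(fields) != 4:
                continue  # skip blank or malformed lines
            topic, _iteration, docno, rel = fields
            qrels[topic][docno] = int(rel)
    return dict(qrels)


def relevant_docs(qrels, topic):
    """Return the set of docnos judged relevant (relevance > 0) for a topic."""
    return {doc for doc, rel in qrels.get(topic, {}).items() if rel > 0}
```

For example, `load_qrels("athome4.qrel.sample")` would return a dictionary keyed by topic ID. Calling `relevant_docs` on that dictionary with a topic ID then gives the positively judged documents for that topic.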

Extract paragraphs from the full documents

python3 process.py athome4_sample.tgz
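The following is a hypothetical sketch of paragraph extraction and is not necessarily how process.py works. It assumes that every regular file in athome4_sample.tgz is one plain-text document and that blank lines separate paragraphs. It uses only the standard library `tarfile` and `re` modules. The function name iter_paragraphs is illustrative.

```python
import re
import tarfile


def iter_paragraphs(archive_path):
    """Yield (member_name, paragraph_index, paragraph_text) from a .tgz archive.

    Hypothetical sketch: assumes each regular file is one plain-text document
    and paragraphs are separated by one or more blank lines.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, etc.
            text = tar.extractfile(member).read().decode("utf-8", errors="replace")
            text = text.replace("\r\n", "\n")  # normalise Windows line endings
            # Split on a newline, optional whitespace-only lines, then a newline.
            blocks = re.split(r"\n\s*\n", text)
            paragraphs = [b.strip() for b in blocks if b.strip()]
            for i, para in enumerate(paragraphs):
                yield member.name, i, para
```

Because it is a generator, it reads one document at a time instead of extracting the whole archive to disk.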

[1] Grossman, Maura R., Gordon V. Cormack, and Adam Roegiest. "TREC 2016 Total Recall Track Overview." TREC. 2016.