CoreferentMentionRetrieval

This is the test collection for the task of Coreferent Mention Retrieval defined by Sankepally et al. in "A Test Collection for Coreferent Mention Retrieval." ACM SIGIR 2018.
The files in the collection and their contents are as follows:

  • doc_ids.txt : The IDs of the subset of documents from the TAC 2014 EDL collection (LDC catalog number: LDC2014E13) used in this test collection.
  • doc_sentence_char_offsets.tar.gz : A compressed archive containing the sentence boundaries (150 MB when uncompressed). For every document in doc_ids.txt, it gives the character offsets of each sentence boundary in this format: [docid]:sentence_start_offset:sentence_ending_offset [unique sentence identifier].
  • CMR_char_offset_queries.txt : Each line has a mention query in the following format:
    [query ID] [query type] [doc_id:sentence_start_offset:sentence_ending_offset:token_start_offset:token_ending_offset] [query mention string]
  • CMR_char_offset_qrels.txt : Each line holds a binary relevance judgment of a sentence for a mention query, in the following format:
    [query ID] [query type] [doc_id:sentence_start_offset:sentence_ending_offset] [binary relevance judgment]
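The three line formats above can be read with a small parser. The following is a minimal Python sketch, not part of the collection. It assumes that the square brackets are placeholders rather than literal characters, that fields are separated by whitespace, and that the mention string is the only field that may contain spaces. The dataclass and function names (`Sentence`, `parse_query_line`, `read_sentence_archive`, and so on) are hypothetical. `read_sentence_archive` further assumes that every regular file inside doc_sentence_char_offsets.tar.gz is UTF-8 text in the sentence-boundary format.

```python
import tarfile
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Sentence:
    doc_id: str
    start: int
    end: int
    sentence_id: str

    @property
    def key(self) -> str:
        # Same "doc_id:start:end" form as the third field of the qrels file.
        return f"{self.doc_id}:{self.start}:{self.end}"


@dataclass(frozen=True)
class MentionQuery:
    query_id: str
    query_type: str
    doc_id: str
    sentence_start: int
    sentence_end: int
    token_start: int
    token_end: int
    mention: str


@dataclass(frozen=True)
class Judgment:
    query_id: str
    query_type: str
    sentence_key: str  # "doc_id:sentence_start_offset:sentence_ending_offset"
    relevant: bool


def parse_sentence_line(line: str) -> Sentence:
    # "docid:start:end sentence_identifier"; offsets are split off from the
    # right, so a colon inside a document ID would not break parsing.
    key, sentence_id = line.strip().split(maxsplit=1)
    doc_id, start, end = key.rsplit(":", 2)
    return Sentence(doc_id, int(start), int(end), sentence_id)


def parse_query_line(line: str) -> MentionQuery:
    # The mention string is the last field and may itself contain spaces.
    query_id, query_type, location, mention = line.strip().split(maxsplit=3)
    doc_id, s_start, s_end, t_start, t_end = location.rsplit(":", 4)
    return MentionQuery(query_id, query_type, doc_id, int(s_start), int(s_end),
                        int(t_start), int(t_end), mention)


def parse_qrels_line(line: str) -> Judgment:
    query_id, query_type, sentence_key, relevance = line.split()
    return Judgment(query_id, query_type, sentence_key, int(relevance) > 0)


def read_records(lines: Iterable[str], parse: Callable[[str], T]) -> Iterator[T]:
    # Skip blank lines and parse every other line with the given parser.
    for line in lines:
        if line.strip():
            yield parse(line)


def read_sentence_archive(path: str) -> Iterator[Sentence]:
    # Assumption: each regular file in the archive is UTF-8 text
    # in the sentence-boundary line format.
    with tarfile.open(path, "r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            with handle:
                text = (raw.decode("utf-8") for raw in handle)
                yield from read_records(text, parse_sentence_line)
```

Because `Sentence.key` has the same shape as the third qrels field, judgments can be joined to sentence boundaries with a plain dictionary lookup, for example `{s.key: s for s in read_sentence_archive("doc_sentence_char_offsets.tar.gz")}`.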

Evaluation

You can use the latest version of trec_eval for evaluation.
After making sure that your results file is in TREC format and contains no duplicate lines, run:
./trec_eval -q -M100 -m infAP CMR_char_offset_qrels.txt [your_result_file_name]
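A results file can be produced with a short helper like the following minimal Python sketch. It assumes that the retrieved unit in the run is the same doc_id:sentence_start_offset:sentence_ending_offset string used in CMR_char_offset_qrels.txt, and it writes the standard six-column TREC run line (query ID, the literal Q0, document, rank, score, run tag). The function names, the default run tag "my_run", and the default depth of 100 (chosen to match -M100) are hypothetical. It also reads "no duplicate lines" strictly: each sentence appears at most once per query, keeping its highest score.

```python
from typing import Dict, Iterable, List, Tuple


def format_trec_run(results: Iterable[Tuple[str, str, float]],
                    run_tag: str = "my_run", depth: int = 100) -> List[str]:
    """Turn (query_id, sentence_key, score) triples into TREC run lines.

    Each line is "query_id Q0 sentence_key rank score run_tag". Repeated
    (query_id, sentence_key) pairs are collapsed to their highest score,
    so no sentence is listed twice for the same query.
    """
    best: Dict[str, Dict[str, float]] = {}
    for query_id, sentence_key, score in results:
        per_query = best.setdefault(query_id, {})
        if sentence_key not in per_query or score > per_query[sentence_key]:
            per_query[sentence_key] = score

    lines: List[str] = []
    for query_id in sorted(best):
        # Highest score first; ties broken by sentence key for a stable order.
        ranked = sorted(best[query_id].items(), key=lambda kv: (-kv[1], kv[0]))
        for rank, (sentence_key, score) in enumerate(ranked[:depth], start=1):
            lines.append(f"{query_id} Q0 {sentence_key} {rank} {score:.6f} {run_tag}")
    return lines


def write_trec_run(results: Iterable[Tuple[str, str, float]], path: str,
                   run_tag: str = "my_run", depth: int = 100) -> None:
    # Write one run line per retrieved sentence, newline-terminated.
    with open(path, "w", encoding="utf-8") as out:
        for line in format_trec_run(results, run_tag, depth):
            out.write(line + "\n")
```

Since -M100 already limits evaluation to the top 100 results per query, the truncation to `depth` only keeps the file small. The written file can then be passed to trec_eval as [your_result_file_name].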
