TAC KBP English Entity Linking Comprehensive Training and Evaluation Data 2010
README.md
TAC_KBP_English_Entity_Linking_Comprehensive_Training_and_Evaluation_Data_2010.7z


What is it about?

The Knowledge Base Population (KBP) Track at TAC 2010 will explore the extraction of information about entities with reference to an external knowledge source. Using a basic schema for persons, organizations, and locations, nodes in an ontology must be created and populated using unstructured information found in text. A collection of Wikipedia Infoboxes will serve as a rudimentary initial knowledge representation.

What does it contain?

./contents.txt

./eval/tac_kbp_2010_english_entity_linking_evaluation_queries.xml

This file contains 2250 queries. Each query entry consists of the following fields:

<query id> - A query ID formatted as the letters "EL" plus
             a six-digit zero-padded integer (e.g., "EL000014").

<name>     - The full namestring of the query entity.

<docid>    - An ID for a document in ./data/2010/eval/source_documents/
             from which the namestring was extracted.
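As a sketch of reading this file, the Python snippet below collects each query's name and docid and checks the six-digit ID format described above. It assumes, rather than confirms, that each query is a `<query id="...">` element with `<name>` and `<docid>` child elements; the function name `load_eval_queries` is illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Evaluation query IDs: "EL" plus a six-digit zero-padded integer, e.g. "EL000014".
EVAL_QUERY_ID = re.compile(r"EL\d{6}")


def load_eval_queries(source):
    """Return {query_id: (name, docid)} from an evaluation queries XML file.

    `source` is a path or file object. Assumes (not confirmed by this README)
    that each query is a <query id="..."> element with <name> and <docid>
    children; the name of the root element is not relied upon.
    """
    root = ET.parse(source).getroot()
    queries = {}
    for query in root.iter("query"):
        qid = query.get("id", "")
        if not EVAL_QUERY_ID.fullmatch(qid):
            raise ValueError(f"unexpected query ID: {qid!r}")
        queries[qid] = (query.findtext("name"), query.findtext("docid"))
    return queries
```

If the assumed layout holds, calling `load_eval_queries` on `tac_kbp_2010_english_entity_linking_evaluation_queries.xml` should yield 2250 entries.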

./eval/tac_kbp_2010_english_entity_linking_evaluation_KB_links.tab

This file contains the responses for each query as identified by human annotators at LDC. The KB links file is tab-delimited, with 5 fields total. The column descriptions are as follows:

1. query ID    - The ID for the query detailed in
                 tac_kbp_2010_english_entity_linking_evaluation_queries.xml
                 to which the subsequent information pertains

2. entity ID   - An entity node ID or a unique NIL ID, corresponding
                 to the entity linking annotation and the NIL
                 coreference (clustering) annotation, respectively.
                 If the entity node ID begins with "E", the text
                 refers to an entity in the Knowledge Base (TAC KBP
                 Reference Knowledge Base - LDC2014T16). If the given
                 query is not linked to an entity in the Knowledge
                 Base (KB), it is given a NIL ID, which consists of
                 "NIL" plus a sequentially assigned four-digit
                 zero-padded integer (e.g. NIL0001, NIL0002). Both
                 "E" IDs and NIL IDs are clustered by coreference:
                 queries that refer to the same entity share the same
                 "E" ID or the same NIL ID, and each distinct "E" ID
                 or NIL ID denotes a distinct entity.

3. entity-type - GPE, ORG, or PER type indicator for the entity

4. web-search  - (YES/NO) indicating whether the annotator made
                 use of web searches in order to make the linking
                 judgment.

5. genre       - WL/NW indicating the source genre of the
                 document for the query (WL for web data, NW for
                 newswire data).
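The five columns above map naturally onto a small record type. The following Python sketch parses the file line by line, checks that column 2 is either an "E" node ID or a NIL ID of the documented form, and groups query IDs into coreference clusters by entity ID. It assumes the file has no header row; the names `EvalLink`, `read_eval_links`, and `clusters` are hypothetical.

```python
import re
from collections import defaultdict
from typing import NamedTuple

# NIL IDs: "NIL" plus a four-digit zero-padded integer, e.g. "NIL0001".
NIL_ID = re.compile(r"NIL\d{4}")


class EvalLink(NamedTuple):
    query_id: str
    entity_id: str    # KB node ID (starts with "E") or a NIL cluster ID
    entity_type: str  # GPE, ORG, or PER
    web_search: bool  # True if the annotator used web searches
    genre: str        # WL (web data) or NW (newswire)


def read_eval_links(lines):
    """Parse the 5-column, tab-delimited evaluation KB links file.

    Assumes no header row; blank lines are skipped.
    """
    links = []
    for line in lines:
        if not line.strip():
            continue
        qid, eid, etype, web, genre = line.rstrip("\r\n").split("\t")
        if not (eid.startswith("E") or NIL_ID.fullmatch(eid)):
            raise ValueError(f"{qid}: unexpected entity ID {eid!r}")
        links.append(EvalLink(qid, eid, etype, web == "YES", genre))
    return links


def clusters(links):
    """Group query IDs by entity ID; each "E" ID or NIL ID is one cluster."""
    groups = defaultdict(list)
    for link in links:
        groups[link.entity_id].append(link.query_id)
    return dict(groups)
```

Because "E" IDs and NIL IDs share one namespace of cluster labels, a single dictionary keyed on column 2 is enough to recover both the linked and the NIL clusters.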

./eval/source_documents/*

This directory contains all of the source documents listed in tac_kbp_2010_english_entity_linking_evaluation_queries.xml.

./data/2010/IAA_study_results/tac_kbp_2010_english_entity_linking_IAA_queries_ann{1,2,3}.xml

These files contain the queries used in the TAC 2010 KBP Entity Linking Inter-Annotator Agreement Study. These queries were selected from the TAC 2010 KBP entity linking evaluation query set. To perform the annotation for the study, LDC recruited 3 annotators who had not previously worked on KBP-related annotation.

Each of these three files contains the same 200 queries. Each query entry consists of the following fields:

<query id> - A query ID formatted as the letters "EL" plus
             a six-digit zero-padded integer (e.g., "EL000001").

<name>     - The full namestring of the query entity.

<docid>    - An ID for a document in ./data/2010/IAA_study_results/source_documents/
             from which the namestring was extracted.

<entity>   - An entity node ID or 'NIL'. Entity node IDs that 
             begin with "E" refer to an entity in the Knowledge 
             Base (TAC KBP Reference Knowledge Base - LDC2014T16).
             If the given query is not linked to an entity in the 
             Knowledge Base (KB), it is marked 'NIL'. This is
             the only field whose contents may differ among
             the three query files.
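Since `<entity>` is the only field that varies across the three annotators' files, agreement can be measured on that field alone. The Python sketch below computes simple percent agreement (pairwise and unanimous); it is not a chance-corrected statistic such as a kappa, and it is not necessarily the measure LDC reported for this study. It assumes `<query id="...">` elements with an `<entity>` child; `load_entities` and `percent_agreement` are illustrative names.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def load_entities(source):
    """Return {query_id: entity} from one IAA queries file (path or file object).

    Assumes <query id="..."> elements with an <entity> child.
    """
    root = ET.parse(source).getroot()
    return {q.get("id"): q.findtext("entity") for q in root.iter("query")}


def percent_agreement(annotations):
    """Raw (not chance-corrected) agreement on the <entity> field.

    `annotations` holds one {query_id: entity} dict per annotator. Returns
    pairwise agreement keyed by 1-based annotator numbers, plus the share of
    queries on which all annotators agree. Only shared query IDs are compared.
    """
    shared = set.intersection(*(set(a) for a in annotations))
    pairwise = {}
    for i, j in combinations(range(len(annotations)), 2):
        same = sum(annotations[i][q] == annotations[j][q] for q in shared)
        pairwise[(i + 1, j + 1)] = same / len(shared)
    unanimous = sum(len({a[q] for a in annotations}) == 1 for q in shared)
    return pairwise, unanimous / len(shared)
```

A typical call would load the three files, e.g. `percent_agreement([load_entities(f"IAA_study_results/tac_kbp_2010_english_entity_linking_IAA_queries_ann{n}.xml") for n in (1, 2, 3)])`, adjusting the path to wherever the archive is extracted.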

./IAA_study_results/tac_kbp_2010_english_entity_linking_IAA_judgments_ann{1,2,3}.tab

These files contain the responses for each query as identified by the annotators who participated in the IAA study. The judgments files are tab-delimited, with 3 fields total. The column descriptions are as follows:

1. query ID    - The ID for the query detailed in
                 tac_kbp_2010_english_entity_linking_IAA_queries_ann{1,2,3}.xml
                 to which the subsequent information pertains

2. judgment    - (YES/NO/MAYBE/BAD) indicating the annotator's
                 judgment of whether: the entity has a valid KB
                 link (YES); the entity does not have a valid KB
                 link (NO); the annotator is unsure about the KB
                 linking (MAYBE); or the annotator has judged the
                 entity to be unsuitable for entity linking (BAD)

3. use-web     - (YES/NO) indicating whether the annotator made
                 use of web searches in order to make the
                 judgment in column 2
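To summarize one annotator's judgments file, the Python sketch below counts each of the four judgment labels and the number of queries for which web searches were used. It assumes no header row and rejects labels outside YES/NO/MAYBE/BAD; `tally_judgments` is a hypothetical helper name.

```python
from collections import Counter

JUDGMENTS = {"YES", "NO", "MAYBE", "BAD"}


def tally_judgments(lines):
    """Count judgment labels and web-search use in one IAA judgments file.

    `lines` is an iterable of tab-delimited rows: query ID, judgment, use-web.
    """
    judgments = Counter()
    web_used = 0
    for line in lines:
        if not line.strip():
            continue
        qid, judgment, use_web = line.rstrip("\r\n").split("\t")
        if judgment not in JUDGMENTS:
            raise ValueError(f"{qid}: unexpected judgment {judgment!r}")
        judgments[judgment] += 1
        if use_web == "YES":
            web_used += 1
    return judgments, web_used
```

Running it once per `_ann{1,2,3}.tab` file gives a quick side-by-side view of how often each annotator was unsure (MAYBE) or rejected a query (BAD).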

./IAA_study_results/tac_kbp_2010_english_entity_linking_IAA_KB_links.tab

This file contains the responses for each query in tac_kbp_2010_english_entity_linking_IAA_queries_ann{1,2,3}.xml as identified by the original LDC annotators who produced the queries for the 2010 evaluation. The KB links file is tab-delimited, with 5 fields total. The column descriptions are as follows:

1. query ID    - The ID for the query detailed in
                 tac_kbp_2010_english_entity_linking_IAA_queries_ann{1,2,3}.xml
                 to which the subsequent information pertains

2. entity ID   - An entity node ID or 'NIL'. Entity node IDs that
                 begin with "E" refer to an entity in the Knowledge
                 Base (TAC KBP Reference Knowledge Base - LDC2014T16).
                 If the given query is not linked to an entity in the
                 Knowledge Base (KB), it is marked 'NIL'. Queries
                 with the same "E" ID are assumed to corefer (to be
                 clustered together), and each distinct "E" ID
                 denotes a distinct entity.

3. entity-type - GPE, ORG, or PER type indicator for the entity

4. web-search  - (YES/NO) indicating whether the annotator made
                 use of web searches in order to make the linking
                 judgment.

5. genre       - WL/NW indicating the source genre of the
                 document for the query (WL for web data, NW for
                 newswire data).
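Because this file records the original LDC answers for the same 200 queries, it can serve as a reference point for each IAA annotator's `<entity>` values. The Python sketch below reads column 2 of this file and reports how often an annotator matches it, overall and split by whether the original answer is 'NIL' or a KB node. The helper names are hypothetical, the file is assumed to have no header row, and this is one possible comparison rather than the study's own methodology.

```python
def read_entity_column(lines):
    """Map query ID -> entity ID (column 2) from a tab-delimited KB links file."""
    gold = {}
    for line in lines:
        if line.strip():
            fields = line.rstrip("\r\n").split("\t")
            gold[fields[0]] = fields[1]
    return gold


def accuracy_against_gold(annotator, gold):
    """Share of shared queries where the annotator's entity matches the gold one.

    Reported overall ("all") and split by gold answer: "NIL" vs. linked ("KB").
    Buckets with no queries are omitted.
    """
    buckets = {"all": [0, 0], "NIL": [0, 0], "KB": [0, 0]}
    for qid in annotator.keys() & gold.keys():
        hit = annotator[qid] == gold[qid]
        for key in ("all", "NIL" if gold[qid] == "NIL" else "KB"):
            buckets[key][0] += hit
            buckets[key][1] += 1
    return {key: correct / total for key, (correct, total) in buckets.items() if total}
```

Splitting by NIL vs. KB answers matters because the two cases can differ in difficulty, and a single overall figure can hide that.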

./IAA_study_results/source_documents/*

This directory contains all of the source documents listed in tac_kbp_2010_english_entity_linking_IAA_queries_ann{1,2,3}.xml.

./training/tac_kbp_2010_english_entity_linking_training_queries.xml

This file contains 1500 queries. Each query entry consists of the following fields:

<query id> - A query ID formatted as the letters "EL" plus
             a five-digit zero-padded integer (e.g., "EL00003").

<name>     - The full namestring of the query entity.

<docid>    - An ID for a document in ./data/2010/training/source_documents/
             from which the namestring was extracted.

<entity>   - An entity node ID or 'NIL'. Entity node IDs that
             begin with "E" refer to an entity in the Knowledge
             Base (TAC KBP Reference Knowledge Base - LDC2014T16).
             If the given query is not linked to an entity in the
             Knowledge Base (KB), then it is marked 'NIL'.
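Unlike the evaluation queries, training queries carry a gold `<entity>` field and use five-digit IDs, so a parser written for the evaluation format would reject them. The Python sketch below loads the training file with the five-digit pattern and computes the share of NIL answers. It assumes `<query id="...">` elements with `<name>`, `<docid>`, and `<entity>` children; `load_training_queries` and `nil_fraction` are illustrative names.

```python
import re
import xml.etree.ElementTree as ET

# Training query IDs use five digits ("EL00003"), unlike the six-digit
# evaluation and IAA IDs ("EL000014").
TRAINING_QUERY_ID = re.compile(r"EL\d{5}")


def load_training_queries(source):
    """Return {query_id: (name, docid, entity)} from the training queries XML.

    Assumes <query id="..."> elements with <name>, <docid>, and <entity> children.
    """
    root = ET.parse(source).getroot()
    queries = {}
    for query in root.iter("query"):
        qid = query.get("id", "")
        if not TRAINING_QUERY_ID.fullmatch(qid):
            raise ValueError(f"unexpected training query ID: {qid!r}")
        queries[qid] = tuple(query.findtext(tag) for tag in ("name", "docid", "entity"))
    return queries


def nil_fraction(queries):
    """Share of training queries whose gold <entity> is 'NIL'."""
    return sum(entity == "NIL" for _, _, entity in queries.values()) / len(queries)
```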

./training/tac_kbp_2010_english_entity_linking_training_KB_links.tab

This file contains the responses for each query as identified by human annotators at LDC. The KB links file is tab-delimited, with 3 fields total. The column descriptions are as follows:

1. query ID    - The ID for the query detailed in
                 tac_kbp_2010_english_entity_linking_training_queries.xml
                 to which the subsequent information pertains

2. entity ID   - An entity node ID or 'NIL'. Entity node IDs that
                 begin with "E" refer to an entity in the Knowledge
                 Base (TAC KBP Reference Knowledge Base - LDC2014T16).
                 If the given query is not linked to an entity in the
                 Knowledge Base (KB), it is marked 'NIL'. Queries
                 with the same "E" ID are assumed to corefer (to be
                 clustered together), and each distinct "E" ID
                 denotes a distinct entity.

3. entity-type - GPE, ORG, or PER type indicator for the entity
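The training answers appear both here and in the `<entity>` field of the training queries XML. This README does not state that the two must agree, but comparing them is a cheap sanity check before training. The Python sketch below reads this 3-column file (assumed to have no header row), validates the entity type, and returns the query IDs whose entity ID differs from the one taken from the XML; `mismatched_training_links` is a hypothetical name.

```python
VALID_TYPES = {"GPE", "ORG", "PER"}


def mismatched_training_links(lines, entities):
    """Return query IDs whose KB links entry disagrees with the queries XML.

    `lines` is the 3-column, tab-delimited training KB links file; `entities`
    maps query ID -> <entity> value from the training queries XML. Queries
    missing from `entities` are reported as mismatches.
    """
    mismatches = []
    for line in lines:
        if not line.strip():
            continue
        qid, entity_id, entity_type = line.rstrip("\r\n").split("\t")
        if entity_type not in VALID_TYPES:
            raise ValueError(f"{qid}: unexpected entity type {entity_type!r}")
        if entities.get(qid) != entity_id:
            mismatches.append(qid)
    return mismatches
```

An empty result means the two sources agree for every listed query; any returned IDs are worth inspecting by hand.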

./training/source_documents/*

This directory contains all of the source documents listed in tac_kbp_2010_english_entity_linking_training_queries.xml.