Gold standard records for Arabic event data
We provide two "event detection" datasets for Arabic-language event coding, one for ASSAULT events and one for PROTEST events. These events are coded using the PLOVER ontology, which is similar to the CAMEO ontology. These files can be used to test an automated coder's ability to recognize these two event types in Arabic text.
assault_gsr.csv and the corresponding PROTEST file each have the following columns:
accept: the number of annotators accepting the event label as true
event_type: "ASSAULT" or "PROTEST", depending on the file
id: the ID number of the sentence
label: one of "yes", "easy no", "difficult no", or "ambiguous", depending on the set of labels provided by annotators. "yes" is a unanimous accept, "easy no" is a unanimous reject, "difficult no" is mostly reject with a dissenting accept, and "ambiguous" marks entries with insufficient labels to be sure.
reject: the number of annotators who rejected the label.
text: the text shown to the annotator and that should be provided to the event detection system
total: the total number of annotations provided on the sentence.
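As a rough illustration of how these columns could be used to test an event detector, the sketch below reads rows in the column layout described above and scores a detector against the label column. It rests on several assumptions: the function names (load_gsr, score, toy_detector) are hypothetical, the sample rows are invented placeholders rather than real records, and the decision to count only "yes" as a positive and to skip "ambiguous" rows is one reasonable reading of the labels, not a prescribed evaluation protocol.

```python
import csv
import io


def load_gsr(handle):
    """Read GSR rows from an open file and convert the count columns to int."""
    rows = []
    for row in csv.DictReader(handle):
        for col in ("accept", "reject", "total"):
            row[col] = int(row[col])
        rows.append(row)
    return rows


def score(rows, predict):
    """Score predict(text, event_type) -> bool against the gold labels.

    Assumption: "yes" is the only positive label, "ambiguous" rows are
    skipped, and every other label counts as a negative.
    """
    tp = fp = fn = tn = 0
    for row in rows:
        if row["label"] == "ambiguous":
            continue
        gold = row["label"] == "yes"
        guess = bool(predict(row["text"], row["event_type"]))
        if gold and guess:
            tp += 1
        elif guess:
            fp += 1
        elif gold:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Invented placeholder rows that only mimic the column layout.
SAMPLE = """accept,event_type,id,label,reject,text,total
3,ASSAULT,1,yes,0,placeholder sentence A,3
0,ASSAULT,2,easy no,3,placeholder sentence B,3
1,ASSAULT,3,difficult no,2,placeholder sentence C,3
1,ASSAULT,4,ambiguous,0,placeholder sentence D,1
"""


def toy_detector(text, event_type):
    """Stand-in for a real event coder; fires on sentences A and C only."""
    return text.endswith(("A", "C"))


rows = load_gsr(io.StringIO(SAMPLE))
result = score(rows, toy_detector)
print(result)
```

To run this against the real data, replace the in-memory sample with something like open("assault_gsr.csv", newline="", encoding="utf-8") and swap toy_detector for a call into the coder under test.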
Span Recognition and Labeling
Another set of files provides span information for the gold standard recognized
events, consisting of the event verb, the source actor and target actor spans,
and the resulting CAMEO actor codes. verb_gold reports the portion identified by
all coders as part of the span.
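One way to picture "identified by all coders as part of the span" is as the overlap of the individual coders' spans. The sketch below is only an illustration under stated assumptions: how spans are stored in these files is not described here, so it assumes each coder's span is a half-open (start, end) pair of token offsets, and the common_span helper and the token list are hypothetical.

```python
def common_span(spans):
    """Return the (start, end) range shared by every span, or None.

    Assumption: spans are half-open token offsets, so (1, 4) covers
    tokens 1, 2 and 3.
    """
    if not spans:
        return None
    start = max(s for s, _ in spans)  # latest start among coders
    end = min(e for _, e in spans)    # earliest end among coders
    return (start, end) if start < end else None


# Placeholder tokens and three hypothetical coders' verb spans.
tokens = ["w0", "w1", "w2", "w3", "w4"]
coder_spans = [(1, 4), (2, 4), (2, 5)]

shared = common_span(coder_spans)
shared_tokens = tokens[shared[0]:shared[1]] if shared else []
print(shared, shared_tokens)
```

If the coders' spans do not overlap at all, the helper returns None, which a caller could treat as having no agreed span for that sentence.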
Petrarch validation format
The two files are also available in XML format, suitable for use in UniversalPetrarch.