MPEDS Annotation Interface
This is the annotation interface used in creating datasets for the Machine-learning Protest Event Data System (MPEDS). Although it was built for the specific task of coding protest events, it can also be used to develop other types of event datasets.
This system is built in Python using the Flask microframework. It can source articles parsed from Lexis-Nexis (using the split-ln.py script), Apache Solr, or XML files formatted in News Industry Text Format (NITF), such as the LDC's New York Times Annotated Corpus.
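To give a rough idea of what reading a NITF-formatted article involves, here is a minimal sketch using Python's standard library. It is not the loader used by this interface: the function name `parse_nitf` and the sample document are made up for illustration, and it assumes the usual NITF layout with the headline in `hl1` and the article text in `p` elements under `body.content`.

```python
import xml.etree.ElementTree as ET

# Made-up sample document following the common NITF layout (illustration only).
SAMPLE = """<nitf>
  <head><title>Students March on Capitol</title></head>
  <body>
    <body.head><hedline><hl1>Students March on Capitol</hl1></hedline></body.head>
    <body.content>
      <block class="full_text">
        <p>Hundreds of students marched on Tuesday.</p>
        <p>Organizers said more rallies are planned.</p>
      </block>
    </body.content>
  </body>
</nitf>"""


def parse_nitf(xml_text):
    """Return the title, headline, and paragraphs of one NITF document."""
    root = ET.fromstring(xml_text)
    title = root.findtext("head/title", default="").strip()

    # Fall back to the title when the document has no <hl1> headline.
    hl1 = next(root.iter("hl1"), None)
    headline = "".join(hl1.itertext()).strip() if hl1 is not None else title

    # Collect the text of every <p> under <body.content>, if present.
    content = next(root.iter("body.content"), None)
    paragraphs = []
    if content is not None:
        paragraphs = ["".join(p.itertext()).strip() for p in content.iter("p")]

    return {"title": title, "headline": headline, "paragraphs": paragraphs}


article = parse_nitf(SAMPLE)
print(article["headline"])
print(len(article["paragraphs"]), "paragraphs")
```

Real corpus files carry much more metadata in `head` (dates, sections, identifiers), so a production loader would likely read those fields as well.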
To populate the database with example information, first run the setup script.
This will add five users: an admin (admin), two first-pass coders (coder1p_1, coder1p_2), and two second-pass coders (coder2p_1, coder2p_2), all with the password "default". It will also add a variable hierarchy for second-pass coding, enter metadata for all the articles in the example-articles directory, and queue them up for the first-pass coders.
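The seeding steps described above can be sketched roughly as follows. This is not the actual setup script. The table names, columns, and the `seed` function are hypothetical, the database is an in-memory SQLite database, and it assumes that every article is queued for every first-pass coder.

```python
import sqlite3

# Usernames come from the description above; the role labels are an assumption.
USERS = [
    ("admin", "admin"),
    ("coder1p_1", "coder1p"),
    ("coder1p_2", "coder1p"),
    ("coder2p_1", "coder2p"),
    ("coder2p_2", "coder2p"),
]


def seed(conn, article_files):
    """Create a hypothetical schema, add the example users, and queue articles."""
    conn.executescript(
        """
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            username TEXT UNIQUE,
            password TEXT,
            role TEXT
        );
        CREATE TABLE articles (
            id INTEGER PRIMARY KEY,
            filename TEXT UNIQUE
        );
        CREATE TABLE queue (
            user_id INTEGER REFERENCES users(id),
            article_id INTEGER REFERENCES articles(id),
            UNIQUE (user_id, article_id)
        );
        """
    )
    # Every example user gets the same placeholder password, "default".
    conn.executemany(
        "INSERT INTO users (username, password, role) VALUES (?, 'default', ?)",
        USERS,
    )
    conn.executemany(
        "INSERT INTO articles (filename) VALUES (?)",
        [(name,) for name in article_files],
    )
    # Assumption: each article is queued once for each first-pass coder.
    conn.execute(
        """
        INSERT INTO queue (user_id, article_id)
        SELECT u.id, a.id
        FROM users AS u CROSS JOIN articles AS a
        WHERE u.role = 'coder1p'
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
seed(conn, ["article-1.txt", "article-2.txt"])
queued = conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
print(queued, "queue entries")
```

A shared placeholder password is only reasonable for example data; it should be changed before the interface is used with real annotators.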
Then run the Flask test server with the following.
- MPEDS: Automating the Generation of Protest Event Data. 2017. SocArXiv
This is a product in early alpha stages. Features we hope to have working eventually:
- Template system for variables
- Ability to specify multiple article sources
- Generalizing an n-pass structure and control flow
- Integration with multiple databases
- Cross-browser compatibility
If you're a movement or event data scholar and have a specific project for which you think this would be a good tool, shoot Alex Hanna (email@example.com) a message.
Development of this interface has been supported by a National Science Foundation Graduate Research Fellowship and National Science Foundation grant SES-1423784. Thanks to Emanuel Ubert and Katie Fallon for working with this system since its inception, and to the many undergraduate annotators who have put a lot of time into working with and refining this system.