Commit

Merge pull request #33 from openeventdata/paper
Paper
johnb30 committed Nov 1, 2016
2 parents b849348 + 343e8aa commit 8589baa
Showing 2 changed files with 73 additions and 0 deletions.
26 changes: 26 additions & 0 deletions paper/paper.bib
@@ -0,0 +1,26 @@
%% This BibTeX bibliography file was created using BibDesk.
%% http://bibdesk.sourceforge.net/
%% Created for John Beieler at 2016-10-23 17:47:58 -0400
%% Saved with string encoding Unicode (UTF-8)
@unpublished{tabari,
Author = {Philip A. Schrodt},
Date-Added = {2016-10-23 21:47:51 +0000},
Date-Modified = {2016-10-23 21:47:55 +0000},
Note = {Paper presented at the International Studies Association, Chicago, 21-24 February 2001},
Title = {Automated Coding of International Event Data Using Sparse Parsing Techniques},
Year = {2001}}

@unpublished{cameo,
Author = {Deborah J. Gerner and Philip A. Schrodt and Omur Yilmaz and Rajaa Abu-Jabr},
Date-Added = {2016-10-23 21:47:14 +0000},
Date-Modified = {2016-10-23 21:47:14 +0000},
Note = {Paper presented at the American Political Science Association, Boston, August 2002},
Title = {Conflict and Mediation Event Observations (CAMEO): A New Event Data Framework for the Analysis of Foreign Policy Interactions},
Year = {2002}}
47 changes: 47 additions & 0 deletions paper/paper.md
@@ -0,0 +1,47 @@
---
title: 'PETRARCH2: Another Event Coding Program'
tags:
- event coding
- natural language processing
- computational linguistics
authors:
- name: Clayton Norris
orcid: 0000-0001-5907-757X
affiliation: 1
- name: Philip Schrodt
orcid: 0000-0003-3495-4198
affiliation: 2
- name: John Beieler
orcid: 0000-0001-7811-4399
affiliation: 3
affiliations:
- name: University of Chicago
index: 1
- name: Parus Analytics
index: 2
- name: Human Language Technology Center of Excellence<br />Johns Hopkins University
index: 3
date: 23 October 2016
bibliography: paper.bib
---

# Summary

The PETRARCH2 coding program implements a new coding algorithm, based on a
syntactic constituency parse, to extract who-did-what-to-whom political event data from
structured news stories. Events are coded according to the CAMEO [@cameo] coding
ontology. This software improves upon previous-generation coding software
such as TABARI [@tabari], which relies on shallow (sparse) parsing, by using a
deep syntactic parse instead.
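
To make the who-did-what-to-whom idea concrete, here is a minimal Python sketch. It is not PETRARCH2's actual algorithm or API. It assumes a single simple clause in CoreNLP-style bracketed form, `(ROOT (S (NP ...) (VP ...)))`, and treats the subject noun phrase as the source, the first verb in the verb phrase as the action, and the first noun phrase inside the verb phrase as the target. The names `Node`, `parse_tree`, `first_child`, and `extract_triple` are hypothetical.

```python
"""Toy who-did-what-to-whom extraction from a bracketed constituency parse.

Hypothetical illustration only; not PETRARCH2's algorithm or API.
"""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    word: str | None = None  # set only on preterminal (part-of-speech) nodes

    def words(self) -> list[str]:
        """Return the leaf words under this node, left to right."""
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.words()]


def parse_tree(text: str) -> Node:
    """Parse a Penn Treebank-style string such as '(ROOT (S ...))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        node = Node(tokens[pos + 1])
        pos += 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read())
            else:  # a bare token is the word of a preterminal node
                node.word = tokens[pos]
                pos += 1
        pos += 1  # consume the closing ")"
        return node

    return read()


def first_child(node: Node, prefix: str) -> Node | None:
    """Return the first child whose label starts with `prefix`, if any."""
    return next((c for c in node.children if c.label.startswith(prefix)), None)


def extract_triple(tree: Node) -> tuple[str, str, str] | None:
    """Return (source, verb, target) for a simple S -> NP VP clause."""
    clause = first_child(tree, "S") if tree.label == "ROOT" else tree
    if clause is None:
        return None
    subject = first_child(clause, "NP")
    vp = first_child(clause, "VP")
    if subject is None or vp is None:
        return None
    verb = first_child(vp, "VB")     # VB, VBD, VBZ, ...
    target = first_child(vp, "NP")   # direct object
    if verb is None or target is None:
        return None
    return (" ".join(subject.words()), verb.word, " ".join(target.words()))
```

For example, the parse `(ROOT (S (NP (NNP France)) (VP (VBD criticized) (NP (NNP Russia))) (. .)))` yields `("France", "criticized", "Russia")`. Passives, compound actors, and nested clauses are deliberately ignored here.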

At the level of assigning codes, PETRARCH2 is largely dictionary-based. It works from extensive
dictionaries of verb phrases to identify the type of event, and of noun phrases to
identify both the actor (generally a proper noun such as the name of a country or
leader) and the agent (generally a common noun identifying a role, such as "police" or
"protesters"). These dictionaries incorporate synonym sets from WordNet, are
open source, and are included in the distribution.
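
The dictionary lookup step can be illustrated with a small Python sketch. Everything in it is assumed for illustration: the dictionary contents, the codes chosen (`111` and `173` as CAMEO-style event codes, `COP` as a police agent code), and the helper names `code_phrase` and `code_event`. PETRARCH2's real dictionaries have their own file formats and richer patterns. Here each verb synonym set stands in for a WordNet synset, and actor and agent codes are concatenated CAMEO-style (for example `FRA` + `COP`).

```python
"""Toy dictionary-based coding of a (source, verb, target) triple.

All entries and codes are hypothetical and illustrative; they are not
PETRARCH2's dictionary formats or contents.
"""
from __future__ import annotations

# Verb dictionary: each synonym set (in the spirit of a WordNet synset)
# maps to one CAMEO-style event code. Inflected forms are listed explicitly
# because this sketch does no lemmatization.
VERB_SYNSETS = {
    "173": {"arrest", "arrested", "detain", "detained", "apprehend", "apprehended"},
    "111": {"criticize", "criticized", "denounce", "denounced", "condemn", "condemned"},
}

# Actor dictionary: proper nouns (countries, demonyms) -> actor codes.
ACTORS = {"france": "FRA", "french": "FRA", "russia": "RUS"}

# Agent dictionary: common nouns naming a role -> agent codes.
AGENTS = {"police": "COP", "soldiers": "MIL", "government": "GOV"}

# Flatten the synsets into a direct verb -> event-code lookup table.
VERB_CODES = {verb: code for code, verbs in VERB_SYNSETS.items() for verb in verbs}


def code_phrase(phrase: str) -> str | None:
    """Concatenate the first actor code and first agent code in a noun phrase."""
    words = phrase.lower().split()
    actor = next((ACTORS[w] for w in words if w in ACTORS), "")
    agent = next((AGENTS[w] for w in words if w in AGENTS), "")
    return (actor + agent) or None


def code_event(source: str, verb: str, target: str) -> tuple[str, str, str] | None:
    """Return (source_code, event_code, target_code), or None if any lookup fails."""
    event = VERB_CODES.get(verb.lower())
    src, tgt = code_phrase(source), code_phrase(target)
    if event is None or src is None or tgt is None:
        return None
    return (src, event, tgt)
```

With these toy dictionaries, `code_event("French police", "detained", "Russia")` yields `("FRACOP", "173", "RUS")`, while a verb missing from every synset, such as "visited", yields `None`.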

PETRARCH2 has primarily been run on Penn Treebank-format parses produced by the Stanford
CoreNLP system. It can be integrated with other software at https://github.com/openeventdata/
to handle either continuous near-real-time coding or batch coding, and with
auxiliary programs for geolocation and simple deduplication.
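
Simple deduplication can be as basic as keeping one event per day for each combination of source, event code, and target. The Python sketch below assumes that rule and a hypothetical `Event` record and `dedupe_daily` function; it is not the schema or filter of any particular openeventdata program. Because it is a generator, the same function can consume a finished batch or a continuous stream.

```python
"""Toy one-event-per-day deduplication of coded events.

Hypothetical sketch: the Event fields and the dedup key are assumptions,
not the schema of any openeventdata tool.
"""
from __future__ import annotations

from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    date: str      # e.g. "20161101"
    source: str    # source actor code
    code: str      # CAMEO event code
    target: str    # target actor code
    story_id: str  # identifier of the story the event was coded from


def dedupe_daily(events: Iterable[Event]) -> Iterator[Event]:
    """Yield the first event seen for each (date, source, code, target) key.

    Works on a finite batch or lazily on a stream; in the streaming case
    the set of seen keys grows without bound unless it is pruned.
    """
    seen: set[tuple[str, str, str, str]] = set()
    for event in events:
        key = (event.date, event.source, event.code, event.target)
        if key not in seen:
            seen.add(key)
            yield event
```

A production pipeline would likely prune old dates from `seen` when running continuously; that is omitted here for brevity.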
