Does Petrarch2 take care of Event Coreference Resolution? #16

Closed
Madity opened this issue Jun 10, 2016 · 4 comments

Madity commented Jun 10, 2016

No description provided.

Member

johnb30 commented Jun 10, 2016

It depends on what you mean by event coreference resolution. If you mean something like cross-document or cross-sentence linkage of events, then no. If you mean whether PETRARCH2 will return multiple copies of the same coded event per sentence, then also no.

@philip-schrodt
Contributor

By event coreference resolution, do you mean determining whether multiple texts that code to the same event tuple refer to the same thing? If so, no. In most of the work up until the past five years, event data sets were generally coded from a single source (typically Reuters or Agence France-Presse for the machine-coded data, the New York Times in the human-coded systems prior to that), and this wasn't a big issue because it was fairly easy to detect multiple stories reporting on the same actions.

With the advent of data sets generated from large numbers of sources (ICEWS, Phoenix) it is a very big issue, and the "one-a-day" filter method that most systems use (including the Phoenix pipeline; ICEWS apparently does no deduplication) has some decided drawbacks: this paper (http://eventdata.parusanalytics.com/papers.dir/Schrodt.TAD-NYU.EventData.pdf) discusses the issue in detail. There's an emerging consensus that we need to do document-level resolution first, either by de-duplication (there is a large NLP literature on this) or clustering (some method similar to Google News or European Media Monitor), but we haven't worked out any open source solutions for this yet.
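
To make the document-level de-duplication idea concrete, here is a minimal sketch of one common approach from the NLP literature: comparing stories by the Jaccard similarity of their word shingles. This is only an illustration, not something PETRARCH2 or the Phoenix pipeline provides; the function names, the shingle size `k`, and the `threshold` value are all assumptions chosen for the example.

```python
import re
from typing import List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-word shingles in a text (lowercased, punctuation dropped)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """Jaccard similarity of two shingle sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: List[str], threshold: float = 0.8) -> List[int]:
    """Return indices of documents kept after dropping near-duplicates.

    A document is dropped if it is at least `threshold` similar to any
    document already kept. The threshold is a hypothetical default.
    """
    kept: List[int] = []
    kept_shingles: List[Set[Tuple[str, ...]]] = []
    for i, doc in enumerate(docs):
        s = shingles(doc)
        if any(jaccard(s, t) >= threshold for t in kept_shingles):
            continue
        kept.append(i)
        kept_shingles.append(s)
    return kept
```

The pairwise comparison is quadratic in the number of stories, so a real multi-source feed would likely need something like MinHash or locality-sensitive hashing on top of this.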

Author

Madity commented Jun 11, 2016

Thanks for the clarification!
@philip-schrodt I was thinking of data sets generated from a large number of sources. Can anything be done on that front? With the advent of big data, data sets coded from a single source may not be sufficient, since combining sets from multiple sources would provide more insight into the events.

Member

johnb30 commented Jun 12, 2016

PETRARCH does that within a sentence. For cross-sentence things we apply a one-a-day filter to the final output. See the phoenix_pipeline for more details on that. Specifically, this script. In other words, PETRARCH aims to do one thing: code event data. Pre- or post-processing is designed to occur elsewhere.
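
For readers unfamiliar with the idea, a one-a-day filter can be sketched roughly as follows. This is a minimal illustration, not the phoenix_pipeline code: it assumes the filter keys on date, source actor, target actor, and event code, and the `Event` type and `one_a_day` name are hypothetical. Check the actual script for the fields it really uses.

```python
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    # Hypothetical event record; field names are assumptions for this sketch.
    date: str    # e.g. "20160610"
    source: str  # source actor code
    target: str  # target actor code
    code: str    # event code
    doc_id: str  # identifier of the story the event was coded from


def one_a_day(events: Iterable[Event]) -> List[Event]:
    """Keep only the first event seen for each (date, source, target, code)."""
    seen = set()
    kept: List[Event] = []
    for ev in events:
        key = (ev.date, ev.source, ev.target, ev.code)
        if key in seen:
            # Same tuple already recorded for this day: treat as a duplicate.
            continue
        seen.add(key)
        kept.append(ev)
    return kept
```

Note the drawback this makes visible: two genuinely distinct events with the same tuple on the same day collapse into one, while the same event reported on different days survives twice.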

johnb30 closed this as completed Jun 14, 2016