Use geographic binary relation info/context #5

Closed
ahalterman opened this issue Jun 28, 2016 · 2 comments

Comments

@ahalterman
Member

Lots of sentences with geographic information are structured like "X, a town 30 km south of Y", or "X, a neighborhood in Y". In both cases, we want to:

  • code X, not Y
  • but potentially use Y to help find X

Neither MITIE's binary relation detection nor Freebase has this kind of relation. We could use parse info, but that would be tricky and require lots of labeled examples. Thoughts?
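To make the two goals above concrete, here is a minimal sketch of "code X, but use Y to help find X". It is an illustration only, not how Mordecai or MITIE resolve places: the toy gazetteer, its entry IDs and coordinates, and the `resolve_with_anchor` helper are all hypothetical. The idea is that when X has several gazetteer candidates, the one closest to any candidate for Y is preferred, and the returned entry is always a candidate for X, never Y.

```python
import math

# Hypothetical toy gazetteer: place name -> list of (entry_id, lat, lon).
# All names, IDs, and coordinates are made up for illustration.
GAZETTEER = {
    "Springfield": [("springfield-a", 10.0, 10.0), ("springfield-b", 50.0, 50.0)],
    "Capital City": [("capital-city", 10.2, 10.1)],
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def resolve_with_anchor(x, y, gazetteer=GAZETTEER):
    """Return the gazetteer entry for X that lies closest to any entry for Y.

    Y is used only as context; the result is always an entry for X.
    Returns None if X is unknown, and the first X entry if Y is unknown.
    """
    x_candidates = gazetteer.get(x, [])
    y_candidates = gazetteer.get(y, [])
    if not x_candidates:
        return None
    if not y_candidates:
        return x_candidates[0]  # no usable context, fall back to the first candidate
    return min(
        x_candidates,
        key=lambda xc: min(haversine_km(xc[1], xc[2], yc[1], yc[2]) for yc in y_candidates),
    )
```

A call such as `resolve_with_anchor("Springfield", "Capital City")` picks the Springfield entry near Capital City rather than the distant one. A real system would probably also want to use the relation itself ("30 km south of", "a neighborhood in") rather than plain distance.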

@philip-schrodt

How many combinations of "X, etc. Y" are there? I'm guessing a dictionary-based approach would be fairly effective. Presumably the phrases follow the usual rank-size distribution, so a small number of patterns should cover most cases. We don't really need a parse since we've got the commas, and we can automatically generate the candidate phrases with a simple regex search (maybe with some simple markup first, but not a full parse).

Alternatively, we could try training a conditional random field model or something similar to catch these. But I'd try getting the candidate phrases first and see how many we've got.
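As a rough sketch of the regex search for candidate phrases suggested above, something like the following could be a starting point. The pattern, the `candidate_phrases`, `normalize_link`, and `template_counts` helpers, and the example place names are all assumptions made for illustration, not code from this project. It treats runs of capitalized words as X and Y, captures the short lowercase phrase after the comma, and replaces numbers with a placeholder so that "30 km south of" and "12 km south of" count as the same dictionary entry.

```python
import re
from collections import Counter

# Assumption: a place name is a run of capitalized words (no NER markup used here).
NAME = r"[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*"

# X, then a comma, then up to eight lowercase/numeric linking words, then Y.
PATTERN = re.compile(
    rf"\b(?P<x>{NAME}),\s+(?P<link>(?:[a-z0-9]+\s+){{1,8}}?)(?P<y>{NAME})"
)

def candidate_phrases(text):
    """Yield (x, link, y) triples for spans shaped like 'X, <link phrase> Y'."""
    for match in PATTERN.finditer(text):
        yield match.group("x"), match.group("link").strip(), match.group("y")

def normalize_link(link):
    """Replace numbers with a placeholder so distances collapse into one template."""
    return re.sub(r"\d+(?:\.\d+)?", "<NUM>", link)

def template_counts(texts):
    """Count normalized linking phrases across texts to inspect their frequency ranking."""
    counts = Counter()
    for text in texts:
        for _x, link, _y in candidate_phrases(text):
            counts[normalize_link(link)] += 1
    return counts
```

Running `template_counts` over a corpus and looking at `most_common()` would show how many distinct linking phrases there actually are. A capitalization-only pattern will produce false positives (for example a sentence-initial word followed by a comma), so the output is best treated as a candidate list to review, not a final dictionary.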

@PTB-OEDA
Member

Maryam here at UTD has already tried the CRF approach to get at subnational
locations. It was not a very successful exercise.

Spoke with her and Andy about this today. He is making other modifications
to Mordecai as well.

A further issue is that we need more labelled training data. I am talking
with C. Fariss about this via email to see if we can use some HR text data
he has recently published.

Patrick T. Brandt
Professor
Political Science
School of Economic, Political and Policy Sciences
University of Texas at Dallas
Personal site: http://www.utdallas.edu/~pbrandt
MSBVAR site: http://yule.utdallas.edu
