Use geographic binary relation info/context #5

ahalterman · 2016-06-28T20:51:13Z

Lots of sentences with geographic information are structured like "X, a town 30 km south of Y", or "X, a neighborhood in Y". In both cases, we want to:

- code X, not Y
- but potentially use Y to help find X

Neither MITIE's binary relation detection nor Freebase provides this. We could use parse info, but that would be tricky and require lots of labeled examples. Thoughts?

philip-schrodt · 2016-06-28T21:39:10Z

How many combinations of "X, etc Y" are there? I'm guessing a dictionary-based approach would be fairly effective. Presumably the phrases follow the usual rank-size distribution. We don't really need a parse since we've got the commas, and we can automatically generate the candidate phrases with a simple regex search (maybe with some simple markup first, but not a full parse).

Alternatively, try to generate a conditional random field model or something similar to catch these. But I'd try getting the candidate phrases first and see how many we've got.
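A minimal sketch of that first step, assuming plain-text sentences as input: pull out "X, <linking phrase> Y" candidates with a regex, then count the linking phrases to see how skewed the distribution is. The names `PLACE`, `PATTERN`, `find_candidates`, `link_template`, and `count_templates` are hypothetical, and the capitalized-word heuristic for X and Y is an assumption that would need tuning on real text (for example, a sentence-initial "In Azaz" would be captured as X). The example sentences are made up for illustration.

```python
import re
from collections import Counter

# Assumption: a place name is one to three capitalized words.
PLACE = r"[A-Z][\w'-]*(?:\s[A-Z][\w'-]*){0,2}"

# "X, <link> Y": the link starts with a lowercase letter or digit and
# runs (without crossing , . ;) up to the next capitalized span Y.
PATTERN = re.compile(
    rf"(?P<x>{PLACE}),\s+(?P<link>[a-z0-9][^,.;]*?)\s+(?P<y>{PLACE})\b"
)


def find_candidates(text):
    """Return (x, link, y) triples for every "X, <link> Y" match in text."""
    return [(m["x"], m["link"], m["y"]) for m in PATTERN.finditer(text)]


def link_template(link):
    """Collapse numbers so "30 km north of" and "25 km north of" count together."""
    return re.sub(r"\d+", "<NUM>", link.lower())


def count_templates(sentences):
    """Count how often each linking-phrase template occurs across sentences."""
    counts = Counter()
    for sentence in sentences:
        for _x, link, _y in find_candidates(sentence):
            counts[link_template(link)] += 1
    return counts


if __name__ == "__main__":
    sentences = [
        "Clashes were reported in Azaz, a town 30 km north of Aleppo, on Monday.",
        "Fighting reached Sadr City, a neighborhood in Baghdad.",
        "Shelling hit Marea, a town 25 km north of Aleppo.",
    ]
    for template, n in count_templates(sentences).most_common():
        print(n, template)
```

The most frequent templates from `count_templates` would be the starting entries for the dictionary; in each match X is the span to code and Y is kept only as context for resolving X.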

PTB-OEDA · 2016-06-28T21:42:17Z

Maryam here at UTD has already tried the CRF approach to get at subnational locations. It was not a very successful exercise.

Spoke with her and Andy about this today. He is making other modifications to Mordecai as well.

A further issue is that we need more labelled training data. Talking with C. Fariss about this via email to see if we can employ some HR text data he has recently published.


Patrick T. Brandt
Professor
Political Science
School of Economic, Political and Policy Sciences
University of Texas at Dallas
Personal site: http://www.utdallas.edu/~pbrandt
MSBVAR site: http://yule.utdallas.edu

ahalterman added the enhancement label Oct 26, 2017

ahalterman closed this as completed Jun 5, 2018
