use libpostal parses for venue queries where available #1380

Open
wants to merge 1 commit into
base: master
missinglink
Member

We're currently not using libpostal parses for venues: if we see a venue parse, we fall back to the native parser.
I don't remember the history of this but it seems wrong to me 🤷‍♂

I noticed this when looking into some bug reports, one example being "Café Pelias".
There are two things currently going wrong with this query:

  • libpostal parses it correctly, but the query fails with the message "No query to call ES with. Skipping"
  • upon falling back to the native parser, the query is parsed incorrectly (I'll open a separate issue for this on that repo)

So regarding the first point, I don't see why we would throw away the venue parse here from libpostal:

"parsed_text": {
  "query": "Café Pelias"
}

The query label has actually been mapped from the libpostal house field in controller/libpostal, but this field indicates a venue name.
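To illustrate the mapping described above, here is a minimal sketch. The mapping table, the `toParsedText` helper, and the `{ label, value }` input shape are assumptions made for illustration; the only mapping taken from this thread is house → query, and the real code in controller/libpostal may differ.

```javascript
// Hypothetical, simplified label mapping. Only house -> query comes from
// this thread; the other entries are illustrative assumptions.
const FIELD_MAPPING = new Map([
  ['house', 'query'],
  ['house_number', 'housenumber'],
  ['road', 'street'],
  ['postcode', 'postalcode']
]);

// Convert a list of { label, value } pairs (assumed response shape)
// into a parsed_text-style object, ignoring unmapped labels.
function toParsedText(libpostalResponse) {
  const parsedText = {};
  for (const { label, value } of libpostalResponse) {
    const field = FIELD_MAPPING.get(label);
    if (field !== undefined) {
      parsedText[field] = value;
    }
  }
  return parsedText;
}

// A venue-only parse: libpostal labels the whole input as "house".
const venueParse = toParsedText([{ label: 'house', value: 'Café Pelias' }]);
console.log(JSON.stringify(venueParse));
```

Under these assumptions, a venue name ends up as the only key in parsed_text, which is the shape shown in the example above.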

As you can see from the PR edits, we don't currently consider these venue parses for query generation, and I'm not sure why. I believe libpostal is still superior to the native parser when it comes to venue queries, and hasn't it always been better than addressit was?
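The decision being discussed can be pictured roughly as follows. This is a sketch only: `isVenueOnlyParse`, `chooseParser`, and the `useVenueParses` option are hypothetical names, not the actual code touched by this PR.

```javascript
// Hypothetical helper: true when the parse contains nothing but a
// venue name (a single "query" field).
function isVenueOnlyParse(parsedText) {
  const keys = Object.keys(parsedText);
  return keys.length === 1 && keys[0] === 'query';
}

// Hypothetical decision function. With useVenueParses = false it models
// the existing behaviour (venue-only parses are discarded and the native
// parser is used); with true it models the behaviour proposed here.
function chooseParser(parsedText, options = { useVenueParses: false }) {
  if (Object.keys(parsedText).length === 0) {
    return 'native'; // libpostal produced nothing usable
  }
  if (isVenueOnlyParse(parsedText) && !options.useVenueParses) {
    return 'native'; // venue parse thrown away
  }
  return 'libpostal';
}

const parse = { query: 'Café Pelias' };
console.log(chooseParser(parse));
console.log(chooseParser(parse, { useVenueParses: true }));
```

Keeping the switch in one place like this would make it easy to compare both behaviours against the same set of queries.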

Thoughts?

@missinglink
Member Author

The only reason I can think of for the existing behaviour is if libpostal erroneously identifies things as house and we were trying to guard against that?

@missinglink
Member Author

missinglink commented Sep 23, 2020

I rebased this and put it up on dev today; it fixes the "vanity addresses" issue we've been discussing:

Screenshot 2020-09-23 at 11 33 39

Screenshot 2020-09-23 at 11 33 04

Screenshot 2020-09-23 at 11 35 50

cc/ @blackmad

@missinglink
Member Author

linked pelias/acceptance-tests#533

@missinglink
Member Author

missinglink commented Sep 23, 2020

I ran the full acceptance test suite on this today and there were actually quite a few improvements, but at the same time it highlighted some issues.

diff of changes vs. production: https://www.diffchecker.com/5Faotyih (ignore any errors related to /v1/reverse)

screenshots of some issues inherited from libpostal:

Screenshot 2020-09-23 at 15 26 06

Screenshot 2020-09-23 at 15 21 23

@orangejulius
Member

Yeah, I suspect there are two reasons why this was never implemented in the past:

  • A lot of our early libpostal work was done with little concern for venues; we were really thinking mostly about addresses.
  • There are surely many cases where libpostal doesn't do a great job of accurately detecting venues. Both false positives and false negatives would impact results in ways that are difficult to fix.

The first reason is obviously not a good one, but I imagine the hard part of actually merging this will be ensuring there aren't too many cases where, for example, something that is very much not a venue query, like one for an admin area or address, will be made worse.

@missinglink
Member Author

Right, so the question is "which parser does a better job of venues?" and the answer is "no" 😆
