ensure parent endonyms exist for all countries and megacities #314
Couple of open questions:
I spent some more time testing this today. It works great, but there's another class of problem I hadn't considered, which can be resolved with the same method.
What I didn't realize is that the inverse of this issue is also a problem: WOF uses the endonym as the primary label rather than the English name, which I had assumed was policy. So for example I expected to find Köln with the […]
The issue in Pelias (autocomplete) is that you can find a record with […]
The fix is very simple; I had actually already written the code but left it commented out: […]
The new commit 1387a75 shows the changes this line makes to the dictionaries. I'll re-run the build and test again to ensure it's ready to merge.
Hi there, this PR reminds me of another one I did a few years ago, pelias/whosonfirst#492, but there I added all exonyms on WOF documents. The result was a bit disappointing for a world build.
Endonyms seem to be a good first step anyway 👍 Related: pelias/api#1296
This looks good: it adds about 1% to the disk requirements and possibly some additional build time. Since this is behind a feature flag and the test cases pass, I'm happy to squash-and-merge this. There's still some opportunity to extend this PR in the future, since I know there are things other developers might want to add.
This PR attempts to resolve a long-standing issue in Pelias where parent properties can only be specified in English (or in the 'default language').
For example, querying for a country directly works fine: you can query for `Germany`, `Deutschland` or `Allemagne` to find Germany. The search logic usually targets the 'default language' and the target language of the `User-Agent`.
The issue is when using the country name in support of another query: the example `10 Torstraße Germany` works as expected, but the query `10 Torstraße Deutschland` fails.
This is really not ideal since it's very English-centric; in this German example it's particularly odd that the official language of the country isn't supported but English is.
The reason for this dates back to the original schema design in ~2014, where the parent properties weren't modelled with multiple languages in mind like the `name.*` fields were, so it's been tricky to fix.
Coupled with that was the design of the PIP service and this repo, wof-admin-lookup: the service is designed in such a way that it only ever loads and serves a single name for a place. Changing this interface would be a breaking change that I don't have the bandwidth to tackle at the moment.
This PR provides some relief by shipping dictionaries of endonyms for countries and megacities, which will optionally be added as aliases to every record (under a pelias/config flag).
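As a rough illustration of the opt-in behaviour, here is a minimal sketch of how a flag like this might be checked. It assumes the flag lives at `imports.adminLookup.useEndonyms` (the path named in this PR) and that the feature stays off unless the flag is explicitly `true`; the plain `exampleConfig` object and the `useEndonymsEnabled` helper are hypothetical stand-ins, not code from this repo.

```javascript
'use strict';

// Hypothetical stand-in for a loaded Pelias config object.
// Only the imports.adminLookup.useEndonyms path comes from this PR.
const exampleConfig = {
  imports: {
    adminLookup: {
      useEndonyms: true
    }
  }
};

// Assumption: the feature is treated as disabled unless the flag is exactly `true`.
function useEndonymsEnabled(config) {
  return config?.imports?.adminLookup?.useEndonyms === true;
}

console.log(useEndonymsEnabled(exampleConfig)); // true
console.log(useEndonymsEnabled({}));            // false
```

Checking for strict equality with `true` means a missing or malformed value leaves existing builds unchanged, which fits the intent of keeping this behind a flag.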
It's not clear at this stage what effect adding multiple aliases to half a billion records will have on the size of the index, performance and query quality, so for now I've pared it down to just countries and megacities.
In the future, depending on the success of this PR we can expand to cover Exonyms (likely only a subset of languages), however it may be preferable to reconsider the schema design at that point rather than clump all languages in the same field.
how it works:
- the `src/data/aliases/country-language-map.json` file is generated using the provided `sql` file from a WOF bundle
- `src/data/aliases/{placetype}-endonyms.psv` file containing aliases
- when `imports.adminLookup.useEndonyms` is enabled, the code in this PR is activated
- for `parent.*` properties, add any missing aliases based on the parent ID previously assigned
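The alias step can be sketched roughly as follows. This is not the code from this PR: the PSV column layout (`id|alias|alias...`), the placeholder IDs, and the helper names `parseEndonyms` and `addMissingAliases` are all assumptions made for illustration, since the real file format isn't shown here.

```javascript
'use strict';

// Hypothetical PSV content: one line per place, "<parent id>|<alias>|<alias>...".
// The IDs are placeholders, not real WOF IDs.
const examplePsv = [
  '1001|Deutschland',
  '1002|Schweiz|Suisse|Svizzera'
].join('\n');

// Parse pipe-separated lines into a Map of id -> array of aliases.
function parseEndonyms(psv) {
  const table = new Map();
  for (const line of psv.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed) { continue; } // skip blank lines
    const [id, ...aliases] = trimmed.split('|').map((col) => col.trim());
    table.set(id, aliases.filter(Boolean));
  }
  return table;
}

// Return a copy of `names` with any endonyms for `parentId` appended,
// skipping aliases that are already present.
function addMissingAliases(names, parentId, table) {
  const known = new Set(names);
  const merged = names.slice();
  for (const alias of table.get(String(parentId)) || []) {
    if (!known.has(alias)) {
      known.add(alias);
      merged.push(alias);
    }
  }
  return merged;
}

const endonyms = parseEndonyms(examplePsv);
console.log(addMissingAliases(['Germany'], 1001, endonyms));
// [ 'Germany', 'Deutschland' ]
```

Keying the lookup on the parent ID rather than the English name avoids ambiguity between places that share a label, and leaving the existing name first keeps the current default-language behaviour intact.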