Support filtering against dependency_a when using boundary.country parameter #1622

Merged
merged 1 commit into from
May 4, 2022

@orangejulius orangejulius commented Apr 28, 2022

The boundary.country parameter is supported on all major Pelias API endpoints and adds a hard filter based on a given 2- or 3-character country code.

The behavior is pretty straightforward, but the underlying data model for country-like parents is not quite so simple.

While most records have a parent.country_a property with the country code of a parent, a few have parent.dependency_a. This includes places like US territories outside the 50 states, the French Overseas Territories, and various others that have their own country codes but may not be a completely sovereign country of their own.

In practice, however, this distinction isn't super useful to most people. See the country_code parameter we added in #1541 for another case where it's helpful to provide an interface that glosses over some of the details.

Functionality

This PR makes the boundary.country param look for a matching country code in either the parent.dependency_a property or parent.country_a, using an Elasticsearch multi_match query. Previously only parent.country_a was considered.

There's really nothing tricky to it: the multi_match query allows either field to contain the desired country code.
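To make the shape of that filter concrete, here is a minimal sketch of what such a clause could look like. The two field names come from the description above; the helper names, the surrounding `bool` query, and the sample parent records are illustrative assumptions and not the actual code in this PR.

```typescript
// The two fields named in the PR description.
const COUNTRY_CODE_FIELDS: string[] = ["parent.country_a", "parent.dependency_a"];

// Hypothetical helper: builds a multi_match clause for a single country code.
function boundaryCountryFilter(code: string) {
  return {
    multi_match: {
      query: code,
      fields: COUNTRY_CODE_FIELDS.slice(),
    },
  };
}

// Placing the clause under bool.filter makes it a hard filter that does not affect scoring.
function searchBody(code: string) {
  return {
    query: {
      bool: {
        must: [{ match_all: {} }],
        filter: [boundaryCountryFilter(code)],
      },
    },
  };
}

// In-memory illustration of the intended semantics (assumes array-valued parent fields).
interface Parent {
  country_a?: string[];
  dependency_a?: string[];
}

function matchesBoundaryCountry(parent: Parent, code: string): boolean {
  const values = (parent.country_a || []).concat(parent.dependency_a || []);
  return values.some((value) => value === code);
}
```

With made-up sample data, `matchesBoundaryCountry({ dependency_a: ["PRI"] }, "PRI")` is true even though the record has no `country_a` value, which is the case that a filter on parent.country_a alone would miss.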

As a side note, I realized that our autocomplete queries were using the parent.country_a.ngram field due to our work in #1264. While this should make little practical difference, it's technically not needed, as the boundary.country param doesn't need to consider partial inputs. As a side effect, this PR fixes that ever so slight departure from the ideal.
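A rough way to see why the ngram field is unnecessary here, assuming the .ngram subfield indexes leading prefixes (edge n-grams) and that the query term itself is not split into n-grams. Neither assumption is confirmed in this thread, and the function names are hypothetical.

```typescript
// Prefixes an edge n-gram analyzer would index for one term (assumed minimum length 1).
function edgeNgrams(term: string): string[] {
  const prefixes: string[] = [];
  for (let length = 1; length <= term.length; length++) {
    prefixes.push(term.slice(0, length));
  }
  return prefixes;
}

// The n-gram field matches when the query term equals any indexed prefix.
function matchesNgramField(indexed: string, queryTerm: string): boolean {
  return edgeNgrams(indexed).some((prefix) => prefix === queryTerm);
}

// The plain field matches only on the whole value.
function matchesExactField(indexed: string, queryTerm: string): boolean {
  return indexed === queryTerm;
}
```

For a complete code both functions agree. They diverge only when the query term is a proper prefix of the indexed value, which is the partial-input case this filter does not need to handle.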

A note on implementation

As it stands, this PR includes a departure from the previous implementation, which used a boundary.country specific view in the pelias-query library.

Instead, the generic multi_match view is used, and all the Pelias API specific logic is contained here. It also means there's no need to release a companion PR to pelias-query. In the past we've found the dance of developing PRs to API and pelias-query in parallel to be a bit of extra work, and this pattern would eliminate that need. I'd love to see us moving towards having all the Pelias-specific query logic here in the API, while pelias-query contains only or mostly Elasticsearch-specific stuff.

That said, it does mean that the way this filter parameter works is now different from most of the others, so it's worth discussing whether that's ok with us.

Still to come

The /v1/reverse endpoint technically supports boundary.country as well (though the use case is fairly limited). This will come in a subsequent PR as the "coarse" part of the reverse endpoint doesn't support boundary.country at all. It might make sense to have further discussion there, including the possibility of removing the boundary.country param from the reverse endpoint.

orangejulius added a commit to pelias/acceptance-tests that referenced this pull request Apr 28, 2022
This adds test cases for the `boundary.country` parameter on the
`search` and `structured` API endpoints. Previously, we only tested
`autocomplete`.

In addition, it adds tests for records that only have
`parent.dependency_a` values, not `parent.country_a`, which require
pelias/api#1622 to pass.

@missinglink missinglink left a comment


Looks good. I don't recall the specifics of the ngram field and its effect on matching the first character of a country code. Not really relevant for filtering, but possibly relevant if the subquery is used in a query context?

@orangejulius orangejulius marked this pull request as ready for review May 4, 2022 18:52
@orangejulius orangejulius merged commit 418bf04 into master May 4, 2022
@orangejulius orangejulius deleted the boundary_country_dependency branch May 4, 2022 18:54
orangejulius added a commit to pelias/wof-admin-lookup that referenced this pull request May 10, 2022
…ere possible

After pelias/api#1622, which extends the
`boundary.country` API parameter to include the `dependency` placetype,
we noticed that venue and address records often have a 2-character
country code when they are part of a dependency.

The `boundary.country` parameter currently checks only for 3-character
codes, and so the best solution is to ensure the relevant property on
all records in Elasticsearch is a 3-character country code whenever
possible.

This change expands on the logic for selecting an abbreviation for an
admin record to be used for point in polygon lookup. Logic specific to
countries is expanded to include dependencies, and some additional
possible fields that might contain 3-character codes are checked.

Because the abbreviation logic is now a bit more substantial, it's
extracted into its own function. It's also been made a bit less
redundant and hopefully more clear.
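
The selection logic described in this commit message could be sketched roughly as follows. This is an illustration only: the record shape, the property names, and the fallback order are hypothetical placeholders, not the real Who's on First properties or the actual wof-admin-lookup code.

```typescript
// Hypothetical record shape; these property names are placeholders.
interface AdminRecord {
  placetype: string;
  abbreviation?: string;     // generic abbreviation, which may be a 2-character code
  candidateCodes?: string[]; // other fields that might hold a 3-character code
}

// Placetypes that receive the country-specific handling.
const COUNTRY_LIKE_PLACETYPES: string[] = ["country", "dependency"];

function isThreeCharacterCode(code: string): boolean {
  return /^[A-Za-z]{3}$/.test(code);
}

// Prefer a 3-character code for countries and dependencies, otherwise fall back.
function selectAbbreviation(record: AdminRecord): string | undefined {
  if (COUNTRY_LIKE_PLACETYPES.indexOf(record.placetype) !== -1) {
    const candidates = record.candidateCodes || [];
    for (const code of candidates) {
      if (isThreeCharacterCode(code)) {
        return code.toUpperCase();
      }
    }
  }
  return record.abbreviation;
}
```

Records of other placetypes, and countries or dependencies with no 3-character candidate, keep whatever generic abbreviation they already have.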
orangejulius added a commit to pelias/wof-admin-lookup that referenced this pull request May 11, 2022
…ere possible
