Skip to content

feat(addressUsingIdsQuery): Add slop parameter#101

Closed
orangejulius wants to merge 1 commit intomasterfrom
address_search_with_ids_slop
Closed

feat(addressUsingIdsQuery): Add slop parameter#101
orangejulius wants to merge 1 commit intomasterfrom
address_search_with_ids_slop

Conversation

@orangejulius
Copy link
Member

This adds support for the match_phrase slop parameter.

The default is set to 0, so as it stands, this addition will not change the outcome of any queries.

This adds support for the `match_phrase` [slop
parameter](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html).

The default is set to 0, so as it stands, this addition will not change
the outcome of any queries.
orangejulius added a commit to pelias/api that referenced this pull request Jun 19, 2019
Combined with pelias/query#101, this change
allows for improved matching of streets when using the
addressSearchUsingIDs query (the first query executed by the search
endpoint).

Essentially, by allowing a `slop` of 1 in the `match_phrase` clauses of
the Elasticsearch query, its now possible to omit a single word _in the
middle_ of a street name and still match it. Omitting words at the start
or end of the street name has been supported since
pelias/schema#310.

Some examples of where this helps:

| Actual street name | Newly matching input text |
| ---                | ---                       |
| SE Martin Luther King Jr Blvd | Martin Luther King Blvd |
| Carrer de Balmes | Carrer Balmes |

// default address:street:slop to 0 if unset
if (!vs.isset('address:street:slop')) {
vs.var('address:street:slop', 0);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@missinglink, I put this here as a shortcut. I notice that most of our code in API makes copies of the variable store rather than changing variables in views. Do you think that would be a better approach here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer that if the variable was unset then we totally omit it from the query, rather than trying to set it to the default in ES.

The default slop set by elasticsearch is 0 but this could change at any time.

That's how most of the views I've written work, they have the 'mandatory' properties in the main blob and then a bunch of if conditions below which set optional params.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to set defaults then the correct place to do this would be in the search_defaults file in API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eg. #99

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny, I prefer the opposite, I explicitly wrote it intending to ensure the slop parameter is always in the query so that it's clear what its set to. As far as pelias/query is concerned, I believe the default should be set to zero. If Elasticsearch changes the default slop value that would be a huge pain and I'd prefer we avoid that (extremely slim) possibility.

The default in search_defaults in the API is going to be 1 to get the behavior we want, and that is handled in pelias/api#1322

Copy link
Member

@missinglink missinglink Jun 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general principal I think that we should reserve the option of omission of optional properties.

There is a valid case for a developer wishing to rely on the default behaviour rather than specifying it explicitly.

I can see the benefit of setting it to 0, mainly for clarity (at the cost of verbosity).

Although I feel it is still better as an optional parameter where the caller (pelias/api) is able to make the decision about whether it will be included or excluded from the query.

Just my 2c :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny thing is that our existing phrase view does it as you suggest:

https://github.com/pelias/query/blob/master/view/phrase.js

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd to me in the existing phrase.js view that we have two optional properties (slop and cutoff_frequency) which are handled differently.

I think we agree that we wouldn't want to set a default value for cutoff_frequency fuzziness et al?

There is something about slop which makes it feel like it's mandatory even though it isn't.

@orangejulius
Copy link
Member Author

I intend to add slop to more match_phrase queries in other Pelias query types soon. However to keep the PR small and the impacts easy to spot, I'd like to modify only a single Pelias query at a time.

orangejulius added a commit to pelias/api that referenced this pull request Jun 19, 2019
Combined with pelias/query#101, this change
allows for improved matching of streets when using the
addressSearchUsingIDs query (the first query executed by the search
endpoint).

Essentially, by allowing a `slop` of 1 in the `match_phrase` clauses of
the Elasticsearch query, its now possible to omit a single word _in the
middle_ of a street name and still match it. Omitting words at the start
or end of the street name has been supported since
pelias/schema#310.

Some examples of where this helps:

| Actual street name | Newly matching input text |
| ---                | ---                       |
| SE Martin Luther King Jr Blvd | Martin Luther King Blvd |
| Carrer de Balmes | Carrer Balmes |
orangejulius added a commit that referenced this pull request Oct 3, 2019
This change builds on the previous series of PRs
(#112,
and #111) to support the `slop`
parameter on the `match_phrase` queries responsible for handling street
addresses.

Replaces #101
Connects pelias/api#1322
orangejulius added a commit to pelias/api that referenced this pull request Oct 3, 2019
Combined with pelias/query#101, this change
allows for improved matching of streets when using the
addressSearchUsingIDs query (the first query executed by the search
endpoint).

Essentially, by allowing a `slop` of 1 in the `match_phrase` clauses of
the Elasticsearch query, its now possible to omit a single word _in the
middle_ of a street name and still match it. Omitting words at the start
or end of the street name has been supported since
pelias/schema#310.

Some examples of where this helps:

| Actual street name | Newly matching input text |
| ---                | ---                       |
| SE Martin Luther King Jr Blvd | Martin Luther King Blvd |
| Carrer de Balmes | Carrer Balmes |
orangejulius added a commit to pelias/api that referenced this pull request Oct 3, 2019
Combined with pelias/query#101, this change
allows for improved matching of streets when using the
addressSearchUsingIDs query (the first query executed by the search
endpoint).

Essentially, by allowing a `slop` of 1 in the `match_phrase` clauses of
the Elasticsearch query, its now possible to omit a single word _in the
middle_ of a street name and still match it. Omitting words at the start
or end of the street name has been supported since
pelias/schema#310.

Some examples of where this helps:

| Actual street name | Newly matching input text |
| ---                | ---                       |
| SE Martin Luther King Jr Blvd | Martin Luther King Blvd |
| Carrer de Balmes | Carrer Balmes |

Since pelias/query#113 has been merged, this
commit will result in changes to how queries behave.
@orangejulius orangejulius deleted the address_search_with_ids_slop branch October 3, 2019 18:39
orangejulius added a commit to pelias/api that referenced this pull request Oct 3, 2019
Combined with pelias/query#101, this change
allows for improved matching of streets when using the
addressSearchUsingIDs query (the first query executed by the search
endpoint).

Essentially, by allowing a `slop` of 1 in the `match_phrase` clauses of
the Elasticsearch query, its now possible to omit a single word _in the
middle_ of a street name and still match it. Omitting words at the start
or end of the street name has been supported since
pelias/schema#310.

Some examples of where this helps:

| Actual street name | Newly matching input text |
| ---                | ---                       |
| SE Martin Luther King Jr Blvd | Martin Luther King Blvd |
| Carrer de Balmes | Carrer Balmes |

Since pelias/query#113 has been merged, this
commit will result in changes to how queries behave.
orangejulius added a commit to pelias/api that referenced this pull request Oct 3, 2019
Combined with pelias/query#101, this change
allows for improved matching of streets when using the
addressSearchUsingIDs query (the first query executed by the search
endpoint).

Essentially, by allowing a `slop` of 1 in the `match_phrase` clauses of
the Elasticsearch query, its now possible to omit a single word _in the
middle_ of a street name and still match it. Omitting words at the start
or end of the street name has been supported since
pelias/schema#310.

Some examples of where this helps:

| Actual street name | Newly matching input text |
| ---                | ---                       |
| SE Martin Luther King Jr Blvd | Martin Luther King Blvd |
| Carrer de Balmes | Carrer Balmes |

Since pelias/query#113 has been merged, this
commit will result in changes to how queries behave.
orangejulius added a commit to pelias/api that referenced this pull request Oct 4, 2019
Combined with pelias/query#101, this change
allows for improved matching of streets when using the
addressSearchUsingIDs query (the first query executed by the search
endpoint).

Essentially, by allowing a `slop` of 1 in the `match_phrase` clauses of
the Elasticsearch query, its now possible to omit a single word _in the
middle_ of a street name and still match it. Omitting words at the start
or end of the street name has been supported since
pelias/schema#310.

Some examples of where this helps:

| Actual street name | Newly matching input text |
| ---                | ---                       |
| SE Martin Luther King Jr Blvd | Martin Luther King Blvd |
| Carrer de Balmes | Carrer Balmes |

Since pelias/query#113 has been merged, this
commit will result in changes to how queries behave.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants