autocomplete milestone #526

Merged
merged 23 commits into from Apr 29, 2016

Conversation

Projects
None yet
3 participants
@missinglink
Member

missinglink commented Apr 25, 2016

This PR contains all the work that's gone in to the Autocomplete Improvements Milestone, it is paired with a corresponding schema update.

The PR is focused around fixing bugs and flaky behavior in /v1/autocomplete that's been reported by our users, I've also taken the opportunity to triage all the incoming requests for autocomplete, clean up some messy parts and add more code coverage.

The major difference in behavior is due to how we handle analysis and especially synonym substitution for autocomplete, an example of a query which was not previously possible using phrase matching:

improved analysis

/v1/autocomplete?text=vic uni wellington

1)  Victoria University of Wellington, New Zealand
...

more information about the analysis changes can be found in: pelias/schema#105

final token is a stopword bug

This reported bug has been squashed: pelias/pelias#211

/v1/autocomplete?text=green lane

 1) Green Lane, PA, USA
 2) Green Lane Farms, Lower Allen, PA, USA
...

improved server-side tokenizer

Improved parsing of source data containing commas or slashes as delimiters, such as in Bedell Street/133rd Avenue. see: pelias/schema#113

improved client-side tokenizer

In order to generated the best/most efficient queries possible, we've added a client-size tokenizer which mimics the functionality of the server-side tokenizer. see: #529

improved handling of ordinals

This allows users to partially type an address such as "33rd street" as "33r", see pelias/schema#96

improved admin matching, particularly for non-delimited queries

this PR also includes a fix for parent.borough not being targeted during admin patching. see #527

/v1/autocomplete?text=200 dean st brooklyn

 1) 200 Dean Street, Brooklyn, New York, NY, USA

indexing of single digit numbers

in the past we did not index single digit numbers, they are now present in the index without causing a major increase in disk/ram usage.

/v1/autocomplete?text=grolmanstr 8

1)  Grolmanstr. 8, Köln, Germany
...

tighter 'focus' clustering and improved localized matching

see: http://missinglink.embed.s3.amazonaws.com/pelias_clustering.png

house number can be provided in 1,2 or 3rd position

mainly for for Germanic addresses schemes. see #489

/v1/autocomplete?text=wilmersdorfer str 51

1)  Wilmersdorfer Straße 51, Kiel, Germany
...

boosting of exact matches

an additional query segment was added to ensure that outputs that more closely match the input text appear higher than those which only partially match.

query/autocomplete_defaults.js
@@ -82,6 +82,10 @@ module.exports = _.merge({}, peliasQuery.defaults, {
'admin:neighbourhood:field': 'parent.neighbourhood',
'admin:neighbourhood:boost': 200,
+ 'admin:borough:analyzer': 'peliasAdmin',
+ 'admin:borough:field': 'parent.borough',
+ 'admin:borough:boost': 800,

This comment has been minimized.

@dianashk

dianashk Apr 25, 2016

Contributor

why so high?

@dianashk

dianashk Apr 25, 2016

Contributor

why so high?

This comment has been minimized.

@missinglink

missinglink Apr 28, 2016

Member

I simply copied the value from 'country', I will drop it down to a value for 'region' instead if you feel it's too high. What would you suggest as a good value here?

@missinglink

missinglink Apr 28, 2016

Member

I simply copied the value from 'country', I will drop it down to a value for 'region' instead if you feel it's too high. What would you suggest as a good value here?

query/search.js
@@ -30,6 +30,7 @@ query.score( peliasQuery.view.admin('country_a') );
query.score( peliasQuery.view.admin('region') );
query.score( peliasQuery.view.admin('region_a') );
query.score( peliasQuery.view.admin('county') );
+query.score( peliasQuery.view.admin('borough') );

This comment has been minimized.

@orangejulius

orangejulius Apr 25, 2016

Member

Should we remove this to avoid having any effect on search from this PR?

@orangejulius

orangejulius Apr 25, 2016

Member

Should we remove this to avoid having any effect on search from this PR?

This comment has been minimized.

@missinglink

missinglink Apr 28, 2016

Member

moving this to #527

@missinglink

missinglink Apr 28, 2016

Member

moving this to #527

query/text_parser.js
@@ -9,6 +9,7 @@ or postalcode because we should only try to match those when we're sure that's w
*/
var adminFields = placeTypes.concat([
'region_a',
+ 'borough'

This comment has been minimized.

@dianashk

dianashk Apr 25, 2016

Contributor

this doesn't need to be done since placeTypes already has borough in the list.

@dianashk

dianashk Apr 25, 2016

Contributor

this doesn't need to be done since placeTypes already has borough in the list.

This comment has been minimized.

@missinglink

missinglink Apr 28, 2016

Member

good catch!

@missinglink

missinglink Apr 28, 2016

Member

good catch!

query/view/boost_exact_matches.js
+
+ // split the 'input:name' on whitespace
+ var name = vs.var('input:name').get(),
+ tokens = name.split(' ');

This comment has been minimized.

@orangejulius

orangejulius Apr 25, 2016

Member

perhaps split on any whitespace so that multiple spaces, tabs, or any other weird stuff is handled ok? Or would that be cleaned up by a sanitizer before this?

@orangejulius

orangejulius Apr 25, 2016

Member

perhaps split on any whitespace so that multiple spaces, tabs, or any other weird stuff is handled ok? Or would that be cleaned up by a sanitizer before this?

This comment has been minimized.

@dianashk

dianashk Apr 25, 2016

Contributor

this could result in empty tokens if there are multiple spaces in a row. just doesn't feel like a very clean solution.

@dianashk

dianashk Apr 25, 2016

Contributor

this could result in empty tokens if there are multiple spaces in a row. just doesn't feel like a very clean solution.

@orangejulius

This comment has been minimized.

Show comment
Hide comment
@orangejulius

orangejulius Apr 26, 2016

Member

We noticed a regression from pelias/pelias#164. When searching for "New York, NY" from Stamford, CT, autocomplete for "New York, NY" used to show the city of NYC first, now it doesn't

Autocomplete for New York, NY

There's now an acceptance test for this: pelias/acceptance-tests#232

Member

orangejulius commented Apr 26, 2016

We noticed a regression from pelias/pelias#164. When searching for "New York, NY" from Stamford, CT, autocomplete for "New York, NY" used to show the city of NYC first, now it doesn't

Autocomplete for New York, NY

There's now an acceptance test for this: pelias/acceptance-tests#232

@missinglink missinglink added this to the Autocomplete Improvements milestone Apr 29, 2016

@missinglink missinglink referenced this pull request in pelias/pelias Apr 29, 2016

Open

regression: "New York, NY" with focus point #330

@orangejulius

This comment has been minimized.

Show comment
Hide comment
Member

orangejulius commented Apr 29, 2016

👍

@orangejulius orangejulius merged commit e04efed into master Apr 29, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@orangejulius orangejulius deleted the missinglink branch May 25, 2016

je-l pushed a commit to nlsfi/pelias-api that referenced this pull request Aug 31, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment