Add new 'leaf' queries by orangejulius · Pull Request #109 · pelias/query

orangejulius · 2019-09-06T18:16:03Z

This PR is a major rearrangement of the pelias/query module internals to make them more consistent. It introduces a couple of new concepts, but makes no breaking changes.

Major changes

Commonly used Elasticsearch leaf queries now all have their own views

It took some time to find a word for what a match/match_phrase/term query is, as opposed to, say, a Pelias-specific query generated by the autocomplete endpoint.

The 'leaf' terminology comes from the main Elasticsearch Query DSL docs page, and seems to fit well. It sets us up to create well-organized views for 'compound' queries like bool in the future too.

These new 'leaf' views also have the ability to be instantiated multiple times by configuring where they look in the VariableStore for configuration settings. This means a lot of sophisticated query construction can now be done without creating task-specific views at all.
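As a rough illustration of the idea (not the code in this PR), a leaf view can be a factory that takes a prefix and returns a view reading its settings from prefixed keys. The sketch below uses a plain `Map` as a stand-in for the VariableStore; the function name `matchView` and the `match:<prefix>:<name>` key scheme are assumptions made for this example.

```javascript
// Hypothetical sketch: a Map stands in for the VariableStore, and the
// "match:<prefix>:<name>" key scheme is assumed for illustration only.
function matchView(prefix) {
  return function (store) {
    const input = store.get(`match:${prefix}:input`);
    const field = store.get(`match:${prefix}:field`);

    // Without both required settings there is nothing to build.
    if (input === undefined || field === undefined) {
      return null;
    }

    return { match: { [field]: { query: input } } };
  };
}

// One store, two independently configured instances of the same view.
const store = new Map([
  ['match:name:input', 'london'],
  ['match:name:field', 'name.default'],
  ['match:street:input', 'main st'],
  ['match:street:field', 'address_parts.street']
]);

const nameQuery = matchView('name')(store);
const streetQuery = matchView('street')(store);
```

Because each instance only looks at keys under its own prefix, the same view code can be reused for as many fields as a query needs.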

Each of these views will accept only parameters that are allowed for the given query type, and will omit parameters that are unset. Therefore, this PR closes #101 and addresses the feedback given there regarding default values.
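A minimal sketch of that behaviour, assuming an allow-list approach: only parameters named in the list are copied, and anything unset is left out of the query entirely rather than filled with a default. The helper name `buildMatch` and the exact list contents are illustrative, not the list used in this PR.

```javascript
// Illustrative allow-list; the real list in lib/leaf/match.js may differ.
const ALLOWED_MATCH_PARAMS = [
  'boost',
  'operator',
  'analyzer',
  'minimum_should_match',
  'zero_terms_query'
];

// Hypothetical helper: builds a match query from a field, a value and
// an optional bag of extra parameters.
function buildMatch(field, value, parameters = {}) {
  const body = { query: value };

  ALLOWED_MATCH_PARAMS.forEach((name) => {
    // Unset parameters are omitted, so Elasticsearch applies its own defaults.
    if (parameters[name] !== undefined) {
      body[name] = parameters[name];
    }
  });

  return { match: { [field]: body } };
}

// 'bogus' is not allowed for this query type and 'operator' is unset,
// so neither appears in the result.
const example = buildMatch('name.default', 'london', {
  boost: 2,
  operator: undefined,
  bogus: true
});
```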

Leaf query views have been created for match, match_phrase, and match_all

New internal functions to encapsulate leaf view functionality for reuse

There was at least one "view" (for the terms query) that did not use variable store values, and merely returned a query object directly. This is nice to have to reduce duplication, but exposing it to consumers of the pelias/query module would result in queries that do not change from request to request. This could have led to confusing, hard-to-debug issues, so having a clear separation of functionality is important.

All leaf queries now have these functions, but they are not exported by the module. Instead, they are used in other, more Pelias-specific views (such as sources, layers, categories, etc.), and will hopefully be used even more over time.
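The split between the two layers might look roughly like this. The `terms` signature mirrors the one visible in `lib/leaf/terms.js`, but its body here is a guess; `sourcesView`, the `sources` key, and the `Map` standing in for the VariableStore are assumptions for the sake of the example.

```javascript
// Internal helper: takes plain arguments and returns a terms query.
// It never touches the variable store, so it should not be exported as a view.
function terms(property, value, parameters) {
  const query = { terms: { [property]: value } };

  // Optional parameters are only added when explicitly provided.
  if (parameters && parameters.boost !== undefined) {
    query.terms.boost = parameters.boost;
  }

  return query;
}

// Pelias-specific view: reads request-specific values from the store
// (a Map here, as a stand-in) and delegates query construction to the helper.
function sourcesView(store) {
  const sources = store.get('sources');

  if (!Array.isArray(sources) || sources.length === 0) {
    return null;
  }

  return terms('source', sources);
}

const sourcesQuery = sourcesView(new Map([['sources', ['osm', 'oa']]]));
```

Only the view changes from request to request; the helper stays a pure function of its arguments.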

Better separation between `match` and `match_phrase` views

Elasticsearch 6 and later no longer support conflating the match and match_phrase queries as we currently do, so these new leaf views implement match and match_phrase separately and generate queries that are compatible with ES6.
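To make the difference concrete, here is a small sketch of the two query shapes. The `legacyPhrase` object shows the older conflated style, where a `match` query carries `type: 'phrase'`; the two helper functions are hypothetical and only illustrate emitting each query under its own top-level key.

```javascript
// Older, conflated style: a phrase search expressed as a match query.
// The `type` parameter on match is not accepted by Elasticsearch 6.
const legacyPhrase = {
  match: { 'name.default': { query: 'new york', type: 'phrase', slop: 2 } }
};

// Hypothetical helpers that keep the two query types separate.
function matchQuery(field, value, parameters = {}) {
  return { match: { [field]: Object.assign({ query: value }, parameters) } };
}

function matchPhraseQuery(field, value, parameters = {}) {
  return { match_phrase: { [field]: Object.assign({ query: value }, parameters) } };
}

// ES6-compatible equivalents: slop belongs to match_phrase,
// operator belongs to match.
const phrase = matchPhraseQuery('name.default', 'new york', { slop: 2 });
const plain = matchQuery('name.default', 'new york', { operator: 'and' });
```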

Next steps

This PR itself does not make any breaking changes, but it does allow a new set of tools to be used in pelias/api that can move us forward.

There is currently a large amount of domain-specific, ES6-incompatible query generation code in pelias/api. Once this PR is merged, the API can start leveraging these new query types to reduce code complexity and improve Elasticsearch compatibility in follow-up work.

Connects pelias/pelias#719

This 'view' was not really a view, because it did not produce results based on the variable store. Instead it simply took parameters and returned an object that was a valid terms query. This is a good feature to have, but it should be internal to this library only.

This doesn't change the output at all, but makes the code simpler

missinglink

Looks good! I added a few comments.

missinglink · 2019-09-09T13:25:08Z

lib/leaf/match.js

+    }
+  };
+
+  const extra_params = ['boost',


can we define this array out of the function scope to avoid re-allocating memory on every invocation?

also, unusual formatting 🤷‍♂ almost missed the first element

orangejulius · 2019-09-10

I can't believe that V8 wouldn't be able to perform this optimization itself, and either way, I feel like the readability benefit of having the array close to where it is used outweighs any potential performance changes, so I'd like to leave this code as is.

missinglink · 2019-09-09T13:29:04Z

lib/leaf/match.js

+    'minimum_should_match',
+    'zero_terms_query'];
+
+  extra_params.forEach(function(param) {


sorry for sounding pedantic, could we do the if(extra) check before the iteration, in the most simple use case this would perform 10 iterations unnecessarily.

it's not really a huge deal, but it's nice to try to optimize performance on primitive building blocks like this because the effects will propagate up and multiply.

could use fat arrow function syntax here for succinctness?

orangejulius · 2019-09-10

Same as above: I bet V8 can do this for us, and wrapping the entire loop in another if statement will make it that much harder to read. I'd rather not change it.
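For reference, one possible middle ground between the two positions in this thread is an early return, which skips the iteration when no extra parameters are passed without nesting the loop inside another `if`. This is only a sketch; `applyOptional` and the shortened parameter list are hypothetical and not taken from the diff.

```javascript
// Shortened, illustrative list of optional parameter names.
const OPTIONAL_PARAMS = ['boost', 'minimum_should_match', 'zero_terms_query'];

// Hypothetical helper: copies allowed optional parameters onto a query body.
function applyOptional(target, parameters) {
  // Early exit: in the simplest use case no iteration happens at all.
  if (!parameters) {
    return target;
  }

  OPTIONAL_PARAMS.forEach((param) => {
    if (parameters[param] !== undefined) {
      target[param] = parameters[param];
    }
  });

  return target;
}

const bare = applyOptional({ query: 'london' });
const boosted = applyOptional({ query: 'london' }, { boost: 2, bogus: 1 });
```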

missinglink · 2019-09-09T13:31:01Z

lib/leaf/terms.js

@@ -0,0 +1,17 @@
+module.exports = function( property, value, parameters ){


here it's called parameters while in the others it's called extra, is that intentional?

orangejulius · 2019-09-10

Yeah, I meant to rename them both to parameters, which is a better name. This is now fixed.

missinglink · 2019-09-11T12:47:57Z

Here's the perf test https://jsperf.com/leaf-queries-perf/1

It's so fast that even though the code calls for an array memory allocation on every function call, the effect is so tiny that it's not worth worrying about.

The tests also highlight a minor perf benefit to defining the immutable data outside the function scope. I suspect that the V8 engine still has to do the memory allocation and deallocation on every entry/exit to the code path.

Looking at how fast it is, it's not worth making a fuss over, but I'm going to remain sceptical that V8 will be able to optimize things like this down to a no-op 😄
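For anyone who wants to reproduce this kind of comparison locally, a rough Node.js sketch follows. It is not the jsperf test linked above: the two functions, the shortened parameter list, and the `time` helper are all made up for illustration, and timings from a loop like this are noisy and depend on the engine version.

```javascript
// Variant A: the array literal is evaluated on every call.
function copyInline(parameters) {
  const names = ['boost', 'minimum_should_match', 'zero_terms_query'];
  const out = {};
  names.forEach((n) => {
    if (parameters[n] !== undefined) { out[n] = parameters[n]; }
  });
  return out;
}

// Variant B: the array is created once, at module load.
const HOISTED_NAMES = ['boost', 'minimum_should_match', 'zero_terms_query'];

function copyHoisted(parameters) {
  const out = {};
  HOISTED_NAMES.forEach((n) => {
    if (parameters[n] !== undefined) { out[n] = parameters[n]; }
  });
  return out;
}

// Crude timer: runs fn repeatedly and returns elapsed milliseconds.
function time(fn, iterations) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) {
    fn({ boost: 2 });
  }
  return Number(process.hrtime.bigint() - start) / 1e6;
}

const inlineMs = time(copyInline, 100000);
const hoistedMs = time(copyHoisted, 100000);
console.log(`inline: ${inlineMs.toFixed(2)} ms, hoisted: ${hoistedMs.toFixed(2)} ms`);
```

Both variants return the same object for the same input, so any difference measured is down to where the array is allocated.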

missinglink · 2019-09-11T12:49:43Z

Sorry for hijacking this awesome PR to rant about js performance 🙇

orangejulius · 2019-09-11T17:50:51Z

Interesting, I'm really surprised that V8 (also SpiderMonkey) isn't smart enough to handle that. Looks like most of the perf gain comes from moving the array out of the function, so I've done that now.

They're just microbenchmark numbers, but we might as well go with them.

This uses the match_phrase helper function defined in #109 to cut down on duplication when creating `match_phrase` queries. Since the structure of the `match_phrase` query is defined in one place, it makes the resulting view code a bit more concise. It will also make optional parameter handling easier and more consistent.

orangejulius added 3 commits August 6, 2019 16:23

feat(terms): Simplify terms query generation

89d45cb

This doesn't change the output at all, but makes the code simpler

feat(terms): Allow extra parameters including boost

bf0c689

orangejulius force-pushed the leaf-queries-and-libs branch from 678c037 to 83f27f5 Compare September 6, 2019 18:17

orangejulius changed the title ~~Leaf queries and libs~~ Add new 'leaf' queries Sep 6, 2019

orangejulius requested a review from missinglink September 6, 2019 18:26

orangejulius force-pushed the leaf-queries-and-libs branch from 83f27f5 to 74d86c1 Compare September 6, 2019 18:27

missinglink approved these changes Sep 9, 2019

orangejulius added 5 commits September 10, 2019 08:38

feat(match_phrase): Add match_phrase leaf query function

955fc71

feat(leaf): Add match leaf query function

e632a5d

feat(view): Add match_phrase leaf view

dafca97

This is a reusable view to build `match_phrase` queries

feat(view): Add match_all leaf query view

05d7c93

This one's real simple

feat(view): Add generic match leaf view

b650df8

This view can be instantiated many times

orangejulius force-pushed the leaf-queries-and-libs branch from 74d86c1 to b650df8 Compare September 10, 2019 12:38

Move static arrays out of function calls

83509c6

This has been shown to slightly improve performance in microbenchmarks https://jsperf.com/leaf-queries-perf/1

orangejulius merged commit 7c70ae0 into master Sep 11, 2019

orangejulius deleted the leaf-queries-and-libs branch September 11, 2019 17:51

orangejulius mentioned this pull request Oct 1, 2019

ES6 compatible queries pelias/api#1354

Merged

orangejulius mentioned this pull request Oct 3, 2019

feat(match_phrase): Use match_phrase lib #111

Merged

Joxit mentioned this pull request Oct 25, 2019

Add multi_match leaf query function #114

Merged

orangejulius merged 9 commits into master from leaf-queries-and-libs
