Skip to content

Add new 'leaf' queries#109

Merged
orangejulius merged 9 commits intomasterfrom
leaf-queries-and-libs
Sep 11, 2019
Merged

Add new 'leaf' queries#109
orangejulius merged 9 commits intomasterfrom
leaf-queries-and-libs

Conversation

@orangejulius
Copy link
Member

@orangejulius orangejulius commented Sep 6, 2019

This PR constitutes a major rearrangement of the pelias/query module internals to have a bit more consistency. It introduces a couple new concepts, but makes no breaking changes.

Major changes

Commonly used Elasticsearch leaf queries now all have their own views

It took some time to come up with a word to explain what a match/match_phrase/term query is, in comparison to say, a Pelias-specific query generated by the autocomplete endpoint.

The 'leaf' terminology comes from the main Elasticsearch Query DSL docs page, and seems fits well. It sets us up to create well organized views for 'compound' queries like bool in the future too.

These new 'leaf' views also have the ability to be instantiated multiple times by configuring where they look in the VariableStore for configuration settings. This means a lot of sophisticated query construction can now be done without creating task-specific views at all.

Each of these views will accept only parameters that are allowed for the given query type, and will omit parameters that are unset. Therefore, this PR closes #101 and addresses the feedback given there regarding default values.

Leaf query views have been created for match, match_phrase, and match_all

New Internal functions to encapsulate leaf view functionality for reuse

There was at least one "view" (for the terms query) that did not use variable store values, and merely returned a query object directly. This is nice to have to reduce duplication, but exposing it to consumers of the pelias/query module would result in queries that do not change from request to request. This could have lead to confusing and hard to debug issues, so having a clear separation of functionality is important.

All leaf queries now have these functions but they are not exported by the module. Instead they are used in other, more Pelias specific views (such as sources, layers, categories, etc), and will hopefully be used even more over time.

Better separation between match and match_phrase views

In Elasticsearch 6 and beyond, Elasticsearch does not support conflating the match and match_phrase queries as we currently do, so these new leaf views implement separate functionality for match and match_phrase, and will generate queries that are compatible with ES6.

Next steps

This PR itself does not make any breaking changes, but it does allow a new set of tools to be used in pelias/api that can move us forward.

There is currently a large amount of domain-specific, ES6 incompatible query generation code in pelias/API. Once this PR is merged, the API can start leveraging these new query types to reduce code complexity and improve Elasticsearch compatibility in follow up work.

Connects pelias/pelias#719

This 'view' was not really a view, because it did not produce results
based on the variable store. Instead it simply took parameters and
returned an object that was a valid terms query.

This is a good feature to have, but it should be internal to this
library only.
This doesn't change the output at all, but makes the code simpler
@orangejulius orangejulius changed the title Leaf queries and libs Add new 'leaf' queries Sep 6, 2019
Copy link
Member

@missinglink missinglink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I added a few comments.

}
};

const extra_params = ['boost',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we define this array out of the function scope to avoid re-allocating memory on every invocation?

also, unusual formatting 🤷‍♂ almost missed the first element

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't believe that V8 wouldn't be able to perform this optimization itself, and either way, I feel like the readability benefit of having the array close to where it is used outweighs any potential performance changes, so I'd like to leave this code as is.

'minimum_should_match',
'zero_terms_query'];

extra_params.forEach(function(param) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for sounding pedantic, could we do the if(extra) check before the iteration, in the most simple use case this would perform 10 iterations unnecessarily.

it's not really a huge deal, but it's nice to try to optimize performance on primitive building blocks like this because the effects will propagate up and multiply.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could use fat arrow function syntax here for succinctness?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: I bet V8 can do this for us, and wrapping the entire loop in another if statement will make it that much harder to read. I'd rather not change it.

@@ -0,0 +1,17 @@
module.exports = function( property, value, parameters ){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here it's called parameters while in the others it's called extra, is that intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, i meant to rename them both to parameters, which is a better name. This is now fixed.

@missinglink
Copy link
Member

Here's the perf test https://jsperf.com/leaf-queries-perf/1

It so fast that even though the code calls for an array memory allocation on every function call, the effect is so tiny that it's not worth worrying about.

The tests also highlight that there is a minor perf benefit to having the immutable data defined outside the function scope, I suspect that the v8 engine is still having to do the memory allocation and deallocation on every entry/exit to the code path.

Looking at how fast it is, It's not worth making a fuss over, but I'm going to remain sceptical that v8 will be able to optimize things like this down to a no-op 😄

@missinglink
Copy link
Member

Sorry for hijacking this awesome PR to rant about js performance 🙇

This has been shown to slightly improve performance in microbenchmarks

https://jsperf.com/leaf-queries-perf/1
@orangejulius
Copy link
Member Author

orangejulius commented Sep 11, 2019

Interesting, I'm really surprised that V8 (also Spidermonkey) isn't smart enough to handle that. Looks like most of the perf gain comes from moving the array out of the function, so I've done that now.

They're just microbenchmark numbers, but we might as well go with them.

@orangejulius orangejulius merged commit 7c70ae0 into master Sep 11, 2019
@orangejulius orangejulius deleted the leaf-queries-and-libs branch September 11, 2019 17:51
orangejulius added a commit that referenced this pull request Oct 3, 2019
This uses the match_phrase helper function defined in
#109 to cut down on duplication when
creating `match_phrase` queries.

Since the structure of the `match_phrase` query is defined in one place,
it makes the resulting view code a bit more concise.

It will also make optional parameter handling easier and more
consistent.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants