
Ensure query resources are fetched asynchronously during rewrite #25791

Merged (20 commits, merged Jul 20, 2017)


@s1monw (Contributor) commented Jul 19, 2017

The `QueryRewriteContext` used to provide a client object that could be used to fetch geo-shapes, terms, or documents for percolation. Unfortunately, all of these client calls were blocking, which can significantly slow down the rewrite phase because each call occupies an entire search thread until the resource is received. When the index the resource is fetched from isn't on the local node, this can have a significant impact on query throughput.

Note: this doesn't fix MLT (`more_like_this`), since it fetches its resources in `doQuery`, which is a different beast. Still, it is a huge step in the right direction.
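
To make the blocking-versus-asynchronous distinction concrete, here is a minimal, hypothetical Java sketch (not the Elasticsearch API). Instead of calling a blocking client inside `rewrite`, the query registers an asynchronous action on the context and returns a pending copy that holds a supplier; whoever drives the rewrite runs the registered actions later and rewrites again once the resource has arrived. It mirrors the `supplier.get() == null ? this : new ...` pattern visible in the `GeoShapeQueryBuilder` excerpt later in this review. `Fetcher`, `RewriteContext`, and `TermsLookupQuery` are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical asynchronous client: results arrive through a callback, so the caller's thread is never blocked.
interface Fetcher {
    void fetchTerms(String docId, Consumer<List<String>> onResponse);
}

// Hypothetical rewrite context: rewrite() only registers fetches; someone else runs them later.
final class RewriteContext {
    private final List<Consumer<Fetcher>> asyncActions = new ArrayList<>();

    void registerAsyncAction(Consumer<Fetcher> action) { asyncActions.add(action); }

    boolean hasAsyncActions() { return !asyncActions.isEmpty(); }

    // Hands every pending action to the fetcher and clears the queue.
    void executeAsyncActions(Fetcher fetcher) {
        List<Consumer<Fetcher>> actions = new ArrayList<>(asyncActions);
        asyncActions.clear();
        for (Consumer<Fetcher> action : actions) {
            action.accept(fetcher);
        }
    }
}

// Hypothetical query whose terms live in another document.
final class TermsLookupQuery {
    final String lookupDocId;
    final List<String> terms;              // null until the lookup has completed
    final Supplier<List<String>> supplier; // non-null while a lookup is in flight

    TermsLookupQuery(String lookupDocId, List<String> terms, Supplier<List<String>> supplier) {
        this.lookupDocId = lookupDocId;
        this.terms = terms;
        this.supplier = supplier;
    }

    TermsLookupQuery rewrite(RewriteContext ctx) {
        if (terms != null) {
            return this; // fully resolved, nothing left to do
        }
        if (supplier != null) {
            List<String> fetched = supplier.get();
            // Still waiting: return the same instance so the caller sees no progress.
            return fetched == null ? this : new TermsLookupQuery(lookupDocId, fetched, null);
        }
        // Instead of a blocking client call, register the fetch and return a pending copy.
        AtomicReference<List<String>> result = new AtomicReference<>();
        ctx.registerAsyncAction(fetcher -> fetcher.fetchTerms(lookupDocId, result::set));
        return new TermsLookupQuery(lookupDocId, null, result::get);
    }
}
```

The search thread is only held for the cheap `rewrite` calls; the actual lookup happens whenever the registered action's callback fires.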

@s1monw added the :Search/Search, das awesome, >enhancement, review and v6.0.0 labels Jul 19, 2017
@s1monw s1monw requested review from colings86 and javanna July 19, 2017 13:13
@@ -27,6 +27,8 @@
*/
public interface Rewriteable<T> {

int MAX_REVIEW_ROUNDS = 16;

Contributor commented:

s/REVIEW/REWRITE/ ?

@jpountz (Contributor) left a comment:

This is not the part of the code base that I'm most familiar with, so I could easily miss something, but it looks good to me.

@s1monw s1monw requested a review from jimczi July 19, 2017 19:19

@jimczi (Contributor) left a comment:

For some query builders (geo and terms) we could avoid fetching the same data for each shard request, but that's another story.
LGTM for the asynchronous fetch and the light rewrite for `canMatch`.

@colings86 (Contributor) left a comment:

LGTM2

@martijnvg (Member) left a comment:

Left 2 comments. LGTM

}

@Override
protected void doWriteTo(StreamOutput out) throws IOException {
if (documentSupplier != null) {
throw new IllegalStateException("document supplier must be non-null");

Member commented:

Shouldn't the message say "document supplier must be null" instead?

@s1monw (Contributor Author) replied:

Correct ++, I will fix it and add a test.
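
The fix agreed on here is to change the message, not the condition: serialization has to fail while `documentSupplier` is still set (the excerpt already checks `!= null`), and the message should say the supplier must be null, because a pending supplier cannot be sent over the wire. A minimal sketch, assuming a hypothetical `PendingDocQuery` that writes to a `List<String>` instead of Elasticsearch's `StreamOutput`, followed by the kind of test the author mentions:

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical query holding either a resolved document or a still-pending supplier.
final class PendingDocQuery {
    final String document;                   // resolved document, or null
    final Supplier<String> documentSupplier; // non-null while a fetch is still pending

    PendingDocQuery(String document, Supplier<String> documentSupplier) {
        this.document = document;
        this.documentSupplier = documentSupplier;
    }

    // Stand-in for doWriteTo: the query must be fully rewritten (supplier resolved) before serialization.
    void writeTo(List<String> out) {
        if (documentSupplier != null) {
            throw new IllegalStateException("document supplier must be null, rewrite the query before serializing it");
        }
        out.add(document);
    }
}
```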


/**
* Rewrites the given {@link Rewriteable} into its primitive form. Rewriteables that, for instance, fetch resources from remote hosts or
* can simplify / optimize themselves should do their heavy lifting during {@link #rewrite(QueryRewriteContext)}. This method

Member commented:

Do you mean {@link #rewriteAndFetch(QueryRewriteContext)} instead of {@link #rewrite(QueryRewriteContext)}? The plain rewrite doesn't fetch any resources, does it?
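
The distinction matters because `rewrite` only registers work, while `rewriteAndFetch` drives it: keep rewriting until a round returns the same instance, and whenever the context holds pending async actions, run them and resume rewriting from their completion callback. Below is a simplified, hypothetical sketch of that control flow. `SimpleRewriteable`, `Listener`, `Ctx`, and `Rewriter` are invented stand-ins for `Rewriteable`, `ActionListener`, and `QueryRewriteContext`, and the round limit follows the `MAX_REVIEW_ROUNDS` constant above under the name the reviewer suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified counterpart of Rewriteable.
interface SimpleRewriteable<T extends SimpleRewriteable<T>> {
    int MAX_REWRITE_ROUNDS = 16;
    T rewrite(Ctx ctx); // must return 'this' once nothing more can be rewritten
}

// Hypothetical counterpart of ActionListener.
interface Listener<T> {
    void onResponse(T value);
    void onFailure(Exception e);
}

final class Ctx {
    // Each action receives a callback it must run once its fetch has completed.
    private final List<Consumer<Runnable>> pending = new ArrayList<>();

    void registerAsyncAction(Consumer<Runnable> action) { pending.add(action); }

    boolean hasAsyncActions() { return !pending.isEmpty(); }

    // For simplicity the actions run one after another; onDone fires after the last one completes.
    void executeAsyncActions(Runnable onDone) {
        List<Consumer<Runnable>> actions = new ArrayList<>(pending);
        pending.clear();
        runFrom(actions, 0, onDone);
    }

    private static void runFrom(List<Consumer<Runnable>> actions, int i, Runnable onDone) {
        if (i == actions.size()) {
            onDone.run();
        } else {
            actions.get(i).accept(() -> runFrom(actions, i + 1, onDone));
        }
    }
}

final class Rewriter {
    static <T extends SimpleRewriteable<T>> void rewriteAndFetch(T original, Ctx ctx, Listener<T> listener) {
        rewriteAndFetch(original, ctx, listener, 0);
    }

    private static <T extends SimpleRewriteable<T>> void rewriteAndFetch(T original, Ctx ctx, Listener<T> listener, int rounds) {
        T current = original;
        try {
            // Rewrite until a round returns the same instance (a fixpoint).
            for (T next = current.rewrite(ctx); next != current; next = current.rewrite(ctx)) {
                current = next;
                if (++rounds >= SimpleRewriteable.MAX_REWRITE_ROUNDS) {
                    throw new IllegalStateException("too many rewrite rounds");
                }
                if (ctx.hasAsyncActions()) {
                    // Give the thread back: resume rewriting from the completion callback.
                    final T resumeFrom = current;
                    final int roundsSoFar = rounds;
                    ctx.executeAsyncActions(() -> rewriteAndFetch(resumeFrom, ctx, listener, roundsSoFar));
                    return;
                }
            }
            listener.onResponse(current);
        } catch (RuntimeException e) {
            listener.onFailure(e);
        }
    }
}
```

The round counter is carried across the asynchronous hops, so a rewriteable that keeps returning new objects still terminates with a failure instead of looping forever.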

@javanna (Member) left a comment:

Left a few minor comments; LGTM though.

return asyncActions.isEmpty() == false;
}

public void executeAsyncActions(ActionListener listener) {

Member commented:

Should this be `ActionListener<?>`?

}
}
};
ArrayList<BiConsumer<Client, ActionListener<?>>> biConsumers = new ArrayList<>(asyncActions);

Member commented:

nit: use `List` on the left-hand side instead of `ArrayList`.

if (asyncActions.isEmpty()) {
listener.onResponse(null);
} else {
CountDown done = new CountDown(asyncActions.size());

Member commented:

nit: maybe call it `countDown` instead of `done`?

listener.onResponse(null);
} else {
CountDown done = new CountDown(asyncActions.size());
ActionListener internalListener = new ActionListener() {

Member commented:

`ActionListener<?>` here too?
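
With the nits above applied (a generic listener type, `List` on the left-hand side, and a `countDown` name), the fan-out/fan-in step of `executeAsyncActions` could look roughly like the hypothetical, self-contained sketch below. `AsyncListener` stands in for `ActionListener`, an `AtomicInteger` is swapped in for Elasticsearch's `CountDown`, and the `Client` argument is replaced by a plain `Object` handle. Success is reported once every action has responded; only the first failure is reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

// Hypothetical stand-in for ActionListener<T>.
interface AsyncListener<T> {
    void onResponse(T response);
    void onFailure(Exception e);
}

final class AsyncActions {
    private final List<BiConsumer<Object, AsyncListener<?>>> asyncActions = new ArrayList<>();
    private final Object client; // hypothetical handle passed to each action

    AsyncActions(Object client) { this.client = client; }

    void register(BiConsumer<Object, AsyncListener<?>> action) { asyncActions.add(action); }

    boolean hasAsyncActions() { return !asyncActions.isEmpty(); }

    void executeAsyncActions(AsyncListener<Void> listener) {
        if (asyncActions.isEmpty()) {
            listener.onResponse(null);
            return;
        }
        AtomicInteger countDown = new AtomicInteger(asyncActions.size());
        // Actions see this as AsyncListener<?>, so they signal completion with onResponse(null).
        AsyncListener<Object> internalListener = new AsyncListener<Object>() {
            @Override
            public void onResponse(Object ignored) {
                if (countDown.decrementAndGet() == 0) {
                    listener.onResponse(null); // the last action to finish reports success
                }
            }

            @Override
            public void onFailure(Exception e) {
                if (countDown.getAndSet(0) > 0) {
                    listener.onFailure(e); // only the first failure is reported
                }
            }
        };
        // Copy, then clear, so an action that registers follow-up work does not
        // cause a ConcurrentModificationException while we iterate.
        List<BiConsumer<Object, AsyncListener<?>>> actions = new ArrayList<>(asyncActions);
        asyncActions.clear();
        for (BiConsumer<Object, AsyncListener<?>> action : actions) {
            action.accept(client, internalListener);
        }
    }
}
```

Once a failure has zeroed the counter, later responses only drive it negative, so neither callback of the outer listener can fire a second time.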

return supplier.get() == null ? this : new GeoShapeQueryBuilder(this.fieldName, supplier.get()).relation(relation).strategy(strategy);
} else if (this.shape == null) {
AtomicReference<ShapeBuilder> supplier = new AtomicReference<>();

Member commented:

I wonder if `AtomicReference` is needed here, given that we write from a single thread. It's just a visibility problem, so `SetOnce` would be a good fit; it also has the advantage of checking that we set it only once (thanks for the suggestion!).

@s1monw (Contributor Author) replied:

I did that already, thanks for the idea.
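
For readers unfamiliar with it, `SetOnce` (Lucene's `org.apache.lucene.util.SetOnce`) is a holder that may be assigned at most once and whose value is visible to other threads after the write. The hypothetical re-implementation below only illustrates the idea; the real class throws its own `AlreadySetException`, whereas this sketch uses `IllegalStateException`, and `SetOnceRef` is an invented name.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical minimal SetOnce: assign at most once; the volatile field makes the
// value visible to any thread that reads it after the write.
final class SetOnceRef<T> {
    private final AtomicBoolean set = new AtomicBoolean(false);
    private volatile T value;

    void set(T newValue) {
        // compareAndSet guarantees that only the first caller wins, even under contention.
        if (!set.compareAndSet(false, true)) {
            throw new IllegalStateException("value already set");
        }
        value = newValue;
    }

    T get() {
        return value; // null until set
    }
}
```

In the geo-shape case, `shape::get` can serve as the supplier handed to the pending builder, and the async fetch callback calls `set` exactly once.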

* The action listener is guaranteed to be executed on the search thread-pool
*/
private void rewriteShardRequest(ShardSearchRequest request, ActionListener<ShardSearchRequest> listener) {
Rewriteable.rewriteAndFetch(request.getRewriteable(), indicesService.getRewriteContext(request::nowInMillis),

Member commented:

It would be nice to address the unchecked warning on this line. To be honest, I tried that myself and didn't succeed :)

@s1monw (Contributor Author) replied:

I don't see that warning.
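
The javadoc on `rewriteShardRequest` promises that the listener is executed on the search thread pool, even though the fetch may complete on some other thread. One common way to keep such a promise (not necessarily how this PR does it) is to wrap the listener so both outcomes are re-dispatched to a given executor. A hypothetical sketch, using `java.util.concurrent.ExecutorService` as a stand-in for the search thread pool; `Callback`, `Forking`, and `forkOnto` are invented names:

```java
import java.util.concurrent.ExecutorService;

// Hypothetical stand-in for ActionListener<T>.
interface Callback<T> {
    void onResponse(T value);
    void onFailure(Exception e);
}

final class Forking {
    // Returns a callback that hands both outcomes off to 'executor', so the delegate never
    // runs on the thread that delivered the response (for example a network thread).
    static <T> Callback<T> forkOnto(ExecutorService executor, Callback<T> delegate) {
        return new Callback<T>() {
            @Override
            public void onResponse(T value) {
                executor.execute(() -> delegate.onResponse(value));
            }

            @Override
            public void onFailure(Exception e) {
                executor.execute(() -> delegate.onFailure(e));
            }
        };
    }
}
```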

@s1monw s1monw merged commit 5e629cf into elastic:master Jul 20, 2017
@s1monw s1monw deleted the fetch_query_builder_docs_async branch July 20, 2017 15:23
s1monw added a commit that referenced this pull request Jul 21, 2017
This change rewrites search requests on the coordinating node before the requests are sent to the individual shards. This reduces the rewrite load and the object creation for each rewrite on the executing nodes, and it fetches resources only once instead of N times (once per shard) for queries such as the `terms` query with index lookups, as well as percolator and geo-shape queries.

Relates to #25791
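
The saving described in this commit is easy to count: for a `terms` query with an index lookup that targets N shards, rewriting each shard request on its own performs N lookups, while rewriting once on the coordinating node performs one and sends the already-resolved query to every shard. A toy, hypothetical illustration that only counts fetches (`LookupCounter` and its methods are invented for this sketch):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class LookupCounter {
    final AtomicInteger fetches = new AtomicInteger();

    // Hypothetical remote lookup of the terms a query refers to.
    List<String> fetchTerms(String lookupId) {
        fetches.incrementAndGet();
        return List.of("t1", "t2");
    }

    // Before: every shard request rewrites (and therefore fetches) on its own.
    List<List<String>> perShardRewrite(String lookupId, int shards) {
        List<List<String>> perShard = new ArrayList<>();
        for (int i = 0; i < shards; i++) {
            perShard.add(fetchTerms(lookupId));
        }
        return perShard;
    }

    // After: rewrite once on the coordinating node, then send the resolved query to every shard.
    List<List<String>> coordinatorRewrite(String lookupId, int shards) {
        List<String> resolved = fetchTerms(lookupId);
        return Collections.nCopies(shards, resolved);
    }
}
```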
Labels: das awesome, >enhancement, :Search/Search, v6.0.0-beta1

7 participants