
Reduce the overhead of timeouts and low-level search cancellation. #25776

Merged · 3 commits merged into elastic:master on Jul 19, 2017 · 5 participants

Conversation
@jpountz (Contributor) commented Jul 18, 2017

Setting a timeout or enforcing low-level search cancellation used to make us
wrap the collector and check either the current time or whether the search
task was cancelled for every collected document. This can add significant
overhead on cheap queries that match many documents.

This commit changes the approach to wrap the bulk scorer rather than the
collector and exponentially increase the interval between two consecutive
checks in order to reduce the overhead of those checks.
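
A minimal sketch of the idea, with hypothetical names (CheckingBulkScorer and the check field are illustrative, not the exact code from this commit): the wrapper asks the inner BulkScorer to score one window of doc IDs at a time, runs the cancellation/timeout check between windows, and doubles the window after every check.

import java.io.IOException;

import org.apache.lucene.search.BulkScorer;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.util.Bits;

// Hypothetical sketch of the approach described above; names are illustrative.
final class CheckingBulkScorer extends BulkScorer {

    private final BulkScorer in;
    private final Runnable check; // throws if the task was cancelled or the timeout elapsed

    CheckingBulkScorer(BulkScorer in, Runnable check) {
        this.in = in;
        this.check = check;
    }

    @Override
    public int score(LeafCollector collector, Bits acceptDocs, int min, int max) throws IOException {
        int interval = 1 << 11; // base window, see INITIAL_INTERVAL below
        while (min < max) {
            check.run(); // one check per window instead of one per document
            final int newMax = (int) Math.min((long) min + interval, (long) max);
            min = in.score(collector, acceptDocs, min, newMax);
            // double the window so the amortized cost of the checks keeps shrinking
            interval = (int) Math.min((long) interval << 1, (long) Integer.MAX_VALUE);
        }
        return min;
    }

    @Override
    public long cost() {
        return in.cost();
    }
}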


@jpountz requested review from @imotov and @jimczi on Jul 18, 2017

@jimczi (Member) left a comment

I left some comments, but I like the approach. I think we need to make sure that the new behavior does not add extra overhead with multiple segments (throwing a CollectionTerminatedException rather than a TimeExceededException).

@@ -134,6 +150,43 @@ public Weight createWeight(Query query, boolean needsScores, float boost) throws
}

@Override
protected void search(List<LeafReaderContext> leaves, Weight weight, Collector collector) throws IOException {
final Weight cancellablWeight;

@jimczi (Member) — Jul 18, 2017

nit: missing e

@jpountz (Author, Contributor) — Jul 18, 2017

good catch

final long time = counter.get();
if (time > maxTime) {
queryResult.searchTimedOut(true);
throw new CollectionTerminatedException();

@jimczi (Member) — Jul 18, 2017

You could use a TimeExceededException to stop the collection on all segments? Otherwise you also need to check the timeout when the leafCollector is created, like the CancellableCollector does. If you prefer the second option, you can remove the try/catch around the searcher.search below, since this code does not throw TimeExceededException anymore.

@jpountz (Author, Contributor) — Jul 18, 2017

Do you think it is necessary? The number of segments should be bounded, so checking all of them should not be much more costly than stopping all of them at once, and it keeps things a bit simpler.

I only did things this way for cancellation so that we still check on a per-segment basis if low-level cancellation is disabled.

@jimczi (Member) — Jul 19, 2017

I don't think we should use CollectionTerminatedException for this purpose. We have special handling for this exception in the collectors, but that is for the leaf level only.
When the timeout is detected we should be able to stop the search immediately, but if we have to build every scorer first it might be costly. Using a different exception that we catch at the higher level, when we call searcher.search, feels simpler to me, and you don't need two levels of cancellation.

@jpountz (Author, Contributor) — Jul 19, 2017

Oops, it looks like our comments crossed. Yes, I agree with you; I had not fully understood what you meant in your previous comment and thought it would be more complicated.

@jpountz (Author, Contributor) — Jul 19, 2017

For the record, I did not reuse the existing TimeExceededException because its constructor is private.
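
The pattern that came out of this thread looks roughly like the following hypothetical sketch (TimeoutRunner and its field names are made up for illustration): a private unchecked exception is thrown from the periodic check and caught where searcher.search is called, so a timeout unwinds out of all segments at once and is recorded rather than propagated as a failure.

import java.io.IOException;
import java.util.function.LongSupplier;

import org.apache.lucene.search.Collector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Hypothetical sketch of the pattern discussed above; names are illustrative.
final class TimeoutRunner {

    // Private on purpose: only the periodic check throws it, and it never escapes this class.
    private static final class TimeExceededException extends RuntimeException {}

    private final LongSupplier relativeTimeInMillis; // e.g. a coarse, cached clock
    private final long maxTimeInMillis;

    TimeoutRunner(LongSupplier relativeTimeInMillis, long maxTimeInMillis) {
        this.relativeTimeInMillis = relativeTimeInMillis;
        this.maxTimeInMillis = maxTimeInMillis;
    }

    // The check that the wrapping bulk scorer runs between scoring windows.
    void checkTimeout() {
        if (relativeTimeInMillis.getAsLong() > maxTimeInMillis) {
            throw new TimeExceededException(); // stops collection on all segments at once
        }
    }

    // Returns true if the search timed out; results collected so far are kept as partial results.
    boolean search(IndexSearcher searcher, Query query, Collector collector) throws IOException {
        try {
            searcher.search(query, collector);
            return false;
        } catch (TimeExceededException e) {
            return true; // the caller would mark queryResult.searchTimedOut(true)
        }
    }
}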


// we use the BooleanScorer window size as a base interval in order to make sure that we do not
// slow down boolean queries
private static final int INITIAL_INTERVAL = 1 << 11;
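
For context: 1 << 11 is 2048, the fixed window size Lucene's BooleanScorer uses when collecting documents, so starting at that size avoids forcing BooleanScorer to score partial windows. And because the interval doubles after every check, collecting N documents costs only on the order of log2(N / 2048) checks, versus N checks with the old per-document collector wrapping.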

@imotov (Member) approved these changes Jul 18, 2017

Nice!

@dakrone (Member) commented Jul 18, 2017

@jpountz What is the bug here? Should this be marked as "enhancement" instead? Is there an issue with the existing behavior?

@jpountz (Author, Contributor) commented Jul 19, 2017

I guess I saw it as a performance bug. I'm fine with making it an enhancement instead.

@jpountz (Author, Contributor) commented Jul 19, 2017

@jimczi I pushed a new commit that should address your concern.

@jimczi (Member) approved these changes Jul 19, 2017

LGTM

@jpountz merged commit 55ad318 into elastic:master on Jul 19, 2017

1 of 2 checks passed:

elasticsearch-ci — Build started; sha1 is merged.
CLA — Commit author is a member of Elasticsearch.

@jpountz deleted the jpountz:fix/cancellation_overhead branch on Jul 19, 2017

@jpountz added the >enhancement and v5.6.0 labels and removed the >bug label on Jul 19, 2017

@jpountz (Author, Contributor) commented Jul 19, 2017

@dakrone I made it an enhancement.

@jpountz added a commit that referenced this pull request on Jul 21, 2017:

Reduce the overhead of timeouts and low-level search cancellation. (#25776)
