Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snapshot restore operations throttle more than specified #13828

Merged
merged 1 commit into from Oct 3, 2015

Conversation

Projects
None yet
4 participants
@ywelsch
Copy link
Contributor

ywelsch commented Sep 28, 2015

Lucene's RateLimiter can do too much sleeping on small values (see also #6018).
The issue here is that calls to "pause" are not properly guarded in "restoreFile".

Instead of simply adding the guard, this commit uses the RateLimitingInputStream similar as for "snapshotFile".

@ywelsch

This comment has been minimized.

Copy link
Contributor Author

ywelsch commented Sep 29, 2015

@mikemccand can you have a look at this PR?

@mikemccand

This comment has been minimized.

Copy link
Contributor

mikemccand commented Sep 30, 2015

LGTM, this is a very bad bug! That small 4096 buffer means we are way over-throttling recovery :( We should push this back to 1.7.x?

@mikemccand

This comment has been minimized.

Copy link
Contributor

mikemccand commented Sep 30, 2015

Are we directly calling RateLimiter.pause anywhere else in ES?

@ywelsch

This comment has been minimized.

Copy link
Contributor Author

ywelsch commented Sep 30, 2015

I checked the other places, and they are all properly guarded. Maybe RateLimiter.pause could have an assertion?

@mikemccand

This comment has been minimized.

Copy link
Contributor

mikemccand commented Sep 30, 2015

Maybe RateLimiter.pause could have an assertion?

Well, it's called (correctly) by RateLimitingInputStream ... maybe we could ban ES from directly calling RateLimiter.pause using our forbidden APIs?

@ywelsch ywelsch force-pushed the ywelsch:fix-snapshot-restore-throttling branch Sep 30, 2015

@ywelsch

This comment has been minimized.

Copy link
Contributor Author

ywelsch commented Sep 30, 2015

I'm very much in favour of back porting this to 1.7.x, as I saw a 10x slowdown on a production cluster.

I added your suggestions. Not closing the RateLimitingInputStream is ok, as it just delegates the close to PartSliceStream (which is now auto-closed).

Just banning RateLimiter.pause does not work as it is used in other places of ES (e.g. RecoverySourceHandler).

@mikemccand

This comment has been minimized.

Copy link
Contributor

mikemccand commented Sep 30, 2015

Just banning RateLimiter.pause does not work as it is used in other places of ES (e.g. RecoverySourceHandler).

OK thanks, and it looks like it does the right thing (checks getMinPauseCheckBytes).

@mikemccand

This comment has been minimized.

Copy link
Contributor

mikemccand commented Sep 30, 2015

LGTM, thanks @ywelsch!

@ywelsch

This comment has been minimized.

Copy link
Contributor Author

ywelsch commented Oct 1, 2015

@clintongormley which ES versions should we push this back to?

@clintongormley

This comment has been minimized.

Copy link
Member

clintongormley commented Oct 2, 2015

@ywelsch i'm good with 1.7.x and above

@ywelsch ywelsch force-pushed the ywelsch:fix-snapshot-restore-throttling branch Oct 3, 2015

@ywelsch ywelsch added the v2.2.0 label Oct 3, 2015

Snapshot restore operations throttle more than specified
Lucene's RateLimiter can do too much sleeping on small values (see also #6018).
The issue here is that calls to "pause" are not properly guarded in "restoreFile".

Instead of simply adding the guard, this commit uses the RateLimitingInputStream similar as for "snapshotFile".

Closes #13828

@ywelsch ywelsch force-pushed the ywelsch:fix-snapshot-restore-throttling branch to 03a4e22 Oct 3, 2015

@ywelsch ywelsch changed the title Restore snapshot operations throttle more than specified Snapshot restore operations throttle more than specified Oct 3, 2015

ywelsch added a commit that referenced this pull request Oct 3, 2015

Merge pull request #13828 from ywelsch/fix-snapshot-restore-throttling
Snapshot restore operations throttle more than specified

@ywelsch ywelsch merged commit cdb0037 into elastic:master Oct 3, 2015

1 check passed

CLA Commit author is a member of Elasticsearch
Details

ywelsch added a commit that referenced this pull request Oct 3, 2015

Snapshot restore operations throttle more than specified
Lucene's RateLimiter can do too much sleeping on small values (see also #6018).
The issue here is that calls to "pause" are not properly guarded in "restoreFile".

Instead of simply adding the guard, this commit uses the RateLimitingInputStream similar as for "snapshotFile".

Closes #13828

ywelsch added a commit that referenced this pull request Oct 3, 2015

Snapshot restore operations throttle more than specified
Lucene's RateLimiter can do too much sleeping on small values (see also #6018).
The issue here is that calls to "pause" are not properly guarded in "restoreFile".

Instead of simply adding the guard, this commit uses the RateLimitingInputStream similar as for "snapshotFile".

Closes #13828

ywelsch added a commit that referenced this pull request Oct 3, 2015

Snapshot restore operations throttle more than specified
Lucene's RateLimiter can do too much sleeping on small values (see also #6018).
The issue here is that calls to "pause" are not properly guarded in "restoreFile".

Instead of simply adding the guard, this commit uses the RateLimitingInputStream similar as for "snapshotFile".

Closes #13828

ywelsch added a commit that referenced this pull request Oct 3, 2015

Snapshot restore operations throttle more than specified
Lucene's RateLimiter can do too much sleeping on small values (see also #6018).
The issue here is that calls to "pause" are not properly guarded in "restoreFile".

Instead of simply adding the guard, this commit uses the RateLimitingInputStream similar as for "snapshotFile".

Closes #13828

@clintongormley clintongormley added v2.0.0-rc1 and removed v2.0.0-rc1 labels Oct 7, 2015

@clintongormley clintongormley removed the v2.1.0 label Nov 22, 2015

@pickypg pickypg added the v2.1.0 label Feb 27, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.