Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose auto-IO-throttle from Lucene's ConcurrentMergeScheduler #9243

Merged
merged 6 commits into from
Jan 16, 2015

Conversation

mikemccand
Copy link
Contributor

This removes all index/indices store level rate limiting (indices.store.throttle.type, indices.store.throttle.max_bytes_per_sec, index.store.throttle.type, index.store.throttle.max_bytes_per_sec) and cuts over to Lucene's auto-throttle boolean (default: on) in ConcurrentMergeScheduler added in https://issues.apache.org/jira/browse/LUCENE-6119

I added a new live boolean setting (index.merge.scheduler.auto_throttle), default is on (like Lucene).

This also removes throttle_time from store stats, and adds total_throttled_time, total_stopped_time (total time when large merges were stopped so smaller merges could finish) and total_throttled_bytes_per_sec (the current bytes/sec throttle) stats to merge stats. Merge logging also shows the stopped/throttle time per merge.

Recovery/snapshot/restore still have their rate limiters and still default to 20 MB/sec.

This should also fix the slowdowns / index size increase from http://benchmarks.elasticsearch.org and I think improve indexing performance at the defaults since auto throttling should prevent the merge backlog and index throttling.

Closes #9133

@mikemccand mikemccand self-assigned this Jan 11, 2015
@@ -176,5 +235,10 @@ public void writeTo(StreamOutput out) throws IOException {
out.writeVLong(current);
out.writeVLong(currentNumDocs);
out.writeVLong(currentSizeInBytes);
if (out.getVersion().onOrAfter(Version.V_2_0_0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upgrading from 1.x to 2.x will require a full cluster restart, so a 2.x node will never chat with a < 2.x node. Therefor the version checks are redundant and can be removed. (assuming that this change only goes into master)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha! OK I'll remove the version check...

@rmuir
Copy link
Contributor

rmuir commented Jan 11, 2015

looks good

@s1monw
Copy link
Contributor

s1monw commented Jan 11, 2015

LGTM too. I wonder if somebody had disabled the merge throtteling in 1.x and they upgrade should they also get this disable. I am leaning towards keeping it true / enable anyway I just wanted to throw it out there

@mikemccand
Copy link
Contributor Author

I wonder if somebody had disabled the merge throtteling in 1.x and they upgrade should they also get this disable. I am leaning towards keeping it true / enable anyway I just wanted to throw it out there

I think it's OK to keep the default (true) in this case: auto-throttle should still move to un-throttled if merges can't keep up. For http://benchmarks.elasticsearch.org I will run the fast/fastupdates with auto-throttle enabled.

@s1monw
Copy link
Contributor

s1monw commented Jan 11, 2015

ok fair enough

@bleskes
Copy link
Contributor

bleskes commented Jan 12, 2015

@mikemccand I was wondering how this change works if there are multiple shards on a node - previously the rate limiting was a node level settings, making sure that we do not overload the node even it has multiple shards on it merging together. If I understand correctly it is no possible that a heavy loaded index will consume more and more node resources before starting to push back on indexing, potentially starving other shards that has nothing to do with this index. I might be missing something here.

Second think I wondering what we should do when someone turns off auto_throttling - to me it seems we should fall back to non-auto throttling (like we had) as opposed to no throttling at all? if that's what we want - might as well call it index.merge.scheduler.throttle . Again - I might be missing something here.

@mikemccand
Copy link
Contributor Author

I was wondering how this change works if there are multiple shards on a node

Yes, this is a difference: the auto-throttle is only within a single
shard vs current rate limiter which can set aggregate for the whole
node.

But I think this is dangerous: each shard really needs its merges to
keep up with the indexing rate against that shard, so it remains
healthy. So I think each shard should look out for itself / increase
its throttle to keep up w/ indexing it's seeing.

Second think I wondering what we should do when someone turns off auto_throttling - to me it seems we should fall back to non-auto throttling

I think hardwired merge throttling is too simplistic: who can properly
predict what merge MB/sec is required? So net/net I don't think we
should keep this feature long term.

But short term: I agree we should keep it. It's safer to have both
options, e.g. in case there are horrible bugs in auto-throttling, apps
can easily fallback. I'll rework the PR to keep both...

@mikemccand mikemccand changed the title Replace store throttling with Lucene's auto-throttle Expose auto-IO-throttle from Lucene's ConcurrentMergeScheduler Jan 14, 2015
@mikemccand
Copy link
Contributor Author

OK I put back the fixed rate limiting, but left it off by default in favor of auto-throttle. I think it's ready.

@jpountz
Copy link
Contributor

jpountz commented Jan 14, 2015

Since the intention is to move away from fixed rate limiting and using auto throttling instead, maybe documentation about fixed rate limiting should be removed in order not to encourage using it?

Otherwise LGTM!

@mikemccand
Copy link
Contributor Author

That makes sense @jpountz, I'll remove the docs.

@mikemccand mikemccand merged commit def2d34 into elastic:master Jan 16, 2015
mikemccand added a commit that referenced this pull request Jan 16, 2015
This adds a new boolean (index.merge.scheduler.auto_throttle) dynamic
setting, default true (matching Lucene), to adaptively set the IO rate
limit for merges over time.

This is more flexible than the previous fixed rate throttling because
it responds depending on the incoming merge rate, so search-heavy
applications that are not doing much indexing will see merges heavily
throttled while indexing-heavy cases will lighten the throttle so
merges can keep up within incoming indexing.

The fixed rate throttling is still available as a fallback if things
go horribly wrong.

Closes #9243

Closes #9133
@clintongormley clintongormley added v2.0.0-beta1 >breaking :Core/Infra/Core Core issues without another label >feature :Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload and removed :Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload labels Jan 19, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose ConcurrentMergeScheduler's auto IO throttle & stats
7 participants