Expose auto-IO-throttle from Lucene's ConcurrentMergeScheduler #9243

mikemccand · 2015-01-11T11:09:48Z

This removes all index/indices store level rate limiting (indices.store.throttle.type, indices.store.throttle.max_bytes_per_sec, index.store.throttle.type, index.store.throttle.max_bytes_per_sec) and cuts over to Lucene's auto-throttle boolean (default: on) in ConcurrentMergeScheduler added in https://issues.apache.org/jira/browse/LUCENE-6119

I added a new live boolean setting (index.merge.scheduler.auto_throttle), default is on (like Lucene).

This also removes throttle_time from store stats, and adds total_throttled_time, total_stopped_time (total time when large merges were stopped so smaller merges could finish) and total_throttled_bytes_per_sec (the current bytes/sec throttle) stats to merge stats. Merge logging also shows the stopped/throttle time per merge.

Recovery/snapshot/restore still have their rate limiters and still default to 20 MB/sec.

This should also fix the slowdowns / index size increase from http://benchmarks.elasticsearch.org and I think improve indexing performance at the defaults since auto throttling should prevent the merge backlog and index throttling.

Closes #9133

… per-merge stop/throttle/rate

martijnvg · 2015-01-11T11:43:29Z

src/main/java/org/elasticsearch/index/merge/MergeStats.java

@@ -176,5 +235,10 @@ public void writeTo(StreamOutput out) throws IOException {
        out.writeVLong(current);
        out.writeVLong(currentNumDocs);
        out.writeVLong(currentSizeInBytes);
+        if (out.getVersion().onOrAfter(Version.V_2_0_0)) {


Upgrading from 1.x to 2.x will require a full cluster restart, so a 2.x node will never chat with a < 2.x node. Therefor the version checks are redundant and can be removed. (assuming that this change only goes into master)

Aha! OK I'll remove the version check...

rmuir · 2015-01-11T12:55:38Z

looks good

s1monw · 2015-01-11T13:14:01Z

LGTM too. I wonder if somebody had disabled the merge throtteling in 1.x and they upgrade should they also get this disable. I am leaning towards keeping it true / enable anyway I just wanted to throw it out there

mikemccand · 2015-01-11T14:16:21Z

I wonder if somebody had disabled the merge throtteling in 1.x and they upgrade should they also get this disable. I am leaning towards keeping it true / enable anyway I just wanted to throw it out there

I think it's OK to keep the default (true) in this case: auto-throttle should still move to un-throttled if merges can't keep up. For http://benchmarks.elasticsearch.org I will run the fast/fastupdates with auto-throttle enabled.

s1monw · 2015-01-11T14:52:49Z

ok fair enough

bleskes · 2015-01-12T14:46:19Z

@mikemccand I was wondering how this change works if there are multiple shards on a node - previously the rate limiting was a node level settings, making sure that we do not overload the node even it has multiple shards on it merging together. If I understand correctly it is no possible that a heavy loaded index will consume more and more node resources before starting to push back on indexing, potentially starving other shards that has nothing to do with this index. I might be missing something here.

Second think I wondering what we should do when someone turns off auto_throttling - to me it seems we should fall back to non-auto throttling (like we had) as opposed to no throttling at all? if that's what we want - might as well call it index.merge.scheduler.throttle . Again - I might be missing something here.

mikemccand · 2015-01-12T17:32:05Z

I was wondering how this change works if there are multiple shards on a node

Yes, this is a difference: the auto-throttle is only within a single
shard vs current rate limiter which can set aggregate for the whole
node.

But I think this is dangerous: each shard really needs its merges to
keep up with the indexing rate against that shard, so it remains
healthy. So I think each shard should look out for itself / increase
its throttle to keep up w/ indexing it's seeing.

Second think I wondering what we should do when someone turns off auto_throttling - to me it seems we should fall back to non-auto throttling

I think hardwired merge throttling is too simplistic: who can properly
predict what merge MB/sec is required? So net/net I don't think we
should keep this feature long term.

But short term: I agree we should keep it. It's safer to have both
options, e.g. in case there are horrible bugs in auto-throttling, apps
can easily fallback. I'll rework the PR to keep both...

mikemccand · 2015-01-14T13:53:22Z

OK I put back the fixed rate limiting, but left it off by default in favor of auto-throttle. I think it's ready.

jpountz · 2015-01-14T14:03:10Z

Since the intention is to move away from fixed rate limiting and using auto throttling instead, maybe documentation about fixed rate limiting should be removed in order not to encourage using it?

Otherwise LGTM!

mikemccand · 2015-01-14T15:02:21Z

That makes sense @jpountz, I'll remove the docs.

This adds a new boolean (index.merge.scheduler.auto_throttle) dynamic setting, default true (matching Lucene), to adaptively set the IO rate limit for merges over time. This is more flexible than the previous fixed rate throttling because it responds depending on the incoming merge rate, so search-heavy applications that are not doing much indexing will see merges heavily throttled while indexing-heavy cases will lighten the throttle so merges can keep up within incoming indexing. The fixed rate throttling is still available as a fallback if things go horribly wrong. Closes #9243 Closes #9133

mikemccand added 2 commits January 10, 2015 16:38

first cut

31e6acf

expose current CMS throttle in merge stats; fix tests, docs; also log…

1aad275

… per-merge stop/throttle/rate

mikemccand self-assigned this Jan 11, 2015

martijnvg reviewed Jan 11, 2015
View reviewed changes

jpountz added v2.0.0-beta1 release highlight and removed release highlight v2.0.0-beta1 labels Jan 12, 2015

mikemccand added 3 commits January 13, 2015 14:21

remove version checks

ff3e2cb

put back fixed throttling, but off by default

87d8fbf

put back fixed throttling, but off by default

107099a

mikemccand changed the title ~~Replace store throttling with Lucene's auto-throttle~~ Expose auto-IO-throttle from Lucene's ConcurrentMergeScheduler Jan 14, 2015

don't mention fixed throttling in the docs

def2d34

mikemccand merged commit def2d34 into elastic:master Jan 16, 2015

clintongormley added release highlight and removed >breaking labels Jan 19, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose auto-IO-throttle from Lucene's ConcurrentMergeScheduler #9243

Expose auto-IO-throttle from Lucene's ConcurrentMergeScheduler #9243

mikemccand commented Jan 11, 2015

martijnvg Jan 11, 2015

mikemccand Jan 11, 2015

rmuir commented Jan 11, 2015

s1monw commented Jan 11, 2015

mikemccand commented Jan 11, 2015

s1monw commented Jan 11, 2015

bleskes commented Jan 12, 2015

mikemccand commented Jan 12, 2015

mikemccand commented Jan 14, 2015

jpountz commented Jan 14, 2015

mikemccand commented Jan 14, 2015

Expose auto-IO-throttle from Lucene's ConcurrentMergeScheduler #9243

Expose auto-IO-throttle from Lucene's ConcurrentMergeScheduler #9243

Conversation

mikemccand commented Jan 11, 2015

martijnvg Jan 11, 2015

Choose a reason for hiding this comment

mikemccand Jan 11, 2015

Choose a reason for hiding this comment

rmuir commented Jan 11, 2015

s1monw commented Jan 11, 2015

mikemccand commented Jan 11, 2015

s1monw commented Jan 11, 2015

bleskes commented Jan 12, 2015

mikemccand commented Jan 12, 2015

mikemccand commented Jan 14, 2015

jpountz commented Jan 14, 2015

mikemccand commented Jan 14, 2015