Expose auto-IO-throttle from Lucene's ConcurrentMergeScheduler #9243
… per-merge stop/throttle/rate
@@ -176,5 +235,10 @@ public void writeTo(StreamOutput out) throws IOException {
out.writeVLong(current);
out.writeVLong(currentNumDocs);
out.writeVLong(currentSizeInBytes);
if (out.getVersion().onOrAfter(Version.V_2_0_0)) {
Upgrading from 1.x to 2.x will require a full cluster restart, so a 2.x node will never chat with a < 2.x node. Therefore the version checks are redundant and can be removed (assuming that this change only goes into master).
Aha! OK I'll remove the version check...
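For readers following the exchange above, here is a minimal, self-contained sketch of the difference between a version-gated write and an ungated one. It uses plain `java.io` streams rather than Elasticsearch's `StreamOutput`. The class name, the version constants, and the choice of a throttled-time field are all hypothetical, picked only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a stats object; this is NOT the Elasticsearch class from the diff.
class VersionedStatsSketch {
    // Hypothetical wire-version ids, for illustration only.
    static final int V_1_4 = 1_040_000;
    static final int V_2_0 = 2_000_000;

    final long current;
    final long totalThrottledTimeMillis; // assumed "new" field that older peers do not know

    VersionedStatsSketch(long current, long totalThrottledTimeMillis) {
        this.current = current;
        this.totalThrottledTimeMillis = totalThrottledTimeMillis;
    }

    // Gated form: the new field is only written when the peer is new enough to read it.
    byte[] writeGated(int peerVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(current);
            if (peerVersion >= V_2_0) {
                out.writeLong(totalThrottledTimeMillis);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Ungated form: safe only if every peer is guaranteed to understand the new field,
    // which is the situation a full cluster restart creates.
    byte[] writeUngated() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(current);
            out.writeLong(totalThrottledTimeMillis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

When every peer is at or above the gating version, the two forms produce identical bytes, which is why the check can be dropped once mixed-version clusters are ruled out.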
looks good
LGTM too. I wonder: if somebody had disabled merge throttling in 1.x and then upgrades, should they also get auto-throttle disabled? I am leaning towards keeping it.
I think it's OK to keep the default (true) in this case: auto-throttle should still move to un-throttled if merges can't keep up. For http://benchmarks.elasticsearch.org I will run the fast/fastupdates with auto-throttle enabled.
ok fair enough
@mikemccand I was wondering how this change works if there are multiple shards on a node. Previously the rate limiting was a node-level setting, making sure that we do not overload the node even if it has multiple shards on it merging together. If I understand correctly, it is now possible that a heavily loaded index will consume more and more node resources before starting to push back on indexing, potentially starving other shards that have nothing to do with this index. I might be missing something here. The second thing I'm wondering is what we should do when someone turns off auto_throttling: to me it seems we should fall back to non-auto throttling (like we had) as opposed to no throttling at all? If that's what we want, might as well call it …
Yes, this is a difference: the auto-throttle is only within a single … But I think this is dangerous: each shard really needs its merges to …
I think hardwired merge throttling is too simplistic: who can properly … But short term: I agree we should keep it. It's safer to have both …
OK I put back the fixed rate limiting, but left it off by default in favor of auto-throttle. I think it's ready.
Since the intention is to move away from fixed rate limiting and using auto throttling instead, maybe documentation about fixed rate limiting should be removed in order not to encourage using it? Otherwise LGTM!
That makes sense @jpountz, I'll remove the docs.
This adds a new boolean (index.merge.scheduler.auto_throttle) dynamic setting, default true (matching Lucene), to adaptively set the IO rate limit for merges over time. This is more flexible than the previous fixed rate throttling because it responds to the incoming merge rate, so search-heavy applications that are not doing much indexing will see merges heavily throttled, while indexing-heavy cases will lighten the throttle so merges can keep up with incoming indexing. The fixed rate throttling is still available as a fallback if things go horribly wrong.
Closes #9243
Closes #9133
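To make "adaptively set the IO rate limit" concrete, here is a toy model of the idea: raise the rate limit when merges are falling behind, and lower it when they are keeping up. It is only a sketch. The class name, the constants (starting rate, floor, ceiling, step factors) and the update rule are illustrative assumptions, not Lucene's actual values or algorithm.

```java
// Toy model of an adaptive merge IO throttle.
// All constants and the update rule are illustrative assumptions, not Lucene's implementation.
class AdaptiveThrottleSketch {
    static final double MIN_MB_PER_SEC = 5.0;    // assumed floor for the throttle
    static final double MAX_MB_PER_SEC = 1024.0; // assumed ceiling, effectively "un-throttled"
    static final double START_MB_PER_SEC = 20.0; // assumed starting point

    private double targetMBPerSec = START_MB_PER_SEC;

    /**
     * Called whenever a new merge is about to start.
     * backlogged = true means merges are not keeping up with incoming indexing.
     * Returns the new target rate in MB/sec.
     */
    double onMergeStart(boolean backlogged) {
        if (backlogged) {
            // Indexing-heavy: lighten the throttle so merges can catch up.
            targetMBPerSec = Math.min(MAX_MB_PER_SEC, targetMBPerSec * 1.2);
        } else {
            // Merges are keeping up: tighten the throttle to leave IO for searches.
            targetMBPerSec = Math.max(MIN_MB_PER_SEC, targetMBPerSec * 0.9);
        }
        return targetMBPerSec;
    }

    double targetMBPerSec() {
        return targetMBPerSec;
    }
}
```

In this model a sustained backlog pushes the target up to the ceiling, and a quiet index decays to the floor. That mirrors the two cases in the description: the throttle lightens for indexing-heavy workloads and stays heavy for search-heavy ones.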
This removes all index/indices store level rate limiting (indices.store.throttle.type, indices.store.throttle.max_bytes_per_sec, index.store.throttle.type, index.store.throttle.max_bytes_per_sec) and cuts over to Lucene's auto-throttle boolean (default: on) in ConcurrentMergeScheduler added in https://issues.apache.org/jira/browse/LUCENE-6119
I added a new live boolean setting (index.merge.scheduler.auto_throttle), default is on (like Lucene).
This also removes throttle_time from store stats, and adds total_throttled_time, total_stopped_time (total time when large merges were stopped so smaller merges could finish) and total_throttled_bytes_per_sec (the current bytes/sec throttle) stats to merge stats. Merge logging also shows the stopped/throttle time per merge.
Recovery/snapshot/restore still have their rate limiters and still default to 20 MB/sec.
This should also fix the slowdowns / index size increase from http://benchmarks.elasticsearch.org and I think improve indexing performance at the defaults since auto throttling should prevent the merge backlog and index throttling.
Closes #9133