Store IO throttling throttles far more than asked #6018

Closed
mikemccand opened this Issue May 2, 2014 · 2 comments

@mikemccand
Contributor

mikemccand commented May 2, 2014

I've been digging into the "merges can fall behind" problem at high
indexing rates, and I discovered some serious issues with the IO
throttling, which we recently (#5902) upped from 20 MB/sec to 50 MB/sec
by default.

Net/net I think when we ask for 50 MB/sec today we are really
throttling at something like 8 MB/sec!

Details:

I indexed a bunch of small log-file type docs into 1 shard, 0
replicas, using 1 sync _bulk client, to the point where it did its
first big-ish merge (611 MB, 440K docs); the merge does not use CFS so
it's really writing 611 MB. I'm using a fast SSD.

With no throttling (index.store.throttle.type=none), the merge takes
20.8 seconds.

With the default 50 MB/sec merge throttling, it takes 72.1 sec, which
is far too long (611 MB / 50 = 12.2 sec). The rate limiter enforces the
instantaneous rate, so at worst the merge time should have been 20.8 +
12.2 = 33 sec, but likely much less than that because merging takes
CPU time.

So I dug in and discovered one problem, I think caused by the
super.flush and then delegate.flush in BufferedChecksumIndexOutput,
where the RateLimiter is always alternately called first on 8192 bytes
then on 0 bytes. If I fix RateLimiter to just ignore those 0 bytes,
the merge time with 50 MB/sec throttle drops to 49.9 sec: better, but
still too long. (I think once we cutover to Lucene's checksums this 0
byte issue will be fixed?)
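As a rough illustration of that fix (class and method names here are hypothetical, not the actual Lucene RateLimiter API), the guard amounts to bailing out before any clock read or sleep when the byte count is zero:

```java
// Sketch of a rate limiter that ignores zero-byte calls, so the
// alternating pause(8192) / pause(0) pattern from the double flush
// does not pay the System.nanoTime()/Thread.sleep() overhead twice.
// Class and method names are illustrative, not Lucene's actual API.
public class ZeroByteGuardLimiter {
    private final double bytesPerSecond;

    public ZeroByteGuardLimiter(double mbPerSec) {
        this.bytesPerSecond = mbPerSec * 1024 * 1024;
    }

    /** Nanoseconds the caller should pause for this many bytes; zero bytes pause nothing. */
    public long pauseNanos(long bytes) {
        if (bytes <= 0) {
            return 0; // the fix: skip clock reads and sleeps for empty flushes
        }
        return (long) (bytes / bytesPerSecond * 1_000_000_000L);
    }

    public static void main(String[] args) {
        ZeroByteGuardLimiter limiter = new ZeroByteGuardLimiter(50.0);
        // 8192 bytes at 50 MB/sec works out to ~156 microseconds of pause.
        System.out.println(limiter.pauseNanos(8192));
        System.out.println(limiter.pauseNanos(0));
    }
}
```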

System.nanoTime is actually quite costly, so I suspect the overhead of
just checking whether to pause, and of calling Thread.sleep, is way
too much when the pause time is small. So I changed SimpleRateLimiter to
just accumulate the incoming bytes and then, once it crosses 1 msec
worth at the specified rate, invoke the pause logic.

This really improved it: now the merge takes 25.7 sec at 50 MB/sec
throttle, and 64.9 sec at 10 MB/sec throttle. These times seem correct.
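The accumulation change above can be sketched roughly like this (illustrative code, not the actual SimpleRateLimiter):

```java
// Sketch of the "accumulate, then pause" idea: buffer incoming byte
// counts and only run the costly System.nanoTime()/Thread.sleep()
// machinery once at least one millisecond's worth of data at the
// target rate has piled up. Illustrative only; this is not the
// actual SimpleRateLimiter (or Lucene RateLimiter) code.
public class AccumulatingRateLimiter {
    private final double bytesPerSecond;
    private final long minPauseBytes; // ~1 msec of data at the target rate
    private long pendingBytes;

    public AccumulatingRateLimiter(double mbPerSec) {
        this.bytesPerSecond = mbPerSec * 1024 * 1024;
        this.minPauseBytes = (long) (this.bytesPerSecond / 1000.0);
    }

    /**
     * Records bytes written and returns the nanoseconds the caller
     * should sleep, or 0 if too few bytes have accumulated so far.
     */
    public long maybePauseNanos(long bytes) {
        pendingBytes += bytes;
        if (pendingBytes < minPauseBytes) {
            return 0; // under 1 msec worth: skip the clock read and sleep
        }
        long pauseNanos = (long) (pendingBytes / bytesPerSecond * 1_000_000_000L);
        pendingBytes = 0;
        return pauseNanos;
    }

    public static void main(String[] args) {
        AccumulatingRateLimiter limiter = new AccumulatingRateLimiter(50.0);
        // At 50 MB/sec, 1 msec worth is ~52 KB, so the first six 8 KB
        // buffers return 0 and the seventh triggers a real pause.
        for (int i = 0; i < 7; i++) {
            System.out.println(limiter.maybePauseNanos(8192));
        }
    }
}
```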

I'll also open a Lucene issue to fix this, and make an XRateLimiter
for ES in the meantime.

@mikemccand mikemccand self-assigned this May 2, 2014

@rmuir

Contributor

rmuir commented May 2, 2014

> So I dug in and discovered one problem, I think caused by the
> super.flush and then delegate.flush in BufferedChecksumIndexOutput,
> where the RateLimiter is always alternately called first on 8192 bytes
> then on 0 bytes. If I fix RateLimiter to just ignore those 0 bytes,
> the merge time with 50 MB/sec throttle drops to 49.9 sec: better, but
> still too long. (I think once we cutover to Lucene's checksums this 0
> byte issue will be fixed?)

It will solve it in that place: the BufferedChecksumIndexOutput. Note that RateLimitedIndexOutput itself has a similar construction; we should ensure it's not doing anything stupid too.

Another similar expensive call is any digester's updateBytes(). We want to avoid calling this native method frequently (e.g. for zero bytes) for the same reasons.
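A sketch of the zero-length guard described here, using java.util.zip.CRC32 as a stand-in digest (the wrapper class and its names are illustrative, not the actual Lucene/ES code):

```java
import java.util.zip.CRC32;

// Sketch of skipping the digest update for empty buffers, so flush
// patterns that alternate real writes with zero-byte writes never
// reach the underlying (potentially native) update method. CRC32 is
// used as a stand-in digest; the wrapper class is illustrative.
public class GuardedChecksum {
    private final CRC32 crc = new CRC32();
    private long updateCalls; // how many times the real digest ran

    public void update(byte[] buf, int off, int len) {
        if (len <= 0) {
            return; // nothing to digest: avoid the call entirely
        }
        crc.update(buf, off, len);
        updateCalls++;
    }

    public long getValue() { return crc.getValue(); }
    public long getUpdateCalls() { return updateCalls; }

    public static void main(String[] args) {
        GuardedChecksum checksum = new GuardedChecksum();
        byte[] data = {1, 2, 3};
        checksum.update(data, 0, data.length);
        checksum.update(data, 0, 0); // zero-byte flush: no digest call
        System.out.println(checksum.getUpdateCalls()); // prints 1
    }
}
```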

@mikemccand

Contributor

mikemccand commented May 5, 2014

I opened https://issues.apache.org/jira/browse/LUCENE-5641 to fix this in Lucene; I'll carry the fix over to ES once that's in ...

s1monw added a commit that referenced this issue May 17, 2014

Upgrade to Lucene 4.8.1
This commit upgrades to the latest Lucene 4.8.1 release including the
following bugfixes:

 * An IndexThrottle now kicks in when merges start falling behind,
   limiting index threads to 1 until merges have caught up. Closes #6066
 * RateLimiter now kicks in at the configured rate where previously
   the limiter was limiting at ~8MB/sec almost all the time. Closes #6018

@s1monw s1monw added the bug label May 18, 2014

@s1monw s1monw closed this in 85a0b76 May 19, 2014


ywelsch added a commit to ywelsch/elasticsearch that referenced this issue Oct 3, 2015

Snapshot restore operations throttle more than specified
Lucene's RateLimiter can do too much sleeping on small values (see also #6018).
The issue here is that calls to "pause" are not properly guarded in "restoreFile".

Instead of simply adding the guard, this commit uses the RateLimitingInputStream, similar to what is done for "snapshotFile".

Closes #13828
