FSync translog outside of the writers global lock #18360
Conversation
Today we acquire a global write lock that blocks all modifications to the translog file while we fsync / checkpoint the file. Yet, we don't necessarily need to block concurrent operations here. This can lead to a lot of blocked threads if the machine has high concurrency (lots of CPUs) but uses slow disks (spinning disks), which is absolutely unnecessary. We just need to protect against fsyncing / checkpointing concurrently, but we can fill buffers and write to the underlying file concurrently. This change introduces an additional lock that we hold while fsyncing, but moves the checkpointing code outside of the writer's global lock.
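To make the idea concrete, here is a minimal, hypothetical Java sketch of the locking pattern the description refers to. The class and field names (`WriterSketch`, `writtenOffset`, `lastSyncedOffset`) are illustrative, not the actual TranslogWriter code: a dedicated `syncLock` serializes fsync / checkpoint, while the instance monitor only guards the brief capture of state, so writers can keep appending during the slow fsync.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.util.concurrent.locks.ReentrantLock;

class WriterSketch {
    private final ReentrantLock syncLock = new ReentrantLock(); // serializes fsync / checkpoint
    private final FileChannel channel;
    private volatile long writtenOffset;    // bytes buffered / written so far (hypothetical)
    private volatile long lastSyncedOffset; // bytes known to be durable (hypothetical)

    WriterSketch(FileChannel channel) {
        this.channel = channel;
    }

    // Writers only contend on the instance monitor, never on syncLock.
    synchronized void add(int length) {
        writtenOffset += length; // fill buffers / write to the underlying file here
    }

    void sync() throws IOException {
        if (lastSyncedOffset < writtenOffset) { // cheap check, avoids needless fsyncs
            syncLock.lock(); // only one thread fsyncs / checkpoints at a time
            try {
                final long offsetToSync;
                synchronized (this) { // briefly block writers to flush the buffer
                    offsetToSync = writtenOffset;
                }
                channel.force(false);            // fsync OUTSIDE the monitor: writers keep going
                lastSyncedOffset = offsetToSync; // "checkpoint": publish the durable offset
            } finally {
                syncLock.unlock();
            }
        }
    }
}
```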
Thanks @s1monw I will test this!
@@ -139,30 +140,15 @@ private synchronized final void closeWithTragicEvent(Throwable throwable) throws
        return new Translog.Location(generation, offset, data.length());
    }

    private final ReentrantLock syncLock = new ReentrantLock();
Maybe add a lock order comment here, e.g. "lock order syncLock -> synchronized(this)"?
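For illustration, the kind of comment the reviewer is asking for might look like this (a sketch, not the committed code):

```java
// lock order: syncLock -> synchronized(this); always acquire syncLock
// before the instance monitor, never the other way around, or two
// threads can deadlock.
private final ReentrantLock syncLock = new ReentrantLock();
```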
LGTM
I think this change is potentially a biggish performance gain on highly concurrent CPUs with slowish IO. I tested on a 72-core box with a spinning disk, without the change (2 runs):
and 2 runs with the change:
I also ran the "random power loss tester" and there were no corruptions after 18 power loss events (at which point I hit disk full!).
@mikemccand I pushed a new commit
@mikemccand out of curiosity: how many concurrent threads did you use to index, and how many shards are in the index?
LGTM, thanks @s1monw
6 shards, 12 client-side threads, and I told ES to use 36 cores (though I did have a local mod to relax its current 32-core max). I disabled auto IO merge throttling, and ran with translog durability
I re-ran the perf test with the default translog durability:
This is a powerful CPU (36 real cores) against slowish IO (a single spinning 8 TB disk) ... I think there are real gains here.
#18360 introduced an extra lock in order to allow writes while syncing the translog. This caused a potential deadlock with the snapshotting code, where we first acquire the instance lock, followed by a sync (which acquires the syncLock). However, the sync logic acquires the syncLock first, followed by the instance lock. I considered solving this by not syncing the translog on snapshot - I think we can get away with just flushing it. That, however, would create subtleties around snapshotting and whether the operations in a snapshot are persisted. I opted instead for slightly uglier code with nested synchronized blocks, where the scope of the change is contained to the TranslogWriter class alone.
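A minimal sketch of the inverted lock order described above; the method names are hypothetical, not the real TranslogWriter API:

```java
import java.util.concurrent.locks.ReentrantLock;

class DeadlockSketch {
    private final ReentrantLock syncLock = new ReentrantLock();

    // Snapshot path (thread A): instance monitor first, then syncLock.
    synchronized void snapshot() {
        sync(); // tries to take syncLock while already holding the monitor
    }

    // Sync path (thread B): syncLock first, then the instance monitor.
    void sync() {
        syncLock.lock();
        try {
            synchronized (this) {
                // capture the offset to checkpoint
            }
        } finally {
            syncLock.unlock();
        }
    }
    // If A holds the monitor while B holds syncLock, A waits for syncLock
    // and B waits for the monitor: a classic lock-ordering deadlock.
}
```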
I had some conversation with @mikemccand, who saw congestion on these locks. @mikemccand can you give this patch a go to see if you get better concurrency? I also think we need to run the powertester on this one again :)