FSync translog outside of the writer's global lock #18360

Merged
s1monw merged 2 commits into elastic:master from s1monw:sync_without_global_lock on May 19, 2016

@s1monw (Contributor) commented May 15, 2016

Today we acquire a global write lock that blocks all modifications to the
translog file while we fsync / checkpoint the file. Yet, we don't necessarily
need to block concurrent operations here. This can lead to a lot of blocked
threads if the machine has high concurrency (lots of CPUs) but uses slow disks
(spinning disks), which is absolutely unnecessary. We only need to prevent
concurrent fsyncing / checkpointing; we can still fill buffers and write to the
underlying file concurrently.

This change introduces an additional lock that we hold while fsyncing, but moves
the checkpointing code outside of the writer's global lock.

I had a conversation with @mikemccand, who saw contention on these locks. @mikemccand, can you give this patch a go and check whether you see better concurrency? I also think we need to run the power-loss tester on this one again :)
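For context, here is a minimal sketch of the locking pattern described above (not the actual TranslogWriter code; the names and the FileChannel-backed layout are invented for illustration): buffered writes still synchronize on the writer instance, while the expensive fsync runs under a dedicated syncLock so concurrent writers are not blocked for the duration of the disk flush.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.locks.ReentrantLock;

class WriterSketch {
    // lock order: syncLock -> synchronized(this)
    private final ReentrantLock syncLock = new ReentrantLock();
    private final FileChannel channel;
    private volatile long writtenOffset;
    private volatile long lastSyncedOffset;

    WriterSketch(FileChannel channel) {
        this.channel = channel;
    }

    /** Appends data; only the bookkeeping and the write itself hold the instance lock. */
    synchronized long add(ByteBuffer data) throws IOException {
        long offset = writtenOffset;
        writtenOffset = offset + channel.write(data, offset);
        return offset;
    }

    /** Fsyncs without holding the instance lock while the disk flush is in flight. */
    void sync() throws IOException {
        if (lastSyncedOffset < writtenOffset) {      // cheap check, no lock taken
            syncLock.lock();                         // serialize concurrent fsyncs only
            try {
                long offsetToSync;
                synchronized (this) {                // briefly take the instance lock
                    offsetToSync = writtenOffset;    // to read a consistent offset
                }
                channel.force(false);                // expensive part: instance lock not held
                lastSyncedOffset = offsetToSync;     // checkpoint bookkeeping
            } finally {
                syncLock.unlock();
            }
        }
    }
}
```

Writers calling add() only contend with each other on the cheap instance lock; a thread stuck in force() on a slow spinning disk no longer blocks them.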

@mikemccand (Contributor) commented May 15, 2016

Thanks @s1monw I will test this!

@@ -139,30 +140,15 @@ private synchronized final void closeWithTragicEvent(Throwable throwable) throws
return new Translog.Location(generation, offset, data.length());
}

private final ReentrantLock syncLock = new ReentrantLock();

@mikemccand (Contributor) commented May 15, 2016

Maybe add a lock order comment here, e.g. "lock order syncLock -> synchronized(this)"?
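For illustration only (the exact wording that eventually landed may differ), such a comment could sit right on the field declaration:

```java
import java.util.concurrent.locks.ReentrantLock;

class LockOrderComment {
    // lock order: syncLock -> synchronized(this).
    // Taking the instance monitor while holding syncLock is fine, but never
    // acquire syncLock while already holding the instance monitor.
    private final ReentrantLock syncLock = new ReentrantLock();
}
```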

@mikemccand (Contributor) commented May 15, 2016

LGTM

@mikemccand (Contributor) commented May 16, 2016

I think this change is potentially a biggish performance gain on highly concurrent CPUs with slowish IO.

I tested on a 72-core box with a spinning disk; without the change (2 runs):

  67.3 M docs in 948.4 seconds
  67.3 M docs in 946.3 seconds

and 2 runs with the change:

  67.3 M docs in 858.1 seconds
  67.3 M docs in 884.7 seconds

I also ran the "random power loss tester" and there were no corruptions after 18 power loss events (at which point I hit disk full!).

@s1monw (Contributor, Author) commented May 17, 2016

@mikemccand I pushed a new commit

@bleskes (Member) commented May 17, 2016

@mikemccand out of curiosity - how many concurrent threads did you use to index and how many shards are in the index?

@mikemccand (Contributor) commented May 17, 2016

LGTM, thanks @s1monw

@mikemccand (Contributor) commented May 17, 2016

> how many concurrent threads did you use to index and how many shards are in the index?

6 shards, 12 client-side threads, and I told ES to use 36 cores (though I did have a local mod to relax its current 32-core max). I disabled auto IO merge throttling and ran with translog durability set to async.

@mikemccand (Contributor) commented May 17, 2016

I re-ran the perf test with the default translog durability (request, i.e. fsync on every request):

before.log.0: Indexer: 56952280 docs: 1139.35 sec [49986.8 dps, 16.0 MB/sec]
before.log.1: Indexer: 56952280 docs: 1133.59 sec [50240.5 dps, 16.1 MB/sec]
before.log.2: Indexer: 56952280 docs: 1156.01 sec [49266.4 dps, 15.8 MB/sec]

 after.log.0: Indexer: 56952280 docs: 1058.93 sec [53782.8 dps, 17.2 MB/sec]
 after.log.1: Indexer: 56952280 docs: 1064.17 sec [53518.0 dps, 17.2 MB/sec]
 after.log.2: Indexer: 56952280 docs: 1046.26 sec [54433.9 dps, 17.4 MB/sec]

This is a powerful CPU (36 real cores) against slowish IO (a single spinning 8 TB disk) ... I think there are real gains here.

@s1monw merged commit 2b972f1 into elastic:master on May 19, 2016

1 check passed

CLA: Commit author is a member of Elasticsearch

@s1monw deleted the s1monw:sync_without_global_lock branch on May 19, 2016

bleskes added a commit to bleskes/elasticsearch that referenced this pull request May 20, 2016

Snapshotting and sync could cause a deadlock in TranslogWriter
elastic#18360 introduced an extra lock in order to allow writes while syncing the translog. This caused a potential deadlock with the snapshotting code, where we first acquire the instance lock, followed by a sync (which acquires the syncLock). However, the sync logic acquires the syncLock first, followed by the instance lock.

I considered solving this by not syncing the translog on snapshot; I think we can get away with just flushing it. That, however, would create subtleties around snapshotting and whether the operations in a snapshot are persisted. I opted instead for slightly uglier code with nested synchronized blocks, where the scope of the change is contained to the TranslogWriter class alone.
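A rough illustration of the ordering conflict (method names are invented; this is not the actual Elasticsearch code): thread A enters the snapshot path, holds the instance monitor, then calls sync() and waits for syncLock; thread B is already inside sync(), holds syncLock, and waits for the instance monitor.

```java
import java.util.concurrent.locks.ReentrantLock;

class DeadlockIllustration {
    private final ReentrantLock syncLock = new ReentrantLock();

    // Thread A (snapshot path before the fix): instance lock first, then syncLock.
    synchronized void snapshot() {
        sync();                       // waits for syncLock while holding the instance monitor
    }

    // Thread B (sync path): syncLock first, then the instance lock.
    void sync() {
        syncLock.lock();
        try {
            synchronized (this) {     // waits for the instance monitor while holding syncLock
                // ... read the offset to checkpoint ...
            }
            // ... fsync and write the checkpoint outside the instance lock ...
        } finally {
            syncLock.unlock();
        }
    }
}
```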

bleskes added a commit that referenced this pull request May 20, 2016

Snapshotting and sync could cause a deadlock in TranslogWriter (#18481)