FSync translog outside of the writer's global lock #18360

Merged
2 commits merged into elastic:master from sync_without_global_lock on May 19, 2016

Conversation

@s1monw (Contributor) commented May 15, 2016

Today we acquire a global write lock that blocks all modifications to the
translog file while we fsync / checkpoint the file. Yet, we don't necessarily
need to block concurrent operations here. This can lead to a lot of blocked
threads if the machine has high concurrency (lots of CPUs) but uses slow disks
(spinning disks), which is absolutely unnecessary. We only need to prevent
concurrent fsyncing / checkpointing, but we can fill buffers and write to the
underlying file concurrently.

This change introduces an additional lock that we hold while fsyncing, but moves
the checkpointing code outside of the writer's global lock.

I had a conversation with @mikemccand, who saw contention on these locks. @mikemccand, can you give this patch a go and see if you get better concurrency? I also think we need to run the power loss tester on this one again :)
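
Roughly, the pattern being described could look like the following minimal sketch (hypothetical names, not the actual TranslogWriter code): appends still synchronize on the writer, while the expensive fsync / checkpoint runs under a separate sync lock, so writers are only briefly blocked to capture a consistent offset.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not the real TranslogWriter: fsync/checkpoint are
// serialized by a dedicated syncLock instead of the writer's global lock.
class TranslogWriterPatternSketch {
    private final FileChannel channel;
    private final ReentrantLock syncLock = new ReentrantLock(); // only guards fsync/checkpoint
    private volatile long writtenOffset;     // advanced under synchronized(this)
    private volatile long lastSyncedOffset;  // advanced under syncLock

    TranslogWriterPatternSketch(FileChannel channel) {
        this.channel = channel;
    }

    // Appends stay serialized on the writer, but they are cheap buffer/file writes.
    synchronized long add(ByteBuffer data) throws IOException {
        final long offset = writtenOffset;
        writtenOffset += channel.write(data, offset);
        return offset;
    }

    // fsync + checkpoint run under syncLock, not under the writer's monitor,
    // so concurrent add() calls are not blocked while the disk is busy.
    void sync() throws IOException {
        if (lastSyncedOffset < writtenOffset) {   // cheap pre-check, no locks held
            syncLock.lock();
            try {
                final long offsetToSync;
                synchronized (this) {             // briefly capture a consistent offset
                    offsetToSync = writtenOffset;
                }
                if (lastSyncedOffset < offsetToSync) {
                    channel.force(false);         // the slow part runs outside synchronized(this)
                    // a real writer would also persist a checkpoint for offsetToSync here
                    lastSyncedOffset = offsetToSync;
                }
            } finally {
                syncLock.unlock();
            }
        }
    }
}
```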

@mikemccand (Contributor)

Thanks @s1monw I will test this!

@@ -139,30 +140,15 @@ private synchronized final void closeWithTragicEvent(Throwable throwable) throws
return new Translog.Location(generation, offset, data.length());
}

private final ReentrantLock syncLock = new ReentrantLock();

Maybe add a lock order comment here, e.g. "lock order syncLock -> synchronized(this)"?
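
For instance, the suggested annotation could look something like this (illustrative snippet, not the actual source):

```java
import java.util.concurrent.locks.ReentrantLock;

class TranslogWriterSketch { // hypothetical wrapper, just to show where the comment would live
    // lock order: syncLock -> synchronized(this);
    // never acquire syncLock while already holding the writer's monitor.
    private final ReentrantLock syncLock = new ReentrantLock();
}
```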

@mikemccand (Contributor)

LGTM

@mikemccand (Contributor)

I think this change is potentially a biggish performance gain on highly concurrent CPUs with slowish IO.

I tested on a 72-core box with a spinning disk. Without the change (2 runs):

  67.3 M docs in 948.4 seconds
  67.3 M docs in 946.3 seconds

and 2 runs with the change:

  67.3 M docs in 858.1 seconds
  67.3 M docs in 884.7 seconds

I also ran the "random power loss tester" and there were no corruptions after 18 power loss events (at which point I hit disk full!).

@s1monw (Contributor, Author) commented May 17, 2016

@mikemccand I pushed a new commit

@bleskes (Contributor) commented May 17, 2016

@mikemccand out of curiosity - how many concurrent threads did you use to index and how many shards are in the index?

@mikemccand (Contributor)

LGTM, thanks @s1monw

@mikemccand (Contributor)

> how many concurrent threads did you use to index and how many shards are in the index?

6 shards, 12 client-side threads, and I told ES to use 36 cores (though I did have a local mod to relax its current 32-core max). I disabled auto IO merge throttling and ran with translog durability set to async.
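
For context, translog durability is an index-level setting; a minimal sketch, assuming the 5.x-era Settings builder API, of the two modes discussed in this thread:

```java
import org.elasticsearch.common.settings.Settings;

// Sketch only (assumed 5.x Settings builder API): "request" (the default) fsyncs the
// translog before acknowledging each request; "async" fsyncs on a background interval.
class TranslogDurabilitySettings {
    static Settings async() {
        return Settings.builder()
                .put("index.translog.durability", "async")
                .build();
    }

    static Settings request() {
        return Settings.builder()
                .put("index.translog.durability", "request")
                .build();
    }
}
```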

@mikemccand (Contributor)

I re-ran the perf test with the default translog durability (request), i.e. an fsync on every request:

before.log.0: Indexer: 56952280 docs: 1139.35 sec [49986.8 dps, 16.0 MB/sec]
before.log.1: Indexer: 56952280 docs: 1133.59 sec [50240.5 dps, 16.1 MB/sec]
before.log.2: Indexer: 56952280 docs: 1156.01 sec [49266.4 dps, 15.8 MB/sec]

 after.log.0: Indexer: 56952280 docs: 1058.93 sec [53782.8 dps, 17.2 MB/sec]
 after.log.1: Indexer: 56952280 docs: 1064.17 sec [53518.0 dps, 17.2 MB/sec]
 after.log.2: Indexer: 56952280 docs: 1046.26 sec [54433.9 dps, 17.4 MB/sec]

This is a powerful CPU (36 real cores) against slowish IO (a single spinning 8 TB disk) ... I think there are real gains here.

@s1monw s1monw merged commit 2b972f1 into elastic:master May 19, 2016
@s1monw s1monw deleted the sync_without_global_lock branch May 19, 2016 07:41
bleskes added a commit to bleskes/elasticsearch that referenced this pull request May 20, 2016
 elastic#18360 introduced an extra lock in order to allow writes while syncing the translog. This caused a potential deadlock with snapshotting code where we first acquire the instance lock, followed by a sync (which acquires the syncLock). However, the sync logic acquires the syncLock first, followed by the instance lock.

I considered solving this by not syncing the translog on snapshot - I think we can get away with just flushing it. That, however, would create subtleties around snapshotting and whether the operations in a snapshot are persisted. I opted instead for slightly uglier code with nested synchronized blocks, where the scope of the change is contained to the TranslogWriter class alone.
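
To illustrate the inversion being described (hypothetical code, not the actual change in the commit), keeping a single acquisition order on every path, sync lock first with the writer monitor nested inside, removes the deadlock:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only; not the actual TranslogWriter change. Before the fix,
// two call paths took the same locks in opposite orders, which can deadlock:
//
//   snapshot path:  synchronized (writer)  ->  sync()  ->  syncLock
//   sync path:      syncLock  ->  synchronized (writer)
class LockOrderSketch {
    private final FileChannel channel;
    private final ReentrantLock syncLock = new ReentrantLock();

    LockOrderSketch(FileChannel channel) {
        this.channel = channel;
    }

    void sync() throws IOException {
        syncLock.lock();                  // 1st: sync lock
        try {
            synchronized (this) {         // 2nd: writer monitor, always nested inside syncLock
                // capture / update writer state consistently with concurrent appends
            }
            channel.force(false);         // the fsync itself still runs outside the monitor
        } finally {
            syncLock.unlock();
        }
    }

    void snapshot() throws IOException {
        syncLock.lock();                  // same order on the snapshot path too
        try {
            synchronized (this) {
                // capture a consistent view of the translog before syncing
            }
            channel.force(false);
        } finally {
            syncLock.unlock();
        }
    }
}
```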
bleskes added a commit that referenced this pull request May 20, 2016
@clintongormley added the :Distributed/Distributed and :Distributed/Engine labels and removed the :Translog and :Distributed/Distributed labels on Feb 13, 2018
Labels: :Distributed/Engine, >enhancement, v5.0.0-alpha3