Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid self-deadlock in the translog #29520

Merged
merged 3 commits into from Apr 15, 2018

Conversation

Projects
None yet
5 participants
@jasontedor
Copy link
Member

commented Apr 15, 2018

Today when reading an operation from the current generation fails tragically we attempt to close the translog. However, by invoking close before releasing the read lock we end up in self-deadlock because closing tries to acquire the write lock and the read lock can not be upgraded to a write lock. To avoid this, we move the close invocation outside of the try-with-resources that acquired the read lock. As an extra guard against this, we document the problem and add an assertion that we are not trying to invoke close while holding the read lock.

Closes #29509, relates #29401

Avoid self-deadlock in the translog
Today when reading an operation from the current generation fails
tragically we attempt to close the translog. However, by invoking close
before releasing the read lock we end up in self-deadlock because
closing tries to acquire the write lock and the read lock can not be
upgraded to a write lock. To avoid this, we move the close invocation
outside of the try-with-resources that acquired the read lock. As an
extra guard against this, we document the problem and add an assertion
that we are not trying to invoke close while holding the read lock.
@elasticmachine

This comment has been minimized.

Copy link
Collaborator

commented Apr 15, 2018

@bleskes
Copy link
Member

left a comment

LGTM

private void closeOnTragicEvent(final Exception ex) {
// we can not hold a read lock here because closing will attempt to obtain a write lock and that would result in self-deadlock
assert readLock.isHeldByCurrentThread() == false : Thread.currentThread().getName();

This comment has been minimized.

Copy link
@bleskes
@ywelsch
Copy link
Contributor

left a comment

LGTM, thanks @jasontedor. Can you add a note to the PR description that this was introduced with #29401, i.e. affects code that was never released?

private void closeOnTragicEvent(final Exception ex) {
// we can not hold a read lock here because closing will attempt to obtain a write lock and that would result in self-deadlock
assert readLock.isHeldByCurrentThread() == false : Thread.currentThread().getName();
Objects.requireNonNull(ex);

This comment has been minimized.

Copy link
@ywelsch

ywelsch Apr 15, 2018

Contributor

As we never expect a null value to be passed as parameter to this internal method, I'm not a fan of sprinkling this check here. This is defensive programming at its worst.

@jasontedor jasontedor merged commit 00fd73a into elastic:master Apr 15, 2018

3 checks passed

CLA Commit author is a member of Elasticsearch
Details
elasticsearch-ci Build finished.
Details
elasticsearch-ci/packaging-sample Build finished.
Details

jasontedor added a commit that referenced this pull request Apr 15, 2018

Avoid self-deadlock in the translog (#29520)
Today when reading an operation from the current generation fails
tragically we attempt to close the translog. However, by invoking close
before releasing the read lock we end up in self-deadlock because
closing tries to acquire the write lock and the read lock can not be
upgraded to a write lock. To avoid this, we move the close invocation
outside of the try-with-resources that acquired the read lock. As an
extra guard against this, we document the problem and add an assertion
that we are not trying to invoke close while holding the read lock.

@jasontedor jasontedor deleted the jasontedor:translog-self-deadlock branch Apr 16, 2018

jasontedor added a commit that referenced this pull request Apr 16, 2018

Merge branch '6.x' into ccr-6.x
* 6.x:
  [TEST] REST client request without leading '/' (#29471)
  Prevent accidental changes of default values (#29528)
  [Docs] Add definitions to glossary  (#29127)
  Avoid self-deadlock in the translog (#29520)
  Minor cleanup in NodeInfo.groovy
  Lazy configure build tasks that require older JDKs (#29519)
  Fix usage of NodeInfo#nodeVersion in build
  Control max size/count of warning headers (#28427) (#29516)
  Simplify snapshot check in root build file
  Make NodeInfo#nodeVersion strongly-typed as Version (#29515)
  Enable license header exclusions (#29379)
  Use proper Java version for BWC builds (#29493)
  Mute TranslogTests#testFatalIOExceptionsWhileWritingConcurrently

jasontedor added a commit that referenced this pull request Apr 16, 2018

Merge branch 'master' into ccr
* master:
  [TEST] REST client request without leading '/' (#29471)
  Using ObjectParser in UpdateRequest (#29293)
  Prevent accidental changes of default values (#29528)
  [Docs] Add definitions to glossary  (#29127)
  Avoid self-deadlock in the translog (#29520)
  Minor cleanup in NodeInfo.groovy
  Lazy configure build tasks that require older JDKs (#29519)
  Simplify snapshot check in root build file
  Make NodeInfo#nodeVersion strongly-typed as Version (#29515)
  Enable license header exclusions (#29379)
  Use proper Java version for BWC builds (#29493)
  Mute TranslogTests#testFatalIOExceptionsWhileWritingConcurrently
  Enable skipping fetching latest for BWC builds (#29497)

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 17, 2018

Merge branch 'master' into thread-pool-info-migration
* master: (96 commits)
  TEST: Mute testEnsureWeReconnect
  Mute full cluster restart test recovery test
  REST high-level client: add support for Indices Update Settings API [take 2] (elastic#29327)
  Plugins: Fix native controller confirmation for non-meta plugin (elastic#29434)
  Remove PipelineExecutionService#executeIndexRequest (elastic#29537)
  Fix overflow error in parsing of long geohashes (elastic#29418)
  Remove unused index.ttl.disable_purge setting (elastic#29527)
  FullClusterRestartIT.testRecovery should wait for all initializing shards
  Build: Fail if any libs depend on non-core libs (elastic#29336)
  [TEST] REST client request without leading '/' (elastic#29471)
  Using ObjectParser in UpdateRequest (elastic#29293)
  Prevent accidental changes of default values (elastic#29528)
  [Docs] Add definitions to glossary  (elastic#29127)
  Avoid self-deadlock in the translog (elastic#29520)
  Minor cleanup in NodeInfo.groovy
  Lazy configure build tasks that require older JDKs (elastic#29519)
  Simplify snapshot check in root build file
  Make NodeInfo#nodeVersion strongly-typed as Version (elastic#29515)
  Enable license header exclusions (elastic#29379)
  Use proper Java version for BWC builds (elastic#29493)
  ...

@colings86 colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.