Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shouldRollGeneration should execute under read lock #41696

Merged
merged 4 commits into from
May 10, 2019

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Apr 30, 2019

Translog#shouldRollGeneration should execute under the read lock for it accesses the current writer.

@dnhatn dnhatn added >bug :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. v8.0.0 v7.2.0 v6.7.3 v7.0.2 labels Apr 30, 2019
@dnhatn dnhatn requested a review from jasontedor April 30, 2019 16:44
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I you know ways in which this could manifest itself, it would be nice to include into the commit message. I think this should in reality never show, but is easier to reason about with the lock check included.

@jasontedor
Copy link
Member

I am not sure, is this actually a problem? There are issues here which this doesn't protect against. Namely the check and roll are not atomic, so it could be that a thread checks if a roll is needed, and then a roll occurs, and then this thread returns back and performs a roll itself. This change doesn't protect against that.

@dnhatn
Copy link
Member Author

dnhatn commented May 2, 2019

@henningandersen @jasontedor Thanks for looking.

I spotted this while I was investigating a case where more than 140 empty translog files in a single shard. For sure this is not the reason for those empty translog files. That shard has more than 280 primary failovers, and we roll a new translog on each primary promotion.

Although this should never happen in practice, I think this method should be called with lock. I marked this PR as a non-issue.

@jasontedor
Copy link
Member

I agree it's more correct and should be integrated, my point though is that I don't think it matters in practice yet it still leaves behind real issues. 🤷‍♀

@dnhatn dnhatn removed the v7.1.0 label May 6, 2019
@dnhatn
Copy link
Member Author

dnhatn commented May 9, 2019

@henningandersen @jasontedor Thanks for reviewing!

@dnhatn
Copy link
Member Author

dnhatn commented May 9, 2019

process was found dead while waiting for ports files, node{:modules:mapper-extras:integTest-1}

run elasticsearch-ci/1

@dnhatn dnhatn merged commit 80f4846 into elastic:master May 10, 2019
@dnhatn dnhatn deleted the should-rolll-under-lock branch May 10, 2019 12:50
dnhatn added a commit that referenced this pull request May 10, 2019
Translog#shouldRollGeneration should execute under the read lock since
it accesses the current writer.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request May 10, 2019
* elastic/master: (84 commits)
  [ML] adds geo_centroid aggregation support to data frames (elastic#42088)
  Add documentation for calendar/fixed intervals (elastic#41919)
  Remove global checkpoint assertion in peer recovery (elastic#41987)
  Don't create tempdir for cli scripts (elastic#41913)
  Fix debian-8 update (elastic#42056)
  Cleanup plugin bin directories (elastic#41907)
  Prevent order being lost for _nodes API filters (elastic#42045)
  Change IndexAnalyzers default analyzer access (elastic#42011)
  Remove reference to fs.data.spins in docs
  Mute failing AsyncTwoPhaseIndexerTests
  Remove close method in PageCacheRecycler/Recycler (elastic#41917)
  [ML] adding pivot.max_search_page_size option for setting paging size (elastic#41920)
  Docs: Tweak list formatting
  Simplify handling of keyword field normalizers (elastic#42002)
  [ML] properly nesting objects in document source (elastic#41901)
  Remove extra `ms` from log message (elastic#42068)
  Increase the sample space for random inner hits name generator (elastic#42057)
  Recognise direct buffers in heap size docs (elastic#42070)
  shouldRollGeneration should execute under read lock (elastic#41696)
  Wait for active shard after close in mixed cluster (elastic#42029)
  ...
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
Translog#shouldRollGeneration should execute under the read lock since
it accesses the current writer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >non-issue v7.2.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants