Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Never leave stale delete tombstones in version map #29619

Merged
merged 5 commits into from
Apr 20, 2018

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Apr 19, 2018

Today the VersionMap does not clean up a stale delete tombstone if it
does not require safe access. However, in a very rare situation due to
concurrent refreshes, the safe-access flag may be flipped over then an
engine accidentally consult that stale delete tombstone.

This commit ensures to never leave stale delete tombstones in a version
map by always pruning delete tombstones when putting a new index entry
regardless of the value of the safe-access flag.

Today the VersionMap does not clean up a stale delete tombstone if it
does not require safe access. However, in a very rare situation due to
concurrent refreshes, the safe-access flag may be flipped over then an
engine accidentally consult that stale delete tombstone.

This commit ensures to never leave stale delete tombstones in a version
map by always pruning delete tombstones when putting a new index entry
regardless of the value of the safe-access flag.
@dnhatn dnhatn added >enhancement v7.0.0 v6.3.0 :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. labels Apr 19, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@dnhatn
Copy link
Member Author

dnhatn commented Apr 19, 2018

@DaveCTurner, I have a test simulating the violated case found by the formal model. However, I discussed with @bleskes and decided not to include it because the version-map test may be enough. You can refer this test at 2cff7fb.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, but I'm not going to be able to check it in detail before tomorrow now.

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prod code lgtm. I left a question on the tests.

@@ -312,6 +312,10 @@ void maybePutIndexUnderLock(BytesRef uid, IndexVersionValue version) {
if (maps.isSafeAccessMode()) {
putIndexUnderLock(uid, version);
} else {
// The existing delete tombstone is no longer the latest version (.i.e stale)
// when a newer version is processed regardless of the safe mode of the version map.
// Therefore, it's better to always prune that tombstone to avoid accidental accesses.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:
// Even though we don't store a record of the indexing operation (and mark as unsafe) we should still remove any previous delete for this uuid. Note that this is expected to not hurt performance because the tombstone is small (if not empty) when usafe is relevant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the comment.

@@ -396,6 +397,50 @@ public void testPruneTombstonesWhileLocked() throws InterruptedException, IOExce
assertEquals(0, map.getAllTombstones().size());
}

public void testNeverLeaveStaleDeleteTombstone() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why we need concurrency here? (i.e., why a single threaded test is not enough)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single threaded test is good enough ;)

@dnhatn
Copy link
Member Author

dnhatn commented Apr 19, 2018

@bleskes I've addressed your comments. Can you please have another look? Thank you.

@dnhatn dnhatn requested a review from bleskes April 19, 2018 18:01
Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left some comments. No need for another review.

} else {
versionMap.maybePutIndexUnderLock(uid, new IndexVersionValue(randomTranslogLocation(), version, 1, 1));
}
VersionValue storedValue = versionMap.getUnderLock(uid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also check that this return value correlates with what we expect? i.e., if the doc was just deleted it should be deleted, if it was indexed it should never be marked as deleted? (and found if we're in safe mode). Also can you check this property holds also not directly after indexing/deleting? (i.e., after refresh, enforceSafe access etc). Said differently - maybe also randomly do nothing under lock?

@@ -396,6 +397,32 @@ public void testPruneTombstonesWhileLocked() throws InterruptedException, IOExce
assertEquals(0, map.getAllTombstones().size());
}

public void testNeverLeaveStaleDeleteTombstone() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe name the test: "testRandomlyDeletingAndIndexingDoc"? it tests much more, which is good.

@dnhatn
Copy link
Member Author

dnhatn commented Apr 20, 2018

Thanks @bleskes and @DaveCTurner for reviewing.

@DaveCTurner I am merging this to catch the release train but feel free to leave your feedbacks.

@dnhatn dnhatn merged commit 9cf8b01 into elastic:master Apr 20, 2018
@dnhatn dnhatn deleted the never-leave-stale-tombstone branch April 20, 2018 00:50
dnhatn added a commit that referenced this pull request Apr 20, 2018
Previously we did not put an indexing to a version map if that map does
not require safe access but removed the existing delete tombstone only
if assertion enabled. In #29585, we removed the side-effect caused by
assertion then this test started failing. This failure can be explained
as follows:

- Step 1: Index a doc then delete that doc

- Step 2: The version map can switch to unsafe mode because of
  concurrent refreshes (implicitly called by flushes)

- Step 3: Index a document - the version map won't add this version
  value and won't prune the tombstone (previously it did)

- Step 4: Delete a document - this will return NOT_FOUND instead of
  DELETED because of the stale delete tombstone

This failure is actually fixed by #29619 in which we never leave stale
delete tombstones

Closes #29626
dnhatn added a commit that referenced this pull request Apr 20, 2018
Today the VersionMap does not clean up a stale delete tombstone if it
does not require safe access. However, in a very rare situation due to
concurrent refreshes, the safe-access flag may be flipped over then an
engine accidentally consult that stale delete tombstone.

This commit ensures to never leave stale delete tombstones in a version
map by always pruning delete tombstones when putting a new index entry
regardless of the value of the safe-access flag.
dnhatn added a commit that referenced this pull request Apr 20, 2018
Previously we did not put an indexing to a version map if that map does
not require safe access but removed the existing delete tombstone only
if assertion enabled. In #29585, we removed the side-effect caused by
assertion then this test started failing. This failure can be explained
as follows:

- Step 1: Index a doc then delete that doc

- Step 2: The version map can switch to unsafe mode because of
  concurrent refreshes (implicitly called by flushes)

- Step 3: Index a document - the version map won't add this version
  value and won't prune the tombstone (previously it did)

- Step 4: Delete a document - this will return NOT_FOUND instead of
  DELETED because of the stale delete tombstone

This failure is actually fixed by #29619 in which we never leave stale
delete tombstones

Closes #29626
martijnvg added a commit that referenced this pull request Apr 20, 2018
* es/master: (32 commits)
  TEST: Unmute testPrimaryRelocationWhileIndexing
  Remove remaining tribe node references (#29574)
  Never leave stale delete tombstones in version map (#29619)
  Do not serialize common stats flags using ordinal (#29600)
  Remove stale comment from JVM stats (#29625)
  TEST: Mute testPrimaryRelocationWhileIndexing
  Remove bulk fallback for write thread pool (#29609)
  Fix an incorrect reference to 'zero_terms_docs' in match_phrase queries.
  Update the version compatibility for zero_terms_query in match_phrase.
  Account translog location to ram usage in version map
  Remove extra spaces from changelog
  Add support to match_phrase query for zero_terms_query. (#29598)
  Fix incorrect references to 'zero_terms_docs' in query parsing error messages. (#29599)
  Build: Move java home checks to pre-execution phase (#29548)
  Avoid side-effect in VersionMap when assertion enabled (#29585)
  [Tests] Remove accidental logger usage
  Add tests for ranking evaluation with aliases (#29452)
  Deprecate use of `htmlStrip` as name for HtmlStripCharFilter (#27429)
  Update plan for the removal of mapping types. (#29586)
  [Docs] Add rankEval method for Jva HL client
  ...
martijnvg added a commit that referenced this pull request Apr 20, 2018
* es/6.x: (28 commits)
  TEST: Unmute testPrimaryRelocationWhileIndexing
  Never leave stale delete tombstones in version map (#29619)
  Do not serialize common stats flags using ordinal (#29600)
  Remove stale comment from JVM stats (#29625)
  TEST: Mute testPrimaryRelocationWhileIndexing
  Remove 7.0.0 from 6.x changelog (#29621)
  Add support to match_phrase query for zero_terms_query. (#29598)
  Account translog location to ram usage in version map
  Avoid side-effect in VersionMap when assertion enabled (#29585)
  Build: Move java home checks to pre-execution phase (#29548)
  Add tests for ranking evaluation with aliases (#29452)
  [Test] Fix assertion in SearchDocumentationIT
  Deprecate use of `htmlStrip` as name for HtmlStripCharFilter (#27429)
  test: Assert deprecated http.enebled setting warning
  Update plan for the removal of mapping types. (#29586)
  [Docs] Add rankEval method for Jva HL client
  Make ranking evaluation details accessible for client
  Rename the bulk thread pool to write thread pool (#29593)
  [Test] Minor changes to rank_eval tests (#29577)
  test: Assert deprecated http.enebled setting warning
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.3.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants