Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not optimize append-only operation if normal operation with higher seq# was seen #28787

Merged
merged 10 commits into from Mar 26, 2018

Conversation

@dnhatn
Copy link
Contributor

commented Feb 22, 2018

When processing an append-only operation, primary knows that operations can only conflict with another instance of the same operation. This is true as the id was freshly generated. However this property doesn't hold for replicas. As soon as an auto-generated ID was indexed into the primary, it can be exposed to a search and users can issue a follow up operation on it. In extremely rare cases, the follow up operation can be arrived and processed on a replica before the original append-only request. In this case we can't simply proceed with the append-only request and blindly add it to the index without consulting the version map. The following scenario can cause difference between primary and replica.

  1. Primary indexes an auto-gen-id doc. (id=X, v=1, s#=20)
  2. A refresh cycle happens on primary
  3. The new doc is picked up and modified - say by a delete by query request - Primary gets a delete doc (id=X, v=2, s#=30)
  4. Delete doc is processed first on the replica (id=X, v=2, s#=30)
  5. Indexing operation arrives on the replica, since it's an auto-gen-id request and the retry marker is lower, we put it into lucene without any check. Replica has a doc the primary doesn't have.

To deal with a potential conflict between an append-only operation and a normal operation on replicas, we need to rely on sequence numbers. This commit maintains the max seqno of non-append-only operations on replica then only apply optimization for an append-only operation only if its seqno is higher than the seqno of all non-append-only.

@s1monw
s1monw approved these changes Feb 22, 2018
Copy link
Contributor

left a comment

I left some suggestions but LGTM otherwise

private void updateMaxSeqNoOfNonAppendOnlyOperations(Operation op) {
assert op.origin() != Operation.Origin.PRIMARY;
maxSeqNoOfNonAppendOnlyOperations.updateAndGet(curr -> Math.max(op.seqNo(), curr));
assert maxSeqNoOfNonAppendOnlyOperations.get() >= op.seqNo();

This comment has been minimized.

Copy link
@s1monw

s1monw Feb 22, 2018

Contributor

this needs a message

return index.seqNo() <= maxSeqNoOfNonAppendOnlyOperations.get();
}

private void updateMaxSeqNoOfNonAppendOnlyOperations(Operation op) {

This comment has been minimized.

Copy link
@s1monw

s1monw Feb 22, 2018

Contributor

maybe just inline this into the planIndexingAsNonPrimary method? I think that would be cleaner.

*/
assert canOptimizeAddDocument(index);
assert index.origin() != Operation.Origin.PRIMARY;
return index.seqNo() <= maxSeqNoOfNonAppendOnlyOperations.get();

This comment has been minimized.

Copy link
@s1monw

s1monw Feb 22, 2018

Contributor

I think you should inline this into planIndexingAsNonPrimary then we don't need all the asserts

@DaveCTurner

This comment has been minimized.

Copy link
Contributor

commented Feb 22, 2018

Thanks @dnhatn - I think we should agree on elastic/elasticsearch-formal-models#28 before we proceed with this. In particular, it's not yet clear that this is the only problem in this area, and we haven't settled whether we think replicas should have a different version map that only considers seq#s (the document version numbers are now only of importance on the primary).

@dnhatn

This comment has been minimized.

Copy link
Contributor Author

commented Feb 23, 2018

@DaveCTurner I understand your concern. Boaz and I discussed and agreed to start the implementation for this and #28790. We hoped that our reasoning for these would be fine. Certainly, we need to sync between the formal model and implementation.

Copy link
Member

left a comment

This is great. I left some nits. As discussed before, we should wait for blessing via @DaveCTurner 's work before merging.

break;
final String key = entry.getKey();
if (key.equals(MAX_UNSAFE_AUTO_ID_TIMESTAMP_COMMIT_ID)) {
maxUnsafeAutoIdTimestamp.set(Math.max(maxUnsafeAutoIdTimestamp.get(), Long.parseLong(entry.getValue())));

This comment has been minimized.

Copy link
@bleskes

bleskes Mar 3, 2018

Member

question - why the leniency with max?

This comment has been minimized.

Copy link
@dnhatn

dnhatn Mar 4, 2018

Author Contributor

I removed the max expr.

This comment has been minimized.

Copy link
@bleskes

bleskes Mar 8, 2018

Member

Looks like the max is still here for time stamps? maybe assert it's -1 (for both seq# and timestamp)

assert index.version() == 1L : "can optimize on replicas but incoming version is [" + index.version() + "]";
plan = IndexingStrategy.optimizedAppendOnly(index.seqNo());
} else {
if (appendOnlyRequest == false) {

This comment has been minimized.

Copy link
@bleskes

bleskes Mar 3, 2018

Member

can we introduce a method similar to mayHaveBeenIndexedBefore that does the check and also updates the maxSeqNoOfNonAppendOnlyOperations? I think it's good to have both marker handling consistent.

This comment has been minimized.

Copy link
@bleskes

bleskes Mar 8, 2018

Member

Apparently @s1monw prefered the reverse. I'm fine with leaving as is.

}
// doc1 is delayed and arrived after a non-append-only op.
final long seqNoDoc1 = localCheckpointTracker.generateSeqNo();
Engine.IndexResult regularDoc = engine.index(replicaIndexForDoc(

This comment has been minimized.

Copy link
@bleskes

bleskes Mar 3, 2018

Member

can we also test a delete?

@bleskes

This comment has been minimized.

Copy link
Member

commented Mar 3, 2018

One more nit - can we maybe give the PR title to be more exact about what the code change? something like do not optimize if a normal operation was seen with a higher seq# ..

@dnhatn dnhatn changed the title Do not optimize append-only operation if it may have been exposed Do not optimize append-only op if normal op with higher seq# was seen Mar 4, 2018
dnhatn added 2 commits Mar 4, 2018
@dnhatn

This comment has been minimized.

Copy link
Contributor Author

commented Mar 4, 2018

@bleskes I've addressed your comments. Would you please take another look? Thank you.

@DaveCTurner

This comment has been minimized.

Copy link
Contributor

commented Mar 7, 2018

I have updated elastic/elasticsearch-formal-models#28 as we discussed, so that's awaiting further comments. AIUI the model covers both this PR and #28790. From a first look, this seems to be quite different from the modelled solution and I'd be more comfortable if we brought them closer together (from either/both ends). In particular, we don't yet seem to be looking at the sequence numbers of replication requests here and I think we should.

I'm still travelling and haven't had a lot of sleep so expect more useful/correct/accurate feedback at a later date. I'm sending this now in case it's useful to those in more westerly timezones in the meantime.

@bleskes
bleskes approved these changes Mar 8, 2018
break;
final String key = entry.getKey();
if (key.equals(MAX_UNSAFE_AUTO_ID_TIMESTAMP_COMMIT_ID)) {
maxUnsafeAutoIdTimestamp.set(Math.max(maxUnsafeAutoIdTimestamp.get(), Long.parseLong(entry.getValue())));

This comment has been minimized.

Copy link
@bleskes

bleskes Mar 8, 2018

Member

Looks like the max is still here for time stamps? maybe assert it's -1 (for both seq# and timestamp)

@bleskes

This comment has been minimized.

Copy link
Member

commented Mar 8, 2018

In particular, we don't yet seem to be looking at the sequence numbers of replication requests here and I think we should.

@DaveCTurner can you clarify what you mean? we use the seq# of the incoming op to update the maxSeqNoOfNonAppendOnlyOperations field?

@DaveCTurner

This comment has been minimized.

Copy link
Contributor

commented Mar 8, 2018

Apologies, that was written on my phone and is inaccurate. I meant that the modelled solution stores seq#s in the version map.

DaveCTurner added a commit to DaveCTurner/elasticsearch-formal-models that referenced this pull request Mar 26, 2018
Copy link
Contributor

left a comment

LGTM.

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/engine/InternalEngine.java
#	server/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
#	server/src/test/java/org/elasticsearch/index/replication/ESIndexLevelReplicationTestCase.java
#	server/src/test/java/org/elasticsearch/index/replication/IndexLevelReplicationTests.java
@dnhatn

This comment has been minimized.

Copy link
Contributor Author

commented Mar 26, 2018

Thanks @s1monw, @bleskes and @DaveCTurner for reviewing.

@dnhatn dnhatn merged commit 0ac89a3 into elastic:master Mar 26, 2018
2 checks passed
2 checks passed
CLA Commit author is a member of Elasticsearch
Details
elasticsearch-ci Build finished.
Details
@dnhatn dnhatn deleted the dnhatn:append-only-marker branch Mar 26, 2018
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Mar 27, 2018
* master:
  Do not optimize append-only if seen normal op with higher seqno (elastic#28787)
  [test] packaging: gradle tasks for groovy tests (elastic#29046)
  Prune only gc deletes below local checkpoint (elastic#28790)
@dnhatn dnhatn changed the title Do not optimize append-only op if normal op with higher seq# was seen Do not optimize append-only if seen normal op with higher seq# Mar 27, 2018
@dnhatn dnhatn changed the title Do not optimize append-only if seen normal op with higher seq# Do not optimize append-only operation if normal operation with higher seq# was seen Mar 27, 2018
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Mar 27, 2018
* master: (40 commits)
  Do not optimize append-only if seen normal op with higher seqno (elastic#28787)
  [test] packaging: gradle tasks for groovy tests (elastic#29046)
  Prune only gc deletes below local checkpoint (elastic#28790)
  remove testUnassignedShardAndEmptyNodesInRoutingTable
  elastic#28745: remove extra option in the composite rest tests
  Fold EngineDiskUtils into Store, for better lock semantics (elastic#29156)
  Add file permissions checks to precommit task
  Remove execute mode bit from source files
  Optimize the composite aggregation for match_all and range queries (elastic#28745)
  [Docs] Add rank_eval size parameter k (elastic#29218)
  [DOCS] Remove ignore_z_value parameter link
  Docs: Update docs/index_.asciidoc (elastic#29172)
  Docs: Link C++ client lib elasticlient (elastic#28949)
  [DOCS] Unregister repository instead of deleting it (elastic#29206)
  Docs: HighLevelRestClient#multiSearch (elastic#29144)
  Add Z value support to geo_shape
  Remove type casts in logging in server component (elastic#28807)
  Change BroadcastResponse from ToXContentFragment to ToXContentObject (elastic#28878)
  REST : Split `RestUpgradeAction` into two actions (elastic#29124)
  Add error file docs to important settings
  ...
martijnvg added a commit that referenced this pull request Mar 28, 2018
* es/master: (22 commits)
  Fix building Javadoc JARs on JDK for client JARs (#29274)
  Require JDK 10 to build Elasticsearch (#29174)
  Decouple NamedXContentRegistry from ElasticsearchException (#29253)
  Docs: Update generating test coverage reports (#29255)
  [TEST] Fix issue with HttpInfo passed invalid parameter
  Remove all dependencies from XContentBuilder (#29225)
  Fix sporadic failure in CompositeValuesCollectorQueueTests
  Propagate ignore_unmapped to inner_hits (#29261)
  TEST: Increase timeout for testPrimaryReplicaResyncFailed
  REST client: hosts marked dead for the first time should not be immediately retried (#29230)
  TEST: Use different translog dir for a new engine
  Make SearchStats implement Writeable (#29258)
  [Docs] Spelling and grammar changes to reindex.asciidoc (#29232)
  Do not optimize append-only if seen normal op with higher seqno (#28787)
  [test] packaging: gradle tasks for groovy tests (#29046)
  Prune only gc deletes below local checkpoint (#28790)
  remove testUnassignedShardAndEmptyNodesInRoutingTable
  #28745: remove extra option in the composite rest tests
  Fold EngineDiskUtils into Store, for better lock semantics (#29156)
  Add file permissions checks to precommit task
  ...
DaveCTurner added a commit to elastic/elasticsearch-formal-models that referenced this pull request Mar 28, 2018
This models how indexing and deletion operations are handled on the replica,
including the optimisations for append-only operations and the interaction with
Lucene commits and the version map.

It incorporates

- elastic/elasticsearch#28787
- elastic/elasticsearch#28790
- elastic/elasticsearch#29276
- a proposal to always prune tombstones
dnhatn added a commit that referenced this pull request Mar 28, 2018
When processing an append-only operation, primary knows that operations
can only conflict with another instance of the same operation. This is
true as the id was freshly generated. However this property doesn't hold
for replicas. As soon as an auto-generated ID was indexed into the
primary, it can be exposed to a search and users can issue a follow up
operation on it. In extremely rare cases, the follow up operation can be
arrived and processed on a replica before the original append-only
request. In this case we can't simply proceed with the append-only
request and blindly add it to the index without consulting the version
map.

The following scenario can cause difference between primary and
replica.

1. Primary indexes an auto-gen-id doc. (id=X, v=1, s#=20)
2. A refresh cycle happens on primary
3. The new doc is picked up and modified - say by a delete by query
   request - Primary gets a delete doc (id=X, v=2, s#=30)
4. Delete doc is processed first on the replica (id=X, v=2, s#=30)
5. Indexing operation arrives on the replica, since it's an auto-gen-id
   request and the retry marker is lower, we put it into lucene without
   any check. Replica has a doc the primary doesn't have.

To deal with a potential conflict between an append-only operation and a
normal operation on replicas, we need to rely on sequence numbers. This
commit maintains the max seqno of non-append-only operations on replica
then only apply optimization for an append-only operation only if its
seq# is higher than the seq# of all non-append-only.
@colings86 colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.