Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move primary term from ReplicationRequest to ConcreteShardRequest #25822

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Jul 21, 2017

Removes the primary term from the replication request and pushes it into the transport envelope. This makes it possible to remove the term from the ReplicationOperation universe. The primary term that is to be used for a replication operation is now determined in the reroute phase when the node decides to execute a primary action (and validated once the primary action gets to execute). This makes it possible to validate that the primary action was sent to the correct primary shard instance that it was meant to be sent to (currently we only validate primary actions using the allocation id, which can be reused for failed and reallocated primaries). In particular this will help with recent test failures that we've been seeing.

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The simplicity of how this turned out to be is a good confirmation it was the right move. I think it's good to set ext.bwc_tests_enabled to false in this PR (#25230)

@@ -920,6 +927,11 @@ private void acquirePrimaryShardReference(ShardId shardId, String allocationId,
if (actualAllocationId.equals(allocationId) == false) {
throw new ShardNotFoundException(shardId, "expected aID [{}] but found [{}]", allocationId, actualAllocationId);
}
final long actualTerm = indexShard.getPrimaryTerm();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you choose to do it here rather than doing it in index shard, like we discussed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the check to acquirePrimaryOperationPermit would be less of a contained change than this (i.e. it would also involve recovery, where would then have to add a term to StartRecoveryRequest, ...). I still plan on exploring that as a follow-up, but thought this to be the easiest way to have a self-contained change that focuses on the TransportReplicationAction / ReplicaRequest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k. ++

PlainActionFuture<TestResponse> listener = new PlainActionFuture<>();
final boolean wrongAllocationId = randomBoolean();
final long wrongTerm = primaryTerm + randomIntBetween(1, 10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this doesn't allow for the right term and the wrong aid. Something that can't really happen, but is good to test imo.

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -706,7 +711,9 @@ private void performLocalAction(ClusterState state, ShardRouting primary, Discov
logger.trace("send action [{}] to local primary [{}] for request [{}] with cluster state version [{}] to [{}] ",
transportPrimaryAction, request.shardId(), request, state.version(), primary.currentNodeId());
}
performAction(node, transportPrimaryAction, true, new ConcreteShardRequest<>(request, primary.allocationId().getId()));
final long primaryTerm = state.metaData().getIndexSafe(primary.index()).primaryTerm(primary.id());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something I noticed while reviewing the bwc code - we can pass the indexMetadata (or primary term) as an argument. We look it up on in the calling method. Save on double lookups.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one @bleskes.

@@ -159,7 +159,8 @@ task verifyVersions {
* after the backport of the backcompat code is complete.
*/
allprojects {
ext.bwc_tests_enabled = true
// disabled in preparation of merging #25822 & #25824
ext.bwc_tests_enabled = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only do not forget to reenable (although we do have a daily CI check that will fail if it's not enabled).

@ywelsch ywelsch merged commit a2624df into elastic:master Jul 21, 2017
@ywelsch
Copy link
Contributor Author

ywelsch commented Jul 21, 2017

Thanks @bleskes @jasontedor

ywelsch added a commit that referenced this pull request Jul 21, 2017
…5824)

Removes the primary term from the replication request and pushes it into the transport envelope. This makes it possible to remove the term from the ReplicationOperation universe. The primary term that is to be used for a replication operation is now determined in the reroute phase when the node decides to execute a primary action (and validated once the primary action gets to execute). This makes it possible to validate that the primary action was sent to the correct primary shard instance that it was meant to be sent to (currently we only validate primary actions using the allocation id, which can be reused for failed and reallocated primaries).

5.x. backport of #25822
ywelsch added a commit that referenced this pull request Jul 21, 2017
@clintongormley clintongormley added :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Sequence IDs labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants