New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move primary term from ReplicationRequest to ConcreteShardRequest #25822
Move primary term from ReplicationRequest to ConcreteShardRequest #25822
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The simplicity of how this turned out to be is a good confirmation it was the right move. I think it's good to set ext.bwc_tests_enabled
to false in this PR (#25230)
@@ -920,6 +927,11 @@ private void acquirePrimaryShardReference(ShardId shardId, String allocationId, | |||
if (actualAllocationId.equals(allocationId) == false) { | |||
throw new ShardNotFoundException(shardId, "expected aID [{}] but found [{}]", allocationId, actualAllocationId); | |||
} | |||
final long actualTerm = indexShard.getPrimaryTerm(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you choose to do it here rather than doing it in index shard, like we discussed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the check to acquirePrimaryOperationPermit
would be less of a contained change than this (i.e. it would also involve recovery, where would then have to add a term to StartRecoveryRequest, ...). I still plan on exploring that as a follow-up, but thought this to be the easiest way to have a self-contained change that focuses on the TransportReplicationAction / ReplicaRequest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k. ++
PlainActionFuture<TestResponse> listener = new PlainActionFuture<>(); | ||
final boolean wrongAllocationId = randomBoolean(); | ||
final long wrongTerm = primaryTerm + randomIntBetween(1, 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this doesn't allow for the right term and the wrong aid. Something that can't really happen, but is good to test imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@@ -706,7 +711,9 @@ private void performLocalAction(ClusterState state, ShardRouting primary, Discov | |||
logger.trace("send action [{}] to local primary [{}] for request [{}] with cluster state version [{}] to [{}] ", | |||
transportPrimaryAction, request.shardId(), request, state.version(), primary.currentNodeId()); | |||
} | |||
performAction(node, transportPrimaryAction, true, new ConcreteShardRequest<>(request, primary.allocationId().getId())); | |||
final long primaryTerm = state.metaData().getIndexSafe(primary.index()).primaryTerm(primary.id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something I noticed while reviewing the bwc code - we can pass the indexMetadata (or primary term) as an argument. We look it up on in the calling method. Save on double lookups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good one @bleskes.
@@ -159,7 +159,8 @@ task verifyVersions { | |||
* after the backport of the backcompat code is complete. | |||
*/ | |||
allprojects { | |||
ext.bwc_tests_enabled = true | |||
// disabled in preparation of merging #25822 & #25824 | |||
ext.bwc_tests_enabled = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only do not forget to reenable (although we do have a daily CI check that will fail if it's not enabled).
Thanks @bleskes @jasontedor |
…5824) Removes the primary term from the replication request and pushes it into the transport envelope. This makes it possible to remove the term from the ReplicationOperation universe. The primary term that is to be used for a replication operation is now determined in the reroute phase when the node decides to execute a primary action (and validated once the primary action gets to execute). This makes it possible to validate that the primary action was sent to the correct primary shard instance that it was meant to be sent to (currently we only validate primary actions using the allocation id, which can be reused for failed and reallocated primaries). 5.x. backport of #25822
Removes the primary term from the replication request and pushes it into the transport envelope. This makes it possible to remove the term from the
ReplicationOperation
universe. The primary term that is to be used for a replication operation is now determined in the reroute phase when the node decides to execute a primary action (and validated once the primary action gets to execute). This makes it possible to validate that the primary action was sent to the correct primary shard instance that it was meant to be sent to (currently we only validate primary actions using the allocation id, which can be reused for failed and reallocated primaries). In particular this will help with recent test failures that we've been seeing.