Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose all permits acquisition in IndexShard and TransportReplicationAction #35540

Merged
merged 4 commits into from Nov 23, 2018

Conversation

tlrx
Copy link
Member

@tlrx tlrx commented Nov 14, 2018

This pull request exposes two new methods in the IndexShard and TransportReplicationAction classes in order to allow transport replication actions to acquire all index shard operation permits for their execution.

It first adds the acquirePrimaryAllOperationsPermits() and the acquireReplicaAllOperationsPermits() methods to the IndexShard class which allow to acquire all operations permits on a shard while exposing a Releasable. It also refactors the TransportReplicationAction class to expose two protected methods (acquirePrimaryOperationPermit() and acquireReplicaOperationPermit()) that can be overridden when a transport replication action requires the acquisition of all permits on primary and/or replica shard during execution.

Finally, it adds a TransportReplicationAllPermitsAcquisitionTests which illustrates how a transport replication action can grab all permits before adding a cluster block in the cluster state, making subsequent operations that requires a single permit to fail (such test has been discussed in #35332 (comment)).

Related to elastic #33888

@tlrx tlrx added >enhancement v7.0.0 :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. v6.6.0 labels Nov 14, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx @tlrx. I left initial feedback on the production call.

@@ -697,7 +728,7 @@ public void onFailure(Exception e) {
}
}

protected IndexShard getIndexShard(ShardId shardId) {
protected IndexShard getIndexShard(final ShardId shardId, final String targetAllocationID) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the targetAllocaitonID passed here? I guess it should be validated, but it isn't?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a leftover that was helpful in TransportReplicationAllPermitsAcquisitionTests to dinstinguish between the primary and replica shard when executing a transport replication test action.

I reverted this and changed how the action in test retrieves the index shard.

* Acquire all primary operation permits. Once all permits are acquired, the provided ActionListener is called.
* It is the responsibility of the caller to close the {@link Releasable}.
*/
public void acquirePrimaryAllOperationsPermits(final ActionListener<Releasable> onPermitAcquired, final TimeValue timeout) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: acquireAllPrimaryOperationPermits?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* an {@link IllegalStateException}. Otherwise the global checkpoint and the max_seq_no_of_updates marker of the replica are updated
* before the invocation of the {@link ActionListener#onResponse(Object)}} method of the provided listener.
*/
private ActionListener<Releasable> createListener(final ActionListener<Releasable> listener,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so comfortable with separating this code from the one in updatePrimaryTermIfNeeded - they are tightly connected. Instead of sharing code this way, how about creating a callback that will run:

indexShardOperationPermits.acquire(listener, executorOnDelay, true, debugInfo);

or

indexShardOperationPermits.asyncBlockOperations(listener, timeout.duration(), timeout.timeUnit());

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I share your concerns and I updated the code, let me know what you think.

@tlrx
Copy link
Member Author

tlrx commented Nov 15, 2018

Thanks @bleskes, I addressed your first bunch of comments. I also added comments to the TransportReplicationAllPermitsAcquisitionTests and reworked it a bit so that it is easier to understand.

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left one suggestion for a potential follow up PR.

allPermitsAction.new AsyncPrimaryAction(request(), allocationId(), primaryTerm(), transportChannel(allPermitFuture), null) {
@Override
void runWithPrimaryShardReference(final TransportReplicationAction.PrimaryShardReference reference) {
assertEquals("All permits must be acquired", 0, reference.indexShard.getActiveOperationsCount());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indicates a funky property of getActiveOperationsCount. To date all the blocks were internal to the shard and thus 0 was a good return. I don't really have a good solution in mind. At the very least we should document this on getActiveOperationsCount. Maybe 1 (we know there is only one exclusive op) is a better return value? /cc @ywelsch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take the point, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I missed the ping here. I think that 1 is ok too, but then we should in general change the terminology in IndexShardOperationPermits to move from the notion of blocking operations to running exclusive operations (similar to a read/write lock).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

if (this.shardId.equals(shardId) == false) {
throw new AssertionError("shard id differs from " + shardId);
}
return (executedOnPrimary.get() == null) ? primary : replica;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙈

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, not really proud of this... I'll take a look at how to improve this test.

…cationAction

This commit adds the acquirePrimaryAllOperationsPermits() and the
acquireReplicaAllOperationsPermits() methods to the IndexShard class.
These methods allow to acquire all operations permits on a primary or
a replica shard and can be used in future transport replication actions
to acquire all permits instead of a single one.

Related to elastic elastic#33888
@tlrx tlrx force-pushed the add-indexshard-acquire-all-permits branch from 4865718 to ef255a2 Compare November 22, 2018 12:11
@tlrx tlrx merged commit 2e37f17 into elastic:master Nov 23, 2018
@tlrx tlrx deleted the add-indexshard-acquire-all-permits branch November 23, 2018 08:27
tlrx added a commit that referenced this pull request Nov 23, 2018
…Action (#35540)

This pull request exposes two new methods in the IndexShard and
TransportReplicationAction classes in order to allow transport replication
actions to acquire all index shard operation permits for their execution.

It first adds the acquireAllPrimaryOperationPermits() and the
acquireAllReplicaOperationsPermits() methods to the IndexShard class
which allow to acquire all operations permits on a shard while exposing
a Releasable. It also refactors the TransportReplicationAction class to
expose two protected methods (acquirePrimaryOperationPermit() and
acquireReplicaOperationPermit()) that can be overridden when a transport
replication action requires the acquisition of all permits on primary and/or
replica shard during execution.

Finally, it adds a TransportReplicationAllPermitsAcquisitionTests which
 illustrates how a transport replication action can grab all permits before
adding a cluster block in the cluster state, making subsequent operations
that requires a single permit to fail).

Related to elastic #33888
original-brownbear pushed a commit that referenced this pull request Nov 23, 2018
…Action (#35540)

This pull request exposes two new methods in the IndexShard and 
TransportReplicationAction classes in order to allow transport replication 
actions to acquire all index shard operation permits for their execution.

It first adds the acquireAllPrimaryOperationPermits() and the 
acquireAllReplicaOperationsPermits() methods to the IndexShard class 
which allow to acquire all operations permits on a shard while exposing 
a Releasable. It also refactors the TransportReplicationAction class to 
expose two protected methods (acquirePrimaryOperationPermit() and 
acquireReplicaOperationPermit()) that can be overridden when a transport 
replication action requires the acquisition of all permits on primary and/or 
replica shard during execution.

Finally, it adds a TransportReplicationAllPermitsAcquisitionTests which
 illustrates how a transport replication action can grab all permits before 
adding a cluster block in the cluster state, making subsequent operations 
that requires a single permit to fail).

Related to elastic #33888
@tlrx tlrx mentioned this pull request Dec 5, 2018
50 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. >enhancement v6.6.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants