New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only fail the relocation target when a replication request on it fails #15791

Merged
merged 5 commits into from Jan 6, 2016

Conversation

Projects
None yet
3 participants
@jasontedor
Member

jasontedor commented Jan 6, 2016

This commit addresses an issue when handling a failed replication
request against a relocating target shard. Namely, if a replication
request fails against the target of a relocation we currently fail both
the source and the target. This leads to an unnecessary
recovery. Instead, only the target of the relocation should be failed.

Closes #15790

Assert that we fail the correct shard when a replication request fails
This commit adds an assertion to
TransportReplicationActionTests#runReplicateTest that when a replication
request fails, we fail the correct shard.

@jasontedor jasontedor changed the title from Relocating shard failure to Only fail the relocation target when a replication request it fails Jan 6, 2016

}
// send operation to relocating shard
if (shard.relocating()) {
performOnReplica(shard, shard.relocatingNodeId());
performOnReplica(shard.buildTargetRelocatingShard());

This comment has been minimized.

@bleskes

bleskes Jan 6, 2016

Member

I'm fine with this for this fix (to keep it small), but it's a shame we create these objects. As a followup I think we should change IndexShardRoutingTable#activeShards to include relocation targets (as allInitializingShards does). Then we can use that list which is cached and we also won't need the relocation check,

This comment has been minimized.

@jasontedor

jasontedor Jan 6, 2016

Member

As a followup I think we should change IndexShardRoutingTable#activeShards to include relocation targets (as allInitializingShards does). Then we can use that list which is cached and we also won't need the relocation check,

@bleskes I opened #15798.

@bleskes

View changes

.../test/java/org/elasticsearch/action/support/replication/TransportReplicationActionTests.java Outdated
// requests were sent to the correct shard copies
List<ShardRouting> shards =
clusterService.state().getRoutingTable().index(shardId.getIndex()).shard(shardId.id()).shards();

This comment has been minimized.

@bleskes

bleskes Jan 6, 2016

Member

we have a utility method RoutingTable.shardRoutingTable which gives you what you want. Also IndexShardRoutingTabel is iterable so we don't need the shard list.

@bleskes

View changes

.../test/java/org/elasticsearch/action/support/replication/TransportReplicationActionTests.java Outdated
List<ShardRouting> shards =
clusterService.state().getRoutingTable().index(shardId.getIndex()).shard(shardId.id()).shards();
for (ShardRouting shard : shards) {
if (!shard.primary() && !executeOnReplica) {

This comment has been minimized.

@bleskes

bleskes Jan 6, 2016

Member

use == false . Also where do we check that the total number of captured requests is what we expect? (to guarantee it doesn't contain the node the primary shard is on).

@bleskes

View changes

.../test/java/org/elasticsearch/action/support/replication/TransportReplicationActionTests.java Outdated
if (shard.unassigned()) {
continue;
}
if (!clusterService.state().getNodes().localNodeId().equals(shard.currentNodeId())) {

This comment has been minimized.

@bleskes

bleskes Jan 6, 2016

Member

== false

This comment has been minimized.

@bleskes

bleskes Jan 6, 2016

Member

we always run on the node with primary, so I don't think we need this check?

This comment has been minimized.

@bleskes

bleskes Jan 6, 2016

Member

sorry - got confused - I thought this code skips the primary. I think it will be clear if the if here would say ShardRouting.primary() == false

@bleskes

View changes

.../test/java/org/elasticsearch/action/support/replication/TransportReplicationActionTests.java Outdated
// and the shard that was requested to be failed
ShardStateAction.ShardRoutingEntry shardRoutingEntry = (ShardStateAction.ShardRoutingEntry)shardFailedRequest.request;
// the shard the request was sent to and the shard to be failed should be the same
assertTrue(shardRoutingEntry.getShardRouting().isSameAllocation(routing));

This comment has been minimized.

@bleskes

bleskes Jan 6, 2016

Member

I wonder if we should use equality here?

@jasontedor

This comment has been minimized.

Member

jasontedor commented Jan 6, 2016

@bleskes I pushed c291c17 to address your feedback on TransportReplicationActionTests#runReplicateTest.

@bleskes

View changes

.../test/java/org/elasticsearch/action/support/replication/TransportReplicationActionTests.java Outdated
@@ -486,7 +491,40 @@ protected void runReplicateTest(IndexShardRoutingTable shardRoutingTable, int as
replicationPhase.run();
final CapturingTransport.CapturedRequest[] capturedRequests = transport.capturedRequests();
transport.clear();
assertThat(capturedRequests.length, equalTo(assignedReplicas));

This comment has been minimized.

@bleskes

bleskes Jan 6, 2016

Member

this can go away now...

This comment has been minimized.

@jasontedor
@bleskes

This comment has been minimized.

Member

bleskes commented Jan 6, 2016

awesome. LGTM. Can we open an issue/work on that comment regarding building the target routing for each request? just making sure it's not lost :)

jasontedor added some commits Jan 6, 2016

Only fail the relocation target when a replication request on it fails
This commit addresses an issue when handling a failed replication
request against a relocating target shard. Namely, if a replication
request fails against the target of a relocation we currently fail both
the source and the target. This leads to an unnecessary
recovery. Instead, only the target of the relocation should be failed.
Assert that replication requests are sent to the correct shard copies
This commit adds tighter assertions in
TransportReplicationActionTests#runReplicateTest that replication
requests are sent to the correct shard copies.
Cleanup TransportReplicationActionTests#runReplicateTest
This commit cleans up some of the assertions in
TransportReplicationActionTests#runReplicateTest:
 - use a Map to track actual vs. expected requests
 - assert that no request was sent to the local node
 - use RoutingTable#shardRoutingTable convenience method
 - explicitly use false in boolean conditions
 - clarify requests are expected on replica shards when assigned and
   execution on replicas is true
 - test ShardRouting equality when checking the failed shard request

@jasontedor jasontedor changed the title from Only fail the relocation target when a replication request it fails to Only fail the relocation target when a replication request on it fails Jan 6, 2016

jasontedor added a commit that referenced this pull request Jan 6, 2016

Merge pull request #15791 from jasontedor/relocating-shard-failure
Only fail the relocation target when a replication request on it fails

Closes #15790

@jasontedor jasontedor merged commit 3b192cf into elastic:master Jan 6, 2016

1 check passed

CLA Commit author is a member of Elasticsearch
Details

@jasontedor jasontedor deleted the jasontedor:relocating-shard-failure branch Jan 6, 2016

@jasontedor jasontedor removed the review label Jan 6, 2016

jasontedor added a commit that referenced this pull request Jan 6, 2016

Only fail the relocation target when a replication request on it fails
This commit backports commit 3b192cf
from master to 2.x.

Relates #15791

jasontedor added a commit that referenced this pull request Jan 6, 2016

Only fail the relocation target when a replication request on it fails
This commit backports commit 3b192cf
from master to 2.2.

Relates #15791

jasontedor added a commit that referenced this pull request Jan 6, 2016

Only fail the relocation target when a replication request on it fails
This commit backports commit 3b192cf
from master to 2.1.

Relates #15791

jasontedor added a commit that referenced this pull request Jan 7, 2016

Only fail the relocation target when a replication request on it fails
This commit backports commit 3b192cf
from master to 2.0.

Relates #15791

jasontedor added a commit that referenced this pull request Jan 9, 2016

Only fail the relocation target when a replication request on it fails
This commit backports commit 3b192cf
from master to 1.7.

Relates #15791
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment