Skip to content

Conversation

abeyad
Copy link

@abeyad abeyad commented Dec 14, 2016

This PR completes the refactoring of the cluster allocation explain API and improves it in the following two high-level ways:

  1. The explain API now uses the same allocators that the AllocationService uses to make shard allocation decisions. Prior to this PR, the explain API would run the deciders against each node for the shard in question, but this was not executed on the same code path as the allocators, and many of the scenarios in shard allocation were not captured due to not executing through the same code paths as the allocators.

  2. The APIs have changed, both on the Java and JSON level, to accurately capture the decisions made by the system. The APIs also now report on shard moving and rebalancing decisions, whereas the previous API did not report decisions for moving shards which cannot remain on their current node or rebalancing shards to form a more balanced cluster.

Note: this change affects plugin developers who may have a custom implementation of the ShardsAllocator interface. The method weighShards has been removed and no longer has any utility. In order to support the new explain API, however, a custom implementation of ShardsAllocator must now implement ShardAllocationDecision decideShardAllocation(ShardRouting shard, RoutingAllocation allocation) which provides a decision and explanation for allocating a single shard. For implementations that do not support explaining a single shard allocation via the cluster allocation explain API, this method can simply return an UnsupportedOperationException.

Relates #21691 #21662 #21103 #20634 #20347

@abeyad
Copy link
Author

abeyad commented Dec 16, 2016

retest this please

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some comments. Will give it a more thorough look tomorrow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need to specify full path, ShardRoutingState is imported

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shard has unassignedInfo if it is in state "initializing".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current_state might be enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not change AllocationStatus.value() to print "not_permitted"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to change how the AllocationStatus is represented in the ClusterState

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe convert it to an AllocationDecision and use the toString() provided by that one?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that will be fine, since every possible value of AllocationStatus is covered by AllocationDecision, and this rendering for UnassignedInfo is specific to the explain API

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why catch this and swallow exception? I think it's ok if the API does not work if that bit is not implemented.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nicer to have:

        StringBuilder buf = new StringBuilder();
        if (usedIndexSetting) {
            buf.append("index setting [");
            buf.append(INDEX_ROUTING_ALLOCATION_ENABLE_SETTING.getKey());
        } else {
            buf.append("cluster setting [");
            buf.append(CLUSTER_ROUTING_ALLOCATION_ENABLE_SETTING.getKey());
        }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, other deciders have same issue with messages (e.g. filter allocation decider). Let's have a separate PR to fix these messages to make clear that they refer to index / cluster settings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaner to just use

AsyncShardFetch<TransportNodesListGatewayStartedShards.NodeGatewayStartedShards> fetch = asyncFetchStarted.computeIfAbsent(shard.shardId(),
    shardId -> new InternalAsyncFetch<>(logger, "shard_started", shard.shardId(), startedAction));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does Clint think about this? I also wonder if we should only enable this when ?human is not explicitly set on the request

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clintongormley what do you think about setting the human readable flag for the REST API here? it only affects the time fields (e.g. remaining_delay vs remaining_delay_in_millis for the delayed allocation scenario) and byte fields for shard stores. Should the user just be required to put ?human to get the more user friendly outputs similar to the other APIs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was advocating for was to enable the human flag by default, but if someone sets human = false to respect that as well...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, yes that makes sense. @clintongormley should the human flag be enabled by default for the explain API? For example, the GET indices API takes the human flag but the default is false in that case. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everywhere that we support the human flag, it defaults to false. Even though this API will primarily consumed by humans, I think I'd prefer to keep the consistency of defaulting to false. remaining_delay_in_millis is still quite readable for a human, and rerunning the request with ?human is easy enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comes for free if you use auto-managed min-master-nodes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

@abeyad
Copy link
Author

abeyad commented Dec 19, 2016

@ywelsch I pushed 02dc012353ab8325ac9036fd3e2b728cf1339842 to address your initial comments.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I'm not sure how we should handle initializing / relocating shards. Giving the same output as for started shards is highly confusing (e.g. SameShardAllocator). There are also no tests for these 2 scenarios.
  • The API should support selecting a specific shard if there are multiple replica shards that are allocated. One possibility would be to pass the node id of the node the selected shard is allocated to. This will help answer the question "Why is that replica shard allocated to that node".
  • The docs also require updates with the changed output format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should have an extra flag to set "explain mode".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but felt that there really is no concrete distinction between "debug" mode and "explain" mode in this case. Would changing the naming of the enum values help?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the change regarding relocated/initializing shard I think it's fine now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this throw an UnsupportedOperationException?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it would help to catch incorrect usages, as getExplanation() should never be called on Multi

@abeyad
Copy link
Author

abeyad commented Dec 19, 2016

@ywelsch

I'm not sure how we should handle initializing / relocating shards. Giving the same output as for started shards is highly confusing (e.g. SameShardAllocator). There are also no tests for these 2 scenarios.

I am not sure myself. Thinking about it some more, I prefer if we don't try to explain initializing / relocating shards. It doesn't make sense in context - an initializing shard just got assigned, so there is nothing to report about in terms of "why are my shards unassigned" and whichever node it got assigned to is either (1) the most optimal node from a cluster balance perspective or (2) it had a copy of the data. In any case, nothing to really explain. In the case of relocating shards, the same thing - we are relocating due to a better balance or being required to move, so there is nothing new to explain that the allocators didn't already take into account by starting the relocation process. I think if we returned a ShardAllocationDecision that had a method isDecisionTaken() that returned false for the Java API, it would be sufficient, and for the JSON output, we could just have a single string output "cannot explain initializing | relocating shards". The other alternative is to throw an exception. What do you think? @clintongormley do you have any thoughts on this point?

The API should support selecting a specific shard if there are multiple replica shards that are allocated. One possibility would be to pass the node id of the node the selected shard is allocated to. This will help answer the question "Why is that replica shard allocated to that node".

This is a good idea. I'll add this.

The docs also require updates with the changed output format.

Yes, I was going to do this in a follow up PR.

@abeyad
Copy link
Author

abeyad commented Dec 19, 2016

I pushed 56573f99d6518996d98eefa6b7a5a56288e218d1 to add the option to specify a node id for an assigned replica shard to explain.

@clintongormley
Copy link
Contributor

I think if we returned a ShardAllocationDecision that had a method isDecisionTaken() that returned false for the Java API, it would be sufficient, and for the JSON output, we could just have a single string output "cannot explain initializing | relocating shards". The other alternative is to throw an exception. What do you think? @clintongormley do you have any thoughts on this point?

Sounds reasonable to me

@abeyad
Copy link
Author

abeyad commented Dec 20, 2016

@ywelsch I pushed 7c960c4f81ae2819834d1d3367c5217e881aa395 to not try to explain initializing or relocating shards.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why default to false? This API is to be consumed primarily by humans

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per Clint's comment here #22182 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClinT_T

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this variable is only used in one place. Just put it there.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a mistake, i meant to use it in the line below as well, so I changed the line below to use it (its used twice now)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe swap the branches, no need to have an == false condition here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just make NOT_TAKEN public static.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current_node might suffice (see reroute API)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return null as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if replicaNode is specified, we have to check that a shard on that node can be found. Also I think we should use DiscoveryNodes.resolveNode so that the user can specify the node by name or id or ...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code already fails if a replica shard on the specified node cannot be found. foundShard will be null so we will throw an exception below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also unit test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the change regarding relocated/initializing shard I think it's fine now.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few more comments.

@abeyad
Copy link
Author

abeyad commented Dec 20, 2016

@ywelsch I pushed 9a5134af5c92120ed362efc4c156c060d9a9f2e6 and e0d6df724b6d489129bbc4d0e382cf16e8114b98

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor comments but looks good o.w.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IllegalStateException?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just let it bubble up?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not true anymore

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this also include this.currentNode == null?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not true anymore

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer relocationTargetNode to relocatingTargetNode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClinT_T

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testInitializingOrRelocatingShardExplanation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current_state?

@abeyad abeyad force-pushed the explain_api_refactor_rest branch from 81c16f5 to 02213ed Compare December 21, 2016 19:59
@abeyad
Copy link
Author

abeyad commented Dec 21, 2016

@ywelsch thanks for your review! I pushed another commit to address the last round of feedback.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @abeyad

@abeyad
Copy link
Author

abeyad commented Dec 22, 2016

retest this please

1 similar comment
@abeyad
Copy link
Author

abeyad commented Dec 22, 2016

retest this please

@abeyad abeyad force-pushed the explain_api_refactor_rest branch 2 times, most recently from 9648ff4 to 0089cc1 Compare December 22, 2016 15:13
@abeyad
Copy link
Author

abeyad commented Dec 22, 2016

retest this please

1 similar comment
@abeyad
Copy link
Author

abeyad commented Dec 22, 2016

retest this please

This commit changes the REST and transport layers to use the
allocation deciders refactoring and new formats for presenting
the allocation explanations.  The API is changed from both the
Java API layer as well as the REST layer.  This is a breaking
change from 5.0.x and 5.1.x explain API.
@abeyad abeyad force-pushed the explain_api_refactor_rest branch from aa5aae2 to 4e7bcff Compare December 23, 2016 17:38
"cluster shard allocation explanation test":
- skip:
version: " - 5.2.0"
reason: allocation explain api format is different in versions <= 5.2.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean 5.1.99? The explain format is different in versions < 5.2.0?

try {
orig = serialize(version, streamable);
} catch (IllegalArgumentException e) {
} catch (IllegalArgumentException | IllegalStateException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use IllegalArgumentException for checkVersion functionality as well. It fits better than IllegalStateException I think (the node does not need to look at current state to reject serializing the message but just at the arguments).


@Override
public void writeTo(StreamOutput out) throws IOException {
checkVersion(out.getVersion());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can have a REST test that checks this exception. This could be done with a skip section of the form 5.2.0 - so that the test is only active in BWC tests where the minimum es version is < 5.2.0.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with doing this is that such a test will sometimes pass (having triggered the IllegalStateException) and sometimes it won't. In particular, if the REST request is routed directly to the master node, there will be no exception. The exception will only occur if the REST request is routed to the non-master node which is on a different version from master (and not backward compatible w.r.t. the explain API), in which case the exception will be triggered when routed to the master node.

I did manually test these scenarios. We can discuss options tomorrow.

@abeyad
Copy link
Author

abeyad commented Jan 1, 2017

@ywelsch I pushed 03656a9 to address your feedback.

@abeyad
Copy link
Author

abeyad commented Jan 2, 2017

retest this please

@abeyad abeyad merged commit 20ab4be into elastic:master Jan 2, 2017
@abeyad abeyad deleted the explain_api_refactor_rest branch January 2, 2017 18:28
abeyad pushed a commit that referenced this pull request Jan 2, 2017
…cation decisions (#22182)

This PR completes the refactoring of the cluster allocation explain API and improves it in the following two high-level ways:

 1. The explain API now uses the same allocators that the AllocationService uses to make shard allocation decisions. Prior to this PR, the explain API would run the deciders against each node for the shard in question, but this was not executed on the same code path as the allocators, and many of the scenarios in shard allocation were not captured due to not executing through the same code paths as the allocators.

 2. The APIs have changed, both on the Java and JSON level, to accurately capture the decisions made by the system. The APIs also now report on shard moving and rebalancing decisions, whereas the previous API did not report decisions for moving shards which cannot remain on their current node or rebalancing shards to form a more balanced cluster.

Note: this change affects plugin developers who may have a custom implementation of the ShardsAllocator interface. The method weighShards has been removed and no longer has any utility. In order to support the new explain API, however, a custom implementation of ShardsAllocator must now implement ShardAllocationDecision decideShardAllocation(ShardRouting shard, RoutingAllocation allocation) which provides a decision and explanation for allocating a single shard. For implementations that do not support explaining a single shard allocation via the cluster allocation explain API, this method can simply return an UnsupportedOperationException.
@abeyad
Copy link
Author

abeyad commented Jan 2, 2017

5.x commit: 7c879ef

@lcawl lcawl added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. and removed :Allocation labels Feb 13, 2018
@clintongormley clintongormley added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. and removed :Cluster labels Feb 13, 2018
DiannaHohensee added a commit that referenced this pull request Aug 22, 2023
Removes ensureStableCluster() calls in ClusterAllocationExplainIT
testing after startNode*() calls. The stabilization checks were
originally necessary when the tests were added in #22182. However,
#49248 later enhanced validateClusterFormed(), which startNode*()
implicitly calls, to perform similar checks to ensureStableCluster()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking-java :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. >enhancement release highlight v5.2.0 v6.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants