Show configured and remaining delay for an unassigned shard. #17515
When a shard is delayed, we now show output like:

```json
{
  "shard" : {
    "index" : "i",
    "index_uuid" : "QzoKda9aQCG_hCaZQ18GEg",
    "id" : 3,
    "primary" : false
  },
  "assigned" : false,
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2016-04-04T16:44:47.520Z",
    "details" : "node_left[HyRLmMLxR5m_f58RKURApQ]"
  },
  "allocation_delay" : "59.9s",
  "allocation_delay_ms" : 59910,
  "remaining_delay" : "38.9s",
  "remaining_delay_ms" : 38991,
  "nodes" : {
    "jKiyQcWFTkyp3htyyjxoCw" : {
      "node_name" : "Landslide",
      "node_attributes" : { },
      "final_decision" : "YES",
      "weight" : 1.0,
      "decisions" : [ ]
    },
    "9bzF0SgoQh-G0F0sRW_qew" : {
      "node_name" : "Caretaker",
      "node_attributes" : { },
      "final_decision" : "NO",
      "weight" : 2.0,
      "decisions" : [ {
        "decider" : "same_shard",
        "decision" : "NO",
        "explanation" : "the shard cannot be allocated on the same node id [9bzF0SgoQh-G0F0sRW_qew] on which it already exists"
      } ]
    }
  }
}
```

Where the new addition is this section:

```
"allocation_delay" : "59.9s",
"allocation_delay_ms" : 59910,
"remaining_delay" : "38.9s",
"remaining_delay_ms" : 38991,
```

Which shows the configured delay as well as the remaining delay until the shard can be considered "assignable". This data is only shown if the shard is unassigned.

Relates to elastic#17372
builder.field("allocation_delay", TimeValue.timeValueNanos(delay));
builder.field("allocation_delay_ms", TimeValue.timeValueNanos(delay).millis());
builder.field("remaining_delay", TimeValue.timeValueNanos(remainingDelayNanos));
builder.field("remaining_delay_ms", TimeValue.timeValueNanos(remainingDelayNanos).millis());
I'm not clear on why we need both `allocation_delay` and `remaining_delay`. Don't they both convey the time remaining before shard allocation kicks in?
`allocation_delay` is the configured time before the delayed shard is rerouted (1 minute by default, unless overridden at the per-index level); `remaining_delay` is the time left before that happens. So for the example output:
"allocation_delay" : "59.9s",
"allocation_delay_ms" : 59910,
"remaining_delay" : "38.9s",
"remaining_delay_ms" : 38991,
Only `remaining_delay` will be counting down.
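To make the distinction concrete, here is a minimal sketch of how the two values relate. It is not the Elasticsearch implementation: the class `DelaySketch` and the helper `remainingDelayNanos` are hypothetical names, and it assumes the remaining delay is simply the configured delay minus the time elapsed since the shard became unassigned, floored at zero. The numbers are rounded for illustration rather than taken from the example output.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not code from this PR or from Elasticsearch.
class DelaySketch {

    /**
     * Returns the time left before a delayed shard may be rerouted.
     * Assumption: remaining = configured delay - elapsed time, never negative.
     */
    static long remainingDelayNanos(long configuredDelayNanos, long unassignedAtNanos, long nowNanos) {
        long elapsedNanos = nowNanos - unassignedAtNanos;
        return Math.max(0L, configuredDelayNanos - elapsedNanos);
    }

    public static void main(String[] args) {
        long configured = TimeUnit.SECONDS.toNanos(60); // stays fixed: the configured delay
        long unassignedAt = 0L;                          // moment the node left

        // 21 seconds later, only the remaining delay has changed.
        long now = TimeUnit.SECONDS.toNanos(21);
        long remaining = remainingDelayNanos(configured, unassignedAt, now);

        System.out.println("allocation_delay_ms=" + TimeUnit.NANOSECONDS.toMillis(configured));
        System.out.println("remaining_delay_ms=" + TimeUnit.NANOSECONDS.toMillis(remaining));
    }
}
```

Under these assumptions the configured value is constant across calls, while the remaining value shrinks as `nowNanos` advances and stops at zero once the delay has expired.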
The reason I'm confused is that `allocation_delay` is set from a call to `unassignedInfo#getLastComputedLeftDelayNanos()`, and in `UnassignedInfo#updateDelay` there is this code:
public long updateDelay(long nanoTimeNow, Settings settings, Settings indexSettings) {
final long newComputedLeftDelayNanos = getRemainingDelay(nanoTimeNow, settings, indexSettings);
lastComputedLeftDelayNanos = newComputedLeftDelayNanos;
return newComputedLeftDelayNanos;
}
so it seems the `lastComputedLeftDelayNanos` gets updated to the remaining delay?
I believe this is only going to get updated when the master node changes (there are a lot of weird things with this regarding nanos and millis and serialization).
Just left a couple of comments; otherwise, LGTM.
NodesStatsResponse resp = client().admin().cluster().prepareNodesStats().get();
assertThat(resp.getNodes().length, equalTo(3));
}
});
BTW, instead of the above to wait for the nodes to come up, could something like this be done?
client().admin().cluster().health(Requests.clusterHealthRequest().waitForNodes(Integer.toString(3))).actionGet();