
Show configured and remaining delay for an unassigned shard. #17515

Merged on Apr 7, 2016 (2 commits).

dakrone (Member) commented Apr 4, 2016

When a shard's allocation is delayed, the allocation explain output now looks like this:

```json
{
  "shard" : {
    "index" : "i",
    "index_uuid" : "QzoKda9aQCG_hCaZQ18GEg",
    "id" : 3,
    "primary" : false
  },
  "assigned" : false,
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2016-04-04T16:44:47.520Z",
    "details" : "node_left[HyRLmMLxR5m_f58RKURApQ]"
  },
  "allocation_delay" : "59.9s",
  "allocation_delay_ms" : 59910,
  "remaining_delay" : "38.9s",
  "remaining_delay_ms" : 38991,
  "nodes" : {
    "jKiyQcWFTkyp3htyyjxoCw" : {
      "node_name" : "Landslide",
      "node_attributes" : { },
      "final_decision" : "YES",
      "weight" : 1.0,
      "decisions" : [ ]
    },
    "9bzF0SgoQh-G0F0sRW_qew" : {
      "node_name" : "Caretaker",
      "node_attributes" : { },
      "final_decision" : "NO",
      "weight" : 2.0,
      "decisions" : [ {
        "decider" : "same_shard",
        "decision" : "NO",
        "explanation" : "the shard cannot be allocated on the same node id [9bzF0SgoQh-G0F0sRW_qew] on which it already exists"
      } ]
    }
  }
}
```

The new fields are:

```
  "allocation_delay" : "59.9s",
  "allocation_delay_ms" : 59910,
  "remaining_delay" : "38.9s",
  "remaining_delay_ms" : 38991,
```

These fields show the configured delay as well as the remaining delay
until the shard can be considered "assignable". They are only shown if
the shard is unassigned.
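A minimal sketch of the rule just described, assuming the countdown starts at the `at` timestamp in `unassigned_info` and that the configured delay is already known. `DelayFieldsSketch` and `delayFields` are hypothetical names, not Elasticsearch APIs, and only the `_ms` variants of the fields are produced:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the Elasticsearch implementation.
final class DelayFieldsSketch {

    // Builds the *_ms delay fields for one shard. Assigned shards get no
    // delay fields at all, mirroring "only shown if the shard is unassigned".
    static Map<String, Long> delayFields(boolean assigned, Instant unassignedAt,
                                         Duration configuredDelay, Instant now) {
        if (assigned) {
            return Collections.emptyMap();
        }
        // Assumption: the countdown starts at the "at" timestamp in unassigned_info.
        Duration elapsed = Duration.between(unassignedAt, now);
        Duration remaining = configuredDelay.minus(elapsed);
        if (remaining.isNegative()) {
            remaining = Duration.ZERO; // delay has expired; the shard is assignable
        }
        Map<String, Long> fields = new LinkedHashMap<>();
        fields.put("allocation_delay_ms", configuredDelay.toMillis());
        fields.put("remaining_delay_ms", remaining.toMillis());
        return fields;
    }

    public static void main(String[] args) {
        Instant at = Instant.parse("2016-04-04T16:44:47.520Z");
        Duration configured = Duration.ofMillis(59_910L);
        // Hypothetical query time chosen to match the example: 59910 - 38991 = 20919 ms.
        Instant now = at.plusMillis(20_919L);
        System.out.println(delayFields(false, at, configured, now));
        System.out.println(delayFields(true, at, configured, now));
    }
}
```

Running `main` prints `{allocation_delay_ms=59910, remaining_delay_ms=38991}` for the unassigned case and `{}` for the assigned one.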

Relates to elastic#17372
Review thread on these lines of the change:

```java
builder.field("allocation_delay", TimeValue.timeValueNanos(delay));
builder.field("allocation_delay_ms", TimeValue.timeValueNanos(delay).millis());
builder.field("remaining_delay", TimeValue.timeValueNanos(remainingDelayNanos));
builder.field("remaining_delay_ms", TimeValue.timeValueNanos(remainingDelayNanos).millis());
```
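The four builder lines render each delay twice: once as a human-readable `TimeValue` and once as whole milliseconds. The sketch below mimics that pairing in plain Java so the example numbers can be checked by hand. `formatShort` is a hypothetical stand-in for `TimeValue`'s string form that handles only milliseconds, seconds, and minutes; it truncates to one decimal, an assumption that happens to match the example (38991 ms appears as "38.9s", not "39.0s").

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: mimics the "59.9s" / 59910 pairing, not TimeValue itself.
final class DelayRenderingSketch {

    // Formats milliseconds with at most one truncated decimal digit,
    // dropping a trailing ".0"; units above minutes are not handled.
    static String formatShort(long millis) {
        if (millis < 1_000L) {
            return millis + "ms";
        }
        long unit = millis < 60_000L ? 1_000L : 60_000L;
        String suffix = millis < 60_000L ? "s" : "m";
        long whole = millis / unit;
        long tenth = (millis % unit) * 10 / unit;
        return tenth == 0 ? whole + suffix : whole + "." + tenth + suffix;
    }

    public static void main(String[] args) {
        long delayNanos = TimeUnit.MILLISECONDS.toNanos(59_910L);
        long remainingDelayNanos = TimeUnit.MILLISECONDS.toNanos(38_991L);
        // Both values start out in nanoseconds, as in the builder code above.
        long delayMillis = TimeUnit.NANOSECONDS.toMillis(delayNanos);
        long remainingMillis = TimeUnit.NANOSECONDS.toMillis(remainingDelayNanos);
        System.out.println("allocation_delay: " + formatShort(delayMillis)
                + ", allocation_delay_ms: " + delayMillis);
        System.out.println("remaining_delay: " + formatShort(remainingMillis)
                + ", remaining_delay_ms: " + remainingMillis);
    }
}
```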

I'm not clear on why we need both allocation_delay and remaining_delay. Don't they both convey the time remaining before shard allocation kicks in?

dakrone (Member, Author):

Allocation delay is the configured time before the delayed shard is rerouted (1 minute by default, unless configured at the index level); remaining delay is the time left. So, for the example output:

```
  "allocation_delay" : "59.9s",
  "allocation_delay_ms" : 59910,
  "remaining_delay" : "38.9s",
  "remaining_delay_ms" : 38991,
```

Only remaining_delay counts down.
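A sketch of the distinction described in this reply, assuming a 1-minute default that an index-level value can override (in Elasticsearch that override is the `index.unassigned.node_left.delayed_timeout` index setting). The class and method names are hypothetical. Sampling the same shard at several points shows the configured delay staying fixed while the remaining delay counts down to zero.

```java
import java.util.OptionalLong;

// Hypothetical sketch of "configured" vs "remaining" delay; not Elasticsearch code.
final class DelayCountdownSketch {

    // Default used when the index does not override the delay (1 minute).
    static final long DEFAULT_DELAY_MILLIS = 60_000L;

    // The configured delay: the index-level value if present, else the default.
    // In Elasticsearch the per-index override is the
    // index.unassigned.node_left.delayed_timeout setting.
    static long configuredDelayMillis(OptionalLong indexLevelMillis) {
        return indexLevelMillis.orElse(DEFAULT_DELAY_MILLIS);
    }

    // Only this value changes as time passes; it bottoms out at zero.
    static long remainingDelayMillis(long configuredMillis, long elapsedMillis) {
        return Math.max(0L, configuredMillis - elapsedMillis);
    }

    public static void main(String[] args) {
        long configured = configuredDelayMillis(OptionalLong.empty());
        for (long elapsed : new long[] {0L, 20_000L, 45_000L, 70_000L}) {
            System.out.printf("elapsed=%dms allocation_delay_ms=%d remaining_delay_ms=%d%n",
                    elapsed, configured, remainingDelayMillis(configured, elapsed));
        }
    }
}
```

With the default, the printed `remaining_delay_ms` goes 60000, 40000, 15000, 0 while `allocation_delay_ms` stays at 60000.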


The reason I'm confused is that allocation_delay is set from a call to unassignedInfo#getLastComputedLeftDelayNanos(), and UnassignedInfo#updateDelay contains this code:

```java
public long updateDelay(long nanoTimeNow, Settings settings, Settings indexSettings) {
    final long newComputedLeftDelayNanos = getRemainingDelay(nanoTimeNow, settings, indexSettings);
    lastComputedLeftDelayNanos = newComputedLeftDelayNanos;
    return newComputedLeftDelayNanos;
}
```

So it seems that lastComputedLeftDelayNanos gets updated to the remaining delay?
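The concern can be reproduced with a small stand-in for the quoted pattern. `CachedDelaySketch` is hypothetical and simplified: its `getRemainingDelay` uses a fixed configured delay instead of reading settings, and it assumes the cache initially holds the full configured delay. The point is only that once `updateDelay` runs, the cached "last computed" value holds a remaining delay rather than the configured one.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the quoted UnassignedInfo pattern; not Elasticsearch code.
final class CachedDelaySketch {
    private final long unassignedTimeNanos;
    private final long configuredDelayNanos;
    private long lastComputedLeftDelayNanos;

    CachedDelaySketch(long unassignedTimeNanos, long configuredDelayNanos) {
        this.unassignedTimeNanos = unassignedTimeNanos;
        this.configuredDelayNanos = configuredDelayNanos;
        // Assumption for this sketch: before any update, the cache holds the full delay.
        this.lastComputedLeftDelayNanos = configuredDelayNanos;
    }

    // Simplified: configured delay minus time since the shard became unassigned, floored at zero.
    long getRemainingDelay(long nanoTimeNow) {
        return Math.max(0L, configuredDelayNanos - (nanoTimeNow - unassignedTimeNanos));
    }

    // Same shape as the quoted method: the cache is overwritten with the remaining delay.
    long updateDelay(long nanoTimeNow) {
        final long newComputedLeftDelayNanos = getRemainingDelay(nanoTimeNow);
        lastComputedLeftDelayNanos = newComputedLeftDelayNanos;
        return newComputedLeftDelayNanos;
    }

    long getLastComputedLeftDelayNanos() {
        return lastComputedLeftDelayNanos;
    }

    public static void main(String[] args) {
        CachedDelaySketch info = new CachedDelaySketch(0L, TimeUnit.MINUTES.toNanos(1));
        System.out.println(TimeUnit.NANOSECONDS.toMillis(info.getLastComputedLeftDelayNanos()));
        info.updateDelay(TimeUnit.SECONDS.toNanos(21));
        // After the update, the cached value is the remaining delay at that moment.
        System.out.println(TimeUnit.NANOSECONDS.toMillis(info.getLastComputedLeftDelayNanos()));
    }
}
```

This prints 60000 and then 39000: a reader of the cached field would see a stale remaining delay, not the configured value.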

dakrone (Member, Author):

I believe this only gets updated when the master node changes (there are a lot of weird things here involving nanos, millis, and serialization).

abeyad commented Apr 7, 2016

Just left a couple of comments; otherwise, LGTM.

Review thread on these lines of the test:

```java
NodesStatsResponse resp = client().admin().cluster().prepareNodesStats().get();
assertThat(resp.getNodes().length, equalTo(3));
}
});
```

BTW, instead of using the above to wait for the nodes to come up, could something like this be done?

```java
client().admin().cluster().health(Requests.clusterHealthRequest().waitForNodes(Integer.toString(3))).actionGet();
```
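The two approaches differ in where the waiting happens: the test excerpt appears to re-check the node count from the client until it matches, whereas `waitForNodes` lets the cluster health request itself wait until the node count is reached. A minimal, framework-free sketch of the client-side polling style, assuming a fixed 10 ms sleep between attempts; `PollingSketch.awaitTrue` is a hypothetical helper, not an Elasticsearch test utility.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper; not an Elasticsearch test API.
final class PollingSketch {

    // Re-evaluates the condition until it holds or the timeout expires.
    // Returns true on success, false if the deadline passed first.
    static boolean awaitTrue(BooleanSupplier condition, long timeout, TimeUnit unit) {
        final long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10); // fixed back-off between attempts (an assumption)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        final long start = System.nanoTime();
        // Stand-in for "three nodes have joined": becomes true after about 50 ms.
        BooleanSupplier threeNodesUp =
                () -> System.nanoTime() - start > TimeUnit.MILLISECONDS.toNanos(50);
        System.out.println(awaitTrue(threeNodesUp, 1, TimeUnit.SECONDS));
    }
}
```

A server-side wait such as `waitForNodes` avoids issuing a fresh stats request on every attempt, which is the appeal of the suggestion.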

dakrone merged commit 199c725 into elastic:master on Apr 7, 2016
clintongormley added the ":Data Management/Stats" (Statistics tracking and retrieval APIs) label and removed the ":Allocation" label on May 2, 2016
dakrone deleted the allocation-explain-show-delay branch on May 13, 2016 16:44
russcam added a commit to elastic/elasticsearch-net that referenced this pull request on Jun 15, 2016
russcam added a commit to elastic/elasticsearch-net that referenced this pull request on Jun 16, 2016
Labels: :Data Management/Stats (Statistics tracking and retrieval APIs), >enhancement, v5.0.0-alpha2

3 participants