Index creation does not cause the cluster health to go RED #18737

abeyad · 2016-06-04T02:57:09Z

Previously, index creation would momentarily cause the cluster health to
go RED, because the primaries were still being assigned and activated.
This commit ensures that when an index is created or an index is being
recovered during cluster recovery and it does not have any active
allocation ids, then the cluster health status will not go RED, but
instead be YELLOW.

Relates #9126

Closes #9106
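
For readers skimming the thread, the rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `HealthLevel`, `NewIndexHealthSketch`, and the `createdOrRecovering` flag are hypothetical stand-ins for the real cluster state and index metadata types.

```java
import java.util.Set;

/** Stand-in for the cluster health levels; not the real Elasticsearch enum. */
enum HealthLevel { GREEN, YELLOW, RED }

final class NewIndexHealthSketch {

    /**
     * Health contribution of a primary shard that is not active.
     *
     * @param createdOrRecovering hypothetical flag: the index was just created, or is
     *                            being recovered as part of cluster recovery
     * @param activeAllocationIds allocation ids of shard copies known to have been active
     */
    static HealthLevel inactivePrimaryHealth(boolean createdOrRecovering, Set<String> activeAllocationIds) {
        if (createdOrRecovering && activeAllocationIds.isEmpty()) {
            // No copy of this shard has been active yet, so report YELLOW instead of RED.
            return HealthLevel.YELLOW;
        }
        // Otherwise an inactive primary is still reported as RED.
        return HealthLevel.RED;
    }
}
```

Under these assumptions, `inactivePrimaryHealth(true, Collections.emptySet())` yields `YELLOW`, while a primary whose index already has an active allocation id yields `RED`.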

abeyad · 2016-06-04T03:00:47Z

@bleskes @ywelsch FYI, most of the PR is tests.

djschny · 2016-06-05T13:21:29Z

How will this affect client programmers? For example, after creating an index, it was common for developers to add logic that checks the cluster health and makes sure it is not red before starting to index data. With this change, it looks like that check could give a false picture: indexing could start even though the primaries have not been fully allocated yet, resulting in indexing errors.

If I'm misinterpreting the implementation of the change, then my apologies. If not, then I believe we need to make sure we have a wait_for_completion style option on the index creation request (I thought we had an issue for this, but searching is failing me).

abeyad · 2016-06-05T14:05:51Z

@djschny you are correct, the work is split into two parts that will go into a feature branch before going into master. The second part is to wait until at least the primaries are initialized before returning from the index creation call, unless a primary shard allocation actually failed. See the comment here: #9126 (comment)
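
A rough sketch of the waiting behavior described for the second part, assuming a caller that can observe the state of each primary. Everything here is hypothetical (`PrimaryState`, `WaitOutcome`, `PrimaryWaitSketch`, and the polling approach itself); the actual index creation call may be implemented quite differently.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical per-primary states, for illustration only. */
enum PrimaryState { UNASSIGNED, INITIALIZING, STARTED, FAILED }

/** Hypothetical result of waiting on the primaries. */
enum WaitOutcome { ALL_STARTED, ALLOCATION_FAILED, TIMED_OUT }

final class PrimaryWaitSketch {

    /**
     * Polls the primaries until all have started, one has failed, or attempts run out.
     *
     * @param primaries   supplies the current state of every primary of the new index
     * @param maxAttempts number of polls before giving up
     * @param sleepMillis pause between polls
     */
    static WaitOutcome awaitPrimaries(Supplier<List<PrimaryState>> primaries, int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<PrimaryState> states = primaries.get();
            if (states.contains(PrimaryState.FAILED)) {
                // A failed allocation ends the wait early instead of blocking until timeout.
                return WaitOutcome.ALLOCATION_FAILED;
            }
            if (states.stream().allMatch(s -> s == PrimaryState.STARTED)) {
                return WaitOutcome.ALL_STARTED;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return WaitOutcome.TIMED_OUT;
            }
        }
        return WaitOutcome.TIMED_OUT;
    }
}
```

With something like this in place, a client would no longer need to treat "not red" as the signal that indexing can begin.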

djschny · 2016-06-05T14:09:31Z

Cool, thanks @abeyad. I did not realize this was part of a larger goal. All 👍, as my previous comments are addressed in the parent issue you referenced. Thanks.

ywelsch · 2016-06-08T09:29:47Z

core/src/main/java/org/elasticsearch/cluster/health/ClusterShardHealth.java

@@ -59,6 +62,9 @@ public ClusterShardHealth(final int shardId, final IndexShardRoutingTable shardR
            } else if (shardRouting.unassigned()) {
                computeUnassignedShards++;
            }
+            if (shardRouting.primary()) {
+                primaryRouting = shardRouting;
+            }


Simpler just to write ShardRouting primaryRouting = shardRoutingTable.primaryShard(); which is guaranteed non-null.

Let's also remove computePrimaryActive and directly write primaryRouting.active() instead.
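
To illustrate the suggestion, here is a minimal sketch using hypothetical stand-in types (`CopySketch`, `RoutingTableSketch`, `ShardHealthSketch`) rather than the real `ShardRouting` and `IndexShardRoutingTable`. The loop only counts copies; the primary is read directly from the table, which is assumed to always have one.

```java
import java.util.List;

/** Hypothetical stand-in for a shard copy. */
final class CopySketch {
    final boolean primary;
    final boolean active;

    CopySketch(boolean primary, boolean active) {
        this.primary = primary;
        this.active = active;
    }
}

/** Hypothetical stand-in for a routing table that always knows its primary. */
final class RoutingTableSketch {
    private final List<CopySketch> copies;
    private final CopySketch primary;

    RoutingTableSketch(List<CopySketch> copies) {
        this.copies = copies;
        // Resolve the primary once, so primaryShard() never returns null.
        this.primary = copies.stream()
            .filter(c -> c.primary)
            .findFirst()
            .orElseThrow(() -> new IllegalArgumentException("routing table has no primary"));
    }

    CopySketch primaryShard() {
        return primary;
    }

    List<CopySketch> copies() {
        return copies;
    }
}

final class ShardHealthSketch {
    final int activeShards;
    final boolean primaryActive;

    ShardHealthSketch(RoutingTableSketch table) {
        int active = 0;
        for (CopySketch copy : table.copies()) {
            if (copy.active) {
                active++;
            }
        }
        this.activeShards = active;
        // No tracking variable inside the loop: ask the table for the primary directly.
        this.primaryActive = table.primaryShard().active;
    }
}
```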

ywelsch · 2016-06-08T10:13:59Z

Left some comments. As @bleskes noted, we should ensure that cluster health goes red if we cannot allocate a newly created/recovered index to any node. I would suggest adding a boolean to UnassignedInfo that captures whether the shard could not be assigned on the first (or subsequent) tries due to allocation deciders saying NO.

bleskes · 2016-06-09T14:57:09Z

I think it will be clearer and easier to debug if we store the last allocation decision on the UnassignedInfo, instead of a boolean.

ywelsch · 2016-06-15T10:25:15Z

core/src/main/java/org/elasticsearch/cluster/routing/UnassignedInfo.java

@@ -116,6 +117,7 @@
    private final String message;
    private final Throwable failure;
    private final int failedAllocations;
+    private final Optional<Decision> lastAllocationDecision; // the last allocation decision take for this shard


We are only interested in Decision.Type, so let's only store that one in UnassignedInfo. Putting the whole Decision object into the cluster state (with explanations and all) seems wasteful to me.

ywelsch · 2016-06-16T17:59:02Z

core/src/main/java/org/elasticsearch/cluster/health/ClusterShardHealth.java

+     * NB: this method should *not* be called on active shards nor on non-primary shards.
+     */
+    public static ClusterHealthStatus getInactivePrimaryHealth(final ShardRouting shardRouting, final IndexMetaData indexMetaData) {
+        assert shardRouting.primary() : "cannot invoke on a replica shard: " + shardRouting.shardId() + "[R]]";


For simplicity, just put shardRouting there instead of shardRouting.shardId() + "[R]]".

abeyad · 2016-06-17T14:34:21Z

@ywelsch @bleskes aee01c4 addresses the latest review comments

ywelsch · 2016-06-17T15:19:56Z

core/src/main/java/org/elasticsearch/cluster/routing/UnassignedInfo.java

+                    return NO_ATTEMPT;
+                default:
+                    throw new IllegalArgumentException("Unknown AllocationStatus value [" + v + "]");
+            }


For toXContent and toString(), we can also provide a method here that returns a nice description string.

Do we have a precedent for this with other enums? It seems like in toXContent for the cluster state, we just use the enum's string value, e.g. in UnassignedInfo.Reason.
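
One possible shape for such an enum, as a sketch only. `AllocationStatusSketch` is a hypothetical name, the byte ids are made up, and `DECIDERS_NO` is an assumed constant; only `NO_ATTEMPT` and the exception message appear in the hunk above. The `value()` method shows one way to provide a description string, here simply the lowercased constant name.

```java
import java.util.Locale;

/** Hypothetical compact allocation status with a stable byte id for serialization. */
enum AllocationStatusSketch {
    DECIDERS_NO((byte) 0), // assumed constant: allocation deciders said NO
    NO_ATTEMPT((byte) 1);  // no allocation attempt has been made yet

    private final byte id;

    AllocationStatusSketch(byte id) {
        this.id = id;
    }

    /** Id written to the wire instead of the enum name. */
    byte id() {
        return id;
    }

    /** Resolves an id back to its constant and rejects unknown ids. */
    static AllocationStatusSketch fromId(byte v) {
        switch (v) {
            case 0:
                return DECIDERS_NO;
            case 1:
                return NO_ATTEMPT;
            default:
                throw new IllegalArgumentException("Unknown AllocationStatus value [" + v + "]");
        }
    }

    /** Description string for toXContent and toString() style output. */
    String value() {
        return name().toLowerCase(Locale.ROOT);
    }
}
```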

ywelsch · 2016-06-17T15:31:29Z

Left a few very minor comments. LGTM otherwise. Thanks @abeyad


abeyad · 2016-06-20T14:55:18Z

@ywelsch thank you for your review!

abeyad added >enhancement resiliency :Cluster v5.0.0-alpha4 labels Jun 4, 2016

abeyad force-pushed the index-creation-cluster-health branch from 623a4c7 to e982a0a Compare June 4, 2016 03:05

ywelsch reviewed Jun 8, 2016

abeyad closed this Jun 8, 2016

abeyad reopened this Jun 8, 2016

abeyad force-pushed the index-creation-cluster-health branch from 62f81b5 to 1a480ae Compare June 15, 2016 05:01

ywelsch reviewed Jun 15, 2016

ywelsch reviewed Jun 16, 2016

ywelsch reviewed Jun 17, 2016

abeyad force-pushed the index-creation-cluster-health branch from 8de6b41 to a6ee9ff Compare June 20, 2016 14:48

Ali Beyad added 3 commits June 20, 2016 10:53

Blocked allocations on primary causes RED health

ef715f7

If the allocation decision for a primary shard was NO, this should cause the cluster health for the shard to go RED, even if the shard belongs to a newly created index or is part of cluster recovery. Relates elastic#9126
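
A hedged sketch of how this commit's rule could combine with the new-index rule. The names `ShardHealthLevel`, `LastAllocationSketch`, and `BlockedPrimaryHealthSketch` are hypothetical, and the real method takes a shard routing and index metadata rather than these simplified arguments.

```java
/** Stand-in for the cluster health levels; not the real Elasticsearch enum. */
enum ShardHealthLevel { GREEN, YELLOW, RED }

/** Hypothetical record of the last allocation attempt for an unassigned primary. */
enum LastAllocationSketch { NO_ATTEMPT, DECIDERS_NO }

final class BlockedPrimaryHealthSketch {

    /**
     * Health contribution of an inactive primary.
     *
     * @param newOrRecoveringWithoutActiveIds hypothetical flag: the index is newly created or
     *                                        being recovered, and has no active allocation ids
     * @param lastAllocation                  outcome of the most recent allocation attempt
     */
    static ShardHealthLevel inactivePrimaryHealth(boolean newOrRecoveringWithoutActiveIds,
                                                  LastAllocationSketch lastAllocation) {
        if (lastAllocation == LastAllocationSketch.DECIDERS_NO) {
            // Allocation was explicitly blocked, so this is RED even for a new index.
            return ShardHealthLevel.RED;
        }
        // Not blocked: a new or recovering index is only YELLOW while primaries start up.
        return newOrRecoveringWithoutActiveIds ? ShardHealthLevel.YELLOW : ShardHealthLevel.RED;
    }
}
```

Checking the blocked case first keeps the YELLOW grace period from hiding an index that cannot be allocated to any node.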

Fix line length formatting for ClusterStateHealthTests

f61f2e7

abeyad force-pushed the index-creation-cluster-health branch from a6ee9ff to f61f2e7 Compare June 20, 2016 14:54

abeyad merged commit 1c209d9 into elastic:feature/friendly-index-creation Jun 20, 2016

clintongormley mentioned this pull request Jun 27, 2016

Index creation causes cluster health to turn red momentarily #9106

Closed

abeyad mentioned this pull request Jul 15, 2016

Makes index creation more friendly #19450

Merged

clintongormley added :Distributed/Distributed and removed :Cluster labels Feb 13, 2018

pickypg mentioned this pull request Oct 26, 2018

[Monitoring] Cluster Status alert triggers on transient yellow status #34814

Closed
