A cluster with no data nodes or indices reports YELLOW health when an index is created. #41073

Closed
DaveCTurner opened this issue Apr 10, 2019 · 16 comments · Fixed by #43284
Labels
>bug :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) good first issue low hanging fruit help wanted adoptme

Comments

@DaveCTurner
Contributor

If you start up a new cluster without any data nodes then it will report green health, because there are no unassigned shards. If you subsequently create an index it will report yellow health because there are unassigned newly-created primaries. However we never attempt to assign any shards if there are no data nodes so the health will continue to report yellow and will never move to red:

if (allocation.routingNodes().size() == 0) {
    /* with no nodes this is pointless */
    return;
}

I think in this case we should move the unassigned status for any new shards from AllocationStatus.NO_ATTEMPT to AllocationStatus.DECIDERS_NO before bailing out, which would result in red health.
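
To make the difference between the two statuses concrete, here is a minimal, self-contained sketch of how health could be derived from unassigned primaries. The class `HealthSketch`, its nested types, and the rule in `primaryHealth` are hypothetical simplifications based only on the behaviour described above; they are not the real Elasticsearch classes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model; not the real Elasticsearch health code.
class HealthSketch {
    enum AllocationStatus { NO_ATTEMPT, DECIDERS_NO }
    enum Health { GREEN, YELLOW, RED } // ordered from best to worst

    // An unassigned primary shard and what the last allocation round recorded for it.
    static final class UnassignedPrimary {
        final boolean newlyCreated; // true if the shard has never held any data
        final AllocationStatus lastStatus;

        UnassignedPrimary(boolean newlyCreated, AllocationStatus lastStatus) {
            this.newlyCreated = newlyCreated;
            this.lastStatus = lastStatus;
        }
    }

    // Assumed rule: a brand-new primary that has not been refused by allocation is
    // only YELLOW, since it is expected to be assigned soon; anything else is RED.
    static Health primaryHealth(UnassignedPrimary shard) {
        boolean refused = shard.lastStatus == AllocationStatus.DECIDERS_NO;
        return (shard.newlyCreated && !refused) ? Health.YELLOW : Health.RED;
    }

    // The cluster is as healthy as its worst shard; with no unassigned shards it is GREEN.
    static Health clusterHealth(List<UnassignedPrimary> unassigned) {
        Health worst = Health.GREEN;
        for (UnassignedPrimary shard : unassigned) {
            Health health = primaryHealth(shard);
            if (health.compareTo(worst) > 0) {
                worst = health;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        // Fresh cluster without indices.
        System.out.println(clusterHealth(Collections.emptyList()));
        // New index, no data nodes, allocation bailed out early: status never changes.
        System.out.println(clusterHealth(Arrays.asList(
            new UnassignedPrimary(true, AllocationStatus.NO_ATTEMPT))));
        // Same index once the proposed change marks the primary as DECIDERS_NO.
        System.out.println(clusterHealth(Arrays.asList(
            new UnassignedPrimary(true, AllocationStatus.DECIDERS_NO))));
    }
}
```

Under these assumptions the three calls in `main` report GREEN, YELLOW and RED respectively, which mirrors the sequence described in this issue: the status stays at `NO_ATTEMPT` forever because the early return never updates it.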

@DaveCTurner DaveCTurner added >bug good first issue low hanging fruit help wanted adoptme :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) labels Apr 10, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@Gaurav614
Contributor

https://discuss.elastic.co/t/primary-shards-unassigned-but-still-cluster-state-health-yellow/176189/12
This issue was first reported by me.

@DaveCTurner
Contributor Author

Hi @Gaurav614, thanks for reporting this. If you'd like to open a PR to propose a fix then that'd be very welcome.

@Gaurav614
Contributor

I will work on it soon.

@atris

atris commented Apr 10, 2019

I wonder if we should let the status quo prevail here. RED cluster status is typically interpreted as a critical state by users, including data loss scenarios, so we should avoid unnecessary transition into that state.

Note that in this case, ES does not even attempt to allocate shards. So it is not really an allocation failure, more of a strange scenario (and likely a user error). This raises the question of whether we should go into the RED state for a customer-induced scenario, especially since the cluster is still very much functional.

@Gaurav614
Contributor

@atris This scenario can occur in rare cases, for example if all the data nodes go down without the user knowing about it. When they then try to create an index, they will see the index health as yellow.
Secondly, by the ES definition the cluster health is RED when primary shards are unassigned, but this scenario violates that definition.

@atris

atris commented Apr 10, 2019

@Gaurav614 I doubt if that would be the case, since IIRC, if an index creation attempt happens when there are no data nodes present and none of the data nodes was safely shut down, we do go to RED state (not sure though, would be good to confirm)

@Gaurav614
Contributor

Gaurav614 commented Apr 10, 2019

@atris

> ES does not even attempt to allocate shards.

This could be another issue as well, since ES is deviating from its index.allocation.max_retries behavior: ES should attempt the allocation. But that's a different case.

> @Gaurav614 I doubt if that would be the case, since IIRC, if an index creation attempt happens when there are no data nodes present and none of the data nodes was safely shut down, we do go to RED state (not sure though, would be good to confirm)

The cluster health (_cluster/health) will be yellow if no index was already red before the data nodes went down; the yellow status then comes only from the newly created index. This rare scenario can even occur when a first-time user sets up an ES cluster and its data node goes down before any index is created. A first-time user might not be aware that the data node went down, so they will try to create an index, and the cluster health will be yellow because the index health of that new index is yellow. The problem gets worse if they go on creating new indices: each index's health will be yellow, so the cluster health stays yellow. These are all rare corner cases. They won't affect any organization's business as such, but fixing them would help make Elasticsearch cleaner and more robust.

@DaveCTurner
Contributor Author

> RED cluster status is typically interpreted as a critical state by users, including data loss scenarios, so we should avoid unnecessary transition into that state.

On the contrary I think it's appropriate to treat this situation as critical: you've created an index but you won't be able to write to it, which is what RED health indicates.

> it is not really an allocation failure, more of a strange scenario (and likely a user error)

On the contrary I think it's appropriate to consider this to be an allocation failure: it doesn't make sense to distinguish this case from the case where there are data nodes present but none is suitable for allocation, perhaps due to an allocation filter.

> if an index creation attempt happens when there are no data nodes present and none of the data nodes was safely shut down, we do go to RED state

This is not the case, although it is exactly what I expected too until @Gaurav614 raised this issue.

@atris

atris commented Apr 10, 2019

> > it is not really an allocation failure, more of a strange scenario (and likely a user error)
>
> On the contrary I think it's appropriate to consider this to be an allocation failure: it doesn't make sense to distinguish this case from the case where there are data nodes present but none is suitable for allocation, perhaps due to an allocation filter.

I would consider these two as different cases since the lack of data nodes may be a user-triggered scenario. However, the line is too thin to attempt to disambiguate the behaviour for the two cases.

> > if an index creation attempt happens when there are no data nodes present and none of the data nodes was safely shut down, we do go to RED state
>
> This is not the case, although it is exactly what I expected too until @Gaurav614 raised this issue.

Ah, ok. That changes my opinion then, since we definitely run the risk of having dead data nodes without the user being aware, as @Gaurav614 highlighted upstream.

In all, +1 from me.

@vigyasharma
Contributor

+1. This also makes sense for cases where master nodes came up but data nodes failed, and users start indexing data assuming the cluster is up and healthy. With no former indices to go red, it is misleading to see the new index yellow.

@sawyna

sawyna commented May 14, 2019

If no one is working on this at the moment, I'd like to take this up as a first issue to get started with the ES codebase.

@DaveCTurner
Contributor Author

@Gaurav614 are you still planning to work on this?

@Gaurav614
Contributor

Gaurav614 commented May 14, 2019

@DaveCTurner I have requested to be added to the CLA.

@Gaurav614
Contributor

@DaveCTurner Hey. I asked Baird Garrett to add me to the CLA, but I haven't received any response from him. Is there anything you can do from your end?

@DaveCTurner
Contributor Author

Signing the CLA is an automatic process that doesn't need the involvement of Baird or the rest of the legal team. You should sign up here: https://www.elastic.co/contributor-agreement

Gaurav614 added a commit to Gaurav614/elasticsearch that referenced this issue Jun 17, 2019
Add a test case that creates the scenario where there are no data nodes in the cluster and the user tries to create an index.
Changing the status of unassigned primary shards to AllocationStatus.DECIDERS_NO when there are no data nodes solves this issue.
DaveCTurner pushed a commit that referenced this issue Sep 30, 2019
Today if you create an index in a cluster without any data nodes then it will
report yellow health because it never attempts to assign any shards if there
are no data nodes, so the new shards remain at `AllocationStatus.NO_ATTEMPT`.
This commit moves the new primaries to `AllocationStatus.DECIDERS_NO` in this
situation, causing the cluster health to move to red.

Fixes #41073
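
The change described in this commit message can be pictured with the following minimal sketch. `AllocatorSketch`, `UnassignedShard`, and the `allocate` signature are hypothetical stand-ins written for illustration and are not the actual Elasticsearch allocator code; leaving replicas untouched is an assumption based on the message mentioning only "new primaries".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the real routing classes, for illustration only.
class AllocatorSketch {
    enum AllocationStatus { NO_ATTEMPT, DECIDERS_NO }

    static final class UnassignedShard {
        final boolean primary;
        AllocationStatus status = AllocationStatus.NO_ATTEMPT; // initial state of a new shard

        UnassignedShard(boolean primary) {
            this.primary = primary;
        }
    }

    // Returns true if an allocation attempt was made, false if the method bailed out.
    static boolean allocate(int dataNodeCount, List<UnassignedShard> unassigned) {
        if (dataNodeCount == 0) {
            // Previously the method returned here without touching the shards.
            // The sketch first records that primaries could not be allocated anywhere.
            for (UnassignedShard shard : unassigned) {
                if (shard.primary && shard.status == AllocationStatus.NO_ATTEMPT) {
                    shard.status = AllocationStatus.DECIDERS_NO;
                }
            }
            return false;
        }
        // Real allocation logic is out of scope for this sketch.
        return true;
    }

    public static void main(String[] args) {
        List<UnassignedShard> shards = new ArrayList<>();
        shards.add(new UnassignedShard(true));  // new primary
        shards.add(new UnassignedShard(false)); // its replica
        allocate(0, shards);
        for (UnassignedShard shard : shards) {
            System.out.println((shard.primary ? "primary: " : "replica: ") + shard.status);
        }
    }
}
```

With zero data nodes the primary ends up at `DECIDERS_NO` while the replica stays at `NO_ATTEMPT`, which is the state the health calculation needs in order to report red.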

6 participants