
Provide better indication of the cluster state - is it writable (has quorum) #4755

Closed
rore opened this issue Jan 16, 2014 · 10 comments
Labels
discuss :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. >enhancement

Comments

@rore

rore commented Jan 16, 2014

When Elasticsearch is used as part of a critical application system, you need to be able to monitor the cluster's health.
The current health indicator is not enough, because the "yellow" state does not have a single meaning.
It means that all primaries are up but some replica shards are not allocated. Operationally, however, this can mean one of two things:
  • some replica shards are missing, but every index still has a quorum, so the cluster is "writable";
  • some indices do not have a quorum, in which case writes to them will fail.

This is a huge difference on an operational level.

We need a way to know (and monitor) the real state of the cluster: not only whether all primaries are up, but also whether there is a quorum.

One option would be to add another color, for instance:
  • yellow: some replicas are missing, but there is a quorum
  • orange: the primary is up, but there is no quorum

Or use another indicator altogether.
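To make the proposed distinction concrete, here is a minimal Python sketch of how a shard's color could be derived if a hypothetical ORANGE were added. It assumes the pre-5.0 QUORUM rule as we understand it (with more than two copies in total, a write needs copies // 2 + 1 active copies; otherwise the primary alone suffices). Treat that rule, and the names Color, quorum, shard_color and worst, as assumptions rather than Elasticsearch's actual code.

```python
from enum import Enum


class Color(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    ORANGE = "orange"  # hypothetical extra color proposed in this issue
    RED = "red"


def quorum(number_of_replicas: int) -> int:
    """Active copies a QUORUM write needs (assumed pre-5.0 rule, see lead-in)."""
    copies = number_of_replicas + 1  # primary plus replicas
    # With one or two copies in total, the primary alone counts as a quorum.
    return copies // 2 + 1 if copies > 2 else 1


def shard_color(primary_active: bool, active_copies: int, number_of_replicas: int) -> Color:
    """Classify one shard; active_copies includes the primary."""
    if not primary_active:
        return Color.RED
    if active_copies >= number_of_replicas + 1:
        return Color.GREEN
    if active_copies >= quorum(number_of_replicas):
        return Color.YELLOW  # replicas missing, but QUORUM writes still succeed
    return Color.ORANGE  # primary up, but QUORUM writes would fail


# Severity order used to roll shard colors up to index and cluster level.
_SEVERITY = [Color.GREEN, Color.YELLOW, Color.ORANGE, Color.RED]


def worst(colors) -> Color:
    """Return the most severe color; an empty input counts as GREEN."""
    return max(colors, key=_SEVERITY.index, default=Color.GREEN)
```

For example, with number_of_replicas set to 2 and only the primary active, shard_color(True, 1, 2) returns Color.ORANGE, whereas with one replica the same situation is still Color.YELLOW. Rolling up with worst() mirrors how today's cluster status is the worst status of any index.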

@dakrone
Member

dakrone commented Apr 10, 2015

I think we could add something like this at the indices level of the cluster health API:

{
  "active_primary_shards": 1,
  "active_shards": 1,
  "cluster_name": "elasticsearch",
  "indices": {
    "test": {
      "active_primary_shards": 1,
      "active_shards": 1,
      "initializing_shards": 0,
      "number_of_replicas": 1,
      "number_of_shards": 1,
      "relocating_shards": 0,
      "status": "yellow",
      "unassigned_shards": 1,
      "has_quorum": true
    }
  },
  "initializing_shards": 0,
  "number_of_data_nodes": 1,
  "number_of_nodes": 1,
  "number_of_pending_tasks": 0,
  "relocating_shards": 0,
  "status": "yellow",
  "timed_out": false,
  "unassigned_shards": 1
}

(see the has_quorum key)

@dakrone dakrone added help wanted adoptme and removed discuss labels Apr 10, 2015
@rore
Author

rore commented Apr 11, 2015

Having it per index is still problematic to monitor. The idea here was that the global cluster health indicator would reflect the state. Just as today, if even one index is not fully allocated, the cluster health indicates it. But there's a big difference between a "yellow" caused by some unallocated replicas that still allows writes, and a "yellow" where there is no quorum and writes will fail. The cluster health currently doesn't differentiate between those states.
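The monitoring concern can be illustrated with a small client-side sketch. Given a response from GET /_cluster/health?level=indices that carried the proposed has_quorum key (the API does not return it; it is only the suggestion above), a single cluster-level flag can be derived. The function names are hypothetical, and indices that omit the key are assumed to have a quorum.

```python
from typing import List


def indices_without_quorum(health: dict) -> List[str]:
    """Names of indices whose proposed has_quorum flag is false.

    Indices that do not report has_quorum at all are assumed to have one.
    """
    return sorted(
        name
        for name, index in health.get("indices", {}).items()
        if not index.get("has_quorum", True)
    )


def cluster_writable(health: dict) -> bool:
    """Cluster-level flag: true only if every index has a quorum."""
    return not indices_without_quorum(health)
```

This is the aggregation the issue asks the server to perform, so that a monitoring tool would not need to inspect every index itself.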

@bleskes
Contributor

bleskes commented Apr 12, 2015

To me, yellow means “cluster is fully functional albeit not conforming to the required replication factor”. Red means that some operations cannot be performed. With this definition in mind, I think we should signal a shard (and thus its index and the cluster) as red if it cannot be indexed into because of the write_consistency setting (which can also be all, not just quorum).
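Under this definition, a shard's color depends on the write_consistency level in force. A minimal Python sketch, assuming the levels one, quorum and all and the same pre-5.0 quorum rule (copies // 2 + 1 when there are more than two copies, otherwise 1); the function names are hypothetical:

```python
def required_copies(level: str, number_of_replicas: int) -> int:
    """Active copies a write needs under a write_consistency level (assumed rule)."""
    copies = number_of_replicas + 1  # primary plus replicas
    if level == "one":
        return 1
    if level == "quorum":
        return copies // 2 + 1 if copies > 2 else 1
    if level == "all":
        return copies
    raise ValueError(f"unknown write_consistency level: {level!r}")


def shard_color(level: str, primary_active: bool, active_copies: int,
                number_of_replicas: int) -> str:
    """Red whenever a write at `level` would be rejected; otherwise green or yellow."""
    if not primary_active or active_copies < required_copies(level, number_of_replicas):
        return "red"
    if active_copies < number_of_replicas + 1:
        return "yellow"  # under-replicated, but writes at `level` still succeed
    return "green"
```

With number_of_replicas set to 2 and only the primary active, this reports red for quorum but yellow for one. The same allocation would therefore get different colors depending on a setting that can also be overridden per request.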


@javanna
Member

javanna commented Apr 15, 2015

I tend to agree with @rore here. We could improve the colors we use depending on the cluster-level action.write_consistency setting (note that it can be overridden per request, though). I am not a big fan of returning red for an index that would be yellow but cannot be written into. Maybe a new color would help. I don't think this is ready to be an adoptme, though, given that we are still discussing it? Maybe we should mark it for discussion instead? :)

@jpountz
Contributor

jpountz commented Nov 13, 2015

We just discussed this in Fixit Friday. Reporting indices that don't have a quorum as either RED or YELLOW seems problematic, so it appears we would need to either:

  • add a new color (ORANGE?) (@javanna's suggestion)
  • add a new flag (has_quorum?) on the index and cluster levels (@dakrone's suggestion)

@dakrone
Member

dakrone commented Sep 27, 2016

@abeyad I think this is no longer applicable with the quorum changes you added at index creation?

@abeyad

abeyad commented Sep 27, 2016

As of 5.0, the default is to require only the primary shard to be active before indexing, which is the same as != RED. The only case where we can't index in a YELLOW cluster state is when wait_for_active_shards has been explicitly changed to a different value. In that situation, someone can just pass wait_for_active_shards as a request parameter to the indexing operation, and the indexing operation will wait for the requisite number of shards to be active before proceeding with the write (or it will time out). So I don't believe there is anything more to do for this.
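The 5.0 behavior described here can be modelled as a simple comparison. This Python sketch assumes that wait_for_active_shards accepts a non-negative integer or "all" (meaning number_of_replicas + 1) and defaults to 1. It performs a single point-in-time check, whereas Elasticsearch itself waits for the required count until the request times out. The function names are hypothetical.

```python
from typing import Union


def resolve_active_shards(wait_for_active_shards: Union[int, str],
                          number_of_replicas: int) -> int:
    """Turn a wait_for_active_shards value into a number of shard copies."""
    total = number_of_replicas + 1  # primary plus replicas
    if wait_for_active_shards == "all":
        return total
    n = int(wait_for_active_shards)  # request parameters may arrive as strings
    # This sketch rejects values outside 0..total; it does not model ES's own validation.
    if not 0 <= n <= total:
        raise ValueError(f"wait_for_active_shards must be 0..{total} or 'all'")
    return n


def write_can_start(active_copies: int, number_of_replicas: int,
                    wait_for_active_shards: Union[int, str] = 1) -> bool:
    """Point-in-time check; active_copies includes the primary."""
    return active_copies >= resolve_active_shards(wait_for_active_shards, number_of_replicas)
```

With the default of 1, write_can_start is true exactly when the primary is active, i.e. whenever the index is not RED.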

@dakrone
Member

dakrone commented Sep 27, 2016

Great! I'm going to close this, then; we can always reopen it if needed.

@dakrone dakrone closed this as completed Sep 27, 2016
@javanna
Member

javanna commented Sep 27, 2016

This issue is about getting info through cluster health about whether indexing would pass our "consistency" checks. We have now renamed write consistency to wait for active shards and made things much better, but do we have that piece of info now in cluster health? Or maybe we don't need to add it anymore? Sorry if I misunderstood or I am missing anything ;)

@abeyad

abeyad commented Sep 27, 2016

@javanna in 5.0, by default, if the cluster health is YELLOW, the "consistency" check will pass for write operations. If a user wants to increase the number of active shards to wait for before indexing proceeds, they can set the wait_for_active_shards parameter on the indexing operation, so at that point they are in control of the value. If a user wants to make sure that an indexing operation with a given value would proceed, they can always do a health check first, e.g. /_cluster/health/{index_name}?wait_for_active_shards={n}. Not sure if that answers your question; feel free to ask again if I wasn't clear :)
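A pre-flight check along those lines might look like the following Python sketch, using only the standard library. It assumes a node reachable at base_url, the cluster health API's wait_for_active_shards and timeout parameters, and that Elasticsearch reports a timed-out health request with "timed_out": true (typically alongside HTTP 408). health_url, is_ready and check_before_indexing are hypothetical helpers.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def health_url(base_url: str, index: str, wait_for_active_shards,
               timeout: str = "5s") -> str:
    """Build the cluster health URL for one index."""
    query = urllib.parse.urlencode(
        {"wait_for_active_shards": wait_for_active_shards, "timeout": timeout}
    )
    return f"{base_url.rstrip('/')}/_cluster/health/{urllib.parse.quote(index)}?{query}"


def is_ready(body: dict) -> bool:
    """True only if the response explicitly says the wait did not time out."""
    return body.get("timed_out") is False


def check_before_indexing(base_url: str, index: str, wait_for_active_shards,
                          timeout: str = "5s") -> bool:
    url = health_url(base_url, index, wait_for_active_shards, timeout)
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            body = json.load(resp)
    except urllib.error.HTTPError as err:
        # Assumption: a timed-out health request comes back as HTTP 408 with a JSON body.
        if err.code != 408:
            raise
        body = json.load(err)
    return is_ready(body)
```

The check is only advisory: shards can become unavailable between the health call and the write, which is why passing wait_for_active_shards on the indexing request itself remains the reliable option.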

@clintongormley clintongormley added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. and removed :Cluster labels Feb 13, 2018
Development

No branches or pull requests

7 participants