Provide better indication of the cluster state - is it writable (has quorum) #4755
Comments
I think we could add this to the indices level of the cluster health API, something like: {
"active_primary_shards": 1,
"active_shards": 1,
"cluster_name": "elasticsearch",
"indices": {
"test": {
"active_primary_shards": 1,
"active_shards": 1,
"initializing_shards": 0,
"number_of_replicas": 1,
"number_of_shards": 1,
"relocating_shards": 0,
"status": "yellow",
"unassigned_shards": 1,
"has_quorum": true
}
},
"initializing_shards": 0,
"number_of_data_nodes": 1,
"number_of_nodes": 1,
"number_of_pending_tasks": 0,
"relocating_shards": 0,
"status": "yellow",
"timed_out": false,
"unassigned_shards": 1
} (see the |
Having it per index is still problematic to monitor. The idea here was that the global cluster health indicator would reflect the state. Today, if even one index isn't fully allocated, the cluster health indicates it. But there's a big difference between a "yellow" caused by some unallocated replicas that still allows writes, and a "yellow" where there's no quorum and writes will fail. The cluster health currently doesn't differentiate between those states. |
To me, yellow means “cluster is fully functional albeit not conforming to the required replication factor”. Red means that some operations cannot be performed. With this definition in mind, I think we should signal a shard (and thus its index and the cluster) as red if one cannot index into it due to the
|
I tend to agree with @rore here. We could improve the colors that we use depending on the cluster level |
@abeyad I think this is no longer applicable with the quorum changes you added at index creation? |
The default as of 5.0 is to only require the primary shard to be active before indexing, which is the same as |
Great! I'm going to close this then, we can always re-open if needed |
This issue is about getting info through cluster health about whether indexing would pass our "consistency" checks. We have now renamed write consistency to wait for active shards and made things much better, but do we have that piece of info now in cluster health? Or maybe we don't need to add it anymore? Sorry if I misunderstood or I am missing anything ;) |
@javanna in 5.0, by default, if the cluster health is YELLOW, it means the "consistency" check will pass for write operations. If a user wants to increase the number of active shards to wait on before indexing proceeds, they can do so by setting the |
When using Elasticsearch as a critical application system, you need to be able to monitor the cluster health.
The current health indication is not enough, since the "yellow" state doesn't have a single meaning.
It means that all primaries are up but some replica shards are not allocated. On an operational level, though, this has two possible implications:
It can be that some shards are missing but there's a quorum for all indexes, so the cluster is "writable".
It can also mean that some indexes don't have a quorum, in which case writes to those indexes will fail.
This is a huge difference on an operational level.
We need a way to know (and monitor) the real state of the cluster: not only that all primaries are up, but also whether there's a quorum.
One possible way is to add another color, for instance:
yellow - some replicas are missing but you have a quorum
orange - primary up but no quorum
Or use another indicator altogether.
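To make the proposed colors concrete, here is a rough sketch of how a monitoring script could derive them itself. It is not how Elasticsearch computes health. The function names are hypothetical, the input shape is assumed to mirror the per-shard fields (`primary_active`, `active_shards`) of the shard-level cluster health output, and the quorum rule (a majority of copies, with one or two copies needing only one active copy) is an assumption about the write consistency check.

```python
from typing import Dict


def quorum_size(total_copies: int) -> int:
    """Copies that must be active for a quorum write (assumed rule).

    Assumption: a majority of all copies, except that shard groups with
    only one or two copies need just one active copy (the primary).
    """
    return total_copies // 2 + 1 if total_copies > 2 else 1


def index_color(number_of_replicas: int, shards: Dict[str, dict]) -> str:
    """Classify one index using the colors proposed in this issue.

    `shards` maps a shard id to a dict with `primary_active` (bool) and
    `active_shards` (int); this shape is assumed, not guaranteed.
    """
    total_copies = number_of_replicas + 1  # the primary plus its replicas
    needed = quorum_size(total_copies)
    color = "green"
    for shard in shards.values():
        if not shard["primary_active"]:
            return "red"  # a missing primary is the worst state, stop here
        active = shard["active_shards"]
        if active < needed:
            color = "orange"  # primary up but no quorum
        elif active < total_copies and color == "green":
            color = "yellow"  # replicas missing, quorum still met
    return color
```

With one replica and only the primary active (the situation in the JSON example in the comments), this returns "yellow", because a two-copy shard group is assumed to need only its primary. With two replicas and only the primary active it returns "orange".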