Add metrics that show live and unreachable nodes #10102

Closed
amnonh opened this issue Feb 18, 2022 · 1 comment

amnonh commented Feb 18, 2022

It would be useful to be able to tell how each node sees the cluster. For example, in a split-brain situation
each node sees only part of the cluster, even though the monitoring system can still see all the nodes.

Reporting every possible node-to-node connection could be expensive, as the number of connections is quadratic in the number of nodes.
Instead, each node would report only the number of live and unreachable nodes it sees.
This can be reported once per node (not per shard) and can easily be added to the gossiper.
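
To make the cost argument concrete, here is a minimal sketch comparing the number of metric series in the two approaches. The function name `series_counts` is hypothetical and is not part of Scylla; it assumes one series per ordered (observer, peer) pair in the pairwise case, and two gauges per node in the proposed case.

```python
from typing import Tuple


def series_counts(num_nodes: int) -> Tuple[int, int]:
    """Return (pairwise_series, per_node_series) for a cluster of num_nodes.

    Assumptions (illustrative only):
    - pairwise reporting exposes one series per ordered (observer, peer) pair
    - the proposed approach exposes two gauges per node (live, unreachable)
    """
    if num_nodes < 0:
        raise ValueError("num_nodes must be non-negative")
    pairwise = num_nodes * (num_nodes - 1)  # grows quadratically
    per_node = 2 * num_nodes                # grows linearly
    return pairwise, per_node


if __name__ == "__main__":
    for n in (3, 6, 100):
        pairwise, per_node = series_counts(n)
        print(f"{n} nodes: pairwise={pairwise}, per-node={per_node}")
```

For small clusters the difference is negligible, but it widens quickly as the cluster grows.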

amnonh added a commit to amnonh/scylla that referenced this issue Feb 20, 2022
This patch adds two gauges:
scylla_gossip_live - how many live nodes the gossiper sees
scylla_gossip_unreachable - how many nodes the gossiper tries to connect
to but cannot.

Both metrics are reported once per node (not per shard). They give
visibility into how a specific node sees the cluster.

For example, consider a split-brain 6-node cluster (3 and 3). Each node would
report that it sees 2 live nodes, but the monitoring system would see that
there are, in fact, 6 nodes.
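
The comparison described above can be sketched from the monitoring side. This is a hedged illustration, not part of the patch: the function `nodes_with_partial_view` and the node names are hypothetical, and it assumes the live gauge excludes the reporting node itself, which matches the two-node examples in this message.

```python
from typing import Dict, List


def nodes_with_partial_view(live_by_node: Dict[str, int]) -> List[str]:
    """Return nodes whose live count is lower than the number of peers
    the monitoring system can see.

    live_by_node maps a node name to its scylla_gossip_live value.
    Assumption: the gauge does not count the reporting node itself.
    """
    expected_peers = len(live_by_node) - 1
    return sorted(
        node for node, live in live_by_node.items() if live < expected_peers
    )


if __name__ == "__main__":
    # 6-node cluster split 3 and 3: every node sees only 2 live peers.
    split = {f"node{i}": 2 for i in range(1, 7)}
    print(nodes_with_partial_view(split))
```

In a healthy 6-node cluster every node would report 5 live peers and the returned list would be empty.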

Example of a two-node cluster with both nodes running:
``
scylla_gossip_live{shard="0"} 1.000000
scylla_gossip_unreachable{shard="0"} 0.000000
``

Example of a two-node cluster with one node down:
``
scylla_gossip_live{shard="0"} 0.000000
scylla_gossip_unreachable{shard="0"} 1.000000
``
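
As a rough sketch of how a script might read these gauges from a node's metrics output, the following parses lines in the format shown above. The helper `parse_gossip_gauges` is hypothetical; it handles only simple `name{labels} value` lines, skips anything else (including samples with a trailing timestamp), and is not a full Prometheus text-format parser.

```python
import re
from typing import Dict

# Matches lines such as: scylla_gossip_live{shard="0"} 1.000000
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'  # metric name
    r'(?:\{(?P<labels>[^}]*)\})?'           # optional label set
    r'\s+(?P<value>\S+)$'                   # sample value
)


def parse_gossip_gauges(text: str) -> Dict[str, float]:
    """Extract scylla_gossip_* gauge values from metrics text."""
    gauges: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match and match.group("name").startswith("scylla_gossip_"):
            gauges[match.group("name")] = float(match.group("value"))
    return gauges


if __name__ == "__main__":
    sample = (
        'scylla_gossip_live{shard="0"} 0.000000\n'
        'scylla_gossip_unreachable{shard="0"} 1.000000\n'
    )
    print(parse_gossip_gauges(sample))
```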

Fixes scylladb#10102

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
@avikivity

Not a bug, not backporting.
