Track UJ node progress #2076

Closed
d-helios opened this issue Oct 10, 2023 · 2 comments · Fixed by #2079

Comments

@d-helios

Motivation

Monitoring should be able to identify nodes that are too slow to join the cluster, or perhaps simply stuck.

Possible implementation

Once a new node is created and Scylla is started, the monitoring stack should start scraping metrics from Scylla and the operating system.
If the new node is in the UJ (Up/Joining) state:

  • verify network traffic - scylla_node_network_receive_bytes_total
  • verify disk space utilisation - scylla_node_filesystem_total_avail_bytes

If the change over the last X minutes stays below Y for Z minutes, trigger an alert.
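The check proposed above can be sketched in Python. This is a minimal sketch, not the monitoring stack's implementation: it assumes the samples of one metric (for example scylla_node_network_receive_bytes_total or scylla_node_filesystem_total_avail_bytes) for a single node have already been fetched and sorted by timestamp. The names `Sample`, `change` and `is_stalled` are hypothetical; X, Y and Z map to `x_window`, `y_threshold` and `z_duration`, and counter resets are ignored for simplicity.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    ts: int       # Unix timestamp in seconds
    value: float  # metric value at ts


def change(samples: Sequence[Sample], start: int, end: int) -> Optional[float]:
    """Absolute change between the first and last sample in [start, end].

    Works for a counter (received bytes grow) and for a gauge (available
    bytes shrink). Counter resets are NOT handled in this sketch.
    Returns None when fewer than two samples fall inside the window.
    """
    window = [s for s in samples if start <= s.ts <= end]
    if len(window) < 2:
        return None
    return abs(window[-1].value - window[0].value)


def is_stalled(samples: Sequence[Sample], now: int, x_window: int,
               y_threshold: float, z_duration: int, step: int) -> bool:
    """True if, at every evaluation point in the last z_duration seconds,
    the change over the preceding x_window seconds stayed below y_threshold."""
    for t in range(now - z_duration, now + 1, step):
        delta = change(samples, t - x_window, t)
        if delta is None:
            return False  # not enough data to decide; do not alert
        if delta >= y_threshold:
            return False  # the node made progress at this point
    return True


# Example: a node receives 10 MB per minute for 20 minutes, then traffic stops.
samples = [Sample(ts, min(ts, 1200) / 60 * 1e7) for ts in range(0, 2401, 60)]
print(is_stalled(samples, now=2400, x_window=300, y_threshold=1e6,
                 z_duration=600, step=60))  # True
```

Evaluating the window at every `step` inside the last Z minutes mirrors how a Prometheus alert rule with a `for:` duration only fires once its condition has held for that long. Returning False on missing data avoids alerting on a node that has not been scraped yet, though a separate absence check may be wanted.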

@amnonh
Collaborator

amnonh commented Oct 10, 2023

@d-helios luckily, I've added a metric for the node state, scylla_node_operation_mode, so we can check whether a node has been in joining mode for more than X minutes.
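Checking how long a node has stayed in joining mode reduces to measuring the length of the current run of "joining" samples of scylla_node_operation_mode. The Python sketch below assumes the samples for one node have already been fetched; the helper names and the numeric value used for "joining" in the example are hypothetical, since the thread does not say how the metric encodes each mode. In a Prometheus alert rule the same idea would typically be an equality comparison on the metric combined with a `for:` duration of X minutes.

```python
from typing import Optional, Sequence, Tuple


def joining_duration(samples: Sequence[Tuple[int, float]], now: int,
                     joining_code: float) -> int:
    """Seconds the node has been continuously in joining mode as of `now`.

    samples: (unix_ts, scylla_node_operation_mode value) pairs sorted by ts.
    joining_code: the numeric value meaning "joining"; this sketch does not
    assume a specific encoding, so the caller must supply it.
    Returns 0 if the latest sample at or before `now` is not joining.
    """
    since: Optional[int] = None
    for ts, mode in samples:
        if ts > now:
            break
        if mode == joining_code:
            if since is None:
                since = ts  # start of the current joining streak
        else:
            since = None  # streak broken by any other mode
    return 0 if since is None else now - since


def joining_too_long(samples: Sequence[Tuple[int, float]], now: int,
                     joining_code: float, max_seconds: int) -> bool:
    """Alert condition: joining for longer than max_seconds (the X minutes)."""
    return joining_duration(samples, now, joining_code) > max_seconds


# Hypothetical encoding: 2 stands for "joining" only for this example.
history = [(0, 1), (60, 2), (120, 2), (180, 2), (240, 2)]
print(joining_duration(history, now=240, joining_code=2))  # 180
print(joining_too_long(history, now=240, joining_code=2, max_seconds=120))  # True
```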

@amnonh
Collaborator

amnonh commented Oct 11, 2023

@d-helios note that this issue covers monitoring only; if the problem is cloud-related, there should be a separate cloud issue.
