[Feature] List down all the metrics necessary for maintaining multinode #340
Labels
area/monitoring
Monitoring (including availability monitoring and alerting) related
kind/enhancement
Enhancement, improvement, extension
release/ga
Planned for GA(General Availability) release of the Feature
status/closed
Issue is closed (either delivered or triaged)
Milestone
Feature (What you would like to be added):
Create a consolidated list of all the metrics needed to maintain multi-node etcd, and update the file with that list. This is a running document that will capture all the metrics exposed through Prometheus for multi-node etcd.
Motivation (Why is this needed?):
Once etcd-druid starts managing multi-node etcd clusters, it will perform operations such as cluster scale-up, scale-down, recovery from quorum loss, and forced restoration. Currently, druid does not expose any metrics about its operations. Such metrics will become essential for the multi-node story, especially for understanding and debugging druid's behaviour during etcd cluster failures.
Approach/Hint to implement the solution (optional):
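One possible starting point, purely as an illustration: keep the metric list as structured data, with one entry per druid operation named in the motivation, so the running document and the exposed metrics can be checked against each other. The sketch below is stdlib-only Python and does not use any Prometheus client library. Every metric name in it, along with `MetricSpec` and `render_exposition`, is hypothetical and not an existing etcd-druid metric or API. It renders the list in the Prometheus text exposition format (`# HELP`, `# TYPE`, then the sample line).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSpec:
    name: str  # hypothetical metric name, NOT an existing etcd-druid metric
    kind: str  # Prometheus metric type, e.g. "counter" or "gauge"
    help: str  # one-line description shown in the # HELP line


# Hypothetical candidates, one per operation mentioned in the motivation.
CANDIDATES = [
    MetricSpec("etcddruid_cluster_scale_up_total", "counter",
               "Cluster scale-up operations started by druid."),
    MetricSpec("etcddruid_cluster_scale_down_total", "counter",
               "Cluster scale-down operations started by druid."),
    MetricSpec("etcddruid_quorum_loss_recovery_total", "counter",
               "Recoveries from quorum loss attempted by druid."),
    MetricSpec("etcddruid_forced_restoration_total", "counter",
               "Forced restorations triggered by druid."),
]


def render_exposition(specs, values):
    """Render specs in the Prometheus text exposition format.

    `values` maps metric name to its current value; missing names render as 0.
    """
    lines = []
    for spec in specs:
        lines.append(f"# HELP {spec.name} {spec.help}")
        lines.append(f"# TYPE {spec.name} {spec.kind}")
        lines.append(f"{spec.name} {values.get(spec.name, 0)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: three scale-ups observed so far, nothing else yet.
    print(render_exposition(CANDIDATES, {"etcddruid_cluster_scale_up_total": 3}))
```

Keeping the list as data means the same entries could later drive both the documentation and the metric registration in druid itself. The actual names, types, and labels would still need to be agreed on in this issue.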