add mapi_machinehealthcheck_remediation_success_total metric
This change adds the metric and a helper function to increment its
count, and updates the metrics doc to describe the new metric.
elmiko committed Nov 17, 2020
1 parent 2b1ae67 commit 08917b2
Showing 3 changed files with 26 additions and 2 deletions.
9 changes: 8 additions & 1 deletion docs/dev/metrics.md
@@ -115,16 +115,23 @@ mapi_instance_delete_failed{name=machine-3,namespace=openshift-machine-api,reaso
## Metrics about MachineHealthCheck resources

When using MachineHealthChecks, metrics are available from the `machine-api-controllers` Pod on the
default metrics port (`8083`) for the `machine-healthcheck-controller` container.

The `mapi_machinehealthcheck_nodes_covered` metric describes the number of Nodes that are currently
being monitored by `machine-healthcheck-controller`. The `name` label in this metric refers to the
name of the MachineHealthCheck that is being reported.

The `mapi_machinehealthcheck_remediation_success_total` metric reports the total number of successful
remediations performed by a MachineHealthCheck. The `name` label in this metric refers to the name
of the MachineHealthCheck that is being reported.

**Sample metrics**
```
# HELP mapi_machinehealthcheck_nodes_covered Number of nodes covered by MachineHealthChecks
# TYPE mapi_machinehealthcheck_nodes_covered gauge
mapi_machinehealthcheck_nodes_covered{name="machine-api-termination-handler"} 0
mapi_machinehealthcheck_nodes_covered{name="mhc-1"} 1
# HELP mapi_machinehealthcheck_remediation_success_total Number of successful remediations performed by MachineHealthChecks
# TYPE mapi_machinehealthcheck_remediation_success_total counter
mapi_machinehealthcheck_remediation_success_total{name="mhc-1"} 1
```
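As a rough illustration of how the sample output above could be consumed, the following sketch reads the counter value for one MachineHealthCheck out of exposition text. It uses only the Go standard library; `readSample` and `sampleMetrics` are hypothetical names introduced here, and the parser assumes the single-`name`-label form shown in the sample rather than the full Prometheus text format.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sampleMetrics copies the counter lines from the sample output in the doc.
const sampleMetrics = `# HELP mapi_machinehealthcheck_remediation_success_total Number of successful remediations performed by MachineHealthChecks
# TYPE mapi_machinehealthcheck_remediation_success_total counter
mapi_machinehealthcheck_remediation_success_total{name="mhc-1"} 1
`

// readSample is a hypothetical helper (not part of this commit). It scans
// exposition text for a line of the form `metric{name="<mhcName>"} <value>`
// and returns the parsed value. It assumes exactly one label, as in the sample.
func readSample(text, metric, mhcName string) (float64, bool) {
	prefix := fmt.Sprintf("%s{name=%q} ", metric, mhcName)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, prefix) {
			continue // comment line, other metric, or other label value
		}
		v, err := strconv.ParseFloat(strings.TrimPrefix(line, prefix), 64)
		if err != nil {
			return 0, false
		}
		return v, true
	}
	return 0, false
}

func main() {
	v, ok := readSample(sampleMetrics, "mapi_machinehealthcheck_remediation_success_total", "mhc-1")
	fmt.Println(v, ok)
}
```

A lookup for a name with no matching line returns `false`, which a caller would probably want to treat as "no remediations recorded yet" rather than as an error.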
@@ -492,6 +492,8 @@ func (t *target) remediate(r *ReconcileMachineHealthCheck) error {
		"Machine %v has been remediated by requesting to delete Machine object",
		t.string(),
	)
	metrics.RegisterMachineHealthCheckRemediationSuccess(t.MHC.Name)

	return nil
}

17 changes: 16 additions & 1 deletion pkg/metrics/machinehealthcheck.go
@@ -32,10 +32,21 @@ var (
			Help: "Number of nodes covered by MachineHealthChecks",
		}, []string{"name"},
	)

	// MachineHealthCheckRemediationSuccessTotal is a Prometheus metric, which reports the number of successful remediations by MachineHealthChecks
	MachineHealthCheckRemediationSuccessTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "mapi_machinehealthcheck_remediation_success_total",
			Help: "Number of successful remediations performed by MachineHealthChecks",
		}, []string{"name"},
	)
)

func InitializeMachineHealthCheckMetrics() {
	metrics.Registry.MustRegister(
		MachineHealthCheckNodesCovered,
		MachineHealthCheckRemediationSuccessTotal,
	)
}

func DeleteMachineHealthCheckNodesCovered(name string) {
@@ -45,3 +56,7 @@ func DeleteMachineHealthCheckNodesCovered(name string) {
func RegisterMachineHealthCheckNodesCovered(name string, count int) {
	MachineHealthCheckNodesCovered.With(prometheus.Labels{"name": name}).Set(float64(count))
}

func RegisterMachineHealthCheckRemediationSuccess(name string) {
	MachineHealthCheckRemediationSuccessTotal.With(prometheus.Labels{"name": name}).Inc()
}
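To make the semantics of the new helper concrete, here is a minimal sketch of a counter keyed by a single `name` label, where each call adds one to the series for that name. The `labelCounter` type is a hypothetical, standard-library-only stand-in written for illustration; it is not the Prometheus `CounterVec` used in the commit, and it omits registration and exposition entirely.

```go
package main

import (
	"fmt"
	"sync"
)

// labelCounter is a hypothetical stand-in for a counter vector with one
// "name" label. It only illustrates the increment-per-label behaviour.
type labelCounter struct {
	mu     sync.Mutex
	counts map[string]float64
}

func newLabelCounter() *labelCounter {
	return &labelCounter{counts: make(map[string]float64)}
}

// Inc adds one to the series for the given name, creating it on first use.
func (c *labelCounter) Inc(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[name]++
}

// Value returns the current count for name, or zero if it was never incremented.
func (c *labelCounter) Value(name string) float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[name]
}

func main() {
	remediations := newLabelCounter()

	// Two successful remediations for one MachineHealthCheck, one for another.
	remediations.Inc("mhc-1")
	remediations.Inc("mhc-1")
	remediations.Inc("mhc-2")

	fmt.Println(remediations.Value("mhc-1"), remediations.Value("mhc-2"))
}
```

Because the series for a name is created on first increment, a MachineHealthCheck that has never remediated a Machine has no entry at all in this sketch. That matches the sample output in the doc, where `machine-api-termination-handler` appears under `mapi_machinehealthcheck_nodes_covered` but not under the remediation counter.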
