
Expose metrics for the rhmi installation CR #784

Merged

2 commits merged into integr8ly:master from jjaferson:intly-6667 on May 27, 2020

Conversation

jjaferson
Contributor

@jjaferson jjaferson commented May 22, 2020

Description

Exposes two metrics for the RHMI installation CR: rhmi_status and rhmi_info.

https://issues.redhat.com/browse/INTLY-6667

Verification steps

  1. Install the operator via OLM
  2. Go to the rhmi-operator pod
  3. In the terminal tab, curl the metrics endpoint to verify that the metrics are present:
  • curl 0.0.0.0:8383/metrics | grep 'rhmi_status{'
  • curl 0.0.0.0:8383/metrics | grep 'rhmi_info{'
  4. Check that the metrics have all the created labels and that their values match the installation CR:
rhmi_info{installation_type="", master_url="", namespace="", operator_name="", use_cluster_storage=""}
rhmi_status{namespace="", last_error="", operator_name="", preflight_message="", preflight_status="", stage="", <product_name>-status="" ...}
  5. Go to the redhat-rhmi-middleware-monitoring-operator namespace
  6. Access Prometheus and verify that rhmi_status and rhmi_info are available
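As a rough illustration of what the curl commands in the steps above should return, here is a stdlib-only Go sketch of the Prometheus text exposition format. The label values below are placeholders, not what the operator actually emits; the real operator fills them from the RHMI CR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatMetric renders one sample in the Prometheus text exposition format
// returned by `curl 0.0.0.0:8383/metrics`, e.g.
//   rhmi_info{namespace="redhat-rhmi-operator"} 1
func formatMetric(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic label order
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	// Placeholder label values for illustration only.
	fmt.Println(formatMetric("rhmi_info", map[string]string{
		"installation_type":   "managed",
		"namespace":           "redhat-rhmi-operator",
		"use_cluster_storage": "true",
	}, 1))
	fmt.Println(formatMetric("rhmi_status", map[string]string{
		"namespace": "redhat-rhmi-operator",
		"stage":     "complete",
	}, 1))
}
```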

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • I have added tests that prove my fix is effective or that my feature works
  • I have added a test case that will be used to verify my changes
  • Verified independently on a cluster by reviewer

@jjaferson jjaferson changed the title expose prometheus metrics for the rhmi installation CR Expose metrics for the rhmi installation CR May 22, 2020
@coveralls

coveralls commented May 22, 2020

Coverage Status

Coverage decreased (-0.3%) to 59.407% when pulling ccbefb7 on jjaferson:intly-6667 into 6c6ec8d on integr8ly:master.

@jjaferson jjaferson force-pushed the intly-6667 branch 2 times, most recently from 4599bc5 to aaa2bc0 Compare May 22, 2020 18:04
@jjaferson
Contributor Author

/test images

@jjaferson jjaferson force-pushed the intly-6667 branch 5 times, most recently from 77aaaf7 to 33e7b2f Compare May 25, 2020 13:30
Member

@davidffrench davidffrench left a comment


Looks fantastic @jjaferson. Reviewed the code and verified on a cluster. One requested change on the addition to the CSV: this should be changed in deploy/role.yaml.

rhmi_status - Looks Good

[screenshot]

rhmi_status_available - Looks Good

[screenshot]

rhmi_info - Question

[screenshot]

One question on the rhmi_info metric. You can see from the screenshot there are two metrics being exposed for this. Do you know why this is?

@jjaferson
Contributor Author

> Looks fantastic @jjaferson. Reviewed the code and verified on a cluster. One requested change on the addition to the CSV: this should be changed in deploy/role.yaml.
>
> rhmi_status - Looks Good
>
> [screenshot]
>
> rhmi_status_available - Looks Good
>
> [screenshot]
>
> rhmi_info - Question
>
> [screenshot]
>
> One question on the rhmi_info metric. You can see from the screenshot there are two metrics being exposed for this. Do you know why this is?

I noticed that, but I think it's something that happens by default, as I saw the same metric for other CRDs, like enmasse_addressplan_info and blackboxtarget_info, with the same labels.

enmasse_addressplan_info
[Screenshot from 2020-05-25 14-59-16]

blackboxtarget_info
[Screenshot from 2020-05-25 15-00-16]

@david-martin
Member

> I noticed that but I think it's something that happens by default as I saw the same metric for other CRDs like enmasse_addressplan_info and blackboxtarget_info with the same labels

It looks like the metric is being scraped from 2 different places/endpoints: cr-metrics and http-metrics.
Is there a duplicate metric being exposed?

@jjaferson
Contributor Author

> I noticed that but I think it's something that happens by default as I saw the same metric for other CRDs like enmasse_addressplan_info and blackboxtarget_info with the same labels

> It looks like the metric is being scraped from 2 different places/endpoints. cr-metrics and http-metrics.
> Is there a duplicate metric being exposed?

I was doing some investigation and I found out that there is a gauge metric being exposed on port 8686 of the operator

curl 0.0.0.0:8686/metrics
# HELP rhmi_info Information about the RHMI custom resource.
# TYPE rhmi_info gauge
rhmi_info{namespace="redhat-rhmi-operator",rhmi="rhmi"} 1

https://github.com/integr8ly/integreatly-operator/blob/master/cmd/manager/main.go#L148

Not sure where the metric is created

@jjaferson
Contributor Author

> I noticed that but I think it's something that happens by default as I saw the same metric for other CRDs like enmasse_addressplan_info and blackboxtarget_info with the same labels
>
> It looks like the metric is being scraped from 2 different places/endpoints. cr-metrics and http-metrics.
> Is there a duplicate metric being exposed?
>
> I was doing some investigation and I found out that there is a gauge metric being exposed on port 8686 of the operator
>
> curl 0.0.0.0:8686/metrics
> # HELP rhmi_info Information about the RHMI custom resource.
> # TYPE rhmi_info gauge
> rhmi_info{namespace="redhat-rhmi-operator",rhmi="rhmi"} 1
>
> https://github.com/integr8ly/integreatly-operator/blob/master/cmd/manager/main.go#L148
>
> Not sure where the metric is created

The metric is created by default by the operator-sdk:

> Custom resource specific metrics
>
> By default operator will expose info metrics based on the number of the current instances of an operator’s custom resources in the cluster. It leverages kube-state-metrics as a library to generate those metrics. Metrics initialization lives in the cmd/manager/main.go file of the operator in the serveCRMetrics function. Its arguments are a custom resource’s group, version, and kind to generate the metrics. The metrics are served on 0.0.0.0:8686/metrics by default.

https://sdk.operatorframework.io/docs/golang/monitoring/prometheus/
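One way to see why Prometheus shows two rhmi_info series instead of merging them: a time series is identified by the metric name plus its complete label set, and the two scrape targets (cr-metrics on :8686, http-metrics on :8383) produce different label sets. The stdlib-only Go sketch below illustrates this; the "endpoint" label and its values are assumptions modeled on the screenshots discussed above, not code from the operator.

```go
package main

import (
	"fmt"
	"sort"
)

// seriesKey builds the identity Prometheus uses for a time series:
// the metric name plus the full, sorted label set.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := name
	for _, k := range keys {
		out += fmt.Sprintf(`,%s=%q`, k, labels[k])
	}
	return out
}

func main() {
	// The SDK-generated gauge scraped from :8686 (cr-metrics)...
	sdk := seriesKey("rhmi_info", map[string]string{
		"endpoint": "cr-metrics", "namespace": "redhat-rhmi-operator", "rhmi": "rhmi",
	})
	// ...and the hand-written gauge scraped from :8383 (http-metrics).
	custom := seriesKey("rhmi_info", map[string]string{
		"endpoint": "http-metrics", "namespace": "redhat-rhmi-operator",
	})
	// Same metric name, different label sets: two distinct series.
	fmt.Println(sdk != custom) // prints true
}
```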

@david-martin
Member

> Not sure where the metric is created

Looks like a CR metric exposed by the operator sdk for any/all CRD instances that exist.

I think we need to change the name of the metric that we're in control of (not the sdk-generated one).

@jjaferson
Contributor Author

> Not sure where the metric is created
>
> Looks like a CR metric exposed by the operator sdk for any/all CRD instances that exist.
>
> I think we need to change the name of the metric that we're in control of (not the sdk generated one)

Yup, what about rhmi_installation_info?
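The rename discussed above could also be guarded at registration time. Below is a hypothetical helper (not part of the operator) that builds a metric name from the CR kind and a suffix, and rejects the bare "info" suffix because the operator-sdk already exposes a <kind>_info gauge on the cr-metrics endpoint:

```go
package main

import (
	"fmt"
	"strings"
)

// safeMetricName is a hypothetical helper: it derives a metric name from
// the CR kind and a suffix, and refuses the bare "info" suffix, which
// would collide with the SDK-generated <kind>_info gauge.
func safeMetricName(kind, suffix string) (string, error) {
	name := strings.ToLower(kind) + "_" + suffix
	if suffix == "info" {
		return "", fmt.Errorf("%s collides with the sdk-generated CR metric", name)
	}
	return name, nil
}

func main() {
	if name, err := safeMetricName("RHMI", "installation_info"); err == nil {
		fmt.Println(name) // prints rhmi_installation_info
	}
	if _, err := safeMetricName("RHMI", "info"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```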

@jjaferson jjaferson force-pushed the intly-6667 branch 2 times, most recently from 03eebaf to 85c5830 Compare May 25, 2020 17:41
@jjaferson
Contributor Author

/test e2e

failed to create e2e pod

"failed to fetch Cluster: failed to fetch dependency of "Cluster": failed to generate asset "Platform Permissions Check": validate AWS credentials: mint credentials check: error simulating policy: Throttling: Rate exceeded\n\tstatus code: 400, request id: 557a3694-d525-4c45-a76b-e95ce8ed683e"

@jjaferson
Contributor Author

/retest

1 similar comment
@jjaferson
Contributor Author

/retest

@davidffrench
Member

/lgtm
/approve

@jjaferson Great work. Please make sure to update the JIRA with the name change from rhmi_info to rhmi_spec.

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidffrench

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jjaferson
Contributor Author

/test test-cases-lint

@davidffrench davidffrench dismissed their stale review May 27, 2020 10:08

Approved changes now, dismissing requested changes

@openshift-merge-robot openshift-merge-robot merged commit 8f437be into integr8ly:master May 27, 2020
@david-martin
Member

david-martin commented May 28, 2020

Unable to tag @openshift/openshift-team-monitoring,
so leaving a comment that these are the code changes that expose rhmi_status metric.
See openshift/cluster-monitoring-operator#795 for cluster-monitoring-operator PR to whitelist this metric.


6 participants