api to metrics service changed from prometheus to legacyregistry #42

Merged
merged 1 commit into from Nov 12, 2019

Conversation

SamuelStuchly
Contributor

@SamuelStuchly SamuelStuchly commented Nov 8, 2019

The API used to register metrics with the metrics service changed without our knowledge, so our metrics were no longer being registered. This change resolves the issue.

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Nov 8, 2019
@openshift-ci-robot
Contributor

Hi @SamuelStuchly. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Nov 8, 2019
@tisnik
Contributor

tisnik commented Nov 8, 2019

Nice catch @SamuelStuchly.
Note for the rest of the team: in the future we need to test the /metrics endpoint to ensure that new changes don't break this functionality.

@iNecas
Contributor

iNecas commented Nov 11, 2019

/ok-to-test

@openshift-ci-robot openshift-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 11, 2019
@iNecas
Contributor

iNecas commented Nov 11, 2019

I've tested this, but I'm still not sure why it broke (it seems to be related to the rebase to k8s 1.16).

@mfojtik any idea? Michal, also: can you confirm that other operators using the same approach are still reporting their metrics correctly? One example is cluster-auth-operator (https://github.com/openshift/cluster-authentication-operator/blob/master/pkg/version/version.go): I don't see the metric openshift_cluster_authentication_operator_build_info in nightly clusters.

I've seen image_registry_operator_storage_reconfigured_total from https://github.com/openshift/cluster-image-registry-operator/blob/master/pkg/metrics/metrics.go, but that operator runs its own metrics server.

What is the recommended way of doing this anyway?

@tisnik
Contributor

tisnik commented Nov 11, 2019

@iNecas yes Ivan, it was caused by rebasing k8s. Previously the prometheus package was used to expose metrics; now k8s.io/component-base/metrics/legacyregistry is used instead. Please note that we don't expose /metrics manually: it is done inside k8s (this can be grepped, as the "/metrics" string is used in just one place). So once prometheus was encapsulated by legacyregistry, calling prometheus.Register or prometheus.MustRegister became effectively a no-op for us, because the /metrics endpoint is handled by another package.

https://github.com/openshift/insights-operator/blob/master/vendor/k8s.io/apiserver/pkg/server/routes/metrics.go#L35

History of this file:
https://github.com/openshift/insights-operator/blame/master/vendor/k8s.io/apiserver/pkg/server/routes/metrics.go#L35

@mfojtik
Member

mfojtik commented Nov 11, 2019

  1. You need an e2e test so this won't break again.
  2. I think using legacyregistry is discouraged, see https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/20190404-kubernetes-control-plane-metrics-stability.md
  3. Long term, you might need your own custom registry for metrics, so you can guarantee "stable" metrics and possibly deprecate metrics (that is what upstream wants).
  4. Short term, you can use legacyregistry, but somebody should work out how to wire in a custom registry and replace the legacy one.

@tisnik
Contributor

tisnik commented Nov 11, 2019

@mfojtik yes Michal, we talked about point 1; it really needs to be tested.

@iNecas
Contributor

iNecas commented Nov 11, 2019

/lgtm, as I've seen this working.

For the custom metrics, I would like to see some other OpenShift operator go through this process first, so that we can take inspiration from it: the OpenShift core team will probably be more effective at finding the right way of doing this than we are.

@iNecas
Contributor

iNecas commented Nov 11, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 11, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: iNecas, SamuelStuchly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 11, 2019
@iNecas
Contributor

iNecas commented Nov 12, 2019

/retest

@openshift-merge-robot openshift-merge-robot merged commit 977c905 into openshift:master Nov 12, 2019
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

6 participants