Add PollImmediate for e2e metrics to avoid race with prom scrape interval #2483
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: enxebre
/lgtm
/lgtm
/hold Can you remove the skip for the none platform to ensure this addresses that use case? Or is that not part of the goal here?
Feel free to remove the hold if this isn't the goal; I just wanted to make sure we got the signal we were looking for from the e2e tests.
Thanks @davidvossel. I don't want to risk conflating potentially different root causes. Let me ship this as is if it passes, and I'll immediately follow up to remove the none skip.
47f2d70 to a68f36d
/lgtm
FYI: the issue this should fix took out an e2e
/lgtm
@enxebre: all tests passed!
What this PR does / why we need it:
We have seen TestNodePool/ValidateMetricsAreExposed fail with:
Failed to validate that metrics are exposed: "hypershift_cluster_deletion_duration_seconds" not found
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_hypershift/2477/pull-ci-openshift-hypershift-main-e2e-aws/1651173441761447936/artifacts/e2e-aws/run-e2e/artifacts/TestNodePool_PreTeardownClusterDump/namespaces/hypershift/core/pods/logs/operator-85bbc866c7-7kjxl-operator.log
This happens even though the code that exposes the metric has run, as shown by this log:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_hypershift/2477/pull-ci-openshift-hypershift-main-e2e-aws/1651173441761447936/artifacts/e2e-aws/run-e2e/artifacts/TestNodePool_DestroyCluster_1/namespaces/hypershift/core/pods/logs/operator-85bbc866c7-7kjxl-operator.log
The likely cause is a race with the Prometheus scrape interval: the test can query for the metric before Prometheus has scraped it, so this PR polls for the metric instead of checking once.
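To illustrate the idea in the PR title, here is a minimal sketch of polling for a metric rather than checking once. It is not the code from this PR. `pollImmediate` is a stdlib-only stand-in written for this example that mimics the behaviour of `wait.PollImmediate` from `k8s.io/apimachinery/pkg/util/wait` (check immediately, then once per interval until a timeout). `hasMetric`, `fakeQuery`, the intervals, and the sample value are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// errPollTimeout is returned when the condition never became true in time.
var errPollTimeout = errors.New("timed out waiting for the condition")

// pollImmediate is a stdlib-only stand-in for wait.PollImmediate (assumption:
// the real helper lives in k8s.io/apimachinery). It checks the condition right
// away, then once per interval until the timeout would be exceeded.
func pollImmediate(interval, timeout time.Duration, condition func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := condition()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().Add(interval).After(deadline) {
			return errPollTimeout
		}
		time.Sleep(interval)
	}
}

// hasMetric reports whether a Prometheus text-format payload contains a
// sample line for the exact metric name (hypothetical helper).
func hasMetric(payload, name string) bool {
	for _, line := range strings.Split(payload, "\n") {
		if strings.HasPrefix(line, name+" ") || strings.HasPrefix(line, name+"{") {
			return true
		}
	}
	return false
}

func main() {
	const metric = "hypershift_cluster_deletion_duration_seconds"

	// fakeQuery simulates a Prometheus query that only returns the metric
	// from the third call on, as if the first scrape had not happened yet.
	calls := 0
	fakeQuery := func() string {
		calls++
		if calls < 3 {
			return "up 1\n"
		}
		return "up 1\n" + metric + `{name="example"} 42` + "\n"
	}

	err := pollImmediate(10*time.Millisecond, time.Second, func() (bool, error) {
		return hasMetric(fakeQuery(), metric), nil
	})
	fmt.Println("attempts:", calls, "err:", err)
}
```

A single check would have failed on the first attempt here. With polling, the test passes as soon as a scrape makes the metric visible, and still fails if the metric never appears before the timeout.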
Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story):
Fixes #
Checklist