SREP-4345: Fix ROSA CI stability for account-roles and osd-cluster-ready #77500
@dustman9000: This pull request references SREP-4345 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.
cb04460 to 2528751
Two fixes for systemic ROSA CI failures:
1. Account-roles version fallback: when nightly builds for unreleased OCP versions (4.23, 5.0) don't have IAM policies published yet, fall back to the latest available version instead of failing.
2. Increase osd-cluster-ready timeout from 60m to 120m: certman-operator cert delivery via Hive can be slow on staging (DNS validation), and the osd-cluster-ready job crashes on transient errors (log.Fatal), burning time in exponential backoff. Doubling the timeout gives more headroom while the upstream crash-loop fix is worked on in openshift/osd-cluster-ready.
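The fallback in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `resolve_account_roles_version` is hypothetical, the list of available versions is passed in as arguments rather than queried from the ROSA CLI, and `sort -V` (version sort, a GNU coreutils extension) is assumed to be available on the CI image.

```shell
set -o nounset

# Hypothetical helper (not from the PR): print the version to pass to
# account-roles creation.
#   $1     requested major.minor version, e.g. "4.23"
#   $2...  versions that already have IAM policies published
#          (assumed to be supplied by the caller; the real step would
#          obtain them from ROSA)
resolve_account_roles_version() {
  _requested="$1"
  shift

  # Use the requested version as-is when it is available.
  for _candidate in "$@"; do
    if [ "$_candidate" = "$_requested" ]; then
      echo "$_requested"
      return 0
    fi
  done

  # Nothing to fall back to: report the problem and fail.
  if [ "$#" -eq 0 ]; then
    echo "no account-roles versions available" >&2
    return 1
  fi

  # Otherwise fall back to the highest available version.
  printf '%s\n' "$@" | sort -V | tail -n 1
}

resolve_account_roles_version 4.23 4.20 4.21 4.22   # prints 4.22
resolve_account_roles_version 4.21 4.20 4.21 4.22   # prints 4.21
```

Version sort matters here because a plain lexical sort would rank 4.9 above 4.10.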
2528751 to 7327e69
[REHEARSALNOTIFIER]
A total of 292 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.
/pj-rehearse periodic-ci-openshift-release-main-nightly-4.21-e2e-rosa-sts-ovn
@dustman9000: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
/pj-rehearse periodic-ci-openshift-release-main-nightly-4.21-e2e-rosa-sts-ovn
@dustman9000: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
/pj-rehearse ack
@dustman9000: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dustman9000, joshbranham
@dustman9000: The following test failed, say
PR openshift#77500 introduced a version fallback block that references CLUSTER_SWITCH before it was defined, causing an unbound variable error with set -o nounset. Move the CLUSTER_SWITCH assignment before the fallback block so it is available when needed.
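The ordering problem described above can be reproduced in isolation. The sketch below is illustrative only: the value assigned to `CLUSTER_SWITCH` (`--example-flag`) is a placeholder rather than what the real step script uses, and each ordering runs in a child `sh` so the expected failure does not abort the demonstration itself.

```shell
set -o nounset

# Broken ordering: the variable is expanded before it is assigned.
# Under nounset, the child shell aborts at the expansion.
if sh -c 'set -o nounset
          unset CLUSTER_SWITCH
          echo "args: ${CLUSTER_SWITCH}"
          CLUSTER_SWITCH="--example-flag"' 2>/dev/null; then
  broken_status="ok"
else
  broken_status="failed"
fi

# Fixed ordering: assign first, then expand.
fixed_output="$(sh -c 'set -o nounset
                       unset CLUSTER_SWITCH
                       CLUSTER_SWITCH="--example-flag"
                       echo "args: ${CLUSTER_SWITCH}"')"

echo "broken ordering: ${broken_status}"   # prints "broken ordering: failed"
echo "fixed ordering: ${fixed_output}"     # prints "fixed ordering: args: --example-flag"
```

When a variable may legitimately be unset, an expansion with a default such as `${CLUSTER_SWITCH:-}` avoids the error; here the simpler fix of moving the assignment earlier is what the commit describes.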
Summary
Two fixes for systemic ROSA CI failures affecting all HCP and Classic jobs:
1. Account-roles version fallback (fixes 4.23/5.0 jobs)
When nightly builds for unreleased OCP versions (4.23, 5.0) don't have IAM policies published in ROSA yet, rosa create account-roles --version 4.23 fails. The fix detects unavailable versions and falls back to the latest available version in that channel group.

Affected jobs (all 100% failing):
periodic-ci-openshift-release-main-nightly-4.23-e2e-rosa-hcp-ovn
periodic-ci-openshift-release-main-nightly-4.23-e2e-rosa-sts-ovn
periodic-ci-openshift-release-main-nightly-5.0-e2e-rosa-sts-ovn

2. Increase osd-cluster-ready timeout from 60m to 120m (fixes Classic STS)
The osd-cluster-ready job requires 20 consecutive health checks, including certificate validation. certman-operator cert provisioning on staging is slow enough to reset the check counter repeatedly, exceeding the 60m timeout.

Affected jobs (all 100% failing):
periodic-ci-openshift-release-main-nightly-4.18-e2e-rosa-sts-ovn
periodic-ci-openshift-release-main-nightly-4.19-e2e-rosa-sts-ovn
periodic-ci-openshift-release-main-nightly-4.20-e2e-rosa-sts-ovn
periodic-ci-openshift-release-main-nightly-4.21-e2e-rosa-sts-ovn
periodic-ci-openshift-release-main-nightly-4.22-e2e-rosa-sts-ovn

Jira: https://redhat.atlassian.net/browse/SREP-4345
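To see why a resetting counter can eat the whole timeout, here is a small simulation of the "N consecutive passing checks" rule described in item 2. It is only a model of that rule, not the osd-cluster-ready implementation: the function name `checks_until_ready` is hypothetical, and the check results are supplied as arguments instead of coming from real health checks.

```shell
set -o nounset

# Hypothetical simulation: report how many checks run before the required
# number of consecutive passes is reached.
#   $1     required consecutive passes
#   $2...  check results in order, each "ok" or "fail"
# Prints the total number of checks consumed, or returns 1 if the results
# run out first (the analogue of hitting the job timeout).
checks_until_ready() {
  _required="$1"
  shift
  _streak=0
  _total=0
  for _result in "$@"; do
    _total=$((_total + 1))
    if [ "$_result" = "ok" ]; then
      _streak=$((_streak + 1))
    else
      _streak=0   # a single failure discards all progress so far
    fi
    if [ "$_streak" -ge "$_required" ]; then
      echo "$_total"
      return 0
    fi
  done
  return 1
}

checks_until_ready 3 ok ok ok              # prints 3
checks_until_ready 3 ok ok fail ok ok ok   # prints 6
```

With 20 required passes, a failure on the 19th check means the job needs at least 39 checks in total, which is how intermittent certificate failures can push a run past a fixed timeout.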