ci-operator/jobs/openshift/release: Use upgrade-all in chained updates#13002

Closed
wking wants to merge 1 commit into openshift:master from wking:conformance-post-chained-update

wking (Member) commented Oct 21, 2020

WIP because I'm only messing with 4.6 for now. If presubmits look good, I'll expand to the other versions.

Using the 'upgrade-all' precedent from cfcd60f (#8594). I'm not clear on why we are joining with ';' instead of '&&'; presumably this is getting wrapped in a 'set -e' or equivalent. But I'm sticking with ';' to match precedent.
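
The 'set -e' guess above can be checked with a minimal shell sketch. The two functions and the echo/false commands are hypothetical stand-ins, not the actual job commands. They only illustrate that, under 'set -e', a sequence joined with ';' (or newlines) stops at the first failing command, much as '&&' would.

```shell
# Hypothetical stand-ins for two chained test steps; not the real CI commands.

run_with_errexit() {
  # With `set -e`, the inner shell exits as soon as `false` fails,
  # so "second" is never printed even though the commands are joined with `;`.
  sh -c 'set -e; echo first; false; echo second' || echo "exit status: $?"
}

run_without_errexit() {
  # Without `set -e`, `;` keeps going after the failure and "second" is printed.
  sh -c 'echo first; false; echo second'
}

run_with_errexit
run_without_errexit
```

If the wrapper really does enable 'set -e', the choice between ';' and '&&' is mostly cosmetic; if it does not, a failure in the first suite would not stop the second.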

This increases the risk that we time out these slow jobs (e.g. this job took 3h42m), but we really want to exercise tests like openshift/origin@9f7fe0089d (openshift/origin#22564), which is in openshift/conformance/serial, because machines launch with the born-in boot images until we get openshift/enhancements#201.

openshift-ci-robot added the do-not-merge/work-in-progress label Oct 21, 2020

openshift-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: wking
To complete the pull request process, please assign droslean after the PR has been reviewed.
You can assign the PR to them by writing /assign @droslean in a comment when ready.

wking force-pushed the conformance-post-chained-update branch from f9fddc7 to f92c233, October 21, 2020 20:01

wking (Member, Author) commented Oct 21, 2020

I've pushed f9fddc76b1 -> f92c233225 to hopefully fix the error 'could not find required secret: secrets "e2e-aws-upgrade-all-cluster-profile" not found'.

wking (Member, Author) commented Oct 22, 2020

Hah:

fail [github.com/openshift/origin/test/e2e/upgrade/upgrade.go:137]: during upgrade to RELEASE_IMAGES
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/13002/rehearse-13002-release-openshift-origin-installer-e2e-aws-upgrade-4.3-to-4.4-to-4.5-to-4.6-ci/1319006396699643904/artifacts/e2e-aws-upgrade-all/clusterversion.json | jq -r '.items[].status.history[] | .startedTime + " " + .completionTime + " " + .version + " " + .state + " " + (.verified | tostring) + " " + .image'
2020-10-21T20:44:16Z   Partial false RELEASE_IMAGES
2020-10-21T20:22:31Z 2020-10-21T20:40:46Z 4.3.40 Completed false quay.io/openshift-release-dev/ocp-release@sha256:9ff90174a170379e90a9ead6e0d8cf6f439004191f80762764a5ca3dbaab01dc

I'm just cribbing the IMAGE=RELEASE_IMAGES from the existing e2e-$(CLUSTER_TYPE)-upgrade entry. Maybe this is something that pj-rehearse has trouble rehearsing?

wking force-pushed the conformance-post-chained-update branch from f92c233 to 9dbc178, October 22, 2020 05:08
wking changed the title from "WIP: ci-operator/jobs/openshift/release: Use upgrade-all in chained updates" to "ci-operator/jobs/openshift/release: Use upgrade-all in chained updates", Oct 22, 2020
openshift-ci-robot removed the do-not-merge/work-in-progress label Oct 22, 2020

Using the 'upgrade-all' precedent from cfcd60f (release:
Standardize all ci-chat-bot jobs, 2020-04-27, openshift#8594).  I'm not clear
on why we are joining with a newline instead of '&&'; presumably this
is getting wrapped in a 'set -e' or equivalent.  But I'm sticking with
newline to match precedent.

This increases the risk that we time out these slow jobs (e.g. [1]
took 3h42m), but we really want to exercise tests like
openshift/origin@9f7fe0089d (Add test for scaling machineSets,
2019-04-11, openshift/origin#22564), which is in
openshift/conformance/serial, because machines launch with the born-in
boot images until we get [2].

And in fact, the reason why we didn't have this post-update suite in
4.6 was because of 3bc9d8e (stop running e2e tests after three
upgrades because we hit timeouts and lose upgrade signal, 2020-10-05, openshift#12436).
But since 3c915e2 (ci-operator/step-registry/openshift/e2e/test:
Add 2h active_deadline_seconds, 2020-10-09, openshift#12647), we no longer have
to worry about getting logs when that step is slow.  So we might not
pass if we're slow, but we'll still get logs to debug why we're slow.

Only for 4.6 and later, because 4.5 is live and if we had problems
there we'd probably have already heard about them from customers.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.3-to-4.4-to-4.5-to-4.6-ci/1318709056830967808
[2]: openshift/enhancements#201

wking force-pushed the conformance-post-chained-update branch from 9dbc178 to 9ed2193, October 22, 2020 05:10

wking (Member, Author) commented Oct 22, 2020

No longer a WIP, because I'm bumping 4.7 now too. No 4.8 yet, so nothing to bump beyond 4.7. CC @deads2k, since this is something of a revert for #12436. In fact, because I want the Serial machine-scaling test, this PR makes the risk of a timeout even worse. If we are really tight for time, and can't talk ourselves into larger caps, one safety valve would be setting TEST_SKIPS to trim down the stuff we don't want (#12233), or adding a TEST_INCLUDES regexp option to whitelist the test-cases we do want.
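
A rough sketch of the skip/include idea, assuming test names arrive one per line on stdin. The filter_tests helper and the sample test names are hypothetical; TEST_SKIPS is mentioned above, TEST_INCLUDES is only a proposal, and how the real test step applies either one is not shown in this thread.

```shell
# Hypothetical helper: filter test names (one per line on stdin) using
# extended regular expressions taken from TEST_INCLUDES and TEST_SKIPS.
filter_tests() {
  # Keep names matching TEST_INCLUDES (the default "." keeps every non-empty name),
  # then drop names matching TEST_SKIPS when it is set and non-empty.
  if [ -n "${TEST_SKIPS:-}" ]; then
    grep -E -e "${TEST_INCLUDES:-.}" | grep -E -v -e "${TEST_SKIPS}"
  else
    grep -E -e "${TEST_INCLUDES:-.}"
  fi
}

# Example with made-up test names: keep Serial tests, skip anything tagged Slow.
TEST_INCLUDES='\[Serial\]'
TEST_SKIPS='\[Slow\]'
printf '%s\n' \
  'scale machineSets [Serial]' \
  'router smoke test [Parallel]' \
  'long soak [Serial] [Slow]' |
  filter_tests
# prints: scale machineSets [Serial]
```

An include pattern narrows the run to the cases that matter for the chained update, while a skip pattern trims known-slow cases; combining both keeps the post-update suite as short as the time cap requires.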

wking (Member, Author) commented Oct 22, 2020

/hold

While we sort out #13018 and #13019.

openshift-ci-robot added the do-not-merge/hold label Oct 22, 2020

openshift-merge-robot (Contributor) commented:
@wking: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/rehearse/release-openshift-origin-installer-e2e-aws-upgrade-4.3-to-4.4-to-4.6-to-4.7-ci | 9ed2193 | link | /test pj-rehearse |
| ci/rehearse/release-openshift-origin-installer-e2e-aws-upgrade-4.3-to-4.4-to-4.5-to-4.6-ci | 9ed2193 | link | /test pj-rehearse |
| ci/prow/pj-rehearse | 9ed2193 | link | /test pj-rehearse |
| ci/prow/secret-generator-config-valid | 9ed2193 | link | /test secret-generator-config-valid |

openshift-bot (Contributor) commented:

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label Feb 8, 2021

openshift-ci bot (Contributor) commented Feb 25, 2021

@wking: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/ci-secret-bootstrap-config-validation | 9ed2193 | link | /test ci-secret-bootstrap-config-validation |

openshift-bot (Contributor) commented:

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Mar 28, 2021

openshift-bot (Contributor) commented:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot (Contributor) commented:

@openshift-bot: Closed this PR.

Labels: do-not-merge/hold, lifecycle/rotten
