Support oc adm upgrade rollback in CI #51287

Merged 1 commit on May 10, 2024

shellyyang1989
Contributor

This change adds support for the oc adm upgrade rollback CLI command in CI.

Refers to OTA-1071.

@shellyyang1989
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-ec-aws-ipi-byo-route53-f28

@openshift-ci-robot
Contributor

@shellyyang1989: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

Comment on lines 63 to 68
res=$(run_command "oc adm upgrade rollback")
out="Requested rollback from ${TARGET_VERSION} to ${SOURCE_VERSION}"
if [[ ${res} == *"${out}"* ]]; then
echo "Rolling back cluster gets started..."
else
echo "Rolling back cluster doesn't start..."
Contributor

Perhaps we should fail (return 1) if we get unexpected output, capture stderr (2>&1) into res, and print the actual output next to the expected one? WDYT?

Suggested change
- res=$(run_command "oc adm upgrade rollback")
- out="Requested rollback from ${TARGET_VERSION} to ${SOURCE_VERSION}"
- if [[ ${res} == *"${out}"* ]]; then
- echo "Rolling back cluster gets started..."
- else
- echo "Rolling back cluster doesn't start..."
+ res=$(oc adm upgrade rollback 2>&1 || true)
+ out="Requested rollback from ${TARGET_VERSION} to ${SOURCE_VERSION}"
+ if [[ ${res} == *"${out}"* ]]; then
+ echo "Rolling back cluster from ${TARGET_VERSION} to ${SOURCE_VERSION} started..."
+ else
+ echo "Rolling back cluster returned unexpected:\n${res}\nexpecting: ${out}"
+ return 1

Contributor Author

Updated.
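
For readers following this thread, here is a minimal, self-contained sketch of the pattern discussed above: capture both stdout and stderr of the rollback request, compare against the expected message, and fail on anything else. It is not the code that was merged. The oc function is a stub so the sketch runs without a cluster, and FROM_VERSION, TO_VERSION, FAKE_OC_FAIL, the stub's error text, and the version strings are all hypothetical names and values introduced for illustration. The sketch uses printf for the failure message because bash's builtin echo, without -e, typically leaves \n uninterpreted.

```shell
#!/usr/bin/env bash
# Sketch only: `oc` is stubbed so the example runs without a cluster.
# FROM_VERSION, TO_VERSION and FAKE_OC_FAIL are hypothetical helper variables.
oc() {
    if [[ "${FAKE_OC_FAIL:-}" == "true" ]]; then
        # Hypothetical error text, not real oc output
        echo "error: rollback request rejected" >&2
        return 1
    fi
    echo "Requested rollback from ${FROM_VERSION} to ${TO_VERSION}"
}

# rollback FROM TO: request a rollback and fail on unexpected output.
rollback() {
    local from="$1" to="$2" res out
    # Capture stdout and stderr together; `|| true` keeps a non-zero exit
    # from aborting the script when `set -e` is active.
    res=$(oc adm upgrade rollback 2>&1 || true)
    out="Requested rollback from ${from} to ${to}"
    if [[ ${res} == *"${out}"* ]]; then
        echo "Rolling back cluster from ${from} to ${to} started..."
    else
        # printf expands \n in the format string and prints the values verbatim
        printf 'Rolling back cluster returned unexpected:\n%s\nexpecting: %s\n' "${res}" "${out}"
        return 1
    fi
}

# Usage with made-up version strings
FROM_VERSION="4.16.0-rc.1"
TO_VERSION="4.16.0-rc.0"
rollback "${FROM_VERSION}" "${TO_VERSION}"
```

Passing the two versions as arguments keeps the function independent of which variable holds the current version and which holds the rollback destination, a point raised again later in this review.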

local version state
version=$(oc get clusterversion/version -o jsonpath='{.status.history[0].version}')
state=$(oc get clusterversion/version -o jsonpath='{.status.history[0].state}')
if [[ ${version} == "${TARGET_VERSION}" && ${state} == "Completed" ]]; then
Contributor

version=$(oc get clusterversion/version -o jsonpath='{.status.history[0].version}')
state=$(oc get clusterversion/version -o jsonpath='{.status.history[0].state}')
if [[ ${version} == "${TARGET_VERSION}" && ${state} == "Completed" ]]; then
echo "History check PASSED, cluster is now rollbacked to ${TARGET_VERSION}" && return 0
Contributor

and here

TARGET_VERSION="$(env "NO_PROXY=*" "no_proxy=*" oc adm release info "${TARGET}" --output=json | jq -r '.metadata.version')"

SOURCE_VERSION="$(oc get clusterversion --no-headers | awk '{print $2}')"
SOURCE_MINOR_VERSION="$(echo "${SOURCE_VERSION}" | cut -f2 -d.)"
Contributor

It looks like SOURCE_MINOR_VERSION and TARGET_MINOR_VERSION are used only in log output, so they could be dropped, right?

Contributor Author

Updated.

# Rollback the cluster to target release
function rollback() {
res=$(run_command "oc adm upgrade rollback")
out="Requested rollback from ${TARGET_VERSION} to ${SOURCE_VERSION}"
Contributor

As in https://github.com/openshift/release/pull/51287/files#r1593479250, ${TARGET_VERSION} and ${SOURCE_VERSION} should be adjusted here as well, right?

Contributor Author

Updated.

@shellyyang1989
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-ec-aws-ipi-byo-route53-f28 periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-azure-upi-f14

@openshift-ci-robot
Contributor

@shellyyang1989: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@shellyyang1989 force-pushed the rollback-oc branch 3 times, most recently from 7d2c4da to 3778e0c (May 9, 2024 04:09)

@shellyyang1989
Contributor Author

/pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-ec-aws-ipi-byo-route53-f28

@openshift-ci-robot
Contributor

@shellyyang1989: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@evakhoni
Contributor

evakhoni commented May 9, 2024

Maybe worth adding a status dump such as:

if [[ -n "${TARGET_MINOR_VERSION}" ]] && [[ "${TARGET_MINOR_VERSION}" -ge "16" ]] ; then
echo -e "\n# oc adm upgrade status\n"
env OC_ENABLE_CMD_UPGRADE_STATUS='true' oc adm upgrade status --details=all || true
fi

A status dump on rollback failures could provide valuable insight.
However, this could also go into a separate PR; just an idea.
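
Since the minor-version variables were dropped earlier in this review, the status dump suggested above would need the minor version derived again. The following is a rough sketch of one way to do that, reusing the cut -f2 -d. approach from the original script. The function names minor_version, supports_upgrade_status, and dump_upgrade_status are hypothetical, and the threshold of 16 is taken from the snippet in the comment.

```shell
#!/usr/bin/env bash
# minor_version VERSION: print the second dot-separated field,
# e.g. "16" for "4.16.0-rc.0".
minor_version() {
    echo "$1" | cut -f2 -d.
}

# supports_upgrade_status VERSION: succeed when the minor version is >= 16.
supports_upgrade_status() {
    local minor
    minor="$(minor_version "$1")"
    # Reject empty or non-numeric input before the integer comparison
    [[ "${minor}" =~ ^[0-9]+$ ]] && [[ "${minor}" -ge 16 ]]
}

# dump_upgrade_status VERSION: print upgrade status when the version supports it.
# The oc invocation is copied from the review comment; `|| true` keeps a failed
# dump from failing the step.
dump_upgrade_status() {
    if supports_upgrade_status "$1"; then
        echo -e "\n# oc adm upgrade status\n"
        env OC_ENABLE_CMD_UPGRADE_STATUS='true' oc adm upgrade status --details=all || true
    fi
}
```

A caller would invoke something like dump_upgrade_status "${TARGET_VERSION}" after a failed rollback. The numeric guard means an empty or malformed version simply skips the dump instead of triggering an integer-comparison error.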

@evakhoni
Contributor

evakhoni commented May 9, 2024

all in all lgtm, leaving to @jiajliu for a final review.

cpu: 100m
memory: 200Mi
- ref: cucushift-chainupgrade-toimage
- chain: openshift-upgrade-qe-sanity-rollback
workflow: cucushift-installer-rehearse-ibmcloud-ipi
- as: nutanix-ipi-f28
Contributor

@jianlinliu May 9, 2024

Shall we also update the Nutanix job name in this PR to reflect the install configuration?

Contributor Author

Updated.

@@ -38397,7 +38397,7 @@ periodics:
  ci.openshift.io/generator: prowgen
  job-release: "4.16"
  pj-rehearse.openshift.io/can-be-rehearsed: "true"
- name: periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-azure-upi-f14
+ name: periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-azure-upi-f28
Contributor

We had a discussion about the nightly job in #51287 (comment). Is the signature problem no longer an issue?

Contributor Author

ah, good catch! I should have updated the latest image to rc.0 but I forgot. Rolling back from nightly to rc should be okay.

Contributor Author

Updated.

env:
- name: TIMEOUT
default: "130"
documentation: Time to wait for upgrade finish
Contributor

nit: upgrade->rollback

Contributor Author

Updated.
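
To illustrate how a TIMEOUT setting like the one documented above might be consumed, here is a hedged sketch of a polling loop. It assumes TIMEOUT is expressed in minutes, which the snippet does not state. wait_for and rollback_completed are hypothetical names, and the completion check is modeled on the history check quoted earlier in this review, not on the merged script.

```shell
#!/usr/bin/env bash
# wait_for TIMEOUT_MIN INTERVAL_SEC CMD...: retry CMD until it succeeds or
# TIMEOUT_MIN minutes have elapsed. Returns 0 on success, 1 on timeout.
wait_for() {
    local timeout_min="$1" interval_sec="$2"
    shift 2
    # SECONDS is a bash builtin counting seconds since the shell started
    local deadline=$(( SECONDS + timeout_min * 60 ))
    while (( SECONDS < deadline )); do
        if "$@"; then
            return 0
        fi
        sleep "${interval_sec}"
    done
    echo "Timed out after ${timeout_min} minute(s) waiting for: $*" >&2
    return 1
}

# rollback_completed: hypothetical check, modeled on the history check in this
# review; succeeds once the newest history entry is TARGET_VERSION and Completed.
rollback_completed() {
    local version state
    version=$(oc get clusterversion/version -o jsonpath='{.status.history[0].version}')
    state=$(oc get clusterversion/version -o jsonpath='{.status.history[0].state}')
    [[ ${version} == "${TARGET_VERSION}" && ${state} == "Completed" ]]
}

# Example call against a real cluster (not executed here):
#   wait_for "${TIMEOUT:-130}" 60 rollback_completed
```

Keeping the loop generic over the command it retries makes the timeout logic testable without a cluster, since any shell function or command can stand in for the check.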

@shellyyang1989
Contributor Author

The rollback step passed in the rehearsal.

INFO[2024-05-09T06:30:58Z] Running step aws-ipi-byo-route53-f28-cucushift-upgrade-rollback. 
INFO[2024-05-09T07:36:32Z] Step aws-ipi-byo-route53-f28-cucushift-upgrade-rollback succeeded after 1h5m34s. 

@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@shellyyang1989: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name | Repo | Type | Reason
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-nutanix-ipi-boot-categories-project-f28 | N/A | periodic | Periodic changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-azure-upi-f28 | N/A | periodic | Periodic changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-aws-ipi-byo-route53-f28 | N/A | periodic | Ci-operator config changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-aws-ipi-ovn-hypershift-inplace-f7 | N/A | periodic | Ci-operator config changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-baremetal-ipi-ovn-ipv4-f28 | N/A | periodic | Ci-operator config changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-ibmcloud-ipi-f28 | N/A | periodic | Ci-operator config changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-gcp-ipi-ovn-ipsec-f28 | N/A | periodic | Ci-operator config changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-vsphere-upi-f28 | N/A | periodic | Ci-operator config changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-ec-aws-ipi-byo-route53-f28 | N/A | periodic | Ci-operator config changed
Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse abort to abort all active rehearsals

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@shellyyang1989: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name | Repo | Type | Reason
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-ibmcloud-ipi-f28 | N/A | periodic | Ci-operator config changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-nutanix-ipi-boot-categories-project-f28 | N/A | periodic | Periodic changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-gcp-ipi-ovn-ipsec-f28 | N/A | periodic | Ci-operator config changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-azure-upi-f28 | N/A | periodic | Periodic changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-aws-ipi-ovn-hypershift-inplace-f7 | N/A | periodic | Ci-operator config changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-baremetal-ipi-ovn-ipv4-f28 | N/A | periodic | Ci-operator config changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-ec-aws-ipi-byo-route53-f28 | N/A | periodic | Ci-operator config changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-vsphere-upi-f28 | N/A | periodic | Ci-operator config changed
periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-aws-ipi-byo-route53-f28 | N/A | periodic | Ci-operator config changed

Contributor

openshift-ci bot commented May 9, 2024

@shellyyang1989: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/rehearse/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-azure-upi-f14 | 4e679f5 | link | unknown | /pj-rehearse periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-rollback-nightly-azure-upi-f14

@jiajliu
Contributor

jiajliu commented May 10, 2024

lgtm

@jianlinliu
Contributor

/lgtm

@openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) May 10, 2024

@shellyyang1989
Contributor Author

@liangxia PTAL. Thank you!

@liangxia
Member

/lgtm

Contributor

openshift-ci bot commented May 10, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jianlinliu, liangxia, shellyyang1989

@openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) May 10, 2024

@jianlinliu
Contributor

/pj-rehearse ack

@openshift-ci-robot
Contributor

@jianlinliu: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot added the rehearsals-ack label (Signifies that rehearsal jobs have been acknowledged) May 10, 2024

@openshift-merge-bot merged commit f986910 into openshift:master May 10, 2024
19 checks passed

jbpratt pushed a commit to jbpratt/release that referenced this pull request May 30, 2024