[release-4.12] OCPBUGS-14260: Check for minimum OCP version #1644

Merged

Conversation

Contributor

@aravindhp aravindhp commented Jun 6, 2023

OCPBUGS-4862 fixed an issue where deleting a BYOH node object left the node hanging in the Ready,SchedulingDisabled state. The fix shipped in OCP 4.12.3, so any WMCO upgrade from 7.0.1 must run on OCP 4.12.3 or later. Otherwise BYOH node upgrades will fail and could disrupt workloads. To prevent that, stop WMCO 7.1.0 from running if the OCP version is below 4.12.3.

Fixes OCPBUGS-14260
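
A minimal, standard-library-only sketch of the kind of gate described above. It is not the code in this PR: `parseVersion` and `checkMinVersion` are hypothetical names, and only `minOpenShiftVersion` and the 4.12.3 threshold come from the description and diff. The sketch accepts plain X.Y.Z versions only and compares components numerically, so 4.12.10 correctly ranks above 4.12.3.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minOpenShiftVersion is named in the PR diff; the value is taken from the description.
const minOpenShiftVersion = "4.12.3"

// parseVersion is a hypothetical helper: it splits "X.Y.Z" into three integers
// and rejects anything else, including an empty string.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("version %q is not in X.Y.Z form", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("version %q has an invalid component %q", v, p)
		}
		out[i] = n
	}
	return out, nil
}

// checkMinVersion (hypothetical name) returns an error when current is older than minimum.
func checkMinVersion(current, minimum string) error {
	cur, err := parseVersion(current)
	if err != nil {
		return err
	}
	want, err := parseVersion(minimum)
	if err != nil {
		return err
	}
	for i := range cur {
		if cur[i] > want[i] {
			return nil
		}
		if cur[i] < want[i] {
			return fmt.Errorf("OCP version %s is below the required minimum %s", current, minimum)
		}
	}
	return nil // versions are equal
}

func main() {
	for _, v := range []string{"4.12.2", "4.12.3", "4.12.10", ""} {
		fmt.Printf("%q: %v\n", v, checkMinVersion(v, minOpenShiftVersion))
	}
}
```

An operator would presumably call such a check once at startup and exit on error; where exactly WMCO does this is not shown in this conversation.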

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 6, 2023
@openshift-ci-robot

@aravindhp: This pull request references Jira Issue OCPBUGS-14260, which is invalid:

  • expected the bug to target the "4.12.z" version, but no target version was set
  • expected Jira Issue OCPBUGS-14260 to depend on a bug targeting a version in 4.13.0, 4.13.z and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

OCPBUGS-4862 fixed an issue where deletion of a BYOH node objected resulted in it hanging in the Ready,SchedulingDisabled state. This was fixed in OCP 4.12.3 and as a result any WMCO upgrades from 7.0.1 needs to be on versions greater than that. Otherwise BYOH node upgrades will fail and could potentially cause workload disruptions. To prevent that from happening stop WMCO 7.1.0 from running if the OCP version is below 4.12.3.

Fixes OCPBUGS-14260

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 6, 2023
@openshift-ci
Contributor

openshift-ci bot commented Jun 6, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 6, 2023
@aravindhp
Contributor Author

/approve cancel
/jira refresh

@openshift-ci-robot

@aravindhp: This pull request references Jira Issue OCPBUGS-14260, which is invalid:

  • expected Jira Issue OCPBUGS-14260 to depend on a bug targeting a version in 4.13.0, 4.13.z and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@openshift-ci openshift-ci bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 6, 2023
@aravindhp
Contributor Author

/label jira/valid-bug

This bug only affects OCP 4.12 / WMCO 7.0.1

@openshift-ci openshift-ci bot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Jun 6, 2023
@aravindhp aravindhp removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Jun 6, 2023
@aravindhp
Contributor Author

/test gcp-e2e-operator

@aravindhp
Contributor Author

/jira refresh

@openshift-ci-robot

@aravindhp: This pull request references Jira Issue OCPBUGS-14260, which is invalid:

  • expected Jira Issue OCPBUGS-14260 to depend on a bug targeting a version in 4.13.0, 4.13.z and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

Retaining the jira/valid-bug label as it was manually added.

@aravindhp
Contributor Author

/test gcp-e2e-operator

@aravindhp
Contributor Author

/test gcp-e2e-operator

@aravindhp
Contributor Author

/test gcp-e2e-operator

@aravindhp
Contributor Author

/test unit

Comment on lines +33 to +34
// minOpenShiftVersion is the minimum required OCP version due to https://issues.redhat.com/browse/OCPBUGS-4862.
// Without this fix, BYOH node upgrades will fail.
Contributor

nit: could call out this is the min required OCP version within 4.12.z

Contributor Author

We are in the release-4.12 branch. Do we really need to call this out?

Comment on lines 161 to 169
	var openShiftVersion string
	for _, update := range clusterVersion.Status.History {
		if update.State == oconfig.CompletedUpdate {
			// obtain the version from the last completed update
			openShiftVersion = update.Version
			break
		}
	}
	return openShiftVersion, nil
Contributor

This isn't checking if the version is missing, I suggest doing this instead:

	for _, update := range clusterVersion.Status.History {
		if update.State == oconfig.CompletedUpdate {
			// obtain the version from the last completed update
			return update.Version, nil
		}
	}
	return "", fmt.Errorf("no completed update was found")

Comment on lines 201 to 203
	if len(openShiftVersion) == 0 {
		return fmt.Errorf("cluster version not set")
	}
Contributor

Instead of doing this check, it would be more comprehensive to add the v prefix and then use semver.IsValid to check that the version is formatted properly.

Alternatively you could do this within getOpenShiftVersion, renaming it to getOpenShiftSemver. That way it is guaranteed that getOpenShiftSemver is returning a valid semver, and the code within this function doesn't need to be as defensive.

Contributor Author

getOpenShiftSemver would be overloading a function and is not a good pattern. I prefer to keep the validation in one spot and making code more defensive is fine.

Comment on lines 270 to 271
if (err != nil) != tt.wantErr {
assert.Errorf(t, err, "error = %v, wantErr %v", err, tt.wantErr)
Contributor

Don't you need to do something like this to check that there's an error when we want one, and none when we don't?

assert.Equal(t,( err != nil ), tt.wantErr)

Contributor Author

It actually has the same effect but I like your version better.

@aravindhp
Contributor Author

/retitle OCPBUGS-14260: Check for minimum OCP version

@openshift-ci openshift-ci bot changed the title OCPBUGS-14260: [cluster] Check for minimum OCP version OCPBUGS-14260: Check for minimum OCP version Jun 8, 2023
Contributor Author

@aravindhp aravindhp left a comment

Thanks for the review, @sebsoto. I have addressed your comments.

@openshift-ci
Contributor

openshift-ci bot commented Jun 8, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sebsoto

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 8, 2023
Contributor

@alinaryan alinaryan left a comment

LGTM other than commit message

's/fixed an issue where deletion of a BYOH node objected/fixed an issue where deletion of a BYOH node object/'

https://issues.redhat.com/browse/OCPBUGS-4862 fixed an issue where
deletion of a BYOH node object resulted in it hanging in the
"Ready,SchedulingDisabled" state. This was fixed in OCP 4.12.3 and as a
result any WMCO upgrades from 7.0.1 needs to be on versions greater than
that. Otherwise BYOH node upgrades will fail and could potentially cause
workload disruptions. To prevent that from happening stop WMCO 7.1.0
from running if the OCP version is below 4.12.3. This check is ignored
in CI runs as CI clusters do not advertise Z streams.
@openshift-ci-robot

@aravindhp: This pull request references Jira Issue OCPBUGS-14260, which is invalid:

  • expected Jira Issue OCPBUGS-14260 to depend on a bug targeting a version in 4.13.0, 4.13.z and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Retaining the jira/valid-bug label as it was manually added.

@aravindhp
Contributor Author

LGTM other than commit message

's/fixed an issue where deletion of a BYOH node objected/fixed an issue where deletion of a BYOH node object/'

@alinaryan fixed.

@alinaryan
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 8, 2023
@aravindhp aravindhp marked this pull request as ready for review June 8, 2023 19:45
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 8, 2023
@aravindhp
Contributor Author

/retitle [release-4.12] OCPBUGS-14260: Check for minimum OCP version

@openshift-ci openshift-ci bot changed the title OCPBUGS-14260: Check for minimum OCP version [release-4.12] OCPBUGS-14260: Check for minimum OCP version Jun 8, 2023
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD d4ffecd and 2 for PR HEAD 2a77c4d in total

@aravindhp
Contributor Author

/retest-required

DaemonSet scale down failure in e2e-aws-upgrade

@aravindhp
Contributor Author

/retest-required

 INFO[2023-06-09T05:44:41Z] pod pending for more than 30m0s: containers have not started in 30m0.001030114s: ci-scheduling-dns-wait, place-entrypoint, release, sidecar: 
* Container release is not ready with reason PodInitializing
* Container sidecar is not ready with reason PodInitializing
Found 2 events for Pod release-latest:
* 0x : Successfully assigned ci-op-0xcr7jbh/release-latest to ip-10-0-163-85.ec2.internal
* 6x kubelet: Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded 
INFO[2023-06-09T05:44:41Z] Ran for 1h2m4s                               
ERRO[2023-06-09T05:44:41Z] Some steps failed:                           
ERRO[2023-06-09T05:44:41Z] 
  * could not run steps: step [release:latest] failed: release "release-latest" failed: pod pending for more than 30m0s: containers have not started in 30m0.001030114s: ci-scheduling-dns-wait, place-entrypoint, release, sidecar: 
* Container release is not ready with reason PodInitializing
* Container sidecar is not ready with reason PodInitializing
Found 2 events for Pod release-latest:
* 0x : Successfully assigned ci-op-0xcr7jbh/release-latest to ip-10-0-163-85.ec2.internal
* 6x kubelet: Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded 

🤷🏽‍♂️

@openshift-ci
Contributor

openshift-ci bot commented Jun 9, 2023

@aravindhp: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit cfb3eef into openshift:release-4.12 Jun 9, 2023
13 checks passed
@openshift-ci-robot

@aravindhp: Jira Issue OCPBUGS-14260: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-14260 has been moved to the MODIFIED state.

@aravindhp aravindhp deleted the OCPBUGS-14260 branch June 9, 2023 22:45
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
5 participants