test: Detect when the master pool is still updating after upgrade #25922

Merged

@smarterclayton smarterclayton commented Feb 23, 2021

The MCO is required to roll out the master config pool before it reports the new level, so the CVO should never reach
Available at a new version while the master pool is still updating. Add an extra check to the pool rollout: if we see the
master pool with an Updating condition after the upgrade completes, flag it and fail the upgrade job. Also handle paused
pools in the wait loop (a paused pool effectively means "we will not deal with this pool"), so we exit early and log.
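
To make the intended check concrete, here is a minimal sketch of the decision logic described above. It is not the code from this PR: `poolStatus`, `readPoolStatus`, and `checkPoolAfterUpgrade` are hypothetical names, and the pool is modeled as a plain `map[string]interface{}` (the shape of an object decoded by a dynamic client), assuming the pool exposes `spec.paused` plus `Updated` and `Updating` entries under `status.conditions`.

```go
package main

import "fmt"

// poolStatus summarizes the pool fields this sketch cares about.
// Hypothetical type, not part of the PR.
type poolStatus struct {
	Paused   bool
	Updated  bool
	Updating bool
}

// readPoolStatus extracts spec.paused and the Updated/Updating conditions
// from a pool decoded into a generic map. Missing or malformed fields are
// treated as false. The field layout is an assumption of this sketch.
func readPoolStatus(obj map[string]interface{}) poolStatus {
	var s poolStatus
	if spec, ok := obj["spec"].(map[string]interface{}); ok {
		s.Paused, _ = spec["paused"].(bool)
	}
	status, ok := obj["status"].(map[string]interface{})
	if !ok {
		return s
	}
	conditions, _ := status["conditions"].([]interface{})
	for _, c := range conditions {
		cond, ok := c.(map[string]interface{})
		if !ok {
			continue
		}
		isTrue := cond["status"] == "True"
		switch cond["type"] {
		case "Updated":
			s.Updated = isTrue
		case "Updating":
			s.Updating = isTrue
		}
	}
	return s
}

// checkPoolAfterUpgrade is called once the cluster version reports the
// upgrade as complete. It returns done=true when the wait loop can stop
// watching this pool, and an error when the master pool is still updating.
func checkPoolAfterUpgrade(name string, s poolStatus) (done bool, err error) {
	if s.Paused {
		// A paused pool will not roll out, so stop waiting on it.
		return true, nil
	}
	if name == "master" && s.Updating {
		return false, fmt.Errorf("pool %s is still updating after the upgrade completed", name)
	}
	return s.Updated && !s.Updating, nil
}
```

With this shape, a master pool reporting `Updating: true` right after "Completed upgrade" yields an error, a worker pool in the same state simply keeps the loop waiting, and a paused pool ends the wait without an error.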

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/988/pull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn-upgrade/1364259570968432640 shows a possible case where this happened, so tighten the test.

Feb 23 18:43:19.284: INFO: cluster upgrade is Progressing: Working towards 4.8.0-0.ci.test-2021-02-23-170859-ci-op-m36h5d5b: 666 of 669 done (99% complete)
Feb 23 18:43:29.285: INFO: Completed upgrade to registry.build01.ci.openshift.org/ci-op-m36h5d5b/release@sha256:7884e2856b8e4f2a71dbcaf8333adc36ac7805315726b31d7c075b961a163a94
Feb 23 18:43:29.313: INFO: Waiting on pools to be upgraded
Feb 23 18:43:29.351: INFO: Pool master is still reporting (Updated: false, Updating: true, Degraded: false)

^ no, bad

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 23, 2021

@wking wking left a comment

/lgtm
/hold

Holding so we can look at the presubmit jobs and confirm this is working.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 23, 2021
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton, wking

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 23, 2021
wking commented Feb 23, 2021

images:

test/extended/dr/common.go:89:3: cannot use bool value as type error in return argument:
	bool does not implement error (missing Error method)
make: *** [vendor/github.com/openshift/build-machinery-go/make/targets/golang/build.mk:14: build] Error 2

not clear to me how that's related to this PR, but we certainly don't want to land things that fail compilation.

@smarterclayton

It’s not, which is concerning

/retest

@@ -499,33 +507,39 @@ func recordClusterEvent(client kubernetes.Interface, uid, action, reason, note s
}

// TODO(runcom): drop this when MCO types are in openshift/api and we can use the typed client directly
-func IsPoolUpdated(dc dynamic.NamespaceableResourceInterface, name string) (bool, error) {
+func IsPoolUpdated(dc dynamic.NamespaceableResourceInterface, name string) (poolUpToDate bool, poolIsUpdating bool) {

@wking wking Feb 24, 2021

$ git --no-pager grep IsPoolUpdated origin/master
origin/master:test/e2e/upgrade/upgrade.go:                                      updated, err := IsPoolUpdated(mcps, p.GetName())
origin/master:test/e2e/upgrade/upgrade.go:func IsPoolUpdated(dc dynamic.NamespaceableResourceInterface, name string) (bool, error) {
origin/master:test/extended/dr/common.go:               return upgrade.IsPoolUpdated(mcps, "master")

So this is causing the compilation error.
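
For reference, a minimal sketch of why the signature change breaks a caller like the one in `test/extended/dr/common.go`, and one way such a caller could adapt. The names `isPoolUpdated` and `masterPoolReady` and the `map[string]bool` input are hypothetical stand-ins, and this is not necessarily how the PR resolved it. A function declared to return `(bool, error)` cannot forward a call that now returns `(bool, bool)`, which is what the `bool does not implement error` message reports.

```go
package main

// isPoolUpdated stands in for the new two-bool signature from the diff.
// The map input is a simplification for this sketch only.
func isPoolUpdated(conditions map[string]bool) (poolUpToDate bool, poolIsUpdating bool) {
	return conditions["Updated"], conditions["Updating"]
}

// masterPoolReady is a hypothetical caller that must keep returning
// (bool, error), for example because it is used as a poll condition.
func masterPoolReady(conditions map[string]bool) (bool, error) {
	// "return isPoolUpdated(conditions)" no longer compiles here:
	// the second result is a bool, not an error.
	updated, updating := isPoolUpdated(conditions)
	if updating {
		// Still rolling out: not ready yet, but not a failure either.
		return false, nil
	}
	return updated, nil
}
```

Unpacking both results at the call site keeps the caller's own signature stable while still letting it react to the new `poolIsUpdating` value.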

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Feb 24, 2021
@openshift-ci-robot

New changes are detected. LGTM label has been removed.

@smarterclayton

/retest

@smarterclayton

/test e2e-gcp
/test e2e-aws-fips

@smarterclayton

/test e2e-aws-fips

@smarterclayton

/retest
/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 25, 2021
@smarterclayton

The flake is not being cooperative, so I’m going to merge and then keep watching for it.

@smarterclayton smarterclayton added the lgtm Indicates that a PR is ready to be merged. label Feb 25, 2021
@smarterclayton

/retest

@smarterclayton

/refresh

@smarterclayton

/retest

@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.
