[release-4.8] Bug 1998938: Use client-go's leader election implementation #421
Conversation
…figMaps Update the `marketplace-operator` ClusterRole RBAC and add an entry that allows the marketplace operator to update ConfigMaps, which is needed for client-go's lease-based ("leader-for-lease") leader election implementation to work correctly.

Update the cmd and pkg files that house the cleanup/migration logic for ensuring that deprecated APIs, like the OperatorSource resource, are transitioned to the CatalogSource resource. These APIs have been deprecated for multiple OCP releases now, and the marketplace project is largely a downstream project at this point since most of the heavy-lifting functionality has been migrated to upstream OLM, so these changes should help clean up the main package.

Update the cmd/manager/main.go package to use client-go's leader election implementation instead of operator-sdk's. client-go acquires leadership through a leasing mechanism, whereas operator-sdk's leader-for-life approach can produce a failed upgrade in some edge cases because the new operator version can never acquire leadership. A sketch of the lease-based approach follows this description. Alternatives considered:
- Use controller-runtime's implementation, but that would likely require a sizable refactor of the driver function when setting up the manager instance here, as that implementation only attempts to gain leadership when mgr.Start() is called. This is problematic because we have several prerequisites to satisfy before starting any controllers/informer caches/etc., like loading the default CatalogSource YAML manifests into a cache.
- Avoid needing a leader election mechanism entirely by spinning up a sidecar container that recycles Pods when a new rollout is detected.
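Not the actual marketplace wiring, but a minimal sketch of what client-go's lease-based leader election looks like, assuming a `configmapsleases` resource lock (which is why the ClusterRole needs permission to update ConfigMaps); the namespace, lock name, and timing values below are illustrative:

```go
package main

import (
	"context"
	"os"
	"time"

	"github.com/sirupsen/logrus"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		logrus.Fatal(err)
	}
	kubeClient := kubernetes.NewForConfigOrDie(cfg)

	// Use the pod hostname as the candidate identity so each replica is distinct.
	id, err := os.Hostname()
	if err != nil {
		logrus.Fatal(err)
	}

	// A configmapsleases lock writes to both ConfigMaps and Leases, hence the
	// extra ConfigMap update RBAC. The namespace and lock name are hypothetical.
	lock, err := resourcelock.New(
		resourcelock.ConfigMapsLeasesResourceLock,
		"openshift-marketplace",
		"marketplace-operator-lock",
		kubeClient.CoreV1(),
		kubeClient.CoordinationV1(),
		resourcelock.ResourceLockConfig{Identity: id},
	)
	if err != nil {
		logrus.Fatal(err)
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   90 * time.Second,
		RenewDeadline:   60 * time.Second,
		RetryPeriod:     30 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				logrus.Info("became leader; starting controllers")
				// Start controllers/informer caches here, then block until shutdown.
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				// Unlike leader-for-life, an expired lease lets a newer replica
				// take over; exit and let the Deployment restart this pod.
				logrus.Fatal("leader election lost")
			},
		},
	})
}
```

Because the lock is a renewable lease rather than a resource owned for the pod's lifetime, a replica that stops renewing (for example, the old version during an upgrade) loses leadership once the lease expires, which is what avoids the stuck-upgrade edge case described above.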
@timflannagan: This pull request references Bugzilla bug 1998938, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 6 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla (xzha@redhat.com), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
clientGo, err := client.New(cfg, client.Options{Scheme: mgr.GetScheme()})
if err != nil && !k8sErrors.IsNotFound(err) {
	logrus.Fatal(err, "Failed to instantiate the client for migrator")
}
migrator := migrator.New(clientGo)
err = migrator.Migrate()
if err != nil {
	logrus.Error(err, "[migration] Error while migrating Marketplace away from OperatorSource API")
}
Quick note to reviewers: the automatic cherry-pick of the master/release-4.9 PR failed when attempting to apply these commit(s) on top of the existing cmd/manager/main.go implementation. As a result, I added the commit that removes the migration logic introduced during the 4.9 release timeframe: bcb9289. It should be a safe change to sneak in, given that the OperatorSource APIs were deprecated in 4.5 and removed in 4.6+.
@@ -28,7 +30,9 @@ func Context() context.Context {

	select {
	case <-signalCtx.Done():
		logrus.Info("received the done signal")
Hmm, these look like debug artifacts from the initial, merged PR that made their way through code review. I'm not entirely sure I see the value in logging the current state of signal handling, at least at the info level, but it's not the end of the world. Happy to modify the cherry-picked commit and remove this additional logging if anyone has a strong opinion.
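For reference, a minimal, hypothetical sketch of the signal-handling pattern that a `Context()` helper like this implements, using `signal.NotifyContext` (not the marketplace implementation), including the info-level log line discussed above:

```go
package signals

import (
	"context"
	"os"
	"os/signal"
	"syscall"

	"github.com/sirupsen/logrus"
)

// Context returns a context that is cancelled when SIGINT or SIGTERM arrives,
// giving the operator a chance to release its leader lease before exiting.
func Context() context.Context {
	signalCtx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)

	go func() {
		// Block until a signal cancels the context, log it, then unregister the
		// handlers so a second signal terminates the process immediately.
		<-signalCtx.Done()
		logrus.Info("received the done signal")
		stop()
	}()

	return signalCtx
}
```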
@timflannagan: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dinhxuanvu, kevinrizza, timflannagan. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest-required Please review the full test history for this PR and help us cut down flakes.
5 similar comments
/uncc
/retest-required Please review the full test history for this PR and help us cut down flakes.
4 similar comments
/retest-required Please review the full test history for this PR and help us cut down flakes.
19 similar comments
[patch-manager] 🚀 Approved for z-stream by score: 0.50, adding cherry-pick approved
@timflannagan: All pull requests linked via external trackers have merged: Bugzilla bug 1998938 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cherrypick release-4.7
@timflannagan: #421 failed to apply on top of branch "release-4.7":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
…k to 4.8.12 [1] had been doing well for a while, but around September 17 [2] and 21 [3] it began perma-failing. Digging into the recent [4], the cluster-version operator blocked on marketplace:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade-rollback-oldest-supported/1449854837989576704/artifacts/e2e-aws-upgrade-rollback-oldest-supported/gather-extra/artifacts/clusterversion.json | jq -r '.items[].status.conditions[] | select(.type == "Progressing") | .reason + ": " + .message'
  ClusterOperatorUpdating: Working towards 4.8.2: 527 of 676 done (77% complete), waiting on marketplace

The marketplace operator had hung on leader election:

  $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade-rollback-oldest-supported/1449854837989576704/artifacts/e2e-aws-upgrade-rollback-oldest-supported/gather-extra/artifacts/pods/openshift-marketplace_marketplace-operator-744b969b6-2v69d_marketplace-operator.log | tail -n1
  time="2021-10-17T23:46:17Z" level=info msg="Waiting to become leader."

Marketplace adjusted their leader lease handling on September 9th [5], shipping in 4.8.12 on the 21st [6], and I suspect this is related. I'm bumping the floor to avoid marketplace's previous deadlock-prone leader system.

[1]: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade-rollback-oldest-supported
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade-rollback-oldest-supported/1438981580411375616
[3]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade-rollback-oldest-supported/1440431301499817984
[4]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-upgrade-rollback-oldest-supported/1449854837989576704
[5]: operator-framework/operator-marketplace#421 (comment)
[6]: https://bugzilla.redhat.com/show_bug.cgi?id=1998938#c5
[ART PR BUILD NOTIFIER] This PR has been included in build marketplace-operator-container-v4.8.0-202311261141.p0.g3f3d7d1.assembly.stream for distgit marketplace-operator.