Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix finalizer processing #3180

Merged
merged 1 commit into from
Feb 28, 2024
Merged

Conversation

tmshort
Copy link
Contributor

@tmshort tmshort commented Feb 22, 2024

Description of the change:

Handle CSV deletion during sync.

Motivation for the change:

There are two reconcilers for the CSV, one is controller-runtime based,
the other is based on queueinformer. A finalizer was added to the CSV
and it's handled by the controller-runtime reconciler.

However, the queueinformer-based reconciler may take a while to do its
processing such that the CSV may be deleted and the finalizer run via
the controller-runtime reconciler. (This still takes <1s.)

This causes problems when CSV being synced is deleted. The queueinformer
attempts to sync RBAC to the stale CSV. The proper (BIG/risky) fix is to
consolidate the two reconcilers. A less risky fix is to query the lister
cache to see if the CSV has been deleted (or has a DeletionTimestamp),
and not do the RBAC updates if that's the case.

Architectural changes:

None.

Testing remarks:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Bug fixes are accompanied by regression test(s)
  • e2e tests and flake fixes are accompanied evidence of flake testing, e.g. executing the test 100(0) times
  • tech debt/todo is accompanied by issue link(s) in comments in the surrounding code
  • Tests are comprehensible, e.g. Ginkgo DSL is being used appropriately
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive
  • Tests marked as [FLAKE] are truly flaky and have an issue
  • Code is properly formatted

@tmshort
Copy link
Contributor Author

tmshort commented Feb 22, 2024

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 22, 2024
@tmshort
Copy link
Contributor Author

tmshort commented Feb 22, 2024

This is an improvement, but it's not perfect... there's still a race condition.

@tmshort tmshort marked this pull request as draft February 22, 2024 18:07
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 22, 2024
@tmshort tmshort marked this pull request as ready for review February 26, 2024 21:00
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 26, 2024
@tmshort
Copy link
Contributor Author

tmshort commented Feb 26, 2024

/unhold

@tmshort tmshort closed this Feb 26, 2024
@tmshort tmshort reopened this Feb 26, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 26, 2024
There are two reconcilers for the CSV, one is controller-runtime based,
the other is based on queueinformer. A finalizer was added to the CSV
and it's handled by the controller-runtime reconciler.

However, the queueinformer-based reconciler may take a while to do its
processing such that the CSV may be deleted and the finalizer run via
the controller-runtime reconciler. (This still takes <1s.)

This causes problems when CSV being synced is deleted. The queueinformer
attempts to sync RBAC to the stale CSV. The proper (BIG/risky) fix is to
consolidate the two reconcilers. A less risky fix is to move the finalizer
to the queueinformer reconciler and query the lister cache to see if the
CSV has been deleted, and not do the RBAC updates if that's the case.

Signed-off-by: Todd Short <todd.short@me.com>
Copy link
Member

@kevinrizza kevinrizza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a question and two nits, otherwise this looks good to me.

@@ -1093,6 +1095,9 @@ func (a *Operator) handleClusterServiceVersionDeletion(obj interface{}) {
"phase": clusterServiceVersion.Status.Phase,
})

logger.Debug("start deleting CSV")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are useful in tracking what's happening; this one is debug, so it doesn't clutter the log normally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you asking me to get rid of the extra logs?

return nil, false
}

log.Info("started finalizer")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is Info, however.

@@ -100,10 +95,6 @@ func (r *OperatorConditionGeneratorReconciler) Reconcile(ctx context.Context, re
return ctrl.Result{}, client.IgnoreNotFound(err)
}

if err, ok := r.processFinalizer(ctx, in); !ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why we were doing this here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time, it was the easier location to put the finalizer, since it was the controller-runtime reconciler. But that reconciler didn't manage the RBAC, which caused this race condition. Moving it to the queueinformer reconciler is the proper place for it. It's easier than combining the reconcilers.

@tmshort
Copy link
Contributor Author

tmshort commented Feb 28, 2024

ping @kevinrizza ?

@kevinrizza
Copy link
Member

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 28, 2024
@kevinrizza kevinrizza added this pull request to the merge queue Feb 28, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 28, 2024
@tmshort tmshort added this pull request to the merge queue Feb 28, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 28, 2024
@tmshort tmshort added this pull request to the merge queue Feb 28, 2024
Merged via the queue into operator-framework:master with commit fc3c183 Feb 28, 2024
15 checks passed
@tmshort tmshort deleted the finalizer-fix branch February 28, 2024 20:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants