
liouk
Member

@liouk liouk commented Jul 15, 2025

There may be cases (as demonstrated in OCPBUGS-44937) where we want to gracefully delete the operand workload managed by the workload controller while keeping the operator status Available (instead of Unavailable or Degraded).

This PR adds an optional way to specify a deletion condition that triggers graceful deletion of the operand while keeping the operator's status at Available=True.

This is needed in the scope of openshift/cluster-authentication-operator#740.

This PR replaces #1902.

@liouk liouk changed the title from "Add workload deletion condition method and clean-up" to "AUTH-543: Add workload deletion condition method and clean-up" Jul 15, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 15, 2025
@openshift-ci-robot

openshift-ci-robot commented Jul 15, 2025

@liouk: This pull request references AUTH-543 which is a valid jira issue.

@openshift-ci openshift-ci bot requested review from deads2k and p0lyn0mial July 15, 2025 12:50
// remove any conditions and generations owned by it, because the respective API fields have 'map'
// as the list type where field managers can be list element-specific
if err := c.operatorClient.ApplyOperatorStatus(ctx, c.controllerInstanceName, applyoperatorv1.OperatorStatus()); err != nil {
return c.updateOperatorStatus(ctx, operatorStatus, nil, false, false, []error{err})

Contributor

I would simply return err here; this is also what we do in updateOperatorStatus.


// WithEmptyVersionRemoval returns a copy of the StatusSyncer that will
// remove versions that are an empty string in VersionGetter from the status.
func (c *StatusSyncer) WithEmptyVersionRemoval() *StatusSyncer {

Contributor

Could we use WithVersionRemoval instead?

Member Author

@liouk liouk Jul 15, 2025

There's already a WithVersionRemoval() method that does something slightly different: it removes from the status any versions that are missing from the VersionGetter. Also, there's no way to delete a version from the version getter, since it only allows getting and setting one; hence the need for the empty string.
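To make the distinction concrete, here is a hedged sketch of the two filtering policies exactly as described in this thread. The `operandVersion` type and both helper functions are hypothetical stand-ins, not the StatusSyncer implementation.

```go
package main

import "fmt"

// operandVersion is a hypothetical stand-in for a status version entry.
type operandVersion struct {
	Name    string
	Version string
}

// removeMissing sketches WithVersionRemoval as described above: drop status
// versions whose operand is absent from the VersionGetter's map.
func removeMissing(status []operandVersion, getter map[string]string) []operandVersion {
	var out []operandVersion
	for _, v := range status {
		if _, ok := getter[v.Name]; ok {
			out = append(out, v)
		}
	}
	return out
}

// removeEmpty sketches WithEmptyVersionRemoval: drop status versions whose
// operand is set to "" in the VersionGetter. With only Get/Set available,
// the empty string is the only way to signal "remove this".
func removeEmpty(status []operandVersion, getter map[string]string) []operandVersion {
	var out []operandVersion
	for _, v := range status {
		if ver, ok := getter[v.Name]; ok && ver == "" {
			continue
		}
		out = append(out, v)
	}
	return out
}

func main() {
	status := []operandVersion{{"operator", "1.0.0"}, {"oauth-apiserver", "1.0.0"}}
	getter := map[string]string{"operator": "1.0.0", "oauth-apiserver": ""}
	fmt.Println(removeMissing(status, getter)) // both kept: each name is still a key
	fmt.Println(removeEmpty(status, getter))   // only "operator" kept
}
```

The sketch shows why WithVersionRemoval alone cannot express the removal: an operand set to "" is still present in the getter, so it survives the "missing" check.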

Contributor

Mhm, maybe we should allow version removal then?

I think that would involve exposing a new method on https://github.com/openshift/cluster-openshift-apiserver-operator/blob/476eac3b927b2a935f01f8ac4ec74f9a0cf54bc4/vendor/github.com/openshift/library-go/pkg/operator/status/version.go#L33

If we had such a method, we could pair it with WithVersionRemoval, right?

@liouk liouk force-pushed the AUTH-543 branch 2 times, most recently from cedc4e7 to cb97c15 Compare July 15, 2025 14:59
v.lock.Lock()
defer v.lock.Unlock()

delete(v.versions, operandName)
Contributor

Should we check whether the version was already removed?

Contributor

And if it was, maybe we don't need to notify?

Member Author

delete() is a no-op if the map is nil/empty or the key doesn't exist, so this is a safe call.
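This is standard Go behavior for the built-in `delete` and is easy to confirm with a small, self-contained program covering both cases mentioned (a nil map and an absent key); `deleteIsSafe` is just an illustrative helper.

```go
package main

import "fmt"

// deleteIsSafe shows that the built-in delete never panics: deleting from
// a nil map, or deleting a key that is not present, is simply a no-op.
func deleteIsSafe() (nilLen, missingLen int) {
	var nilMap map[string]string // nil map: reads and deletes are allowed
	delete(nilMap, "operand")

	m := map[string]string{"other": "1.0.0"}
	delete(m, "operand") // key not present: map left unchanged

	return len(nilMap), len(m)
}

func main() {
	a, b := deleteIsSafe()
	fmt.Println(a, b) // prints "0 1"
}
```

Note that `delete` itself gives no signal about whether anything was removed, which is why a separate lookup is needed to decide whether to notify.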

Contributor

But we will call notifyChannelsLocked even if nothing was removed.
Let's only call notifyChannelsLocked when we actually removed an entry from the map.
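A hedged sketch of the requested behavior: use the comma-ok lookup to detect whether the operand had a version, and only notify listeners when an entry was actually removed. Only `lock`, `versions`, `UnsetVersion` and the `notifyChannelsLocked` name come from this review; the `channels` field, the buffer size and the non-blocking send are assumptions for illustration, not the library-go implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// versionGetter is a simplified, hypothetical model of the type in this diff.
type versionGetter struct {
	lock     sync.Mutex
	versions map[string]string
	channels []chan struct{} // assumed field name
}

func newVersionGetter() *versionGetter {
	return &versionGetter{versions: map[string]string{}}
}

// VersionChangedChannel returns a channel signalled on version changes.
func (v *versionGetter) VersionChangedChannel() <-chan struct{} {
	v.lock.Lock()
	defer v.lock.Unlock()
	ch := make(chan struct{}, 1)
	v.channels = append(v.channels, ch)
	return ch
}

// SetVersion records a version for an operand and notifies listeners.
func (v *versionGetter) SetVersion(operandName, version string) {
	v.lock.Lock()
	defer v.lock.Unlock()
	v.versions[operandName] = version
	v.notifyChannelsLocked()
}

// UnsetVersion removes the version for an operand. It is thread-safe. If the
// operand has no version (never set, or already removed), it is a no-op and
// listeners are not notified.
func (v *versionGetter) UnsetVersion(operandName string) {
	v.lock.Lock()
	defer v.lock.Unlock()
	if _, ok := v.versions[operandName]; !ok {
		return // nothing removed, nothing to announce
	}
	delete(v.versions, operandName)
	v.notifyChannelsLocked()
}

// notifyChannelsLocked must be called with v.lock held (hence the suffix).
// Sends are non-blocking so a slow listener cannot stall the caller; a
// pending, unread signal already means "something changed".
func (v *versionGetter) notifyChannelsLocked() {
	for _, ch := range v.channels {
		select {
		case ch <- struct{}{}:
		default:
		}
	}
}

func main() {
	v := newVersionGetter()
	changed := v.VersionChangedChannel()

	v.SetVersion("example-operand", "1.0.0")
	<-changed // consume the Set notification

	v.UnsetVersion("example-operand") // entry removed: listeners notified
	<-changed
	v.UnsetVersion("example-operand") // already gone: no-op, no signal
	select {
	case <-changed:
		fmt.Println("unexpected notification")
	default:
		fmt.Println("no notification for a no-op removal")
	}
}
```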

Contributor

ping

return channel
}

func (v *versionGetter) notifyChannels() {
Contributor

Please rename the method to notifyChannelsLocked so that it is clear it must be called under the lock.

Member Author

Great point

return err
}

if deleted, operandName, err := c.delegate.WorkloadDeleted(ctx); err != nil {
Contributor

I think we need to move it down. In theory, it could be possible to define preconditions for the new method as well.

So I think the order should be: Preconditions, WorkloadDeleted, Sync.

return err
}

c.versionRecorder.UnsetVersion(operandName)
Contributor

The corresponding Set call can be prefixed with c.operandNamePrefix. I think we need to do the same here, no?
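The point is that Set and Unset must derive the operand name the same way; otherwise Unset targets a key that was never set. A hedged sketch, assuming the prefix is simply concatenated and using a hypothetical `constructOperandName` helper (named after the rename suggested later in this review); the `controller` type and the plain map standing in for the VersionGetter are also hypothetical.

```go
package main

import "fmt"

// controller is a hypothetical stand-in holding only what this example needs.
type controller struct {
	operandNamePrefix string
	versions          map[string]string // stands in for the VersionGetter
}

// constructOperandName applies the prefix in exactly one place, so setting
// and unsetting a version always use the same key.
func (c *controller) constructOperandName(name string) string {
	return c.operandNamePrefix + name
}

func (c *controller) setVersion(name, version string) {
	c.versions[c.constructOperandName(name)] = version
}

func (c *controller) unsetVersion(name string) {
	delete(c.versions, c.constructOperandName(name))
}

func main() {
	c := &controller{operandNamePrefix: "example-", versions: map[string]string{}}
	c.setVersion("apiserver", "1.0.0")
	c.unsetVersion("apiserver")
	fmt.Println(len(c.versions)) // prints 0: both calls used "example-apiserver"
}
```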

type VersionGetter interface {
// SetVersion is a way to set the version for an operand. It must be thread-safe
SetVersion(operandName, version string)
// UnsetVersion removes a version for an operand. It must be thread-safe
Contributor

Please expand the comment to explain what happens when the given operandName has already been deleted.

Contributor

@p0lyn0mial p0lyn0mial left a comment

Just two more comments; other than that, LGTM.

return nil
}

func (c *Controller) addPrefix(name string) string {
Contributor

nit: please rename to constructOperandNameFor(name string) string or constructOperandName(name string) string

v.lock.Lock()
defer v.lock.Unlock()

delete(v.versions, operandName)
Contributor

ping

liouk added 2 commits July 16, 2025 09:57
…rface

This determines whether the delegate controller has deleted the workload, and
indicates to the workload controller that it must clear the respective
operator status fields.
@p0lyn0mial
Contributor

/lgtm
/approve

/hold

Please test this PR with the OAS and maybe the auth operator.
Feel free to cancel the hold if there are no errors. Thanks!

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 16, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 16, 2025
Contributor

openshift-ci bot commented Jul 16, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liouk, p0lyn0mial


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 16, 2025
@liouk
Member Author

liouk commented Jul 16, 2025

@everettraven

New proof PRs after dependency bumps:

@everettraven

openshift/cluster-openshift-apiserver-operator#621 has passed all tests as well.

> please test this PR with the oas and maybe the auth operator.
> feel free to cancel the hold if there are no errors. thanks!

Unholding as this has been satisfied.

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 29, 2025
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 7f9bc3e and 2 for PR HEAD adfa251 in total

Contributor

openshift-ci bot commented Jul 29, 2025

@liouk: all tests passed!


@openshift-merge-bot openshift-merge-bot bot merged commit 0e81d05 into openshift:master Jul 29, 2025
4 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.