Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WRKLDS-728: controller/apiservice: allow to disable/delete an API service resources #1534

Conversation

ingvagabund
Copy link
Member

@openshift-ci-robot
Copy link

openshift-ci-robot commented May 30, 2023

@ingvagabund: This pull request references WRKLDS-728 which is a valid jira issue.

In response to this:

SSIA

Needed by openshift/cluster-openshift-apiserver-operator#532

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested review from p0lyn0mial and tkashem May 30, 2023 11:52
// GetAPIServicesToMangeFunc provides a full list of managed APIService items.
// The list needs to always contain all the managed APIServices so only managed
// objects are removed without removing user-created/unmanaged objects.
type GetAPIServicesToMangeFunc func() ([]ManagedAPIService, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this function couldn't return services that are enabled :) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also this is not a backward-compatible change which will break users of this function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how will you know if an api service is not enabled ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this function couldn't return services that are enabled :) ?

The controller needs a list of APIServices that are disabled. So they can be deleted if present.

how will you know if an api service is not enabled ?

Configurable by anyone who consumes the APIService controller. Atm, the OA operator constructs the list based on enabled capabilities.

@p0lyn0mial
Copy link
Contributor

also the title of this pr is a bit misleading for me, are we actually going to delete something?

curr := coordinate{namespace: apiService.Spec.Service.Namespace, name: apiService.Spec.Service.Name}
for _, managedAPIService := range apiServices {
if !managedAPIService.Enabled {
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we are skipping disabled services I would simply return only enabled services in GetAPIServicesToMangeFunc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both newEndpointPrecondition and checkDiscoveryForByAPIServices can be probably given only a list of enabled services.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I would prefer that :)

for _, managedApiService := range apiServices {
// Remove disabled APIService
if !managedApiService.Enabled {
if err := c.apiregistrationv1Client.APIServices().Delete(ctx, managedApiService.APIService.Name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, sorry, i was reading top-to-bottom, i think this is the place that will remove an api service

// GetAPIServicesToMangeFunc provides a full list of managed APIService items.
// The list needs to always contain all the managed APIServices so only managed
// objects are removed without removing user-created/unmanaged objects.
type GetAPIServicesToMangeFunc func() ([]ManagedAPIService, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you considered adding a new function which will return services to be removed ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or changing the signature of this method to return enabled []*apiregistrationv1.APIService, disabled []*apiregistrationv1.APIService, error)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be an alternative. In any case, still a breaking change.

type ManagedAPIService struct {
// Enabled signals whether an APIService is enabled.
// Any disabled APIService is removed from a cluster.
Enabled bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this field be used by all clients or only for some?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once such struct is created, it's intended to be used only by the controller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I wanted to ask was if all operands will be required to provide this information.

In general my preference would be to check how many operators consume this function.
If the number is low then I would change the signature of the method to enabled []*apiregistrationv1.APIService, disabled []*apiregistrationv1.APIService, error).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on https://github.com/search?q=org%3Aopenshift%20NewAPIServiceController&type=code NewAPIServiceController is used only once.
Based on https://github.com/search?q=org%3Aopenshift%20WithAPIServiceController&type=code WithAPIServiceController is used only by openshift-apiserver and cluster authentication operators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I wanted to ask was if all operands will be required to provide this information.

Yeah. They have to. https://github.com/openshift/cluster-authentication-operator/blob/1b2c022034879e09e324bbdc50b84b78ce8de4ee/pkg/operator/starter.go#L793-L798 builds the list explicitly as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be to provide a filter function that if not provide will include every api service. On the other hand such a filter will get invoked all over again instead of constructing a list of enabled/disabled service only once at the beginning. The list of enabled/disabled api services is expected to be static or changed very rarely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, since there are only two operators using this controller I would:

  1. Change the signature of the method to GetAPIServicesToMangeFunc func() (enabled []*apiregistrationv1.APIService, disabled []*apiregistrationv1.APIService, error)
  2. Rename precondition to preconditionForEnabledAPIServices
  3. Split the sync method into two flows, one for reconciling enabled services and the second for reconciling disabled services.

Does it make sense ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. It looks better now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I have added a few more comments/questions, PTAL.

@ingvagabund ingvagabund force-pushed the apiservice-controller-allow-to-disable-apiservices branch 2 times, most recently from faf15e3 to 3ca87e7 Compare May 31, 2023 09:49
@@ -118,7 +122,7 @@ func (c *APIServiceController) sync(ctx context.Context, syncCtx factory.SyncCon
return err
}

err = c.syncAPIServices(ctx, apiServices, syncCtx.Recorder())
err = c.syncAPIServices(ctx, enabledApiServices, disabledApiServices, syncCtx.Recorder())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a separate method for reconciling disalbedApiServices and calling it before preconditionForEnabledAPIServices (line 101)?

Having two separate methods would allow us to reconcile enabledApiServices and disabledApiServices independently. I think we always want to reconcile both and aggregate the errors. What do you think ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

errs := []error{}
var availableConditionMessages []string

for _, apiService := range apiServices {
for _, apiService := range disabledApiServices {
if err := c.apiregistrationv1Client.APIServices().Delete(ctx, apiService.Name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should get an APIService from the cache first as that would allow us to put less pressure on the server.
We should issue a live request only when we know the resource must be deleted.
What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

type GetAPIServicesToMangeFunc func() ([]*apiregistrationv1.APIService, error)
// GetAPIServicesToMangeFunc provides a two list of managed APIService items.
// Both lists need to always contain all the managed APIServices so only managed
// objects are removed without removing user-created/unmanaged objects.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean by without removing user-created/unmanaged objects ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment updated. The controller needs to avoid deleting APIServices that are not part of the OpenShift API. I.e. the controller needs to explicitly know which APIServices are disabled instead of deleting all not enabled APIServices.

@ingvagabund
Copy link
Member Author

/retest-required


operatorClient v1helpers.OperatorClient
kubeClient kubernetes.Interface
apiregistrationv1Client apiregistrationv1client.ApiregistrationV1Interface
apiservicelister apiregistrationv1lister.APIServiceLister
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't that strange that this controller didn't use apiservicelister ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not taken into account when #667 was reviewed.

for _, apiService := range apiServices {
if err := c.apiregistrationv1Client.APIServices().Delete(ctx, apiService.Name, metav1.DeleteOptions{}); err != nil {
for _, apiService := range append(enabledApiServices, disabledApiServices...) {
if _, err := c.apiservicelister.Get(apiService.Name); err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about simply calling syncDisabledAPIServices with combined enabledApiServices, disabledApiServices ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


for _, apiService := range apiServices {
if _, err := c.apiservicelister.Get(apiService.Name); err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also consider a case in which a service has been already marked for deletion but hasn't been deleted?
(I think that would require checking the DeletionTimestamp field)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be idempotent. Deleting an object that's in the process of deletion should be a no-op. While Get should still return the object without an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is idempotent but we will hammer the API unnecessary. Does it make sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this will reduce the toll. On the other hand I wonder if checking the DeletionTimestamp is something we already do in other controllers or whether this is a novel improvement. If the latter, I prefer to refactor the code more and implement generic methods like DeleteOnlyIfPresent. Probably automatically generated for all our APIs. So we don't re-invent the check all over the place. Nevertheless, as part of another PR. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if we have an example in our operators. Nevertheless, checking for the DeletionTimestamp is easy to implement. Will do.


disErr := c.syncDisabledAPIServices(ctx, disabledApiServices)

ready, err := c.preconditionForEnabledAPIServices(enabledApiServices)
Copy link
Contributor

@p0lyn0mial p0lyn0mial Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if both syncDisabledAPIServices and preconditionForEnabledAPIServices return some error?
do we want to report errors from syncDisabledAPIServices ?
i think the errors will be ignored (not reported) as it is right now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If syncDisabledAPIServices returns an error, there are two cases that can happen:

  • preconditionForEnabledAPIServices returns err != nil or !ready in which case the APIServicesAvailable condition gets reported as false. Making error returned by syncDisabledAPIServices kinda obsolete since the APIServicesAvailable is already false.
  • preconditionForEnabledAPIServices returns err == nil and ready in which case the error returned by syncDisabledAPIServices gets checked. Producing APIServicesAvailable condition set to false. Just with a different error string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does setting APIServicesAvailable when syncDisabledAPIServices returns some errors make sense ?

If all services were available and we failed to remove one service would you expect to find the error in APIServicesAvailable ? Maybe we should report it as a separate condition? (Especially that we already made distinction between enabled/disabled on the API level)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation does not make the reported condition worse. Right now it will get reported with Reason: Error. Whether that happens before or after preconditionForEnabledAPIServices makes no difference. Failing to remove or create an APIservice object falls under the same category of an error. If we decide to distinguish between error related to removing and creating an APIService, we should probably introduce a new condition for both Failed to create and Failed to check discovery as well (implemented under syncEnabledAPIServices).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether there's a value in reporting non-nil error for return c.syncDisabledAPIServices(ctx, append(enabledApiServices, disabledApiServices...)) through a condition under:

	switch operatorConfigSpec.ManagementState {
		...
	case operatorsv1.Removed:
		enabledApiServices, disabledApiServices, err := c.getAPIServicesToManageFn()
		if err != nil {
			return err
		}
		return c.syncDisabledAPIServices(ctx, append(enabledApiServices, disabledApiServices...))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we want to go Degraded. I think that:

Type:    "APIServicesDeleted",
Status:  operatorv1.ConditionFalse,
Reason:  "DisabledAPIServicesPresent",
Message: err.Error(),

should be fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, We could split the APIServiceController.sync method into two flows for simplicity. I think I would do:

func (c *APIServiceController) sync(ctx context.Context, syncCtx factory.SyncContext) error {
	operatorConfigSpec, _, _, err := c.operatorClient.GetOperatorState()
	if err != nil {
		return err
	}

	switch operatorConfigSpec.ManagementState {
	case operatorsv1.Managed:
	case operatorsv1.Unmanaged:
		return nil
	case operatorsv1.Removed:
		enabledApiServices, disabledApiServices, err := c.getAPIServicesToManageFn()
		if err != nil {
			return err
		}
		return c.syncDisabledAPIServices(ctx, append(enabledApiServices, disabledApiServices...))
	default:
		syncCtx.Recorder().Warningf("ManagementStateUnknown", "Unrecognized operator management state %q", operatorConfigSpec.ManagementState)
		return nil
	}

       var errors []err
       // or if we rename syncDisabledAPIServices to something else then
       // we could call it syncDisabledAPIServices (which would also set the new status)
       err = c.reconcileDisabledServices()
       if err != nil { // append to errors }
       
       err = c.syncEnabledServices()
       if err != nil { // append to errors }

      return errors
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would syncEnabledServices include calling the preconditionForEnabledAPIServices as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I think so :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as an optimisation step we could collect conditions from the individual sync* methods and update the config just once :)

@ingvagabund
Copy link
Member Author

/label tide/merge-method-squash

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 1, 2023
@ingvagabund ingvagabund force-pushed the apiservice-controller-allow-to-disable-apiservices branch from 8d1b529 to fcd73e9 Compare June 1, 2023 13:40
@ingvagabund
Copy link
Member Author

/unlabel tide/merge-method-squash

@ingvagabund
Copy link
Member Author

/remove-label tide/merge-method-squash

@ingvagabund
Copy link
Member Author

/shrug

@openshift-ci openshift-ci bot added ¯\_(ツ)_/¯ ¯\\\_(ツ)_/¯ and removed tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. labels Jun 1, 2023
@ingvagabund ingvagabund force-pushed the apiservice-controller-allow-to-disable-apiservices branch 3 times, most recently from 05aff02 to ea20618 Compare June 15, 2023 11:23
@ingvagabund ingvagabund force-pushed the apiservice-controller-allow-to-disable-apiservices branch from ea20618 to 8a3367e Compare June 15, 2023 11:28

for _, apiService := range apiServices {
if apiServiceObj, err := c.apiservicelister.Get(apiService.Name); err == nil {
if apiServiceObj.DeletionTimestamp != nil {
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically objects can stuck in deletion (finalisers) for a long time.
Do we want to handle this case somehow (log, set something on the condition) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syncEnabledAPIServices returns an error when the corresponding api service Available condition is not ready. Updated the code to return error when a corresponding service is not deleted yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update: we decided to log instead of returning the error

if err != nil {
if _, _, updateErr := v1helpers.UpdateStatus(ctx, c.operatorClient, v1helpers.UpdateConditionFn(operatorv1.OperatorCondition{
Type: "APIServicesAvailable",
Status: operatorv1.ConditionFalse,
Reason: "ErrorCheckingPrecondition",
Message: err.Error(),
})); updateErr != nil {
return errors.NewAggregate([]error{err, updateErr})
}), v1helpers.UpdateConditionFn(conditionAPIServicesDegraded)); updateErr != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to complicate the code but it looks like it might be easy to forget which conditions should be updated. I like the way it was done in the workload controller (

var updateGenerationFn func(newStatus *operatorv1.OperatorStatus) error
). Maybe we could do something similar here?

I think it boils down to:

  1. define conditions with default values
  2. register an update function for conditions in the defer statement
  3. when you need to end processing, just set the condition and return (it will be updated in the defer statement).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I did not find any use for updateGenerationFn though.

@ingvagabund ingvagabund force-pushed the apiservice-controller-allow-to-disable-apiservices branch from 8a3367e to ac3fc9b Compare June 19, 2023 11:43
Status: operatorv1.ConditionTrue,
}

if syncEnabledAPIServicesErr != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should check if for any error or always for simplicity, wdyt?

// the next process will perform the checks immediately after the startup
select {
case <-ctx.Done():
nerr := fmt.Errorf("the operator is shutting down, skipping updating availability of the aggreaged APIs, err = %v", syncEnabledAPIServicesErr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just adjust comment so that it is more generic

@ingvagabund ingvagabund force-pushed the apiservice-controller-allow-to-disable-apiservices branch from ac3fc9b to 75c7d51 Compare June 19, 2023 14:04
@p0lyn0mial
Copy link
Contributor

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 19, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 19, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ingvagabund, p0lyn0mial

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 19, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 19, 2023

@ingvagabund: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@p0lyn0mial
Copy link
Contributor

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 19, 2023
@p0lyn0mial
Copy link
Contributor

just one more thing, it looks like we are going to always set the new condition, even if the toRemoveServices was empty.

@p0lyn0mial
Copy link
Contributor

just one more thing, it looks like we are going to always set the new condition, even if the toRemoveServices was empty.

it seems that setting the new condition conditionally would make the code more complicated. Besides on clusters with this feature disabled it will always be set to false.

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 19, 2023
@openshift-merge-robot openshift-merge-robot merged commit 654ebf2 into openshift:master Jun 19, 2023
4 checks passed
@ingvagabund ingvagabund deleted the apiservice-controller-allow-to-disable-apiservices branch June 19, 2023 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ¯\_(ツ)_/¯ ¯\\\_(ツ)_/¯
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants