New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WRKLDS-728: controller/apiservice: allow to disable/delete an API service resources #1534
WRKLDS-728: controller/apiservice: allow to disable/delete an API service resources #1534
Conversation
@ingvagabund: This pull request references WRKLDS-728 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
// GetAPIServicesToMangeFunc provides a full list of managed APIService items. | ||
// The list needs to always contain all the managed APIServices so only managed | ||
// objects are removed without removing user-created/unmanaged objects. | ||
type GetAPIServicesToMangeFunc func() ([]ManagedAPIService, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this function couldn't return services that are enabled :) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also this is not a backward-compatible change which will break users of this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how will you know if an api service is not enabled ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this function couldn't return services that are enabled :) ?
The controller needs a list of APIServices that are disabled. So they can be deleted if present.
how will you know if an api service is not enabled ?
Configurable by anyone who consumes the APIService controller. Atm, the OA operator constructs the list based on enabled capabilities.
also the title of this pr is a bit misleading for me, are we actually going to delete something? |
curr := coordinate{namespace: apiService.Spec.Service.Namespace, name: apiService.Spec.Service.Name} | ||
for _, managedAPIService := range apiServices { | ||
if !managedAPIService.Enabled { | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we are skipping disabled services I would simply return only enabled services in GetAPIServicesToMangeFunc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both newEndpointPrecondition
and checkDiscoveryForByAPIServices
can be probably given only a list of enabled services.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I would prefer that :)
for _, managedApiService := range apiServices { | ||
// Remove disabled APIService | ||
if !managedApiService.Enabled { | ||
if err := c.apiregistrationv1Client.APIServices().Delete(ctx, managedApiService.APIService.Name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, sorry, i was reading top-to-bottom, i think this is the place that will remove an api service
// GetAPIServicesToMangeFunc provides a full list of managed APIService items. | ||
// The list needs to always contain all the managed APIServices so only managed | ||
// objects are removed without removing user-created/unmanaged objects. | ||
type GetAPIServicesToMangeFunc func() ([]ManagedAPIService, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you considered adding a new function which will return services to be removed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or changing the signature of this method to return enabled []*apiregistrationv1.APIService, disabled []*apiregistrationv1.APIService, error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be an alternative. In any case, still a breaking change.
type ManagedAPIService struct { | ||
// Enabled signals whether an APIService is enabled. | ||
// Any disabled APIService is removed from a cluster. | ||
Enabled bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this field be used by all clients or only for some?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once such struct is created, it's intended to be used only by the controller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I wanted to ask was if all operands will be required to provide this information.
In general my preference would be to check how many operators consume this function.
If the number is low then I would change the signature of the method to enabled []*apiregistrationv1.APIService, disabled []*apiregistrationv1.APIService, error)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on https://github.com/search?q=org%3Aopenshift%20NewAPIServiceController&type=code NewAPIServiceController
is used only once.
Based on https://github.com/search?q=org%3Aopenshift%20WithAPIServiceController&type=code WithAPIServiceController
is used only by openshift-apiserver and cluster authentication operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I wanted to ask was if all operands will be required to provide this information.
Yeah. They have to. https://github.com/openshift/cluster-authentication-operator/blob/1b2c022034879e09e324bbdc50b84b78ce8de4ee/pkg/operator/starter.go#L793-L798 builds the list explicitly as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative would be to provide a filter function that if not provide will include every api service. On the other hand such a filter will get invoked all over again instead of constructing a list of enabled/disabled service only once at the beginning. The list of enabled/disabled api services is expected to be static or changed very rarely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, since there are only two operators using this controller I would:
- Change the signature of the method to
GetAPIServicesToMangeFunc func() (enabled []*apiregistrationv1.APIService, disabled []*apiregistrationv1.APIService, error)
- Rename
precondition
topreconditionForEnabledAPIServices
- Split the
sync
method into two flows, one for reconcilingenabled
services and the second for reconcilingdisabled
services.
Does it make sense ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. It looks better now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have added a few more comments/questions, PTAL.
faf15e3
to
3ca87e7
Compare
@@ -118,7 +122,7 @@ func (c *APIServiceController) sync(ctx context.Context, syncCtx factory.SyncCon | |||
return err | |||
} | |||
|
|||
err = c.syncAPIServices(ctx, apiServices, syncCtx.Recorder()) | |||
err = c.syncAPIServices(ctx, enabledApiServices, disabledApiServices, syncCtx.Recorder()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a separate method for reconciling disalbedApiServices
and calling it before preconditionForEnabledAPIServices
(line 101)?
Having two separate methods would allow us to reconcile enabledApiServices
and disabledApiServices
independently. I think we always want to reconcile both and aggregate the errors. What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
errs := []error{} | ||
var availableConditionMessages []string | ||
|
||
for _, apiService := range apiServices { | ||
for _, apiService := range disabledApiServices { | ||
if err := c.apiregistrationv1Client.APIServices().Delete(ctx, apiService.Name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should get an APIService from the cache
first as that would allow us to put less pressure on the server.
We should issue a live request only when we know the resource must be deleted.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
type GetAPIServicesToMangeFunc func() ([]*apiregistrationv1.APIService, error) | ||
// GetAPIServicesToMangeFunc provides a two list of managed APIService items. | ||
// Both lists need to always contain all the managed APIServices so only managed | ||
// objects are removed without removing user-created/unmanaged objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean by without removing user-created/unmanaged objects
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment updated. The controller needs to avoid deleting APIServices that are not part of the OpenShift API. I.e. the controller needs to explicitly know which APIServices are disabled instead of deleting all not enabled APIServices.
/retest-required |
|
||
operatorClient v1helpers.OperatorClient | ||
kubeClient kubernetes.Interface | ||
apiregistrationv1Client apiregistrationv1client.ApiregistrationV1Interface | ||
apiservicelister apiregistrationv1lister.APIServiceLister |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't that strange that this controller didn't use apiservicelister
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not taken into account when #667 was reviewed.
for _, apiService := range apiServices { | ||
if err := c.apiregistrationv1Client.APIServices().Delete(ctx, apiService.Name, metav1.DeleteOptions{}); err != nil { | ||
for _, apiService := range append(enabledApiServices, disabledApiServices...) { | ||
if _, err := c.apiservicelister.Get(apiService.Name); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about simply calling syncDisabledAPIServices
with combined enabledApiServices, disabledApiServices
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
for _, apiService := range apiServices { | ||
if _, err := c.apiservicelister.Get(apiService.Name); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also consider a case in which a service has been already marked for deletion but hasn't been deleted?
(I think that would require checking the DeletionTimestamp
field)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be idempotent. Deleting an object that's in the process of deletion should be a no-op. While Get
should still return the object without an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is idempotent but we will hammer the API unnecessary. Does it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this will reduce the toll. On the other hand I wonder if checking the DeletionTimestamp
is something we already do in other controllers or whether this is a novel improvement. If the latter, I prefer to refactor the code more and implement generic methods like DeleteOnlyIfPresent. Probably automatically generated for all our APIs. So we don't re-invent the check all over the place. Nevertheless, as part of another PR. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is novel approach. For example, it is being used in:
I can try to find more examples if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if we have an example in our operators. Nevertheless, checking for the DeletionTimestamp
is easy to implement. Will do.
|
||
disErr := c.syncDisabledAPIServices(ctx, disabledApiServices) | ||
|
||
ready, err := c.preconditionForEnabledAPIServices(enabledApiServices) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if both syncDisabledAPIServices
and preconditionForEnabledAPIServices
return some error?
do we want to report errors from syncDisabledAPIServices
?
i think the errors will be ignored (not reported) as it is right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If syncDisabledAPIServices
returns an error, there are two cases that can happen:
preconditionForEnabledAPIServices
returnserr != nil
or!ready
in which case theAPIServicesAvailable
condition gets reported as false. Making error returned bysyncDisabledAPIServices
kinda obsolete since theAPIServicesAvailable
is already false.preconditionForEnabledAPIServices
returnserr == nil
andready
in which case the error returned bysyncDisabledAPIServices
gets checked. ProducingAPIServicesAvailable
condition set to false. Just with a different error string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does setting APIServicesAvailable
when syncDisabledAPIServices returns some errors
make sense ?
If all services were available and we failed to remove one service would you expect to find the error in APIServicesAvailable
? Maybe we should report it as a separate condition? (Especially that we already made distinction between enabled/disabled on the API level)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation does not make the reported condition worse. Right now it will get reported with Reason: Error
. Whether that happens before or after preconditionForEnabledAPIServices
makes no difference. Failing to remove or create an APIservice object falls under the same category of an error. If we decide to distinguish between error related to removing and creating an APIService, we should probably introduce a new condition for both Failed to create
and Failed to check discovery
as well (implemented under syncEnabledAPIServices
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether there's a value in reporting non-nil error for return c.syncDisabledAPIServices(ctx, append(enabledApiServices, disabledApiServices...))
through a condition under:
switch operatorConfigSpec.ManagementState {
...
case operatorsv1.Removed:
enabledApiServices, disabledApiServices, err := c.getAPIServicesToManageFn()
if err != nil {
return err
}
return c.syncDisabledAPIServices(ctx, append(enabledApiServices, disabledApiServices...))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we want to go Degraded
. I think that:
Type: "APIServicesDeleted",
Status: operatorv1.ConditionFalse,
Reason: "DisabledAPIServicesPresent",
Message: err.Error(),
should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, We could split the APIServiceController.sync
method into two flows for simplicity. I think I would do:
func (c *APIServiceController) sync(ctx context.Context, syncCtx factory.SyncContext) error {
operatorConfigSpec, _, _, err := c.operatorClient.GetOperatorState()
if err != nil {
return err
}
switch operatorConfigSpec.ManagementState {
case operatorsv1.Managed:
case operatorsv1.Unmanaged:
return nil
case operatorsv1.Removed:
enabledApiServices, disabledApiServices, err := c.getAPIServicesToManageFn()
if err != nil {
return err
}
return c.syncDisabledAPIServices(ctx, append(enabledApiServices, disabledApiServices...))
default:
syncCtx.Recorder().Warningf("ManagementStateUnknown", "Unrecognized operator management state %q", operatorConfigSpec.ManagementState)
return nil
}
var errors []err
// or if we rename syncDisabledAPIServices to something else then
// we could call it syncDisabledAPIServices (which would also set the new status)
err = c.reconcileDisabledServices()
if err != nil { // append to errors }
err = c.syncEnabledServices()
if err != nil { // append to errors }
return errors
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would syncEnabledServices
include calling the preconditionForEnabledAPIServices
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I think so :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as an optimisation step we could collect conditions
from the individual sync*
methods and update the config just once :)
/label tide/merge-method-squash |
8d1b529
to
fcd73e9
Compare
/unlabel tide/merge-method-squash |
/remove-label tide/merge-method-squash |
/shrug |
05aff02
to
ea20618
Compare
ea20618
to
8a3367e
Compare
|
||
for _, apiService := range apiServices { | ||
if apiServiceObj, err := c.apiservicelister.Get(apiService.Name); err == nil { | ||
if apiServiceObj.DeletionTimestamp != nil { | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically objects can stuck in deletion (finalisers) for a long time.
Do we want to handle this case somehow (log, set something on the condition) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syncEnabledAPIServices
returns an error when the corresponding api service Available condition is not ready. Updated the code to return error when a corresponding service is not deleted yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update: we decided to log instead of returning the error
if err != nil { | ||
if _, _, updateErr := v1helpers.UpdateStatus(ctx, c.operatorClient, v1helpers.UpdateConditionFn(operatorv1.OperatorCondition{ | ||
Type: "APIServicesAvailable", | ||
Status: operatorv1.ConditionFalse, | ||
Reason: "ErrorCheckingPrecondition", | ||
Message: err.Error(), | ||
})); updateErr != nil { | ||
return errors.NewAggregate([]error{err, updateErr}) | ||
}), v1helpers.UpdateConditionFn(conditionAPIServicesDegraded)); updateErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to complicate the code but it looks like it might be easy to forget which conditions should be updated. I like the way it was done in the workload controller (
var updateGenerationFn func(newStatus *operatorv1.OperatorStatus) error |
I think it boils down to:
- define conditions with default values
- register an update function for conditions in the defer statement
- when you need to end processing, just set the condition and return (it will be updated in the defer statement).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I did not find any use for updateGenerationFn
though.
8a3367e
to
ac3fc9b
Compare
Status: operatorv1.ConditionTrue, | ||
} | ||
|
||
if syncEnabledAPIServicesErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should check if for any error or always for simplicity, wdyt?
// the next process will perform the checks immediately after the startup | ||
select { | ||
case <-ctx.Done(): | ||
nerr := fmt.Errorf("the operator is shutting down, skipping updating availability of the aggreaged APIs, err = %v", syncEnabledAPIServicesErr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just adjust comment so that it is more generic
ac3fc9b
to
75c7d51
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ingvagabund, p0lyn0mial The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@ingvagabund: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/hold |
just one more thing, it looks like we are going to always set the new condition, even if the |
it seems that setting the new condition conditionally would make the code more complicated. Besides on clusters with this feature disabled it will always be set to /hold cancel |
SSIA
Needed by openshift/cluster-openshift-apiserver-operator#532