title | authors | reviewers | approvers | creation-date | last-updated | status
---|---|---|---|---|---|---
use-manifest-annotation-for-object-removal | | | | 2020-08-03 | 2020-09-29 | implementable
- Enhancement is implementable
- Design details are appropriately documented from clear requirements
- Test plan is defined
- Graduation criteria for dev preview, tech preview, GA
- User-facing documentation is created in openshift-docs
Currently CVO managed object removal is handled by jobs. Jobs have many immutable components, so they aren't a great match for CVO managed manifests. This enhancement replaces that approach with a new manifest annotation requesting the CVO delete the in-cluster object instead of creating/updating it. This will provide a more straightforward way for engineering to remove content.
There should be a straightforward way to remove CVO managed objects.
- Developers will be able to remove any of the currently managed CVO objects by modifying an existing manifest and adding the new delete annotation.
- This enhancement does not add any new objects for management by CVO, therefore the new delete annotation only applies to the currently supported manifests and any that may be added in the future.
- This enhancement neither adds nor requires any application specific delete logic within CVO.
When the following annotation appears in a CVO supported manifest and is set to "true" the associated object will be removed from the cluster by the CVO.
Values other than "true" will result in a CVO failure. However, this should never occur in a release, since any value other than "true" will result in a CVO CI failure.
apiVersion: apps/v1
...
metadata:
  ...
  annotations:
    release.openshift.io/delete: "true"
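The annotation rule above can be sketched as a small check. This is an illustration only, not CVO code: the function name `delete_requested` and the dict-shaped manifest are assumptions made for the example.

```python
DELETE_ANNOTATION = "release.openshift.io/delete"


def delete_requested(manifest: dict) -> bool:
    """Return True if the manifest asks for deletion, False if the annotation is absent.

    Any value other than the string "true" raises ValueError, mirroring the
    "other values are a failure" rule described in the text.
    """
    annotations = (manifest.get("metadata") or {}).get("annotations") or {}
    if DELETE_ANNOTATION not in annotations:
        # No annotation: the manifest is handled by the normal create/update path.
        return False
    value = annotations[DELETE_ANNOTATION]
    if value != "true":
        raise ValueError(f'{DELETE_ANNOTATION} must be "true", got {value!r}')
    return True
```

A check of this shape could run both in CI, where it would reject a bad value before release, and at manifest-processing time.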
The existing CVO ordering scheme defined here will also be used for object removal. This both minimizes CVO code changes and provides a simple method of deleting multiple objects by reversing the order in which they were created. It is the developer's responsibility to ensure proper deletion ordering and to ensure that all items originally created by an object are deleted when that object is deleted. For example, an operator may have to be modified, or a new operator created, to take explicit actions to remove external resources. The modified or new operator would then be removed in a subsequent update.
Two delete finalization options are described below to help reach a consensus on the best approach.
Under both options, CVO will produce a warning if a previously removed resource reappears.
The first approach is to handle deletion requests the same way CVO handles create/update requests.
That is, in a non-blocking manner: CVO issues the initial request to delete an object, which kicks off resource finalization followed by resource removal.
CVO does not wait for actual resource removal but instead continues.
As CVO manifest processing continues, the object's .metadata.deletionTimestamp, set when the delete was originally requested, informs CVO that this object has already been processed for deletion.
CVO will report when a delete is initiated, report that the delete is ongoing when a manifest is processed again and found to have a deletion timestamp, and report completion once resource finalization finishes.
If an object cannot be successfully removed, CVO will set Upgradeable=false, which in turn blocks cluster update to the next minor release.
The advantage of this approach is that unfinalized manifests do not block the remainder of the current manifest graph's application. Also, since this approach is similar to current CVO manifest graph processing it will most likely be easier to implement and maintain.
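A minimal sketch of the non-blocking flow just described, under stated assumptions: the `cluster` dict is a hypothetical in-memory stand-in for the API server, and `process_delete` and `DeleteStatus` are names invented for this example rather than CVO identifiers.

```python
from enum import Enum


class DeleteStatus(Enum):
    INITIATED = "delete initiated"
    ONGOING = "delete ongoing"
    COMPLETED = "delete completed"


def process_delete(cluster: dict, key: str, now: str) -> DeleteStatus:
    """One non-blocking pass over a manifest that carries the delete annotation.

    `cluster` maps an object key to that object's metadata dict.
    """
    obj = cluster.get(key)
    if obj is None:
        # The object is gone, so finalization has finished.
        return DeleteStatus.COMPLETED
    if obj.get("deletionTimestamp"):
        # A delete was requested on an earlier pass; do not issue it again.
        return DeleteStatus.ONGOING
    # First pass over this object: request deletion and continue without waiting.
    obj["deletionTimestamp"] = now
    return DeleteStatus.INITIATED
```

Each pass returns immediately, so an object stuck in finalization only changes what is reported and never stalls the rest of the manifest graph.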
The second approach is to handle deletion requests synchronously whereby CVO will wait for confirmation that the given object has been removed before continuing through the manifest graph.
The advantage of this approach is that it is deterministic - CVO will block until the given object is removed or we give up and fail the upgrade. But with this approach unfinalized manifests may block the remainder of the current manifest graph's application.
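For contrast, the blocking option can be sketched as a poll loop with a deadline. The callbacks `request_delete` and `exists`, and the timeout and poll parameters, are hypothetical; the proposal does not specify how long CVO would wait before giving up.

```python
import time
from typing import Callable


def delete_and_wait(request_delete: Callable[[], None],
                    exists: Callable[[], bool],
                    timeout_s: float,
                    poll_s: float = 0.01) -> bool:
    """Request deletion, then block until the object is gone or the timeout expires.

    Returns True once the object no longer exists, False if we gave up.
    """
    request_delete()
    deadline = time.monotonic() + timeout_s
    while exists():
        if time.monotonic() >= deadline:
            # Giving up: the caller would fail the upgrade at this point.
            return False
        time.sleep(poll_s)
    return True
```

The loop makes the trade-off visible: nothing after this call runs until the object is removed or the deadline passes.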
The user is an OpenShift developer responsible for the development and maintenance of an OpenShift component. The following user stories provide guidance on how resources may be removed but this will vary depending on the component. Ultimately it is the developer's responsibility to ensure the removal works by thoroughly testing. In all cases, and as general guidance, an operator should never allow itself to be removed if the operator's operand has not been removed.
Remove the cluster-autoscaler-operator deployment. The existing cluster-autoscaler-operator deployment manifest 0000_50_cluster-autoscaler-operator_07_deployment.yaml is modified to contain the delete annotation:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler-operator
  namespace: openshift-machine-api
  annotations:
    release.openshift.io/delete: "true"
...
Additional manifest properties such as spec may be set if convenient (e.g. because you are looking to make a minimal change vs. a previous version of the manifest), but those properties have no effect on manifests with the delete annotation.
In release 4.5 two jobs, openshift-service-catalog-controller-manager-remover and openshift-service-catalog-apiserver-remover, were created to remove the Service Catalog. Now, for release 4.6, these jobs and all their supporting cluster objects must also be removed. This User Story shows how Service Catalog removal would have been executed had this enhancement been in place, and how the removal will be completed given its current state in 4.5.
The Service Catalog is composed of two components, the cluster-svcat-apiserver-operator and the cluster-svcat-controller-manager-operator. Each of these components uses manifests for creation/update of the component's required resources: namespace, roles, operator deployment, etc. The cluster-svcat-apiserver-operator had the following associated manifests:
- 0000_50_cluster-svcat-apiserver-operator_00_namespace.yaml containing the openshift-service-catalog-apiserver-operator namespace. The deletion annotation would be added to this manifest.
- 0000_50_cluster-svcat-apiserver-operator_02_config.crd.yaml containing a cluster-scoped CRD. The deletion annotation would be added to this manifest.
- 0000_50_cluster-svcat-apiserver-operator_03_config.cr.yaml containing a cluster-scoped, create-only ServiceCatalogAPIServer. The deletion annotation would be added to this manifest.
- 0000_50_cluster-svcat-apiserver-operator_03_configmap.yaml containing a ConfigMap in the openshift-service-catalog-apiserver-operator namespace. This manifest would be dropped, because the ConfigMap would be removed as part of the namespace deletion.
- 0000_50_cluster-svcat-apiserver-operator_03_version-configmap.yaml containing another ConfigMap in the openshift-service-catalog-apiserver-operator namespace. This manifest would be dropped, because the ConfigMap would be removed as part of the namespace deletion.
- 0000_50_cluster-svcat-apiserver-operator_04_roles.yaml containing a cluster-scoped ClusterRoleBinding. The deletion annotation would be added to this manifest.
- 0000_50_cluster-svcat-apiserver-operator_05_serviceaccount.yaml containing a ServiceAccount in the openshift-service-catalog-apiserver-operator namespace. This manifest would be dropped, because the ServiceAccount would be removed as part of the namespace deletion.
- 0000_50_cluster-svcat-apiserver-operator_06_service.yaml containing a Service in the openshift-service-catalog-apiserver-operator namespace. This manifest would be dropped, because the Service would be removed as part of the namespace deletion.
- 0000_50_cluster-svcat-apiserver-operator_07_deployment.yaml containing a Deployment in the openshift-service-catalog-apiserver-operator namespace. This manifest would be dropped, because the Deployment would be removed as part of the namespace deletion.
- 0000_50_cluster-svcat-apiserver-operator_08_cluster-operator.yaml containing a cluster-scoped ClusterOperator. The deletion annotation would be added to this manifest.
- 0000_90_cluster-svcat-apiserver-operator_00_prometheusrole.yaml containing a Role in the openshift-service-catalog-apiserver-operator namespace. This manifest would be dropped, because the Role would be removed as part of the namespace deletion.
- 0000_90_cluster-svcat-apiserver-operator_01_prometheusrolebinding.yaml containing a RoleBinding in the openshift-service-catalog-apiserver-operator namespace. This manifest would be dropped, because the RoleBinding would be removed as part of the namespace deletion.
- 0000_90_cluster-svcat-apiserver-operator_02-operator-servicemonitor.yaml containing a ServiceMonitor in the openshift-service-catalog-apiserver-operator namespace. This manifest would be dropped, because the ServiceMonitor would be removed as part of the namespace deletion.
So the remaining manifests with deletion annotations would be the namespace and the cluster-scoped CRD, ServiceCatalogAPIServer, ClusterRoleBinding, and ClusterOperator. The ordering of the surviving manifests would not be particularly important, although keeping the namespace first would avoid removing the ClusterRoleBinding while the consuming Deployment was still running. Even in the event of racing deletions, it is hard to see how a Deployment whose ClusterRoleBinding had been removed could get up to much trouble. However, in situations like this where multiple deletions are required, it is up to the developer to name the manifests such that deletions occur in the correct order.
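The drop-or-annotate reasoning used for the Service Catalog manifests can be sketched as a simple classification. The record layout (`file`, `kind`, `namespace` keys) and the function name `plan_removal` are assumptions for illustration; a developer would still make this decision by hand for each manifest.

```python
from typing import Dict, List


def plan_removal(manifests: List[dict], doomed_namespace: str) -> Dict[str, str]:
    """Map each manifest filename to "annotate" or "drop".

    Objects living inside the namespace being deleted go away with it, so
    their manifests are dropped. The namespace itself and cluster-scoped
    objects need the delete annotation.
    """
    plan = {}
    for m in manifests:
        if m.get("namespace") == doomed_namespace:
            plan[m["file"]] = "drop"
        else:
            plan[m["file"]] = "annotate"
    return plan


NS = "openshift-service-catalog-apiserver-operator"
example = plan_removal(
    [
        {"file": "0000_50_cluster-svcat-apiserver-operator_00_namespace.yaml", "kind": "Namespace"},
        {"file": "0000_50_cluster-svcat-apiserver-operator_04_roles.yaml", "kind": "ClusterRoleBinding"},
        {"file": "0000_50_cluster-svcat-apiserver-operator_07_deployment.yaml", "kind": "Deployment", "namespace": NS},
    ],
    NS,
)
```

Here the namespace and ClusterRoleBinding manifests come out as "annotate" and the Deployment manifest as "drop", matching the list above.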
Similar handling would be required for the svcat-controller-manager operator.
If resources external to Kubernetes must be removed, the developer must provide the means to do so. This is expected to be done by modifying an operator to do the removals during its finalization. If operator modification for object removal is necessary, that operator would be deleted in a subsequent update. This eliminates the need for new, possibly complex, CVO logic to handle both an update and a delete of the same object.
If this enhancement had been implemented in 4.5 with the deletion manifests proposed above, the deletion manifests would have been preserved through 4.5.z and removed in 4.6. See the Upgrade / Downgrade Strategy section for details.
Because this enhancement was not implemented in 4.5, the cluster-svcat-apiserver-operator has the following associated manifests:
- 0000_50_cluster-svcat-apiserver-operator_00_namespace.yaml containing the openshift-service-catalog-removed namespace. The deletion annotation would be added to this manifest.
- 0000_50_cluster-svcat-apiserver-operator_04_roles.yaml containing a cluster-scoped ClusterRoleBinding. The deletion annotation would be added to this manifest.
- 0000_50_cluster-svcat-apiserver-operator_05_serviceaccount.yaml containing a ServiceAccount in the openshift-service-catalog-removed namespace. This manifest would be dropped, because the ServiceAccount would be removed as part of the namespace deletion.
- 0000_90_cluster-svcat-apiserver-operator_01_remover_job.yaml containing a Job in the openshift-service-catalog-removed namespace. This manifest would be dropped, because the Job would be removed as part of the namespace deletion.
Below is the flow for removing functionality that users might notice, like the web console.
- The first step is deprecating the functionality. During the deprecation release 4.y, the functionality should remain available, with the operator setting Upgradeable=False and linking release notes like these.
- Cluster Administrators must follow the linked release notes to opt in to the removal before updating to the next minor release in 4.(y+1). When the administrator opts in to the removal, the operator should stop setting Upgradeable=False.
- The operand may be removed when Cluster Administrators opt in to the removal, or it may be left running and be removed during the transition to the next minor release 4.(y+1).
- The update to the next minor release 4.(y+1) may use the new manifest annotation to remove the operator, and, if they have not already been removed, any remaining operand components.
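The Upgradeable gating in the flow above can be sketched as follows. This is a hypothetical illustration of what an operator might report during the deprecation release 4.y; the function name, parameters, and message text are assumptions, not an existing operator API.

```python
from typing import Optional


def upgradeable_condition(deprecated_feature_present: bool,
                          admin_opted_in: bool) -> Optional[dict]:
    """Return the blocking condition to report, or None if none should be set.

    While the deprecated functionality is present and the administrator has
    not opted in to its removal, minor updates are blocked.
    """
    if deprecated_feature_present and not admin_opted_in:
        return {
            "type": "Upgradeable",
            "status": "False",
            "message": "opt in to the removal described in the release notes before updating",
        }
    # Opt-in received (or nothing to remove): stop setting Upgradeable=False.
    return None
```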
CVO will use its existing logic to discover and identify given manifests. Once the manifest has been properly identified CVO will check for the delete annotation release.openshift.io/delete. If found and set to "true" the associated object will be removed from the cluster. All other annotation values will result in a CVO failure during CVO CI testing.
A common risk is human error resulting in an incorrect annotation being placed in a manifest. This type of error will be caught by CVO CI tests and should therefore never make it to the field.
Another risk is that object deletion results in incomplete object removal. This risk is mitigated by the fact that the object is no longer needed, hence the deletion request, so its continued presence, or partial presence, should not affect cluster operation.
- Existing resource unit tests will be expanded to include deletion.
- e2e testing on a test cluster will also be performed.
GA. When it works, we ship it.
Special consideration must be given to subsequent updates which may still contain manifests with the delete annotation.
These manifests will result in "object no longer exists" errors, assuming the current release had properly and fully removed the given objects.
This enhancement proposes that it is acceptable for subsequent z-level updates to still contain the delete manifests but minor level updates should not and therefore the handling of the delete error will differ between these update levels.
A z-level update will be allowed to proceed while a minor level update will be blocked.
This will be accomplished through the existing CVO precondition mechanism which already behaves in this manner with regard to z-level and minor updates.
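The z-level versus minor distinction can be sketched as a version comparison. This is only an illustration of the rule stated above, with hypothetical function names; it is not the CVO precondition mechanism itself, and it assumes versions of the form major.minor.patch.

```python
def is_minor_update(current: str, target: str) -> bool:
    """True when the major.minor part changes, False for a z-level update."""
    cur = tuple(int(p) for p in current.split(".")[:2])
    tgt = tuple(int(p) for p in target.split(".")[:2])
    return cur != tgt


def leftover_delete_manifest_blocks(current: str, target: str) -> bool:
    """Decide whether a delete manifest for an already removed object blocks the update.

    z-level updates may still carry delete manifests and proceed; minor-level
    updates should not carry them and are blocked.
    """
    return is_minor_update(current, target)
```

Under this sketch, an update from 4.5.3 to 4.5.7 proceeds despite the leftover delete manifest, while an update from 4.5.7 to 4.6.0 is blocked.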
No special consideration.
Major milestones in the life cycle of a proposal should be tracked in Implementation History.
The idea is to find the best form of an argument why this enhancement should not be implemented.
To be done once a finalization option is chosen.