
Enable cleaning up of hosted cluster cloud resources on destroy #1672

Merged · 1 commit · Aug 16, 2022

Conversation

@csrwng (Contributor) commented Aug 11, 2022

What this PR does / why we need it:
Enables the removal of cloud resources created during the lifetime of a
guest cluster. These include:

  • registry storage
  • ingress load balancer and dns records
  • load balancer services
  • persistent volumes

Removal of these resources will occur when the following annotation is
present:
hypershift.openshift.io/cleanup-cloud-resources: "true"

The CLI adds this annotation to a HostedCluster when invoking the
destroy command with the 'destroy-cloud-resources' flag:

hypershift destroy cluster aws --destroy-cloud-resources
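
A minimal sketch of how the flag could translate into the annotation (illustrative, assuming controller-runtime and the hypershift v1alpha1 API types; `annotateForCleanup` is a hypothetical helper, not the PR's actual code):

```go
package example

import (
	"context"

	hyperv1 "github.com/openshift/hypershift/api/v1alpha1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const cleanupAnnotation = "hypershift.openshift.io/cleanup-cloud-resources"

// annotateForCleanup marks a HostedCluster so that cloud resources created
// during the life of the guest cluster are torn down as part of destroy.
func annotateForCleanup(ctx context.Context, c client.Client, hc *hyperv1.HostedCluster) error {
	if hc.Annotations == nil {
		hc.Annotations = map[string]string{}
	}
	hc.Annotations[cleanupAnnotation] = "true"
	return c.Update(ctx, hc)
}
```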

Which issue(s) this PR fixes:
Fixes #HOSTEDCP-486

Checklist

  • Subject and description added to both commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@netlify bot commented Aug 11, 2022

Deploy Preview for hypershift-docs ready!

Latest commit: d9df84a
Latest deploy log: https://app.netlify.com/sites/hypershift-docs/deploys/62f536016479ee00088adb30
Deploy Preview: https://deploy-preview-1672--hypershift-docs.netlify.app

@openshift-ci bot added the 'approved' label (indicates a PR has been approved by an approver from all required OWNERS files) on Aug 11, 2022
```go
Operations: []admissionregistrationv1.OperationType{
	admissionregistrationv1.Create,
},
Rule: admissionregistrationv1.Rule{
```
Contributor:
Can this not be a second rule in the first validating webhook?

Contributor Author (csrwng):
yes, fixing
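
For reference, a single ValidatingWebhook accepts a list of rules, so the new check can live alongside the existing one; a minimal sketch (the API groups and resources here are hypothetical):

```go
package example

import (
	admissionregistrationv1 "k8s.io/api/admissionregistration/v1"
)

// exampleWebhookRules shows two RuleWithOperations entries on one webhook,
// rather than a second webhook carrying a single rule each.
func exampleWebhookRules() []admissionregistrationv1.RuleWithOperations {
	return []admissionregistrationv1.RuleWithOperations{
		{
			Operations: []admissionregistrationv1.OperationType{admissionregistrationv1.Create},
			Rule: admissionregistrationv1.Rule{
				APIGroups:   []string{"example.io"}, // hypothetical
				APIVersions: []string{"v1"},
				Resources:   []string{"widgets"},
			},
		},
		{
			Operations: []admissionregistrationv1.OperationType{admissionregistrationv1.Update},
			Rule: admissionregistrationv1.Rule{
				APIGroups:   []string{"example.io"}, // hypothetical
				APIVersions: []string{"v1"},
				Resources:   []string{"gadgets"},
			},
		},
	}
}
```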

```go
}
// If storage has already been removed, nothing to do
// When the registry operator has been removed, management state in status is currently cleared.
if registryConfig.Status.Storage.ManagementState == "" || registryConfig.Status.Storage.ManagementState == "Removed" {
```
Contributor:
Nit: Why not use the const like in the CreateOrUpdate call below?

Contributor Author (csrwng):
Because this field is not typed; it's only a string. The other field is an enum that has constants.
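
To illustrate the distinction (assuming the openshift/api types): Spec.ManagementState is the typed operatorv1.ManagementState enum, while Status.Storage.ManagementState is a bare string, so the status check compares literals:

```go
package example

import (
	imageregistryv1 "github.com/openshift/api/imageregistry/v1"
	operatorv1 "github.com/openshift/api/operator/v1"
)

// storageRemoved mirrors the check above. The status field is a plain
// string, so the typed constant can only be used via a string conversion.
func storageRemoved(cfg *imageregistryv1.Config) bool {
	state := cfg.Status.Storage.ManagementState
	return state == "" || state == string(operatorv1.Removed)
}
```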

```go
for i := range pvcs.Items {
	pvc := &pvcs.Items[i]
	log.Info("Deleting persistent volume claim", "name", client.ObjectKeyFromObject(pvc).String())
	if err := r.client.Delete(ctx, pvc); err != nil {
```
Contributor:
Why not gate this with a DeletionTimestamp check like with the services?

And shouldn't it be possible to avoid the duplicate code for services, PVCs, and PVs, maybe with a configurable filter function?
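
A sketch of what such a shared helper might look like (hypothetical, assuming controller-runtime and Go 1.18+ generics; not the PR's actual code):

```go
package example

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	utilerrors "k8s.io/apimachinery/pkg/util/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteMatching deletes the given objects, skipping anything the filter
// rejects or that already has a deletion timestamp, and aggregates errors.
func deleteMatching[T client.Object](ctx context.Context, c client.Client, objs []T, skip func(T) bool) error {
	var errs []error
	for _, obj := range objs {
		if skip(obj) || !obj.GetDeletionTimestamp().IsZero() {
			continue
		}
		if err := c.Delete(ctx, obj); err != nil && !apierrors.IsNotFound(err) {
			errs = append(errs, err)
		}
	}
	return utilerrors.NewAggregate(errs)
}
```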

```go
if len(errs) > 0 {
	return false, errors.NewAggregate(errs)
}
return false, nil
```
Contributor:
Wouldn't it make sense to track if we actually deleted anything and return that negated, just like we do with Services?

Contributor Author (csrwng):
No, in this case, we're looking for the existence of PVs ... those will get removed once the referring pods/pvcs have been removed. However, if we got here, the initial check for PVs indicated that we still have some.
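
A minimal sketch of that existence-style check (illustrative, assuming a controller-runtime client):

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// persistentVolumesGone reports "done" only once no PVs remain; the PVs
// themselves disappear after the referring pods and PVCs are removed.
func persistentVolumesGone(ctx context.Context, c client.Client) (bool, error) {
	var pvs corev1.PersistentVolumeList
	if err := c.List(ctx, &pvs); err != nil {
		return false, err
	}
	return len(pvs.Items) == 0, nil
}
```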


```go
// Needed for resource cleanup
&corev1.Service{}: allSelector,
&corev1.Pod{}:     allSelector,
```
Contributor:
Uhm that really sucks, especially the pods? Can we maybe use an API reading client just for the new deletion path?

Contributor Author (csrwng):
Maybe just for the pods? I do want to watch Services because otherwise I wouldn't necessarily get requeued when they go away.
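
For reference, controller-runtime can build a non-caching client that reads straight from the API server, which keeps Pods out of the manager's watch cache; a sketch (function name is illustrative):

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// listPodsUncached lists pods with a direct API client, so the manager's
// cache never has to start a Pod watch.
func listPodsUncached(ctx context.Context, mgr manager.Manager, namespace string) (*corev1.PodList, error) {
	apiClient, err := client.New(mgr.GetConfig(), client.Options{Scheme: mgr.GetScheme()})
	if err != nil {
		return nil, err
	}
	pods := &corev1.PodList{}
	if err := apiClient.List(ctx, pods, client.InNamespace(namespace)); err != nil {
		return nil, err
	}
	return pods, nil
}
```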

@alvaroaleman (Contributor):
@csrwng also, can we enable this for the e2e tests?

@csrwng (Contributor Author) commented Aug 12, 2022

@alvaroaleman addressed your comments. Also made it the default for e2e clusters.

```go
}
log.Info("Ensuring image registry storage is removed")
removed, err := r.ensureImageRegistryStorageRemoved(ctx)
if err != nil || !removed {
```
Contributor:
Wouldn't it be better not to block cleanup of the other resource types on the previous resource type, and to return before the status condition setup only if there are any remaining resources?
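
A sketch of the suggested restructuring (hypothetical helper, not the PR's actual code): run every cleanup step, aggregate errors, and decide "all removed" at the end instead of returning at the first type that still has resources:

```go
package example

import (
	"context"

	utilerrors "k8s.io/apimachinery/pkg/util/errors"
)

// runCleanupSteps attempts every step even when an earlier one reports
// remaining resources, then returns the combined result.
func runCleanupSteps(ctx context.Context, steps []func(context.Context) (bool, error)) (bool, error) {
	allRemoved := true
	var errs []error
	for _, step := range steps {
		removed, err := step(ctx)
		if err != nil {
			errs = append(errs, err)
		}
		if !removed {
			allRemoved = false
		}
	}
	return allRemoved, utilerrors.NewAggregate(errs)
}
```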

pv("bar"), pvc("bar"),
},
verify: verifyPVCsRemoved,
},
Contributor:
Could we add a pod test here and only put the pods into the uncachedClient? That aspect is really important.
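
A sketch of what such a test might look like against controller-runtime's fake client, standing in for the uncached API reader (deleteAllPods is an assumed helper, not the PR's actual cleanup entry point):

```go
package example

import (
	"context"
	"testing"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/fake"
)

// deleteAllPods is an assumed stand-in for the cleanup path under test.
func deleteAllPods(ctx context.Context, c client.Client) error {
	var pods corev1.PodList
	if err := c.List(ctx, &pods); err != nil {
		return err
	}
	for i := range pods.Items {
		if err := c.Delete(ctx, &pods.Items[i]); err != nil {
			return err
		}
	}
	return nil
}

func TestPodsRemovedViaUncachedClient(t *testing.T) {
	scheme := runtime.NewScheme()
	_ = corev1.AddToScheme(scheme)
	uncached := fake.NewClientBuilder().WithScheme(scheme).WithObjects(
		&corev1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "foo", Namespace: "test"}},
	).Build()

	if err := deleteAllPods(context.Background(), uncached); err != nil {
		t.Fatal(err)
	}
	var pods corev1.PodList
	if err := uncached.List(context.Background(), &pods); err != nil {
		t.Fatal(err)
	}
	if len(pods.Items) != 0 {
		t.Fatalf("expected 0 pods after cleanup, found %d", len(pods.Items))
	}
}
```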

@csrwng (Contributor Author) commented Aug 15, 2022

@alvaroaleman addressed your latest comments

@openshift-ci bot added the 'lgtm' label (indicates that a PR is ready to be merged) on Aug 16, 2022
@openshift-ci bot commented Aug 16, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, csrwng

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:
  • OWNERS [alvaroaleman,csrwng]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot:
/retest-required

Remaining retests: 2 against base HEAD bf26914 and 8 for PR HEAD 7eebb6f in total

@alvaroaleman (Contributor):

/retest-required

@openshift-ci-robot:
/retest-required

Remaining retests: 1 against base HEAD bf26914 and 7 for PR HEAD 7eebb6f in total

@openshift-ci bot commented Aug 16, 2022

@csrwng: all tests passed!

