OCPBUGS-23126: Fix a bug on deletion of a hostedcluster #3234

Merged
1 commit merged into openshift:main from fix-OCPBUGS-23126 on Dec 7, 2023

Conversation

nunnatsa
Contributor

@nunnatsa nunnatsa commented Nov 26, 2023

What this PR does / why we need it

A user destroying a HostedCluster can cause the HostedCluster to hang indefinitely if the destroy command times out during execution.

This happens because the hcp CLI places a finalizer on the HostedCluster during deletion and removes it only after waiting for some cleanup actions to complete. If the user cancels the hcp destroy cluster command (or the command times out) while the CLI is waiting for cleanup, the finalizer is never removed, and the HostedCluster hangs indefinitely with DeletionTimestamp != nil.

With this PR, the additional cleanup logic, including setting and removing the special finalizer, runs only for platforms that actually create additional resources. The PR also sets the HostedCluster as the owner of these secrets, so they are deleted together with the HostedCluster.
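
The mechanism described above can be sketched in plain Go, without the Kubernetes client libraries. This is a minimal illustration and not the code from this PR: the Object type, the finalizer name example.com/cli-cleanup, and all helper names are hypothetical stand-ins, and which platforms need the extra cleanup is left to the caller as a boolean.

```go
package main

import "time"

// cleanupFinalizer is a made-up name used only for this illustration.
const cleanupFinalizer = "example.com/cli-cleanup"

// Object is a simplified, hypothetical stand-in for Kubernetes object metadata.
type Object struct {
	Name              string
	Finalizers        []string
	Owners            []string   // names of owning objects
	DeletionTimestamp *time.Time // non-nil once deletion has been requested
}

// hasFinalizer reports whether name is present on the object.
func hasFinalizer(o *Object, name string) bool {
	for _, f := range o.Finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// removeFinalizer drops every occurrence of name from the object.
func removeFinalizer(o *Object, name string) {
	kept := o.Finalizers[:0]
	for _, f := range o.Finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	o.Finalizers = kept
}

// requestDeletion adds the CLI finalizer only when the platform needs extra
// cleanup, and then marks the object for deletion.
func requestDeletion(hc *Object, platformNeedsCleanup bool) {
	if platformNeedsCleanup && !hasFinalizer(hc, cleanupFinalizer) {
		hc.Finalizers = append(hc.Finalizers, cleanupFinalizer)
	}
	now := time.Now()
	hc.DeletionTimestamp = &now
}

// blockedByFinalizer is true when deletion was requested but a finalizer
// still prevents removal. If nothing ever removes the finalizer (for example
// because the CLI was cancelled), the object stays in this state.
func blockedByFinalizer(o *Object) bool {
	return o.DeletionTimestamp != nil && len(o.Finalizers) > 0
}

// setOwner records owner on the dependent object, so that the dependent is
// garbage-collected with its owner instead of being cleaned up by the CLI.
func setOwner(dependent, owner *Object) {
	dependent.Owners = append(dependent.Owners, owner.Name)
}
```

Calling requestDeletion(hc, false) leaves the object free to be removed right away, while requestDeletion(hc, true) keeps it blocked until removeFinalizer(hc, cleanupFinalizer) runs. That second case is the one that hangs when the CLI exits early.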

Which issue(s) this PR fixes

Fixes #OCPBUGS-23126

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 26, 2023
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 26, 2023
@openshift-ci-robot

@nunnatsa: This pull request references Jira Issue OCPBUGS-23126, which is invalid:

  • expected the bug to target the "4.15.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Nov 26, 2023
@openshift-ci openshift-ci bot added area/cli Indicates the PR includes changes for CLI and removed do-not-merge/needs-area labels Nov 26, 2023
@nunnatsa
Contributor Author

/test e2e-aws
/test e2e-kubevirt-aws-ovn

@openshift-ci openshift-ci bot added the area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release label Nov 26, 2023
@nunnatsa nunnatsa changed the title from "WIP: OCPBUGS-23126: Fix a bug on deletion of a hostedcluster" to "OCPBUGS-23126: Fix a bug on deletion of a hostedcluster" Nov 27, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 27, 2023
@nunnatsa
Contributor Author

/jira refresh

@openshift-ci-robot

@nunnatsa: This pull request references Jira Issue OCPBUGS-23126, which is invalid:

  • expected the bug to target the "4.15.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

/jira refresh


@nunnatsa
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Nov 27, 2023
@openshift-ci-robot

@nunnatsa: This pull request references Jira Issue OCPBUGS-23126, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @LiangquanLi930

In response to this:

/jira refresh


@openshift-ci-robot

@nunnatsa: This pull request references Jira Issue OCPBUGS-23126, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @LiangquanLi930

In response to this:

What this PR does / why we need it

A user destroying a HostedCluster can cause the HostedCluster to hang indefinitely if the destroy command times out during execution.

This happens because the hcp CLI places a finalizer on the HostedCluster during deletion and removes it only after waiting for some cleanup actions to complete. If the user cancels the hcp destroy cluster command (or the command times out) while the CLI is waiting for cleanup, the finalizer is never removed, and the HostedCluster hangs indefinitely with DeletionTimestamp != nil.

With this PR, the additional cleanup logic, including setting and removing the special finalizer, runs only for platforms that actually create additional resources. The PR also removes the secret cleanup from the cmd and instead sets the HostedCluster as the owner of these secrets, so they are deleted together with the HostedCluster.

Which issue(s) this PR fixes

Fixes #OCPBUGS-23126


@openshift-ci-robot

@nunnatsa: This pull request references Jira Issue OCPBUGS-23126, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @LiangquanLi930


@nunnatsa nunnatsa force-pushed the fix-OCPBUGS-23126 branch 5 times, most recently from 4bdec0a to 7129c9e Compare November 30, 2023 12:00
A user destroying a HostedCluster can cause the HostedCluster to
hang indefinitely if the destroy command times out during execution

This is due to the hcp cli placing a finalizer on the HostedCluster
during deletion which the cli tool later removes after waiting for
some clean up actions to occur. If a user cancels the `hcp destroy
cluster` command (or the command times out) while the cli is waiting
for cleanup, then the HostedCluster will hang indefinitely with a
DeletionTimestamp != nil.

This PR only performs the additional cleaning logic, including the
setting and removing the special finalizer, for platform that
actually creates additional resources. Also, this PR sets the
hostedCluster as the owner for these secrets, so they will be
deleted with the HostedCluster.

Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
@nunnatsa
Contributor Author

nunnatsa commented Dec 5, 2023

/test e2e-aws

@nunnatsa
Contributor Author

nunnatsa commented Dec 5, 2023

/test e2e-aws
/test e2e-kubevirt-aws-ovn

1 similar comment
@nunnatsa
Contributor Author

nunnatsa commented Dec 5, 2023

/test e2e-aws
/test e2e-kubevirt-aws-ovn

Contributor

@davidvossel davidvossel left a comment

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 5, 2023
Contributor

openshift-ci bot commented Dec 5, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidvossel, nunnatsa

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 5, 2023
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 7f8fcfb and 2 for PR HEAD 5d4ed3f in total

@nunnatsa
Contributor Author

nunnatsa commented Dec 6, 2023

/retest-required

1 similar comment
@nunnatsa
Contributor Author

nunnatsa commented Dec 6, 2023

/retest-required

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 39a0c45 and 1 for PR HEAD 5d4ed3f in total

@sjenning
Contributor

sjenning commented Dec 6, 2023

/retest-required

@davidvossel
Contributor

/cherry-pick release-4.14

@openshift-cherrypick-robot

@davidvossel: once the present PR merges, I will cherry-pick it on top of release-4.14 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.14


Contributor

openshift-ci bot commented Dec 6, 2023

@nunnatsa: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-ibmcloud-roks | 5d4ed3f | link | false | /test e2e-ibmcloud-roks |
| ci/prow/e2e-ibmcloud-iks | 5d4ed3f | link | false | /test e2e-ibmcloud-iks |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 46ac206 and 0 for PR HEAD 5d4ed3f in total

@openshift-merge-bot openshift-merge-bot bot merged commit 5b9c536 into openshift:main Dec 7, 2023
12 of 14 checks passed
@openshift-ci-robot

@nunnatsa: Jira Issue OCPBUGS-23126: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-23126 has been moved to the MODIFIED state.


@openshift-cherrypick-robot

@davidvossel: #3234 failed to apply on top of branch "release-4.14":

Applying: Fix a bug on deletion of a hostedcluster
Using index info to reconstruct a base tree...
M	api/fixtures/example.go
M	cmd/cluster/core/create.go
M	cmd/cluster/core/destroy.go
M	hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
M	hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go
Falling back to patching base and 3-way merge...
Auto-merging hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go
Auto-merging hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
CONFLICT (content): Merge conflict in hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
Auto-merging cmd/cluster/core/destroy.go
CONFLICT (content): Merge conflict in cmd/cluster/core/destroy.go
Auto-merging cmd/cluster/core/create.go
Auto-merging api/fixtures/example.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Fix a bug on deletion of a hostedcluster
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.14


@nunnatsa nunnatsa deleted the fix-OCPBUGS-23126 branch December 7, 2023 07:10
nunnatsa added a commit to nunnatsa/hypershift that referenced this pull request Dec 7, 2023
PR openshift#3234 used the wrong HostedCluster annotation to check if the cluster
may be considered as already deleted.

This PR fixes the annotation name to
`"hypershift.openshift.io/destroy-grace-period"`

Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
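
A misspelled annotation key fails silently, because a map lookup with the wrong key simply reports the annotation as absent. The sketch below shows one way to guard against that by keeping the key in a single constant. It is a hedged illustration, not the code from the follow-up commit: only the annotation name comes from the commit message above, while the helper name destroyGracePeriod and the assumption that the value is a Go duration string are hypothetical.

```go
package main

import "time"

// destroyGracePeriodAnnotation is the annotation name quoted in the commit
// message. Keeping it in one constant avoids retyping the string.
const destroyGracePeriodAnnotation = "hypershift.openshift.io/destroy-grace-period"

// destroyGracePeriod returns the parsed annotation value. The boolean is
// false when the annotation is missing or cannot be parsed.
// Assumption: the value is a Go duration string such as "5m".
func destroyGracePeriod(annotations map[string]string) (time.Duration, bool) {
	raw, found := annotations[destroyGracePeriodAnnotation]
	if !found {
		return 0, false
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, false
	}
	return d, true
}
```

With this helper, a map that carries the annotation under any other key yields false, the same result as a cluster that never had the annotation set. That is why a wrong name in the check is easy to miss.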
@openshift-bot

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-hypershift-container-v4.15.0-202312070751.p0.g5b9c536.assembly.stream for distgit hypershift.
All builds following this will include this PR.

nunnatsa added a commit to nunnatsa/hypershift that referenced this pull request Dec 7, 2023
A user destroying a HostedCluster can cause the HostedCluster to hang
indefinitely if the destroy command times out during execution.

This is due to the hcp cli placing a finalizer on the HostedCluster
during deletion which the cli tool later removes after waiting for
some clean up actions to occur. If a user cancels the hcp destroy
cluster command (or the command times out) while the cli is waiting
for cleanup, then the HostedCluster will hang indefinitely with
a DeletionTimestamp != nil.

This PR only performs the additional cleaning logic, including the
setting and removing the special finalizer, for platform that actually
creates additional resources. Also, this PR sets the hostedCluster as
the owner for these secrets, so they will be deleted with the
HostedCluster.

This PR is a manual cherry-pick of openshift#3234

Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
nunnatsa added a commit to nunnatsa/hypershift that referenced this pull request Dec 7, 2023
A user destroying a HostedCluster can cause the HostedCluster to hang
indefinitely if the destroy command times out during execution.

This is due to the hcp cli placing a finalizer on the HostedCluster
during deletion which the cli tool later removes after waiting for
some clean up actions to occur. If a user cancels the hcp destroy
cluster command (or the command times out) while the cli is waiting
for cleanup, then the HostedCluster will hang indefinitely with
a DeletionTimestamp != nil.

This PR only performs the additional cleaning logic, including the
setting and removing the special finalizer, for platform that actually
creates additional resources. Also, this PR sets the hostedCluster as
the owner for these secrets, so they will be deleted with the
HostedCluster.

This PR is a manual cherry-pick of openshift#3234

Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/cli: Indicates the PR includes changes for CLI
  • area/hypershift-operator: Indicates the PR includes changes for the hypershift operator and API - outside an OCP release
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.

6 participants