
OCPBUGS-20085: IBMCloud: Handle disk delete errors #7515

Merged

Conversation

cjschaef
Member

Handle cases when IBM Cloud disk deletion returns an error, without any response data.
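
In practice the fix nil-checks the SDK's response object before reading its status code. A minimal sketch of that pattern in Go, with hypothetical names (deleteVolume, deleteFn, and DetailedResponse are illustrative stand-ins, not the installer's exact code in pkg/destroy/ibmcloud/disk.go):

package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// DetailedResponse stands in for the IBM Cloud SDK's response type; only
// the status code matters for this sketch.
type DetailedResponse struct {
	StatusCode int
}

// deleteVolume is a hypothetical wrapper around a disk-delete call.
// deleteFn models an SDK method that returns both a response and an error;
// on some failures the response is nil even though the error is not.
func deleteVolume(ctx context.Context, id string, deleteFn func(context.Context, string) (*DetailedResponse, error)) error {
	details, err := deleteFn(ctx, id)
	if err != nil {
		// Guard: details can be nil when the API returns an error with
		// no response data, so check it before reading StatusCode.
		if details == nil || details.StatusCode != http.StatusNotFound {
			return fmt.Errorf("failed to delete disk id=%s: %w", id, err)
		}
	}
	return nil
}

func main() {
	// Simulate the failure mode: an error with no response data.
	failing := func(context.Context, string) (*DetailedResponse, error) {
		return nil, errors.New("volume deletion failed")
	}
	fmt.Println(deleteVolume(context.Background(), "vol-123", failing))
}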

@cjschaef
Member Author

/retest-required

@sadasu
Contributor

sadasu commented Sep 26, 2023

/test e2e-ibmcloud-ovn

@sadasu
Contributor

sadasu commented Sep 26, 2023

@cjschaef do you want to open a Jira issue for this? 1. This seems like a bug fix. 2. It will be needed if you want to backport this to older releases.

@cjschaef
Member Author

cjschaef commented Oct 2, 2023

@sadasu
Yeah, I will have to get a Jira opened to get this backported. I'll see if I can compile the details and do that, and retitle this with that OCPBUG once I have it.

cc @MayXuQQ I'll use your data to open the bug, but reproducing the error will likely be hard, since it is a very unusual case from the IBM Cloud APIs.

@cjschaef
Member Author

cjschaef commented Oct 4, 2023

/retitle OCPBUGS-20085: IBMCloud: Handle disk delete errors

@openshift-ci openshift-ci bot changed the title IBMCloud: Handle disk delete errors OCPBUGS-20085: IBMCloud: Handle disk delete errors Oct 4, 2023
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Oct 4, 2023
@openshift-ci-robot
Contributor

@cjschaef: This pull request references Jira Issue OCPBUGS-20085, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @gpei

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Handle cases when IBM Cloud disk deletion returns an error, without any response data.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Contributor

@cjschaef: This pull request references Jira Issue OCPBUGS-20085, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @gpei

In response to this:

Handle cases when IBM Cloud disk deletion returns an error, without any response data.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from gpei October 4, 2023 14:52
@cjschaef
Member Author

cjschaef commented Oct 4, 2023

/test e2e-ibmcloud-ovn

@cjschaef
Member Author

cjschaef commented Oct 5, 2023

/retest

1 similar comment
@cjschaef
Member Author

cjschaef commented Oct 6, 2023

/retest

@MayXuQQ
Contributor

MayXuQQ commented Oct 6, 2023

@cjschaef I just hit this bug again

@MayXuQQ
Contributor

MayXuQQ commented Oct 7, 2023

Found this issue in https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-nightly-ibmcloud-ipi-rt-f28/1710327579702988800
version: 4.13.0-0.nightly-2023-10-05-083326

level=info msg=Deleted instance "ci-op-lg1g50r7-6990f-fhmz9-worker-3-b9645"
level=info msg=UNEXPECTED RESULT, Re-attempting execution .., attempt=3, retry-gap=10, max-retry-Attempts=30, stopRetry=false, error=
level=info msg=UNEXPECTED RESULT, Re-attempting execution .., attempt=4, retry-gap=10, max-retry-Attempts=30, stopRetry=false, error=
level=info msg=UNEXPECTED RESULT, Re-attempting execution .., attempt=5, retry-gap=10, max-retry-Attempts=30, stopRetry=false, error=
level=info msg=UNEXPECTED RESULT, Re-attempting execution .., attempt=6, retry-gap=10, max-retry-Attempts=30, stopRetry=false, error=
level=info msg=UNEXPECTED RESULT, Re-attempting execution .., attempt=7, retry-gap=10, max-retry-Attempts=30, stopRetry=false, error=
level=info msg=UNEXPECTED RESULT, Re-attempting execution .., attempt=8, retry-gap=10, max-retry-Attempts=30, stopRetry=false, error=
level=info msg=UNEXPECTED RESULT, Re-attempting execution .., attempt=9, retry-gap=10, max-retry-Attempts=30, stopRetry=false, error=
E1006 23:21:22.092062 37 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 188 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x6309820?, 0x229b65d0})
/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0001182a0?})
/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x6309820, 0x229b65d0})
/usr/lib/golang/src/runtime/panic.go:884 +0x212
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion.func1()
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:84 +0x12a
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).Retry(0xc0001c8b00, 0xc0005316e8)
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:99 +0x73
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion(0xc0001c8b00, {{0xc00159a2a0, 0x29}, {0xc00159a2d0, 0x28}, {0xc000c7a624, 0x9}, {0x7c72f73, 0x4}, {0xc00159a2a0, ...}})
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:78 +0x14f
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyDisks(0xc0001c8b00)
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:118 +0x508
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction.func1()
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:201 +0x3f
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x18, 0xc000580000})
/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:222 +0x1b
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x1dde5418?, 0xc000130000?}, 0xc000487e90?)
/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:235 +0x57
k8s.io/apimachinery/pkg/util/wait.poll({0x1dde5418, 0xc000130000}, 0xd0?, 0x132ea05?, 0x30?)
/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:582 +0x38
k8s.io/apimachinery/pkg/util/wait.PollImmediateInfiniteWithContext({0x1dde5418, 0xc000130000}, 0x4130a7?, 0x28?)
/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:568 +0x49
k8s.io/apimachinery/pkg/util/wait.PollImmediateInfinite(0xc000916718?, 0xc000487f70?)
/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:557 +0x46
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction(0xc0001c8b00, {{0x7c73c5c?, 0xc000487fd0?}, 0xc001672140?}, 0x53eaaa?, 0xc00088001c?)
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:198 +0x108
created by github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyCluster
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:172 +0xba5
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x537bdea]

goroutine 188 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0001182a0?})
/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xd7
panic({0x6309820, 0x229b65d0})
/usr/lib/golang/src/runtime/panic.go:884 +0x212
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion.func1()
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:84 +0x12a
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).Retry(0xc0001c8b00, 0xc0005316e8)
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:99 +0x73
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).waitForDiskDeletion(0xc0001c8b00, {{0xc00159a2a0, 0x29}, {0xc00159a2d0, 0x28}, {0xc000c7a624, 0x9}, {0x7c72f73, 0x4}, {0xc00159a2a0, ...}})
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:78 +0x14f
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyDisks(0xc0001c8b00)
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/disk.go:118 +0x508
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction.func1()
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:201 +0x3f
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x18, 0xc000580000})
/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:222 +0x1b
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x1dde5418?, 0xc000130000?}, 0xc000487e90?)
/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:235 +0x57
k8s.io/apimachinery/pkg/util/wait.poll({0x1dde5418, 0xc000130000}, 0xd0?, 0x132ea05?, 0x30?)
/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:582 +0x38
k8s.io/apimachinery/pkg/util/wait.PollImmediateInfiniteWithContext({0x1dde5418, 0xc000130000}, 0x4130a7?, 0x28?)
/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:568 +0x49
k8s.io/apimachinery/pkg/util/wait.PollImmediateInfinite(0xc000916718?, 0xc000487f70?)
/go/src/github.com/openshift/installer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:557 +0x46
github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).executeStageFunction(0xc0001c8b00, {{0x7c73c5c?, 0xc000487fd0?}, 0xc001672140?}, 0x53eaaa?, 0xc00088001c?)
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:198 +0x108
created by github.com/openshift/installer/pkg/destroy/ibmcloud.(*ClusterUninstaller).destroyCluster
/go/src/github.com/openshift/installer/pkg/destroy/ibmcloud/ibmcloud.go:172 +0xba5
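
The trace points at the closure in waitForDiskDeletion (disk.go:84), invoked through the uninstaller's Retry wrapper: the API returned an error with a nil response, and reading the response's status code dereferenced the nil pointer. A stripped-down reproduction of that failure class (names are illustrative, not the installer's code):

package main

import (
	"errors"
	"fmt"
)

// DetailedResponse mimics the shape of the SDK's response type.
type DetailedResponse struct {
	StatusCode int
}

func main() {
	// A call that fails before producing response data returns (nil, err),
	// the combination suggested by the empty error= fields in the log above.
	var details *DetailedResponse
	err := errors.New("volume deletion failed")

	// Reading details.StatusCode without a nil check panics with the
	// "invalid memory address or nil pointer dereference" seen above.
	if err != nil && details.StatusCode != 404 {
		fmt.Println("would retry")
	}
}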

@cjschaef
Member Author

cjschaef commented Oct 9, 2023

/retest

3 similar comments
@cjschaef
Member Author

/retest

@cjschaef
Member Author

/retest

@cjschaef
Member Author

/retest

@sadasu
Contributor

sadasu commented Oct 18, 2023

/test e2e-ibmcloud-ovn

@sadasu
Contributor

sadasu commented Oct 18, 2023

@cjschaef what do the errors in e2e-ibmcloud-ovn indicate?

@sadasu
Contributor

sadasu commented Oct 19, 2023

/approve

e2e-ibmcloud-ovn has been perma-failing and this fix has not changed that.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 19, 2023
@cjschaef
Member Author

@sadasu Sorry for the delay. Yes, the major failure I have seen is the event check for the CSI drivers, which appears to occur frequently on 4.15.

I haven't gotten a chance to open a Jira, but have notified IBM Cloud Storage to investigate further, likely cleaning up the repeated events causing the failure.

I am hoping the other failures are flakes, but I'll retry a few more times to see if everything clears up, and get an issue opened for the "events should not repeat pathologically for ns/openshift-cluster-csi-drivers" failure.

@cjschaef
Member Author

Known CSI driver bug for IBM Cloud opened:
https://issues.redhat.com/browse/OCPBUGS-22331

/test e2e-ibmcloud-ovn

@cjschaef
Member Author

/retest

@cjschaef
Member Author

/retest

Contributor

openshift-ci bot commented Feb 2, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sadasu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

}
if err != nil {
	if details == nil || details.StatusCode != http.StatusNotFound {
		return errors.Wrapf(err, "Failed to delete disk name=%s, id=%s.If this error continues to persist for more than 20 minutes then please try to manually cleanup the volume using - ibmcloud is vold %s", item.name, item.id, item.id)
Contributor

Suggested change
-	return errors.Wrapf(err, "Failed to delete disk name=%s, id=%s.If this error continues to persist for more than 20 minutes then please try to manually cleanup the volume using - ibmcloud is vold %s", item.name, item.id, item.id)
+	return fmt.Errorf("Failed to delete disk name=%s, id=%s.If this error continues to persist for more than 20 minutes then please try to manually cleanup the volume using - ibmcloud is vold %s: %w", item.name, item.id, item.id, err)

Member Author

Fixed
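
For context on the exchange above: the suggestion replaces errors.Wrapf from github.com/pkg/errors (now in maintenance mode) with the standard library's fmt.Errorf and its %w verb, which wraps the cause so callers can still inspect it via errors.Is and errors.As. A small self-contained illustration, not the PR's exact code:

package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("volume not found")

func main() {
	// %w adds context while keeping the underlying error inspectable,
	// matching what errors.Wrapf provided without the extra dependency.
	wrapped := fmt.Errorf("failed to delete disk id=%s: %w", "vol-123", errNotFound)

	fmt.Println(wrapped)                         // failed to delete disk id=vol-123: volume not found
	fmt.Println(errors.Is(wrapped, errNotFound)) // true: the cause survives wrapping
}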

@cjschaef cjschaef force-pushed the ibmcloud_destroy_disk_failure branch 2 times, most recently from 3d1103b to 8b60474 Compare February 6, 2024 17:03
Handle cases when IBM Cloud disk deletion returns an error,
without any response data.

Related: https://issues.redhat.com//browse/OCPBUGS-20085
@cjschaef cjschaef force-pushed the ibmcloud_destroy_disk_failure branch from 8b60474 to 13f2f9e Compare February 6, 2024 17:25
Contributor

@barbacbd left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 16bc6eb into openshift:master Feb 6, 2024
21 of 22 checks passed
@openshift-ci-robot
Contributor

@cjschaef: Jira Issue OCPBUGS-20085: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-20085 has been moved to the MODIFIED state.

In response to this:

Handle cases when IBM Cloud disk deletion returns an error, without any response data.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cjschaef cjschaef deleted the ibmcloud_destroy_disk_failure branch February 6, 2024 20:19
@cjschaef
Member Author

cjschaef commented Feb 6, 2024

/cherry-pick release-4.15

@openshift-cherrypick-robot

@cjschaef: new pull request created: #7984

In response to this:

/cherry-pick release-4.15

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cjschaef
Member Author

cjschaef commented Feb 6, 2024

/cherry-pick release-4.14

@openshift-cherrypick-robot

@cjschaef: new pull request created: #7988

In response to this:

/cherry-pick release-4.14

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cjschaef
Member Author

cjschaef commented Feb 6, 2024

/cherry-pick release-4.13

@openshift-cherrypick-robot

@cjschaef: new pull request created: #7989

In response to this:

/cherry-pick release-4.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cjschaef
Member Author

cjschaef commented Feb 6, 2024

/cherry-pick release-4.12

@openshift-cherrypick-robot

@cjschaef: new pull request created: #7990

In response to this:

/cherry-pick release-4.12

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-installer-altinfra-container-v4.16.0-202402070113.p0.g16bc6eb.assembly.stream.el8 for distgit ose-installer-altinfra.
All builds following this will include this PR.

@openshift-merge-robot
Contributor

Fix included in accepted release 4.16.0-0.nightly-2024-02-07-073830
