
Prevent plugin from returning in-use error for source volumes #297

Conversation

timoreimann (Contributor, Author)

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR addresses a regression in #261 causing plugins to return an in-use error (FAILED_PRECONDITION) when a sourcing resource (i.e., a snapshot or a volume) is deleted before the sourced volume is.

See #289 (comment) for details.

Does this PR introduce a user-facing change?:

Clean up sourced volumes prior to deleting sourcing resources

@apurv15 could you give this a try with your driver?
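
For illustration, here is a minimal, hypothetical sketch of the driver-side behavior that triggers this error: a plugin whose volumes and snapshots are dependent refuses DeleteVolume while snapshots exist, as the spec permits. The snapshotsByVolume field is made-up bookkeeping, not any real driver's code.

```go
package driver

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

type driver struct {
	// snapshotsByVolume maps a volume ID to the IDs of snapshots taken
	// from it (illustrative bookkeeping only).
	snapshotsByVolume map[string][]string
}

// DeleteVolume refuses to delete a volume that still has snapshots. This is
// the FAILED_PRECONDITION error the sanity test's cleanup order was triggering.
func (d *driver) DeleteVolume(ctx context.Context, req *csi.DeleteVolumeRequest) (*csi.DeleteVolumeResponse, error) {
	if snaps := d.snapshotsByVolume[req.GetVolumeId()]; len(snaps) > 0 {
		return nil, status.Errorf(codes.FailedPrecondition,
			"volume %q has dependent snapshots %v", req.GetVolumeId(), snaps)
	}
	// ... delete the backing volume here ...
	return &csi.DeleteVolumeResponse{}, nil
}
```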

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 29, 2020
@k8s-ci-robot (Contributor)

Hi @timoreimann. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 29, 2020
// in-use error.
By("cleaning up deleting the volume created from snapshot")
delVol2Req := MakeDeleteVolumeReq(sc, volume2.GetVolume().GetVolumeId())
_, err = r.DeleteVolume(context.Background(), delVol2Req)
Contributor

This looks like the cleanup code didn't delete volumes in the right order, correct?

I think we should fix that instead of adding custom cleanup code in the tests themselves. Suppose we add another check after creating the second volume and before deleting it: if that check then fails, test cleanup will fail like it does now.

Contributor

I haven't checked how volumes are tracked, but perhaps clean up with last-in-first-out order?

timoreimann (Contributor, Author)

We currently always delete snapshots first, then volumes. Deletion in LIFO order sounds like the right approach to me; I think it should work. (For this particular test case, it means that we'd delete volume 2 first, then the snapshot, then volume 1.)

I'll work on making the necessary updates to this PR.

timoreimann (Contributor, Author)

I made the discussed adjustments. I'll comment on some of the changes via GitHub review comments to add more clarity.

Please have a look and let me know what you think.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 5, 2020
type VolumeInfo struct {
// managedResourceInfos is a map of managed resource infos keyed by resource
// IDs.
type resourceInfos map[string]resourceInfo
timoreimann (Contributor, Author)

Alternatively, this could also be typed as []resourceInfo which would make the LIFO property more natural/obvious. On the downside, it would also bring extra effort to remove items from the slice as we clean up resources (volumes, snapshots) during Cleanup.

I decided to go with a map and introduced an order field to keep track of when resources were added to the list.

Contributor

This uses the volume ID and snapshot ID as keys, right? Is it guaranteed that those never conflict?

it would also bring extra effort to remove items from the slice as we clean up resources (volumes, snapshots) during Cleanup

That extra effort is just finding the item and then creating a slice without it, right? That doesn't seem that hard.

Without having seen what it looks like, I suspect that []resourceInfo will be the better solution.

timoreimann (Contributor, Author)

Made the change. I think you were right, this looks much simpler.
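
For illustration, the slice-based bookkeeping could look roughly like this (a sketch with made-up names, not the PR's exact code):

```go
// resourceInfo describes one managed resource (volume or snapshot).
type resourceInfo struct {
	id   string      // volume or snapshot ID
	data interface{} // volume- or snapshot-specific details
}

// Resources tracks managed resources in creation order, so that cleanup
// can walk the slice backwards (LIFO).
type Resources struct {
	managedResourceInfos []resourceInfo
}

// registerResource appends a resource, preserving creation order.
func (r *Resources) registerResource(info resourceInfo) {
	r.managedResourceInfos = append(r.managedResourceInfos, info)
}

// unregisterResource drops the entry with the given ID. The "extra effort"
// mentioned above boils down to this in-place filter.
func (r *Resources) unregisterResource(id string) {
	kept := r.managedResourceInfos[:0]
	for _, info := range r.managedResourceInfos {
		if info.id != id {
			kept = append(kept, info)
		}
	}
	r.managedResourceInfos = kept
}
```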

}

// volumeInfo keeps track of the information needed to delete a volume.
type volumeInfo struct {
timoreimann (Contributor, Author)

I unexported the type since it's effectively an implementation detail. Strictly speaking, this could be considered a breaking change, though I doubt anyone uses it (or rather, I hope not).

If we're concerned about this, I'm also happy to undo the change and save it for when we have more things to change that justify a new major release.

// Node on which the volume was published, empty if none
// or publishing is not supported.
NodeID string

// Volume ID assigned by CreateVolume.
VolumeID string
timoreimann (Contributor, Author)

I removed VolumeID because it's now part of the key in resourceInfos. Same considerations as with the unexporting of [Vv]olumeInfo apply, though.

@apurv15 commented Oct 9, 2020

@timoreimann: We are seeing that a lot more test cases fail with this change. For example:
ListSnapshots [Controller Server]
should return snapshots that match the specified snapshot id
/root/csi-sanity-new/src/github.com/kubernetes-csi/csi-test/pkg/sanity/controller.go:1126
STEP: reusing connection to CSI driver at /var/lib/kubelet/plugins/org.veritas.infoscale/csi.sock
STEP: creating mount and staging directories
STEP: creating first unrelated snapshot
STEP: creating target snapshot
STEP: creating second unrelated snapshot
STEP: listing snapshots
cleanup volume: deleting vol_c0o3m11862x7xmlbt5o9
cleanup volume: DeleteVolume failed: rpc error: code = FailedPrecondition desc = Could not delete volume vol_c0o3m11862x7xmlbt5o9, as it has dependent snapshots ['snap_043b9dl9u20l60xj4f2i'] with sync in progress

@timoreimann (Contributor, Author)

@apurv15 I'm not exactly sure why things don't work for you. 🤔

I pushed another commit that outputs extra debug messages when you set the DEBUG_RESOURCE_MANAGER environment variable to something non-empty. Could you try this out and share the log output, especially the [DEBUG]-prefixed messages?

@pohly let me know if you think it makes sense to keep the poor man's debug logging approach in or not.

@pohly (Contributor) commented Oct 30, 2020

let me know if we think it makes sense to keep the poor man's debug logging approach in or not.

Why not just klog.V(4) or so?

@timoreimann (Contributor, Author)

This one fell off my radar. I'll make sure to address the last comment soon.

This addresses a regression in [1] causing plugins to return an in-use
error (FAILED_PRECONDITION) when a sourcing resource (i.e., a snapshot
or a volume) is deleted before the sourced volume is.

[1]: kubernetes-csi#261
@timoreimann (Contributor, Author)

@pohly

Why not just klog.V(4) or so?

We currently set a log.Logger to write into a GinkgoWriter, causing the output to be suppressed if all tests pass. Unless I'm missing something, klog only supports setting the output at the global package level. I'm a bit reluctant to do that as it'd mean that we could overwrite klog settings that csi-test library consumers may have set for their tests.

WDYT?

@pohly (Contributor) commented Jan 19, 2021

We currently set a log.Logger to write into a GinkgoWriter, causing the output to be suppressed if all tests pass.

We could do the same in the csi-sanity binary and enable the klog flags, which will have the same effect. We can't do it when used as lib, but then users who care can do something similar in their main function.

That should go a long way towards nicer logging without doing anything special (like a DEBUG_RESOURCE_MANAGER that most users won't know about).
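
A minimal sketch of what that wiring could look like in a csi-sanity main function, assuming klog/v2 and Ginkgo (not the PR's exact code):

```go
package main

import (
	"flag"

	"github.com/onsi/ginkgo"
	"k8s.io/klog/v2"
)

func main() {
	// Register klog's flags (-v, -logtostderr, ...) so users can raise
	// verbosity, e.g. with -v=4. Passing nil uses flag.CommandLine.
	klog.InitFlags(nil)
	flag.Parse()

	// Route klog output through Ginkgo so it is buffered per spec and only
	// shown for failing tests. Note that klog only honors SetOutput when
	// -logtostderr is false.
	klog.SetOutput(ginkgo.GinkgoWriter)

	// ... set up the sanity test config and run the suite ...
}
```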

func (cl *Resources) Cleanup() {
	cl.mutex.Lock()
	defer cl.mutex.Unlock()
	ctx := context.Background()

	cl.deleteSnapshots(ctx, 2)
	cl.deleteVolumes(ctx, 2)
	// Clean up resources in LIFO order to account for dependency order.
Contributor

PVC and VolumeSnapshot have independent life cycles in Kubernetes. If we fix the sanity test to make sure snapshots are always deleted before volumes are deleted, driver maintainers won't know it is possible for users to delete PVCs first, and they may run into problems later in production.

So I think we should actually have sanity tests that delete the snapshot both before and after the volume's deletion, to make sure a driver can handle this independent life cycle. Yes, this means a driver will have to implement logic to do soft deletion if volumes and snapshots are dependent on each other on the storage system in order to pass the sanity tests.

apurv15

I had commented on #289: the spec provides flexibility for a PVC and its snapshots to be dependent or not. https://github.com/container-storage-interface/spec/blob/master/spec.md says the following:

===
CSI plugins SHOULD treat volumes independent from their snapshots.

If the Controller Plugin supports deleting a volume without affecting its existing snapshots, then these snapshots MUST still be fully operational and acceptable as sources for new volumes as well as appear on ListSnapshot calls once the volume has been deleted.

When a Controller Plugin does not support deleting a volume without affecting its existing snapshots, then the volume MUST NOT be altered in any way by the request and the operation must return the FAILED_PRECONDITION error code and MAY include meaningful human-readable information in the status.message field.
===

So the test case should not validate just one behavior, as the spec allows volumes and snapshots to be dependent, and that is the case for some drivers.

timoreimann (Contributor, Author) commented Jan 20, 2021

With LIFO-ordered deletion implemented by this PR, I think we satisfy the minimum requirement that the spec mandates, which covers plugins that do not or cannot treat volumes and snapshots independently. That is, if a volume is created from a snapshot, we first delete the snapshot and then the volume. (I also assume the same logic should apply if a volume is sourced from another volume.)

However, @apurv15 mentioned in a previous comment that csi-test built from this change would cause more tests to fail. 🤔 I guess I should give the PR another hard look to check if I missed anything. (@apurv15, did you ever get around to running it with the enhanced debug log messages as I suggested?)

@xing-yang would it make sense to file a separate issue to track creating a new test for the missed scenario you mentioned (while taking into account the specifics of the spec as @apurv15 quoted above)? IMHO, this PR should focus on addressing the regression I introduced and returning to the original cleanup order.

xing-yang (Contributor) commented Jan 30, 2021

There are also storage systems that cannot delete snapshots before deleting PVCs. Our existing e2e tests also delete in a specific order, deleting snapshots first and then deleting PVCs. However, that caused some CSI drivers to fail because they require the opposite ordering of deletion: PVCs must be deleted before snapshots. There's an issue open to track and fix this in the e2e tests so that deletion of PVCs and snapshots is independent of each other:
kubernetes/kubernetes#96719

Our sanity test should be consistent with e2e test.

timoreimann (Contributor, Author)

@xing-yang how should csi-test (and the e2e tests) account for the two possible orderings?

Should we try one order, and if we receive a particular error, try the inverse?

Or is the expectation that the driver acknowledges delete requests and completes the cleanup asynchronously once both the snapshot and volume deletes have been issued?

csi-test currently does not check that deletes have completed, so we might naturally support the latter scenario but not the former.

Contributor

This doesn't need to be solved in this PR. It's more like an idea for some dedicated tests.

Contributor

Regarding cleanup order: if both cases are valid, then picking one for cleanup (LIFO at the moment) is correct, i.e., we can merge this PR.

Contributor

Regarding the comment "Clean up resources in LIFO order to account for dependency order.": we should remove the text "to account for dependency order", as we are not trying to make sure snapshots are deleted before volumes are deleted.

timoreimann (Contributor, Author) commented Feb 2, 2021

In its current form, the code cleans up resources "backwards": the resource (either volume or snapshot) that was created last is cleaned up first, followed by the resource that was created second-to-last, and so on. Given that sourcing volumes must exist before the snapshots that are spawned from them, this means that we first delete a snapshot, followed by its sourcing volume.

Doesn't that imply that we do make sure that snapshots are deleted before volumes are, @xing-yang?

Or are we talking about the scenario where a volume is being created from a snapshot and a possible dependency in that chain?

Contributor

We discussed this a bit further on Monday. @msau42 pointed out that it is valid for a CSI driver to refuse to delete PVCs that have snapshots. It is also valid to refuse to delete snapshots while the PVC still exists. In other words, a CSI driver only needs to support one order of deletions, not both.

This means that picking LIFO order will work for some drivers, but not all.

The right approach is to retry a deletion after it has failed once, i.e., try to delete the snapshot, delete the PVC, and then, if the snapshot deletion failed, try it again. I think this can be added relatively easily by iterating twice over all resources instead of just once (see the sketch below).

We also discussed writing tests that explicitly cover both deletion orders. One test can do "delete snapshot, delete PVC, delete snapshot again (if necessary)", the other "delete PVC, delete snapshot, delete PVC again (if necessary)". This then works with all drivers and ensures that both orders are tested. But this should be added in a separate PR.
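
A sketch of that two-pass idea, extending the illustrative slice bookkeeping from earlier (assumes context and fmt are imported); delete is a hypothetical helper that issues the appropriate DeleteVolume or DeleteSnapshot call:

```go
// cleanupWithRetry walks resources in LIFO order and deletes what it can;
// whatever fails (e.g. with FAILED_PRECONDITION because a dependent resource
// still exists) is retried in a second pass, by which time its counterpart
// has been deleted regardless of which ordering the driver requires.
func (r *Resources) cleanupWithRetry(ctx context.Context) []error {
	var errs []error
	failed := map[string]bool{}

	// First pass: LIFO over creation order.
	for i := len(r.managedResourceInfos) - 1; i >= 0; i-- {
		info := r.managedResourceInfos[i]
		if err := r.delete(ctx, info); err != nil {
			failed[info.id] = true
		}
	}

	// Second pass: retry only the failures; errors are now final.
	for i := len(r.managedResourceInfos) - 1; i >= 0; i-- {
		info := r.managedResourceInfos[i]
		if !failed[info.id] {
			continue
		}
		if err := r.delete(ctx, info); err != nil {
			errs = append(errs, fmt.Errorf("deleting %s: %w", info.id, err))
		}
	}
	return errs
}
```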

@timoreimann timoreimann force-pushed the prevent-plugin-from-returning-in-use-error-for-source-volumes branch from 82b7d90 to 537f237 Compare January 20, 2021 23:38
@timoreimann (Contributor, Author)

@pohly I replaced the custom logger and env var flag with klog as discussed. I also optimized and reduced the amount of logging a bit more.

Please take another look.

@timoreimann (Contributor, Author)

I should manually test-drive this PR a bit more, so for now

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 22, 2021
@@ -299,31 +296,33 @@ func (cl *Resources) cleanupVolume(ctx context.Context, offset int, volumeID str
Secrets: cl.Context.Secrets.ControllerUnpublishVolumeSecret,
},
)
logger.Errorf(err, "ControllerUnpublishVolume failed: %s", err)
ExpectWithOffset(offset, err).NotTo(HaveOccurred(), "[volume cleanup] ControllerUnpublishVolume failed")
Contributor

Previously errors during cleanup were logged, but cleanup continued, right? Now it aborts.

Error handling during cleanup is tricky. Ideally, as much as possible should get cleaned up and then a summary of all unsolved problems should be raised as a test failure. The latter ensures that a test that itself ran fine but then has problems during cleanup is reported as failed overall.

timoreimann (Contributor, Author)

Whoops, somehow I missed that we used to continue to clean up even when we hit errors.

I made another update to collect all errors and assert at the end of the Cleanup call. I think this goes even beyond what we had before, where we'd assert at the end of a cleanup{Volume,Snapshot} call.

One thing I want to mention is that a failure to, say, node-unpublish a volume could mean that follow-up cleanup operations on the same resource (node-unstage, controller-unpublish, delete) are then more likely to fail as well. I'm wondering if this complicates understanding the produced error messages, and whether it could make more sense to skip follow-up cleanup operations within a single resource (while continuing to clean up all other resources).

I followed your suggestion for now, but let me know if you have thoughts on the above and whether it should influence another update.
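
For illustration, the collect-then-assert pattern could look roughly like this (hypothetical helper names, assumes fmt is imported; not the PR's exact code):

```go
// cleanupVolume runs every cleanup step even if earlier ones fail,
// collecting the errors instead of asserting immediately.
func (cl *Resources) cleanupVolume(ctx context.Context, id string) []error {
	var errs []error
	record := func(op string, err error) {
		if err != nil {
			errs = append(errs, fmt.Errorf("[volume cleanup] %s failed: %w", op, err))
		}
	}
	record("NodeUnpublishVolume", cl.nodeUnpublish(ctx, id))
	record("NodeUnstageVolume", cl.nodeUnstage(ctx, id))
	record("ControllerUnpublishVolume", cl.controllerUnpublish(ctx, id))
	record("DeleteVolume", cl.deleteVolume(ctx, id))
	return errs
}

// Cleanup then aggregates the per-resource errors and asserts once:
//
//	Expect(allErrs).To(BeEmpty(), "resource cleanup failed")
```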

Contributor

I agree that errors are likely to cause other errors. But that is nothing new; developers should be familiar with follow-up errors and focus on the first one when trying to understand what went wrong.

So I think this behavior is fine.

pohly (Contributor) left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 30, 2021
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, timoreimann

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 30, 2021
@xing-yang (Contributor)

I'm fine with having this PR merged first and handling the volume/snapshot deletion ordering issue in another PR.

@xing-yang (Contributor)

/lgtm

@timoreimann (Contributor, Author)

If no one else plans to, I can drive the follow-up PR.

Thanks to everyone involved.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 2, 2021
@k8s-ci-robot k8s-ci-robot merged commit 44a7535 into kubernetes-csi:master Feb 2, 2021