feat(zfspv): handling unmounted volume #76
Merged
Conversation
There can be cases where the openebs namespace has been accidentally deleted (Optoro case: https://mdap.zendesk.com/agent/tickets/963). There the driver attempted to destroy the dataset, which first unmounts the dataset and then tries to destroy it; the destroy fails because the volume is busy. Here, as mentioned in the steps to recover, we have to manually mount the dataset:

```
6. The driver might have attempted to destroy the volume before going down,
   which sets mounted to "no" (strange behavior observed on GKE Ubuntu 18.04),
   so we have to mount the dataset. Go to each node and check if there is any
   unmounted volume:

       zfs get mounted

   If there is any unmounted dataset with this option as "no", we should do
   the below:

       mountpath=$(zfs get -Hp -o value mountpoint <dataset name>)
       zfs set mountpoint=none <dataset name>
       zfs set mountpoint=$mountpath <dataset name>

   This will get the dataset mounted.
```

So in this case the volume will be unmounted while the mountpoint property is still set to the mount path. If the application pod is deleted later on, it will try to mount the ZFS dataset. Here just setting the `mountpoint` property is not sufficient: since we have unmounted the ZFS dataset (via zfs destroy in this case), we have to explicitly mount the dataset, **otherwise the application will start running without any persistent storage**.

This PR automates the manual steps performed to resolve the problem: the code now checks whether the ZFS dataset is mounted after setting the mountpoint property and, if it is not, attempts to mount it. This is not an issue for zvols, as the driver does not attempt to unmount them, so zvols are fine.

Also, the NodeUnpublish operation MUST be idempotent: if this RPC failed, or the CO does not know whether it failed, it can choose to call NodeUnpublish again. This is handled by returning success if the volume is not mounted. Descriptive error messages have also been added at a few places.

Signed-off-by: Pawan <pawan@mayadata.io>
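Below is a minimal sketch of the two behaviors described above, assuming the driver shells out to the zfs CLI. The helper names (`mountedProperty`, `ensureMounted`, `umountVolume`, `isMountPoint`) are hypothetical illustrations, not the actual zfs-localpv functions:

```go
package zfs

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// mountedProperty returns the value of the dataset's read-only
// "mounted" property: "yes", "no", or "-" for zvols.
func mountedProperty(dataset string) (string, error) {
	out, err := exec.Command(
		"zfs", "get", "-Hp", "-o", "value", "mounted", dataset,
	).Output()
	if err != nil {
		return "", fmt.Errorf("could not read mounted property of %s: %v", dataset, err)
	}
	return strings.TrimSpace(string(out)), nil
}

// ensureMounted sets the mountpoint property and, if the dataset is
// still unmounted afterwards, mounts it explicitly so the application
// does not end up running without persistent storage.
func ensureMounted(dataset, mountpath string) error {
	if err := exec.Command("zfs", "set", "mountpoint="+mountpath, dataset).Run(); err != nil {
		return fmt.Errorf("could not set mountpoint of %s: %v", dataset, err)
	}
	mounted, err := mountedProperty(dataset)
	if err != nil {
		return err
	}
	if mounted == "no" {
		// Setting mountpoint alone does not remount a dataset that was
		// unmounted by a failed destroy; mount it explicitly.
		if err := exec.Command("zfs", "mount", dataset).Run(); err != nil {
			return fmt.Errorf("could not mount %s: %v", dataset, err)
		}
	}
	return nil
}

// umountVolume is idempotent: a retried NodeUnpublish on an already
// unmounted target path returns success instead of an error.
func umountVolume(target string) error {
	mounted, err := isMountPoint(target)
	if err != nil {
		return err
	}
	if !mounted {
		return nil // already unmounted, nothing to do
	}
	return exec.Command("umount", target).Run()
}

// isMountPoint reports whether target appears in /proc/mounts.
func isMountPoint(target string) (bool, error) {
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		return false, err
	}
	for _, line := range strings.Split(string(data), "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[1] == target {
			return true, nil
		}
	}
	return false, nil
}
```

Note that checking the `mounted` property also keeps zvols out of the mount path, since they report "-" rather than "no".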
Codecov Report
```
@@           Coverage Diff           @@
##           master      #76   +/-  ##
=======================================
  Coverage   23.57%   23.57%
=======================================
  Files          14       14
  Lines         475      475
=======================================
  Hits          112      112
  Misses        362      362
  Partials        1        1
```
Continue to review the full report at Codecov.
kmova reviewed Apr 9, 2020
kmova approved these changes Apr 9, 2020
automation moved this from RC2 - Due Apr 10, 2020 to Done in 1.9 Release Tracker - Due Apr 15th on Apr 9, 2020
There can be cases where the openebs namespace has been accidentally deleted. If that happens, deletion of the volume CRs will be triggered, and volume CR deletion triggers the dataset deletion process. Prior to the actual deletion of the data, the driver will first unmount the dataset and then attempt to destroy it.
But since the volume is actively consumed by a pod, the destroy will fail, as the volume is busy. The zvols continue to operate; with datasets, however, the dataset gets unmounted (setting mounted="no") and the subsequent delete fails.
Now, there are two actions that the user can take:
(a) Continue the clean-up, so that the setup can be re-created.
(b) Reinstall openebs and try to create volumes that point to the underlying datasets.
In case (a), the user will have to delete the application pods and then start deleting the PVCs. However, the PV deletion will fail, because the clean-up steps do not find the volume mounted and are aborted.
Let us assume case (b): prior to actually reinstalling, since the volumes are still intact, applications are expected to continue to access the data.
However, if a node restart occurs, the pods will not be able to access the volume, because the mounted property is set to "no". The data stored by the pod will not persist, as it is not backed by persistent storage.
To recover from the partial clean-up, the following needs to be done on each of the nodes:
zfs get mounted
Check if there is any unmounted dataset with this property set to "no". If there is, mount it:
zfs mount <dataset name>
The above commands will result in mounting the dataset.
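For reference, here is a small self-contained sketch that automates these per-node recovery steps by shelling out to the zfs CLI (illustrative only, not code from this PR):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

func main() {
	// List every dataset on the node.
	out, err := exec.Command("zfs", "list", "-H", "-o", "name").Output()
	if err != nil {
		panic(err)
	}
	for _, ds := range strings.Fields(string(out)) {
		// Read the "mounted" property; zvols report "-" and are skipped.
		val, err := exec.Command("zfs", "get", "-Hp", "-o", "value", "mounted", ds).Output()
		if err != nil {
			panic(err)
		}
		if strings.TrimSpace(string(val)) == "no" {
			// Mount any dataset left unmounted by the partial clean-up.
			fmt.Println("mounting", ds)
			if err := exec.Command("zfs", "mount", ds).Run(); err != nil {
				panic(err)
			}
		}
	}
}
```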
Here in this PR:
- If the ZFS dataset is found unmounted after setting the mountpoint property, the driver explicitly mounts it.
- NodeUnpublish is made idempotent: it returns success if the volume is already unmounted.
- Descriptive error messages have been added at a few places.
Signed-off-by: Pawan <pawan@mayadata.io>