
feat(zfspv): handling unmounted volume #78

Merged
merged 1 commit into from Apr 11, 2020
Conversation

pawanpraka1 (Contributor) commented Apr 9, 2020

Cherry-pick to v0.6.x : #76
There can be cases where the openebs namespace has been accidentally deleted. If that happens, deletion of the volume CRs will be triggered, which in turn triggers the dataset deletion process. Prior to the actual deletion of the data, the driver will attempt to do the following:

1. Unmount the dataset (not the case with a zvol, as it is unmounted via NodeUnpublish). Unmounting the dataset leaves the mounted property as "no".
2. Delete the zvol or dataset.

But since the volume is actively consumed by a pod, the destroy will fail because the volume is busy. Zvols continue to operate. Datasets, however, end up unmounted (mounted="no"), and the subsequent delete fails.
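As a rough sketch of the failure sequence described above, as observed on GKE Ubuntu 18.04 per this report (the pool and dataset names are hypothetical):

```
# Hypothetical names; a pod is still consuming this dataset via a bind mount.
DATASET=zfspv-pool/pvc-example

zfs umount "$DATASET"                    # step 1: unmount, leaving the mounted property as "no"
zfs destroy "$DATASET"                   # step 2: fails while the volume is busy
zfs get -H -o value mounted "$DATASET"   # prints "no" - the dataset stays unmounted
```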

Now, there are two actions that the user can take:
(a) Continue the clean-up, so that the setup can be re-created.
(b) Reinstall openebs and try to create volumes pointing to the underlying datasets.

In case (a), the user will have to delete the application pods and then start deleting the PVCs. However, the PV deletion will fail, because the clean-up steps find that the volume is not mounted and are aborted.

Now assume case (b): prior to actually reinstalling, since the volumes are still intact, applications are expected to continue accessing the data.

However, if a node restart occurs, the dataset will not be remounted because mounted is "no", and the pods will not be able to access the volume. Any data written by the pod will not persist, as it is no longer backed by persistent storage.

To recover from the partial clean-up steps, the following needs to be done on each of the nodes:

1. `zfs get mounted` : check if there is any dataset with this property set to "no".
2. For each dataset that shows mounted as "no", run `zfs mount <dataset>`.

The above commands will result in the dataset being mounted again.
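A small per-node script along these lines could automate that check (the pool name `zfspv-pool` is hypothetical; datasets with mountpoint set to none or legacy would need to be skipped):

```
# Mount every dataset in the pool whose "mounted" property is "no".
for ds in $(zfs list -H -o name -r zfspv-pool); do
    if [ "$(zfs get -H -o value mounted "$ds")" = "no" ]; then
        zfs mount "$ds"
    fi
done
```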
Here in this PR:

- automating the manual steps performed above, to check that the volume can be mounted even if a manual operation left it in an unmounted state.
- helping with case (a), via the NodeUnpublish operation, which proceeds with the clean-up in whatever state the volume may be in.
- adding helpful debug messages.
Signed-off-by: Pawan pawan@mayadata.io

There can be cases where the openebs namespace has been accidentally deleted (Optoro case: https://mdap.zendesk.com/agent/tickets/963). There the driver attempted to destroy the dataset, which first unmounts the dataset and then tries to destroy it; the destroy fails as the volume is busy. Here, as mentioned in the steps to recover, we have to manually mount the dataset:
```
6. The driver might have attempted to destroy the volume before going down, which sets
   mounted to "no" (this strange behavior is seen on GKE Ubuntu 18.04). We have to mount
   the dataset: go to each node and check if there is any unmounted volume

       zfs get mounted

   If there is any unmounted dataset with this property as "no", do the below:

       mountpath=$(zfs get -Hp -o value mountpoint <dataset name>)
       zfs set mountpoint=none <dataset name>
       zfs set mountpoint=$mountpath <dataset name>

   This will get the dataset mounted.
```

So in this case the volume will be unmounted while the mountpoint property is still set to the mount path. If the application pod is deleted later on, the driver will try to mount the zfs dataset again; here just setting the `mountpoint` property is not sufficient, because the dataset has already been unmounted (via the zfs destroy attempt in this case), so we have to explicitly mount the dataset, **otherwise the application will start running without any persistent storage**. Automating the manual steps performed to resolve the problem, the code now checks whether the zfs dataset is mounted after setting the mountpoint property and, if it is not, attempts to mount it.
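The manual equivalent of the check being automated looks roughly like this (the dataset name and mount path below are hypothetical; the driver performs the equivalent steps internally):

```
DATASET=zfspv-pool/pvc-example            # hypothetical dataset name
MNT=/var/lib/kubelet/pods/example/mount   # hypothetical mount path

# Setting the mountpoint property alone does not remount an already
# unmounted dataset...
zfs set mountpoint="$MNT" "$DATASET"

# ...so check the mounted property and mount explicitly if needed.
if [ "$(zfs get -H -o value mounted "$DATASET")" = "no" ]; then
    zfs mount "$DATASET"
fi
```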

This is not the case with a zvol, as the driver does not attempt to unmount it, so zvols are fine.

Also, the NodeUnpublishVolume operation MUST be idempotent: if this RPC fails, or the CO does not know whether it failed, it can choose to call NodeUnpublishVolume again. This is now handled by returning success when the volume is not mounted; descriptive error messages have also been added in a few places.
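Sketched as shell commands using the util-linux `mountpoint` utility (the target path is hypothetical; the driver implements the equivalent check in its NodeUnpublishVolume handler), the idempotent behaviour amounts to:

```
TARGET=/path/to/publish/target   # hypothetical target path

# NodeUnpublishVolume must be idempotent: if nothing is mounted at the
# target path, treat the call as already done and succeed.
if ! mountpoint -q "$TARGET"; then
    echo "volume already unmounted at $TARGET"
    exit 0
fi
umount "$TARGET"
```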

Signed-off-by: Pawan <pawan@mayadata.io>
@pawanpraka1 pawanpraka1 added the enhancement New feature or request label Apr 9, 2020
@pawanpraka1 pawanpraka1 added this to the v0.6.0 milestone Apr 9, 2020
@pawanpraka1 pawanpraka1 requested a review from kmova April 9, 2020 15:30
codecov-io commented Apr 9, 2020

Codecov Report

Merging #78 into v0.6.x will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           v0.6.x      #78   +/-   ##
=======================================
  Coverage   23.57%   23.57%           
=======================================
  Files          14       14           
  Lines         475      475           
=======================================
  Hits          112      112           
  Misses        362      362           
  Partials        1        1           

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6033789...3a1a8e7.

@kmova kmova merged commit 91ae9e4 into v0.6.x Apr 11, 2020
@kmova kmova added this to Done in ZFS Local PV via automation Apr 11, 2020
@kmova kmova added this to Done in 1.9 Release Tracker - Due Apr 15th. via automation Apr 11, 2020