Skip to content

feat(upgrade, api, core): upgrade old BlockDevices to new BlockDevices with new UUID#442

Merged
kmova merged 33 commits intoopenebs-archive:masterfrom
akhilerm:upgrade-bd-api
Aug 6, 2020
Merged

feat(upgrade, api, core): upgrade old BlockDevices to new BlockDevices with new UUID#442
kmova merged 33 commits intoopenebs-archive:masterfrom
akhilerm:upgrade-bd-api

Conversation

@akhilerm
Copy link
Copy Markdown
Contributor

@akhilerm akhilerm commented Jun 18, 2020

Signed-off-by: Akhil Mohan akhil.mohan@mayadata.io

Why is this PR required? What issue does it fix?:
The BDs existing in old UUID scheme need to be migrated to BDs in new scheme.
BlockDevices will be creates for disks that are being used by zfs localPV and mayastor.

What this PR does?:
Handles cases for co-exisiting of BDs in legacy and GPTBased uuid scheme
Cases which are handled:

  1. All BlockDevices that are not in Unclaimed (Claimed and Released) state and uses the legacy uuid scheme will be annotated with internal.openebs.io/uuid-scheme=legacy
  2. BlockDevices that are in Unclaimed state will be made inactive and new BlockDevice resource using the GPT scheme will be created.
  3. Devices that are newly connected (first time appearing in the cluster) will have BlockDevice resources created using the GPT scheme.
  4. Virtual Devices (which donot have a unique identifier) which are currently in Claimed state, when disconnected and reconnected at a different path , new BlockDevice resource will be created with the legacy uuid scheme and the internal.openebs.io/uuid-scheme=legacy annotation. This is done so as to make the changes backward compatible with applications that are currently using those BlockDevice resources.

BlockDevice resource will be created for devices that are used by ZFS-LocalPV. Those blockdevices will be labelled openebs.io/block-device-tag=zfs-localpv so that the operator cannot claim those resources.

MayaStor disks: Mayastor disks will be ignored, because we need a unique way to identify disks that are already in use by mayastor. (This is because mayastor disks does not have any partitions.)

Pre-requisites before performing the upgrade:

  1. all the devices that are in use should be claimed (i.e There are chances that multiple block device resource can exist if the device path has changed. It should be made sure that the actual BD resource that corresponds to the device in use should be in claimed state. Commonly occurs in cstor, because zfs scan the /dev directory and finds the disk even if path has changed.)
  2. all data disks that is being used by the storage engine should be connected to the node and 1. should be satisfied
  3. all manually created and claimed blockdevices should excluded in the configmap

Does this PR require any upgrade changes?:
No.

If the changes in this PR are manually verified, list down the scenarios covered::

  • LocalPV created on virtual disk -> Upgrade NDM -> Resource updated with annotations. (Restarts also handled)
  • LocalPV created on virtual disk -> disconnected the disk -> disk comes at a different path. In this case, the device wont be used by local PV anymore since the mountpath has changed and pod cannot mount the device. (same case for moving to another node also.) After upgrade a new resource will be created using fs uuid. If path does not change and pod mounts the device then works as expected by adding the annotation.
  • LocalPV created on physical disk -> upgrade NDM -> Resource updated with annotations (Restart also will be similar case)
  • cstor on virtual disk -> upgrade NDM -> move virtual disk to another node
  • cstor on virtual disk -> upgrade NDM -> disconnect and reconnect the device
  • cstor on physical disk -> upgrade NDM -> move disk to different node (also disconnect and reconnect the device)

Any additional information for your reviewer? :
NOTES:

  • After a legacy resource is unclaimed, a rescan/restart of pod would have to performed so as to keep the data in sync
  • Does not support migration of localPV raw block.

Flow Diagram for the upgrade of LocalPV is available here. The gist also contains the changes that are required in the steps for cstor.

Checklist:

  • Fixes #3043
  • PR Title follows the convention of <type>(<scope>): <subject>
  • Has the change log section been updated?
  • Commit has unit tests
  • Commit has integration tests
  • (Optional) Are upgrade changes included in this PR? If not, mention the issue/PR to track:
  • (Optional) If documentation changes are required, which issue on https://github.com/openebs/openebs-docs is used to track them:

@akhilerm akhilerm marked this pull request as ready for review June 29, 2020 19:56
@akhilerm akhilerm requested review from kmova and pawanpraka1 June 29, 2020 19:57
@akhilerm akhilerm self-assigned this Jun 29, 2020
@akhilerm akhilerm added pr/documentation-pending The changes need to be documented pr/hold-merge The PR should not be merged now pr/release-note-alpha PR should be included in alpha features documentation/release pr/upgrade-automated Upgrade for the changes in this PR is automated labels Jun 29, 2020
@akhilerm akhilerm added this to the 0.7 milestone Jun 29, 2020
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jul 6, 2020

Codecov Report

Merging #442 into master will decrease coverage by 2.36%.
The diff coverage is 10.99%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #442      +/-   ##
==========================================
- Coverage   42.90%   40.53%   -2.37%     
==========================================
  Files          71       71              
  Lines        3228     3488     +260     
==========================================
+ Hits         1385     1414      +29     
- Misses       1735     1966     +231     
  Partials      108      108              
Impacted Files Coverage Δ
cmd/ndm_daemonset/probe/deletehandler.go 18.42% <0.00%> (-18.43%) ⬇️
cmd/ndm_daemonset/probe/eventhandler.go 48.97% <ø> (ø)
cmd/ndm_daemonset/probe/mountprobe.go 0.00% <ø> (ø)
cmd/ndm_daemonset/probe/usedbyprobe.go 15.38% <0.00%> (ø)
pkg/select/blockdevice/filters.go 15.86% <0.00%> (-1.57%) ⬇️
pkg/select/blockdevice/select.go 0.00% <0.00%> (ø)
cmd/ndm_daemonset/probe/addhandler.go 6.47% <3.33%> (-5.35%) ⬇️
cmd/ndm_daemonset/probe/uuid.go 88.63% <76.19%> (-11.37%) ⬇️
cmd/ndm_daemonset/controller/blockdevice.go 87.17% <100.00%> (+0.16%) ⬆️
cmd/ndm_daemonset/controller/blockdevicestore.go 75.15% <100.00%> (+0.31%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bc0a40d...0e74dc1. Read the comment docs.

@akhilerm akhilerm removed the pr/hold-merge The PR should not be merged now label Jul 7, 2020
Copy link
Copy Markdown
Contributor

@pawanpraka1 pawanpraka1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few comments. still reviewing it.

Comment thread ndm-operator.yaml
Comment thread cmd/ndm_daemonset/probe/deleteDeviceHandler.go Outdated
Comment thread cmd/ndm_daemonset/probe/deleteDeviceHandler.go Outdated
Comment thread cmd/ndm_daemonset/probe/deleteDeviceHandler.go Outdated
Comment thread cmd/ndm_daemonset/probe/eventhandler.go Outdated
Comment thread cmd/ndm_daemonset/probe/eventhandler.go
Comment thread cmd/ndm_daemonset/probe/deleteDeviceHandler.go Outdated
Comment thread cmd/ndm_daemonset/probe/eventhandler.go Outdated
Comment thread cmd/ndm_daemonset/probe/eventhandler.go Outdated
@kmova
Copy link
Copy Markdown
Contributor

kmova commented Jul 11, 2020

@akhilerm -- finally got my slot to review and went through one round. I still need to go through addDeviceHandler.go file.

My initial impressions on the rest of the files:

  • The overall logic seems to work.
  • I really liked the way you are refactoring the eventhandler.go to help with future maintenance. Left a few comments on the naming/refactoring bits.
  • Delete Event Handling is pretty straightforward. We probably need to start maintaining a test lists that we have to cover as we go forwards - for example, handle a delete event for the BD that has several different combinations based on the states like:
    • active/inactive/unknown
    • legacy-uuid/uuid
    • claimed/unclaimed/released
    • with/without partition

Copy link
Copy Markdown
Contributor

@pawanpraka1 pawanpraka1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akhilerm in legacy case, when we will populate the deviceInfo.UUID? it seems like it will not be set as we are setting UUID in GPT case only?
eventhandler.go:81 => existingBlockDeviceResource := pe.Controller.GetExistingBlockDeviceResource(bdAPIList, deviceInfo.UUID)

Comment thread cmd/ndm_daemonset/probe/deletehandler.go Outdated
Comment thread cmd/ndm_daemonset/probe/addhandler.go
Comment thread cmd/ndm_daemonset/probe/addhandler.go Outdated
Comment thread cmd/ndm_daemonset/probe/addhandler.go Outdated
@kmova
Copy link
Copy Markdown
Contributor

kmova commented Jul 23, 2020

@akhilerm - with the changes in the PR can you see if the following tests work:

(a) handle the change in the path name
* setup cstor on /dev/sdb (virtual disk)
* rename the /dev/sdb ( as /dev/sdx ) manually in the bd
* restart ndm

 Expected Result:
 => NDM should not create a new block device for /dev/sdb 
 => NDM should update the path information to the existing BD

(b) handle the movement of disk from node1 to node2
* setup cstor on /dev/sdb (virtual disk) (say this is on node-1)
* move the /dev/sdb (virtual disk) from node 1 to node 2
* restart ndm on node1 and node2 (in case the changes are not handled automatically)

 Expected Result:
 => NDM should not create a new block device for /dev/sdb 
 => NDM should update the node information to the existing BD

@akhilerm
Copy link
Copy Markdown
Contributor Author

@kmova @pawanpraka1 Here is the reference doc for the upgrade flow.

@akhilerm
Copy link
Copy Markdown
Contributor Author

akhilerm commented Aug 3, 2020

@kmova Both the above mentioned test cases :
(a) handle the change in the path name and
(b) handle the movement of disk from node1 to node2 are working as expected.

…gacy UUID

Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
Comment thread cmd/ndm_daemonset/probe/addhandler.go Outdated
Comment thread cmd/ndm_daemonset/probe/deletehandler.go Outdated
kmova
kmova previously approved these changes Aug 6, 2020
Copy link
Copy Markdown
Contributor

@kmova kmova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good!

Have a comment in the delete handler that needs to be addressed. And generic comments on adding more comments in both add/delete handler codes.

change legacy check in delete to the end.

Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
@akhilerm
Copy link
Copy Markdown
Contributor Author

akhilerm commented Aug 6, 2020

@kmova addressed the review comments and tested the changes.

Comment thread cmd/ndm_daemonset/probe/deletehandler.go Outdated
Comment thread cmd/ndm_daemonset/probe/addhandler.go
Comment thread cmd/ndm_daemonset/probe/addhandler.go
klog.V(4).Infof("device: %s upgraded", bd.DevPath)
return nil
}

Copy link
Copy Markdown
Contributor

@pawanpraka1 pawanpraka1 Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see if we can break addBlockDevice into smaller part? it is a huge function.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will take it up in a later PR @pawanpraka1 .

Comment thread cmd/ndm_daemonset/probe/addhandler.go Outdated
Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
Comment thread cmd/ndm_daemonset/probe/addhandler.go Outdated
pawanpraka1
pawanpraka1 previously approved these changes Aug 6, 2020
Copy link
Copy Markdown
Contributor

@pawanpraka1 pawanpraka1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. few minor comments. Can you add the flow chart in the PR description.

Signed-off-by: Akhil Mohan <akhil.mohan@mayadata.io>
@kmova kmova merged commit e30a718 into openebs-archive:master Aug 6, 2020
@akhilerm akhilerm deleted the upgrade-bd-api branch November 2, 2020 05:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr/documentation-pending The changes need to be documented pr/release-note-alpha PR should be included in alpha features documentation/release pr/upgrade-automated Upgrade for the changes in this PR is automated

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants