
failure to remove detached disks via pool resize. Fixes #2099 #2184

Conversation

phillxnet
Member

Fixes #2099
"failure to remove detached disks via pool resize"
Fixes #2077
"remove/resize pool for detached single dev pool fails"

Additional unrelated trivial fix included:
Fixes #2182
"Usage displayed instead of available free space Issue"

Detached disk management changes proposed:

  • When a pool has no missing members, remove db-held pool associations from all prior ‘detached’ members. These disks then show up as regular detached devices and can be removed via their ‘bin’ icons. This allows us to keep surfacing the prior attached status of these devices while returning the source of truth on detached members to the pool itself. Issue #2099 "failure to remove detached disks via pool resize".
  • Special-case the removal of a detached device from a pool that has only detached members. Issue #2077 "remove/resize pool for detached single dev pool fails".
  • Restrict detached device removal selection: one at a time and with no detached/attached mixing (see the sketch after this list). This reduces our problem space and so eases sanity checking. An attempted multi detached device selection informs the user that, on pools with only detached members, all detached disks will be removed anyway.
  • Improve the error message and log entry on a missing device removal failure: reiterate the degraded,rw advice from our Maintenance required text and suggest that a reboot may be needed.
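
The selection restriction could be sketched roughly as follows (a minimal illustration only, with a hypothetical helper name; the real check lives in the Web-UI/back-end removal path and may differ):

def validate_removal_selection(selected_names):
    # Hypothetical sketch: split the selection on the 'detached-' name prefix.
    detached = [n for n in selected_names if n.startswith("detached-")]
    attached = [n for n in selected_names if not n.startswith("detached-")]
    if detached and attached:
        raise ValueError("Detached and attached disks cannot be removed in one operation.")
    if len(detached) > 1:
        raise ValueError(
            "Remove detached disks one at a time; on pools with only "
            "detached members all detached disks are removed anyway."
        )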

Note on the interplay between the first two elements above.

For pools with only detached members, we preserve the members' pool associations (notwithstanding the first element above), even though we normally assess missing device status via a pool model property (has_missing_dev), which cannot be known when there are no attached members. This works because of a prior check: with zero attached members we proceed no further, as no mount is then possible, and so we do not inadvertently remove that pool's single disk association. This preserves the ability to detach all of a pool's members and later re-attach them wholesale (in a powered-off state), with all associated configs (shares, exports, etc.) re-established automatically. If we were instead to remove all disk pool associations automatically in the all-detached case, all of the pool's associated shares/exports would also be removed. (TEST THIS).
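
A minimal sketch of that interplay, with hypothetical names (this is not the actual view code): the detached-member cleanup only runs once at least one attached member exists, so a fully detached pool keeps its disk associations and hence its shares/exports:

def refresh_pool_state(p):
    # Sketch only: p is a Pool model instance with a disk_set relation.
    attached = p.disk_set.exclude(name__startswith="detached-")
    if attached.count() == 0:
        # Zero attached members: no mount is possible, proceed no further,
        # preserving the remaining disk/pool associations for a possible
        # wholesale re-attach.
        return
    if not p.has_missing_dev:
        # Pool reports no missing devices: drop db-held pool associations
        # from prior 'detached' members (see the code excerpt further down).
        for disk in p.disk_set.filter(name__startswith="detached-"):
            disk.pool = None
            disk.save()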

Testing:

Changing all serials for all drives, during power down in KVM, results in none of the ‘detached’ former members having a pool association. These previously pool-associated, now detached, members are then removable via their bin icon, and as such we still surface their prior attached status. #2099

Removing a single detached member from pools that have only detached members results, by design, in all detached members having their pool association removed, and on next pool state refresh, the pool itself being removed. See #2077 detailed caveat below.

Caveats:

N.B. as is common with a few of our Web-UI ‘operations’, two pool details page refreshes are often required for the final pool state to be displayed. I.e. a removed detached disk entry will linger, post completion of the operation, until this double refresh has been performed. And in the case of detached disk removal from pools that have only detached members, we have the ‘Ragged Edge’ issue detailed in my comments on issue #2077, which can be addressed in a future pull request.

Ready for review, requesting @FroggyFlox .
I have separated out the various main elements, as detailed above, into their own commits to aid review, and have also applied our project goals of string.format() and black formatting to all modified files.

In some known cases a misleading/contradictory error message results
when attempting to remove a missing device: that of there being no
missing devices when btrfs fi show indicates otherwise. Handle this case
by re-asserting the degraded,rw mount options suggested in our
Maintenance required contextual text and further suggest a reboot may be
required. It is suggested that we aim to provide an unmount/remount
capability in the future, which could then also be suggested here.
…kstor#2099
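
As a rough illustration of that failure path (a hedged sketch: the helper, exception type and exact wording here are assumptions, not the committed code):

import logging

logger = logging.getLogger(__name__)


def remove_missing_member(pool, do_remove):
    # Sketch only: do_remove stands in for the low-level missing device removal.
    try:
        do_remove(pool)
    except Exception as e:
        msg = (
            "Failed to remove a missing device from pool ({}). A degraded,rw "
            "mount (see the Maintenance required text) and possibly a reboot "
            "may be required first. Low level error: {}".format(pool.name, e)
        )
        logger.error(msg)
        raise Exception(msg)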

When a pool has no missing members remove db held pool associations from
all prior ‘detached’ members. These disks then show up as regular
detached devices and can be removed by their ‘bin’ icons. This allows
for continued surfacing of prior attached status for these devices while
returning source of truth to the pool itself re detached members.
…2077 rockstor#2099

- Special-case the removal of a detached device from a pool that has
only detached members. Issue rockstor#2077
- Restrict detached device removal selection: one-at-a-time and with no
detached/attached mixing. This reduces our problem space and so eases
sanity checking. Attempted multi detached device selection informs that
all detached will be removed on pools with only detached members anyway.
- In combination with other rockstor#2099 related changes, auto pool deletion is
facilitated once all detached disks have been removed.
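
A hedged sketch of that special case (hypothetical names; the real logic sits in the pool resize/disk removal view): when every member is detached, removing any single detached member drops the pool association from all of them, which then lets the now member-less pool be deleted on the next pool state refresh:

def remove_detached_member(p, disk):
    # Sketch only: p is a Pool, disk is one of its 'detached-' members.
    members = p.disk_set.all()
    if not members.exclude(name__startswith="detached-").exists():
        # Only detached members remain: remove the association from all of
        # them so the pool itself can be deleted on the next state refresh.
        for d in members:
            d.pool = None
            d.save()
        return
    # Otherwise drop just this one detached member's association.
    disk.pool = None
    disk.save()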

@FroggyFlox left a comment


@phillxnet ,

I just tested (to the best of my understanding, at least), and it works great!

The steps I followed were:

  1. in KVM, create two new disks with serial aaa and bbb.
  2. Boot Rockstor, and create a pool test_pool_03 with these two disks in raid1.
  3. Shutdown Rockstor.
  4. Change each disk's serial number to a2a2a2 and b2b2b2.
  5. Reboot Rockstor.
  6. Inspect Disks and Pools pages.

Prior to your PR, this resulted in the Pools page showing 4 disk members: the two old ones labeled as detached-disk_name plus the two "new" ones (with the new serial numbers). No "bin" icon was surfaced on the Disks page, and I had to use Resize/ReRaid to remove the disks and then delete.

After your PR, this resulted in the logs showing the following at boot:

[19/Jun/2020 07:59:36] INFO [storageadmin.views.command:100] Removing detached disk from Pool test_pool_03: no missing devices found.
[19/Jun/2020 07:59:36] INFO [storageadmin.views.command:100] Removing detached disk from Pool test_pool_03: no missing devices found.

The Pools page now shows only the two new disks as members of test_pool_03.
The Disks page now shows the two old detached disks with a "bin" icon that I can successfully use to remove them from the list.

Nice!

# pool info as source of truth re missing.
if not p.has_missing_dev:
    for disk in p.disk_set.filter(name__startswith="detached-"):
        logger.info(
            "Removing detached disk from Pool {}: no missing "
            "devices found.".format(p.name)
        )
Member


Quick thought: could we further include the disk name (or any other meaningful info) for the disk in question here? While testing, two detached disks needed to be removed, which results in the following:

[19/Jun/2020 07:59:36] INFO [storageadmin.views.command:100] Removing detached disk from Pool test_pool_03: no missing devices found.
[19/Jun/2020 07:59:36] INFO [storageadmin.views.command:100] Removing detached disk from Pool test_pool_03: no missing devices found.

It thus results in duplicates (or triplicates, etc...) when that happens. Tiny detail, but if there's some sort of identification readily available for the detached disk being removed, it could be helpful.

Member Author


Good point, except that the name of a detached disk is essentially random: the bit after 'detached-' is just a uuid (for long historical reasons), so the name wouldn't help, and that random element also changes on every disk refresh!! Other than that, I guess the best candidate would be the serial. It needn't hold up the PR merge though, as almost all removals will be of single devices anyway: the only meaningful multi-device removal (of a viable pool) would be on raid6, currently the only raid level that can handle more than one disk failure (bar upstream btrfs raid1c3 and raid1c4).

I say we earmark this as a future logging improvement as we will be back here before long anyway. Nice idea though.
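
For reference, that future logging improvement might look something like this (a sketch only, in the context of the excerpt quoted above; it assumes the detached Disk record still carries a usable serial field at that point):

logger.info(
    "Removing detached disk (serial={}) from Pool {}: no missing "
    "devices found.".format(disk.serial, p.name)
)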

@phillxnet
Member Author

@FroggyFlox Thanks for the review and double check. I think we are fairly safe with this one as it stands.

Much appreciated. I'll let it sit for a bit and see if anything more dawns on us; if not, I'll pop it in ready for the next testing release rpm.

@phillxnet phillxnet changed the title failure to remove detached disks via pool resize.Fixes #2099 failure to remove detached disks via pool resize. Fixes #2099 Jun 24, 2020
@phillxnet phillxnet merged commit f23bef5 into rockstor:master Jun 24, 2020
@phillxnet phillxnet deleted the 2099_failure_to_remove_detached_disks_via_pool_resize branch June 24, 2020 15:56