failure to remove detached disks via pool resize. Fixes #2099 #2184
Conversation
In some known cases a misleading/contradictory error message results when attempting to remove a missing device: it reports that there are no missing devices when `btrfs fi show` indicates otherwise. Handle this case by re-asserting the degraded,rw mount options suggested in our "Maintenance required" contextual text, and further suggest that a reboot may be required. We aim to provide an unmount/remount capability in the future that could also be suggested here.
…kstor#2099 When a pool has no missing members, remove db-held pool associations from all prior 'detached' members. These disks then show up as regular detached devices and can be removed via their 'bin' icons. This allows continued surfacing of prior-attached status for these devices, while returning the source of truth regarding detached members to the pool itself.
…2077 rockstor#2099
- Special-case the removal of a detached device from a pool that has only detached members (rockstor#2099).
- Restrict detached device removal selection (rockstor#2077): one at a time, and with no detached/attached mixing. This reduces our problem space and so eases sanity checking. An attempted multi-detached-device selection informs the user that, on pools with only detached members, all detached devices will be removed anyway.
- In combination with the other rockstor#2099 related changes, automatic pool deletion is facilitated once all detached disks have been removed.
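The one-at-a-time, no-mixing restriction described above can be sketched as a simple validation step. This is illustrative only: the function name, the `(name, is_detached)` tuple representation, and the use of `ValueError` are assumptions, not Rockstor's actual API.

```python
def validate_removal_selection(selected):
    """Reject mixed attached/detached selections and multi-detached
    removals, mirroring the restriction described in the PR.

    `selected` is a list of (name, is_detached) tuples; an illustrative
    stand-in for Rockstor's real disk objects (assumed, not actual code).
    """
    detached = [name for name, is_detached in selected if is_detached]
    attached = [name for name, is_detached in selected if not is_detached]
    if detached and attached:
        # No detached/attached mixing in a single removal request.
        raise ValueError("Cannot mix detached and attached devices.")
    if len(detached) > 1:
        # Detached devices are removed one at a time.
        raise ValueError("Remove detached devices one at a time.")
    return detached or attached
```

Keeping the check this strict is what "reduces our problem space" above: each removal request is either a set of attached devices or exactly one detached device.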
I just tested (to the best of my understanding, at least), and it works great!
The steps I followed were:
- In KVM, create two new disks with serials `aaa` and `bbb`.
- Boot Rockstor, and create a pool `test_pool_03` with these two disks in raid1.
- Shutdown Rockstor.
- Change each disk's serial number to `a2a2a2` and `b2b2b2`.
- Reboot Rockstor.
- Inspect the Disks and Pools pages.
Prior to your PR, this resulted in the Pools page showing 4 member disks: the two old ones labeled as detached-disk_name, plus the two "new" ones (with the new serial numbers). No "bin" icon was surfaced on the Disks page, and I had to use Resize/ReRaid to remove the disks and then delete the pool.
After your PR, this resulted in the logs showing the following at boot:
[19/Jun/2020 07:59:36] INFO [storageadmin.views.command:100] Removing detached disk from Pool test_pool_03: no missing devices found.
[19/Jun/2020 07:59:36] INFO [storageadmin.views.command:100] Removing detached disk from Pool test_pool_03: no missing devices found.
The Pools page now shows only the two new disks as members of test_pool_03
.
The Disks page now shows the two old detached disks with a "bin" icon that I can successfully use to remove them from the list.
Nice!
# pool info as source of truth re missing.
if not p.has_missing_dev:
    for disk in p.disk_set.filter(name__startswith="detached-"):
        logger.info("Removing detached disk from Pool {}: no missing "
                    "devices found.".format(p.name))
Quick thought: could we further detail the disk name (or any other meaningful info) of the disk in question here? While testing, two detached disks needed to be removed, which resulted in the following:
[19/Jun/2020 07:59:36] INFO [storageadmin.views.command:100] Removing detached disk from Pool test_pool_03: no missing devices found.
[19/Jun/2020 07:59:36] INFO [storageadmin.views.command:100] Removing detached disk from Pool test_pool_03: no missing devices found.
It thus results in duplicates (or triplicates, etc.) when that happens. Tiny detail, but if there's some sort of identification readily available for the detached disk being removed, it could be helpful.
Good point, except that the name of a detached disk is essentially random. The bit after detached- is just a uuid (for long historical reasons), so the name wouldn't help; that random element also changes on every disk refresh!! Other than that, I guess the best candidate would be the serial. This needn't hold up the PR merge though, as almost all removals will be single devices anyway: the only meaningful multi-device removals (of viable pools) are in raid6, currently the only raid level that can handle more than one disk failure (bar upstream btrfs raid1c3 and raid1c4).
I say we earmark this as a future logging improvement as we will be back here before long anyway. Nice idea though.
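The future logging improvement discussed here (using the stable serial instead of the random detached- name) might look like the following. This is a minimal sketch, not Rockstor's actual code: the `Disk` stand-in class and the `log_detached_removal` helper are assumptions for illustration, though a Disk model `serial` field is implied by the discussion above.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("storageadmin.views.command")


class Disk:
    """Minimal stand-in for Rockstor's Disk model (illustrative only)."""

    def __init__(self, name, serial):
        self.name = name
        self.serial = serial


def log_detached_removal(pool_name, disk):
    # Include the serial rather than the detached- name: the name's uuid
    # component is random and changes on every disk refresh, so the
    # serial is the only stable identifier for a detached device.
    msg = ("Removing detached disk (serial={}) from Pool {}: no missing "
           "devices found.".format(disk.serial, pool_name))
    logger.info(msg)
    return msg
```

With this, two removals from the same pool would no longer produce byte-identical log lines.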
@FroggyFlox Thanks for the review and double check. I think we are fairly safe with this one as it stands. Much appreciated. I'll let it sit for a bit and see if anything more dawns on us; if not, I'll pop it in ready for the next testing release rpm.
Fixes #2099
"failure to remove detached disks via pool resize"
Fixes #2077
"remove/resize pool for detached single dev pool fails"
Additional unrelated trivial fix included:
Fixes #2182
"Usage displayed instead of available free space"
Detached disk management changes proposed:
Note on the interplay between the first two elements above.
For pools with only detached members, we preserve the members' pool associations (notwithstanding the first element above), even though we normally assess missing-device status via a pool model property (has_missing_dev), which cannot be known when there are no attached members. This behaviour is due to a prior check: in the case of zero attached members we proceed no further, as no mount is then possible, and so we do not inadvertently remove that pool's disk associations. This preserves the possibility of detaching all of a pool's members and then re-attaching them wholesale (in a powered-off state), with all associated configs (shares, exports, etc.) re-established automatically. If we were instead to remove all disk pool associations automatically in the all-detached case, all of the pool's associated shares/exports would also be removed. (TEST THIS).
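The guard described above can be sketched with plain Python objects. This is illustrative only: the `Pool` stand-in, `attached` property, and `maybe_clear_detached_associations` helper are assumed names, not Rockstor's actual models or code paths.

```python
class Pool:
    """Illustrative stand-in for Rockstor's Pool model (assumed, not
    actual code). Members are (name, attached) tuples."""

    def __init__(self, name, members):
        self.name = name
        self.members = members

    @property
    def attached(self):
        return [n for n, attached in self.members if attached]


def maybe_clear_detached_associations(pool):
    """Drop detached-member pool associations only when the pool still
    has at least one attached member. With zero attached members no
    mount is possible, so has_missing_dev cannot be assessed; the
    associations are preserved to allow wholesale re-attach with
    shares/exports re-established automatically."""
    if not pool.attached:
        # Prior check: zero attached members, proceed no further.
        return False
    pool.members = [(n, a) for n, a in pool.members if a]
    return True
```

The early `return False` is the "prior check" the paragraph refers to: it is what keeps an all-detached pool's associations (and hence its shares and exports) intact.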
Testing:
Changing the serials of all drives, while powered down in KVM, results in none of the 'detached' former members retaining a pool association. These prior pool-associated members are then removable via their bin icon, so we still surface their prior-attached status. #2099
Removing a single detached member from a pool that has only detached members results, by design, in all detached members having their pool association removed and, on the next pool state refresh, the pool itself being removed. See the detailed #2077 caveat below.
Caveats:
N.B. As is common with a few of our Web-UI 'operations', two Pool details page refreshes are often required for the final pool state to be displayed; i.e. a removed detached-disk entry will linger, post completion of the operation, until this double refresh has been performed. And in the case of detached disk removal from pools that have only detached members, we have the 'Ragged Edge' issue detailed in my comments on issue #2077, which can be addressed in a future pull request.
Ready for review, requesting @FroggyFlox .
I have separated out the various main elements, as detailed above, into their own commits to aid review, and have also ensured our project goals of string.format() and black formatting in all modified files.