docs/fix: disk replacement workflow and nmdctl corrections#91
Complete workflow for replacing failed disks with critical warning about mount order: filesystems MUST be mounted before rebuild starts or parity will be corrupted.
- Size display was using binary math (1024) with decimal labels (TB/GB). Now uses decimal math to match the labels and unRAID's display.
- Reload warning now includes required unmount step.
Sorry, this doesn't make sense:
Mounting before or after rebuild starts should not make any difference. The driver does not know about mounted filesystems.
Regarding the other changes, I originally decided to go with base-2 units, and while it is probably "wrong" to then use gigabytes as units instead of gibibytes, I'm not sure I want to change it right away. Reload warning, the |
I understand the driver doesn't explicitly track mount state, this is more about what came up in my testing. So a couple of clarifications:
Also, this is the order of operations Unraid uses. I know you mentioned you don't use Unraid, so I figured I'd let you know. How about a middle ground, since this did come up for me in testing and maybe it's just an edge case: instead of "will corrupt parity," we could say "may cause parity inconsistency in some configurations" or "some users have reported issues when mounting after rebuild starts." Either way, I'd like to see some kind of documentation for a recommended workflow. If your issue is specifically with the caution block being too definitive, hopefully this middle ground makes sense to you.
I apologize for not being clear with my explanation. I honestly don't have a strong opinion on whether to use GB or GiB. My issue was that the code was doing the math for one thing but labeling it the other. So the way you had it before, you were doing which would give you GiB/TiB but labeling it GB/TB. If you want to stay with GiB/TiB, that's fine. We just need to label it that way. So instead what we should do is As I said, I don't really have strong opinions either way as long as we're consistent and accurate.

As for your second point about the reload module being changed to call stop array, I 100% agree that's probably the better way of doing it. The reason I didn't do that is that, when making my first pull request on any new codebase, I like to make the least intrusive changes, and I figured just changing the warning was less intrusive.
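To illustrate the point about keeping the math and the label consistent, here is a minimal sketch in bash. It is not nmdctl's actual code: the function names `format_size_binary` and `format_size_decimal` are hypothetical, and the byte count is just a typical nominal "4 TB" disk size chosen for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not taken from nmdctl): each one pairs a divisor
# with the unit labels that actually match it.

# Base-2 math (divide by 1024) with base-2 labels.
format_size_binary() {
    LC_ALL=C awk -v b="$1" 'BEGIN {
        split("B KiB MiB GiB TiB", u, " ")
        i = 1
        while (b >= 1024 && i < 5) { b /= 1024; i++ }
        printf "%.1f %s\n", b, u[i]
    }'
}

# Base-10 math (divide by 1000) with base-10 labels.
format_size_decimal() {
    LC_ALL=C awk -v b="$1" 'BEGIN {
        split("B kB MB GB TB", u, " ")
        i = 1
        while (b >= 1000 && i < 5) { b /= 1000; i++ }
        printf "%.1f %s\n", b, u[i]
    }'
}

bytes=4000787030016            # illustrative size of a nominal "4 TB" disk
format_size_binary  "$bytes"   # prints: 3.6 TiB
format_size_decimal "$bytes"   # prints: 4.0 TB
```

The same disk reads as 3.6 or 4.0 depending on the divisor, which is why a base-2 result carrying a "TB" label can be misread. Either function is consistent on its own; the mismatch only appears when the divisor from one is combined with the labels from the other.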
Just let me know how you want to proceed and I'll modify my pull request accordingly.
Can you open an issue detailing how to reproduce the parity corruption issue? Please list all the commands you run, including

I am unable to reproduce such an issue, and it would go against how the driver actually works. It is most likely that you accidentally mounted the raw disk device at some stage instead of the nonraid block device.

As for the unit changes, you can also open an issue about it, but there are many examples of CLI tools using base-2 and GB as symbols, and switching to gibibyte symbols might make the outputs look busy, so I'm inclined to leave it as-is. Even ZFS skirts around the "problem" by not including the byte symbol in their human-readable output at all, just using T/G/M suffixes...

And finally for the reload warning, you can open a new PR if you want to change the reload function to call

I'll close this PR. In the future it would be best if you could open separate PRs for separate changes, and in most cases an issue before a PR is good, so we can first discuss why a change is needed. (I also added this into the contributing guidelines just now, so no worries and thanks for your efforts so far!)
Summary
Changes
Documentation (README.md)
Complete 7-step disk replacement workflow:
Added CAUTION block explaining that mounting after rebuild starts will corrupt parity.
nmdctl fixes
nmdctl unmount && nmdctl reload && nmdctl start instead of just reload && start (which fails if filesystems are mounted).
Testing
These changes were discovered and validated during hands-on disk replacement testing on bare metal Arch Linux (kernel 6.12.1) with NonRAID DKMS 1.3.2 / kernel module 2.9.35.
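The reload sequence listed under the nmdctl fixes above could be wrapped as follows. This is only a sketch: `safe_reload` is a hypothetical helper name, not part of nmdctl, and it assumes the `unmount`, `reload`, and `start` subcommands return a non-zero exit status on failure.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of nmdctl): run the reload sequence in
# order and stop at the first step that fails, naming that step.
safe_reload() {
    local step
    for step in unmount reload start; do
        if ! nmdctl "$step"; then
            echo "nmdctl $step failed; aborting reload sequence" >&2
            return 1
        fi
    done
}
```

Calling `safe_reload` behaves like `nmdctl unmount && nmdctl reload && nmdctl start`, with the addition of a message identifying which step stopped the sequence.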