range_tree: Add zfs_recover_rt parameter and extra debug info#17094
ihoro wants to merge 1 commit into
The branch was updated from ac2b7cc to 963c816.
The update includes:
Looks like there's a build issue:
Indeed, there should have been one more dev iteration on my side. It should be fixed now.
For some reason a bunch of the runners are failing. I'm going to restart them manually.
Please rebase on master; that will pull in the new .github/workflow/* files and allow the tests to complete.
Thank you.
Sure, it's been moved 3 commits up, on top of the recent workflow changes; it should work now.
amotin left a comment:
I see plenty of places where the use case is not defined. We could look harder. Also, I think in many cases we could still log the pool and vdev, even if we have no metaslab number to report.
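The suggestion to log the pool and vdev even when no metaslab number is known could look like the minimal sketch below. `rt_format_name()`, the `RT_ID_UNKNOWN` sentinel, and the `{spa=... vdev=... ms=...}` format are all hypothetical and are not taken from the patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sentinel meaning "this context does not know the id". */
#define RT_ID_UNKNOWN UINT64_MAX

/*
 * Build a debug name for a range tree from whatever context is known.
 * The pool and vdev are logged when available, even without a
 * metaslab id. Returns what snprintf() returns.
 */
static int
rt_format_name(char *buf, size_t len, const char *pool, uint64_t vdev_id,
    uint64_t ms_id)
{
	if (vdev_id == RT_ID_UNKNOWN)
		return (snprintf(buf, len, "{spa=%s}", pool));
	if (ms_id == RT_ID_UNKNOWN)
		return (snprintf(buf, len, "{spa=%s vdev=%" PRIu64 "}",
		    pool, vdev_id));
	return (snprintf(buf, len, "{spa=%s vdev=%" PRIu64 " ms=%" PRIu64 "}",
	    pool, vdev_id, ms_id));
}
```

A caller that only knows the vdev still produces `{spa=tank vdev=2}`. That string is already enough to narrow down where a bad segment came from.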
There are production cases where unexpected range tree segment adding/removal leads to a panic. Investigating the root cause requires more debug info about the range tree and the segments in question when it happens. In addition, the zfs_recover_rt parameter allows converting such panics into warnings, with a potential space leak as the trade-off.
Signed-off-by: Igor Ostapenko <igor.ostapenko@klarasystems.com>
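The panic-or-warn trade-off from the commit message can be sketched in user-space C. This is a minimal sketch, not the patch itself:

- `recover_rt` and `panic_recover_rt()` are hypothetical stand-ins for the zfs_recover_rt parameter and its reporting helper.
- `abort()` stands in for a kernel panic.

The pattern mirrors the existing zfs_panic_recover(), which warns instead of panicking when zfs_recover is set.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical tunable: 0 = panic (default), nonzero = warn and continue. */
static int recover_rt = 0;

/* Number of warnings emitted, so callers can observe that recovery happened. */
static unsigned long recover_rt_warnings = 0;

/*
 * Report an unexpected range tree condition. Without recovery this
 * aborts; with recovery it prints a warning and returns, and the
 * caller is expected to skip the offending operation (which may leak
 * space instead of crashing).
 */
static void
panic_recover_rt(const char *fmt, ...)
{
	va_list ap;

	fputs(recover_rt ? "WARNING: " : "PANIC: ", stderr);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);

	if (!recover_rt)
		abort();
	recover_rt_warnings++;
}
```

Making the helper return, instead of being marked noreturn, is what makes the leak trade-off explicit: every call site must decide what "skip" means for its operation.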
It seems it's time to rename the
The unknown (now generic) flag is intentionally used for range tree instances where no special treatment is expected. Sometimes a tree is not about allocated/free space, or it is a temporary tree built from other trees that have already been "recovered". Anyway, I could review the instances once again; perhaps some of them should not be GENERIC.
Yes, it's worth the extra code to be maximally useful. It will come with the next iteration of the patch.
 * name string, which can be marked as dynamic to be freed along with the tree
 * instance destruction.
 */
#define ZFS_RANGE_TREE_F_UC_GENERIC (1 << 0)
I don't think "GENERIC" is really meaningful. I think it would be easier and cleaner to just pass 0 if we can't say anything better (and we really should).
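One way to read the flag comment and the review suggestion together is shown below:

- Use-case bits are optional, and 0 means "nothing more specific to say".
- A separate ownership bit tells the destructor whether the tree must free its name string.

The flag names and structure are hypothetical illustrations, not the ZFS definitions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical use-case bits; 0 means no specific use case. */
#define RT_F_UC_ALLOCATED_SPACE	(1u << 0)
#define RT_F_UC_FREE_SPACE	(1u << 1)
/* Hypothetical ownership bit: the tree owns (and frees) rt_name. */
#define RT_F_DYN_NAME		(1u << 31)

typedef struct range_tree {
	uint32_t	rt_flags;
	char		*rt_name;	/* borrowed unless RT_F_DYN_NAME */
} range_tree_t;

/* Create a tree; pass flags == 0 instead of a GENERIC use-case bit. */
static range_tree_t *
rt_create(uint32_t flags, char *name)
{
	range_tree_t *rt = calloc(1, sizeof (*rt));

	if (rt == NULL)
		return (NULL);
	rt->rt_flags = flags;
	rt->rt_name = name;
	return (rt);
}

/* Destroy a tree, freeing the name only if the tree owns it. */
static void
rt_destroy(range_tree_t *rt)
{
	if (rt->rt_flags & RT_F_DYN_NAME)
		free(rt->rt_name);
	free(rt);
}
```

Keeping ownership in its own bit means a caller can pass a string literal for short-lived trees and a heap-allocated, formatted name for long-lived ones, without a second destroy function.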
I disagree with the logic outlined in the diagrams, regardless of whether it can already be triggered by some module parameter: it is IMHO a very bad idea to allow the latter to happen.
if (delta < 0 && delta * -1 >= zfs_rs_get_fill(rs, rt)) {
	zfs_panic_recover("zfs: attempting to decrease fill to or "
	    "below 0; probable double remove in segment [%llx:%llx]",
	zfs_panic_recover_rt("zfs: rt_instance=%s: attempting to "
This is cosmetic, but here and in other places I would shorten "rt_instance=" to "rt=", since the longer prefix provides no extra information and the line is too long. Alternatively, write out "range tree" in full to make it more human-readable, if we don't care about length.
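The fill check from the snippet can be sketched as a standalone function that also uses the shorter "rt=" prefix proposed in the review. `seg_t` and `seg_adjust_fill()` are hypothetical simplifications. In the real code, the report goes through zfs_panic_recover()/zfs_panic_recover_rt(); this sketch prints the message and rejects the update instead.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified segment with a fill count. */
typedef struct seg {
	uint64_t	start;
	uint64_t	end;
	uint64_t	fill;
} seg_t;

/*
 * Adjust a segment's fill by delta. A decrease that would bring the
 * fill to or below zero indicates a probable double remove: report it
 * and leave the segment unchanged. Returns 0 on success, -1 if rejected.
 */
static int
seg_adjust_fill(const char *rt_name, seg_t *rs, int64_t delta)
{
	/* Magnitude of delta, computed without signed overflow. */
	uint64_t mag = delta < 0 ? 0 - (uint64_t)delta : (uint64_t)delta;

	if (delta < 0 && mag >= rs->fill) {
		fprintf(stderr, "zfs: rt=%s: attempting to decrease fill to or "
		    "below 0; probable double remove in segment [%" PRIx64
		    ":%" PRIx64 "]\n", rt_name, rs->start, rs->end);
		return (-1);
	}
	if (delta < 0)
		rs->fill -= mag;
	else
		rs->fill += mag;
	return (0);
}
```

Computing the magnitude in unsigned arithmetic avoids the undefined behavior that `delta * -1` would hit for `INT64_MIN`.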
@ihoro So how about removing the unneeded
@ihoro ping?
The debug info part has been extracted into a separate PR: #17581
This needs a rebase after #17581 has been merged.
I've gotten back into the context of this, and there is a desire to revise the interface and expectations from the end-user perspective. I think we could discuss the following topics:
The above is a conceptual discussion of what we would like to have. If we go down the technical road, then we could discuss naming options, whether we want to use bool-like knobs or bitfield ones, whether it should collaborate with the existing
This makes some sense to me. Duplicate frees found during loading have already happened and are already on disk, and so far they haven't led anywhere. If they were caught earlier (in other places), it could be more informative.
To build on this a little bit, what I think we want is to add a
Would it be feasible to 'panic' just the affected pool, preferably in a way that allows (force-)unloading everything related to it without needing a (hard) reboot, instead of halting the whole system?
If we were able to handle it as Brian described, we would not need to panic at all.
Is there any update on this? I just ran into a space corruption issue on my homelab server with
Happy to use the pool in its current state to test these changes and hopefully help recover from the problem.
@ihoro FYI, I managed to use this PR to recover my pool. Thanks a lot for your work!
Motivation and Context
There are production cases where loading a metaslab leads to a ZFS panic due to (presumably) unexpected entries in its spacemap. The assertions in zfs_range_tree_add_impl() and zfs_range_tree_remove_impl() fail due to overlapping or missing segments, etc. A business would like to keep using such pools while the root cause is being investigated.
Description
The idea is to allow loading such metaslabs with a potential space leak as the trade-off, instead of potential data loss.
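The space-leak trade-off can be made concrete with a minimal sketch of a segment set whose add and remove operations detect the two failure modes named above: overlapping segments and missing segments. In recovery mode, the offending operation is skipped with a warning. For a tree that tracks free space, a skipped add leaves its non-overlapping part untracked, which is one way such a leak can arise.

Everything here is hypothetical:

- A sorted fixed-size array stands in for the real range tree.
- `recover_ms` and `loading_metaslab` stand in for a knob whose effect is limited to metaslab loading.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SEGS 64
/* Hypothetical knobs: tolerate bad entries only while loading a metaslab. */
static int recover_ms = 0;
static int loading_metaslab = 0;

/* A set of [start, end) segments kept sorted, disjoint and coalesced. */
typedef struct { uint64_t start, end; } seg_t;
typedef struct { seg_t segs[MAX_SEGS]; int n; } rset_t;

/* Panic unless recovery is allowed here; otherwise warn and return -1. */
static int
report(const char *what, uint64_t s, uint64_t e)
{
	int recover = recover_ms && loading_metaslab;
	fprintf(stderr, "%s: %s [%llx:%llx]\n", recover ? "WARNING" : "PANIC",
	    what, (unsigned long long)s, (unsigned long long)e);
	if (!recover)
		abort();
	return (-1);	/* the caller skips the operation */
}

static void
seg_insert(rset_t *rs, int i, uint64_t s, uint64_t e)
{
	if (rs->n == MAX_SEGS)
		abort();	/* fixed capacity keeps the sketch short */
	memmove(&rs->segs[i + 1], &rs->segs[i],
	    (size_t)(rs->n - i) * sizeof (seg_t));
	rs->segs[i] = (seg_t){ s, e };
	rs->n++;
}

static void
seg_delete(rset_t *rs, int i)
{
	memmove(&rs->segs[i], &rs->segs[i + 1],
	    (size_t)(rs->n - i - 1) * sizeof (seg_t));
	rs->n--;
}

/* Index of the first segment that ends after s. */
static int
seg_find(const rset_t *rs, uint64_t s)
{
	int i = 0;
	while (i < rs->n && rs->segs[i].end <= s)
		i++;
	return (i);
}

/* Add [s, e) (s < e); overlapping existing space is unexpected. */
static int
rset_add(rset_t *rs, uint64_t s, uint64_t e)
{
	int i = seg_find(rs, s);
	if (i < rs->n && rs->segs[i].start < e)
		return (report("add overlaps existing segment", s, e));
	int prev = (i > 0 && rs->segs[i - 1].end == s);
	int next = (i < rs->n && rs->segs[i].start == e);
	if (prev && next) {
		rs->segs[i - 1].end = rs->segs[i].end;
		seg_delete(rs, i);
	} else if (prev) {
		rs->segs[i - 1].end = e;
	} else if (next) {
		rs->segs[i].start = s;
	} else {
		seg_insert(rs, i, s, e);
	}
	return (0);
}

/* Remove [s, e) (s < e); it must lie entirely inside one segment. */
static int
rset_remove(rset_t *rs, uint64_t s, uint64_t e)
{
	int i = seg_find(rs, s);
	if (i == rs->n || rs->segs[i].start > s || rs->segs[i].end < e)
		return (report("remove of space not in tree", s, e));
	seg_t *g = &rs->segs[i];
	if (g->start == s && g->end == e) {
		seg_delete(rs, i);
	} else if (g->start == s) {
		g->start = e;
	} else if (g->end == e) {
		g->end = s;
	} else {
		seg_insert(rs, i + 1, e, g->end);	/* split in two */
		rs->segs[i].end = s;
	}
	return (0);
}
```

Gating recovery on both knobs mirrors the intent of limiting the behavior to metaslab loading. The same bad add outside that context still stops hard.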
We already have the zfs_recover module parameter to mitigate various issues, including some range tree cases, and this patch adds the zfs_recover_ms parameter to limit the recovery behavior to the metaslab loading process only. The following diagrams are expected to help with the details:
How Has This Been Tested?
Types of changes
Checklist:
Signed-off-by.