Need a First Aid utility #90
Maybe make an assistant that guides the user through the process and, along the way, explains in plain English what is being run, why, and what it means. Ask for confirmation before doing anything invasive. Maybe have a details section that shows and explains the commands being run, linking to or citing their man pages. This way the user learns about the ZFS tools while the job is being performed (unlike on the Mac, where commands are run but not really explained).

The low-hanging fruit (much better than having nothing!) would be an assistant that just tells the user the steps but has them type in the commands themselves. Once this has been tried and tested, a later version could offer to execute the commands for them.

Is the following workflow sane? Are there additional/better steps?

Phase 1: Recovery (non-invasive)
At this point we may be able to copy data from, e.g., /mnt/usr/home.

Using tar
Getting
What can we do so that the operation does not stop if some files cannot be read? We want to copy off as much data as possible.

Using rsync
This nicely prints out errors but continues.
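For the question of not stopping when some files cannot be read, here is a minimal Python sketch of the same "report the error but continue" behaviour. It is only an illustration, not what tar or rsync do internally; the function name `copy_tree_best_effort` and its return shape are made up for this example.

```python
import os
import shutil


def copy_tree_best_effort(src_root, dst_root):
    """Copy every file under src_root to dst_root, skipping files that fail.

    Returns (copied, failed): a list of copied relative paths and a list of
    (relative path, error message) pairs for everything that could not be read.
    """
    copied, failed = [], []

    def note_walk_error(err):
        # os.walk calls this for directories it cannot list.
        failed.append((os.path.relpath(err.filename, src_root), str(err)))

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=note_walk_error):
        rel_dir = os.path.relpath(dirpath, src_root)
        os.makedirs(os.path.normpath(os.path.join(dst_root, rel_dir)), exist_ok=True)
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            try:
                shutil.copy2(os.path.join(src_root, rel_path),
                             os.path.join(dst_root, rel_path))
                copied.append(rel_path)
            except OSError as err:
                # Record the failure and carry on with the next file.
                failed.append((rel_path, str(err)))
    return copied, failed
```

The `failed` list is what a First Aid assistant could show to the user afterwards, so that they know exactly which files still need attention.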
Since rsync verifies what it copies, it should probably be preferred over tar.

Since rsync discards defective files, it may be advisable to add some additional process that would try to at least partially recover files that could not be copied with rsync. How?

Phase 2: Repair (Invasive)
Despite this error, everything but
When it is done it says:
At this point we still have data errors:
Note: At this point I gave up since I could not find a way to fix the errors.

Phase 3: Diagnose
|
Generally, worth noting: https://www.freebsd.org/cgi/man.cgi?query=zpool-import(8) option Keyword: extreme https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A/ "… the only choice to repair the data is to restore the pool from backup …" is not always true. Sometimes all that's required is a scrub. @probonopd I should recommend separate diagnosis of what happened in your case. Happy to discuss in Matrix. |
No change in my case:
Is there any way to at least understand what is wrong with that I only see boot-related issues in zrepl snapshots, so I still wonder why the system became unbootable.
|
|
There's that wish, but there's a parallel need to know how much of what's to be copied was subject to a prior error. For this, I should treat output from … subsequent use of rsync(1) might copy (from good media) data that is corrupt.
Given this, after a scrub:
– those are the most appropriate actions. (For a ZFS pool to self-heal, to 'fix' itself, typically requires more than one device in the pool.) It'll be good for someone with ZFS expertise to have a glance at this case. |
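On knowing which files were subject to a prior error: one possible source is the "Permanent errors" section that `zpool status -v` prints. Whether that is the output meant above is my assumption. A rough sketch of pulling that list out of the text, so that a copy can be checked against it afterwards (the function name is made up):

```python
def files_with_permanent_errors(status_output):
    """Return the entries listed under 'Permanent errors' in `zpool status -v` text."""
    marker = "Permanent errors have been detected in the following files:"
    entries = []
    in_list = False
    for line in status_output.splitlines():
        text = line.strip()
        if marker in text:
            in_list = True
        elif text.startswith("pool:"):
            in_list = False  # the report for the next pool begins
        elif in_list and text:
            entries.append(text)
    return entries
```

The text could come from something like `subprocess.run(["zpool", "status", "-v"], capture_output=True, text=True).stdout`. Entries are returned as printed, so some may be dataset/object references rather than file paths.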
My line of thinking is:
|
ZFS

I hesitate before referring to Wikipedia, but these words from https://en.wikipedia.org/wiki/ZFS#Data_recovery are relevant:
Essentially: there's inadequate redundancy. Re: #90 (comment) if the S.M.A.R.T. status is to be trusted then I wonder whether there was a problem with your SATA connection at the time(s) of problem(s) occurring.
zdb(8) is your friend however it is:
Cases can be very diverse. Generalised repair or recovery of data from a compromised ZFS pool is (I think) out of scope for a helloSystem First Aid utility. If you attempt this without openzfs/zfs#7912 I foresee frustration and/or disappointment – if not for you, then (eventually) for some other user of the OS.

Other file systems

A front end to fsck(8) is a good idea but this, I think, falls more under the umbrella of #61

Data recovery
Be aware of things such as this: – however if you envision any such thing within helloSystem, then you should be prepared for end users to expect or demand support from the helloSystem community, when it's more appropriate to seek support elsewhere. A world of pain. |
Not sure whether |
So, Storage/Disk Utility should not have check or repair capabilities? |
I was thinking of separate utilities because implementing First Aid seems much easier to me than Disk Utility, but once we have Disk Utility we could put the functionality in there for sure. |
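To give an idea of how small a first version could be, here is a sketch of the "tell the user the steps, let them type the commands" assistant from the opening post. Everything in it is hypothetical: the `Step` fields, the example steps, the pool name `zroot`, and the choice to mark a scrub as invasive. The man page links follow the freebsd.org URL pattern used elsewhere in this thread.

```python
from dataclasses import dataclass

MAN_BASE = "https://www.freebsd.org/cgi/man.cgi?query="


@dataclass
class Step:
    title: str
    explanation: str  # plain-English "what and why"
    command: str      # shown to the user, never executed here
    man_page: str     # e.g. "zpool-scrub(8)"
    invasive: bool = False


def render(step, confirm=input):
    """Return the text to show for one step, or None if the user declines."""
    if step.invasive:
        answer = confirm(f"'{step.title}' is marked invasive. Continue? [y/N] ")
        if answer.strip().lower() != "y":
            return None
    return (f"{step.title}\n{step.explanation}\n"
            f"Type this yourself:\n  {step.command}\n"
            f"Details: {MAN_BASE}{step.man_page}")


# Hypothetical example steps; the pool name "zroot" is an assumption.
STEPS = [
    Step("Show pool health", "Lists the pool's devices and any known errors.",
         "zpool status -v zroot", "zpool-status(8)"),
    Step("Scrub the pool", "Reads all data in the pool and verifies checksums.",
         "zpool scrub zroot", "zpool-scrub(8)", invasive=True),
]
```

Because `render` only produces text, the same steps could later be reused inside Disk Utility, or gain an "execute for me" option, without changing how they are described.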
#90 (comment) we have the result of the self-test but I forgot to ask the obvious – thanks to a hint from idwer in
|
|
I think recoverdisk is not usable for SSDs, but we should offer that route for optical media and spinning mechanical drives in the First Aid utility |
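For illustration only, the general idea of reading in fixed-size blocks and carrying on past unreadable ones can be sketched at file level like this. It is not how recoverdisk(1) is implemented, and it omits the retries and smarter scheduling that a real tool would need. The function name and the `read_block` parameter are made up; the parameter exists only so that read errors can be simulated.

```python
import os


def salvage_file(src_path, dst_path, block_size=4096, read_block=None):
    """Copy src_path to dst_path block by block, zero-filling unreadable blocks.

    read_block(fd, length, offset) defaults to os.pread. Returns the list of
    byte offsets whose blocks could not be read.
    """
    read_block = read_block or os.pread
    bad_offsets = []
    size = os.path.getsize(src_path)
    fd = os.open(src_path, os.O_RDONLY)
    try:
        with open(dst_path, "wb") as out:
            for offset in range(0, size, block_size):
                length = min(block_size, size - offset)
                try:
                    data = read_block(fd, length, offset)
                except OSError:
                    data = b""
                    bad_offsets.append(offset)
                # Pad short or failed reads so later blocks keep their offsets.
                out.write(data.ljust(length, b"\0"))
    finally:
        os.close(fd)
    return bad_offsets
```

The returned offsets tell the user which parts of the salvaged copy are zero-filled placeholders rather than real data.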
Re: helloSystem/Utilities#33 (comment) and maybe overlapping with #61 Consider enhancing helloSystem's custom installer for FreeBSD to:
For (1) and (2) the essence should be for the (multi-purpose) recovery system to be never spoilt by an end user. So a suitably sized partition may be preferable to a recovery boot environment within the same pool as helloSystem boot environments. recoveryOS and diagnostics environments on Mac computers - Apple Support |
Thanks for your thorough research @grahamperrin but I think our focus should be on recovery and repair for native filesystems before we even think about "alien" ones. Probably repairing defective APFS filesystems is best left to macOS. |
Thanks, I added APFS because it's the primary file/storage system for Apple users, more so because helloSystem is designed to appeal to users of Apple hardware. |
From one day to the next, the bootloader greeted me with
This teaches me a couple of things: