System freezes or reboots importing a degraded mirror image #806
Comments
OK so, if that is Intel, can we boot with keepsyms=1 on so we get a stack trace? I will create a 2 file mirror pool and offline one to see if I can make it happen here. |
OK don't need stack, I can reproduce here. One sec |
Update: |
Is this with today's build? |
No, the homebrew version installed a few days ago:
The file It would be better to be able to import the pool in read-write mode and continue using it. Update: Sorry, I need a different version, for Monterey-12 (not arm64). I made a mistake and ran the command on the wrong computer. The correct output is:
|
Ah ok, so you run arm64 - let me make you a build |
Thanks. First a note here to improve the homebrew version. The uninstall should work without manually issuing commands as root. I ran:
Running as root didn't work:
So I manually had to remove the services as root:
Then it worked (as regular user):
|
From above:
Thanks! |
The Homebrew issues you might have to bring up with them; it looks like it needs sudo on the launchctl lines. The Intel version was also posted; it is next to the arm64 one today: https://openzfsonosx.org/forum/download/file.php?id=458 and yes, the Catalina build will work on Monterey. (You can easily check what your machine is with "uname -a" or just "arch", since the brew info isn't clear.) |
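A minimal sketch of the architecture check mentioned above, for choosing between the arm64 and Intel downloads. The helper name `pick_build` is hypothetical and not part of OpenZFS or Homebrew; it only assumes that `uname -m` prints `arm64` on Apple Silicon and `x86_64` on Intel Macs.

```shell
#!/bin/sh
# pick_build is a hypothetical helper (an assumption for illustration).
# It maps a machine string, as printed by `uname -m`, to the build to download.
pick_build() {
    case "$1" in
        arm64)  echo "arm64" ;;
        x86_64) echo "intel" ;;
        *)      echo "unknown" ;;
    esac
}

# Classify the machine this script is running on.
pick_build "$(uname -m)"
```

Run it on the machine that will actually import the pool. A shell started under Rosetta may report `x86_64` even on Apple Silicon, so it is worth checking from a native terminal.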
It seems to work now! I brought both mirror images online, and they resilvered.
Now scrub is running. However, I wonder whether there is something making the system slow.
|
Run a spindump while it's being slow? Maybe there is a process taking all the CPU. Scrubbing does start slow, but just 12G in 30 mins does seem too slow. You could reboot in case it is something taking CPU; the scrub should restart. |
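To put a number on "too slow", here is a rough sketch that converts the progress quoted above into an average throughput. The helper name `scrub_rate` is hypothetical, and the sketch assumes the "12G" figure means GiB.

```shell
#!/bin/sh
# scrub_rate is a hypothetical helper (an assumption for illustration).
# Arguments: GiB scanned, elapsed minutes. Prints average MiB/s.
scrub_rate() {
    awk -v gib="$1" -v min="$2" \
        'BEGIN { printf "%.1f\n", gib * 1024 / (min * 60) }'
}

# Figures quoted in the thread: 12G scanned in 30 minutes.
scrub_rate 12 30   # prints 6.8
```

An average of about 6.8 MiB/s is far below what a single hard disk normally sustains for sequential reads, which supports the suspicion that something else is holding the scrub back.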
Made spindumps and a reboot, but the system remains slow.
Because of the size and for privacy reasons I have sent you the full spindumps via email to the address mentioned here, but the relevant threads (cpu time > 9.0s) seem to be: kernel_task:
launchd:
|
Hmm yeah what is launchd doing there, quite busy looking for memory. Does Looks like it is memory starved, so we could look at halving your ARC. This is also what we would experience when we've had memory leaks in the past, so it could be worth checking how much ZFS has, and if it is ever growing. |
After exactly one hour less than 10G (had to start a new scrub after reboot):
|
What is ARC, how can I halve it, and how can I check how much memory ZFS has (via ssh)? |
OK, if launchd tries to start something over and over, it should be in the logs. Check /var/log to see, maybe system.log. Maybe one of the ZFS scripts is broken and it tries to start it. For memory, you can use "sysctl kstat" to get everything, which is generally far too much, so maybe use "sysctl kstat | grep inuse". There's also the "total allocated by all ZFS", which is... one sec: kstat.spl.misc.spl_misc.spl_osif_malloc_bytes: 268431360 |
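A small sketch for reading that counter in friendlier units. The helper name `kstat_mib` is hypothetical; it only assumes the `name: value` format that `sysctl` prints, as in the line quoted above.

```shell
#!/bin/sh
# kstat_mib is a hypothetical helper (an assumption for illustration).
# It reads "name: bytes" lines on stdin and prints each value in whole MiB.
kstat_mib() {
    awk -F': ' '{ printf "%s: %d MiB\n", $1, $2 / 1048576 }'
}

# Sample line quoted in the thread. On a live system the input would come from
#   sysctl kstat.spl.misc.spl_misc.spl_osif_malloc_bytes
echo "kstat.spl.misc.spl_misc.spl_osif_malloc_bytes: 268431360" | kstat_mib
# prints: kstat.spl.misc.spl_misc.spl_osif_malloc_bytes: 255 MiB
```

Running this over ssh every few minutes and comparing the values would show whether the ZFS allocation keeps growing, which is the leak symptom described earlier in the thread.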
|
|
Bah hmm, run Memory usage looks ok. |
Nothing on ZFS in |
We are looking for launchd output though, to see why it's so busy |
Nothing on |
To see just launchd: to see last 10 mins |
One entry that appears every second:
|
(Then nothing follows.)
|
Certainly you should look at unloading net.langui.FTPServer. |
I will do that. Maybe uninstall it altogether, as I don't need it anymore. One second. |
Yeah, it's just in there, but unlikely to be the root cause. You could unload the ZFS ones to see if launchd stops being in top.
Possibly with "sudo". |
Ran |
Result for
|
Yep |
Works, but still a 100% cpu Result for
|
Try it with "disable" and "stop" in place of "unload"? Not sure if the "unload" command will both disable and stop |
... still the 100% ... |
|
Hmm, that is surprising. What about disabling Spotlight in the ZFS datasets: |
Spotlight is already disabled for all ZFS filesystems. |
And no ZFS filesystems are currently mounted. |
Not sure then, you could run spindump against launchd and try to see what it's doing. I can't check my mail until I'm home again this evening. |
There is a weird amount of "fsync" going on, and exfat is very slow with fsync. ZFS issues one for every write, and launchd keeps issuing them as well. |
Ok, the exfat HDD might be the problem.
I had made a backup of one image to the exfat HDD and am now using one image on an hfs HDD together with the backup image on the exfat HDD. |
I will take the image on the exfat HDD offline. |
Ran: |
There is still a 100% |
Thanks for your help. |
By the way, which one is the correct project to file bug reports? |
And again, thanks for your excellent support! |
Ok I shall update the website, if I remember :) Although I do still check both, the openzfs one is more correct |
I have updated the link here: https://openzfsonosx.org/wiki/Getting_involved#Fix_and_report_bugs |
The system freezes with:
sudo zpool import -d pool.mirror1.img pool
The system reboots with (readonly mode):
sudo zpool import -N -d pool.mirror1.img pool
This happens although the pool is recognized:
Diagnose after reboot: