macOS 10.15 Catalina support ✅ #721
I attempted to check if Catalina worked last week, but found that VMware Fusion does not work with it yet. I've been waiting for a fix for Fusion :)
The only issue I have found with Catalina and 1.9.1 rc1 is that ZFS pools no longer auto-mount on login. I have to run sudo zpool import xxx manually. I think it's to do with allowing access to removable volumes, but I don't know how to fix that!
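For anyone hitting the same thing, a sketch of the manual workaround (the pool name "xxx" is this thread's placeholder; the launchctl grep is a loose check, since the exact job label isn't given here):

```shell
# Import one pool by name ("xxx" is a placeholder for your pool):
sudo zpool import xxx

# Or import every pool the system can find:
sudo zpool import -a

# Check whether any import-related launchd job is loaded at all
# (the label is not known here, so grep broadly):
sudo launchctl list | grep -i zpool
```

If the launchctl check comes back empty, that would support the theory below that the import script is no longer being run at login.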
@dgsga Hmm, or the permissions to use LaunchDaemons for this kind of stuff - don't know if the zpool-import-all script actually still gets run or not
I encountered another bug on Catalina. When under high IO (I think), Catalina will crash (without showing the kernel panic screen) with half a second of loud fan noise. I encountered this a lot in Beta 1 and therefore reverted to 10.14; I did not save the crash report because I thought it was a Catalina problem that would be addressed by Apple. (I think it was a segfault, but I'm not sure I remember it correctly.) However, today when I tried out 10.15b5, it happened exactly once, significantly less often than before. Unfortunately this time I didn't get a crash report, but I will try my best to reproduce it and upload the report once I succeed.
@JMoVS I'm using 1.9.2 and all the pools are imported automatically (but the volume is an internal SSD instead of external)
@michael-yuji I've had exactly the same problem with a kernel panic under high IO, such as Spotlight, Photos.app or Sync.app indexing. The same thing occasionally happened in Mojave, where spl.kext rather than zfs.kext was highlighted in the kp report.
One of my laptops is using 1.8.1 with Catalina and it panics; this time I luckily got a crash report:
zfs and spl 1.8.1 are really quite old by now; can you try upgrading to 1.9.2 and let us know if it happens there as well? Also, are you familiar with boot-args? keepsyms=1 would be helpful.
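For reference, a sketch of setting that boot-arg (assumes NVRAM boot-args can be written on your setup; with SIP enabled this may need to be done from Recovery):

```shell
# Add keepsyms=1 so panic reports include symbol names.
# Note: this REPLACES any existing boot-args; if you already have
# some, include them in the quoted string as well.
sudo nvram boot-args="keepsyms=1"

# Verify it took:
nvram boot-args

# Remove it again later if desired:
# sudo nvram -d boot-args
```

A reboot is needed before the setting affects panic reports.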
Sure, it was an accident: I booted from this laptop, used it for a while, and it crashed (which I am very happy about, cuz I finally got a crash report). I am going to upgrade it and use it until it panics again lol.
Heads up: Apple's being problematic and telling some of us to update to Catalina (even on unsupported Macs (MacPro5,1 & 4,1), for some bizarre reason) for certain bugs in Mojave. I'm kinda baffled and have attempted to have conversations with Apple dev staff via Bug Report/Feedback Assistant, but not much luck. Essentially I've reported "blah is happening in 10.14.6" and their reply is "Please try beta X of 10.15 and let us know if the problem is resolved." I'm pretty disturbed and upset by this behavior by Apple, but I've heard of others hitting the same issue as I search the web. I'm working on moving to Catalina myself at the moment via the "unsupported methods" to see if my problems are indeed resolved as Apple has instructed, but it's a headache, and some issues such as ZFS trouble have me worried.
This is still a problem with 1.9.2 and Catalina beta 7.
OK, so in Catalina it appears our
When trussing we get
The sources for DAProbe.c:
Which seems to imply we aren't matching (although it picks zfs.fs OK, then rejects it?). As 0x2D is 45, the error is ENOTSUP, which means we are probably running afoul of these tests: https://github.com/appleopen/DiskArbitration/blob/master/diskarbitrationd/DAFileSystem.c#L645 However, I have tried copying
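A quick sanity check of that error-code reading (the hex-to-decimal conversion is straightforward; mapping 45 to ENOTSUP is macOS's errno numbering per sys/errno.h):

```shell
# Convert the hex status from the truss output to decimal:
printf '%d\n' 0x2D
# prints 45; on macOS, errno 45 is ENOTSUP ("Operation not supported")
```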
OK, turns out we should have a
@lundman Does it hurt to put that in the fsck_zfs for older versions too? Otherwise, could you push a commit to master to fix this?
Not at all, should be fixed for all versions yep
OpenZFSonOsX-Catalina-1.9.2.zip I have done a test build using Xcode 11 and Catalina, which also has the
Halfway off-topic, but what does concern me (a lot) now is: how is a future of openzfsonosx (post-Catalina) possible, given the deprecation of kexts? How would a volume and filesystem driver even be thinkable in userland? Will it wait for Photoshop to finish rendering before committing the ZIL? Lorenzo
I have just compiled the latest commit on Catalina DP8 using the Xcode 11 GM, all is working perfectly here. Thanks, Jorgen, for all your hard work.
Apple has made developing on osx a little less friendly in recent times, that is true, and there probably will be a day in the future when we can no longer maintain support. But until that time! |
Also, as far as anything Apple has said so far, there are specific categories of kernel extensions that Apple is transitioning to DriverKit (USB HID devices, serial devices, NICs), NetworkingDriverKit, and Endpoint Security extensions... and filesystems are not one of those categories. It seems unlikely to me that Apple will completely eliminate the ability to install kernel extensions on macOS.
I can just about guarantee that any panics that have the ZFS kext in them will get flagged, and they'll consider the issue more seriously. I wonder if there's a way to build an exception handling mechanism into ZFS that would catch a panic before it goes back to the kernel and send that data over here for processing?...
Also, if that's really a concern, maybe just don't send the panic reports to Apple if you're generating lots of them due to testing/adding new features/etc... I haven't had a ZFS panic in nearly forever running the stable releases with my couple of pools.
I installed Jorgen's test build, but unfortunately that did not solve the problem of frequent panics for me. Panics happen more often now since installing 15.1 beta 3 (it had been pretty stable since 15.0 beta 5 or so), possibly related to Mail deciding to re-download all my hundreds of thousands of emails -- so I'm not sure if the frequent reboots are related to more disk activity or to additional changes in beta 3.
If you are having panics on Catalina, we'd need to have the stack pasted, with keepsyms=1, so we can take a look at it.
here's the stack I saved last time, I'll set keepsyms=1 for next time...
... and here the most recent crash (on a different machine) with keepsyms=1
I'd love to be as optimistic. But what if Apple® simply doesn't care about filesystems other than those they support directly? They're tying more and more functionality (see the /Users APFS "Volume(s)") directly to their own filesystem. Even more, they actually want us to interact with filesystems at a more abstract, "guided" level. The Mac has lately been the platform for software development, be it for Mac apps or for anything else (except maybe for .NET). The day they close down on all this - with a loud scream of pain - I'll have to get a new "home" up and running... Best to all. And yes, until then, I'll keep my reality distortion field clean and colorful, and install, test, and most of all enjoy each and every new release of openzfsonosx...! :-)
Well, that's .. something. So it triggered a reap, and discovered a corrupt memory segment (
@lopezio you sound like my clone. I don't mean to keep hijacking this thread (yeah, I think we need another place to talk about this), but I do want to simply say this is what I believe (and clearly see) and have also been talking about "around the cooler" with folks. I've also heard some Apple engineers who used to work there say the same things and hear the same things from others that still do work there. Mobile and app security for their own cash security is their baby now - not us devs and high end users. |
@lundman unfortunately the panic has become rather frequent with 15.1 beta 3, pretty consistently happening under load (e.g. I keep my mail library in a ZVOL, and having Apple Mail catch up on incoming emails seems to consistently cause the panic...). It's also happening both on my MacBook Pro and my Mac mini, and the issue goes away when I boot back into macOS 10.14 with the same 1.9.2 release. I wonder whether more people are seeing this?
@rottegift, many thanks! I will work on some replies! Just a moment, please.
hermione:~ mdw$ sysctl zfs spl
Please note that, even though it says "1.9.3-0" for the version, I have installed 1.9.3.1, i.e., the latest version from the OpenZFS on OS X website. Just to be absolutely sure of this, I just reinstalled 1.9.3.1 again, and I can verify that this information stays the same. Just FYI!
hermione:~ mdw$ zpool status -v
errors: No known data errors
The data consists of 56 plain text files, all ASCII characters, nothing strange at all, generated by a C++ program that I've been using for 14 years. There are 4 additional files: a bash script, a backup of the bash script, the nohup.out file, and the binary built from the C++ file. I've been copying via the Finder, just using drag-and-drop, although I'm pretty sure that things also die if I copy the files via bash; I think I was moving them that way a week ago, and I can try it that way again, if you like. The data is highly compressible: I think I was getting something like 8.0x compression in the zpool when it contained only this data.
This is always the case: I can't use the Finder to reboot, because it always says: So I generally reboot the computer from another Mac (with remote login using ssh and the command sudo shutdown -r now). Alternatively, if I reboot by holding down the power button on the Mac, it usually requires 2 reboots: during the first reboot, I usually get 90% of the way through the startup splash screen and then don't get any further, so I need to hold down the power button again and reboot, and things go very cleanly the second time. So I elected to just start rebooting using ssh from another machine, because only 1 reboot is required. I'm going to go reboot right now. Then I will come back to your questions 5 and 6. Thanks for your patience.
I did this as an administrator, in a bash shell, just FYI: It wrote exactly 82407849984 bytes and then died. The bash shell that I was using is hung now; I can only see the size of the data transfer by using another bash shell.
OK, maybe I spoke too soon when I said that the bash shell died. It now said: 78591+0 records in
Indeed, it looks like something got hung right around the time that I killed the process, because at 13:33 the resulting file size was 82407849984 bytes, and then at 13:36 the file size became 82408636416 bytes.
I'm going to reboot and try this process of transferring random bits one more time. I'll be back. A reboot probably is not required, but I am doing it to refresh this experiment entirely. The reboot is usually only needed because, when a process like this dies, the operating system still holds references to each file it was trying to transfer, as you know, so it won't even allow me to reboot nicely when we die during a regular file transfer.
OK, looks like this time it wrote 82680348672 bytes and then basically died at 13:47.
Here's the present spindump, before I go!
Still stuck at 84115718144 bytes. I'll be back in (say) 20 minutes.
OK, this is all interesting. I'll have some more thoughts in a little while, but in the meantime can you share the output of
Also mds_stores in spindump.4.txt is an enormous factor and it would be helpful if you would:
prior to doing any further tests of writing into "tank" in the next few hours.

mds_stores is part of Spotlight and is being very aggressive at chasing the already-written data in the spindump.4.txt case, but falls behind because it's low priority, and thus starts causing actual I/Os to the disk because the data it wants has aged out of the cache. It is also almost certainly holding mmap() references on the files you're write()-ing to, indirectly via DesktopServicesHelper, which is an unhelpful complication. Finally, DesktopServicesHelper appears to be writing small chunks to multiple files, and the compressibility of the data and the slowdown are causing a lot of additional slow memory allocations.

It's possible that after a significant wait, your hang would resolve itself, as when you thought the hang had happened earlier. That's not a workaround either, and there's no point waiting for more than, say, 30 minutes after the apparent hang. The wait may allow the draining of a sort of priority inversion that low-IO-priority mds_stores is causing by mmap()ing files that are being written to by high-priority DesktopServicesHelper.

Unfortunately this crossed while you were reporting some results of your dd test, and while I was dealing with other things, so I haven't had the chance to absorb the results of whatever happened during dd. If you also hang during dd, especially if you hang with mdutil -i off and .metadata_never_index in place, please take a spindump during the apparent hang.
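Concretely, the two mitigations named above would look something like this for a pool mounted at /Volumes/tank (the mount point is this thread's example; adjust to your own):

```shell
# Stop Spotlight (mds/mds_stores) from indexing the volume:
sudo mdutil -i off /Volumes/tank

# Belt and braces: a marker file at the volume root that tells
# Spotlight never to index this volume:
sudo touch /Volumes/tank/.metadata_never_index

# Confirm indexing status for the volume:
mdutil -s /Volumes/tank
```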
Still stuck at 84115718144 bytes... and now I'll share one more spindump and then kill that process.
in the mean time can you share the output of
Yes, indeed! Here you go:
kstat.zfs.darwin.tunable.async_write_min_dirty_pct: 30
Aha, I hate Spotlight! Why didn't I think of turning off Spotlight?!?! I should have thought of that. OK, now I ran both of these:
$ sudo mdutil -i off /Volumes/tank
and now the indexing is disabled. Wow, I wish I had thought of that earlier.
I'm not sure if you would prefer for me to rest from further tests at the moment, or to go ahead and try some more with Spotlight turned off. I'll wait to hear from you before I do any more writing or reading to/from the tank. I'll just leave things alone for now. Indeed, I'll do a reboot so that I'm ready to start fresh, whenever you are. I can't thank you enough for your suggestions! I'll avoid doing anything else until I hear back from you. Thank you!
Go ahead and try more with Spotlight off. The problem is one thread in spindump5.txt (the zio_execute thread with the kernel_memory_alloc), and I cannot tell if it's the writer-driven thread or the reader-driven thread. The thread is causing massive headaches for the kernel allocator; in particular, the OS is spending a lot of time hunting for a good place to grab more memory for zfs.

How much memory is in your system? Do you have anything at all in /etc/zfs/zsysctl.conf? Have you set any tunables yourself at this point? The 23 GiB for the dbuf cache implies a system memory of a terabyte and a half, whereas the 4 GiB dirty data means the memory is large enough that we cap it at zfs_dirty_data_max_max (which is not a runtime tunable).

One of the problems with the dbuf cache being that big is that (a) it holds uncompressed copies of the data that would also want to be in the ARC compressed, and (b) it's effectively FIFO (LRU with no reuse if mds_stores lags behind enough, which it will, or if Spotlight is disabled), so a sequential write like you're doing fills up a dbuf buffer, and if the dbuf cache is full, we have to evict some dbufs from its tail before allocating at the head. It gets a bit more complicated because there is a trailing reader (mds_stores) which will also pull compressed blocks into the ARC; these are then decompressed and put into the dbuf effectively-FIFO cache before being copied into mds_stores's address space. Because of that complication it's hard to tell whether it's the writer or the reader wanting more dbufs than ZFS's allocator has on hand. Either way, ZFS is pestering the operating system with allocations, and likely frees, and XNU's default kernel allocator is not awesome when pestered like that. :-(

My thought is that the dbuf cache is somehow mis-sized for your real system memory. (Edit: it is, because dbuf.c uses 1/64 of system mem by default, see #750.)
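The 1/64-of-RAM default mentioned in the edit can be sanity-checked with quick arithmetic: with 1.5 TiB (1536 GiB) of RAM, 1/64 comes out at 24 GiB, close to the ~23 GiB dbuf cache observed here (exact numbers depend on how much memory the sizing code actually sees):

```shell
# dbuf cache default sizing sketch: total RAM / 64 (per dbuf.c, see #750)
ram_bytes=$(( 1536 * 1024 * 1024 * 1024 ))   # 1.5 TiB expressed in bytes
dbuf_max=$(( ram_bytes / 64 ))
echo "$dbuf_max bytes"                       # prints 25769803776 bytes
echo "$(( dbuf_max / 1024 / 1024 / 1024 )) GiB"   # prints 24 GiB
```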
In particular, your kstat.zfs.darwin.tunable.dbuf_cache_max_bytes is unaccountably enormous, and that's definitely not helping things (it might be the root of the problem here). The huge dbuf cache, and the fact that there are several zio_execute threads misbehaving in spindump4.txt, leads me to think the problem is on the writer side. Your kstat.zfs.darwin.tunable.zfs_dirty_data_max is also very large (and in particular too large for realistic spinny disks). What's happening there is that because your writer threads (dd or DesktopServicesHelper) are faster than the actual spinny disks, after the first five to ten seconds of the writing test they will be stuffing data into the dbuf cache up to zfs_dirty_data_max. Try the following:
and see how that affects your writing tests. Again, if things lock up, a spindump file will be helpful to look at.
Oh, @mdw333 actually has that much RAM ("a Mac Pro with 1.5 TB of RAM"), so the dbuf cache size is not unaccountable, since it depends on dbuf_cache_shift = 5. We should cap the dbuf cache at something reasonable, like 1 GiB, rather than killing systems with hundreds of GiB or more of RAM, @lundman and @rottegift. It's hard to imagine a system that has that much RAM and really, really needs that big an amount of uncompressed copies of ARC data lingering around in the dbuf cache, and not hard to imagine (in fact we have just found) a dbuf cache being too big in practice on a system with even just 64 GiB of memory. @mdw333: setting the dbuf cache to 1 GiB will help with your workload, and with more general workloads. You could even make it as small as 512 MB. If your testing against the sysctl in the previous message bears out this diagnosis, you can add these two to /etc/zfs/zsysctl.conf (which will take effect on kext load, or reboot):
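A sketch of what those two lines might look like, assuming /etc/zfs/zsysctl.conf takes sysctl-style key=value lines and # comments (the byte values match the sysctl invocations used for testing later in this thread: 512 MiB dirty data, 1 GiB dbuf cache):

```shell
# /etc/zfs/zsysctl.conf - applied when the kext loads (or on reboot).
# Cap dirty data at 512 MiB:
kstat.zfs.darwin.tunable.zfs_dirty_data_max=536870912
# Cap the dbuf cache at 1 GiB:
kstat.zfs.darwin.tunable.dbuf_cache_max_bytes=1073741824
```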
I apologize for disappearing for two hours! I didn't realize that my wife and kiddos were going to a short music concert at our library, so I went with them. Thanks for diagnosing things! Yes, I do have 1.5 TB of RAM installed in this beast. OK, I ran what you suggested (as administrator), namely: in the bash shell, and then, in that same shell, I called: Here's the spindump.
@rottegift if we ever figure all of this out, I'm going to owe you a beer! (or an orange juice, if you are a teetotaler like me) I'm willing and able to continue adjusting anything that you think I should adjust. I hope that this hard work will make it easier for other people down the road. I want to help, and I appreciate your help so far!
I am fond of replicates, so I did the experiment again. Almost exactly the same effect: I got 103993180160 bytes in the data transfer this time before things died.
sudo sysctl kstat.zfs.darwin.tunable.zfs_dirty_data_max=536870912 kstat.zfs.darwin.tunable.dbuf_cache_max_bytes=1073741824
sudo dd if=/dev/random bs=1m of=/Volumes/tank/randombits count=300k
One more replicate of the experiment, for good measure! (I can't help it! I'm a scientist.)
@lundman ... has that time now arrived, with macOS 11?
What is the full situation with macOS 11 Big Sur? I'd like to understand the core OS / kext situation completely. With the joke that was WWDC, we've put holds on all Apple equipment upgrades at this point due to this situation. Windows isn't really an option and we still need macOS software, but we're looking at emulation paths and other options, since Apple has proven hostile to the flexibility and other ideals that are critical for some of us and were the original reason for moving to Apple 18 years ago.
macOS 11.0, aka Big Sur, will still have kernel extensions. That's based on what they said in the WWDC "Platforms State of the Union" (aka Developer Keynote) and the "Explore the new system architecture of Apple Silicon Macs" sessions. Judging from that, there seems to be little change in IOKit. They mentioned one change related to IOMMU for DMA, but it's not clear if that's just for the Apple SoC or for x86 as well. Please refrain from commenting on this specific issue unless it's about the specific problems discussed here. That helps those who are interested in or affected by what's discussed in the earlier comments. Thank you.
Status on kext on Big Sur: openzfsonosx/openzfs#8 |
It's still early, so I don't expect there to already be support for the new macOS Catalina beta, but surprisingly it worked! I figured I'd open a ticket to help track progress on any bugs. (Also to serve as a resource for people like me who Googled
"zfs" "macOS" OR "osx" "catalina" OR "10.15"
and got no real results.)
10.15 Beta (19A487)
1.9.0-1
OpenZFS_on_OS_X_1.9.0.dmg/OpenZFS on OS X 1.9.0 Mojave.pkg
After downloading it from the homepage, I ran the Mojave installer on my system and it failed the first time with a yellow warning at the end of the last page in Install.app.
However, after immediately trying a second time, it seems to have succeeded and to be working perfectly now.
The one minor thing that could be fixed is to enable installing via homebrew cask (once more people confirm it's stable):