macOS 10.15 Catalina support ✅ #721

Open
pirate opened this issue Jul 8, 2019 · 114 comments

@pirate

pirate commented Jul 8, 2019

It's still early, so I don't expect there to already be support for the new macOS Catalina beta, but surprisingly it worked! I figured I'd open a ticket to help track progress on any bugs. (Also to serve as a resource for people like me who Googled "zfs" "macOS" OR "osx" "catalina" OR "10.15" and got no real results.)

  • macOS 10.15 Beta (19A487)
  • openzfsonosx 1.9.0-1
  • installer used OpenZFS_on_OS_X_1.9.0.dmg/OpenZFS on OS X 1.9.0 Mojave.pkg

After downloading it from the homepage, I ran the Mojave installer on my system and it failed the first time with a yellow warning at the end of the last page in Install.app.
However, after immediately trying a second time it seems to have succeeded and be working perfectly now.

➜  zpool --version
zfs-1.9.0-1
zfs-kmod-1.9.0-1
➜  sudo gdd if=/dev/zero of=~/Desktop/test.zpool bs=1M count=128
➜  sudo zpool create test ~/Desktop/test.zpool
➜  sudo zfs mount -a
➜  sudo zpool status
➜  echo "test" | sudo tee /Volumes/test/test.txt > /dev/null && sync && cat /Volumes/test/test.txt
# everything works as expected for raw file vdevs
➜  sudo zpool create -f -o ashift=12 \
            -O casesensitivity=insensitive \
            -O normalization=formD \
            -O compression=lz4 \
            -O utf8only=on \
            -O sync=disabled \
            test2 mirror disk6 disk7
# everything also works as expected for a two-way mirror of Samsung FIT 32GB USB key vdevs
# files read and write correctly, and the pool still works after disconnecting and reconnecting the USB keys

The one minor thing that could be fixed is to enable installing via homebrew cask (once more people confirm it's stable):

➜  brew cask install openzfs
==> Caveats
To install and/or use openzfs you may need to enable its kernel extension in:
  System Preferences → Security & Privacy → General
For more information refer to vendor documentation or this Apple Technical Note:
  https://developer.apple.com/library/content/technotes/tn2459/_index.html

==> Satisfying dependencies
Error: Cask openzfs depends on macOS release being one of [10.9, 10.10, 10.11, 10.12, 10.13, 10.14], but you are running release 10.15.
@lundman
Contributor

lundman commented Jul 8, 2019

I attempted to check if Catalina worked last week, but found that VMware Fusion does not work with it yet. I've been waiting for a fix for Fusion :)

@ghost

ghost commented Jul 23, 2019

The only issue I have found with Catalina and 1.9.1 rc1 is that ZFS pools no longer auto mount on login. I have to run sudo zpool import xxx manually. I think it's to do with allowing access to removable volumes but I don't know how to fix that!

@JMoVS
Contributor

JMoVS commented Aug 2, 2019

@dgsga Hmm, or it could be the permissions needed to use LaunchDaemons for this kind of thing. I don't know whether the zpool-import-all script still gets run at all.

@michael-yuji

I encountered another bug on Catalina. Under high I/O (I think), Catalina crashes with half a second of loud fan noise, without showing the kernel panic screen.

I hit this a lot in Beta 1 and therefore reverted to 10.14. I did not save the crash report, as I thought it was a Catalina problem that would be addressed by Apple. (I think it was a segfault, but I'm not sure I remember correctly.)

However, when I tried out 10.15b5 today, it happened exactly once, so significantly less often than before. Unfortunately I didn't get a crash report this time, but I will try my best to reproduce it and upload the report once I succeed.

@michael-yuji

@JMoVS I'm using 1.9.2 and all the pools are imported automatically (but the volume is on an internal SSD rather than an external drive)

@ghost

ghost commented Aug 4, 2019

@michael-yuji I've had exactly the same problem with a kernel panic under high I/O, such as Spotlight, Photos.app or Sync.app indexing. The same thing occasionally happened in Mojave, where spl.kext rather than zfs.kext was highlighted in the kernel panic report.

@michael-yuji

michael-yuji commented Aug 26, 2019

One of my laptops is running 1.8.1 on Catalina and it panicked. Luckily, this time I got a crash report:

panic(cpu 2 caller 0xffffff800a065b5a): Kernel trap at 0xffffff7f8c2fd7e5, type 14=page fault, registers:
CR0: 0x000000008001003b, CR2: 0x00002007210001e0, CR3: 0x000000000e3f1000, CR4: 0x00000000003626e0
RAX: 0x0000000000000010, RBX: 0xffffff9210bcdfb0, RCX: 0xffffff9210bcdfc0, RDX: 0xffffff81fabbbe18
RSP: 0xffffff81fabbbdc0, RBP: 0xffffff81fabbbdc0, RSI: 0xffffff9210bcdfb0, RDI: 0xffffff9208818fb0
R8:  0x0000200721000000, R9:  0x0000000000000002, R10: 0x0000000000000001, R11: 0xffffff9211046fc0
R12: 0xffffff91fbf391b8, R13: 0xffffff9210bcdfc0, R14: 0x0000000000000010, R15: 0xffffff9208818fb0
RFL: 0x0000000000010282, RIP: 0xffffff7f8c2fd7e5, CS:  0x0000000000000008, SS:  0x0000000000000000
Fault CR2: 0x00002007210001e0, Error code: 0x0000000000000000, Fault CPU: 0x2, PL: 0, VF: 1

Backtrace (CPU 2), Frame : Return Address
0xffffff81fabbb820 : 0xffffff8009f3cb9b 
0xffffff81fabbb870 : 0xffffff800a073d45 
0xffffff81fabbb8b0 : 0xffffff800a0657ab 
0xffffff81fabbb900 : 0xffffff8009ee3bb0 
0xffffff81fabbb920 : 0xffffff8009f3c287 
0xffffff81fabbba20 : 0xffffff8009f3c66b 
0xffffff81fabbba70 : 0xffffff800a6ccc69 
0xffffff81fabbbae0 : 0xffffff800a065b5a 
0xffffff81fabbbc60 : 0xffffff800a06585c 
0xffffff81fabbbcb0 : 0xffffff8009ee3bb0 
0xffffff81fabbbcd0 : 0xffffff7f8c2fd7e5 
0xffffff81fabbbdc0 : 0xffffff7f8c2f90d1 
0xffffff81fabbbe00 : 0xffffff7f8c2f93ec 
0xffffff81fabbbe30 : 0xffffff7f8c2fb9ad 
0xffffff81fabbbe70 : 0xffffff7f8c301554 
0xffffff81fabbbeb0 : 0xffffff7f8c2fc8b8 
0xffffff81fabbbee0 : 0xffffff7f8c300a5c 
0xffffff81fabbbf10 : 0xffffff7f8c305f16 
0xffffff81fabbbfa0 : 0xffffff8009ee313e 
      Kernel Extensions in backtrace:
         net.lundman.spl(1.8.1)[F931881B-FB27-3712-8C57-4DF33E9CCD48]@0xffffff7f8c2f8000->0xffffff7f8d4ecfff

BSD process name corresponding to current thread: kernel_task

Mac OS version:
19A536g

Kernel version:
Darwin Kernel Version 19.0.0: Fri Aug  9 21:59:46 PDT 2019; root:xnu-6153.0.139.161.2~2/RELEASE_X86_64
Kernel UUID: E2D7BDCF-3936-31FC-B884-D01BB1F44587
Kernel slide:     0x0000000009c00000
Kernel text base: 0xffffff8009e00000
__HIB  text base: 0xffffff8009d00000
System model name: MacBookPro13,3 (Mac-A5C67F76ED83108C)
System shutdown begun: NO
Panic diags file available: YES (0x0)

System uptime in nanoseconds: 5872071293263
last loaded kext at 3593327050318: >usb.cdc.acm	5.0.0 (addr 0xffffff7f8e9be000, size 32768)
last unloaded kext at 4310866791230: >!UMergeNub	900.4.2 (addr 0xffffff7f8e6cf000, size 12288)
loaded kexts:
com.intel.kext.intelhaxm	7.3.2
net.lundman.zfs	1.8.1
net.lundman.spl	1.8.1
@kext.AMDFramebuffer	3.0.0
@kext.AMDRadeonX4000	3.0.0
>AudioAUUC	1.70
@kext.AMDRadeonServiceManager	3.0.0
>!AGraphicsDevicePolicy	4.1.30
@fileutil	20.036.15
@filesystems.autofs	3.0
@AGDCPluginDisplayMetrics	4.1.30
>!AHV	1
|IOUserEthernet	1.0.1
|IO!BSerialManager	7.0.0d105
>!AUpstreamUserClient	3.6.8
>pmtelemetry	1
>AGPM	111.1.18
>X86PlatformShim	1.0.0
>!APlatformEnabler	2.7.0d0
>!A!ISKLGraphics	14.0.0
@Dont_Steal_Mac_OS_X	7.0.0
>AGDCBacklightControl	4.1.30
>!AHDA	283.13
@kext.AMD9500!C	3.0.0
>!AThunderboltIP	3.1.2
>!AHIDALSService	1
>eficheck	1
>!AMuxControl	4.1.30
>SMCMotionSensor	3.0.4d1
>!A!IPCHPMC	2.0.1
>!AGFXHDA	100.1.421
>!AEmbeddedOSSupportHost	1
>AirPort.BrcmNIC	1400.1.1
>!A!ISKLGraphicsFramebuffer	14.0.0
>!A!ISlowAdaptiveClocking	4.0.0
>!AMCCSControl	1.10
>!AVirtIO	1.0
@filesystems.hfs.kext	522.0.5
@!AFSCompression.!AFSCompressionTypeDataless	1.0.0d1
@BootCache	40
@!AFSCompression.!AFSCompressionTypeZlib	1.0.0
>!ATopCaseHIDEventDriver	153
@filesystems.apfs	1412.0.16
@private.KextAudit	1.0
>!ASmartBatteryManager	161.0.0
>!AACPIButtons	6.1
>!ARTC	2.0
>!ASMBIOS	2.1
>!AACPIEC	6.1
>!AAPIC	1.7
$!AImage4	1
@nke.applicationfirewall	302
$TMSafetyNet	8
@!ASystemPolicy	2.0.0
|EndpointSecurity	1
@kext.AMDRadeonX4100HWLibs	1.0
@kext.AMDRadeonX4000HWServices	3.0.0
@kext.triggers	1.0
|IOAVB!F	800.16
>!ASSE	1.0
>DspFuncLib	283.13
@kext.OSvKernDSPLib	529
@!AGPUWrangler	4.1.30
>!ABacklightExpert	1.1.0
>!AHDA!C	283.13
|IOHDA!F	283.13
@kext.AMDSupport	3.0.0
>!AGraphicsControl	4.1.30
|IOAudio!F	300.2
@vecLib.kext	1.2.0
|IONDRVSupport	558
|IO!BHost!CUARTTransport	7.0.0d105
|IO!BHost!CTransport	7.0.0d105
>!A!ILpssUARTv1	3.0.60
>!A!ILpssUARTCommon	3.0.60
>!AOnboardSerial	1.0
|IO80211!F	1200.12.2b1
>mDNSOffloadUserClient	1.0.1b8
>corecapture	1.0.4
@!AGraphicsDeviceControl	4.1.30
|IOAccelerator!F2	438.1.17
|IOSlowAdaptiveClocking!F	1.0.0
>!ASMBus!C	1.0.18d1
|IOGraphics!F	558
>X86PlatformPlugin	1.0.0
>IOPlatformPlugin!F	6.0.0d8
@plugin.IOgPTPPlugin	800.14
|IOEthernetAVB!C	1.1.0
|IOSkywalk!F	1
>usb.cdc.ncm	5.0.0
>usb.!UiBridge	1.0
>usb.cdc	5.0.0
>usb.networking	5.0.0
>usb.!UHostCompositeDevice	1.2
|IOSerial!F	11
|IOSurface	269.6
@filesystems.hfs.encodings.kext	1
>!AActuatorDriver	3400.32
>!AHIDKeyboard	209
>!AHS!BDriver	153
>IO!BHIDDriver	7.0.0d105
|IO!B!F	7.0.0d105
|IO!BPacketLogger	7.0.0d105
>!AMultitouchDriver	3400.32
>!AInputDeviceSupport	3400.25
>!AHSSPIHIDDriver	58
>!AThunderboltDPInAdapter	6.1.9
>!AThunderboltDPAdapter!F	6.1.9
>!AThunderboltPCIDownAdapter	2.5.2
>!AHSSPISupport	58
>!A!ILpssSpi!C	3.0.60
|IONVMe!F	2.1.0
>!AThunderboltNHI	5.5.8
>!AHPM	3.4.4
|IOThunderbolt!F	7.4.5
>!A!ILpssI2C!C	3.0.60
>!A!ILpssDmac	3.0.60
>!A!ILpssI2C	3.0.60
>!A!ILpssGspi	3.0.60
>usb.!UXHCIPCI	1.2
>usb.!UXHCI	1.2
>usb.!UHostPacketFilter	1.0
|IOUSB!F	900.4.2
>!AEFINVRAM	2.1
>!AEFIRuntime	2.1
|IOSMBus!F	1.1
|IOHID!F	2.0.0
$quarantine	4
$sandbox	300.0
@kext.!AMatch	1.0.0d1
>DiskImages	493.0.0
>!AFDEKeyStore	28.30
>!AEffaceable!S	1.0
>!AKeyStore	2
>!UTDM	489.0.2
|IOSCSIBlockCommandsDevice	422.0.1
>!ACredentialManager	1.0
>KernelRelayHost	1
>!ASEPManager	1.0.1
>IOSlaveProcessor	1
|IOTimeSync!F	800.14
|IONetworking!F	3.4
|IOUSBMass!SDriver	157.0.1
|IOSCSIArchitectureModel!F	422.0.1
|IO!S!F	2.1
|IOUSBHost!F	1.2
>!UHostMergeProperties	1.2
>usb.!UCommon	1.0
>!ABusPower!C	1.0
|CoreAnalytics!F	1
>!AMobileFileIntegrity	1.0.5
@kext.CoreTrust	1
|IOReport!F	47
>!AACPIPlatform	6.1
>!ASMC	3.1.9
>watchdog	1
|IOPCI!F	2.9
|IOACPI!F	1.4
@kec.pthread	1
@kec.Libm	1
@kec.corecrypto	1.0
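
Without symbols, the frames in a report like this can still be narrowed down by subtracting the kext's load address (the `@0x...` value on the "Kernel Extensions in backtrace" line) from a return address that falls inside that kext. The sketch below does that for the faulting frame above. `kext_offset` is a hypothetical helper name, not part of any ZFS or Apple tool, and it assumes both addresses are full 64-bit values that share their upper 32 bits.

```shell
# Hypothetical helper, not part of any ZFS or Apple tool.
# Prints the offset of a backtrace address inside a kext, given the kext's
# load address from the "Kernel Extensions in backtrace" line.
kext_offset() {
    addr=${1#0x}
    base=${2#0x}
    # Split each 16-digit address into high and low 32-bit halves so the
    # arithmetic stays within the shell's signed integer range.
    addr_hi=${addr%????????}
    base_hi=${base%????????}
    if [ "$addr_hi" != "$base_hi" ]; then
        echo "addresses differ in their upper 32 bits" >&2
        return 1
    fi
    addr_lo=${addr#"$addr_hi"}
    base_lo=${base#"$base_hi"}
    printf '0x%x\n' "$(( 0x$addr_lo - 0x$base_lo ))"
}

# Faulting frame and net.lundman.spl load address from the report above.
kext_offset 0xffffff7f8c2fd7e5 0xffffff7f8c2f8000
```

The resulting offset is only meaningful against the exact spl.kext build identified by the UUID in the report, so it has to be matched to that build's symbols.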

@JMoVS
Contributor

JMoVS commented Aug 26, 2019

zfs and spl 1.8.1 are really quite old by now, can you try upgrading to 1.9.2 and let us know if it happens there as well?

Also, are you familiar with boot-args? keepsyms=1 would be helpful
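
For anyone who has not set boot-args before: the value is a single space-separated string, so adding `keepsyms=1` should preserve whatever is already there. A minimal sketch of that merge follows; `add_keepsyms` is a hypothetical helper name, and it only manipulates a string without touching the system.

```shell
# Hypothetical helper: append keepsyms=1 to an existing boot-args string
# unless it is already present. Pure string handling, no system changes.
add_keepsyms() {
    current="$1"
    case " $current " in
        *" keepsyms=1 "*) printf '%s\n' "$current" ;;
        "  ")             printf '%s\n' "keepsyms=1" ;;
        *)                printf '%s\n' "$current keepsyms=1" ;;
    esac
}

# Example: an existing value of "-v" becomes "-v keepsyms=1".
add_keepsyms "-v"
```

On a Mac the result would then be written back with something like `sudo nvram boot-args="$(add_keepsyms "$(nvram boot-args 2>/dev/null | cut -f 2-)")"`, assuming `nvram boot-args` prints the variable name, a tab, and the current value. A reboot is needed before the setting takes effect.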

@michael-yuji

> zfs and spl 1.8.1 are really quite old by now, can you try upgrading to 1.9.2 and let us know if it happens there as well?
>
> Also, are you familiar with boot-args? keepsyms=1 would be helpful

Sure. It was by accident that I booted this laptop, used it for a while, and had it crash (which I am very happy about, because I finally got a crash report). I am going to upgrade it and use it until it panics again lol.

@ylluminate

Heads up: Apple's being problematic and telling some of us to update to Catalina (even on unsupported Macs (MacPro5,1 & 4,1) for some bizarre reason) in response to certain bugs in Mojave. I'm kinda baffled and have attempted to have conversations with Apple dev staff via Bug Report/Feedback Assistant, but without much luck. Essentially I've reported "blah is happening in 10.14.6" and their reply is "Please try beta X of 10.15 and let us know if the problem is resolved." I'm pretty disturbed and upset by this behavior from Apple, and as I search the web I've found others hitting the same issue too.

I'm working on moving to Catalina here myself at the moment via the "unsupported methods" to see if my problems are indeed resolved as Apple has instructed, but it's a headache, and some issues such as the ZFS trouble have me worried.

@LATBauerdick

This is still a problem with 1.9.2 and Catalina beta 7.

@lundman
Contributor

lundman commented Sep 6, 2019

OK, so in Catalina it appears our zfs.fs is not being used, this means the devdisk mounts will fail - so you are better off having devdisk=off for now.

diskarbitrationd.log:

14:21:27   probed disk, id = /dev/disk3s1, with zfs, ongoing.
14:21:27   probed disk, id = /dev/disk3s1, with zfs, failure.
14:21:27   unable to probe /dev/disk3s1 (status code 0x0000002D).

When trussing we get

  124/0x2b8:  write_nocancel(0x3, "14:21:27   probed disk, id = /dev/disk3s1, with zfs, ongoing.\n\0", 0x3E)		 = 62 0
  124/0x2b8:  open_nocancel(".\0", 0x0, 0x1)		 = 4 0
  124/0x2b8:  fstat64(0x4, 0x7FFEE2DF8740, 0x0)		 = 0 0
  124/0x2b8:  fcntl_nocancel(0x4, 0x32, 0x7FFEE2DF8950)		 = 0 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  stat64("/\0", 0x7FFEE2DF86B0, 0x0)		 = 0 0
  124/0x2b8:  stat64("/Library/Filesystems/zfs.fs\0", 0x7FFEE2DF8AC0, 0x0)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF6F18, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 104 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF6F38, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 256 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open("/Library/Filesystems/zfs.fs/Contents/Info.plist\0", 0x0, 0x1B6)		 = 4 0
  124/0x2b8:  fstat64(0x4, 0x7FFEE2DF8380, 0x0)		 = 0 0
  124/0x2b8:  read(0x4, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">\n<plist version=\"1.0\">\n<dict>\n\t<key>BuildMachineOSBuild</key>\n\t<string>18A391011</string>\n\t<key>CFBundleDevelopment", 0x10C5)		 = 4293 0
  124/0x2b8:  close(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF73F8, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 1984 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 1984 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/en.lproj/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 152 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/Base.lproj/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 112 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/English.lproj/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 112 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel(".\0", 0x0, 0x1)		 = 4 0
  124/0x2b8:  fstat64(0x4, 0x7FFEE2DF8740, 0x0)		 = 0 0
  124/0x2b8:  fcntl_nocancel(0x4, 0x32, 0x7FFEE2DF8950)		 = 0 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  stat64("/\0", 0x7FFEE2DF86B0, 0x0)		 = 0 0
  124/0x2b8:  stat64("/Library/Filesystems/zfs.fs\0", 0x7FFEE2DF8AC0, 0x0)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF6F18, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 104 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF6F38, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 256 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open("/Library/Filesystems/zfs.fs/Contents/Info.plist\0", 0x0, 0x1B6)		 = 4 0
  124/0x2b8:  fstat64(0x4, 0x7FFEE2DF8380, 0x0)		 = 0 0
  124/0x2b8:  read(0x4, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">\n<plist version=\"1.0\">\n<dict>\n\t<key>BuildMachineOSBuild</key>\n\t<string>18A391011</string>\n\t<key>CFBundleDevelopment", 0x10C5)		 = 4293 0
  124/0x2b8:  close(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF73F8, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 1984 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 1984 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/en.lproj/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 152 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/Base.lproj/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 112 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  open_nocancel("/Library/Filesystems/zfs.fs/Contents/Resources/English.lproj/.\0", 0x1100004, 0x31741450)		 = 4 0
  124/0x2b8:  fstatfs64(0x4, 0x7FFEE2DF7218, 0x0)		 = 0 0
  124/0x2b8:  getdirentries64(0x4, 0x7FC69F80E600, 0x2000)		 = 112 0
  124/0x2b8:  close_nocancel(0x4)		 = 0 0
  124/0x2b8:  write_nocancel(0x3, "14:21:27   probed disk, id = /dev/disk3s1, with zfs, failure.\n\0", 0x3E)		 = 62 0
  124/0x2b8:  write_nocancel(0x3, "14:21:27 unable to probe /dev/disk3s1 (status code 0x0000002D).\n\0", 0x40)		 = 64 0

The sources for DAProbe.c:

    if ( status )
    {
        /*
         * We have found no probe match for this media object.
         */

        if ( context->filesystem )
        {
            CFStringRef kind;

            kind = DAFileSystemGetKind( context->filesystem );

            DALogDebug( "  probed disk, id = %@, with %@, failure.", context->disk, kind );

            if ( status != FSUR_UNRECOGNIZED )
            {
                DALogError( "unable to probe %@ (status code 0x%08X).", context->disk, status );
            }

Which seems to imply we aren't matching (although it picks up zfs.fs OK and then rejects it?)

As 0x2D is 45, the error is ENOTSUP, which means we are probably running afoul of these tests:

https://github.com/appleopen/DiskArbitration/blob/master/diskarbitrationd/DAFileSystem.c#L645

However, I have tried copying hfsutil's plist and entire hfs.fs/ directory to no avail.
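
The hex-to-decimal step above (status code 0x2D is 45) is easy to double-check from any shell. This is only a sketch of the arithmetic. Mapping 45 to ENOTSUP is specific to Darwin's errno numbering, so the number should not be looked up on another platform, where 45 can mean something else.

```shell
# Status code reported by diskarbitrationd in the log above.
status=0x0000002D

# printf accepts a hexadecimal constant for %d and prints it in decimal.
printf '%d\n' "$status"
```

On a Mac with the developer tools installed, the name behind the number can then be confirmed in the SDK's `sys/errno.h`.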

@lundman
Contributor

lundman commented Sep 6, 2019

OK, turns out we should have a /Library/Filesystems/zfs.fs/Contents/Resources/fsck_zfs. We do compile one in cmd/fsck_zfs which is more or less just /bin/true. With that in the bundle, everything appears to function as expected.
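
To see whether an installed bundle is affected, it is enough to test for an executable `fsck_zfs` under `Contents/Resources`. The sketch below wraps that test in a hypothetical `has_fsck_helper` function and exercises it against a throwaway directory layout instead of the real `/Library/Filesystems/zfs.fs`. The stand-in helper it creates simply exits 0, mirroring the "more or less just /bin/true" description, and is not the project's actual `fsck_zfs`.

```shell
# Hypothetical check, not part of OpenZFS on OS X: succeeds if the bundle at
# $1 contains an executable fsck helper for the filesystem named $2.
has_fsck_helper() {
    [ -x "$1/Contents/Resources/fsck_$2" ]
}

# Demonstrate against a throwaway bundle layout rather than the real
# /Library/Filesystems/zfs.fs, so this is safe to run anywhere.
demo=$(mktemp -d)
mkdir -p "$demo/zfs.fs/Contents/Resources"
has_fsck_helper "$demo/zfs.fs" zfs || echo "fsck_zfs missing"

# A stand-in helper that, like /bin/true, just exits successfully.
printf '#!/bin/sh\nexit 0\n' > "$demo/zfs.fs/Contents/Resources/fsck_zfs"
chmod +x "$demo/zfs.fs/Contents/Resources/fsck_zfs"
has_fsck_helper "$demo/zfs.fs" zfs && echo "fsck_zfs present"

rm -rf "$demo"
```

Against a real install the call would be `has_fsck_helper /Library/Filesystems/zfs.fs zfs`.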

@JMoVS
Contributor

JMoVS commented Sep 6, 2019

@lundman Does it hurt to also include fsck_zfs for older versions? If not, could you push a commit to master to fix this?

@lundman
Contributor

lundman commented Sep 6, 2019

Not at all, should be fixed for all versions yep

@lundman
Contributor

lundman commented Sep 11, 2019

OpenZFSonOsX-Catalina-1.9.2.zip

I have done a test build using Xcode 11, and Catalina, which also has the zfs_util fixes for mounts. Please give feedback.

@lopezio

lopezio commented Sep 11, 2019

Halfway off-topic, but what does concern me (a lot) now is: how is a future for openzfsonosx (post-Catalina) possible with the deprecation of kexts? How would a volume manager and filesystem even be thinkable in userland? Will it wait for Photoshop to finish rendering before committing the ZIL?
Is the tremendous and impeccable work done by the openzfs team and by @lundman destined to be trashed by this (sorry, can't find a better word) sort of "fascist" direction Apple is taking with regard to their OS and services..? Maybe this is more a topic for a forum than for a GitHub bug...
Best Regards,

Lorenzo

@ghost

ghost commented Sep 11, 2019

I have just compiled the latest commit on Catalina DP8 using the Xcode 11 GM, all is working perfectly here. Thanks Jorgen for all your hard work.

@lundman
Contributor

lundman commented Sep 12, 2019

Apple has made developing on osx a little less friendly in recent times, that is true, and there probably will be a day in the future when we can no longer maintain support. But until that time!

@dmzimmerman

Also, as far as anything Apple has said so far, there are specific categories of kernel extensions that Apple is transitioning to DriverKit (USB HID devices, serial devices, NICs), NetworkingDriverKit, and Endpoint Security extensions... and filesystems are not one of those categories. It seems unlikely to me that Apple will completely eliminate the ability to install kernel extensions on macOS.

@ylluminate

I can just about guarantee that any panics with the ZFS kext in them will raise a flag with Apple, and they'll consider it more seriously. I wonder if there's a way to build an exception handling mechanism into ZFS that will catch a panic before it goes back to the kernel and send that data over here for processing?...

@dmzimmerman

Also, if that's really a concern, maybe just don't send the panic reports to Apple if you're generating lots of them due to testing/adding new features/etc... I haven't had a ZFS panic in nearly forever running the stable releases with my couple of pools.

@LATBauerdick

I installed Jorgen's test build, but unfortunately that did not solve the problem of frequent panics for me.

Panics now happen more often since installing 15.1 beta 3 (it had been pretty stable since 15.0 beta 5 or so), possibly because Mail decided to re-download all my hundreds of thousands of emails. So I'm not sure whether the frequent reboots are due to more disk activity or to some additional changes in beta 3.

@lundman
Contributor

lundman commented Sep 12, 2019

If you are having panics on Catalina, we'd need to have the stack pasted, with keepsyms=1 so we can take a look at it.

@LATBauerdick

here's the stack I saved last time, I'll set keepsyms=1 for next time...

panic(cpu 2 caller 0xffffff801806acaa): Kernel trap at 0xffffff7f9c23027a, type 14=page fault, registers:
CR0: 0x000000008001003b, CR2: 0x0000000000000138, CR3: 0x000000002c527000, CR4: 0x00000000003626e0
RAX: 0x00000000000007a8, RBX: 0xffffff92c43cefb0, RCX: 0x0000000000000000, RDX: 0x0000000003000000
RSP: 0xffffff921891bde0, RBP: 0xffffff921891be10, RSI: 0xffffff922238d120, RDI: 0xffffff922238d190
R8:  0x0000000000000001, R9:  0x0000000000000002, R10: 0x0000000000000001, R11: 0x0000000000000000
R12: 0xffffff92c43ce7c8, R13: 0xffffff922238d190, R14: 0xffffff922238d118, R15: 0xffffff922238d000
RFL: 0x0000000000010202, RIP: 0xffffff7f9c23027a, CS:  0x0000000000000008, SS:  0x0000000000000000
Fault CR2: 0x0000000000000138, Error code: 0x0000000000000000, Fault CPU: 0x2, PL: 0, VF: 1

Backtrace (CPU 2), Frame : Return Address
0xffffff921891b840 : 0xffffff8017f41b6b 
0xffffff921891b890 : 0xffffff8018078e95 
0xffffff921891b8d0 : 0xffffff801806a8fe 
0xffffff921891b920 : 0xffffff8017ee8bb0 
0xffffff921891b940 : 0xffffff8017f41257 
0xffffff921891ba40 : 0xffffff8017f4163b 
0xffffff921891ba90 : 0xffffff80186d2879 
0xffffff921891bb00 : 0xffffff801806acaa 
0xffffff921891bc80 : 0xffffff801806a9a8 
0xffffff921891bcd0 : 0xffffff8017ee8bb0 
0xffffff921891bcf0 : 0xffffff7f9c23027a 
0xffffff921891be10 : 0xffffff7f9c22c1dc 
0xffffff921891be80 : 0xffffff7f9c23141b 
0xffffff921891bec0 : 0xffffff7f9c22c8e6 
0xffffff921891bef0 : 0xffffff7f9c230948 
0xffffff921891bf20 : 0xffffff7f9c235d56 
0xffffff921891bfa0 : 0xffffff8017ee813e 
      Kernel Extensions in backtrace:
         net.lundman.spl(1.9.2)[FD34B77F-63E0-3672-9A30-63213502A433]@0xffffff7f9c228000->0xffffff7f9d41dfff

BSD process name corresponding to current thread: kernel_task
Boot args: chunklist-security-epoch=0 -chunklist-no-rev2-dev

Mac OS version:
19A558d
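
When comparing several saved reports, the first thing to check is usually which kext and version appear in the backtrace. A small filter can pull that out of the "Kernel Extensions in backtrace" section. This is a sketch that assumes the `name(version)[UUID]@0x...` line format seen in the reports in this thread, and `panic_kexts` is a hypothetical name.

```shell
# Hypothetical filter: reads panic report text on stdin and prints
# "<bundle id> <version>" for each kext listed in the backtrace section.
panic_kexts() {
    sed -n 's/^[[:space:]]*\([A-Za-z0-9_.]*\)(\([0-9A-Za-z.]*\))\[[0-9A-F-]*]@0x.*$/\1 \2/p'
}

# Fed with the relevant lines of the report above.
panic_kexts <<'EOF'
      Kernel Extensions in backtrace:
         net.lundman.spl(1.9.2)[FD34B77F-63E0-3672-9A30-63213502A433]@0xffffff7f9c228000->0xffffff7f9d41dfff
EOF
```

A saved report can be piped through it the same way, for example `panic_kexts < saved-report.panic`, where the file name is only a placeholder.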

@LATBauerdick

... and here is the most recent crash (on a different machine) with keepsyms=1

panic(cpu 2 caller 0xffffff8007e6acaa): Kernel trap at 0xffffff7f8a105380, type 14=page fault, registers:
CR0: 0x000000008001003b, CR2: 0x0000200721000138, CR3: 0x000000000c2b5000, CR4: 0x00000000003626e0
RAX: 0xffffff81f84f3cd8, RBX: 0xffffff81f84f3fb0, RCX: 0x0000200721000000, RDX: 0x0000000003000000
RSP: 0xffffff81f692bdd0, RBP: 0xffffff81f692be00, RSI: 0xffffff81f6951120, RDI: 0xffffff81f6951190
R8:  0x0000000000000051, R9:  0x00000000000001ed, R10: 0x0000000000000001, R11: 0x0000000000000000
R12: 0xffffff81f84f3cd8, R13: 0xffffff81f6951190, R14: 0xffffff81f6951118, R15: 0xffffff81f6951000
RFL: 0x0000000000010286, RIP: 0xffffff7f8a105380, CS:  0x0000000000000008, SS:  0x0000000000000000
Fault CR2: 0x0000200721000138, Error code: 0x0000000000000000, Fault CPU: 0x2, PL: 0, VF: 1

Backtrace (CPU 2), Frame : Return Address
0xffffff81f692b830 : 0xffffff8007d41b6b mach_kernel : _handle_debugger_trap + 0x47b
0xffffff81f692b880 : 0xffffff8007e78e95 mach_kernel : _kdp_i386_trap + 0x155
0xffffff81f692b8c0 : 0xffffff8007e6a8fe mach_kernel : _kernel_trap + 0x4ee
0xffffff81f692b910 : 0xffffff8007ce8bb0 mach_kernel : _return_from_trap + 0xe0
0xffffff81f692b930 : 0xffffff8007d41257 mach_kernel : _DebuggerTrapWithState + 0x17
0xffffff81f692ba30 : 0xffffff8007d4163b mach_kernel : _panic_trap_to_debugger + 0x21b
0xffffff81f692ba80 : 0xffffff80084d2879 mach_kernel : _panic + 0x61
0xffffff81f692baf0 : 0xffffff8007e6acaa mach_kernel : _sync_iss_to_iks + 0x2aa
0xffffff81f692bc70 : 0xffffff8007e6a9a8 mach_kernel : _kernel_trap + 0x598
0xffffff81f692bcc0 : 0xffffff8007ce8bb0 mach_kernel : _return_from_trap + 0xe0
0xffffff81f692bce0 : 0xffffff7f8a105380 net.lundman.spl : _kmem_findslab + 0x44
0xffffff81f692be00 : 0xffffff7f8a10119b net.lundman.spl : _kmem_error + 0x3b
0xffffff81f692be70 : 0xffffff7f8a106521 net.lundman.spl : _kmem_magazine_destroy + 0xce
0xffffff81f692beb0 : 0xffffff7f8a1018b6 net.lundman.spl : _kmem_depot_ws_reap + 0x6c
0xffffff81f692bee0 : 0xffffff7f8a105a2e net.lundman.spl : _kmem_cache_reap + 0x66
0xffffff81f692bf10 : 0xffffff7f8a10af6b net.lundman.spl : _taskq_thread + 0x1b9
0xffffff81f692bfa0 : 0xffffff8007ce813e mach_kernel : _call_continuation + 0x2e
      Kernel Extensions in backtrace:
         net.lundman.spl(1.9.2)[EAA28CC7-9F6A-3C7B-BB90-691EBDC3A258]@0xffffff7f8a0fd000->0xffffff7f8b2f1fff

BSD process name corresponding to current thread: kernel_task
Boot args: -v keepsyms=1

Mac OS version:
19A558d

Kernel version:
Darwin Kernel Version 19.0.0: Sat Aug 31 18:49:12 PDT 2019; root:xnu-6153.11.15~8/RELEASE_X86_64
Kernel UUID: 7878452F-EDBA-3FDA-8430-29920E2E2C99
Kernel slide:     0x0000000007a00000
Kernel text base: 0xffffff8007c00000
__HIB  text base: 0xffffff8007b00000
System model name: MacBookPro13,3 (Mac-A5C67F76ED83108C)
System shutdown begun: NO
Panic diags file available: YES (0x0)

System uptime in nanoseconds: 3051218033384
last loaded kext at 128256200596: com.getdropbox.dropbox.kext	1.10.3 (addr 0xffffff7f8b2f2000, size 49152)
last unloaded kext at 441864637437: >!AXsanScheme	3 (addr 0xffffff7f897fa000, size 40960)
loaded kexts:
com.getdropbox.dropbox.kext	1.10.3
org.pqrs.driver.Karabiner.VirtualHIDDevice.v061000	6.10.0
net.lundman.zfs	1.9.2
net.lundman.spl	1.9.2
@kext.AMDFramebuffer	3.0.0
@kext.AMDRadeonX4000	3.0.0
@kext.AMDRadeonServiceManager	3.0.0
>AudioAUUC	1.70
>!AGraphicsDevicePolicy	4.1.46
@fileutil	20.036.15
@filesystems.autofs	3.0
@AGDCPluginDisplayMetrics	4.1.46
>!AHV	1
|IOUserEthernet	1.0.1
|IO!BSerialManager	7.0.0f4
>!AUpstreamUserClient	3.6.8
>AGPM	111.1.18
>!APlatformEnabler	2.7.0d0
>X86PlatformShim	1.0.0
>pmtelemetry	1
>!A!ISKLGraphics	14.0.0
@Dont_Steal_Mac_OS_X	7.0.0
>AGDCBacklightControl	4.1.46
>!AHDA	283.13
@kext.AMD9500!C	3.0.0
>!AHIDALSService	1
>!AThunderboltIP	3.1.3
>eficheck	1
>!AMuxControl	4.1.46
>SMCMotionSensor	3.0.4d1
>!AGFXHDA	100.1.421
>!A!IPCHPMC	2.0.1
>!AEmbeddedOSSupportHost	1
>AirPort.BrcmNIC	1400.1.1
>!A!ISKLGraphicsFramebuffer	14.0.0
>!A!ISlowAdaptiveClocking	4.0.0
>!AMCCSControl	1.12
>!AVirtIO	1.0
@filesystems.hfs.kext	522.0.9
@!AFSCompression.!AFSCompressionTypeDataless	1.0.0d1
@BootCache	40
@!AFSCompression.!AFSCompressionTypeZlib	1.0.0
>!ATopCaseHIDEventDriver	153
@filesystems.apfs	1412.11.4
@private.KextAudit	1.0
>!ASmartBatteryManager	161.0.0
>!AACPIButtons	6.1
>!ARTC	2.0
>!ASMBIOS	2.1
>!AACPIEC	6.1
>!AAPIC	1.7
$!AImage4	1
@nke.applicationfirewall	302
$TMSafetyNet	8
@!ASystemPolicy	2.0.0
|EndpointSecurity	1
@kext.AMDRadeonX4100HWLibs	1.0
@kext.AMDRadeonX4000HWServices	3.0.0
@kext.triggers	1.0
|IOAVB!F	800.17
>!ASSE	1.0
>DspFuncLib	283.13
@kext.OSvKernDSPLib	529
@!AGPUWrangler	4.1.46
>!ABacklightExpert	1.1.0
>!AHDA!C	283.13
|IOHDA!F	283.13
>X86PlatformPlugin	1.0.0
>!AGraphicsControl	4.1.46
|IOAudio!F	300.2
@vecLib.kext	1.2.0
|IONDRVSupport	558.3
>IOPlatformPlugin!F	6.0.0d8
|IO!BHost!CUARTTransport	7.0.0f4
|IO!BHost!CTransport	7.0.0f4
>!A!ILpssUARTv1	3.0.60
>!A!ILpssUARTCommon	3.0.60
>!AOnboardSerial	1.0
|IO80211!F	1200.12.2b1
>mDNSOffloadUserClient	1.0.1b8
>corecapture	1.0.4
@kext.AMDSupport	3.0.0
@!AGraphicsDeviceControl	4.1.46
|IOAccelerator!F2	438.1.25
|IOSlowAdaptiveClocking!F	1.0.0
>!ASMBus!C	1.0.18d1
|IOGraphics!F	558.3
@plugin.IOgPTPPlugin	800.14
|IOEthernetAVB!C	1.1.0
|IOSkywalk!F	1
>usb.cdc.ncm	5.0.0
>usb.!UiBridge	1.0
>usb.cdc	5.0.0
>usb.networking	5.0.0
>usb.!UHostCompositeDevice	1.2
|IOSerial!F	11
|IOSurface	269.6
@filesystems.hfs.encodings.kext	1
>!AActuatorDriver	3400.34
>!AHIDKeyboard	209
>!AHS!BDriver	153
>IO!BHIDDriver	7.0.0f4
|IO!B!F	7.0.0f4
|IO!BPacketLogger	7.0.0f4
>!AMultitouchDriver	3400.34
>!AInputDeviceSupport	3400.27
>!AHSSPIHIDDriver	58
>!AThunderboltDPInAdapter	6.2.2
>!AThunderboltDPAdapter!F	6.2.2
>!AThunderboltPCIDownAdapter	2.5.2
>!AHSSPISupport	58
>!A!ILpssSpi!C	3.0.60
|IONVMe!F	2.1.0
>!AThunderboltNHI	5.5.8
>!AHPM	3.4.4
|IOThunderbolt!F	7.4.5
>!A!ILpssI2C!C	3.0.60
>!A!ILpssDmac	3.0.60
>!A!ILpssI2C	3.0.60
>!A!ILpssGspi	3.0.60
>usb.!UXHCIPCI	1.2
>usb.!UXHCI	1.2
>usb.!UHostPacketFilter	1.0
|IOUSB!F	900.4.2
>!AEFINVRAM	2.1
>!AEFIRuntime	2.1
|IOSMBus!F	1.1
|IOHID!F	2.0.0
$quarantine	4
$sandbox	300.0
@kext.!AMatch	1.0.0d1
>DiskImages	493.0.0
>!AFDEKeyStore	28.30
>!AEffaceable!S	1.0
>!AKeyStore	2
>!UTDM	489.0.2
|IOSCSIBlockCommandsDevice	422.0.2
>!ACredentialManager	1.0
>KernelRelayHost	1
>!ASEPManager	1.0.1
>IOSlaveProcessor	1
|IOTimeSync!F	800.14
|IONetworking!F	3.4
|IOUSBMass!SDriver	157.11.1
|IOSCSIArchitectureModel!F	422.0.2
|IO!S!F	2.1
|IOUSBHost!F	1.2
>!UHostMergeProperties	1.2
>usb.!UCommon	1.0
>!ABusPower!C	1.0
|CoreAnalytics!F	1
>!AMobileFileIntegrity	1.0.5
@kext.CoreTrust	1
|IOReport!F	47
>!AACPIPlatform	6.1
>!ASMC	3.1.9
>watchdog	1
|IOPCI!F	2.9
|IOACPI!F	1.4
@kec.pthread	1
@kec.Libm	1
@kec.corecrypto	1.0

@lopezio

lopezio commented Sep 16, 2019

Also, as far as anything Apple has said so far, there are specific categories of kernel extensions that Apple is transitioning to DriverKit (USB HID devices, serial devices, NICs), NetworkingDriverKit, and Endpoint Security extensions... and filesystems are not one of those categories. It seems unlikely to me that Apple will completely eliminate the ability to install kernel extensions on macOS.

I'd love to be as optimistic. But what if Apple® simply doesn't care about other filesystems than those they support directly? They're tying more and more functionality (see the /Users APFS "Volume(s)") directly to their own filesystem. Even more, they actually want us to interact with the filesystems at a more abstract, "guided" level.
Having "uncontrolled" filesystems just doesn't seem to fit into that logic. Moreover, "we" are just too few to make a difference. And if you read the recent articles about the new "Security" measures in Catalina (and stop ignoring the trends that started well before Mojave, first and foremost everything around SIP and how little influence even advanced users have on it), it cannot go unnoticed that the whole Open Source community on the Mac is heavily affected. It's a political direction that even surpasses Microsoft® (!) on this matter. I grew up with the Mac, I have made my living until now with OS X as one of my main tools, and like probably many of us, I heavily contributed to the spread of macOS among family, friends, colleagues, and partners.

Until recently, the Mac was the platform for software development, be it for Mac apps or for anything else (except maybe for .NET). The day they close down on all this - with a loud scream of pain - I'll have to have a new "home" up and running...

Best to All. And Yes, until then, I'll be keeping my reality distortion field clean and colorful, and install, test, and most of all: enjoy each and every new release of openzfsonosx...! :-)

@lundman
Contributor

lundman commented Sep 16, 2019

0xffffff81f692bce0 : 0xffffff7f8a105380 net.lundman.spl : _kmem_findslab + 0x44
0xffffff81f692be00 : 0xffffff7f8a10119b net.lundman.spl : _kmem_error + 0x3b
0xffffff81f692be70 : 0xffffff7f8a106521 net.lundman.spl : _kmem_magazine_destroy + 0xce
0xffffff81f692beb0 : 0xffffff7f8a1018b6 net.lundman.spl : _kmem_depot_ws_reap + 0x6c
0xffffff81f692bee0 : 0xffffff7f8a105a2e net.lundman.spl : _kmem_cache_reap + 0x66
0xffffff81f692bf10 : 0xffffff7f8a10af6b net.lundman.spl : _taskq_thread + 0x1b9

Well, that's .. something. So it triggered a reap, and discovered a corrupt memory segment (kmem_error) - at this point it would be very interesting to read the output from kmem_error - but that would require connecting with lldb to the panicked machine from another machine.

@ylluminate

@lopezio you sound like my clone. I don't mean to keep hijacking this thread (yeah, I think we need another place to talk about this), but I do want to simply say this is what I believe (and clearly see) and have also been talking about "around the cooler" with folks. I've also heard some Apple engineers who used to work there say the same things and hear the same things from others that still do work there. Mobile and app security for their own cash security is their baby now - not us devs and high end users.

@LATBauerdick

@lundman unfortunately the panic has become rather frequent with 10.15.1 beta 3, pretty consistently happening under load (e.g. I keep my mail library in a ZVOL, and having Apple Mail catch up on incoming emails seems to consistently cause the panic...).

It's also happening on both my MacBook Pro and my Mac mini, and the issue goes away when I boot back into macOS 10.14, with the same 1.9.2 release.

I'm wondering whether more people are seeing this?

@mdw333

mdw333 commented Feb 8, 2020

@rottegift, many thanks! I will work on some replies! Just a moment, please.

@mdw333

mdw333 commented Feb 8, 2020

  1. Please supply the output of "sysctl zfs spl" and "sw_vers".

hermione:~ mdw$ sysctl zfs spl
zfs.kext_version: 1.9.3-0
spl.kext_version: 1.9.3-0
hermione:~ mdw$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.15.3
BuildVersion: 19D76

Please note that, even though it says "1.9.3-0" for the version, I have installed 1.9.3.1, i.e., the latest version from the OpenZFS on OS X website. Just to be absolutely sure of this, I reinstalled 1.9.3.1 again, and I can confirm that this information stays the same. Just FYI!

@mdw333

mdw333 commented Feb 8, 2020

  1. I am running the experiment a third time for you, so that I can answer the questions as I go. The data is transferring right now. The zpool status never changes, whether I run it before, during, or after the hang. It always looks like this:

hermione:~ mdw$ zpool status -v
pool: tank
state: ONLINE
scan: none requested
config:

NAME                                            STATE     READ WRITE CKSUM
tank                                            ONLINE       0     0     0
  raidz2-0                                      ONLINE       0     0     0
    media-05565E83-BFC8-8C4F-8D39-7F46A7302A32  ONLINE       0     0     0
    media-F06E9BFB-C5AD-7849-9913-D7D9D0C35B33  ONLINE       0     0     0
    media-1DC52120-F0E6-D349-BFEB-6A5028F675B5  ONLINE       0     0     0
    media-A2133D63-658E-6F4A-913F-03D641DD9C6B  ONLINE       0     0     0
    media-0B6B49F0-4BC8-2F49-A897-AEA701094E59  ONLINE       0     0     0
    media-2359F813-BDDA-5849-ABCF-CBD85130AC9B  ONLINE       0     0     0

errors: No known data errors

@mdw333

mdw333 commented Feb 8, 2020

  1. What's the nature of the data that you're copying? Is it highly compressible? Is it random? Is it video? Or is it something like system files (including e.g. the kexts themselves)? How are you copying the data? Finder drag-and-drop? Some app?
    Something on the command line? Please be as specific as you can, it will help enormously.

The data consists of 56 plain text files, all ASCII characters, nothing strange at all, rendered by a C++ program that I've been using for 14 years. There are 4 additional files: a bash script, a backup of the bash script, the nohup.out file, and the binary generated by the C++ file. I've been copying by Finder just using drag-and-drop, although I'm pretty sure that things die if I copy the files by bash; I think I was moving them that way a week ago, and I can try it that way again, if you like. The data is highly compressible. I think I was getting something like 8.0x compression in the zpool when it only contains this data.

@mdw333

mdw333 commented Feb 8, 2020

  1. I will come back to your question 3. I haven't (yet) posted the spindump, because the data transfer was successful that time. It happens rarely, perhaps 10% of the time! I'm going to remove the data and transfer it again.

@mdw333

mdw333 commented Feb 8, 2020

  1. This time, the process died after copying about 89 GB out of 522 GB.
    I ran spindump once while the process was running (see spindump1.txt), again while I thought the process was totally dead (see spindump2.txt), but then the process moved just a little more, so I ran spindump a third time (see spindump3.txt). I hope that is clear. Just for good measure, while typing this message, with the transfer completely dead, I ran spindump a 4th time (see spindump4.txt).

spindump1.txt
spindump2.txt
spindump3.txt
spindump4.txt

@mdw333

mdw333 commented Feb 8, 2020

  1. Seventh thing: does the machine hang entirely, forcing you to hit the power switch, or does the CLI/GUI still function enough for you to reboot using the apple menu or a shutdown command ?

This is always the case: I can't use the Finder to reboot, because it always says:
"The Finder can’t quit because some operations are still in progress."
"You can cancel or stop the operations, and then try again."

So I generally reboot the computer from another Mac (with remote login using ssh and the command: sudo shutdown -r now).

Alternatively, if I reboot by holding down the power button on the Mac, it usually requires 2 reboots. During the first reboot, I usually get 90% of the way through the startup splash screen and then I don't get any further, so I usually need to hold down the power button again and reboot, and things go very cleanly the second time. So I elected to just start rebooting using ssh from another machine, because only 1 reboot is required. I'm going to go reboot right now.

Then I will come back to your questions 5 and 6. Thanks for your patience.

@mdw333

mdw333 commented Feb 8, 2020

  1. Does the hang happen only with specific data, or a specific copying method? (In particular, does it hang in the same way if you do "dd if=/dev/random bs=1m of=/Volumes/tank/randombits count=300k" ?)

I did this as an administrator, in a bash shell, just FYI:
sudo dd if=/dev/random bs=1m of=/Volumes/tank/randombits count=300k

It wrote exactly 82407849984 bytes and then died. The bash shell that I was using is hung now. I can only see the size of the data transfer by using another bash shell.

@mdw333

mdw333 commented Feb 8, 2020

OK, maybe I spoke too soon, when I said that the bash shell died. It now said:

78591+0 records in
78590+0 records out
82407587840 bytes transferred in 406.307946 secs (202820517 bytes/sec)
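
As a sanity check on the dd summary above, the records-out count times the block size should equal the reported byte total. A minimal sketch in shell arithmetic, assuming `bs=1m` means 1 MiB (1048576 bytes) per record and `count=300k` means 307200 records, as BSD dd interprets those suffixes:

```shell
# Values copied from the dd summary above.
records_out=78590
block_size=$((1024 * 1024))        # bs=1m -> 1 MiB per record
target_records=$((300 * 1024))     # count=300k -> 307200 records

bytes_written=$((records_out * block_size))
percent_done=$((records_out * 100 / target_records))   # integer division

echo "bytes written: $bytes_written"
echo "progress: ${percent_done}% of the requested 300 GiB"
```

The product matches the 82407587840 bytes that dd reported, so the write stalled roughly a quarter of the way through the requested 300 GiB.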

@mdw333

mdw333 commented Feb 8, 2020

Indeed, it looks like something got hung right around the time that I killed the process, because at time 13:33, the resulting file size was 82407849984 bytes, and then at 13:36, the file size became 82408636416 bytes.

@mdw333

mdw333 commented Feb 8, 2020

I'm going to reboot and try this process of transferring random bits, one more time. I'll be back. A reboot probably is not required, but I am just doing it to refresh this experiment entirely. The reboot is usually only needed because when a process like this dies, the operating system still has references to each file that was trying to transfer, as you know, so it won't even allow me to nicely reboot, when we die during a regular file transfer.

@mdw333

mdw333 commented Feb 8, 2020

OK, looks like this time it wrote 82680348672 bytes and then basically died at 13:47, but I was more patient this time. I didn't do anything at all for 5 minutes, and then the file size bumped up to 83066486784 bytes, and then slightly more, to 84115718144 bytes, with both of those jumps occurring at 13:52. Instead of killing the process, I'm going to run home and switch cars with my wife, and see if this process stops on its own or not. I'll be patient. I'll be back in (say) 20 minutes. Looks like things are hung at the moment, but instead of killing the process, I'll let it run now.
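
The stall-then-resume pattern described here can be logged automatically instead of being checked by hand from a second shell. A minimal sketch: `file_size` and `watch_growth` are hypothetical helper names (not part of any ZFS tooling), and the path and intervals in the usage note are only examples.

```shell
# Print the size of a file in bytes. For regular files, wc -c typically
# takes the size from file metadata rather than reading the data.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# watch_growth FILE INTERVAL SAMPLES
# Print a timestamp, the current size, and the growth since the last sample.
watch_growth() {
    file=$1; interval=$2; samples=$3
    prev=$(file_size "$file")
    i=0
    while [ "$i" -lt "$samples" ]; do
        sleep "$interval"
        now=$(file_size "$file")
        echo "$(date '+%H:%M:%S') size=$now delta=$((now - prev))"
        prev=$now
        i=$((i + 1))
    done
}
```

For example, `watch_growth /Volumes/tank/randombits 60 30` would sample once a minute for half an hour; a run of `delta=0` lines corresponds to the apparent hang, and a later non-zero delta shows the write resuming.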

@mdw333

mdw333 commented Feb 8, 2020

Here's the present spindump, before I go!
spindump5.txt

@mdw333

mdw333 commented Feb 8, 2020

Still stuck at 84115718144 bytes. I'll be back in (say) 20 minutes.

@rottegift
Contributor

Ok, this is all interesting.

I'll have some more thoughts in a little while, but in the meantime can you share the output of

$ sysctl -h kstat | egrep -i  'dirty|dbuf'

Also, mds_stores in spindump4.txt is an enormous factor, and it would be helpful if you would run:

$ sudo mdutil -i off /Volumes/tank
$ sudo touch /Volumes/tank/.metadata_never_index

prior to doing any further tests of writing into "tank" in the next few hours.

mds_stores is part of Spotlight and is being very aggressive at chasing the already-written data in the spindump4.txt case, but it falls behind because it's low priority, and thus starts causing actual I/Os to the disk because the data it wants has aged out of the cache. It is also almost certainly holding mmap() references on the files you're write()-ing to indirectly via DesktopServicesHelper, which is an unhelpful complication. Finally, DesktopServicesHelper appears to be writing small chunks to multiple files, and the compressibility of the data and the slowdown are causing a lot of additional slow memory allocations.

It's possible that after a significant wait, your hang would resolve itself as when you thought the hang had happened earlier. That's not a workaround either, and there's no point waiting for more than, say, 30 minutes after the apparent hang. The wait may allow for the draining of a sort of priority inversion that low-IO-priority mds_stores is causing by mmap()ing files that are being written to by high-priority DesktopServicesHelper.

Unfortunately this crossed while you were reporting some results of your dd test, and while I was dealing with other things, so I haven't had the chance to absorb the results of whatever happened during dd.

If you also hang during dd, especially if you hang with mdutil -i off and .metadata_never_index in place, please take a spindump during the apparent hang.

@mdw333

mdw333 commented Feb 8, 2020

Still stuck at 84115718144 bytes... and now I'll share one more spindump and then kill that process.
spindump6.txt

@mdw333

mdw333 commented Feb 8, 2020

in the mean time can you share the output of
$ sysctl -h kstat | egrep -i 'dirty|dbuf'

Yes, indeed! Here you go:

kstat.zfs.darwin.tunable.async_write_min_dirty_pct: 30
kstat.zfs.darwin.tunable.async_write_max_dirty_pct: 60
kstat.zfs.darwin.tunable.zfs_dirty_data_max: 4,294,967,296
kstat.zfs.darwin.tunable.zfs_dirty_data_sync: 67,108,864
kstat.zfs.darwin.tunable.zfs_delay_min_dirty_percent: 60
kstat.zfs.darwin.tunable.dbuf_cache_max_bytes: 25,736,249,344
kstat.zfs.misc.dnodestats.dnode_hold_dbuf_hold: 0
kstat.zfs.misc.dnodestats.dnode_hold_dbuf_read: 0
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_throttle: 0
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_delay: 0
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_over_max: 0
kstat.zfs.misc.arcstats.dbuf_redirtied: 3,919,047

@mdw333

mdw333 commented Feb 8, 2020

Aha, I hate Spotlight! Why didn't I think of turning off Spotlight?!?! I should have thought of that. OK, now I ran both of these:

$ sudo mdutil -i off /Volumes/tank
$ sudo touch /Volumes/tank/.metadata_never_index

and now the Indexing is disabled. Wow, I wish I had thought of that earlier.

@mdw333

mdw333 commented Feb 8, 2020

I'm not sure if you would prefer for me to rest from further tests at the moment, or to go ahead and try some more, with the Spotlight turned off. I'll wait to hear from you, before I do any more writing or reading to/from the tank. I'll just leave things alone for now. Indeed, I'll do a reboot so that I'm ready to start fresh, whenever you are.

I can't thank you enough for your suggestions! I'll avoid doing anything else until I hear back from you. Thank you!

@rottegift
Contributor

rottegift commented Feb 8, 2020

Go ahead and try more with spotlight off.

The problem is one thread in spindump5.txt (the zio_execute thread with the kernel_memory_alloc), and I cannot tell if it's the writer-driven thread or the reader-driven thread. The thread is causing massive headaches for the kernel allocator, in particular the OS is spending a lot of time hunting for a good place to grab more memory for zfs.

How much memory is in your system? Do you have anything at all in /etc/zfs/zsysctl.conf ? Have you set any tunables yourself at this point?

The 23 GiB for the dbuf cache implies a system memory of a terabyte and a half, whereas the 4 GiB of dirty data means the memory is large enough that we cap it at zfs_dirty_data_max_max (which is not a runtime tunable). One of the problems with the dbuf cache being that big is that (a) it holds uncompressed copies of the data that would also want to be in the ARC compressed and (b) it's effectively FIFO (LRU with no reuse if mds_stores lags behind enough, which it will, or if Spotlight is disabled), so a sequential write like the one you're doing fills up dbuf buffers, and if the dbuf cache is full, we have to evict some dbufs from its tail before allocating at the head. It gets a bit more complicated because there is a trailing reader (mds_stores) which will also pull compressed blocks into the ARC, which will then be decompressed and put into the dbuf effectively-FIFO cache before being copied into mds_stores's address space. Because of that complication it's hard to tell whether it's the writer or the reader wanting more dbufs than ZFS's allocator has on hand. Either way, ZFS is pestering the operating system with allocations, and likely frees. XNU's default kernel allocator is not awesome when pestered like that. :-(

My thought is that the dbuf cache is somehow mis-sized for your real system memory. (Edit: it is, because dbuf.c uses 1/64 of system mem by default, see #750 ).

In particular your kstat.zfs.darwin.tunable.dbuf_cache_max_bytes is unaccountably enormous and that's definitely not helping things (and might be the root of the problem here). The huge dbuf cache, together with the several zio_execute threads misbehaving in spindump4.txt, leads me to think the problem is on the writer side. Your kstat.zfs.darwin.tunable.zfs_dirty_data_max is also very large (and in particular too large for realistic spinny disks). What's happening there is that because your writer threads (dd or DesktopServicesHelper) are faster than the actual spinny disks, after the first five to ten seconds of a write test they will be stuffing data into the dbuf cache up to zfs_dirty_data_max.

Try the following:

$ sudo sysctl kstat.zfs.darwin.tunable.zfs_dirty_data_max=536870912 kstat.zfs.darwin.tunable.dbuf_cache_max_bytes=1073741824

and see how that affects your writing tests.

Again, if things lock up, a spindump file will be helpful to look at.
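
To make the sizes in this discussion easier to compare, here is a small sketch that converts the raw byte counts to binary units. `to_mib` is a hypothetical helper; the 1/64 fraction is the default mentioned above, and treating the installed RAM as exactly 1.5 TiB is an assumption.

```shell
# Convert a byte count to whole MiB (integer division).
to_mib() {
    echo $(($1 / 1048576))
}

# Values reported by sysctl earlier in the thread.
to_mib 4294967296      # zfs_dirty_data_max (current)   -> 4096 MiB = 4 GiB
to_mib 25736249344     # dbuf_cache_max_bytes (current) -> 24544 MiB, just under 24 GiB

# Values proposed in the sysctl command above.
to_mib 536870912       # zfs_dirty_data_max (proposed)   -> 512 MiB
to_mib 1073741824      # dbuf_cache_max_bytes (proposed) -> 1024 MiB = 1 GiB

# Assumption: exactly 1.5 TiB of RAM and a dbuf cache of 1/64 of it.
ram_bytes=$((1536 * 1024 * 1024 * 1024))
to_mib $((ram_bytes / 64))   # -> 24576 MiB = 24 GiB
```

The reported cache size is 32 MiB short of 24 GiB, which would fit a 1/64 share of slightly less than 1.5 TiB of usable memory. The proposed settings shrink the dirty-data limit by a factor of 8 and the dbuf cache by roughly a factor of 24.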

@rottegift
Contributor

rottegift commented Feb 8, 2020

Oh, @mdw333 actually has that much RAM ("a Mac Pro with 1.5 TB of RAM"), so the dbuf cache size is not unaccountable, since it depends on dbuf_cache_shift = 5.

We should cap the dbuf cache at something reasonable, like 1 GiB, rather than killing systems with hundreds of GiB or more of RAM, @lundman and @rottegift . It's hard to imagine a system that has that much RAM and really really needs that big an amount of uncompressed copies of ARC data lingering around in the dbuf cache, and not hard to imagine (and in fact we have just found) a dbuf cache being too big in practice on a system with even just 64 GiB of memory.

@mdw333 : setting the dbuf cache to 1 GiB will help with your workload, and in more general workloads. You could even make it as small as 512 MB. If your testing against the sysctl in the previous message bears out this diagnosis, you can add these two to /etc/zfs/zsysctl.conf (which will take effect on your kext load, or reboot) :

# the default for 1.5 TiB RAM would be 23 GiB, which is at least 23 times too big
kstat.zfs.darwin.tunable.dbuf_cache_max_bytes=1073741824
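
Only the dbuf line is shown here, although the message refers to two settings; presumably the second is the zfs_dirty_data_max value from the earlier sysctl command. A sketch of writing both into a config file so that each tunable appears exactly once, assuming the `name=value` line format shown above. `set_tunable` is a hypothetical helper, and the example writes to a scratch file rather than /etc/zfs/zsysctl.conf (which would need root):

```shell
# set_tunable FILE NAME VALUE
# Ensure FILE contains exactly one NAME=VALUE line, replacing any older setting.
set_tunable() {
    file=$1; name=$2; value=$3
    touch "$file"
    # Keep every line that does not already set this tunable.
    grep -v "^${name}=" "$file" > "${file}.tmp" || true
    printf '%s=%s\n' "$name" "$value" >> "${file}.tmp"
    mv "${file}.tmp" "$file"
}

conf=$(mktemp)   # stand-in for /etc/zfs/zsysctl.conf
set_tunable "$conf" kstat.zfs.darwin.tunable.dbuf_cache_max_bytes 1073741824
set_tunable "$conf" kstat.zfs.darwin.tunable.zfs_dirty_data_max 536870912
cat "$conf"
```

Running `set_tunable` again with a different value replaces the old line instead of appending a duplicate, which keeps the file readable while experimenting with sizes.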

@mdw333

mdw333 commented Feb 8, 2020

I apologize for disappearing for two hours! I didn't realize that my wife and kiddos were going to a short music concert at our library, so I went with them. Thanks for diagnosing things!

Yes, I do have 1.5 TB of RAM installed in this beast. OK, I ran what you suggested (as Administrator), namely:
sudo sysctl kstat.zfs.darwin.tunable.zfs_dirty_data_max=536870912 kstat.zfs.darwin.tunable.dbuf_cache_max_bytes=1073741824

in the bash shell, and then, in that same shell, I called:
sudo dd if=/dev/random bs=1m of=/Volumes/tank/randombits count=300k
Our file of random bits made it to 103832748032 bytes this time, but alas, it seems to be stuck there. This is further than it made it during our other attempts... but we still do not quite have things solved! I would be delighted to continue to tweak these parameters, but I haven't ever made such tweaks in the past, so I'd love some guidance. My /etc/zfs/zsysctl.conf file is empty.

Here's the spin dump.

spindump7.txt

@mdw333

mdw333 commented Feb 8, 2020

@rottegift if we ever figure all of this out, I'm going to owe you a beer! (or an orange juice, if you are a teetotaler like me) I'm willing and able to continue adjusting anything that you think I should adjust. I hope that this hard work will make it easier for other people, down the road. I want to help, and I appreciate your help so far!

@mdw333

mdw333 commented Feb 8, 2020

I am fond of replicates, so I did the experiment again. Almost exactly the same effect. I got 103993180160 bytes in the data transfer this time before things died.

sudo sysctl kstat.zfs.darwin.tunable.zfs_dirty_data_max=536870912 kstat.zfs.darwin.tunable.dbuf_cache_max_bytes=1073741824

sudo dd if=/dev/random bs=1m of=/Volumes/tank/randombits count=300k

spindump8.txt

@mdw333

mdw333 commented Feb 8, 2020

One more replicate of the experiment, for good measure! (I can't help it! I'm a scientist.)
FYI, this time the data transfer of random bytes died after 103609794560 bytes.

spindump9.txt

@lopezio

lopezio commented Jun 25, 2020

Apple has made developing on osx a little less friendly in recent times, that is true, and there probably will be a day in the future when we can no longer maintain support. But until that time!

@lundman ... has that time now arrived, with mac OS 11 Bien Sûr Big Sur... ? :(

@ylluminate

What is the full situation with macOS 11 Big Sur? I'd like to understand the core OS / kext, etc. situation completely.

With the joke that was WWDC we've put all Apple equipment upgrades on hold at this point due to this situation. Windows isn't really an option and we still need macOS software, but we're looking for emulation paths and other options, since Apple has proven hostile to flexibility and other ideals that are critical for some of us and that were the original reason for having moved to Apple 18 years ago.

@3add3287

3add3287 commented Jun 25, 2020

macOS 11.0 aka Big Sur will still have Kernel Extensions. That's based on what they said in the "Platforms State of the Union" (aka Developer Keynote) and "Explore the new system architecture of Apple Silicon Macs" WWDC sessions. Judging from that, there seems to be little change in IOKit. They mentioned one change related to IOMMU for DMA, but it's not clear if that's just for the Apple SoC or for x86 as well.

Please refrain from commenting on this specific issue unless it's about the specific "problems" discussed here. That helps those who are interested in or affected by what's discussed in earlier comments.
The OpenZFSOnOSX Forum seems to be the best place to discuss potential macOS 11.0 Big Sur related topics.

Thank you

@lundman
Contributor

lundman commented Jun 26, 2020

Status on kext on Big Sur: openzfsonosx/openzfs#8
