no devices created in /dev/<zpoolname>/<zvolname> #26
Whoops, yes, you must be right. Currently, I only create the /dev/ files at module load time or when a new zvol is created; I forgot about the import case. As a workaround for now, you could try importing the pool, unloading the zfs module, and then reloading the zfs module to force the device creation.
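Conceptually, the missing import case means walking every dataset of the newly imported pool and creating a block device for each volume, just as the module-load path does. Below is a minimal sketch of that walk. It assumes a hypothetical flat `dataset_t` array in place of the real `dmu_objset_find()` traversal. `create_minor()` and `create_minors_for_pool()` are illustrative names that only format the `/dev/pool/name` path; they are not the actual zvol.c functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of a pool's datasets. */
typedef enum { DS_FILESYSTEM, DS_VOLUME } ds_type_t;

typedef struct {
	const char *name;	/* e.g. "pimber/phoronix@69" */
	ds_type_t type;
} dataset_t;

/* Stand-in for zvol minor creation: only formats the /dev path. */
static int
create_minor(const char *name, char *path, size_t len)
{
	int n = snprintf(path, len, "/dev/%s", name);

	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}

/*
 * Walk every dataset of a freshly imported pool and create a device
 * node for each volume. Returns the number of minors created, or -1.
 */
static int
create_minors_for_pool(const dataset_t *ds, size_t n,
    char paths[][64], size_t max_paths)
{
	size_t created = 0;

	assert(ds != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (ds[i].type != DS_VOLUME)
			continue;	/* filesystems get no block device */
		if (created == max_paths ||
		    create_minor(ds[i].name, paths[created], 64) != 0)
			return (-1);
		created++;
	}
	return ((int)created);
}
```

Filesystems are skipped because only volumes get block devices. Snapshots of a volume are modeled as volumes here, because the `ls -l /dev/pimber` output later in this thread shows nodes such as `phoronix@69`.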
Thank you, it worked on one of my test machines. But on another one I ran into a different problem. There I was running benchmarks while, in parallel, taking a snapshot once per minute and destroying the oldest one to keep 100 snapshots. I exported the zpool, rebooted, loaded zfs with modprobe, and imported the zpool. There were no block devices in /dev//. I ran "rmmod zfs" and "modprobe zfs", and "modprobe zfs" hangs:
Jul 2 06:42:58 localhost kernel: SPL: Loaded Solaris Porting Layer v0.5.0
Jul 2 06:42:58 localhost kernel: zunicode: module license 'CDDL' taints kernel.
Jul 2 06:42:58 localhost kernel: Disabling lock debugging due to kernel taint
Jul 2 06:42:59 localhost kernel: ZFS: Loaded ZFS Filesystem v0.5.0
Jul 2 06:45:57 localhost kernel: INFO: task modprobe:3913 blocked for more than 120 seconds.
Jul 2 06:45:57 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 2 06:45:57 localhost kernel: modprobe D ffffffffa05f7998 0 3913 1779 0x00000080
Jul 2 06:45:57 localhost kernel: ffff8802cf213be8 0000000000000086 ffff8802cf213b58 ffff880200000000
Jul 2 06:45:57 localhost kernel: 00000000000009c5 0000000000000000 0000000000000400 ffff8802cf213fd8
Jul 2 06:45:57 localhost kernel: ffff8802cf213fd8 000000000000f9b0 00000000000157c0 ffff88030c2603c8
Jul 2 06:45:57 localhost kernel: Call Trace:
Jul 2 06:45:57 localhost kernel: [] ? mutex_spin_on_owner+0x48/0x70
Jul 2 06:45:57 localhost kernel: [] __mutex_lock_common+0x135/0x19c
Jul 2 06:45:57 localhost kernel: [] __mutex_lock_slowpath+0x14/0x16
Jul 2 06:45:57 localhost kernel: [] mutex_lock+0x31/0x4b
Jul 2 06:45:57 localhost kernel: [] zvol_create_minor+0x25/0x383 [zfs]
Jul 2 06:45:57 localhost kernel: [] ? kmem_asprintf+0x4d/0x5b [spl]
Jul 2 06:45:57 localhost kernel: [] zvol_create_minors_cb+0xc/0xe [zfs]
Jul 2 06:45:57 localhost kernel: [] dmu_objset_find_spa+0x2c6/0x359 [zfs]
Jul 2 06:45:57 localhost kernel: [] ? zvol_create_minors_cb+0x0/0xe [zfs]
Jul 2 06:45:57 localhost kernel: [] dmu_objset_find_spa+0x12b/0x359 [zfs]
Jul 2 06:45:57 localhost kernel: [] ? zvol_create_minors_cb+0x0/0xe [zfs]
Jul 2 06:45:57 localhost kernel: [] zvol_init+0x146/0x169 [zfs]
Jul 2 06:45:57 localhost kernel: [] ? spl__init+0x0/0x10 [zfs]
Jul 2 06:45:57 localhost kernel: [] _init+0x1d/0x71 [zfs]
Jul 2 06:45:57 localhost kernel: [] ? spl__init+0x0/0x10 [zfs]
Jul 2 06:45:57 localhost kernel: [] spl__init+0xe/0x10 [zfs]
Jul 2 06:45:57 localhost kernel: [] do_one_initcall+0x59/0x154
Jul 2 06:45:57 localhost kernel: [] sys_init_module+0xd1/0x230
Jul 2 06:45:57 localhost kernel: [] system_call_fastpath+0x16/0x1b
Jul 2 06:45:57 localhost kernel: INFO: task blkid:4208 blocked for more than 120 seconds.
Jul 2 06:45:57 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 2 06:45:57 localhost kernel: blkid D 0000000000000000 0 4208 2297 0x00000080
Jul 2 06:45:57 localhost kernel: ffff8802c92e1978 0000000000000082 0000000000000000 ffffffff8120650e
Jul 2 06:45:57 localhost kernel: 00000040c92e1958 ffff88000ca2fc50 ffff8802c92e1ae8 ffff8802c92e1fd8
Jul 2 06:45:57 localhost kernel: ffff8802c92e1fd8 000000000000f9b0 00000000000157c0 ffff8802d481e108
Jul 2 06:45:57 localhost kernel: Call Trace:
Jul 2 06:45:57 localhost kernel: [] ? __bitmap_weight+0x40/0x8f
Jul 2 06:45:57 localhost kernel: [] ? mutex_spin_on_owner+0x42/0x70
Jul 2 06:45:57 localhost kernel: [] __mutex_lock_common+0x135/0x19c
Jul 2 06:45:57 localhost kernel: [] __mutex_lock_slowpath+0x14/0x16
Jul 2 06:45:57 localhost kernel: [] mutex_lock+0x31/0x4b
Jul 2 06:45:57 localhost kernel: [] ? __kmalloc+0x13f/0x151
Jul 2 06:45:57 localhost kernel: [] spa_open_common+0x59/0x242 [zfs]
Jul 2 06:45:57 localhost kernel: [] spa_open+0xe/0x10 [zfs]
Jul 2 06:45:57 localhost kernel: [] dsl_dir_open_spa+0x8a/0x29b [zfs]
Jul 2 06:45:57 localhost kernel: [] dsl_dataset_hold+0x30/0x1b9 [zfs]
Jul 2 06:45:57 localhost kernel: [] ? __mutex_lock_common+0x13d/0x19c
Jul 2 06:45:57 localhost kernel: [] dsl_dataset_own+0x1f/0x52 [zfs]
Jul 2 06:45:57 localhost kernel: [] dmu_objset_own+0x29/0xb0 [zfs]
Jul 2 06:45:57 localhost kernel: [] ? __mutex_lock_slowpath+0x14/0x16
Jul 2 06:45:57 localhost kernel: [] ? mutex_lock+0x31/0x4b
Jul 2 06:45:57 localhost kernel: [] zvol_open+0x63/0x1ec [zfs]
Jul 2 06:45:57 localhost kernel: [] ? kobject_get+0x1a/0x22
Jul 2 06:45:57 localhost kernel: [] __blkdev_get+0xda/0x391
Jul 2 06:45:57 localhost kernel: [] ? unlock_new_inode+0x45/0x49
Jul 2 06:45:57 localhost kernel: [] ? blkdev_open+0x0/0xa7
Jul 2 06:45:57 localhost kernel: [] blkdev_get+0xb/0xd
Jul 2 06:45:57 localhost kernel: [] blkdev_open+0x71/0xa7
Jul 2 06:45:57 localhost kernel: [] __dentry_open+0x18d/0x2c4
Jul 2 06:45:57 localhost kernel: [] ? security_inode_permission+0x1c/0x1e
Jul 2 06:45:57 localhost kernel: [] nameidata_to_filp+0x3a/0x4b
Jul 2 06:45:57 localhost kernel: [] do_filp_open+0x571/0xad5
Jul 2 06:45:57 localhost kernel: [] ? might_fault+0x1c/0x1e
Jul 2 06:45:57 localhost kernel: [] ? alloc_fd+0x76/0x11f
Jul 2 06:45:57 localhost kernel: [] do_sys_open+0x5e/0x10a
Jul 2 06:45:57 localhost kernel: [] sys_open+0x1b/0x1d
Jul 2 06:45:57 localhost kernel: [] system_call_fastpath+0x16/0x1b
Jul 2 06:45:57 localhost kernel: INFO: task blkid:4210 blocked for more than 120 seconds.
I waited half an hour; nothing happened. /dev// had the first 3 snapshots:
$ ls -l /dev/pimber/
total 0
brw-------. 1 root root 230, 0 Jul 2 06:30 phoronix@69
brw-------. 1 root root 230, 16 Jul 2 06:30 phoronix@87
Then I put "kernel.hung_task_timeout_secs = 0" into /etc/sysctl.conf and rebooted.
And now I think I've completely lost the zvol: after "modprobe zfs", which hangs as before, the "zpool" and "zfs" commands do not work, for example:
sudo zpool status
Unable to open /dev/zfs: No such file or directory.
Verify the ZFS module stack is loaded by running '/sbin/modprobe zfs'.
There are many zfs processes:
$ pgrep -f -l 'z'
4110 modprobe zfs
4113 zvol/0
4114 zvol/1
4115 zvol/2
4116 zvol/3
4260 zio_null_issue/
4261 zio_null_intr/0
4262-4269 zio_read_issue/ (8 processes)
4270 zio_read_intr/0
4271 zio_read_intr/1
4272 zio_read_intr/2
4273 zio_read_intr/3
4274-4282 zio_write_issue (9 processes)
4283-4290 zio_write_intr/ (8 processes)
4291-4295 zio_write_intr_ (5 processes)
4296-4395 zio_free_issue/ (100 processes)
4396 zio_free_intr/0
4397 zio_claim_issue
4398 zio_claim_intr/
4399 zio_ioctl_issue
4400 zio_ioctl_intr/
4402 zfs_vn_rele_tas
I don't know how to get out of this situation. I can't even destroy the zpool, because that requires "modprobe zfs" to complete successfully.
OK, the first issue you reported should be fixed by commit 45cb33f, which I just pushed to GitHub. There were in fact several places where devices were not created or destroyed properly. Can you please test this and let me know if you find any others? As for your second issue, that looks like it was caused by a mistake when updating the zvol code to onnv_141. I've pushed commit 122e5b4, which should resolve the deadlock you found. If that doesn't take care of it, please let me know. Thanks for testing out the Linux port; this sort of thing goes a long way toward finding and fixing all the rough edges.
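The first backtrace shows `zvol_init()` calling `dmu_objset_find_spa()`, which calls `zvol_create_minors_cb()` and then `zvol_create_minor()`, and that last call blocks in `mutex_lock()`. One plausible reading (an assumption; the thread does not say which lock was involved) is that a non-recursive mutex taken by the caller is taken again further down the same call chain. The sketch below reproduces that pattern in user space with POSIX threads. An error-checking mutex reports `EDEADLK` instead of hanging, and the usual fix is a variant that expects the caller to already hold the lock. `state_lock`, `create_minor_locked()` and the other names are hypothetical; this is not the code from commit 122e5b4.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical global lock protecting the set of minors. */
static pthread_mutex_t state_lock;
static int minors;

static void
init_state_lock(void)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	/* ERRORCHECK: relocking from the owning thread returns EDEADLK
	 * instead of blocking forever like a kernel mutex would. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&state_lock, &attr);
	pthread_mutexattr_destroy(&attr);
}

/* Buggy shape: takes the lock itself, even when the caller holds it. */
static int
create_minor(void)
{
	int err = pthread_mutex_lock(&state_lock);

	if (err != 0)
		return (err);	/* EDEADLK here == the hang in the trace */
	minors++;
	pthread_mutex_unlock(&state_lock);
	return (0);
}

/* Fixed shape: the caller must already hold state_lock. */
static int
create_minor_locked(void)
{
	minors++;
	return (0);
}

/* Walk that holds the lock across all per-dataset callbacks. */
static int
create_minors(int ndatasets, int (*cb)(void))
{
	int err = 0;

	assert(cb != NULL);
	pthread_mutex_lock(&state_lock);
	for (int i = 0; i < ndatasets && err == 0; i++)
		err = cb();
	pthread_mutex_unlock(&state_lock);
	return (err);
}
```

The design choice shown is the common one: keep a single lock owner (the walk) and give callbacks a `_locked` variant, rather than making the mutex recursive.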
Thanks, it worked much better than previous versions.
[seriv@pimbra zfs]$ git log | head
commit e3804450562d2f4f9121634cdf0f56176c4ce1e0
Merge: 3c3546f f0ff2ed
Author: Brian Behlendorf
Date: Fri Jul 2 12:26:11 2010 -0700
Merge commit 'refs/top-bases/top' into top
$ rpm -qa | grep zfs
zfs-test-0.5.0-1.x86_64
zfs-modules-devel-0.5.0-1_2.6.33.5_124.fc13.x86_64
zfs-devel-0.5.0-1.x86_64
zfs-0.5.0-1.x86_64
zfs-modules-0.5.0-1_2.6.33.5_124.fc13.x86_64
$ sudo tail -9 /var/log/messages
Jul 2 19:55:49 pimbra kernel: SPL: Loaded Solaris Porting Layer v0.5.0
Jul 2 19:55:49 pimbra kernel: zunicode: module license 'CDDL' taints kernel.
Jul 2 19:55:49 pimbra kernel: Disabling lock debugging due to kernel taint
Jul 2 19:58:50 pimbra udevd[570]: worker [22115] unexpectedly returned with status 0x0100
Jul 2 19:58:50 pimbra udevd[570]: worker [22115] failed while handling '/devices/virtual/block/pimber!phoronix@69'
Jul 2 19:58:50 pimbra udevd[570]: worker [22411] unexpectedly returned with status 0x0100
Jul 2 19:58:50 pimbra udevd[570]: worker [22411] failed while handling '/devices/virtual/block/pimber!phoronix@87'
Jul 2 19:58:50 pimbra udevd[570]: worker [22413] unexpectedly returned with status 0x0100
Jul 2 19:58:50 pimbra udevd[570]: worker [22413] failed while handling '/devices/virtual/block/pimber!phoronix@0'
$ sudo ls -l /dev/pimber
total 0
brw-------. 1 root root 230, 32 Jul 2 19:55 phoronix@0
brw-------. 1 root root 230, 0 Jul 2 19:55 phoronix@69
brw-------. 1 root root 230, 16 Jul 2 19:55 phoronix@87
I/O and CPU are almost idle:
$ pgrep -f -l z
2202 /usr/bin/gnome-keyring-daemon --daemonize --login
22108 /sbin/modprobe zfs
22119 zvol/0
22120 zvol/1
22121 zvol/2
22122 zvol/3
22266 zio_null_issue/
22267 zio_null_intr/0
22268-22275 zio_read_issue/ (8 processes)
22276 zio_read_intr/0
22277 zio_read_intr/1
22278 zio_read_intr/2
22279 zio_read_intr/3
22280-22288 zio_write_issue (9 processes)
22289-22296 zio_write_intr/ (8 processes)
22297-22301 zio_write_intr_ (5 processes)
22302-22401 zio_free_issue/ (100 processes)
22402 zio_free_intr/0
22403 zio_claim_issue
22404 zio_claim_intr/
22405 zio_ioctl_issue
22406 zio_ioctl_intr/
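The udev messages refer to the devices as `pimber!phoronix@69` and so on because the kernel replaces `/` with `!` in sysfs object names. The `/dev/pimber/...` paths come from mapping that character back. The small helper below illustrates the mapping and can be handy when matching udev log lines to the nodes listed by `ls -l /dev/pimber`. `sysfs_name_to_devpath()` is a hypothetical name; it is not part of udev or ZFS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Convert a sysfs block-device name such as "pimber!phoronix@69"
 * into the /dev path "/dev/pimber/phoronix@69". Returns 0 on
 * success, or -1 if `out` is too small.
 */
static int
sysfs_name_to_devpath(const char *sysname, char *out, size_t outlen)
{
	static const char prefix[] = "/dev/";
	size_t plen = sizeof (prefix) - 1;
	size_t nlen;

	assert(sysname != NULL && out != NULL);
	nlen = strlen(sysname);
	if (plen + nlen + 1 > outlen)
		return (-1);

	memcpy(out, prefix, plen);
	/* "<=" also copies the terminating NUL. */
	for (size_t i = 0; i <= nlen; i++)
		out[plen + i] = (sysname[i] == '!') ? '/' : sysname[i];
	return (0);
}
```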
Well, that sounds better, but certainly not good. If you have a reproducer you could post to the bug, that would help. If not, could you get a new backtrace from 'modprobe zfs'?
Well, it looks like this was left over from the previous code. I was not able to get rid of it by removing the RPMs, removing /etc/zfs, and reinstalling the RPMs. But when I rsynced the OS to a new partition and imported the zpool from there, everything worked. Now I have all 100 snapshots and 2 volumes mounted without any problem.
Using zfs with Lustre, an arc_read can trigger a kernel memory allocation that in turn leads to a memory reclaim callback and a deadlock within a single zfs process. This change uses spl_fstrans_mark and spl_fstrans_unmark to prevent the reclaim attempt and the deadlock (https://zfsonlinux.topicbox.com/groups/zfs-devel/T4db2c705ec1804ba). The stack trace observed is:
#0 [ffffc9002b98adc8] __schedule at ffffffff81610f2e
#1 [ffffc9002b98ae68] schedule at ffffffff81611558
#2 [ffffc9002b98ae70] schedule_preempt_disabled at ffffffff8161184a
#3 [ffffc9002b98ae78] __mutex_lock at ffffffff816131e8
#4 [ffffc9002b98af18] arc_buf_destroy at ffffffffa0bf37d7 [zfs]
#5 [ffffc9002b98af48] dbuf_destroy at ffffffffa0bfa6fe [zfs]
#6 [ffffc9002b98af88] dbuf_evict_one at ffffffffa0bfaa96 [zfs]
#7 [ffffc9002b98afa0] dbuf_rele_and_unlock at ffffffffa0bfa561 [zfs]
#8 [ffffc9002b98b050] dbuf_rele_and_unlock at ffffffffa0bfa32b [zfs]
#9 [ffffc9002b98b100] osd_object_delete at ffffffffa0b64ecc [osd_zfs]
#10 [ffffc9002b98b118] lu_object_free at ffffffffa06d6a74 [obdclass]
#11 [ffffc9002b98b178] lu_site_purge_objects at ffffffffa06d7fc1 [obdclass]
#12 [ffffc9002b98b220] lu_cache_shrink_scan at ffffffffa06d81b8 [obdclass]
#13 [ffffc9002b98b278] shrink_slab at ffffffff811ca9d8
#14 [ffffc9002b98b338] shrink_node at ffffffff811cfd94
#15 [ffffc9002b98b3b8] do_try_to_free_pages at ffffffff811cfe63
#16 [ffffc9002b98b408] try_to_free_pages at ffffffff811d01c4
#17 [ffffc9002b98b488] __alloc_pages_slowpath at ffffffff811be7f2
#18 [ffffc9002b98b580] __alloc_pages_nodemask at ffffffff811bf3ed
#19 [ffffc9002b98b5e0] new_slab at ffffffff81226304
#20 [ffffc9002b98b638] ___slab_alloc at ffffffff812272ab
#21 [ffffc9002b98b6f8] __slab_alloc at ffffffff8122740c
#22 [ffffc9002b98b708] kmem_cache_alloc at ffffffff81227578
#23 [ffffc9002b98b740] spl_kmem_cache_alloc at ffffffffa048a1fd [spl]
#24 [ffffc9002b98b780] arc_buf_alloc_impl at ffffffffa0befba2 [zfs]
#25 [ffffc9002b98b7b0] arc_read at ffffffffa0bf0924 [zfs]
#26 [ffffc9002b98b858] dbuf_read at ffffffffa0bf9083 [zfs]
#27 [ffffc9002b98b900] dmu_buf_hold_by_dnode at ffffffffa0c04869 [zfs]
Signed-off-by: Mark Roper <markroper@gmail.com>
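In the trace, an allocation made under `arc_read()` (frames #25 down to #17) drops into reclaim, and a Lustre shrinker (frame #12) calls back into ZFS and blocks on a mutex (frame #3). The fix brackets the allocating code with `spl_fstrans_mark()`/`spl_fstrans_unmark()`. Roughly, mark saves the current state in a cookie and tells the allocator not to recurse into filesystem reclaim, and unmark restores the saved state. The user-space sketch below shows that mark/restore pattern with a thread-local flag and a hypothetical allocator. `fstrans_mark()`, `alloc_buf()`, `reclaim_cb()` and `cookie_t` are illustrative stand-ins, not SPL or kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Per-thread "no FS reclaim" flag, standing in for the task flag. */
static _Thread_local bool nofs;
static int reclaim_calls;	/* how often reclaim re-entered us */

typedef bool cookie_t;

/* Save the previous state and forbid FS reclaim (cf. spl_fstrans_mark). */
static cookie_t
fstrans_mark(void)
{
	cookie_t old = nofs;

	nofs = true;
	return (old);
}

/* Restore the saved state, so nested marks unwind correctly. */
static void
fstrans_unmark(cookie_t old)
{
	nofs = old;
}

/* Stand-in for a shrinker that would need a lock we already hold. */
static void
reclaim_cb(void)
{
	reclaim_calls++;
}

/*
 * Hypothetical allocator: under "memory pressure" it runs reclaim
 * first, unless the calling thread has been marked.
 */
static void *
alloc_buf(size_t size, bool pressure)
{
	assert(size > 0);
	if (pressure && !nofs)
		reclaim_cb();
	return (malloc(size));
}
```

Returning the previous state instead of clearing the flag unconditionally matters when marked sections nest: an inner unmark must not re-enable reclaim for an outer section that is still marked.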
…hot. Currently, when the dataset is in use, we can't receive a snapshot:
zfs send test/1@asd | zfs recv -FM test/2
cannot unmount '/test/2': Device busy
The same goes for the Linux version:
oshogbo@u-wing:/test$ sudo sudo zfs send test/1@b | sudo zfs recv -F test/2
umount: /test/2: target is busy.
cannot unmount '/test/2': umount failed
oshogbo@u-wing:/test$ uname -a
Linux u-wing 4.18.0-25-generic #26-Ubuntu SMP Mon Jun 24 09:32:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
This commit adds the option 'M', which forcibly unmounts the dataset. Thanks to that, we can force receiving a snapshot in a single step.
Discussed with: pjd
Reviewed by: AllanJude (FreeBSD version)
FreeBSD review: https://reviews.freebsd.org/D22306
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Add all files required for the macOS port. Add new cmd/os/ for tools which are only expected to be used on macOS. This has support for all macOS versions up to Catalina (not Big Sur).
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
macOS: big uio changeover. Make uio an internal (ZFS) struct, possibly referring to the supplied (XNU) uio from the kernel. This means zio_crypt.c can now be identical to upstream.
Update for draid, and other changes
macOS: Use SET_ERROR with uiomove. [squash]
macOS: they went and added vdev_draid
macOS: compile fixes from rebase
macOS: oh cstyle, how you vex me so
macOS: They added new methods - squash
macOS: arc_register_hotplug for userland too
Upstream: avoid warning
zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
macOS: Update zfs_acl.c to latest
This includes commits like: 65c7cc4 1b376d1 cfdc432 716b53d a741b38 485b50b
macOS: struct vdev changes
macOS: cstyle, how you vex me [squash]
Upstream: booo Werror booo
Upstream: squash baby
Not defined gives warnings.
Upstream: Include all Makefiles
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
double draid!
macOS: large commit
macOS: Use APPLE approved kmem_alloc()
macOS: large commit
WIP: remove reliance on zfs.exports
The memory-pressure has been nerfed, and will not run well until we can find other solutions. The kext symbol lookup we can live without; it is used only for debug and panic. Use lldb to look up symbols.
leaner!
leanerr!
remove zfs.export dependency cont.
export reduction cont. cont.
Corrective tweaks for building
Correct vnode_iocount()
Clean up pipe wrap code, use pthreads, handle multiple streams
latest pipe send with threads sort of works, but bad timing can deadlock
macOS: work out corner-case starvation issue in cv_wait_sig()
Fix -C in zfs send/recv
cv_wait_sig squash
Also wrap zfs send resume
Implement VOP_LOOKUP for snowflake Finder
Don't change date when setting size. This seems to be a weird requirement with Linux, so model it after the FreeBSD version.
macOS: correct xattr checks for uio
Fix a noisy source of misleading-indentation warnings
Fix "make install" ln -s failures
fix ASSERT: don't try to peer into opaque vp structure
Import non-panicking ASSERT from old spl/include/sys/debug.h
Guard with MACOS_ASSERT_SHOULD_PANIC, which will do what Linux and FreeBSD do: redefine ASSERTs as VERIFYs. The panic report line will say VERIFY, obscuring the problem, and a system panic is harsher (and more dangerous) on macOS than a zfs-module panic on Linux.
ASSERTions: declare assfail in debug.h
Build and link spl-debug.c
Eliminate spurious "off" variable, use position+offset range
Make sure we hold the correct range to avoid a panic in dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug).
zvol_log_write the range we have written, not the future range
silence very noisy and dubious ASSERT
macOS: M1 fixes for arm64.
sysctl needs to use OID2
Allocs need to be IOMalloc_aligned
Initial spl-vmem memory area needs to be aligned to 16KB
No cpu_number() for arm64.
macOS: change zvol locking, add zvol symlinks
macOS: Return error on UF_COMPRESSED
This means bsdtar will be rather noisy, but we prefer noise over corrupt files (all files would be 0-sized).
usr/bin/zprint: Failed to set file flags~
-rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint
usr/bin/zprint: Failed to set file flags
-rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint
Actually include zedlet for zvols
macOS: Fix Finder crash on quickview, SMB error codes
xattr=sa would return a negative return code, a hangover from ZOL code. Only set size if passed a ptr. Convert negative error codes back to normal.
Add LIBTOOLFLAGS for macports toolchain
This will replace PR#23
macOS zpool import fixes
The new codebase uses a mixture of thread pools and lio_listio async I/O. On macOS the AIO limits are low, and when they are reached, lio_listio() returns EAGAIN while probing several prospective leaf vdevs concurrently, looking for labels. We should not abandon probing a vdev in this case, and can usually recover by trying again after a short delay. (We continue to treat other errnos as unrecoverable for that vdev, and only try to recover from EAGAIN a few times.)
Additionally, take logic from old o3x and don't probe a variety of devices commonly found in /dev/XXX, as they either produce side effects or are simply wasted effort.
Finally, add a trailing / that FreeBSD and Linux both have.
listxattr may not expose com.apple.system xattr=sa
We need to ask IOMallocAligned for the enclosing POW2
vmem_create() arenas want at least natural alignment for the spans they import, and will panic if they don't get it.
For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE. Otherwise, align on the enclosing power of two for any osif_malloc allocation up to 2^32. Anything that asks osif_malloc() for more than that is almost certainly a bug, but we can try aligning on PAGESIZE anyway, rather than extend the enclosing-power-of-two device to handle 64-bit allocations.
Simplify the creation of bucket arenas, and adjust their quanta.
This results in handing back considerably more (and smaller) chunks of memory to osif_free if there is pressure, and reduces waits in xnu_alloc_throttled(), so it is a performance win for a busy, memory-constrained system.
Finally, uncomment some valid code that might be used by future callers of vmem_xcreate().
use vmem_xalloc to match the vmem_xfree of initial dynamic alloc
vmem_alloc() breaks the initial large vmem_add() allocation into smaller chunks in an effort to have a large number of vmem segments in the arena. This arena does not benefit from that. Additionally, in vmem_fini() we call vmem_xfree() to return the initial allocation, because it is done after almost everything has been pulled down. Unfortunately, vmem_xfree() returns the entire initial allocation as a single span. IOFree() checks a variable maintained by the IOMalloc* allocators which tracks the largest allocation made so far, and will panic when (as is almost always the case) the initial large span is handed to it. This usually manifests as a panic or hang on kext unload, or a hang at reboot.
Consequently, we will now use vmem_xalloc() for this initial allocation; vmem_xalloc() also lets us explicitly specify the natural alignment we want for it.
zfs_rename SA_ADDTIME may grow SA
Avoid:
zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2
-> 674 panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n",
675 (u_longlong_t)db->db.db_object, db->db_level,
676 (u_longlong_t)db->db_blkid);
zfs diff also needs to be wrapped. Replace the call to pipe() with a couple of open(mkfifo) calls instead.
Upstream: cstyle zfs_fm.c
macOS: cstyle baby
IOMallocAligned() should call IOFreeAligned()
macOS: zpool_disable_volumes v1
When exporting, also kick mounted zvols offline
macOS: zpool_disable_volumes v2
When exporting zvols, check IOReg for the BSDName, instead of using readlink on the ZVOL symlinks. Also check if apfs has made any synthesized disks, and ask them to unmount first.
./scripts/cmd-macos.sh zpool export BOOM
Exporting 'BOOM/volume'
... asking apfs to eject 'disk5'
Unmount of all volumes on disk5 was successful
... asking apfs to eject 'disk5s1'
Unmount of all volumes on disk5 was successful
... asking ZVOL to export 'disk4'
Unmount of all volumes on disk4 was successful
zpool_disable_volume: exit
macOS: Add libdiskmgt and call inuse checks
macOS: compile fixes from rebase
macOS: oh cstyle, how you vex me so
macOS: They added new methods - squash
macOS: arc_register_hotplug for userland too
macOS: minor tweaks for libdiskmgt
macOS: getxattr size==0 is to look up size
Also skip the ENOENT return for "zero" finderinfo, as we do not skip over them in listxattr.
macOS: 10.9 compile fixes
macOS: go to rc2
macOS: kstat string handling should copyin.
cstyle baby
macOS: Initialise ALL quota types
projectid, userobj, groupobj and projectobj quotas were missed.
macOS: error check sysctl for older macOS
Wooo cstyle, \o/
Make arc sysctl tunables work (#27)
* use an IOMemAligned for a PAGE_SIZE allocation
* we should call arc_kstat_update_osx()
Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do anything because arc_kstat_update_osx() was removed at the same time the (obsoleted by upstream) arc_kstat_update() was removed from zfs_kstat_osx.c. Put it back.
* when we sysctl arc tunables, call arc_tuning_update()
* rely on upstream's sanity checking
Simplification which also avoids spurious CMN_WARN messages caused by setting the arcstat variables here, when upstream's arc_tuning_update() checks that they differ from the tunable variables.
* add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent
Both are in upstream's arc_tuning_update(). zfs_arc_sys_free controls the amount of memory that ARC will leave free, which is roughly what lundman wants for putting some sort of cap on memory use.
* cstyle
macOS: set UIO direction, to receive xattr from XNU
macOS: ensure uio is zeroed in case XNU uio is NULL.
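The osif_malloc alignment rule from the "enclosing POW2" commit fits in a small pure function: page alignment below one page, the enclosing power of two up to 2^32, and page alignment again beyond that. The sketch below makes some assumptions. `osif_alignment()` and `enclosing_pow2()` are hypothetical helpers, the page size is passed in rather than queried from the kernel, and the code only computes the alignment to request; it does not call IOMallocAligned().

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= x, for 1 <= x <= 2^32. */
static uint64_t
enclosing_pow2(uint64_t x)
{
	uint64_t p = 1;

	while (p < x)
		p <<= 1;
	return (p);
}

/*
 * Alignment to request for an allocation of `size` bytes: PAGESIZE
 * below one page, the enclosing power of two up to 2^32, and
 * PAGESIZE again for anything larger (almost certainly a bug).
 */
static uint64_t
osif_alignment(uint64_t size, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	if (size < page_size || size > (UINT64_C(1) << 32))
		return (page_size);
	return (enclosing_pow2(size));
}
```

For example, with a 4096-byte page, a 5000-byte request asks for 8192-byte alignment, while a 100-byte request asks for 4096.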
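The EAGAIN handling described in the zpool import fixes fits in a small wrapper: retry only on EAGAIN, only a few times, after a short delay, and treat any other errno as final for that vdev. The sketch below assumes a hypothetical `probe_fn` callback standing in for the lio_listio()-based label read. The retry count and delay are caller-supplied parameters, not values taken from the commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical label probe: returns 0 on success or an errno value. */
typedef int (*probe_fn)(void *arg);

/*
 * Retry a probe only on EAGAIN, at most max_retries extra times,
 * sleeping delay_ns between attempts. Any other error is final.
 */
static int
probe_with_retry(probe_fn probe, void *arg, int max_retries,
    long delay_ns)
{
	struct timespec ts = { 0, delay_ns };
	int err;

	assert(probe != NULL && max_retries >= 0);
	assert(delay_ns >= 0 && delay_ns < 1000000000L);
	for (int attempt = 0; ; attempt++) {
		err = probe(arg);
		if (err != EAGAIN || attempt == max_retries)
			return (err);
		nanosleep(&ts, NULL);
	}
}

/* Fake probe that fails with EAGAIN a given number of times. */
static int
flaky_probe(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (EAGAIN);
	}
	return (0);
}
```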
Fix zfs_vnop_getxattr (openzfs#28)

"xattr -l <file>" would return inconsistent garbage, especially from non-com.apple.FinderInfo xattrs. The UIO_WRITE was a 0 (UIO_READ) in the previous commit, change it. Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c, for cheap extra safety.

launch `zpool import` through launchd in the startup script (#26)

Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com>

cstyle
macOS: correct dataset_kstat_ logic and kstat leak.

dataset_kstat_create() will allocate a string and set it before calling kstat_create() - so we can not set strings to NULL. Likewise, we can not bulk free strings on unload, we have to rely on the caller of kstat to do so. (Which is proper). Add calls to dataset_kstat for datasets and zvol.

kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM
kstat.zfs/BOOM.dataset.objset-0x36.writes: 0
kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0
kstat.zfs/BOOM.dataset.objset-0x36.reads: 11
kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810
kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0
kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0

macOS: remove no previous prototype for function
macOS: correct openat wrapper
build fixes re TargetConditionals.h (openzfs#30)

AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include.

Memory fixes on macOS_pure (openzfs#31)

* Improve memory handling on macOS
* remove obsolete/unused zfs_file_data/zfs_metadata caching
* In the new code base, we use upstream's zio.c without modification, and so the special zio caching code became entirely vestigial, and likely counterproductive.
* and make busy ABD better behaved on busy macOS box

Post-ABD we no longer gained much benefit in the old code base from the complicated special handling for the caches created in zio.c. As there's only really one size of ABD allocation, we do not need a qcache layer as in 1.9.
Instead use an arena with VMC_NO_QCACHE set to ask for 256k chunks.

* don't reap extra caches in arc_kmem_reap_now()

KMF_LITE in DEBUG build is OK

* build fixes re TargetConditionals.h

AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include.

Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33)

* other minor changes in vdev_disk

Thread and taskq fixing (openzfs#32)

Highlights:
* thread names for spindump
* some taskq_d is safe and useful
* reduce thread priorities
* use throughput & latency QOS
* TIMESHARE scheduling
* passivate some IO

* Pull in relevant changes from old taskq_fixing branch

1.9 experimentation pulled into 2.x

* add throttle_set_thread_io_policy to zfs.exports
* selectively re-enable TASKQ_DYNAMIC

also drop wr_iss zio taskqs even further in priority (cf freebsd)

* reduce zvol taskq priority
* make system_taskq dynamic
* experimentally allow three more taskq_d
* lower thread priorities overall

on an M1 with no zfs whatsoever, the highest priority threads are in the mid 90s, with most kernel threads at priority 81 (basepri). with so many maxclsyspri threads in zfs, we would starve out important things like vm_pageout_scan (pri 91), sched_maintenance_thread (pri 95), and numerous others. moreover, ifnet_start_{interfaces} are all priority 82. we should drop minclsyspri below 81, have defclsyspri at no more than 81, and make sure we have few threads above 89.
* some tidying up of lowering of priority

Thread and taskq fixing

* fix old code pulled into spa.c, and further lower priorities
* Thread and taskq fixing

drop xnu priorities by one
update a comment block
set USER_INITIATED throughput QOS on TIMESHARE taskq threads
don't boost taskq threads accidentally
don't let taskq threads be pri==81
don't let o3x threads have importance > 0
apply xnu thread policies to taskq_d threads too

assuming this works, it calls out for DRY refactoring with the other two flavours, that operate on current_thread().

simplify in spa.c
make practically all the taskqs TIMESHARE
Revert "apply xnu thread policies to taskq_d threads too"

Panic in VM

This reverts commit 39f93be.

Revert "Revert "apply xnu thread policies to taskq_d threads too""

I see what happened now.

This reverts commit 75619f0.

adjust thread not the magic number
refactor setting thread qos
make DRY refactor rebuild
this includes userland TASKQ_REALLY_DYNAMIC fixes
fix typo
set thread names for spindump visibility
cstyle
Upstream: Add --enable-macos-impure to autoconf

Controls -DMACOS_IMPURE

Signed-off-by: Jorgen lundman <lundman@lundman.net>

macOS: Add --enable-macos-impure switch to missing calls.
Call the wrapped spl_throttle_set_thread_io_policy
Add spl_throttle_set_thread_io_policy to headers
macOS: vdev_file should use file_taskq

Also cleanup spl-taskq to have taskq_wait_outstanding() in preparation for one day implementing it.

Change alloc to zalloc in zfs_ctldir.c
Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34)
macOS: change both alloc to zalloc
macOS: mutex_tryenter can be used while holding

zstd uses mutex_tryenter() to check if it already is holding the mutex.
Can't find any implementations that object to it, so changing our spl-mutex.c

Tag zfs-2.0.0rc4

macOS: return error from uiomove instead of panic
macOS: Skip known /dev entry which hangs
macOS: Give better error msg when features are needed for crypto

Using a 1.9.4 crypto dataset now requires userobj and projectquota. Alert the user to activate said features to mount the crypto dataset. There is no going back to 1.9.4 after the features are enabled.

macOS: Revert to pread() over AIO due to platform issues.

We see waves of EAGAIN errors from lio_listio() on BigSur (but not Catalina) which could stem from recent changes to AIO in XNU. For now, we will go with the classic read label.

Re-introduce a purified memory pressure handling mechanism (openzfs#35)

* Introduce pure pressure-detecting-and-reacting system
* "pure" -- no zfs.exports requirement
* plumb in mach_vm_pressure_level_monitor() and mach_vm_pressure_monitor() calls to maintain a reduced set of inputs into the previous signalling into the (increasingly shared with upstream) arc growth or shrinking policy
* introduce mach_vm_pressure kstats which can be compared with userland-only sysctls:

kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0
kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0
kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0
vm.page_free_wanted: 0
vm.page_free_count: 25,545
vm.page_speculative_count: 148,572

* and a start on tidying and obsolete code elimination
* make arc_default_max() much bigger

Optional: can be squashed into main pressure commit, or omitted.

Users can use zsysctl.conf or manual setting of kstat.zfs.darwin.tunable.zfs_arc_max to override whichever default is chosen (this one, or the one it replaces).

Allmem is already deflated during initialization, so this patch raises the un-sysctled ARC maximum from 1/6 to 1/2 of physmem.
* handle (vmem) abd_cache fragmentation after arc shrink

When arc shrinks due to a significant pressure event, the abd_chunk kmem cache will free slabs back to the vmem abd_cache, and this memory can be several gigabytes.

Unfortunately multi-threaded concurrent kmem_cache allocation in the first place, and a priori unpredictable arc object lifetimes, mean that abds held by arc objects may be scattered across multiple slabs, with different objects interleaved within slabs. Thus after a moderate free, the vmem cache can be fragmented and this is seen by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much smaller than (sysctl) kstat.vmem.vmem.abd_cache.mem_import, the latter of which may even be stuck at approximately the same value as before the arc free and kmem_cache reap.

When there is a large difference between import and inuse, we set arc_no_grow in hopes that ongoing arc activity will defragment organically.

This works better with more arc read/write activity after the free, and almost not at all if after the free there is almost no activity.

We also add BESTFIT policy to abd_arena experimentally

BESTFIT: look harder to place an abd chunk in a slab rather than place in the first slot that is definitely large enough

which breaks the vmem constant-time allocation guarantee, although that is less important for this particular vmem arena because of the strong modality of allocations from the abd_chunk cache (its only client).

Additionally reduce the abd_cache arena import size to 128k from 256k; the increase in allocation and free traffic between it and the heap is small compared to the gain under this new anti-fragmentation scheme.

* some additional tidying in arc_os.c

Tag macos-2.0.0-rc5

abd_cache fragmentation mitigation (openzfs#36)

* printf->dprintf HFS_GET_BOOT_INFO

periodically there will be huge numbers of these printfs, and they are not really useful except when debugging vnops.
* Mitigate fragmentation in vmem.abd_cache

In macOS_pure the abd_chunk kmem cache is parented to the abd_cache vmem arena to avoid sometimes-heavy ARC allocation and free stress on the main kmem cache, and because abd_chunk has such a strongly modal page-sized allocation size. Additionally, abd_chunk allocations and frees come in gangs, often with high multi-thread concurrency. It is that latter property which is the primary source of arena fragmentation, and it will affect any vmem arena directly underneath the abd_chunk kmem cache.

Because we have a vmem parent solely for abd_chunk, we can monitor that parent for various patterns and react to them.

This patch monitors the difference between the variables exported as kstat.vmem.vmem.abd_cache.mem_inuse and kstat.vmem.vmem.abd_cache.mem_import, watching for a large gap between the two, which can arise after an ARC shrink returns many slabs from the arc_chunk kmem cache to the abd_cache arena, as vmem segments still contain slabs which hold still-alive abds.

When there is a significant gap, we turn on arc_no_grow and hope that organic ARC activity reduces the gap. If after several minutes this is not the case, a small arc_reduce_target_size() is applied.

In comparison with previous behaviour, ARC equilibrium sizes will tend slightly -- but not enormously -- lower because the arc target size reduction is made fairly frequently. However, this is offset by the benefit of less *long-term* abd_cache fragmentation, and less complete collapses of ARC in the face of system memory pressure (since less is "stuck" in vmem). ARC consequently will stay at its equilibrium more often than near its minimum. This is demonstrated by a generally lower overall total held memory (kstat.spl.misc.spl_misc.os_mem_alloc) except on systems with essentially no memory pressure, or systems which have been sysctl-tuned for different behaviour.
macOS: Additional 10.9 fixes that missed the boat
Tidying nvram zfs_boot=pool (openzfs#37)

If zfs_boot is set we run a long-lived zfs_boot_import_thread, which can stay running until the kernel module is running _fini() functions at unload or shutdown.

This patch dispatches it on a zfs_boot() taskq, to avoid causing a hang at the taskq_wait_outstanding(system_taskq, 0) in zvol.c's zvol_create_minors_recursive(), which would prevent pool imports finishing if the pool contained zvols. (Symptoms: "zpool import" does not exit for any pool, system does not see any zvols).

This exposed a long-term race condition in our zfs_boot.cpp: the notifier can cause the mutex_enter(&pools->lock) in zfs_boot_probe_media to be reached before the mutex_enter() after the notifier was created. The use of the system_taskq was masking that, by quietly imposing a serialization choke.

Moving the mutex and cv initialization earlier -- in particular before the notifier is created -- eliminates the race.

Further tidying in zfs_boot.cpp, including some cstyling, switching to _Atomic instead of volatile. Volatile is for effectively random reads; _Atomic is for when we want many readers to have a consistent view after the variable is written.

Finally, we need TargetConditionals.h in front of AvailabilityMacros.h in order to build.

Add includes to build on Big Sur with macports-clang-11 (openzfs#38)

* TargetConditionals.h before all AvailabilityMacros.h
* add several TargetConditionals.h and AvailabilityMacros.h

Satisfy picky macports-clang-11 toolchain on Big Sur.

macOS: clean up large build, indicate errors. Fix debug
macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit
macOS: rename net.lundman. -> org.openzfsonosx.
macOS: Tag va_mode for upstream ASSERTS

XNU sets va_type = VDIR, but does not bother with va_mode. However ZFS checks to confirm S_ISDIR is set in mkdir.

macOS: Fix zfs_ioc_osx_proxy_dataset for datasets

It was defined as a _pool() ioctl.
While we are here changing things, change it into a new-style ioctl instead. This should fix non-root datasets mounting as a proxy (devdisk=on).

cstyle
macOS: setxattr debug prints left in
macOS: don't create DYNAMIC with _ent taskq
macOS: Also uninstall new /usr/local/zfs before install
macos-2.0.0-rc6
macOS: strcmp deprecated after macOS 11
macOS: pkg needs to notarize at the end
macOS: strdup strings in getmntent

mirrored on FreeBSD.

macOS: remove debug print
macOS: unload zfs, not openzfs
macOS: actually include the volume icon file as well

also update to PR

macOS: prefer disk over rdisk
macOS: devdisk=off mimic=on needs to check for dataset

Datasets with devdisks=on will be in ioreg; with it off and mimic=on, it needs to handle:
BOOM/fs1 /Volumes/BOOM/fs1
by testing if "BOOM/fs1" is a valid dataset.

fixifx
macOS: doubled up "int rc" losing returncode

Causing misleading messages

macOS: zfsctl was sending from IDs
macOS: let zfs mount as user succeed

If the "mkdir" can succeed (home dir etc, as opposed to /Volumes) then let the mount be able to happen.

macOS: Attempt to implement taskq_dispatch_delay()

frequently used with taskq_cancel_id() to stop taskq from calling `func()` before the timeout expires.

Currently implemented by the taskq sleeping in cv_timedwait() until the timeout expires, or it is signalled by taskq_cancel_id().

Seems a little undesirable; could we build an ordered list of delayed taskqs, and only place them to run once the timeout has expired, leaving the taskq available to work instead of delaying?

macOS: Separate unmount and proxy_remove

When proxy_remove is called at the tail end of unmount, we get the alert about "ejecting before disconnecting device". To mirror the proxy create, we make it a separate ioctl, and issue it after unmount completes.

macOS: explicitly call setsize with O_TRUNC

It appears O_TRUNC does nothing, like the goggles.
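The "explicitly call setsize with O_TRUNC" fix has a direct userland analogy: do not rely on the open flag alone, and truncate explicitly when O_TRUNC was requested. This is a hedged sketch, not the kernel code; the commit calls setsize inside the kext, whereas this uses POSIX open() and ftruncate(), and `open_with_explicit_trunc()` is a hypothetical name.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a file and, if O_TRUNC was requested on a
 * writable descriptor, truncate it explicitly instead of trusting the
 * open flag alone. Returns the fd, or -1 on failure.
 */
static int
open_with_explicit_trunc(const char *path, int flags, mode_t mode)
{
	int fd = open(path, flags, mode);

	if (fd < 0)
		return (-1);
	/* ftruncate() needs a writable descriptor; O_RDONLY is 0. */
	if ((flags & O_TRUNC) && (flags & (O_WRONLY | O_RDWR)) &&
	    ftruncate(fd, 0) != 0) {
		(void) close(fd);
		return (-1);
	}
	return (fd);
}
```

The explicit ftruncate() is redundant on a platform where O_TRUNC already works, so it is a cheap belt-and-braces step rather than a behaviour change.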
macOS: Add O_APPEND to zfs_file_t

It is currently not used, but since it was written for a test case, we might as well keep it.

macOS: Pass fd_offset between kernel and userland.
macOS: Missing return in non-void function
macOS: finally fix taskq_dispatch_delay()

you find a bug, you own the bug.

macOS: add missing kstats
macOS: restore the default system_delay_taskq
macOS: don't call taskq_wait in taskq_cancel
macOS: fix taskq_cancel_id()

We need to make sure the taskq has finished before returning in taskq_cancel_id(), so that the taskq doesn't get a chance to run after.

macOS: correct 'hz' to 100.

sysctl kern.clockrate: 100

sleeping for 1 second.
bolt: 681571
sleep() 35
bolt: 681672: diff 101

'hz' is definitely 100.

macOS: implement taskq_delay_dispatch()

Implement delayed taskq by adding them to a list, sorted by wake-up time, and a dispatcher thread which sleeps until the soonest taskq is due.

taskq_cancel_id() will remove task from list if present.

macOS: ensure to use 1024 version of struct statfs

and avoid coredump if passed zhp == NULL.

macOS: fix memory leak in xattr_list
macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat

getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE

This is automatically set by default in userland if the deployment target is > 10.5

macOS: Fix watchdog unload and delay()
macOS: improve handling of invariant disks

Don't prepend /dev to all paths not starting with /dev as InvariantDisks places its symlinks in /var/run/disk/by-* not /dev/disk/by-*.

Also, merge in some tweaks from Linux's zpool_vdev_os.c such as only using O_EXCL with spares.

macOS: remove zfs_unmount_006_pos from large.

Results in KILLED.

Tag macos-2.0.0rc7

macOS: If we don't set SOURCES it makes up zfs.c from nowhere
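The delayed-dispatch design in "implement taskq_delay_dispatch()" (a list sorted by wake-up time, drained by a dispatcher thread, with taskq_cancel_id() unlinking pending entries) can be sketched single-threaded. This is an illustrative model under stated assumptions: the names, the `SKETCH_HZ` constant taken from the commit's hz = 100 measurement, and the omission of locking and of the dispatcher's cv_timedwait() sleep are all simplifications, not the SPL's actual API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define	SKETCH_HZ	100	/* the commit measured hz == 100 on macOS */

typedef struct delayed_ent {
	struct delayed_ent *next;
	uint64_t id;		/* hypothetical task id, never 0 */
	int64_t wake_at;	/* absolute wake-up time in ticks */
} delayed_ent_t;

typedef struct {
	delayed_ent_t *head;	/* sorted by wake_at, soonest first */
	uint64_t next_id;
} delay_list_t;

static int64_t
ticks_from_ms(int64_t ms)
{
	return (ms * SKETCH_HZ / 1000);
}

/* Insert in wake-up order; equal times keep FIFO order. 0 on ENOMEM. */
static uint64_t
delay_list_add(delay_list_t *dl, int64_t wake_at)
{
	delayed_ent_t *e = calloc(1, sizeof (*e));
	delayed_ent_t **pp = &dl->head;

	if (e == NULL)
		return (0);
	e->id = ++dl->next_id;
	e->wake_at = wake_at;
	while (*pp != NULL && (*pp)->wake_at <= wake_at)
		pp = &(*pp)->next;
	e->next = *pp;
	*pp = e;
	return (e->id);
}

/* Models taskq_cancel_id(): unlink a still-pending entry; 1 if found. */
static int
delay_list_cancel(delay_list_t *dl, uint64_t id)
{
	for (delayed_ent_t **pp = &dl->head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			delayed_ent_t *e = *pp;
			*pp = e->next;
			free(e);
			return (1);
		}
	}
	return (0);
}

/*
 * What the dispatcher thread would do after waking: pop the soonest
 * entry if it is due. Returns its id, or 0 if nothing is due yet.
 */
static uint64_t
delay_list_pop_due(delay_list_t *dl, int64_t now)
{
	delayed_ent_t *e = dl->head;
	uint64_t id;

	if (e == NULL || e->wake_at > now)
		return (0);
	dl->head = e->next;
	id = e->id;
	free(e);
	return (id);
}
```

Because the list is sorted, the dispatcher only ever inspects the head to decide how long to sleep, and the worker taskq stays free for real work instead of blocking in cv_timedwait() per delayed task. With hz = 100, a 1.5 s delay is 150 ticks.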
Add all files required for the macOS port.

Add new cmd/os/ for tools which are only expected to be used on macOS.

This has support for all macOS versions up to Catalina. (Not BigSur).

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

macOS: big uio change over.

Make uio be internal (ZFS) struct, possibly referring to supplied (XNU) uio from kernel. This means zio_crypto.c can now be identical to upstream.

Update for draid, and other changes
macOS: Use SET_ERROR with uiomove. [squash]
macOS: they went and added vdev_draid
macOS: compile fixes from rebase
macOS: oh cstyle, how you vex me so
macOS: They added new methods - squash
macOS: arc_register_hotplug for userland too
Upstream: avoid warning

zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

macOS: Update zfs_acl.c to latest

This includes commits like: 65c7cc4 1b376d1 cfdc432 716b53d a741b38 485b50b

macOS: struct vdev changes
macOS: cstyle, how you vex me [squash]
Upstream: booo Werror booo
Upstream: squash baby

Not defined gives warnings.

Upstream: Include all Makefiles

Signed-off-by: Jorgen Lundman <lundman@lundman.net>

double draid!
macOS: large commit
macOS: Use APPLE approved kmem_alloc()
macOS: large commit
WIP: remove reliance on zfs.exports

The memory-pressure has been nerfed, and will not run well until we can find other solutions.

The kext symbol lookup we can live without, used only for debug and panic. Use lldb to lookup symbols.

leaner! leanerr!
remove zfs.export dependency cont.
export reduction cont. cont.
Corrective tweaks for building
Correct vnode_iocount()
Cleanup pipe wrap code, use pthreads, handle multiple streams
latest pipe send with threads

sort of works, but bad timing can be deadlock

macOS: work out corner case starvation issue in cv_wait_sig()
Fix -C in zfs send/recv
cv_wait_sig squash
Also wrap zfs send resume
Implement VOP_LOOKUP for snowflake Finder
Don't change date when setting size.

Seems to be a weird requirement with Linux, so model after the FreeBSD version

macOS: correct xattr checks for uio
Fix a noisy source of misleading-indentation warnings
Fix "make install" ln -s failures
Fix a noisy source of misleading-indentation warnings
Fix "make install" ln -s failures
fix ASSERT: don't try to peer into opaque vp structure
Import non-panicking ASSERT from old spl/include/sys/debug.h

Guard with MACOS_ASSERT_SHOULD_PANIC which will do what Linux and FreeBSD do: redefine ASSERTs as VERIFYs. The panic report line will say VERIFY obscuring the problem, and a system panic is harsher (and more dangerous) on MacOS than a zfs-module panic on Linux.

ASSERTions: declare assfail in debug.h
Build and link spl-debug.c
Eliminate spurious "off" variable, use position+offset range

Make sure we hold the correct range to avoid panic in dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug).

zvol_log_write the range we have written, not the future range
silence very noisy and dubious ASSERT
macOS: M1 fixes for arm64.

sysctl needs to use OID2
Allocs need to be IOMalloc_aligned
Initial spl-vmem memory area needs to be aligned to 16KB
No cpu_number() for arm64.

macOS: change zvol locking, add zvol symlinks
macOS: Return error on UF_COMPRESSED

This means bsdtar will be rather noisy, but we prefer noise over corrupt files (all files would be 0-sized).
usr/bin/zprint: Failed to set file flags~
-rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint
usr/bin/zprint: Failed to set file flags
-rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint

Actually include zedlet for zvols
macOS: Fix Finder crash on quickview, SMB error codes

xattr=sa would return negative returncode, hangover from ZOL code. Only set size if passed a ptr. Convert negative error codes back to normal.

Add LIBTOOLFLAGS for macports toolchain

This will replace PR#23

macOS zpool import fixes

The new codebase uses a mixture of thread pools and lio_listio async io, and on macOS there are low aio limits, and when those are reached lio_listio() returns EAGAIN when probing several prospective leaf vdevs concurrently, looking for labels. We should not abandon probing a vdev in this case, and can usually recover by trying again after a short delay. (We continue to treat other errnos as unrecoverable for that vdev, and only try to recover from EAGAIN a few times).

Additionally, take logic from old o3x and don't probe a variety of devices commonly found in /dev/XXX as they either produce side-effects or are simply wasted effort.

Finally, add a trailing / that FreeBSD and Linux both have.

listxattr may not expose com.apple.system xattr=sa
We need to ask IOMallocAligned for the enclosing POW2

vmem_create() arenas want at least natural alignment for the spans they import, and will panic if they don't get it.

For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE. Otherwise align on the enclosing power of two for any osif_malloc allocation up to 2^32. Anything that asks osif_malloc() for more than that is almost certainly a bug, but we can try aligning on PAGESIZE anyway, rather than extend the enclosing-power-of-two device to handle 64-bit allocations.

Simplify the creation of bucket arenas, and adjust their quanta.
This results in handing back considerably more (and smaller) chunks of memory to osif_free if there is pressure, and reduces waits in xnu_alloc_throttled(), so is a performance win for a busy memory-constrained system. Finally, uncomment some valid code that might be used by future callers of vmem_xcreate(). use vmem_xalloc to match the vmem_xfree of initial dynamic alloc vmem_alloc() breaks the initial large vmem_add() allocation into smaller chunks in an effort to have a large number vmem segments in the arena. This arena does not benefit from that. Additionaly, in vmem_fini() we call vmem_xfree() to return the initial allocation because it is done after almost everything has been pulled down. Unfortunately vmem_xfree() returns the entire initial allocation as a single span. IOFree() checks a variable maintained by the IOMalloc* allocators which tracks the largest allocation made so far, and will panic when (as it almost always is the case) the initial large span is handed to it. This usually manifests as a panic or hang on kext unload, or a hang at reboot. Consequently, we will now use vmem_xalloc() for this initial allocation; vmem_xalloc() also lets us explicitly specify the natural alignement we want for it. zfs_rename SA_ADDTIME may grow SA Avoid: zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2 -> 674 panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n", 675 (u_longlong_t)db->db.db_object, db->db_level, 676 (u_longlong_t)db->db_blkid); zfs diff also needs to be wrapped. Replace call to pipe() with a couple of open(mkfifo) instead. Upstream: cstyle zfs_fm.c macOS: cstyle baby IOMallocAligned() should call IOFreeAligned() macOS: zpool_disable_volumes v1 When exporting, also kick mounted zvols offline macOS: zpool_disable_volumes v2 When exporting zvols, check IOReg for the BSDName, instead of using readlink on the ZVOL symlinks. Also check if apfs has made any synthesized disks, and ask them to unmount first. 
./scripts/cmd-macos.sh zpool export BOOM Exporting 'BOOM/volume' ... asking apfs to eject 'disk5' Unmount of all volumes on disk5 was successful ... asking apfs to eject 'disk5s1' Unmount of all volumes on disk5 was successful ... asking ZVOL to export 'disk4' Unmount of all volumes on disk4 was successful zpool_disable_volume: exit macOS: Add libdiskmgt and call inuse checks macOS: compile fixes from rebase macOS: oh cstyle, how you vex me so macOS: They added new methods - squash macOS: arc_register_hotplug for userland too macOS: minor tweaks for libdiskmgt macOS: getxattr size==0 is to lookup size Also skip the ENOENT return for "zero" finderinfo, as we do not skip over them in listxattr. macOS: 10.9 compile fixes macOS: go to rc2 macOS: kstat string handling should copyin. cstyle baby macOS: Initialise ALL quota types projectid, userobj, groupobj and projectobj, quotas were missed. macOS: error check sysctl for older macOS Wooo cstyle, \o/ Make arc sysctl tunables work (#27) * use an IOMemAligned for a PAGE_SIZE allocation * we should call arc_kstat_update_osx() Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do anything becasue arc_kstat_update_osx() was removed at the same time the (obsoleted by upstream) arc_kstat_update() was removed from zfs_kstat_osx.c. Put it back. * when we sysctl arc tunables, call arc_tuning_update() * rely on upstream's sanity checking Simplification which also avoids spurious CMN_WARN messages caused by setting the arcstat variables here, when upstream's arc_tuning_update() checks that they differ from the tunable variables. * add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent both are in upstream's arc_tuning_update() zfs_arc_sys_free controls the amount of memory that ARC will leave free, which is roughly what lundman wants for putting some sort of cap on memory use. * cstyle macOS: set UIO direction, to receive xattr from XNU macOS: ensure uio is zeroed in case XNU uio is NULL. 
Fix zfs_vnop_getxattr (openzfs#28) "xattr -l <file>" would return inconsistent garbage, especially from non-com.apple.FinderInfo xattrs. The UIO_WRITE was a 0 (UIO_READ) in the previous commit, change it. Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c, for cheap extra safety. launch `zpool import` through launchd in the startup script (#26) Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com> cstyle macOS: correct dataset_kstat_ logic and kstat leak. dataset_kstat_create() will allocate a string and set it before calling kstat_create() - so we can not set strings to NULL. Likewise, we can not bulk free strings on unload, we have to rely on the caller of kstat to do so. (Which is proper). Add calls to dataset_kstat for datasets and zvol. kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM kstat.zfs/BOOM.dataset.objset-0x36.writes: 0 kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0 kstat.zfs/BOOM.dataset.objset-0x36.reads: 11 kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810 kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0 kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0 macOS: remove no previous prototype for function macOS: correct openat wrapper build fixes re TargetConditionals.h (openzfs#30) AvailabilityMacros.h needs TargetConditionals.h defintions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include. Memory fixes on macOS_pure (openzfs#31) * Improve memory handling on macOS * remove obsolete/unused zfs_file_data/zfs_metadata caching * In the new code base, we use upstream's zio.c without modification, and so the special zio caching code became entirely vestigial, and likely counterproductive. * and make busy ABD better behaved on busy macOS box Post-ABD we no longer gained much benefit in the old code base from the complicated special handling for the caches created in zio.c. As there's only really one size of ABD allocation, we do not need a qcache layer as in 1.9. 
Instead use an arena with VMC_NO_QCACHE set to ask for for 256k chunks. * don't reap extra caches in arc_kmem_reap_now() KMF_LITE in DEBUG build is OK * build fixes re TargetConditionals.h AvailabilityMacros.h needs TargetConditionals.h defintions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include. Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33) * other minor changes in vdev_disk Thread and taskq fixing (openzfs#32) Highlights: * thread names for spindump * some taskq_d is safe and useful * reduce thread priorities * use througput & latency QOS * TIMESHARE scheduling * passivate some IO * Pull in relevant changes from old taskq_fixing branch 1.9 experimentation pulled into 2.x * add throttle_set_thread_io_policy to zfs.exports * selectively re-enable TASKQ_DYNAMIC also drop wr_iss zio taskqs even further in priority (cf freebsd) * reduce zvol taskq priority * make system_taskq dynamic * experimentally allow three more taskq_d * lower thread prorities overall on an M1 with no zfs whatsoever, the highest priority threads are in the mid 90s, with most kernel threads at priority 81 (basepri). with so many maxclsyspri threads in zfs, we owuld starve out important things like vm_pageout_scan (pri 91), sched_maintenance_thread (pri 95), and numerous others. moreover, ifnet_start_{interfaces} are all priority 82. we should drop minclsyspri below 81, have defclsyspri at no more than 81, and make sure we have few threads above 89. 
* some tidying up of lowering of priority Thread and taskq fixing * fix old code pulled into spa.c, and further lower priorities * Thread and taskq fixing drop xnu priorities by one update a comment block set USER_INITIATED throughput QOS on TIMESHARE taskq threads don't boost taskq threads accidentally don't let taskq threads be pri==81 don't let o3x threads have importance > 0 apply xnu thread policies to taskq_d threads too assuming this works, it calls out for DRY refactoring with the other two flavours, that operate on current_thread(). simplify in spa.c make practically all the taskqs TIMESHARE Revert "apply xnu thread policies to taskq_d threads too" Panic in VM This reverts commit 39f93be. Revert "Revert "apply xnu thread policies to taskq_d threads too"" I see what happened now. This reverts commit 75619f0. adjust thread not the magic number refactor setting thread qos make DRY refactor rebuild this includes userland TASKQ_REALLY_DYNAMIC fixes fix typo set thread names for spindump visibility cstyle Upstream: Add --enable-macos-impure to autoconf Controls -DMACOS_IMPURE Signed-off-by: Jorgen lundman <lundman@lundman.net> macOS: Add --enable-macos-impure switch to missing calls. Call the wrapped spl_throttle_set_thread_io_policy Add spl_throttle_set_thread_io_policy to headers macOS: vdev_file should use file_taskq Also cleanup spl-taskq to have taskq_wait_outstanding() in preparation for one day implementing it. Change alloc to zalloc in zfs_ctldir.c Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34) macOS: change both alloc to zalloc macOS: mutex_tryenter can be used while holding zstd uses mutex_tryenter() to check if it already is holding the mutex. 
Can't find any implementations that object to it, so changing our spl-mutex.c Tag zfs-2.0.0rc4 macOS: return error from uiomove instead of panic macOS: Skip known /dev entry which hangs macOS: Give better error msg when features are needed for crypto Using 1.9.4 crypto dataset now require userobj and projectquota. Alert the user to activate said features to mount crypt dataset. There is no going back to 1.9.4 after features are enabled. macOS: Revert to pread() over AIO due to platform issues. We see waves of EAGAIN errors from lio_listio() on BigSur (but not Catalina) which could stem from recent changes to AIO in XNU. For now, we will go with the classic read label. Re-introduce a purified memory pressure handling mechanism (openzfs#35) * Introduce pure pressure-detecting-and-reacting system * "pure" -- no zfs.exports requirement * plumb in mach_vm_pressure_level_monitor() and mach_vm_pressure_monitor() calls to maintain reduced set of inputs into previous signalling into (increasingly shared with upstream) arc growth or shrinking policy * introduce mach_vm_pressure kstats which can be compared with userland-only sysctls: kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0 kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0 kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0 vm.page_free_wanted: 0 vm.page_free_count: 25,545 vm.page_speculative_count: 148,572 * and a start on tidying and obsolete code elimination * make arc_default_max() much bigger Optional: can be squashed into main pressure commit, or omitted. Users can use zsysctl.conf or manual setting of kstat.zfs.darwin.tunable.zfs_arc_max to override whichever default is chosen (this one, or the one it replaces). Allmem is already deflated during initialization, so this patch raises the un-sysctled ARC maximum from 1/6 to 1/2 of physmem. 
* handle (vmem) abd_cache fragmentation after arc shrink When arc shrinks due to a significant pressure event, the abd_chunk kmem cache will free slabs back to the vmem abd_cache, and this memory can be several gigabytes. Unfortunately, multi-threaded concurrent kmem_cache allocation in the first place, and a priori unpredictable arc object lifetimes, mean that abds held by arc objects may be scattered across multiple slabs, with different objects interleaved within slabs. Thus after a moderate free, the vmem cache can be fragmented, and this is seen by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much smaller than (sysctl) kstat.vmem.vmem.abd_cache.mem_import, the latter of which may even be stuck at approximately the same value as before the arc free and kmem_cache reap. When there is a large difference between import and inuse, we set arc_no_grow in hopes that ongoing arc activity will defragment organically. This works better with more arc read/write activity after the free, and almost not at all if after the free there is almost no activity. We also experimentally add a BESTFIT policy to abd_arena. BESTFIT looks harder for a place for an abd chunk in a slab, rather than placing it in the first slot that is definitely large enough; this breaks the vmem constant-time allocation guarantee, although that is less important for this particular vmem arena because of the strong modality of allocations from the abd_chunk cache (its only client). Additionally, reduce the abd_cache arena import size to 128k from 256k; the increase in allocation and free traffic between it and the heap is small compared to the gain under this new anti-fragmentation scheme. * some additional tidying in arc_os.c Tag macos-2.0.0-rc5 abd_cache fragmentation mitigation (openzfs#36) * printf->dprintf HFS_GET_BOOT_INFO: periodically there will be huge numbers of these printfs, and they are not really useful except when debugging vnops.
* Mitigate fragmentation in vmem.abd_cache In macOS_pure the abd_chunk kmem cache is parented to the abd_cache vmem arena to avoid sometimes-heavy ARC allocation and free stress on the main kmem cache, and because abd_chunk has such a strongly modal page-sized allocation size. Additionally, abd_chunk allocations and frees come in gangs, often with high multi-thread concurrency. It is that latter property which is the primary source of arena fragmentation, and it will affect any vmem arena directly underneath the abd_chunk kmem cache. Because we have a vmem parent solely for abd_chunk, we can monitor that parent for various patterns and react to them. This patch monitors the difference between the variables exported as kstat.vmem.vmem.abd_cache.mem_inuse and kstat.vmem.vmem.abd_cache.mem_import, watching for a large gap between the two, which can arise after an ARC shrink returns many slabs from the abd_chunk kmem cache to the abd_cache arena, as vmem segments still contain slabs which hold still-alive abds. When there is a significant gap, we turn on arc_no_grow and hope that organic ARC activity reduces the gap. If after several minutes this is not the case, a small arc_reduce_target_size() is applied. In comparison with previous behaviour, ARC equilibrium sizes will tend slightly -- but not enormously -- lower because the arc target size reduction is made fairly frequently. However, this is offset by the benefit of less *long-term* abd_cache fragmentation, and less complete collapses of ARC in the face of system memory pressure (since less is "stuck" in vmem). ARC consequently will stay at its equilibrium more often than near its minimum. This is demonstrated by a generally lower overall total held memory (kstat.spl.misc.spl_misc.os_mem_alloc) except on systems with essentially no memory pressure, or systems which have been sysctl-tuned for different behaviour.
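The two abd_cache mitigations above (the experimental BESTFIT policy and the mem_import/mem_inuse gap monitor) can be illustrated with a minimal C sketch. It is not the port's code. The free-segment array is a simplification of vmem's freelists. All function names, GAP_PERCENT, and REDUCE_AFTER_SECS (standing in for "several minutes") are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 1. First fit vs best fit over a simplified array of free-segment sizes. */

/* First fit: take the first free segment that is large enough. */
static int
seg_first_fit(const size_t *free_sizes, int nsegs, size_t size)
{
	assert(nsegs >= 0);
	for (int i = 0; i < nsegs; i++)
		if (free_sizes[i] >= size)
			return (i);
	return (-1);
}

/* Best fit: scan every segment (O(n)) and keep the smallest that fits. */
static int
seg_best_fit(const size_t *free_sizes, int nsegs, size_t size)
{
	int best = -1;

	assert(nsegs >= 0);
	for (int i = 0; i < nsegs; i++) {
		if (free_sizes[i] < size)
			continue;
		if (best == -1 || free_sizes[i] < free_sizes[best])
			best = i;
	}
	return (best);
}

/* 2. Gap monitor comparing mem_import with mem_inuse. */

#define	GAP_PERCENT		25		/* invented: "large" gap threshold */
#define	REDUCE_AFTER_SECS	(5 * 60)	/* invented: "several minutes" */

typedef enum {
	ABD_GAP_OK,		/* small gap: ARC may grow */
	ABD_GAP_NO_GROW,	/* large gap: set arc_no_grow and wait */
	ABD_GAP_REDUCE		/* gap persisted: small ARC target reduction */
} abd_gap_action_t;

typedef struct {
	bool		gap_seen;	/* a large gap is currently open */
	uint64_t	gap_since;	/* when it opened, or when we last reduced */
} abd_gap_state_t;

static abd_gap_action_t
abd_gap_check(abd_gap_state_t *st, uint64_t mem_import, uint64_t mem_inuse,
    uint64_t now_secs)
{
	uint64_t gap = mem_import > mem_inuse ? mem_import - mem_inuse : 0;

	assert(st != NULL);
	if (mem_import == 0 || gap * 100 < mem_import * GAP_PERCENT) {
		st->gap_seen = false;
		return (ABD_GAP_OK);
	}
	if (!st->gap_seen) {
		st->gap_seen = true;
		st->gap_since = now_secs;
	}
	if (now_secs - st->gap_since >= REDUCE_AFTER_SECS) {
		st->gap_since = now_secs;	/* give ARC time to react again */
		return (ABD_GAP_REDUCE);
	}
	return (ABD_GAP_NO_GROW);
}
```

With free segments {512, 128, 4, 256} and a 4-unit request, seg_first_fit() returns index 0 (carving up the 512 segment). seg_best_fit() returns index 2 (the exact 4-unit hole), at the cost of scanning every segment.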
macOS: Additional 10.9 fixes that missed the boat Tidying nvram zfs_boot=pool (openzfs#37) If zfs_boot is set we run a long-lived zfs_boot_import_thread, which can stay running until the kernel module is running _fini() functions at unload or shutdown. This patch dispatches it on a zfs_boot() taskq, to avoid causing a hang at the taskq_wait_outstanding(system_taskq, 0) in zvol.c's zvol_create_minors_recursive(), which would prevent pool imports finishing if the pool contained zvols. (Symptoms: "zpool import" does not exit for any pool, system does not see any zvols). This exposed a long-term race condition in our zfs_boot.cpp: the notifier can cause the mutex_enter(&pools->lock) in zfs_boot_probe_media to be reached before the mutex_enter() after the notifier was created. The use of the system_taskq was masking that, by quietly imposing a serialization choke. Moving the mutex and cv initialization earlier -- in particular before the notifier is created -- eliminates the race. Further tidying in zfs_boot.cpp, including some cstyling, switching to _Atomic instead of volatile. Volatile is for effectively random reads; _Atomic is for when we want many readers to have a consistent view after the variable is written. Finally, we need TargetConditionals.h in front of AvailabilityMacros.h in order to build. Add includes to build on Big Sur with macports-clang-11 (openzfs#38) * TargetConditionals.h before all AvailabilityMacros.h * add several TargetConditionals.h and AvailabilityMacros.h Satisfy picky macports-clang-11 toolchain on Big Sur. macOS: clean up large build, indicate errors. Fix debug macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit macOS: rename net.lundman. -> org.openzfsonosx. macOS: Tag va_mode for upstream ASSERTS XNU sets va_type = VDIR, but does not bother with va_mode. However ZFS checks to confirm S_ISDIR is set in mkdir. macOS: Fix zfs_ioc_osx_proxy_dataset for datasets It was defined as a _pool() ioctl.
While we are here changing things, change it into a new-style ioctl instead. This should fix non-root datasets mounting as a proxy (devdisk=on). cstyle macOS: setxattr debug prints left in macOS: don't create DYNAMIC with _ent taskq macOS: Also uninstall new /usr/local/zfs before install macos-2.0.0-rc6 macOS: strcmp deprecated after macOS 11 macOS: pkg needs to notarize at the end macOS: strdup strings in getmntent mirrored on FreeBSD. macOS: remove debug print macOS: unload zfs, not openzfs macOS: actually include the volume icon file as well also update to PR macOS: prefer disk over rdisk macOS: devdisk=off mimic=on needs to check for dataset Datasets with devdisk=on will be in ioreg; with it off and mimic=on, it needs to handle: BOOM/fs1 /Volumes/BOOM/fs1 by testing if "BOOM/fs1" is a valid dataset. fixifx macOS: doubled up "int rc" losing returncode Causing misleading messages macOS: zfsctl was sending from IDs macOS: let zfs mount as user succeed If the "mkdir" can succeed (home dir etc, as opposed to /Volumes) then let the mount be able to happen. macOS: Attempt to implement taskq_dispatch_delay() taskq_dispatch_delay() is frequently used with taskq_cancel_id() to stop the taskq from calling `func()` before the timeout expires. Currently implemented by the taskq sleeping in cv_timedwait() until the timeout expires, or until it is signalled by taskq_cancel_id(). Seems a little undesirable; could we build an ordered list of delayed taskqs, and only place them to run once the timeout has expired, leaving the taskq available to work instead of delaying? macOS: Separate unmount and proxy_remove When proxy_remove is called at the tail end of unmount, we get the alert about "ejecting before disconnecting device". To mirror the proxy create, we make it a separate ioctl, and issue it after unmount completes. macOS: explicitly call setsize with O_TRUNC It appears O_TRUNC does nothing, like the goggles.
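Below is a minimal userland sketch of the zfs_boot.cpp ordering fix described in the nvram zfs_boot=pool commit above. It uses POSIX threads in place of the kernel mutex/cv and the IOKit notifier. boot_pools_t, media_notifier(), and boot_wait_for_media() are hypothetical names. The sketch initializes the lock and condvar before creating the notifier thread, and it keeps state that is read without the lock in an _Atomic flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: not the port's code. */
typedef struct {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;
	int		disks_seen;	/* protected by lock */
	_Atomic bool	terminating;	/* read without taking the lock */
} boot_pools_t;

/* Stands in for the notifier calling into zfs_boot_probe_media(). */
static void *
media_notifier(void *arg)
{
	boot_pools_t *pools = arg;

	if (atomic_load(&pools->terminating))	/* lock-free read */
		return (NULL);
	pthread_mutex_lock(&pools->lock);	/* lock is already initialized */
	pools->disks_seen++;
	pthread_cond_signal(&pools->cv);
	pthread_mutex_unlock(&pools->lock);
	return (NULL);
}

/* Returns the number of disks the notifier reported, or -1 on error. */
static int
boot_wait_for_media(boot_pools_t *pools)
{
	pthread_t tid;
	int seen;

	assert(pools != NULL);

	/* 1. Set up all synchronization state first... */
	pthread_mutex_init(&pools->lock, NULL);
	pthread_cond_init(&pools->cv, NULL);
	pools->disks_seen = 0;
	atomic_init(&pools->terminating, false);

	/* 2. ...and only then start the thread that may use it. */
	if (pthread_create(&tid, NULL, media_notifier, pools) != 0) {
		pthread_cond_destroy(&pools->cv);
		pthread_mutex_destroy(&pools->lock);
		return (-1);
	}

	pthread_mutex_lock(&pools->lock);
	while (pools->disks_seen == 0)
		pthread_cond_wait(&pools->cv, &pools->lock);
	seen = pools->disks_seen;
	pthread_mutex_unlock(&pools->lock);

	atomic_store(&pools->terminating, true);
	pthread_join(tid, NULL);
	pthread_cond_destroy(&pools->cv);
	pthread_mutex_destroy(&pools->lock);
	return (seen);
}
```

Swapping steps 1 and 2 recreates the original bug: the notifier could lock a mutex that had not been initialized yet.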
macOS: Add O_APPEND to zfs_file_t It is currently not used, but since it was written for a test case, we might as well keep it. macOS: Pass fd_offset between kernel and userland. macOS: Missing return in non-void function macOS: finally fix taskq_dispatch_delay() You find a bug, you own the bug. macOS: add missing kstats macOS: restore the default system_delay_taskq macOS: don't call taskq_wait in taskq_cancel macOS: fix taskq_cancel_id() We need to make sure the taskq has finished before returning in taskq_cancel_id(), so that the taskq doesn't get a chance to run afterwards. macOS: correct 'hz' to 100. sysctl kern.clockrate: 100 sleeping for 1 second. bolt: 681571 sleep() 35 bolt: 681672: diff 101 'hz' is definitely 100. macOS: implement taskq_delay_dispatch() Implement delayed taskq by adding them to a list, sorted by wake-up time, and a dispatcher thread which sleeps until the soonest taskq is due. taskq_cancel_id() will remove the task from the list if present. macOS: ensure to use 1024 version of struct statfs and avoid coredump if passed zhp == NULL. macOS: fix memory leak in xattr_list macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE This is automatically set by default in userland if the deployment target is > 10.5 macOS: Fix watchdog unload and delay() macOS: improve handling of invariant disks Don't prepend /dev to all paths not starting with /dev as InvariantDisks places its symlinks in /var/run/disk/by-* not /dev/disk/by-*. Also, merge in some tweaks from Linux's zpool_vdev_os.c such as only using O_EXCL with spares. macOS: remove zfs_unmount_006_pos from large. Results in KILLED. Tag macos-2.0.0rc7 macOS: If we don't set SOURCES it makes up zfs.c from nowhere macOS: remove warning
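The delayed-dispatch design described above can be sketched as a single-threaded C data structure. It keeps a list sorted by wake-up time, a dispatcher that only has to watch the head, and cancel-by-id removal from the list. The names, the id scheme, and the tick-based clock are assumptions. A real dispatcher would also need a lock and a cv_timedwait() loop around delay_run_due().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: not the port's spl-taskq.c. */
typedef struct delayed_task {
	struct delayed_task	*next;
	uint64_t		id;		/* handed back for cancel */
	uint64_t		wake_time;	/* absolute time, in ticks */
	void			(*func)(void *);
	void			*arg;
} delayed_task_t;

typedef struct {
	delayed_task_t	*head;		/* soonest wake_time first */
	uint64_t	next_id;
} delay_list_t;

/* Insert in wake_time order (FIFO for equal times); 0 means failure. */
static uint64_t
delay_dispatch(delay_list_t *dl, void (*func)(void *), void *arg,
    uint64_t wake_time)
{
	assert(dl != NULL && func != NULL);
	delayed_task_t *t = calloc(1, sizeof (*t));
	if (t == NULL)
		return (0);
	t->id = ++dl->next_id;
	t->wake_time = wake_time;
	t->func = func;
	t->arg = arg;

	delayed_task_t **pp = &dl->head;
	while (*pp != NULL && (*pp)->wake_time <= wake_time)
		pp = &(*pp)->next;
	t->next = *pp;
	*pp = t;
	return (t->id);
}

/* Remove a pending task; true if it was still queued (never ran). */
static bool
delay_cancel(delay_list_t *dl, uint64_t id)
{
	for (delayed_task_t **pp = &dl->head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			delayed_task_t *t = *pp;
			*pp = t->next;
			free(t);
			return (true);
		}
	}
	return (false);
}

/* One dispatcher pass: run every task that is due; returns how many ran. */
static int
delay_run_due(delay_list_t *dl, uint64_t now)
{
	int ran = 0;

	while (dl->head != NULL && dl->head->wake_time <= now) {
		delayed_task_t *t = dl->head;
		dl->head = t->next;
		t->func(t->arg);
		free(t);
		ran++;
	}
	return (ran);
}

/* Example callback: counts how many times it was invoked. */
static void
count_call(void *arg)
{
	(*(int *)arg)++;
}
```

For example, dispatching tasks due at ticks 30, 10 and 20 leaves them ordered 10, 20, 30. Cancelling the 20-tick task and then calling delay_run_due(&dl, 15) runs only the 10-tick task.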
Add all files required for the macOS port. Add new cmd/os/ for tools which are only expected to be used on macOS. This has support for all macOS versions up to Catalina. (Not BigSur). Signed-off-by: Jorgen Lundman <lundman@lundman.net> macOS: big uio change over. Make uio an internal (ZFS) struct, possibly referring to the supplied (XNU) uio from the kernel. This means zio_crypt.c can now be identical to upstream. Update for draid, and other changes macOS: Use SET_ERROR with uiomove. [squash] macOS: they went and added vdev_draid macOS: compile fixes from rebase macOS: oh cstyle, how you vex me so macOS: They added new methods - squash macOS: arc_register_hotplug for userland too Upstream: avoid warning zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers] kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ macOS: Update zfs_acl.c to latest This includes commits like: 65c7cc4 1b376d1 cfdc432 716b53d a741b38 485b50b macOS: struct vdev changes macOS: cstyle, how you vex me [squash] Upstream: booo Werror booo Upstream: squash baby Not defined gives warnings. Upstream: Include all Makefiles Signed-off-by: Jorgen Lundman <lundman@lundman.net> double draid! macOS: large commit macOS: Use APPLE approved kmem_alloc() macOS: large commit WIP: remove reliance on zfs.exports The memory-pressure has been nerfed, and will not run well until we can find other solutions. The kext symbol lookup we can live without, used only for debug and panic. Use lldb to look up symbols. leaner! leanerr! remove zfs.export dependency cont. export reduction cont. cont.
Corrective tweaks for building Correct vnode_iocount() Cleanup pipe wrap code, use pthreads, handle multiple streams latest pipe send with threads sort of works, but bad timing can deadlock macOS: work out corner case starvation issue in cv_wait_sig() Fix -C in zfs send/recv cv_wait_sig squash Also wrap zfs send resume Implement VOP_LOOKUP for snowflake Finder Don't change date when setting size. Seems to be a weird requirement with Linux, so model it after the FreeBSD version macOS: correct xattr checks for uio Fix a noisy source of misleading-indentation warnings Fix "make install" ln -s failures Fix a noisy source of misleading-indentation warnings Fix "make install" ln -s failures fix ASSERT: don't try to peer into opaque vp structure Import non-panicking ASSERT from old spl/include/sys/debug.h Guard with MACOS_ASSERT_SHOULD_PANIC which will do what Linux and FreeBSD do: redefine ASSERTs as VERIFYs. The panic report line will say VERIFY, obscuring the problem, and a system panic is harsher (and more dangerous) on MacOS than a zfs-module panic on Linux. ASSERTions: declare assfail in debug.h Build and link spl-debug.c Eliminate spurious "off" variable, use position+offset range Make sure we hold the correct range to avoid panic in dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug). zvol_log_write the range we have written, not the future range silence very noisy and dubious ASSERT macOS: M1 fixes for arm64. sysctl needs to use OID2 Allocs need to be IOMallocAligned Initial spl-vmem memory area needs to be aligned to 16KB No cpu_number() for arm64. macOS: change zvol locking, add zvol symlinks macOS: Return error on UF_COMPRESSED This means bsdtar will be rather noisy, but we prefer noise over corrupt files (all files would be 0-sized).
usr/bin/zprint: Failed to set file flags~ -rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint usr/bin/zprint: Failed to set file flags -rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint Actually include zedlet for zvols macOS: Fix Finder crash on quickview, SMB error codes xattr=sa would return negative returncode, hangover from ZOL code. Only set size if passed a ptr. Convert negative error codes back to normal. Add LIBTOOLFLAGS for macports toolchain This will replace PR#23 macOS zpool import fixes The new codebase uses a mixture of thread pools and lio_listio async io, and on macOS there are low aio limits, and when those are reached lio_listio() returns EAGAIN when probing several prospective leaf vdevs concurrently, looking for labels. We should not abandon probing a vdev in this case, and can usually recover by trying again after a short delay. (We continue to treat other errnos as unrecoverable for that vdev, and only try to recover from EAGAIN a few times). Additionally, take logic from old o3x and don't probe a variety of devices commonly found in /dev/XXX as they either produce side-effects or are simply wasted effort. Finally, add a trailing / that FreeBSD and Linux both have. listxattr may not expose com.apple.system xattr=sa We need to ask IOMallocAligned for the enclosing POW2 vmem_create() arenas want at least natural alignment for the spans they import, and will panic if they don't get it. For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE. Otherwise align on the enclosing power of two for any osif_malloc allocation up to 2^32. Anything that asks osif_malloc() for more than that is almost certainly a bug, but we can try aligning on PAGESIZE anyway, rather than extend the enclosing-power-of-two device to handle 64-bit allocations. Simplify the creation of bucket arenas, and adjust their quanta.
This results in handing back considerably more (and smaller) chunks of memory to osif_free if there is pressure, and reduces waits in xnu_alloc_throttled(), so is a performance win for a busy memory-constrained system. Finally, uncomment some valid code that might be used by future callers of vmem_xcreate(). use vmem_xalloc to match the vmem_xfree of initial dynamic alloc vmem_alloc() breaks the initial large vmem_add() allocation into smaller chunks in an effort to have a large number of vmem segments in the arena. This arena does not benefit from that. Additionally, in vmem_fini() we call vmem_xfree() to return the initial allocation because it is done after almost everything has been pulled down. Unfortunately vmem_xfree() returns the entire initial allocation as a single span. IOFree() checks a variable maintained by the IOMalloc* allocators which tracks the largest allocation made so far, and will panic when (as is almost always the case) the initial large span is handed to it. This usually manifests as a panic or hang on kext unload, or a hang at reboot. Consequently, we will now use vmem_xalloc() for this initial allocation; vmem_xalloc() also lets us explicitly specify the natural alignment we want for it. zfs_rename SA_ADDTIME may grow SA Avoid: zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2 -> 674 panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n", 675 (u_longlong_t)db->db.db_object, db->db_level, 676 (u_longlong_t)db->db_blkid); zfs diff also needs to be wrapped. Replace call to pipe() with a couple of open(mkfifo) instead. Upstream: cstyle zfs_fm.c macOS: cstyle baby IOMallocAligned() should call IOFreeAligned() macOS: zpool_disable_volumes v1 When exporting, also kick mounted zvols offline macOS: zpool_disable_volumes v2 When exporting zvols, check IOReg for the BSDName, instead of using readlink on the ZVOL symlinks. Also check if apfs has made any synthesized disks, and ask them to unmount first.
./scripts/cmd-macos.sh zpool export BOOM Exporting 'BOOM/volume' ... asking apfs to eject 'disk5' Unmount of all volumes on disk5 was successful ... asking apfs to eject 'disk5s1' Unmount of all volumes on disk5 was successful ... asking ZVOL to export 'disk4' Unmount of all volumes on disk4 was successful zpool_disable_volume: exit macOS: Add libdiskmgt and call inuse checks macOS: compile fixes from rebase macOS: oh cstyle, how you vex me so macOS: They added new methods - squash macOS: arc_register_hotplug for userland too macOS: minor tweaks for libdiskmgt macOS: getxattr size==0 is to look up size Also skip the ENOENT return for "zero" finderinfo, as we do not skip over them in listxattr. macOS: 10.9 compile fixes macOS: go to rc2 macOS: kstat string handling should copyin. cstyle baby macOS: Initialise ALL quota types: projectid, userobj, groupobj and projectobj quotas were missed. macOS: error check sysctl for older macOS Wooo cstyle, \o/ Make arc sysctl tunables work (#27) * use an IOMemAligned for a PAGE_SIZE allocation * we should call arc_kstat_update_osx() Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do anything because arc_kstat_update_osx() was removed at the same time the (obsoleted by upstream) arc_kstat_update() was removed from zfs_kstat_osx.c. Put it back. * when we sysctl arc tunables, call arc_tuning_update() * rely on upstream's sanity checking Simplification which also avoids spurious CMN_WARN messages caused by setting the arcstat variables here, when upstream's arc_tuning_update() checks that they differ from the tunable variables. * add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent both are in upstream's arc_tuning_update() zfs_arc_sys_free controls the amount of memory that ARC will leave free, which is roughly what lundman wants for putting some sort of cap on memory use. * cstyle macOS: set UIO direction, to receive xattr from XNU macOS: ensure uio is zeroed in case XNU uio is NULL.
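Here is the osif_malloc alignment rule from the IOMallocAligned commit above, sketched as a pure C function. osif_alloc_alignment() is a hypothetical name. The page size is a parameter: 4 KiB on Intel, and 16 KiB on Apple silicon per the M1 notes. The sketch only computes the alignment to request and does not allocate anything.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: alignment to request from IOMallocAligned(). */
static uint64_t
osif_alloc_alignment(uint64_t size, uint64_t pagesize)
{
	/* pagesize must be a nonzero power of two */
	assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

	/* Sub-PAGESIZE (and exactly PAGESIZE) requests: align on PAGESIZE. */
	if (size <= pagesize)
		return (pagesize);

	/* Over 2^32 is almost certainly a bug; fall back to PAGESIZE. */
	if (size > (1ULL << 32))
		return (pagesize);

	/* Otherwise round up to the enclosing power of two. */
	uint64_t align = pagesize;
	while (align < size)
		align <<= 1;
	return (align);
}
```

For example, with a 4096-byte page a 100-byte request gets 4096-byte alignment, and a 5000-byte request gets 8192. Anything over 2^32 falls back to 4096.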
Fix zfs_vnop_getxattr (openzfs#28) "xattr -l <file>" would return inconsistent garbage, especially from non-com.apple.FinderInfo xattrs. The UIO_WRITE was a 0 (UIO_READ) in the previous commit; change it. Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c, for cheap extra safety. launch `zpool import` through launchd in the startup script (#26) Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com> cstyle macOS: correct dataset_kstat_ logic and kstat leak. dataset_kstat_create() will allocate a string and set it before calling kstat_create() - so we can not set strings to NULL. Likewise, we can not bulk free strings on unload, we have to rely on the caller of kstat to do so. (Which is proper). Add calls to dataset_kstat for datasets and zvol. kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM kstat.zfs/BOOM.dataset.objset-0x36.writes: 0 kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0 kstat.zfs/BOOM.dataset.objset-0x36.reads: 11 kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810 kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0 kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0 macOS: remove no previous prototype for function macOS: correct openat wrapper build fixes re TargetConditionals.h (openzfs#30) AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include. Memory fixes on macOS_pure (openzfs#31) * Improve memory handling on macOS * remove obsolete/unused zfs_file_data/zfs_metadata caching * In the new code base, we use upstream's zio.c without modification, and so the special zio caching code became entirely vestigial, and likely counterproductive. * and make busy ABD better behaved on busy macOS box Post-ABD we no longer gained much benefit in the old code base from the complicated special handling for the caches created in zio.c. As there's only really one size of ABD allocation, we do not need a qcache layer as in 1.9.
Instead use an arena with VMC_NO_QCACHE set to ask for 256k chunks. * don't reap extra caches in arc_kmem_reap_now() KMF_LITE in DEBUG build is OK * build fixes re TargetConditionals.h AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include. Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33) * other minor changes in vdev_disk Thread and taskq fixing (openzfs#32) Highlights: * thread names for spindump * some taskq_d is safe and useful * reduce thread priorities * use throughput & latency QOS * TIMESHARE scheduling * passivate some IO * Pull in relevant changes from old taskq_fixing branch 1.9 experimentation pulled into 2.x * add throttle_set_thread_io_policy to zfs.exports * selectively re-enable TASKQ_DYNAMIC also drop wr_iss zio taskqs even further in priority (cf freebsd) * reduce zvol taskq priority * make system_taskq dynamic * experimentally allow three more taskq_d * lower thread priorities overall On an M1 with no zfs whatsoever, the highest priority threads are in the mid 90s, with most kernel threads at priority 81 (basepri). With so many maxclsyspri threads in zfs, we would starve out important things like vm_pageout_scan (pri 91), sched_maintenance_thread (pri 95), and numerous others. Moreover, ifnet_start_{interfaces} are all priority 82. We should drop minclsyspri below 81, have defclsyspri at no more than 81, and make sure we have few threads above 89.
* some tidying up of lowering of priority Thread and taskq fixing * fix old code pulled into spa.c, and further lower priorities * Thread and taskq fixing drop xnu priorities by one update a comment block set USER_INITIATED throughput QOS on TIMESHARE taskq threads don't boost taskq threads accidentally don't let taskq threads be pri==81 don't let o3x threads have importance > 0 apply xnu thread policies to taskq_d threads too assuming this works, it calls out for DRY refactoring with the other two flavours, that operate on current_thread(). simplify in spa.c make practically all the taskqs TIMESHARE Revert "apply xnu thread policies to taskq_d threads too" Panic in VM This reverts commit 39f93be. Revert "Revert "apply xnu thread policies to taskq_d threads too"" I see what happened now. This reverts commit 75619f0. adjust thread not the magic number refactor setting thread qos make DRY refactor rebuild this includes userland TASKQ_REALLY_DYNAMIC fixes fix typo set thread names for spindump visibility cstyle Upstream: Add --enable-macos-impure to autoconf Controls -DMACOS_IMPURE Signed-off-by: Jorgen lundman <lundman@lundman.net> macOS: Add --enable-macos-impure switch to missing calls. Call the wrapped spl_throttle_set_thread_io_policy Add spl_throttle_set_thread_io_policy to headers macOS: vdev_file should use file_taskq Also cleanup spl-taskq to have taskq_wait_outstanding() in preparation for one day implementing it. Change alloc to zalloc in zfs_ctldir.c Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34) macOS: change both alloc to zalloc macOS: mutex_tryenter can be used while holding zstd uses mutex_tryenter() to check if it already is holding the mutex. 
Can't find any implementations that object to it, so changing our spl-mutex.c Tag zfs-2.0.0rc4 macOS: return error from uiomove instead of panic macOS: Skip known /dev entry which hangs macOS: Give better error msg when features are needed for crypto Using 1.9.4 crypto dataset now require userobj and projectquota. Alert the user to activate said features to mount crypt dataset. There is no going back to 1.9.4 after features are enabled. macOS: Revert to pread() over AIO due to platform issues. We see waves of EAGAIN errors from lio_listio() on BigSur (but not Catalina) which could stem from recent changes to AIO in XNU. For now, we will go with the classic read label. Re-introduce a purified memory pressure handling mechanism (openzfs#35) * Introduce pure pressure-detecting-and-reacting system * "pure" -- no zfs.exports requirement * plumb in mach_vm_pressure_level_monitor() and mach_vm_pressure_monitor() calls to maintain reduced set of inputs into previous signalling into (increasingly shared with upstream) arc growth or shrinking policy * introduce mach_vm_pressure kstats which can be compared with userland-only sysctls: kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0 kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0 kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0 vm.page_free_wanted: 0 vm.page_free_count: 25,545 vm.page_speculative_count: 148,572 * and a start on tidying and obsolete code elimination * make arc_default_max() much bigger Optional: can be squashed into main pressure commit, or omitted. Users can use zsysctl.conf or manual setting of kstat.zfs.darwin.tunable.zfs_arc_max to override whichever default is chosen (this one, or the one it replaces). Allmem is already deflated during initialization, so this patch raises the un-sysctled ARC maximum from 1/6 to 1/2 of physmem. 
* handle (vmem) abd_cache fragmentation after arc shrink When arc shrinks due to a significant pressure event, the abd_chunk kmem cache will free slabs back to the vmem abd_cache, and this memory can be several gigabytes. Unfortunately multi-threaded concurrent kmem_cache allocation in the first place, and a priori unpredicatble arc object lifetimes means that abds held by arc objects may be scattered across multiple slabs, with different objects interleaved within slabs. Thus after a moderate free, the vmem cache can be fragmented and this is seen by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much smaller than (sysctl) kstat.vmem.vmem.abd_cache.mem_import, the latter of which may even be stuck at approximately the same value as before the arc free and kmem_cache reap. When there is a large difference between import and inuse, we set arc_no_grow in hopes that ongoing arc activity will defragment organically. This works better with more arc read/write activity after the free, and almost not at all if after the free there is almost no activity. We also add BESTFIT policy to abd_arena experimentally BESTFIT: look harder to place an abd chunk in a slab rather than place in the first slot that is definitely large enough which breaks the vmem constant-time allocation guarantee, although that is less important for this particular vmem arena because of the strong modality of allocations from the abd_chunk cache (its only client). Additionally reduce the abd_cache arena import size to 128k from 256k; the increase in allocation and free traffic between it and the heap is small compared to the gain under this new anti-fragmentation scheme. * some additional tidying in arc_os.c Tag macos-2.0.0-rc5 abd_cache fragmentation mitigation (openzfs#36) * printf->dprintf HFS_GET_BOOT_INFO periodically there will be huge numbers of these printfs, and they are not really useful except when debugging vnops. 
* Mitigate fragmentation in vmem.abd_cache In macOS_pure the abd_chunk kmem cache is parented to the abd_cache vmem arena to avoid sometimes-heavy ARC allocation and free stress on the main kmem cache, and because abd_chunk has such a strongly modal page-sized allocation size. Additionally, abd_chunk allocations and frees come in gangs, often with high multi-thread concurrency. It is that latter property which is the primary source of arena fragmentation, and it will affect any vmem arena directly underneath the abd_chunk kmem cache. Because we have a vmeme parent solely for abd_chunk, we can monitor that parent for various patterns and react to them. This patch monitors the difference between the variables exported as kstat.vmem.vmem.abd_cache.mem_inuse and kstat.vmem.vmem.abd_cache.mem_import, watching for a large gap between the two, which can arise after an ARC shrink returns many slabs from the arc_chunk kmem cache to the abd_cache arena, as vmem segments still contain slabs which hold still-alive abds. When there is a significant gap, we turn on arc_no_grow and hope that organic ARC activity reduces the gap. If after several minutes this is not the case, a small arc_reduce_target_size() is applied. In comparison with previous behaviour, ARC equilibrium sizes will tend slightly -- but not neormously -- lower because the arc target size reduction is made fairly frequently. However, this is offset by the benefit of less *long-term* abd_cache fragmentation, and less complete collapses of ARC in the face of system memory pressure (since less is "stuck" in vmem). ARC consequently will stay at its equilibrium more often than near its minimum. This is demonstrated by a generally lower overall total held memory (kstat.spl.misc.spl_misc.os_mem_alloc) except on systems with essentially no memory pressure, or systems which have been sysctl-tuned for different behaviour. 
macOS: Additional 10.9 fixes that missed the boat Tidying nvram zfs_boot=pool (openzfs#37) If zfs_boot is set we run a long-lived zfs_boot_import_thread, which can stay running until the kernel module is running _fini() functions at unload or shutdown. This patch dispatches it on a zfs_boot() taskq, to avoid causing a hang at the taskq_wait_outstanding(system_taskq, 0) in zvol.c's zvol_create_minors_recursive(), which would prevent pool imports finishing if the pool contained zvols. (Symptoms: "zpool import" does not exit for any pool, system does not see any zvols). This exposed a long-term race condition in our zfs_boot.cpp: the notifier can cause the mutex_enter(&pools->lock) in zfs_boot_probe_media to be reached before the mutex_enter() after the notifier was created. The use of the system_taskq was masking that, by quietly imposing a serialization choke. Moving the mutex and cv initialization earlier -- in particular before the notifier is created -- eliminates the race. Further tidying in zfs_boot.cpp, including some cstyling, switching to _Atomic instead of volatile. Volatile is for effectively random reads; _Atomic is for when we want many readers to have a consistent view after the variable is written. Finally, we need TargetConditionals.h in front of AvailabilityMacros.h in order to build. Add includes to build on Big Sur with macports-clang-11 (openzfs#38) * TargetConditionals.h before all AvailabilityMacros.h * add several TargetConditionals.h and AvaialbilityMacros.h Satisfy picky macports-clang-11 toolchain on Big Sur. macOS: clean up large build, indicate errors. Fix debug macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit macOS: rename net.lundman. -> org.openzfsonosx. macOS: Tag va_mode for upstream ASSERTS XNU sets va_type = VDIR, but does not bother with va_mode. However ZFS checks to confirm S_ISDIR is set in mkdir. macOS: Fix zfs_ioc_osx_proxy_dataset for datasets It was defined as a _pool() ioctl. 
While we are here changing things change it into a new-style ioctl instead. This should fix non-root datasets mounting as a proxy (devdisk=on). cstyle macOS: setxattr debug prints left in macOS: don't create DYNAMIC with _ent taskq macOS: Also uninstall new /usr/local/zfs before install macos-2.0.0-rc6 macOS: strcmp deprecated after macOS 11 macOS: pkg needs to notarize at the end macOS: strdup strings in getmntent mirrored on FreeBSD. macOS: remove debug print macOS: unload zfs, not openzfs macOS: actually include the volume icon file as well also update to PR macOS: prefer disk over rdisk macOS: devdisk=off mimic=on needs to check for dataset Datasets with devdisks=on will be in ioreg, with it off and mimic=on then it needs to handle: BOOM/fs1 /Volumes/BOOM/fs1 by testing if "BOOM/fs1" is a valid dataset. fixifx macOS: doubled up "int rc" losing returncode Causing misleading messages macOS: zfsctl was sending from IDs macOS: let zfs mount as user succeed If the "mkdir" can succeed (home dir etc, as opposed to /Volumes) then let the mount be able to happen. macOS: Attempt to implement taskq_dispatch_delay() frequently used with taskq_cancel_id() to stop taskq from calling `func()` before the timeout expires. Currently implemented by the taskq sleeping in cv_timedwait() until timeout expires, or it is signalled by taskq_cancel_id(). Seems a little undesirable, could we build an ordered list of delayed taskqs, and only place them to run once timeout has expired, leaving the taskq available to work instead of delaying. macOS: Separate unmount and proxy_remove When proxy_remove is called at the tail end of unmount, we get the alert about "ejecting before disconnecting device". To mirror the proxy create, we make it a separate ioctl, and issue it after unmount completes. macOS: explicitly call setsize with O_TRUNC It appears O_TRUNC does nothing, like the goggles. 
macOS: Add O_APPEND to zfs_file_t. It is currently not used, but since it was written for a test case, we might as well keep it.
macOS: Pass fd_offset between kernel and userland.
macOS: Missing return in non-void function
macOS: finally fix taskq_dispatch_delay(). You find a bug, you own the bug.
macOS: add missing kstats
macOS: restore the default system_delay_taskq
macOS: don't call taskq_wait in taskq_cancel
macOS: fix taskq_cancel_id(). We need to make sure the taskq has finished before returning from taskq_cancel_id(), so that the taskq doesn't get a chance to run afterwards.
macOS: correct 'hz' to 100.
sysctl kern.clockrate: 100
sleeping for 1 second.
bolt: 681571
sleep() 35
bolt: 681672: diff 101
'hz' is definitely 100.
macOS: implement taskq_delay_dispatch(). Implement delayed taskq by adding tasks to a list sorted by wake-up time, with a dispatcher thread that sleeps until the soonest task is due. taskq_cancel_id() will remove the task from the list if present.
macOS: ensure to use the 1024 version of struct statfs, and avoid a coredump if passed zhp == NULL.
macOS: fix memory leak in xattr_list
macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat
getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE. This is set automatically by default in userland if the deployment target is > 10.5.
macOS: Fix watchdog unload and delay()
macOS: improve handling of invariant disks. Don't prepend /dev to all paths not starting with /dev, as InvariantDisks places its symlinks in /var/run/disk/by-*, not /dev/disk/by-*. Also, merge in some tweaks from Linux's zpool_vdev_os.c, such as only using O_EXCL with spares.
macOS: remove zfs_unmount_006_pos from large. Results in KILLED.
Tag macos-2.0.0rc7
macOS: If we don't set SOURCES it makes up zfs.c from nowhere
macOS: remove warning
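The delayed-dispatch design described above keeps pending tasks in a list sorted by wake-up time. A dispatcher only needs to look at the head of that list, and cancellation is a list removal. The sketch below covers only that data structure, under stated assumptions: time is an abstract tick count, tasks run inline instead of being handed to a real taskq, and there is no locking. The dispatcher thread, which would sleep until `head->wake_time`, is omitted. All names are hypothetical and are not the spl-taskq.c API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical delayed-task entry; not the actual spl-taskq.c layout. */
typedef struct delayed_task {
	uint64_t id;
	uint64_t wake_time;		/* absolute time, in ticks */
	void (*func)(void *);
	void *arg;
	struct delayed_task *next;
} delayed_task_t;

typedef struct {
	delayed_task_t *head;		/* sorted by ascending wake_time */
	uint64_t next_id;
} delay_list_t;

/* Insert in wake-time order (FIFO among equal times); returns id, 0 on failure. */
static uint64_t
delay_dispatch(delay_list_t *dl, void (*func)(void *), void *arg,
    uint64_t wake_time)
{
	delayed_task_t *t = malloc(sizeof (*t));
	delayed_task_t **pp = &dl->head;

	if (t == NULL)
		return (0);
	t->id = ++dl->next_id;
	t->wake_time = wake_time;
	t->func = func;
	t->arg = arg;
	while (*pp != NULL && (*pp)->wake_time <= wake_time)
		pp = &(*pp)->next;
	t->next = *pp;
	*pp = t;
	return (t->id);
}

/* Cancel: remove a still-pending task; returns 1 if it was found. */
static int
delay_cancel(delay_list_t *dl, uint64_t id)
{
	for (delayed_task_t **pp = &dl->head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			delayed_task_t *t = *pp;
			*pp = t->next;
			free(t);
			return (1);
		}
	}
	return (0);
}

/* What the dispatcher does on waking: run every task that is due. */
static int
delay_run_due(delay_list_t *dl, uint64_t now)
{
	int ran = 0;

	while (dl->head != NULL && dl->head->wake_time <= now) {
		delayed_task_t *t = dl->head;
		dl->head = t->next;
		t->func(t->arg);
		free(t);
		ran++;
	}
	return (ran);
}

/* Example callback: increments the int it is given. */
static void
bump(void *arg)
{
	(*(int *)arg)++;
}
```

Because the list is sorted, the dispatcher never scans past the first task that is not yet due. Insertion costs O(n), which is acceptable when only a handful of delayed tasks are outstanding.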
Add all files required for the macOS port. Add new cmd/os/ for tools which are only expected to be used on macOS. This has support for all macOS versions up to Catalina (not Big Sur).
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
macOS: big uio change over. Make uio an internal (ZFS) struct, possibly referring to the supplied (XNU) uio from the kernel. This means zio_crypto.c can now be identical to upstream.
Update for draid, and other changes
macOS: Use SET_ERROR with uiomove. [squash]
macOS: they went and added vdev_draid
macOS: compile fixes from rebase
macOS: oh cstyle, how you vex me so
macOS: They added new methods - squash
macOS: arc_register_hotplug for userland too
Upstream: avoid warning
zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
macOS: Update zfs_acl.c to latest. This includes commits like: 65c7cc4 1b376d1 cfdc432 716b53d a741b38 485b50b
macOS: struct vdev changes
macOS: cstyle, how you vex me [squash]
Upstream: booo Werror booo
Upstream: squash baby. Not defined gives warnings.
Upstream: Include all Makefiles
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
double draid!
macOS: large commit
macOS: Use APPLE approved kmem_alloc()
macOS: large commit
WIP: remove reliance on zfs.exports. The memory-pressure handling has been nerfed, and will not run well until we can find other solutions. The kext symbol lookup we can live without; it is used only for debug and panic. Use lldb to look up symbols.
leaner!
leanerr!
remove zfs.export dependency cont.
export reduction cont.
cont.
Corrective tweaks for building
Correct vnode_iocount()
Cleanup pipe wrap code, use pthreads, handle multiple streams
latest pipe send with threads sort of works, but bad timing can deadlock
macOS: work out corner case starvation issue in cv_wait_sig()
Fix -C in zfs send/recv
cv_wait_sig squash
Also wrap zfs send resume
Implement VOP_LOOKUP for snowflake Finder
Don't change date when setting size. This seems to be a weird requirement on Linux, so model it after the FreeBSD version.
macOS: correct xattr checks for uio
Fix a noisy source of misleading-indentation warnings
Fix "make install" ln -s failures
Fix a noisy source of misleading-indentation warnings
Fix "make install" ln -s failures
fix ASSERT: don't try to peer into opaque vp structure
Import non-panicking ASSERT from old spl/include/sys/debug.h. Guard with MACOS_ASSERT_SHOULD_PANIC, which will do what Linux and FreeBSD do: redefine ASSERTs as VERIFYs. The panic report line will say VERIFY, obscuring the problem, and a system panic is harsher (and more dangerous) on macOS than a zfs-module panic on Linux.
ASSERTions: declare assfail in debug.h
Build and link spl-debug.c
Eliminate spurious "off" variable, use position+offset range. Make sure we hold the correct range to avoid a panic in dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug).
zvol_log_write the range we have written, not the future range
silence very noisy and dubious ASSERT
macOS: M1 fixes for arm64. sysctl needs to use OID2; allocs need to be IOMalloc_aligned; the initial spl-vmem memory area needs to be aligned to 16KB; no cpu_number() for arm64.
macOS: change zvol locking, add zvol symlinks
macOS: Return error on UF_COMPRESSED. This means bsdtar will be rather noisy, but we prefer noise over corrupt files (all files would be 0-sized).
usr/bin/zprint: Failed to set file flags~
-rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint
usr/bin/zprint: Failed to set file flags
-rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint
Actually include zedlet for zvols
macOS: Fix Finder crash on quickview, SMB error codes. xattr=sa would return a negative return code, a hangover from ZoL code. Only set size if passed a ptr. Convert negative error codes back to normal.
Add LIBTOOLFLAGS for macports toolchain. This will replace PR#23.
macOS zpool import fixes. The new codebase uses a mixture of thread pools and lio_listio async I/O. macOS has low AIO limits, and when they are reached, lio_listio() returns EAGAIN while several prospective leaf vdevs are being probed concurrently for labels. We should not abandon probing a vdev in this case, and can usually recover by trying again after a short delay. (We continue to treat other errnos as unrecoverable for that vdev, and only try to recover from EAGAIN a few times.) Additionally, take logic from old o3x and don't probe a variety of devices commonly found in /dev/XXX, as they either produce side effects or are simply wasted effort. Finally, add a trailing / that FreeBSD and Linux both have.
listxattr may not expose com.apple.system xattr=sa
We need to ask IOMallocAligned for the enclosing POW2. vmem_create() arenas want at least natural alignment for the spans they import, and will panic if they don't get it. For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE. Otherwise, align on the enclosing power of two for any osif_malloc allocation up to 2^32. Anything that asks osif_malloc() for more than that is almost certainly a bug, but we can try aligning on PAGESIZE anyway, rather than extend the enclosing-power-of-two device to handle 64-bit allocations. Simplify the creation of bucket arenas, and adjust their quanta.
This results in handing back considerably more (and smaller) chunks of memory to osif_free if there is pressure, and reduces waits in xnu_alloc_throttled(), so it is a performance win for a busy, memory-constrained system. Finally, uncomment some valid code that might be used by future callers of vmem_xcreate().
use vmem_xalloc to match the vmem_xfree of initial dynamic alloc. vmem_alloc() breaks the initial large vmem_add() allocation into smaller chunks in an effort to have a large number of vmem segments in the arena. This arena does not benefit from that. Additionally, in vmem_fini() we call vmem_xfree() to return the initial allocation, because it is done after almost everything has been pulled down. Unfortunately, vmem_xfree() returns the entire initial allocation as a single span. IOFree() checks a variable maintained by the IOMalloc* allocators which tracks the largest allocation made so far, and will panic when (as is almost always the case) the initial large span is handed to it. This usually manifests as a panic or hang on kext unload, or a hang at reboot. Consequently, we now use vmem_xalloc() for this initial allocation; vmem_xalloc() also lets us explicitly specify the natural alignment we want for it.
zfs_rename SA_ADDTIME may grow SA. Avoid:
zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2
-> 674 panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n",
   675 (u_longlong_t)db->db.db_object, db->db_level,
   676 (u_longlong_t)db->db_blkid);
zfs diff also needs to be wrapped. Replace the call to pipe() with a couple of open(mkfifo) instead.
Upstream: cstyle zfs_fm.c
macOS: cstyle baby
IOMallocAligned() should call IOFreeAligned()
macOS: zpool_disable_volumes v1. When exporting, also kick mounted zvols offline.
macOS: zpool_disable_volumes v2. When exporting zvols, check IOReg for the BSDName instead of using readlink on the ZVOL symlinks. Also check whether APFS has made any synthesized disks, and ask them to unmount first.
./scripts/cmd-macos.sh zpool export BOOM
Exporting 'BOOM/volume'
... asking apfs to eject 'disk5'
Unmount of all volumes on disk5 was successful
... asking apfs to eject 'disk5s1'
Unmount of all volumes on disk5 was successful
... asking ZVOL to export 'disk4'
Unmount of all volumes on disk4 was successful
zpool_disable_volume: exit
macOS: Add libdiskmgt and call inuse checks
macOS: compile fixes from rebase
macOS: oh cstyle, how you vex me so
macOS: They added new methods - squash
macOS: arc_register_hotplug for userland too
macOS: minor tweaks for libdiskmgt
macOS: getxattr size==0 is to look up size. Also skip the ENOENT return for "zero" finderinfo, as we do not skip over them in listxattr.
macOS: 10.9 compile fixes
macOS: go to rc2
macOS: kstat string handling should copyin.
cstyle baby
macOS: Initialise ALL quota types. projectid, userobj, groupobj and projectobj quotas were missed.
macOS: error check sysctl for older macOS
Wooo cstyle, \o/
Make arc sysctl tunables work (#27)
* use an IOMemAligned for a PAGE_SIZE allocation
* we should call arc_kstat_update_osx(). Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do anything because arc_kstat_update_osx() was removed at the same time the (obsoleted by upstream) arc_kstat_update() was removed from zfs_kstat_osx.c. Put it back.
* when we sysctl arc tunables, call arc_tuning_update()
* rely on upstream's sanity checking. A simplification which also avoids spurious CMN_WARN messages caused by setting the arcstat variables here, when upstream's arc_tuning_update() checks that they differ from the tunable variables.
* add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent; both are in upstream's arc_tuning_update(). zfs_arc_sys_free controls the amount of memory that ARC will leave free, which is roughly what lundman wants for putting some sort of cap on memory use.
* cstyle
macOS: set UIO direction, to receive xattr from XNU
macOS: ensure uio is zeroed in case XNU uio is NULL.
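The osif_malloc() alignment rule described above (the IOMallocAligned "enclosing POW2" commit) is simple arithmetic. The sketch below computes the alignment under three assumptions: PAGESIZE is a power of two (4 KB on Intel, 16 KB on M1) and is passed in as a parameter, and the helper name osif_alignment() is hypothetical. The helper only chooses the alignment; it does not call IOMallocAligned() itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper following the rule described for osif_malloc():
 *  - requests of at most one page are page aligned;
 *  - requests up to 2^32 are aligned on the enclosing power of two;
 *  - anything larger (almost certainly a bug) falls back to page alignment.
 * Assumes pagesize is itself a power of two.
 */
static uint64_t
osif_alignment(uint64_t size, uint64_t pagesize)
{
	uint64_t align;

	if (size <= pagesize)
		return (pagesize);
	if (size > (1ULL << 32))
		return (pagesize);

	/* Smallest power of two >= size; starts at pagesize since size > it. */
	align = pagesize;
	while (align < size)
		align <<= 1;
	return (align);
}
```

For example, a 5000-byte request with a 4 KB page gets 8192-byte alignment. A vmem arena importing that span then receives the natural alignment it expects for an 8 KB-class span.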
Fix zfs_vnop_getxattr (openzfs#28). "xattr -l <file>" would return inconsistent garbage, especially from non-com.apple.FinderInfo xattrs. The UIO_WRITE was a 0 (UIO_READ) in the previous commit; change it. Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c, for cheap extra safety.
launch `zpool import` through launchd in the startup script (#26)
Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com>
cstyle
macOS: correct dataset_kstat_ logic and kstat leak. dataset_kstat_create() will allocate a string and set it before calling kstat_create(), so we cannot set strings to NULL. Likewise, we cannot bulk free strings on unload; we have to rely on the caller of kstat to do so (which is proper). Add calls to dataset_kstat for datasets and zvol.
kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM
kstat.zfs/BOOM.dataset.objset-0x36.writes: 0
kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0
kstat.zfs/BOOM.dataset.objset-0x36.reads: 11
kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810
kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0
kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0
macOS: remove no previous prototype for function
macOS: correct openat wrapper
build fixes re TargetConditionals.h (openzfs#30). AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include.
Memory fixes on macOS_pure (openzfs#31)
* Improve memory handling on macOS
* remove obsolete/unused zfs_file_data/zfs_metadata caching
* In the new code base, we use upstream's zio.c without modification, and so the special zio caching code became entirely vestigial, and likely counterproductive.
* and make busy ABD better behaved on busy macOS box. Post-ABD, we no longer gained much benefit in the old code base from the complicated special handling for the caches created in zio.c. As there's only really one size of ABD allocation, we do not need a qcache layer as in 1.9.
Instead, use an arena with VMC_NO_QCACHE set to ask for 256k chunks.
* don't reap extra caches in arc_kmem_reap_now()
KMF_LITE in DEBUG build is OK
* build fixes re TargetConditionals.h. AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include.
Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33)
* other minor changes in vdev_disk
Thread and taskq fixing (openzfs#32)
Highlights:
* thread names for spindump
* some taskq_d is safe and useful
* reduce thread priorities
* use throughput & latency QOS
* TIMESHARE scheduling
* passivate some IO
* Pull in relevant changes from old taskq_fixing branch. 1.9 experimentation pulled into 2.x
* add throttle_set_thread_io_policy to zfs.exports
* selectively re-enable TASKQ_DYNAMIC; also drop wr_iss zio taskqs even further in priority (cf. FreeBSD)
* reduce zvol taskq priority
* make system_taskq dynamic
* experimentally allow three more taskq_d
* lower thread priorities overall. On an M1 with no zfs whatsoever, the highest priority threads are in the mid 90s, with most kernel threads at priority 81 (basepri). With so many maxclsyspri threads in zfs, we would starve out important things like vm_pageout_scan (pri 91), sched_maintenance_thread (pri 95), and numerous others. Moreover, ifnet_start_{interfaces} are all priority 82. We should drop minclsyspri below 81, have defclsyspri at no more than 81, and make sure we have few threads above 89.
* some tidying up of lowering of priority
Thread and taskq fixing
* fix old code pulled into spa.c, and further lower priorities
* Thread and taskq fixing
drop xnu priorities by one
update a comment block
set USER_INITIATED throughput QOS on TIMESHARE taskq threads
don't boost taskq threads accidentally
don't let taskq threads be pri==81
don't let o3x threads have importance > 0
apply xnu thread policies to taskq_d threads too. Assuming this works, it calls out for DRY refactoring with the other two flavours, which operate on current_thread().
simplify in spa.c
make practically all the taskqs TIMESHARE
Revert "apply xnu thread policies to taskq_d threads too". Panic in VM. This reverts commit 39f93be.
Revert "Revert "apply xnu thread policies to taskq_d threads too"". I see what happened now. This reverts commit 75619f0.
adjust thread not the magic number
refactor setting thread qos
make DRY refactor rebuild
this includes userland TASKQ_REALLY_DYNAMIC fixes
fix typo
set thread names for spindump visibility
cstyle
Upstream: Add --enable-macos-impure to autoconf. Controls -DMACOS_IMPURE
Signed-off-by: Jorgen lundman <lundman@lundman.net>
macOS: Add --enable-macos-impure switch to missing calls.
Call the wrapped spl_throttle_set_thread_io_policy
Add spl_throttle_set_thread_io_policy to headers
macOS: vdev_file should use file_taskq. Also clean up spl-taskq to have taskq_wait_outstanding() in preparation for one day implementing it.
Change alloc to zalloc in zfs_ctldir.c
Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34)
macOS: change both alloc to zalloc
macOS: mutex_tryenter can be used while holding. zstd uses mutex_tryenter() to check whether it is already holding the mutex.
We can't find any implementations that object to it, so change our spl-mutex.c.
Tag zfs-2.0.0rc4
macOS: return error from uiomove instead of panic
macOS: Skip known /dev entry which hangs
macOS: Give better error msg when features are needed for crypto. Using 1.9.4 crypto datasets now requires userobj and projectquota. Alert the user to activate said features to mount crypto datasets. There is no going back to 1.9.4 after the features are enabled.
macOS: Revert to pread() over AIO due to platform issues. We see waves of EAGAIN errors from lio_listio() on Big Sur (but not Catalina), which could stem from recent changes to AIO in XNU. For now, we will go with the classic read label.
Re-introduce a purified memory pressure handling mechanism (openzfs#35)
* Introduce pure pressure-detecting-and-reacting system
* "pure": no zfs.exports requirement
* plumb in mach_vm_pressure_level_monitor() and mach_vm_pressure_monitor() calls to maintain a reduced set of inputs into the previous signalling into the (increasingly shared with upstream) arc growth or shrinking policy
* introduce mach_vm_pressure kstats which can be compared with userland-only sysctls:
kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0
kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0
kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0
vm.page_free_wanted: 0
vm.page_free_count: 25,545
vm.page_speculative_count: 148,572
* and a start on tidying and obsolete code elimination
* make arc_default_max() much bigger. Optional: can be squashed into the main pressure commit, or omitted. Users can use zsysctl.conf or manual setting of kstat.zfs.darwin.tunable.zfs_arc_max to override whichever default is chosen (this one, or the one it replaces). Allmem is already deflated during initialization, so this patch raises the un-sysctled ARC maximum from 1/6 to 1/2 of physmem.
* handle (vmem) abd_cache fragmentation after arc shrink. When arc shrinks due to a significant pressure event, the abd_chunk kmem cache will free slabs back to the vmem abd_cache, and this memory can be several gigabytes. Unfortunately, multi-threaded concurrent kmem_cache allocation in the first place, and a priori unpredictable arc object lifetimes, mean that abds held by arc objects may be scattered across multiple slabs, with different objects interleaved within slabs. Thus, after a moderate free, the vmem cache can be fragmented. This shows up as (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much smaller than (sysctl) kstat.vmem.vmem.abd_cache.mem_import, the latter of which may even be stuck at approximately the same value as before the arc free and kmem_cache reap. When there is a large difference between import and inuse, we set arc_no_grow in the hope that ongoing arc activity will defragment organically. This works better with more arc read/write activity after the free, and almost not at all if there is almost no activity after the free. We also add the BESTFIT policy to abd_arena experimentally. BESTFIT looks harder to place an abd chunk in a slab, rather than placing it in the first slot that is definitely large enough. This breaks the vmem constant-time allocation guarantee, although that is less important for this particular vmem arena because of the strong modality of allocations from the abd_chunk cache (its only client). Additionally, reduce the abd_cache arena import size to 128k from 256k; the increase in allocation and free traffic between it and the heap is small compared to the gain under this new anti-fragmentation scheme.
* some additional tidying in arc_os.c
Tag macos-2.0.0-rc5
abd_cache fragmentation mitigation (openzfs#36)
* printf->dprintf HFS_GET_BOOT_INFO. Periodically there will be huge numbers of these printfs, and they are not really useful except when debugging vnops.
* Mitigate fragmentation in vmem.abd_cache. In macOS_pure, the abd_chunk kmem cache is parented to the abd_cache vmem arena to avoid sometimes-heavy ARC allocation and free stress on the main kmem cache, and because abd_chunk has such a strongly modal page-sized allocation size. Additionally, abd_chunk allocations and frees come in gangs, often with high multi-thread concurrency. It is that latter property which is the primary source of arena fragmentation, and it will affect any vmem arena directly underneath the abd_chunk kmem cache. Because we have a vmem parent solely for abd_chunk, we can monitor that parent for various patterns and react to them. This patch monitors the difference between the variables exported as kstat.vmem.vmem.abd_cache.mem_inuse and kstat.vmem.vmem.abd_cache.mem_import, watching for a large gap between the two. Such a gap can arise after an ARC shrink returns many slabs from the arc_chunk kmem cache to the abd_cache arena, because vmem segments still contain slabs which hold still-alive abds. When there is a significant gap, we turn on arc_no_grow and hope that organic ARC activity reduces the gap. If after several minutes this is not the case, a small arc_reduce_target_size() is applied. In comparison with previous behaviour, ARC equilibrium sizes will tend slightly, but not enormously, lower because the arc target size reduction is made fairly frequently. However, this is offset by the benefit of less *long-term* abd_cache fragmentation, and less complete collapses of ARC in the face of system memory pressure (since less is "stuck" in vmem). ARC consequently will stay at its equilibrium more often than near its minimum. This is demonstrated by a generally lower overall total held memory (kstat.spl.misc.spl_misc.os_mem_alloc), except on systems with essentially no memory pressure, or systems which have been sysctl-tuned for different behaviour.
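The abd_cache mitigation described above is a small state machine. When the gap between mem_import and mem_inuse is large, set arc_no_grow. If the gap persists for several minutes, apply a small arc_reduce_target_size(). The sketch below models that decision as a pure function so it can be tested. The commit does not state the real thresholds, so the 25% gap threshold and the 5-minute patience are assumptions. The function name and the action enum are hypothetical; the caller is assumed to translate the result into the real arc_no_grow / arc_reduce_target_size() calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed thresholds; the actual values are not given in the commit. */
#define	GAP_PCT_THRESHOLD	25		/* gap >= 25% of import is "large" */
#define	GAP_PATIENCE_SECS	(5 * 60)	/* "several minutes" */

typedef enum { FRAG_OK, FRAG_NO_GROW, FRAG_REDUCE } frag_action_t;

typedef struct {
	bool tracking;		/* a large gap is currently being watched */
	uint64_t gap_since;	/* when the current wait period started (secs) */
} frag_state_t;

/*
 * inuse/import correspond to kstat.vmem.vmem.abd_cache.mem_inuse and
 * mem_import; now is a monotonic time in seconds.
 */
static frag_action_t
abd_frag_check(frag_state_t *st, uint64_t inuse, uint64_t import,
    uint64_t now)
{
	uint64_t gap = (import > inuse) ? import - inuse : 0;
	bool large = import > 0 && gap * 100 >= import * GAP_PCT_THRESHOLD;

	if (!large) {
		st->tracking = false;	/* organic ARC activity closed the gap */
		return (FRAG_OK);
	}
	if (!st->tracking) {
		st->tracking = true;
		st->gap_since = now;
	}
	if (now - st->gap_since >= GAP_PATIENCE_SECS) {
		st->gap_since = now;	/* restart the wait after each reduction */
		return (FRAG_REDUCE);	/* caller: small arc_reduce_target_size() */
	}
	return (FRAG_NO_GROW);		/* caller: set arc_no_grow */
}
```

Restarting the clock after each reduction produces the "fairly frequent" small target reductions the commit describes, rather than one large cut.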
macOS: compile fixes after rebase
macOS: connect SEEK_HOLE SEEK_DATA to ioctl
macOS: Only call vnode_specrdev() when valid
macOS: Use VNODE_RELOAD in iterate in the hope of avoiding ZFS calling back in VNOP_INACTIVE
macOS: zfs_kmod_fini() calls taskq_cancel_id(), so we must unload system_taskq_fini() after the call to zfs_kmod_fini()
macOS: shellcheck error
macOS: Setting landmines causes panic on M1
"panicString" : "panic(cpu 1 caller 0xfffffe001db72dc8): Break 0xC470 instruction exception from kernel. Ptrauth failure with IA key resulted in 0x2000000000000001 at pc 0xfffffe001c630880, lr 0x8afcfe001c630864 (saved state: 0xfffffe309386b180)
macOS: vget should only look up direct IDs
macOS: rootzp left z_projid uninitialised, causing z_projid to have "0xBADDCAFEBADDCAFE" initially, and zfs_link() to return EXDEV due to differing z_projid, presenting the user with "Cross-device link". This would only happen after loading the kext, on the root znode.
macOS: Update installer rtf
macOS: update and correct the kext_version
macOS: Update copyright, fix url and versions
macOS ARC memory improvements and old code removal. macOS_pure "purification" in spl-[kv]mem, coupled with the new dynamics of trying to contain the split between inuse and allocated in the ABD vmem arena, produces less memory greed. As a result, we don't have to do as much policing of memory consumption, and we can rely on more common/cross-platform code for a number of commonplace calculations and adjustments of ARC variables. Additionally:
* Greater niceness in spl_free_thread: when we see pages are wanted (but no xnu pressure), react more strongly. Notably, if we are within 64MB of zfs's memory ceiling, clamp spl_free to a maximum of 32MB.
* following recent fixes to abd_os.c, revert to KMC_NOTOUCH at abd_chunk kmem cache creation time to turn off BUFTAG|CONTENTS|LITE, thus avoiding allocations of many, many extra 4k chunks in DEBUG builds.
* Double prepopulation of kmem_taskq entries: kmem_cache_applyall() makes this busy, and we want at least as many entries as we have kmem caches at kmem_reqp() time.
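The spl_free_thread change in the list above clamps the free-memory figure that ZFS reports to the ARC when memory is tight. The sketch below shows that rule under stated assumptions. It treats "pages are wanted" as a nonzero page count, as with vm.page_free_wanted. It measures zfs's allocation and ceiling in bytes, and the function name clamp_spl_free() is hypothetical. The xnu-pressure path and the rest of spl_free_thread's "react more strongly" logic are out of scope.

```c
#include <assert.h>
#include <stdint.h>

#define	MB	(1024ULL * 1024ULL)

/*
 * Hypothetical sketch: when pages are wanted and zfs's total allocation
 * is within 64 MB of its ceiling, report at most 32 MB of free memory.
 * Otherwise spl_free passes through unchanged.
 */
static int64_t
clamp_spl_free(int64_t spl_free, uint64_t pages_wanted,
    uint64_t zfs_alloc, uint64_t zfs_ceiling)
{
	if (pages_wanted == 0)
		return (spl_free);
	if (zfs_alloc + 64 * MB >= zfs_ceiling &&
	    spl_free > (int64_t)(32 * MB))
		return ((int64_t)(32 * MB));
	return (spl_free);
}
```

Clamping rather than zeroing the value still lets the ARC grow a little, but it keeps growth from overshooting the ceiling while the kernel is asking for pages.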
Add all files required for the macOS port. Add new cmd/os/ for tools which are only expected to be used on macOS. This has support for all macOS version up to Catalina. (Not BigSur). Signed-off-by: Jorgen Lundman <lundman@lundman.net> macOS: big uio change over. Make uio be internal (ZFS) struct, possibly referring to supplied (XNU) uio from kernel. This means zio_crypto.c can now be identical to upstream. Update for draid, and other changes macOS: Use SET_ERROR with uiomove. [squash] macOS: they went and added vdev_draid macOS: compile fixes from rebase macOS: oh cstyle, how you vex me so macOS: They added new methods - squash macOS: arc_register_hotplug for userland too Upstream: avoid warning zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers] kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ macOS: Update zfs_acl.c to latest This includes commits like: 65c7cc4 1b376d1 cfdc432 716b53d a741b38 485b50b macOS: struct vdev changes macOS: cstyle, how you vex me [squash] Upstream: booo Werror booo Upstream: squash baby Not defined gives warnings. Upstream: Include all Makefiles Signed-off-by: Jorgen Lundman <lundman@lundman.net> double draid! macOS: large commit macOS: Use APPLE approved kmem_alloc() macOS: large commit WIP: remove reliance on zfs.exports The memory-pressure has been nerfed, and will not run well until we can find other solutions. The kext symbol lookup we can live without, used only for debug and panic. Use lldb to lookup symbols. leaner! leanerr! remove zfs.export dependency cont. export reduction cont. cont. 
Corrective tweaks for building Correct vnode_iocount() Cleanup pipe wrap code, use pthreads, handle multiple streams latest pipe send with threads sort of works, but bad timing can be deadlock macOS: work out corner case starvation issue in cv_wait_sig() Fix -C in zfs send/recv cv_wait_sig squash Also wrap zfs send resume Implement VOP_LOOKUP for snowflake Finder Don't change date when setting size. Seems to be a weird required with linux, so model after freebsd version macOS: correct xattr checks for uio Fix a noisy source of misleading-indentation warnings Fix "make install" ln -s failures Fix a noisy source of misleading-indentation warnings Fix "make install" ln -s failures fix ASSERT: don't try to peer into opaque vp structure Import non-panicking ASSERT from old spl/include/sys/debug.h Guard with MACOS_ASSERT_SHOULD_PANIC which will do what Linux and FreeBSD do: redefine ASSERTs as VERIFYs. The panic report line will say VERIFY obscuring the problem, and a system panic is harsher (and more dangerous) on MacOS than a zfs-module panic on Linux. ASSERTions: declare assfail in debug.h Build and link spl-debug.c Eliminate spurious "off" variable, use position+offset range Make sure we hold the correct range to avoid panic in dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug). zvol_log_write the range we have written, not the future range silence very noisy and dubious ASSERT macOS: M1 fixes for arm64. sysctl needs to use OID2 Allocs needs to be IOMalloc_aligned Initial spl-vmem memory area needs to be aligned to 16KB No cpu_number() for arm64. macOS: change zvol locking, add zvol symlinks macOS: Return error on UF_COMPRESSED This means bsdtar will be rather noisy, but we prefer noise over corrupt files (all files would be 0-sized). 
usr/bin/zprint: Failed to set file flags -rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint usr/bin/zprint: Failed to set file flags -rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint Actually include zedlet for zvols macOS: Fix Finder crash on quickview, SMB error codes xattr=sa would return a negative return code, a hangover from ZoL code. Only set size if passed a ptr. Convert negative error codes back to normal. Add LIBTOOLFLAGS for macports toolchain This will replace PR#23 macOS zpool import fixes The new codebase uses a mixture of thread pools and lio_listio async I/O, and on macOS there are low AIO limits; when those are reached, lio_listio() returns EAGAIN when probing several prospective leaf vdevs concurrently, looking for labels. We should not abandon probing a vdev in this case, and can usually recover by trying again after a short delay. (We continue to treat other errnos as unrecoverable for that vdev, and only try to recover from EAGAIN a few times.) Additionally, take logic from old o3x and don't probe a variety of devices commonly found in /dev/XXX, as they either produce side-effects or are simply wasted effort. Finally, add a trailing / that FreeBSD and Linux both have. listxattr may not expose com.apple.system xattr=sa We need to ask IOMallocAligned for the enclosing POW2 vmem_create() arenas want at least natural alignment for the spans they import, and will panic if they don't get it. For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE. Otherwise align on the enclosing power of two for any osif_malloc allocation up to 2^32. Anything that asks osif_malloc() for more than that is almost certainly a bug, but we can try aligning on PAGESIZE anyway, rather than extend the enclosing-power-of-two device to handle 64-bit allocations. Simplify the creation of bucket arenas, and adjust their quanta.
This results in handing back considerably more (and smaller) chunks of memory to osif_free if there is pressure, and reduces waits in xnu_alloc_throttled(), so it is a performance win for a busy memory-constrained system. Finally, uncomment some valid code that might be used by future callers of vmem_xcreate(). use vmem_xalloc to match the vmem_xfree of initial dynamic alloc vmem_alloc() breaks the initial large vmem_add() allocation into smaller chunks in an effort to have a large number of vmem segments in the arena. This arena does not benefit from that. Additionally, in vmem_fini() we call vmem_xfree() to return the initial allocation because it is done after almost everything has been pulled down. Unfortunately vmem_xfree() returns the entire initial allocation as a single span. IOFree() checks a variable maintained by the IOMalloc* allocators which tracks the largest allocation made so far, and will panic when (as is almost always the case) the initial large span is handed to it. This usually manifests as a panic or hang on kext unload, or a hang at reboot. Consequently, we will now use vmem_xalloc() for this initial allocation; vmem_xalloc() also lets us explicitly specify the natural alignment we want for it. zfs_rename SA_ADDTIME may grow SA Avoid: zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2 -> 674 panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n", 675 (u_longlong_t)db->db.db_object, db->db_level, 676 (u_longlong_t)db->db_blkid); zfs diff also needs to be wrapped. Replace call to pipe() with a couple of open(mkfifo) instead. Upstream: cstyle zfs_fm.c macOS: cstyle baby IOMallocAligned() should call IOFreeAligned() macOS: zpool_disable_volumes v1 When exporting, also kick mounted zvols offline macOS: zpool_disable_volumes v2 When exporting zvols, check IOReg for the BSDName, instead of using readlink on the ZVOL symlinks. Also check if APFS has made any synthesized disks, and ask them to unmount first.
./scripts/cmd-macos.sh zpool export BOOM Exporting 'BOOM/volume' ... asking apfs to eject 'disk5' Unmount of all volumes on disk5 was successful ... asking apfs to eject 'disk5s1' Unmount of all volumes on disk5 was successful ... asking ZVOL to export 'disk4' Unmount of all volumes on disk4 was successful zpool_disable_volume: exit macOS: Add libdiskmgt and call inuse checks macOS: compile fixes from rebase macOS: oh cstyle, how you vex me so macOS: They added new methods - squash macOS: arc_register_hotplug for userland too macOS: minor tweaks for libdiskmgt macOS: getxattr size==0 is to look up size Also skip the ENOENT return for "zero" finderinfo, as we do not skip over them in listxattr. macOS: 10.9 compile fixes macOS: go to rc2 macOS: kstat string handling should copyin. cstyle baby macOS: Initialise ALL quota types The projectid, userobj, groupobj and projectobj quotas were missed. macOS: error check sysctl for older macOS Wooo cstyle, \o/ Make arc sysctl tunables work (#27) * use an IOMallocAligned for a PAGE_SIZE allocation * we should call arc_kstat_update_osx() Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do anything because arc_kstat_update_osx() was removed at the same time the (obsoleted by upstream) arc_kstat_update() was removed from zfs_kstat_osx.c. Put it back. * when we sysctl arc tunables, call arc_tuning_update() * rely on upstream's sanity checking Simplification which also avoids spurious CMN_WARN messages caused by setting the arcstat variables here, when upstream's arc_tuning_update() checks that they differ from the tunable variables. * add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent Both are in upstream's arc_tuning_update(). zfs_arc_sys_free controls the amount of memory that ARC will leave free, which is roughly what lundman wants for putting some sort of cap on memory use. * cstyle macOS: set UIO direction, to receive xattr from XNU macOS: ensure uio is zeroed in case XNU uio is NULL.
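A minimal C sketch of the osif_malloc() alignment policy from the "We need to ask IOMallocAligned for the enclosing POW2" commit above: page alignment for sub-page requests, the enclosing power of two up to 2^32, and page alignment again beyond that. The names sketch_osif_alignment() and enclosing_pow2() and the fixed 4 KiB SKETCH_PAGESIZE are assumptions for illustration, not the SPL's actual code (Apple Silicon uses 16 KiB pages).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for this sketch (4 KiB); Apple Silicon uses 16 KiB. */
#define SKETCH_PAGESIZE ((uint64_t)4096)

/* Smallest power of two that is >= n, for 1 <= n <= 2^32. */
static uint64_t
enclosing_pow2(uint64_t n)
{
	uint64_t p = 1;

	assert(n >= 1 && n <= ((uint64_t)1 << 32));
	while (p < n)
		p <<= 1;
	return (p);
}

/*
 * Alignment an osif_malloc()-style wrapper would request from
 * IOMallocAligned(), following the policy in the commit message.
 */
static uint64_t
sketch_osif_alignment(uint64_t size)
{
	/* Sub-PAGESIZE requests: align on PAGESIZE. */
	if (size < SKETCH_PAGESIZE)
		return (SKETCH_PAGESIZE);
	/* Up to 2^32: align on the enclosing power of two. */
	if (size <= ((uint64_t)1 << 32))
		return (enclosing_pow2(size));
	/* Larger requests are almost certainly a bug; fall back to PAGESIZE. */
	return (SKETCH_PAGESIZE);
}
```

Rounding the alignment up to a power of two means a span always starts on a multiple of its own rounded-up size, which is the natural alignment vmem_create() arenas expect for imported spans.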
Fix zfs_vnop_getxattr (openzfs#28) "xattr -l <file>" would return inconsistent garbage, especially from non-com.apple.FinderInfo xattrs. The UIO_WRITE was a 0 (UIO_READ) in the previous commit; change it. Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c, for cheap extra safety. launch `zpool import` through launchd in the startup script (#26) Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com> cstyle macOS: correct dataset_kstat_ logic and kstat leak. dataset_kstat_create() will allocate a string and set it before calling kstat_create(), so we cannot set strings to NULL. Likewise, we cannot bulk free strings on unload; we have to rely on the caller of kstat to do so (which is proper). Add calls to dataset_kstat for datasets and zvol. kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM kstat.zfs/BOOM.dataset.objset-0x36.writes: 0 kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0 kstat.zfs/BOOM.dataset.objset-0x36.reads: 11 kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810 kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0 kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0 macOS: remove "no previous prototype for function" warning macOS: correct openat wrapper build fixes re TargetConditionals.h (openzfs#30) AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include. Memory fixes on macOS_pure (openzfs#31) * Improve memory handling on macOS * remove obsolete/unused zfs_file_data/zfs_metadata caching * In the new code base, we use upstream's zio.c without modification, and so the special zio caching code became entirely vestigial, and likely counterproductive. * and make busy ABD better behaved on a busy macOS box Post-ABD we no longer gained much benefit in the old code base from the complicated special handling for the caches created in zio.c. As there's only really one size of ABD allocation, we do not need a qcache layer as in 1.9.
Instead use an arena with VMC_NO_QCACHE set to ask for 256k chunks. * don't reap extra caches in arc_kmem_reap_now() KMF_LITE in DEBUG build is OK * build fixes re TargetConditionals.h AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include. Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33) * other minor changes in vdev_disk Thread and taskq fixing (openzfs#32) Highlights: * thread names for spindump * some taskq_d is safe and useful * reduce thread priorities * use throughput & latency QOS * TIMESHARE scheduling * passivate some IO * Pull in relevant changes from old taskq_fixing branch 1.9 experimentation pulled into 2.x * add throttle_set_thread_io_policy to zfs.exports * selectively re-enable TASKQ_DYNAMIC also drop wr_iss zio taskqs even further in priority (cf. FreeBSD) * reduce zvol taskq priority * make system_taskq dynamic * experimentally allow three more taskq_d * lower thread priorities overall On an M1 with no zfs whatsoever, the highest priority threads are in the mid 90s, with most kernel threads at priority 81 (basepri). With so many maxclsyspri threads in zfs, we would starve out important things like vm_pageout_scan (pri 91), sched_maintenance_thread (pri 95), and numerous others. Moreover, ifnet_start_{interfaces} are all priority 82. We should drop minclsyspri below 81, have defclsyspri at no more than 81, and make sure we have few threads above 89.
* some tidying up of lowering of priority Thread and taskq fixing * fix old code pulled into spa.c, and further lower priorities * Thread and taskq fixing drop xnu priorities by one update a comment block set USER_INITIATED throughput QOS on TIMESHARE taskq threads don't boost taskq threads accidentally don't let taskq threads be pri==81 don't let o3x threads have importance > 0 apply xnu thread policies to taskq_d threads too Assuming this works, it calls out for DRY refactoring with the other two flavours that operate on current_thread(). simplify in spa.c make practically all the taskqs TIMESHARE Revert "apply xnu thread policies to taskq_d threads too" Panic in VM This reverts commit 39f93be. Revert "Revert "apply xnu thread policies to taskq_d threads too"" I see what happened now. This reverts commit 75619f0. adjust thread not the magic number refactor setting thread qos make DRY refactor rebuild this includes userland TASKQ_REALLY_DYNAMIC fixes fix typo set thread names for spindump visibility cstyle Upstream: Add --enable-macos-impure to autoconf Controls -DMACOS_IMPURE Signed-off-by: Jorgen Lundman <lundman@lundman.net> macOS: Add --enable-macos-impure switch to missing calls. Call the wrapped spl_throttle_set_thread_io_policy Add spl_throttle_set_thread_io_policy to headers macOS: vdev_file should use file_taskq Also clean up spl-taskq to have taskq_wait_outstanding() in preparation for one day implementing it. Change alloc to zalloc in zfs_ctldir.c Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34) macOS: change both alloc to zalloc macOS: mutex_tryenter can be used while holding zstd uses mutex_tryenter() to check whether it is already holding the mutex.
Can't find any implementations that object to it, so change our spl-mutex.c accordingly. Tag zfs-2.0.0rc4 macOS: return error from uiomove instead of panic macOS: Skip known /dev entry which hangs macOS: Give better error msg when features are needed for crypto Using a 1.9.4 crypto dataset now requires userobj and projectquota. Alert the user to activate said features to mount a crypt dataset. There is no going back to 1.9.4 after the features are enabled. macOS: Revert to pread() over AIO due to platform issues. We see waves of EAGAIN errors from lio_listio() on Big Sur (but not Catalina), which could stem from recent changes to AIO in XNU. For now, we will go with the classic label read. Re-introduce a purified memory pressure handling mechanism (openzfs#35) * Introduce pure pressure-detecting-and-reacting system * "pure" -- no zfs.exports requirement * plumb in mach_vm_pressure_level_monitor() and mach_vm_pressure_monitor() calls to maintain a reduced set of inputs into previous signalling into (increasingly shared with upstream) arc growth or shrinking policy * introduce mach_vm_pressure kstats which can be compared with userland-only sysctls: kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0 kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0 kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0 vm.page_free_wanted: 0 vm.page_free_count: 25,545 vm.page_speculative_count: 148,572 * and a start on tidying and obsolete code elimination * make arc_default_max() much bigger Optional: can be squashed into main pressure commit, or omitted. Users can use zsysctl.conf or manual setting of kstat.zfs.darwin.tunable.zfs_arc_max to override whichever default is chosen (this one, or the one it replaces). Allmem is already deflated during initialization, so this patch raises the un-sysctled ARC maximum from 1/6 to 1/2 of physmem.
* handle (vmem) abd_cache fragmentation after arc shrink When arc shrinks due to a significant pressure event, the abd_chunk kmem cache will free slabs back to the vmem abd_cache, and this memory can be several gigabytes. Unfortunately, multi-threaded concurrent kmem_cache allocation in the first place, and a priori unpredictable arc object lifetimes, mean that abds held by arc objects may be scattered across multiple slabs, with different objects interleaved within slabs. Thus after a moderate free, the vmem cache can be fragmented, and this is seen as (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much smaller than (sysctl) kstat.vmem.vmem.abd_cache.mem_import, the latter of which may even be stuck at approximately the same value as before the arc free and kmem_cache reap. When there is a large difference between import and inuse, we set arc_no_grow in hopes that ongoing arc activity will defragment organically. This works better with more arc read/write activity after the free, and almost not at all if there is almost no activity after the free. We also add BESTFIT policy to abd_arena experimentally BESTFIT: look harder to place an abd chunk in a slab rather than placing it in the first slot that is definitely large enough, which breaks the vmem constant-time allocation guarantee, although that is less important for this particular vmem arena because of the strong modality of allocations from the abd_chunk cache (its only client). Additionally, reduce the abd_cache arena import size to 128k from 256k; the increase in allocation and free traffic between it and the heap is small compared to the gain under this new anti-fragmentation scheme. * some additional tidying in arc_os.c Tag macos-2.0.0-rc5 abd_cache fragmentation mitigation (openzfs#36) * printf->dprintf HFS_GET_BOOT_INFO Periodically there will be huge numbers of these printfs, and they are not really useful except when debugging vnops.
* Mitigate fragmentation in vmem.abd_cache In macOS_pure the abd_chunk kmem cache is parented to the abd_cache vmem arena to avoid sometimes-heavy ARC allocation and free stress on the main kmem cache, and because abd_chunk has such a strongly modal page-sized allocation size. Additionally, abd_chunk allocations and frees come in gangs, often with high multi-thread concurrency. It is that latter property which is the primary source of arena fragmentation, and it will affect any vmem arena directly underneath the abd_chunk kmem cache. Because we have a vmem parent solely for abd_chunk, we can monitor that parent for various patterns and react to them. This patch monitors the difference between the variables exported as kstat.vmem.vmem.abd_cache.mem_inuse and kstat.vmem.vmem.abd_cache.mem_import, watching for a large gap between the two, which can arise after an ARC shrink returns many slabs from the abd_chunk kmem cache to the abd_cache arena, as vmem segments still contain slabs which hold still-alive abds. When there is a significant gap, we turn on arc_no_grow and hope that organic ARC activity reduces the gap. If after several minutes this is not the case, a small arc_reduce_target_size() is applied. In comparison with previous behaviour, ARC equilibrium sizes will tend to be slightly -- but not enormously -- lower because the arc target size reduction is made fairly frequently. However, this is offset by the benefit of less *long-term* abd_cache fragmentation, and fewer complete collapses of ARC in the face of system memory pressure (since less is "stuck" in vmem). ARC consequently will stay at its equilibrium more often than near its minimum. This is demonstrated by a generally lower overall total held memory (kstat.spl.misc.spl_misc.os_mem_alloc) except on systems with essentially no memory pressure, or systems which have been sysctl-tuned for different behaviour.
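The import-versus-inuse monitoring described in the abd_cache fragmentation mitigation above can be sketched as a small state machine that is polled periodically. This is an illustration under stated assumptions: the 512 MiB gap threshold, the 300-second grace period standing in for "several minutes", and all names here are hypothetical; the commit itself watches the values exported as kstat.vmem.vmem.abd_cache.mem_import and mem_inuse and applies arc_reduce_target_size().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning values; the commit gives no exact numbers. */
#define ABD_GAP_MIN_BYTES	((uint64_t)512 << 20)	/* 512 MiB */
#define ABD_GAP_GRACE_SECS	300			/* "several minutes" */

typedef enum {
	ABD_OK,			/* gap small: ARC may grow */
	ABD_NO_GROW,		/* gap large: set arc_no_grow and wait */
	ABD_REDUCE_TARGET	/* gap persisted: keep no_grow, cut target a little */
} abd_action_t;

typedef struct {
	bool	 gap_seen;	/* a large gap is currently being tracked */
	uint64_t gap_since;	/* when (seconds) tracking started */
} abd_frag_state_t;

/* Called periodically with the arena's mem_inuse and mem_import values. */
static abd_action_t
abd_frag_check(abd_frag_state_t *st, uint64_t inuse, uint64_t import,
    uint64_t now)
{
	uint64_t gap = (import > inuse) ? import - inuse : 0;

	if (gap < ABD_GAP_MIN_BYTES) {
		st->gap_seen = false;	/* gap closed organically */
		return (ABD_OK);
	}
	if (!st->gap_seen) {
		st->gap_seen = true;
		st->gap_since = now;
	}
	if (now - st->gap_since < ABD_GAP_GRACE_SECS)
		return (ABD_NO_GROW);	/* give ARC activity time to defragment */

	/* Restart the clock so reductions stay small and spaced out. */
	st->gap_since = now;
	return (ABD_REDUCE_TARGET);
}
```

Restarting the clock after each reduction is what makes the target cuts "small" and "fairly frequent" rather than one large collapse, which matches the trade-off the commit message describes.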
macOS: Additional 10.9 fixes that missed the boat Tidying nvram zfs_boot=pool (openzfs#37) If zfs_boot is set we run a long-lived zfs_boot_import_thread, which can stay running until the kernel module is running _fini() functions at unload or shutdown. This patch dispatches it on a zfs_boot() taskq, to avoid causing a hang at the taskq_wait_outstanding(system_taskq, 0) in zvol.c's zvol_create_minors_recursive(), which would prevent pool imports from finishing if the pool contained zvols. (Symptoms: "zpool import" does not exit for any pool, system does not see any zvols.) This exposed a long-term race condition in our zfs_boot.cpp: the notifier can cause the mutex_enter(&pools->lock) in zfs_boot_probe_media to be reached before the mutex_enter() after the notifier was created. The use of the system_taskq was masking that, by quietly imposing a serialization choke. Moving the mutex and cv initialization earlier -- in particular before the notifier is created -- eliminates the race. Further tidying in zfs_boot.cpp, including some cstyling, switching to _Atomic instead of volatile. Volatile is for effectively random reads; _Atomic is for when we want many readers to have a consistent view after the variable is written. Finally, we need TargetConditionals.h in front of AvailabilityMacros.h in order to build. Add includes to build on Big Sur with macports-clang-11 (openzfs#38) * TargetConditionals.h before all AvailabilityMacros.h * add several TargetConditionals.h and AvailabilityMacros.h Satisfy the picky macports-clang-11 toolchain on Big Sur. macOS: clean up large build, indicate errors. Fix debug macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit macOS: rename net.lundman. -> org.openzfsonosx. macOS: Tag va_mode for upstream ASSERTS XNU sets va_type = VDIR, but does not bother with va_mode. However ZFS checks to confirm S_ISDIR is set in mkdir. macOS: Fix zfs_ioc_osx_proxy_dataset for datasets It was defined as a _pool() ioctl.
While we are here changing things, change it into a new-style ioctl instead. This should fix non-root datasets mounting as a proxy (devdisk=on). cstyle macOS: setxattr debug prints left in macOS: don't create DYNAMIC with _ent taskq macOS: Also uninstall new /usr/local/zfs before install macos-2.0.0-rc6 macOS: strcmp deprecated after macOS 11 macOS: pkg needs to notarize at the end macOS: strdup strings in getmntent mirrored on FreeBSD. macOS: remove debug print macOS: unload zfs, not openzfs macOS: actually include the volume icon file as well also update to PR macOS: prefer disk over rdisk macOS: devdisk=off mimic=on needs to check for dataset Datasets with devdisk=on will be in ioreg; with it off and mimic=on, it needs to handle: BOOM/fs1 /Volumes/BOOM/fs1 by testing if "BOOM/fs1" is a valid dataset. fixifx macOS: doubled up "int rc" losing return code Causing misleading messages macOS: zfsctl was sending from IDs macOS: let zfs mount as user succeed If the "mkdir" can succeed (home dir etc., as opposed to /Volumes) then let the mount happen. macOS: Attempt to implement taskq_dispatch_delay() It is frequently used with taskq_cancel_id() to stop the taskq from calling `func()` before the timeout expires. Currently implemented by the taskq sleeping in cv_timedwait() until the timeout expires, or until it is signalled by taskq_cancel_id(). This seems a little undesirable; could we build an ordered list of delayed taskqs, and only place them to run once the timeout has expired, leaving the taskq available to work instead of delaying? macOS: Separate unmount and proxy_remove When proxy_remove is called at the tail end of unmount, we get the alert about "ejecting before disconnecting device". To mirror the proxy create, we make it a separate ioctl, and issue it after unmount completes. macOS: explicitly call setsize with O_TRUNC It appears O_TRUNC does nothing, like the goggles.
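The first taskq_dispatch_delay() attempt described above (the taskq thread sleeps in cv_timedwait() until the timeout expires or taskq_cancel_id() signals it) maps onto a userland POSIX-threads sketch like the one below. pthread_cond_timedwait() stands in for the SPL's cv_timedwait(), and delayed_task_t, its helpers, and count_calls() are hypothetical names, not the port's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for one delayed taskq entry. */
typedef struct {
	pthread_mutex_t lock;
	pthread_cond_t  cv;
	bool            cancelled;
	void          (*func)(void *);
	void           *arg;
} delayed_task_t;

static void
delayed_task_init(delayed_task_t *t, void (*func)(void *), void *arg)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->cv, NULL);
	t->cancelled = false;
	t->func = func;
	t->arg = arg;
}

/*
 * Runs on the taskq thread: sleep until the deadline or until cancelled.
 * The thread is tied up for the whole delay, which is the drawback the
 * commit message points out. Returns true if func() was called.
 */
static bool
delayed_task_run(delayed_task_t *t, long delay_ms)
{
	struct timespec deadline;
	int rc = 0;
	bool run;

	/* TIME_UTC reads the wall clock a default condvar waits against. */
	timespec_get(&deadline, TIME_UTC);
	deadline.tv_sec += delay_ms / 1000;
	deadline.tv_nsec += (delay_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&t->lock);
	/* 0 may be a spurious wakeup, so re-check; ETIMEDOUT ends the wait. */
	while (!t->cancelled && rc == 0)
		rc = pthread_cond_timedwait(&t->cv, &t->lock, &deadline);
	run = !t->cancelled;
	pthread_mutex_unlock(&t->lock);

	if (run)
		t->func(t->arg);
	return (run);
}

/* Runs on the cancelling thread, in the role of taskq_cancel_id(). */
static void
delayed_task_cancel(delayed_task_t *t)
{
	pthread_mutex_lock(&t->lock);
	t->cancelled = true;
	pthread_cond_broadcast(&t->cv);
	pthread_mutex_unlock(&t->lock);
}

/* Example callback: counts how many times it was invoked. */
static void
count_calls(void *arg)
{
	(*(int *)arg)++;
}
```

In use, one thread calls delayed_task_run(&t, 500) while another may call delayed_task_cancel(&t); a cancel that arrives first wakes the sleeper immediately and func() never runs.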
macOS: Add O_APPEND to zfs_file_t It is currently not used, but since it was written for a test case, we might as well keep it. macOS: Pass fd_offset between kernel and userland. macOS: Missing return in non-void function macOS: finally fix taskq_dispatch_delay() you find a bug, you own the bug. macOS: add missing kstats macOS: restore the default system_delay_taskq macOS: don't call taskq_wait in taskq_cancel macOS: fix taskq_cancel_id() We need to make sure the taskq has finished before returning in taskq_cancel_id(), so that the taskq doesn't get a chance to run after. macOS: correct 'hz' to 100. sysctl kern.clockrate: 100 sleeping for 1 second. bolt: 681571 sleep() 35 bolt: 681672: diff 101 'hz' is definitely 100. macOS: implement taskq_delay_dispatch() Implement delayed taskqs by adding them to a list sorted by wake-up time, with a dispatcher thread which sleeps until the soonest taskq is due. taskq_cancel_id() will remove the task from the list if present. macOS: ensure we use the 1024 version of struct statfs and avoid coredump if passed zhp == NULL. macOS: fix memory leak in xattr_list macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE This is automatically set by default in userland if the deployment target is > 10.5 macOS: Fix watchdog unload and delay() macOS: improve handling of invariant disks Don't prepend /dev to all paths not starting with /dev, as InvariantDisks places its symlinks in /var/run/disk/by-*, not /dev/disk/by-*. Also, merge in some tweaks from Linux's zpool_vdev_os.c, such as only using O_EXCL with spares. macOS: remove zfs_unmount_006_pos from large. Results in KILLED.
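The later taskq_delay_dispatch() design (a list sorted by wake-up time that a dispatcher drains as entries come due, with taskq_cancel_id() unlinking an entry that is still pending) might look like this single-threaded C sketch. Locking, the dispatcher thread itself, and the hand-off to the taskq are omitted; the delay_list_* names and the use of 0 as "no entry" are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pending entry; wake is an absolute time, e.g. in ticks. */
typedef struct delayed_ent {
	uint64_t            id;
	uint64_t            wake;
	struct delayed_ent *next;
} delayed_ent_t;

typedef struct {
	delayed_ent_t *head;	/* sorted by wake, soonest first */
	uint64_t       next_id;
} delay_list_t;

/* Insert in wake-time order; returns the id used to cancel (0 on failure). */
static uint64_t
delay_list_add(delay_list_t *dl, uint64_t wake)
{
	delayed_ent_t *e = malloc(sizeof (*e));
	delayed_ent_t **pp = &dl->head;

	if (e == NULL)
		return (0);
	e->id = ++dl->next_id;
	e->wake = wake;
	/* Equal wake times keep FIFO order. */
	while (*pp != NULL && (*pp)->wake <= wake)
		pp = &(*pp)->next;
	e->next = *pp;
	*pp = e;
	return (e->id);
}

/* In the role of taskq_cancel_id(): false if the entry already left the list. */
static bool
delay_list_cancel(delay_list_t *dl, uint64_t id)
{
	for (delayed_ent_t **pp = &dl->head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			delayed_ent_t *e = *pp;

			*pp = e->next;
			free(e);
			return (true);
		}
	}
	return (false);
}

/* Dispatcher step: pop the soonest entry if due at 'now'; returns id or 0. */
static uint64_t
delay_list_pop_due(delay_list_t *dl, uint64_t now)
{
	delayed_ent_t *e = dl->head;
	uint64_t id;

	if (e == NULL || e->wake > now)
		return (0);
	dl->head = e->next;
	id = e->id;
	free(e);
	return (id);
}
```

A dispatcher thread would sleep until head->wake, call delay_list_pop_due() in a loop, and hand each returned entry to the taskq, so unlike the cv_timedwait() version no taskq thread is tied up while an entry waits.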
Tag macos-2.0.0rc7 macOS: If we don't set SOURCES it makes up zfs.c from nowhere macOS: remove warning macOS: compile fixes after rebase macOS: connect SEEK_HOLE SEEK_DATA to ioctl macOS: Only call vnode_specrdev() when valid macOS: Use VNODE_RELOAD in iterate in the hopes of avoiding ZFS call back in VNOP_INACTIVE macOS: zfs_kmod_fini() calls taskq_cancel_id() so we must call system_taskq_fini() after the call to zfs_kmod_fini() macOS: shellcheck error macOS: Setting landmines causes panic on M1 "panicString" : "panic(cpu 1 caller 0xfffffe001db72dc8): Break 0xC470 instruction exception from kernel. Ptrauth failure with IA key resulted in 0x2000000000000001 at pc 0xfffffe001c630880, lr 0x8afcfe001c630864 (saved state: 0xfffffe309386b180) macOS: vget should only look up direct IDs macOS: rootzp left z_projid uninitialised This caused z_projid to have "0xBADDCAFEBADDCAFE" initially, and zfs_link() to return EXDEV due to differing z_projid, presenting the user with "Cross-device link". It would only happen after loading the kext, on the root znode. macOS: Update installer rtf macOS: update and correct the kext_version macOS: Update copyright, fix url and versions macOS ARC memory improvements and old code removal macOS_pure "purification" in spl-[kv]mem, coupled with the new dynamics of trying to contain the split between inuse and allocated in the ABD vmem arena, produces less memory greed, so we don't have to do as much policing of memory consumption, and lets us rely on more common/cross-platform code for a number of commonplace calculations and adjustments of ARC variables. Additionally: * Greater niceness in spl_free_thread: when we see pages are wanted (but no xnu pressure), react more strongly. Notably, if we are within 64MB of zfs's memory ceiling, clamp spl_free to a maximum of 32MB.
* following recent fixes to abd_os.c, revert to KMC_NOTOUCH at abd_chunk kmem cache creation time, to turn off BUFTAG|CONTENTS|LITE, thus avoiding allocations of many, many extra 4k chunks in DEBUG builds. * Double prepopulation of kmem_taskq entries: kmem_cache_applyall() makes this busy, and we want at least as many entries as we have kmem caches at kmem_reap() time.
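The spl_free_thread rule from the ARC memory improvements commit (within 64MB of zfs's memory ceiling, clamp spl_free to a maximum of 32MB) reduces to a few lines of C. This sketch assumes the clamp applies only while pages are wanted without xnu pressure, since the commit's wording leaves that open, and the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SPL_NEAR_CEILING	((uint64_t)64 << 20)	/* 64 MiB */
#define SPL_FREE_CLAMP		((int64_t)32 << 20)	/* 32 MiB */

/*
 * Return the spl_free value to publish to ARC. Assumption: the clamp
 * applies only while xnu reports pages wanted (but no pressure).
 */
static int64_t
sketch_clamp_spl_free(int64_t spl_free, uint64_t allocated,
    uint64_t ceiling, int pages_wanted)
{
	/* "Within 64MB of the ceiling", written to avoid unsigned underflow. */
	int near_ceiling = (ceiling <= SPL_NEAR_CEILING) ||
	    (allocated >= ceiling - SPL_NEAR_CEILING);

	if (pages_wanted && near_ceiling && spl_free > SPL_FREE_CLAMP)
		return (SPL_FREE_CLAMP);
	return (spl_free);
}
```

Clamping rather than zeroing keeps a small positive spl_free, so ARC stops growing near the ceiling without being pushed into an immediate shrink.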
Add all files required for the macOS port. Add new cmd/os/ for tools which are only expected to be used on macOS. This has support for all macOS version up to Catalina. (Not BigSur). Signed-off-by: Jorgen Lundman <lundman@lundman.net> macOS: big uio change over. Make uio be internal (ZFS) struct, possibly referring to supplied (XNU) uio from kernel. This means zio_crypto.c can now be identical to upstream. Update for draid, and other changes macOS: Use SET_ERROR with uiomove. [squash] macOS: they went and added vdev_draid macOS: compile fixes from rebase macOS: oh cstyle, how you vex me so macOS: They added new methods - squash macOS: arc_register_hotplug for userland too Upstream: avoid warning zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers] kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ macOS: Update zfs_acl.c to latest This includes commits like: 65c7cc4 1b376d1 cfdc432 716b53d a741b38 485b50b macOS: struct vdev changes macOS: cstyle, how you vex me [squash] Upstream: booo Werror booo Upstream: squash baby Not defined gives warnings. Upstream: Include all Makefiles Signed-off-by: Jorgen Lundman <lundman@lundman.net> double draid! macOS: large commit macOS: Use APPLE approved kmem_alloc() macOS: large commit WIP: remove reliance on zfs.exports The memory-pressure has been nerfed, and will not run well until we can find other solutions. The kext symbol lookup we can live without, used only for debug and panic. Use lldb to lookup symbols. leaner! leanerr! remove zfs.export dependency cont. export reduction cont. cont. 
Corrective tweaks for building Correct vnode_iocount() Cleanup pipe wrap code, use pthreads, handle multiple streams latest pipe send with threads sort of works, but bad timing can be deadlock macOS: work out corner case starvation issue in cv_wait_sig() Fix -C in zfs send/recv cv_wait_sig squash Also wrap zfs send resume Implement VOP_LOOKUP for snowflake Finder Don't change date when setting size. Seems to be a weird required with linux, so model after freebsd version macOS: correct xattr checks for uio Fix a noisy source of misleading-indentation warnings Fix "make install" ln -s failures Fix a noisy source of misleading-indentation warnings Fix "make install" ln -s failures fix ASSERT: don't try to peer into opaque vp structure Import non-panicking ASSERT from old spl/include/sys/debug.h Guard with MACOS_ASSERT_SHOULD_PANIC which will do what Linux and FreeBSD do: redefine ASSERTs as VERIFYs. The panic report line will say VERIFY obscuring the problem, and a system panic is harsher (and more dangerous) on MacOS than a zfs-module panic on Linux. ASSERTions: declare assfail in debug.h Build and link spl-debug.c Eliminate spurious "off" variable, use position+offset range Make sure we hold the correct range to avoid panic in dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug). zvol_log_write the range we have written, not the future range silence very noisy and dubious ASSERT macOS: M1 fixes for arm64. sysctl needs to use OID2 Allocs needs to be IOMalloc_aligned Initial spl-vmem memory area needs to be aligned to 16KB No cpu_number() for arm64. macOS: change zvol locking, add zvol symlinks macOS: Return error on UF_COMPRESSED This means bsdtar will be rather noisy, but we prefer noise over corrupt files (all files would be 0-sized). 
usr/bin/zprint: Failed to set file flags~ -rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint usr/bin/zprint: Failed to set file flags -rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint Actually include zedlet for zvols macOS: Fix Finder crash on quickview, SMB error codes xattr=sa would return negative returncode, hangover from ZOL code. Only set size if passed a ptr. Convert negative errors codes back to normal. Add LIBTOOLFLAGS for macports toolchain This will replace PR#23 macOS zpool import fixes The new codebase uses a mixture of thread pools and lio_listio async io, and on macOS there are low aio limits, and when those are reached lio_listio() returns EAGAIN when probing several prospective leaf vdevs concurrently, looking for labels. We should not abandon probing a vdev in this case, and can usually recover by trying again after a short delay. (We continue to treat other errnos as unrecoverable for that vdev, and only try to recover from EAGAIN a few times). Additionally, take logic from old o3x and don't probe a variety of devices commonly found in /dev/XXX as they either produce side-effects or are simply wasted effort. Finally, add a trailing / that FreeBSD and Linux both have. listxattr may not expose com.apple.system xattr=sa We need to ask IOMallocAligned for the enclosing POW2 vmem_create() arenas want at least natural alignment for the spans they import, and will panic if they don't get it. For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE. Otherwise align on the enclosing power of two for any osif_malloc allocation up to 2^32. Anything that asks osif_malloc() for more than that is almost certainly a bug, but we can try aligning on PAGESIZE anyway, rather than extend the enclosing-power-of-two device to handle 64-bit allocations. Simplify the creation of bucket arenas, and adjust their quanta. 
This results in handing back considerably more (and smaller) chunks of memory to osif_free if there is pressure, and reduces waits in xnu_alloc_throttled(), so is a performance win for a busy memory-constrained system. Finally, uncomment some valid code that might be used by future callers of vmem_xcreate(). use vmem_xalloc to match the vmem_xfree of initial dynamic alloc vmem_alloc() breaks the initial large vmem_add() allocation into smaller chunks in an effort to have a large number vmem segments in the arena. This arena does not benefit from that. Additionaly, in vmem_fini() we call vmem_xfree() to return the initial allocation because it is done after almost everything has been pulled down. Unfortunately vmem_xfree() returns the entire initial allocation as a single span. IOFree() checks a variable maintained by the IOMalloc* allocators which tracks the largest allocation made so far, and will panic when (as it almost always is the case) the initial large span is handed to it. This usually manifests as a panic or hang on kext unload, or a hang at reboot. Consequently, we will now use vmem_xalloc() for this initial allocation; vmem_xalloc() also lets us explicitly specify the natural alignement we want for it. zfs_rename SA_ADDTIME may grow SA Avoid: zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2 -> 674 panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n", 675 (u_longlong_t)db->db.db_object, db->db_level, 676 (u_longlong_t)db->db_blkid); zfs diff also needs to be wrapped. Replace call to pipe() with a couple of open(mkfifo) instead. Upstream: cstyle zfs_fm.c macOS: cstyle baby IOMallocAligned() should call IOFreeAligned() macOS: zpool_disable_volumes v1 When exporting, also kick mounted zvols offline macOS: zpool_disable_volumes v2 When exporting zvols, check IOReg for the BSDName, instead of using readlink on the ZVOL symlinks. Also check if apfs has made any synthesized disks, and ask them to unmount first. 
./scripts/cmd-macos.sh zpool export BOOM Exporting 'BOOM/volume' ... asking apfs to eject 'disk5' Unmount of all volumes on disk5 was successful ... asking apfs to eject 'disk5s1' Unmount of all volumes on disk5 was successful ... asking ZVOL to export 'disk4' Unmount of all volumes on disk4 was successful zpool_disable_volume: exit macOS: Add libdiskmgt and call inuse checks macOS: compile fixes from rebase macOS: oh cstyle, how you vex me so macOS: They added new methods - squash macOS: arc_register_hotplug for userland too macOS: minor tweaks for libdiskmgt macOS: getxattr size==0 is to lookup size Also skip the ENOENT return for "zero" finderinfo, as we do not skip over them in listxattr. macOS: 10.9 compile fixes macOS: go to rc2 macOS: kstat string handling should copyin. cstyle baby macOS: Initialise ALL quota types projectid, userobj, groupobj and projectobj, quotas were missed. macOS: error check sysctl for older macOS Wooo cstyle, \o/ Make arc sysctl tunables work (#27) * use an IOMemAligned for a PAGE_SIZE allocation * we should call arc_kstat_update_osx() Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do anything becasue arc_kstat_update_osx() was removed at the same time the (obsoleted by upstream) arc_kstat_update() was removed from zfs_kstat_osx.c. Put it back. * when we sysctl arc tunables, call arc_tuning_update() * rely on upstream's sanity checking Simplification which also avoids spurious CMN_WARN messages caused by setting the arcstat variables here, when upstream's arc_tuning_update() checks that they differ from the tunable variables. * add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent both are in upstream's arc_tuning_update() zfs_arc_sys_free controls the amount of memory that ARC will leave free, which is roughly what lundman wants for putting some sort of cap on memory use. * cstyle macOS: set UIO direction, to receive xattr from XNU macOS: ensure uio is zeroed in case XNU uio is NULL. 
Fix zfs_vnop_getxattr (openzfs#28) "xattr -l <file>" would return inconsistent garbage, especially from non-com.apple.FinderInfo xattrs. The UIO_WRITE was a 0 (UIO_READ) in the previous commit, change it. Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c, for cheap extra safety. launch `zpool import` through launchd in the startup script (#26) Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com> cstyle macOS: correct dataset_kstat_ logic and kstat leak. dataset_kstat_create() will allocate a string and set it before calling kstat_create() - so we cannot set strings to NULL. Likewise, we cannot bulk free strings on unload, we have to rely on the caller of kstat to do so. (Which is proper). Add calls to dataset_kstat for datasets and zvol. kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM kstat.zfs/BOOM.dataset.objset-0x36.writes: 0 kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0 kstat.zfs/BOOM.dataset.objset-0x36.reads: 11 kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810 kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0 kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0 macOS: remove no previous prototype for function macOS: correct openat wrapper build fixes re TargetConditionals.h (openzfs#30) AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include. Memory fixes on macOS_pure (openzfs#31) * Improve memory handling on macOS * remove obsolete/unused zfs_file_data/zfs_metadata caching * In the new code base, we use upstream's zio.c without modification, and so the special zio caching code became entirely vestigial, and likely counterproductive. * and make busy ABD better behaved on busy macOS box Post-ABD we no longer gained much benefit in the old code base from the complicated special handling for the caches created in zio.c. As there's only really one size of ABD allocation, we do not need a qcache layer as in 1.9.
Instead use an arena with VMC_NO_QCACHE set to ask for 256k chunks. * don't reap extra caches in arc_kmem_reap_now() KMF_LITE in DEBUG build is OK * build fixes re TargetConditionals.h AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include. Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33) * other minor changes in vdev_disk Thread and taskq fixing (openzfs#32) Highlights: * thread names for spindump * some taskq_d is safe and useful * reduce thread priorities * use throughput & latency QOS * TIMESHARE scheduling * passivate some IO * Pull in relevant changes from old taskq_fixing branch 1.9 experimentation pulled into 2.x * add throttle_set_thread_io_policy to zfs.exports * selectively re-enable TASKQ_DYNAMIC also drop wr_iss zio taskqs even further in priority (cf freebsd) * reduce zvol taskq priority * make system_taskq dynamic * experimentally allow three more taskq_d * lower thread priorities overall on an M1 with no zfs whatsoever, the highest priority threads are in the mid 90s, with most kernel threads at priority 81 (basepri). with so many maxclsyspri threads in zfs, we would starve out important things like vm_pageout_scan (pri 91), sched_maintenance_thread (pri 95), and numerous others. moreover, ifnet_start_{interfaces} are all priority 82. we should drop minclsyspri below 81, have defclsyspri at no more than 81, and make sure we have few threads above 89.
* some tidying up of lowering of priority Thread and taskq fixing * fix old code pulled into spa.c, and further lower priorities * Thread and taskq fixing drop xnu priorities by one update a comment block set USER_INITIATED throughput QOS on TIMESHARE taskq threads don't boost taskq threads accidentally don't let taskq threads be pri==81 don't let o3x threads have importance > 0 apply xnu thread policies to taskq_d threads too assuming this works, it calls out for DRY refactoring with the other two flavours, that operate on current_thread(). simplify in spa.c make practically all the taskqs TIMESHARE Revert "apply xnu thread policies to taskq_d threads too" Panic in VM This reverts commit 39f93be. Revert "Revert "apply xnu thread policies to taskq_d threads too"" I see what happened now. This reverts commit 75619f0. adjust thread not the magic number refactor setting thread qos make DRY refactor rebuild this includes userland TASKQ_REALLY_DYNAMIC fixes fix typo set thread names for spindump visibility cstyle Upstream: Add --enable-macos-impure to autoconf Controls -DMACOS_IMPURE Signed-off-by: Jorgen lundman <lundman@lundman.net> macOS: Add --enable-macos-impure switch to missing calls. Call the wrapped spl_throttle_set_thread_io_policy Add spl_throttle_set_thread_io_policy to headers macOS: vdev_file should use file_taskq Also cleanup spl-taskq to have taskq_wait_outstanding() in preparation for one day implementing it. Change alloc to zalloc in zfs_ctldir.c Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34) macOS: change both alloc to zalloc macOS: mutex_tryenter can be used while holding zstd uses mutex_tryenter() to check if it already is holding the mutex. 
Can't find any implementations that object to it, so changing our spl-mutex.c Tag zfs-2.0.0rc4 macOS: return error from uiomove instead of panic macOS: Skip known /dev entry which hangs macOS: Give better error msg when features are needed for crypto Using a 1.9.4 crypto dataset now requires the userobj and projectquota features. Alert the user to activate said features to mount the encrypted dataset. There is no going back to 1.9.4 after features are enabled. macOS: Revert to pread() over AIO due to platform issues. We see waves of EAGAIN errors from lio_listio() on BigSur (but not Catalina) which could stem from recent changes to AIO in XNU. For now, we will go with the classic read label. Re-introduce a purified memory pressure handling mechanism (openzfs#35) * Introduce pure pressure-detecting-and-reacting system * "pure" -- no zfs.exports requirement * plumb in mach_vm_pressure_level_monitor() and mach_vm_pressure_monitor() calls to maintain reduced set of inputs into previous signalling into (increasingly shared with upstream) arc growth or shrinking policy * introduce mach_vm_pressure kstats which can be compared with userland-only sysctls: kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0 kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0 kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0 vm.page_free_wanted: 0 vm.page_free_count: 25,545 vm.page_speculative_count: 148,572 * and a start on tidying and obsolete code elimination * make arc_default_max() much bigger Optional: can be squashed into main pressure commit, or omitted. Users can use zsysctl.conf or manual setting of kstat.zfs.darwin.tunable.zfs_arc_max to override whichever default is chosen (this one, or the one it replaces). Allmem is already deflated during initialization, so this patch raises the un-sysctled ARC maximum from 1/6 to 1/2 of physmem.
* handle (vmem) abd_cache fragmentation after arc shrink When arc shrinks due to a significant pressure event, the abd_chunk kmem cache will free slabs back to the vmem abd_cache, and this memory can be several gigabytes. Unfortunately multi-threaded concurrent kmem_cache allocation in the first place, and a priori unpredictable arc object lifetimes mean that abds held by arc objects may be scattered across multiple slabs, with different objects interleaved within slabs. Thus after a moderate free, the vmem cache can be fragmented and this is seen by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much smaller than (sysctl) kstat.vmem.vmem.abd_cache.mem_import, the latter of which may even be stuck at approximately the same value as before the arc free and kmem_cache reap. When there is a large difference between import and inuse, we set arc_no_grow in hopes that ongoing arc activity will defragment organically. This works better with more arc read/write activity after the free, and almost not at all if after the free there is almost no activity. We also add BESTFIT policy to abd_arena experimentally BESTFIT: look harder to place an abd chunk in a slab rather than place in the first slot that is definitely large enough which breaks the vmem constant-time allocation guarantee, although that is less important for this particular vmem arena because of the strong modality of allocations from the abd_chunk cache (its only client). Additionally reduce the abd_cache arena import size to 128k from 256k; the increase in allocation and free traffic between it and the heap is small compared to the gain under this new anti-fragmentation scheme. * some additional tidying in arc_os.c Tag macos-2.0.0-rc5 abd_cache fragmentation mitigation (openzfs#36) * printf->dprintf HFS_GET_BOOT_INFO periodically there will be huge numbers of these printfs, and they are not really useful except when debugging vnops.
* Mitigate fragmentation in vmem.abd_cache In macOS_pure the abd_chunk kmem cache is parented to the abd_cache vmem arena to avoid sometimes-heavy ARC allocation and free stress on the main kmem cache, and because abd_chunk has such a strongly modal page-sized allocation size. Additionally, abd_chunk allocations and frees come in gangs, often with high multi-thread concurrency. It is that latter property which is the primary source of arena fragmentation, and it will affect any vmem arena directly underneath the abd_chunk kmem cache. Because we have a vmem parent solely for abd_chunk, we can monitor that parent for various patterns and react to them. This patch monitors the difference between the variables exported as kstat.vmem.vmem.abd_cache.mem_inuse and kstat.vmem.vmem.abd_cache.mem_import, watching for a large gap between the two, which can arise after an ARC shrink returns many slabs from the abd_chunk kmem cache to the abd_cache arena, as vmem segments still contain slabs which hold still-alive abds. When there is a significant gap, we turn on arc_no_grow and hope that organic ARC activity reduces the gap. If after several minutes this is not the case, a small arc_reduce_target_size() is applied. In comparison with previous behaviour, ARC equilibrium sizes will tend slightly -- but not enormously -- lower because the arc target size reduction is made fairly frequently. However, this is offset by the benefit of less *long-term* abd_cache fragmentation, and less complete collapses of ARC in the face of system memory pressure (since less is "stuck" in vmem). ARC consequently will stay at its equilibrium more often than near its minimum. This is demonstrated by a generally lower overall total held memory (kstat.spl.misc.spl_misc.os_mem_alloc) except on systems with essentially no memory pressure, or systems which have been sysctl-tuned for different behaviour.
macOS: Additional 10.9 fixes that missed the boat Tidying nvram zfs_boot=pool (openzfs#37) If zfs_boot is set we run a long-lived zfs_boot_import_thread, which can stay running until the kernel module is running _fini() functions at unload or shutdown. This patch dispatches it on a zfs_boot() taskq, to avoid causing a hang at the taskq_wait_outstanding(system_taskq, 0) in zvol.c's zvol_create_minors_recursive(), which would prevent pool imports finishing if the pool contained zvols. (Symptoms: "zpool import" does not exit for any pool, system does not see any zvols). This exposed a long-term race condition in our zfs_boot.cpp: the notifier can cause the mutex_enter(&pools->lock) in zfs_boot_probe_media to be reached before the mutex_enter() after the notifier was created. The use of the system_taskq was masking that, by quietly imposing a serialization choke. Moving the mutex and cv initialization earlier -- in particular before the notifier is created -- eliminates the race. Further tidying in zfs_boot.cpp, including some cstyling, switching to _Atomic instead of volatile. Volatile is for effectively random reads; _Atomic is for when we want many readers to have a consistent view after the variable is written. Finally, we need TargetConditionals.h in front of AvailabilityMacros.h in order to build. Add includes to build on Big Sur with macports-clang-11 (openzfs#38) * TargetConditionals.h before all AvailabilityMacros.h * add several TargetConditionals.h and AvailabilityMacros.h Satisfy picky macports-clang-11 toolchain on Big Sur. macOS: clean up large build, indicate errors. Fix debug macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit macOS: rename net.lundman. -> org.openzfsonosx. macOS: Tag va_mode for upstream ASSERTS XNU sets va_type = VDIR, but does not bother with va_mode. However ZFS checks to confirm S_ISDIR is set in mkdir. macOS: Fix zfs_ioc_osx_proxy_dataset for datasets It was defined as a _pool() ioctl.
While we are changing things here, change it into a new-style ioctl instead. This should fix non-root datasets mounting as a proxy (devdisk=on). cstyle macOS: setxattr debug prints left in macOS: don't create DYNAMIC with _ent taskq macOS: Also uninstall new /usr/local/zfs before install macos-2.0.0-rc6 macOS: strcmp deprecated after macOS 11 macOS: pkg needs to notarize at the end macOS: strdup strings in getmntent mirrored on FreeBSD. macOS: remove debug print macOS: unload zfs, not openzfs macOS: actually include the volume icon file as well also update to PR macOS: prefer disk over rdisk macOS: devdisk=off mimic=on needs to check for dataset Datasets with devdisk=on will be in ioreg, with it off and mimic=on then it needs to handle: BOOM/fs1 /Volumes/BOOM/fs1 by testing if "BOOM/fs1" is a valid dataset. fixifx macOS: doubled up "int rc" losing returncode Causing misleading messages macOS: zfsctl was sending from IDs macOS: let zfs mount as user succeed If the "mkdir" can succeed (home dir etc, as opposed to /Volumes) then let the mount be able to happen. macOS: Attempt to implement taskq_dispatch_delay() frequently used with taskq_cancel_id() to stop taskq from calling `func()` before the timeout expires. Currently implemented by the taskq sleeping in cv_timedwait() until timeout expires, or it is signalled by taskq_cancel_id(). Seems a little undesirable; could we build an ordered list of delayed taskqs, and only place them to run once the timeout has expired, leaving the taskq available to work instead of delaying? macOS: Separate unmount and proxy_remove When proxy_remove is called at the tail end of unmount, we get the alert about "ejecting before disconnecting device". To mirror the proxy create, we make it a separate ioctl, and issue it after unmount completes. macOS: explicitly call setsize with O_TRUNC It appears O_TRUNC does nothing, like the goggles.
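The race fixed in zfs_boot.cpp by the "Tidying nvram zfs_boot=pool" commit (openzfs#37) is an initialization-order bug: the mutex and condition variable must exist before the notifier that uses them is registered. The sketch below models it in C with POSIX threads, a hypothetical pools_t, and a thread standing in for the IOKit media notifier; it also shows the _Atomic flag that commit switched to from volatile. None of these names are the actual zfs_boot.cpp code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;
	int		probed;		/* disks reported by the notifier */
	_Atomic bool	terminating;	/* consistent view for all readers */
} pools_t;

/* Hypothetical stand-in for the notifier callback (zfs_boot_probe_media). */
static void *
notifier(void *arg)
{
	pools_t *pools = arg;

	pthread_mutex_lock(&pools->lock);	/* safe: already initialized */
	pools->probed++;
	pthread_cond_signal(&pools->cv);
	pthread_mutex_unlock(&pools->lock);
	return (NULL);
}

static int
start_boot_probe(pools_t *pools, pthread_t *tid)
{
	/*
	 * Initialize the lock and cv BEFORE the notifier exists. Doing it
	 * afterwards lets the callback lock an uninitialized mutex, which
	 * is the race described in the commit message.
	 */
	pthread_mutex_init(&pools->lock, NULL);
	pthread_cond_init(&pools->cv, NULL);
	pools->probed = 0;
	atomic_init(&pools->terminating, false);

	return (pthread_create(tid, NULL, notifier, pools));
}

/* Wait until the notifier has reported at least one disk. */
static int
wait_for_probe(pools_t *pools, pthread_t tid)
{
	int n;

	pthread_mutex_lock(&pools->lock);
	while (pools->probed == 0)
		pthread_cond_wait(&pools->cv, &pools->lock);
	n = pools->probed;
	pthread_mutex_unlock(&pools->lock);
	pthread_join(tid, NULL);
	return (n);
}
```

Because start_boot_probe() finishes all initialization before pthread_create(), it no longer matters whether the notifier or the waiter reaches the mutex first.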
macOS: Add O_APPEND to zfs_file_t It is currently not used, but since it was written for a test case, we might as well keep it. macOS: Pass fd_offset between kernel and userland. macOS: Missing return in non-void function macOS: finally fix taskq_dispatch_delay() you find a bug, you own the bug. macOS: add missing kstats macOS: restore the default system_delay_taskq macOS: don't call taskq_wait in taskq_cancel macOS: fix taskq_cancel_id() We need to make sure the taskq has finished before returning in taskq_cancel_id(), so that the taskq doesn't get a chance to run after. macOS: correct 'hz' to 100. sysctl kern.clockrate: 100 sleeping for 1 second. bolt: 681571 sleep() 35 bolt: 681672: diff 101 'hz' is definitely 100. macOS: implement taskq_delay_dispatch() Implement delayed taskq by adding them to a list, sorted by wake-up time, and a dispatcher thread which sleeps until the soonest taskq is due. taskq_cancel_id() will remove task from list if present. macOS: ensure to use 1024 version of struct statfs and avoid coredump if passed zhp == NULL. macOS: fix memory leak in xattr_list macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE This is automatically set by default in userland if the deployment target is > 10.5 macOS: Fix watchdog unload and delay() macOS: improve handling of invariant disks Don't prepend /dev to all paths not starting with /dev as InvariantDisks places its symlinks in /var/run/disk/by-* not /dev/disk/by-*. Also, merge in some tweaks from Linux's zpool_vdev_os.c such as only using O_EXCL with spares. macOS: remove zfs_unmount_006_pos from large. Results in KILLED.
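The taskq_delay_dispatch() design in the commits above (a list sorted by wake-up time, a dispatcher that sleeps until the earliest entry is due, and taskq_cancel_id() unlinking entries that are still pending) can be sketched as plain linked-list operations. delay_ent_t, delay_insert(), delay_cancel() and delay_pop_due() are hypothetical names; locking, the dispatcher thread itself and its cv_timedwait() are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct delay_ent {
	uint64_t		id;
	uint64_t		wake;	/* absolute wake-up time (ticks) */
	struct delay_ent	*next;
} delay_ent_t;

/* Insert keeping the list sorted by wake time (FIFO for equal times). */
static void
delay_insert(delay_ent_t **head, delay_ent_t *ent)
{
	delay_ent_t **pp = head;

	assert(ent != NULL);
	while (*pp != NULL && (*pp)->wake <= ent->wake)
		pp = &(*pp)->next;
	ent->next = *pp;
	*pp = ent;
}

/* taskq_cancel_id()-style removal: true if the entry was still pending. */
static bool
delay_cancel(delay_ent_t **head, uint64_t id)
{
	for (delay_ent_t **pp = head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			*pp = (*pp)->next;
			return (true);
		}
	}
	return (false);
}

/*
 * Called by the dispatcher after waking: detach the first entry whose time
 * has come, or return NULL. Between calls it sleeps until (*head)->wake.
 */
static delay_ent_t *
delay_pop_due(delay_ent_t **head, uint64_t now)
{
	delay_ent_t *ent = *head;

	if (ent == NULL || ent->wake > now)
		return (NULL);
	*head = ent->next;
	ent->next = NULL;
	return (ent);
}
```

Keeping the list sorted means the dispatcher only ever inspects the head, and a cancelled entry simply disappears from the list before it can be handed to a worker thread.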
Tag macos-2.0.0rc7 macOS: If we don't set SOURCES it makes up zfs.c from nowhere macOS: remove warning macOS: compile fixes after rebase macOS: connect SEEK_HOLE SEEK_DATA to ioctl macOS: Only call vnode_specrdev() when valid macOS: Use VNODE_RELOAD in iterate in the hopes of avoiding ZFS call back in VNOP_INACTIVE macOS: zfs_kmod_fini() calls taskq_cancel_id() so we must unload system_taskq_fini() after the call to zfs_kmod_fini() macOS: shellcheck error macOS: Setting landmines causes panic on M1 "panicString" : "panic(cpu 1 caller 0xfffffe001db72dc8): Break 0xC470 instruction exception from kernel. Ptrauth failure with IA key resulted in 0x2000000000000001 at pc 0xfffffe001c630880, lr 0x8afcfe001c630864 (saved state: 0xfffffe309386b180) macOS: vget should only lookup direct IDs macOS: rootzp left z_projid uninitialised Causing z_projid to have "0xBADDCAFEBADDCAFE" initially, and zfs_link() to return EXDEV due to differing z_projid, presenting the user with "Cross-device link". Would only happen after loading kext, on the root znode. macOS: Update installer rtf macOS: update and correct the kext_version macOS: Update copyright, fix url and versions macOS ARC memory improvements and old code removal macOS_pure "purification" in spl-[kv]mem coupled with the new dynamics of trying to contain the split between inuse and allocated in the ABD vmem arena produce less memory-greed, so we don't have to do as much policing of memory consumption, and lets us rely on some more common/cross-platform code for a number of commonplace calculation and adjustment of ARC variables. Additionally: * Greater niceness in spl_free_thread : when we see pages are wanted (but no xnu pressure), react more strongly. Notably if we are within 64MB of zfs's memory ceiling, clamp spl_free to a maximum of 32MB.
* following recent fixes to abd_os.c, revert to KMC_NOTOUCH at abd_chunk kmem cache creation time, to turn off BUFTAG|CONTENTS|LITE, thus avoiding allocations of many many extra 4k chunks in DEBUG builds. * Double prepopulation of kmem_taskq entries: kmem_cache_applyall() makes this busy, and we want at least as many entries as we have kmem caches at kmem_reap() time. macOS: more work Upstream: zfs_log can't VN_HOLD a possibly unlinked vp Follow in FreeBSD steps, and avoid the first call to VN_HOLD in case it is unlinked, as that can deadlock waiting in vnode_iocount(). Walk up the xattr_parent.
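The spl_free_thread rule in the "macOS ARC memory improvements" commit ("if we are within 64MB of zfs's memory ceiling, clamp spl_free to a maximum of 32MB") is simple arithmetic. A sketch, assuming spl_free is a signed byte count where a negative value would indicate a shortfall, and using a hypothetical helper name clamp_spl_free(); the surrounding "pages wanted but no xnu pressure" condition is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define MiB	(1024ULL * 1024ULL)

/*
 * Hypothetical helper: when ZFS holds memory within 64 MiB of its ceiling
 * (or beyond it), cap the advisory spl_free value at 32 MiB; otherwise
 * leave it unchanged. Negative values (a shortfall) pass through as-is.
 */
static int64_t
clamp_spl_free(int64_t spl_free, uint64_t held, uint64_t ceiling)
{
	const int64_t cap = (int64_t)(32 * MiB);

	assert(ceiling > 0);
	if (held + 64 * MiB >= ceiling && spl_free > cap)
		return (cap);
	return (spl_free);
}
```

Far from the ceiling the value is untouched; near it, ZFS is told there is at most 32 MiB to spare, which discourages further ARC growth.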
Add all files required for the macOS port. Add new cmd/os/ for tools which are only expected to be used on macOS. This has support for all macOS versions up to Catalina. (Not BigSur). Signed-off-by: Jorgen Lundman <lundman@lundman.net> macOS: big uio change over. Make uio be internal (ZFS) struct, possibly referring to supplied (XNU) uio from kernel. This means zio_crypto.c can now be identical to upstream. Update for draid, and other changes macOS: Use SET_ERROR with uiomove. [squash] macOS: they went and added vdev_draid macOS: compile fixes from rebase macOS: oh cstyle, how you vex me so macOS: They added new methods - squash macOS: arc_register_hotplug for userland too Upstream: avoid warning zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers] kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ macOS: Update zfs_acl.c to latest This includes commits like: 65c7cc4 1b376d1 cfdc432 716b53d a741b38 485b50b macOS: struct vdev changes macOS: cstyle, how you vex me [squash] Upstream: booo Werror booo Upstream: squash baby Not defined gives warnings. Upstream: Include all Makefiles Signed-off-by: Jorgen Lundman <lundman@lundman.net> double draid! macOS: large commit macOS: Use APPLE approved kmem_alloc() macOS: large commit WIP: remove reliance on zfs.exports The memory-pressure has been nerfed, and will not run well until we can find other solutions. The kext symbol lookup we can live without, used only for debug and panic. Use lldb to lookup symbols. leaner! leanerr! remove zfs.export dependency cont. export reduction cont. cont.
Corrective tweaks for building Correct vnode_iocount() Cleanup pipe wrap code, use pthreads, handle multiple streams latest pipe send with threads sort of works, but bad timing can deadlock macOS: work out corner case starvation issue in cv_wait_sig() Fix -C in zfs send/recv cv_wait_sig squash Also wrap zfs send resume Implement VOP_LOOKUP for snowflake Finder Don't change date when setting size. Seems to be a weird requirement with Linux, so model after FreeBSD version macOS: correct xattr checks for uio Fix a noisy source of misleading-indentation warnings Fix "make install" ln -s failures Fix a noisy source of misleading-indentation warnings Fix "make install" ln -s failures fix ASSERT: don't try to peer into opaque vp structure Import non-panicking ASSERT from old spl/include/sys/debug.h Guard with MACOS_ASSERT_SHOULD_PANIC which will do what Linux and FreeBSD do: redefine ASSERTs as VERIFYs. The panic report line will say VERIFY, obscuring the problem, and a system panic is harsher (and more dangerous) on MacOS than a zfs-module panic on Linux. ASSERTions: declare assfail in debug.h Build and link spl-debug.c Eliminate spurious "off" variable, use position+offset range Make sure we hold the correct range to avoid panic in dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug). zvol_log_write the range we have written, not the future range silence very noisy and dubious ASSERT macOS: M1 fixes for arm64. sysctl needs to use OID2 Allocs need to be IOMalloc_aligned Initial spl-vmem memory area needs to be aligned to 16KB No cpu_number() for arm64. macOS: change zvol locking, add zvol symlinks macOS: Return error on UF_COMPRESSED This means bsdtar will be rather noisy, but we prefer noise over corrupt files (all files would be 0-sized).
usr/bin/zprint: Failed to set file flags~ -rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint usr/bin/zprint: Failed to set file flags -rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint Actually include zedlet for zvols macOS: Fix Finder crash on quickview, SMB error codes xattr=sa would return negative returncode, hangover from ZOL code. Only set size if passed a ptr. Convert negative error codes back to normal. Add LIBTOOLFLAGS for macports toolchain This will replace PR#23 macOS zpool import fixes The new codebase uses a mixture of thread pools and lio_listio async io, and on macOS there are low aio limits, and when those are reached lio_listio() returns EAGAIN when probing several prospective leaf vdevs concurrently, looking for labels. We should not abandon probing a vdev in this case, and can usually recover by trying again after a short delay. (We continue to treat other errnos as unrecoverable for that vdev, and only try to recover from EAGAIN a few times). Additionally, take logic from old o3x and don't probe a variety of devices commonly found in /dev/XXX as they either produce side-effects or are simply wasted effort. Finally, add a trailing / that FreeBSD and Linux both have. listxattr may not expose com.apple.system xattr=sa We need to ask IOMallocAligned for the enclosing POW2 vmem_create() arenas want at least natural alignment for the spans they import, and will panic if they don't get it. For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE. Otherwise align on the enclosing power of two for any osif_malloc allocation up to 2^32. Anything that asks osif_malloc() for more than that is almost certainly a bug, but we can try aligning on PAGESIZE anyway, rather than extend the enclosing-power-of-two device to handle 64-bit allocations. Simplify the creation of bucket arenas, and adjust their quanta.
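The alignment rule in "We need to ask IOMallocAligned for the enclosing POW2" reduces to a small function. A sketch, assuming a 4 KiB page (as on Intel Macs; Apple Silicon uses 16 KiB pages) and a hypothetical name osif_alignment(); the real osif_malloc() code may compute this differently.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGESIZE	4096ULL	/* assumption: 4 KiB pages */

/* Alignment an osif_malloc()-style allocator would request for `size`. */
static uint64_t
osif_alignment(uint64_t size)
{
	if (size <= SKETCH_PAGESIZE)	/* sub-page: align on PAGESIZE */
		return (SKETCH_PAGESIZE);
	if (size > (1ULL << 32))	/* almost certainly a bug; fall back */
		return (SKETCH_PAGESIZE);

	/* Smallest power of two >= size: the "enclosing" power of two. */
	uint64_t align = SKETCH_PAGESIZE;
	while (align < size)
		align <<= 1;

	assert((align & (align - 1)) == 0);	/* always a power of two */
	return (align);
}
```

For example, a 3 MiB request is aligned on 4 MiB, which gives vmem_create() arenas the naturally aligned spans they expect; anything above 2^32 falls back to page alignment as the commit describes.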
This results in handing back considerably more (and smaller) chunks of memory to osif_free if there is pressure, and reduces waits in xnu_alloc_throttled(), so is a performance win for a busy memory-constrained system. Finally, uncomment some valid code that might be used by future callers of vmem_xcreate(). use vmem_xalloc to match the vmem_xfree of initial dynamic alloc vmem_alloc() breaks the initial large vmem_add() allocation into smaller chunks in an effort to have a large number vmem segments in the arena. This arena does not benefit from that. Additionaly, in vmem_fini() we call vmem_xfree() to return the initial allocation because it is done after almost everything has been pulled down. Unfortunately vmem_xfree() returns the entire initial allocation as a single span. IOFree() checks a variable maintained by the IOMalloc* allocators which tracks the largest allocation made so far, and will panic when (as it almost always is the case) the initial large span is handed to it. This usually manifests as a panic or hang on kext unload, or a hang at reboot. Consequently, we will now use vmem_xalloc() for this initial allocation; vmem_xalloc() also lets us explicitly specify the natural alignement we want for it. zfs_rename SA_ADDTIME may grow SA Avoid: zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2 -> 674 panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n", 675 (u_longlong_t)db->db.db_object, db->db_level, 676 (u_longlong_t)db->db_blkid); zfs diff also needs to be wrapped. Replace call to pipe() with a couple of open(mkfifo) instead. Upstream: cstyle zfs_fm.c macOS: cstyle baby IOMallocAligned() should call IOFreeAligned() macOS: zpool_disable_volumes v1 When exporting, also kick mounted zvols offline macOS: zpool_disable_volumes v2 When exporting zvols, check IOReg for the BSDName, instead of using readlink on the ZVOL symlinks. Also check if apfs has made any synthesized disks, and ask them to unmount first. 
./scripts/cmd-macos.sh zpool export BOOM Exporting 'BOOM/volume' ... asking apfs to eject 'disk5' Unmount of all volumes on disk5 was successful ... asking apfs to eject 'disk5s1' Unmount of all volumes on disk5 was successful ... asking ZVOL to export 'disk4' Unmount of all volumes on disk4 was successful zpool_disable_volume: exit macOS: Add libdiskmgt and call inuse checks macOS: compile fixes from rebase macOS: oh cstyle, how you vex me so macOS: They added new methods - squash macOS: arc_register_hotplug for userland too macOS: minor tweaks for libdiskmgt macOS: getxattr size==0 is to lookup size Also skip the ENOENT return for "zero" finderinfo, as we do not skip over them in listxattr. macOS: 10.9 compile fixes macOS: go to rc2 macOS: kstat string handling should copyin. cstyle baby macOS: Initialise ALL quota types projectid, userobj, groupobj and projectobj, quotas were missed. macOS: error check sysctl for older macOS Wooo cstyle, \o/ Make arc sysctl tunables work (#27) * use an IOMemAligned for a PAGE_SIZE allocation * we should call arc_kstat_update_osx() Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do anything becasue arc_kstat_update_osx() was removed at the same time the (obsoleted by upstream) arc_kstat_update() was removed from zfs_kstat_osx.c. Put it back. * when we sysctl arc tunables, call arc_tuning_update() * rely on upstream's sanity checking Simplification which also avoids spurious CMN_WARN messages caused by setting the arcstat variables here, when upstream's arc_tuning_update() checks that they differ from the tunable variables. * add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent both are in upstream's arc_tuning_update() zfs_arc_sys_free controls the amount of memory that ARC will leave free, which is roughly what lundman wants for putting some sort of cap on memory use. * cstyle macOS: set UIO direction, to receive xattr from XNU macOS: ensure uio is zeroed in case XNU uio is NULL. 
Fix zfs_vnop_getxattr (openzfs#28) "xattr -l <file>" would return inconsistent garbage, especially from non-com.apple.FinderInfo xattrs. The UIO_WRITE was a 0 (UIO_READ) in the previous commit, change it. Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c, for cheap extra safety. launch `zpool import` through launchd in the startup script (#26) Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com> cstyle macOS: correct dataset_kstat_ logic and kstat leak. dataset_kstat_create() will allocate a string and set it before calling kstat_create() - so we can not set strings to NULL. Likewise, we can not bulk free strings on unload, we have to rely on the caller of kstat to do so. (Which is proper). Add calls to dataset_kstat for datasets and zvol. kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM kstat.zfs/BOOM.dataset.objset-0x36.writes: 0 kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0 kstat.zfs/BOOM.dataset.objset-0x36.reads: 11 kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810 kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0 kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0 macOS: remove no previous prototype for function macOS: correct openat wrapper build fixes re TargetConditionals.h (openzfs#30) AvailabilityMacros.h needs TargetConditionals.h defintions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include. Memory fixes on macOS_pure (openzfs#31) * Improve memory handling on macOS * remove obsolete/unused zfs_file_data/zfs_metadata caching * In the new code base, we use upstream's zio.c without modification, and so the special zio caching code became entirely vestigial, and likely counterproductive. * and make busy ABD better behaved on busy macOS box Post-ABD we no longer gained much benefit in the old code base from the complicated special handling for the caches created in zio.c. As there's only really one size of ABD allocation, we do not need a qcache layer as in 1.9. 
Instead use an arena with VMC_NO_QCACHE set to ask for for 256k chunks. * don't reap extra caches in arc_kmem_reap_now() KMF_LITE in DEBUG build is OK * build fixes re TargetConditionals.h AvailabilityMacros.h needs TargetConditionals.h defintions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include. Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33) * other minor changes in vdev_disk Thread and taskq fixing (openzfs#32) Highlights: * thread names for spindump * some taskq_d is safe and useful * reduce thread priorities * use througput & latency QOS * TIMESHARE scheduling * passivate some IO * Pull in relevant changes from old taskq_fixing branch 1.9 experimentation pulled into 2.x * add throttle_set_thread_io_policy to zfs.exports * selectively re-enable TASKQ_DYNAMIC also drop wr_iss zio taskqs even further in priority (cf freebsd) * reduce zvol taskq priority * make system_taskq dynamic * experimentally allow three more taskq_d * lower thread prorities overall on an M1 with no zfs whatsoever, the highest priority threads are in the mid 90s, with most kernel threads at priority 81 (basepri). with so many maxclsyspri threads in zfs, we owuld starve out important things like vm_pageout_scan (pri 91), sched_maintenance_thread (pri 95), and numerous others. moreover, ifnet_start_{interfaces} are all priority 82. we should drop minclsyspri below 81, have defclsyspri at no more than 81, and make sure we have few threads above 89. 
* some tidying up of lowering of priority Thread and taskq fixing * fix old code pulled into spa.c, and further lower priorities * Thread and taskq fixing drop xnu priorities by one update a comment block set USER_INITIATED throughput QOS on TIMESHARE taskq threads don't boost taskq threads accidentally don't let taskq threads be pri==81 don't let o3x threads have importance > 0 apply xnu thread policies to taskq_d threads too assuming this works, it calls out for DRY refactoring with the other two flavours, that operate on current_thread(). simplify in spa.c make practically all the taskqs TIMESHARE Revert "apply xnu thread policies to taskq_d threads too" Panic in VM This reverts commit 39f93be. Revert "Revert "apply xnu thread policies to taskq_d threads too"" I see what happened now. This reverts commit 75619f0. adjust thread not the magic number refactor setting thread qos make DRY refactor rebuild this includes userland TASKQ_REALLY_DYNAMIC fixes fix typo set thread names for spindump visibility cstyle Upstream: Add --enable-macos-impure to autoconf Controls -DMACOS_IMPURE Signed-off-by: Jorgen lundman <lundman@lundman.net> macOS: Add --enable-macos-impure switch to missing calls. Call the wrapped spl_throttle_set_thread_io_policy Add spl_throttle_set_thread_io_policy to headers macOS: vdev_file should use file_taskq Also cleanup spl-taskq to have taskq_wait_outstanding() in preparation for one day implementing it. Change alloc to zalloc in zfs_ctldir.c Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34) macOS: change both alloc to zalloc macOS: mutex_tryenter can be used while holding zstd uses mutex_tryenter() to check if it already is holding the mutex. 
Can't find any implementations that object to it, so we change our spl-mutex.c accordingly. Tag zfs-2.0.0rc4 macOS: return error from uiomove instead of panic macOS: Skip known /dev entry which hangs macOS: Give better error msg when features are needed for crypto Using 1.9.4 crypto datasets now requires the userobj and projectquota features. Alert the user to activate said features to mount a crypto dataset. There is no going back to 1.9.4 after the features are enabled. macOS: Revert to pread() over AIO due to platform issues. We see waves of EAGAIN errors from lio_listio() on BigSur (but not Catalina) which could stem from recent changes to AIO in XNU. For now, we will go with the classic label read. Re-introduce a purified memory pressure handling mechanism (openzfs#35) * Introduce pure pressure-detecting-and-reacting system * "pure" -- no zfs.exports requirement * plumb in mach_vm_pressure_level_monitor() and mach_vm_pressure_monitor() calls to maintain a reduced set of inputs into the previous signalling into (increasingly shared with upstream) arc growth or shrinking policy * introduce mach_vm_pressure kstats which can be compared with userland-only sysctls: kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0 kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0 kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0 vm.page_free_wanted: 0 vm.page_free_count: 25,545 vm.page_speculative_count: 148,572 * and a start on tidying and obsolete code elimination * make arc_default_max() much bigger Optional: can be squashed into main pressure commit, or omitted. Users can use zsysctl.conf or manual setting of kstat.zfs.darwin.tunable.zfs_arc_max to override whichever default is chosen (this one, or the one it replaces). Allmem is already deflated during initialization, so this patch raises the un-sysctled ARC maximum from 1/6 to 1/2 of physmem.
* handle (vmem) abd_cache fragmentation after arc shrink When arc shrinks due to a significant pressure event, the abd_chunk kmem cache will free slabs back to the vmem abd_cache, and this memory can be several gigabytes. Unfortunately, multi-threaded concurrent kmem_cache allocation in the first place, and a priori unpredictable arc object lifetimes, mean that abds held by arc objects may be scattered across multiple slabs, with different objects interleaved within slabs. Thus after a moderate free, the vmem cache can be fragmented, and this is seen by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much smaller than (sysctl) kstat.vmem.vmem.abd_cache.mem_import, the latter of which may even be stuck at approximately the same value as before the arc free and kmem_cache reap. When there is a large difference between import and inuse, we set arc_no_grow in hopes that ongoing arc activity will defragment organically. This works better with more arc read/write activity after the free, and hardly at all if there is almost no activity after the free. We also experimentally add the BESTFIT policy to abd_arena. BESTFIT: look harder to place an abd chunk in a slab rather than placing it in the first slot that is definitely large enough, which breaks the vmem constant-time allocation guarantee, although that is less important for this particular vmem arena because of the strong modality of allocations from the abd_chunk cache (its only client). Additionally, reduce the abd_cache arena import size to 128k from 256k; the increase in allocation and free traffic between it and the heap is small compared to the gain under this new anti-fragmentation scheme. * some additional tidying in arc_os.c Tag macos-2.0.0-rc5 abd_cache fragmentation mitigation (openzfs#36) * printf->dprintf HFS_GET_BOOT_INFO Periodically there will be huge numbers of these printfs, and they are not really useful except when debugging vnops.
* Mitigate fragmentation in vmem.abd_cache In macOS_pure the abd_chunk kmem cache is parented to the abd_cache vmem arena to avoid sometimes-heavy ARC allocation and free stress on the main kmem cache, and because abd_chunk has such a strongly modal page-sized allocation size. Additionally, abd_chunk allocations and frees come in gangs, often with high multi-thread concurrency. It is that latter property which is the primary source of arena fragmentation, and it will affect any vmem arena directly underneath the abd_chunk kmem cache. Because we have a vmem parent solely for abd_chunk, we can monitor that parent for various patterns and react to them. This patch monitors the difference between the variables exported as kstat.vmem.vmem.abd_cache.mem_inuse and kstat.vmem.vmem.abd_cache.mem_import, watching for a large gap between the two, which can arise after an ARC shrink returns many slabs from the abd_chunk kmem cache to the abd_cache arena, as vmem segments still contain slabs which hold still-alive abds. When there is a significant gap, we turn on arc_no_grow and hope that organic ARC activity reduces the gap. If after several minutes this is not the case, a small arc_reduce_target_size() is applied. In comparison with previous behaviour, ARC equilibrium sizes will tend slightly -- but not enormously -- lower because the arc target size reduction is made fairly frequently. However, this is offset by the benefit of less *long-term* abd_cache fragmentation, and fewer complete collapses of ARC in the face of system memory pressure (since less is "stuck" in vmem). ARC consequently will stay at its equilibrium more often than near its minimum. This is demonstrated by a generally lower overall total held memory (kstat.spl.misc.spl_misc.os_mem_alloc) except on systems with essentially no memory pressure, or systems which have been sysctl-tuned for different behaviour.
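The inuse/import monitoring described in the two abd_cache commits can be sketched as a small periodic policy function. This is a minimal illustration, not the arc_os.c code: the structs, the name frag_policy_tick() and all three thresholds (a gap above 1/4 of import counts as "large", "several minutes" is taken as 300 seconds, a "small" reduction is 1/16 of the gap) are assumptions, and the returned byte count only stands in for what one might hand to arc_reduce_target_size().

```c
#include <stdbool.h>
#include <stdint.h>

/* Snapshot of the two abd_cache kstats named above. */
typedef struct {
	uint64_t mem_inuse;	/* kstat.vmem.vmem.abd_cache.mem_inuse */
	uint64_t mem_import;	/* kstat.vmem.vmem.abd_cache.mem_import */
} abd_cache_stats_t;

/* Hypothetical policy state; not the actual arc_os.c variables. */
typedef struct {
	bool	 arc_no_grow;	/* stand-in for the real arc_no_grow flag */
	bool	 gap_seen;	/* a large gap is currently being tracked */
	uint64_t gap_since;	/* seconds: when the current wait began */
} frag_policy_t;

/* Assumed tuning values; the commit only says "large" and "several". */
#define	FRAG_GAP_DIVISOR	4	/* large gap: more than 1/4 of import */
#define	FRAG_PATIENCE_SECS	300	/* "several minutes" */
#define	FRAG_SHRINK_DIVISOR	16	/* small reduction: 1/16 of the gap */

/*
 * Called periodically with the current time in seconds. Updates
 * p->arc_no_grow and returns the number of bytes by which the ARC
 * target should shrink (0 means leave it alone).
 */
static uint64_t
frag_policy_tick(frag_policy_t *p, const abd_cache_stats_t *s, uint64_t now)
{
	uint64_t gap = (s->mem_import > s->mem_inuse) ?
	    s->mem_import - s->mem_inuse : 0;

	if (gap <= s->mem_import / FRAG_GAP_DIVISOR) {
		/* Fragmentation is modest: allow the ARC to grow again. */
		p->arc_no_grow = false;
		p->gap_seen = false;
		return (0);
	}

	/* Large gap: stop growth and let organic ARC activity defragment. */
	p->arc_no_grow = true;
	if (!p->gap_seen) {
		p->gap_seen = true;
		p->gap_since = now;
		return (0);
	}
	if (now - p->gap_since < FRAG_PATIENCE_SECS)
		return (0);

	/* Still fragmented after the grace period: nudge the target down. */
	p->gap_since = now;	/* restart the clock before the next nudge */
	return (gap / FRAG_SHRINK_DIVISOR);
}
```

Keeping arc_no_grow set for as long as the gap stays open, and shrinking only after a grace period, reflects the stated preference for organic defragmentation over an immediate ARC collapse.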
macOS: Additional 10.9 fixes that missed the boat Tidying nvram zfs_boot=pool (openzfs#37) If zfs_boot is set we run a long-lived zfs_boot_import_thread, which can stay running until the kernel module is running _fini() functions at unload or shutdown. This patch dispatches it on a zfs_boot() taskq, to avoid causing a hang at the taskq_wait_outstanding(system_taskq, 0) in zvol.c's zvol_create_minors_recursive(), which would prevent pool imports finishing if the pool contained zvols. (Symptoms: "zpool import" does not exit for any pool, system does not see any zvols). This exposed a long-term race condition in our zfs_boot.cpp: the notifier can cause the mutex_enter(&pools->lock) in zfs_boot_probe_media to be reached before the mutex_enter() after the notifier was created. The use of the system_taskq was masking that, by quietly imposing a serialization choke. Moving the mutex and cv initialization earlier -- in particular before the notifier is created -- eliminates the race. Further tidying in zfs_boot.cpp, including some cstyling, switching to _Atomic instead of volatile. Volatile is for effectively random reads; _Atomic is for when we want many readers to have a consistent view after the variable is written. Finally, we need TargetConditionals.h in front of AvailabilityMacros.h in order to build. Add includes to build on Big Sur with macports-clang-11 (openzfs#38) * TargetConditionals.h before all AvailabilityMacros.h * add several TargetConditionals.h and AvailabilityMacros.h Satisfy picky macports-clang-11 toolchain on Big Sur. macOS: clean up large build, indicate errors. Fix debug macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit macOS: rename net.lundman. -> org.openzfsonosx. macOS: Tag va_mode for upstream ASSERTS XNU sets va_type = VDIR, but does not bother with va_mode. However ZFS checks to confirm S_ISDIR is set in mkdir. macOS: Fix zfs_ioc_osx_proxy_dataset for datasets It was defined as a _pool() ioctl.
While we are changing things here, change it into a new-style ioctl instead. This should fix non-root datasets mounting as a proxy (devdisk=on). cstyle macOS: setxattr debug prints left in macOS: don't create DYNAMIC with _ent taskq macOS: Also uninstall new /usr/local/zfs before install macos-2.0.0-rc6 macOS: strcmp deprecated after macOS 11 macOS: pkg needs to notarize at the end macOS: strdup strings in getmntent mirrored on FreeBSD. macOS: remove debug print macOS: unload zfs, not openzfs macOS: actually include the volume icon file as well also update to PR macOS: prefer disk over rdisk macOS: devdisk=off mimic=on needs to check for dataset Datasets with devdisk=on will be in ioreg; with it off and mimic=on, it needs to handle: BOOM/fs1 /Volumes/BOOM/fs1 by testing if "BOOM/fs1" is a valid dataset. fixifx macOS: doubled up "int rc" losing returncode Causing misleading messages macOS: zfsctl was sending from IDs macOS: let zfs mount as user succeed If the "mkdir" can succeed (home dir etc., as opposed to /Volumes) then let the mount happen. macOS: Attempt to implement taskq_dispatch_delay() frequently used with taskq_cancel_id() to stop taskq from calling `func()` before the timeout expires. Currently implemented by the taskq sleeping in cv_timedwait() until timeout expires, or it is signalled by taskq_cancel_id(). Seems a little undesirable; could we build an ordered list of delayed taskqs, and only place them to run once the timeout has expired, leaving the taskq available to work instead of delaying? macOS: Separate unmount and proxy_remove When proxy_remove is called at the tail end of unmount, we get the alert about "ejecting before disconnecting device". To mirror the proxy create, we make it a separate ioctl, and issue it after unmount completes. macOS: explicitly call setsize with O_TRUNC It appears O_TRUNC does nothing, like the goggles.
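The zfs_boot race fix and the volatile-to-_Atomic switch described in openzfs#37 come down to two ordering rules: initialize the lock and condition variable before anything that can call back into them exists, and publish shared flags with release/acquire atomics rather than volatile. The sketch below shows both in userland with POSIX threads; boot_state_t, notifier_thread() and the other helpers are hypothetical stand-ins for the zfs_boot.cpp notifier and pools->lock, not the kext's code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the boot-import state shared with a notifier. */
typedef struct {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;
	int		pools_found;	/* protected by lock */
	_Atomic bool	terminating;	/* read lock-free by many threads */
} boot_state_t;

/* Hypothetical notifier: may fire as soon as it is created. */
static void *
notifier_thread(void *arg)
{
	boot_state_t *st = arg;

	pthread_mutex_lock(&st->lock);	/* safe: initialized before creation */
	st->pools_found++;
	pthread_cond_signal(&st->cv);
	pthread_mutex_unlock(&st->lock);
	return (NULL);
}

/* Initialize the lock and cv *before* the notifier can exist. */
static int
boot_start(boot_state_t *st, pthread_t *tid)
{
	pthread_mutex_init(&st->lock, NULL);
	pthread_cond_init(&st->cv, NULL);
	st->pools_found = 0;
	atomic_init(&st->terminating, false);
	return (pthread_create(tid, NULL, notifier_thread, st));
}

/* Lock-free reader; acquire pairs with the release store below. */
static bool
boot_is_terminating(boot_state_t *st)
{
	return (atomic_load_explicit(&st->terminating, memory_order_acquire));
}

/* Wait for the notifier, then publish shutdown with release semantics. */
static int
boot_wait_and_stop(boot_state_t *st, pthread_t tid)
{
	int found;

	pthread_mutex_lock(&st->lock);
	while (st->pools_found == 0)
		pthread_cond_wait(&st->cv, &st->lock);
	found = st->pools_found;
	pthread_mutex_unlock(&st->lock);

	atomic_store_explicit(&st->terminating, true, memory_order_release);
	pthread_join(tid, NULL);
	pthread_cond_destroy(&st->cv);
	pthread_mutex_destroy(&st->lock);
	return (found);
}
```

Unlike volatile, the release store guarantees that every write made before it is visible to any thread whose acquire load observes the flag as true.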
macOS: Add O_APPEND to zfs_file_t It is currently not used, but since it was written for a test case, we might as well keep it. macOS: Pass fd_offset between kernel and userland. macOS: Missing return in non-void function macOS: finally fix taskq_dispatch_delay() you find a bug, you own the bug. macOS: add missing kstats macOS: restore the default system_delay_taskq macOS: dont call taskq_wait in taskq_cancel macOS: fix taskq_cancel_id() We need to make sure the taskq has finished before returning in taskq_cancel_id(), so that the taskq doesn't get a chance to run after. macOS: correct 'hz' to 100. sysctl kern.clockrate: 100 sleeping for 1 second. bolt: 681571 sleep() 35 bolt: 681672: diff 101 'hz' is definitely 100. macOS: implement taskq_delay_dispatch() Implement delayed taskq by adding them to a list, sorted by wake-up time, and a dispatcher thread which sleeps until the soonest taskq is due. taskq_cancel_id() will remove task from list if present. macOS: ensure to use 1024 version of struct statfs and avoid coredump if passed zhp == NULL. macOS: fix memory leak in xattr_list macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE This is automatically set by default in userland if the deployment target is > 10.5 macOS: Fix watchdog unload and delay() macOS: improve handling of invariant disks Don't prepend /dev to all paths not starting with /dev as InvariantDisks places its symlinks in /var/run/disk/by-* not /dev/disk/by-*. Also, merge in some tweaks from Linux's zpool_vdev_os.c such as only using O_EXCL with spares. macOS: remove zfs_unmount_006_pos from large. Results in KILLED. 
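The final taskq_delay_dispatch() design above (a list sorted by wake-up time, a dispatcher thread that sleeps until the soonest entry is due, and taskq_cancel_id() removing a pending entry from the list) can be sketched with a singly linked list. All names here are hypothetical; the dispatcher thread and its sleep/wake-up are left out, and ticks are derived from the hz value of 100 measured above.

```c
#include <stddef.h>
#include <stdint.h>

#define	HZ	100	/* 'hz' on macOS, per the measurement above */

/* Hypothetical delayed-task entry; not the SPL's actual taskq_ent_t. */
typedef struct delayed_task {
	uint64_t		id;
	uint64_t		wake_tick;	/* absolute tick to run at */
	struct delayed_task	*next;
} delayed_task_t;

static uint64_t
secs_to_ticks(uint64_t secs)
{
	return (secs * HZ);
}

/* Keep the list sorted by wake_tick: soonest first, FIFO among equals. */
static void
delay_insert(delayed_task_t **head, delayed_task_t *t)
{
	delayed_task_t **pp = head;

	while (*pp != NULL && (*pp)->wake_tick <= t->wake_tick)
		pp = &(*pp)->next;
	t->next = *pp;
	*pp = t;
}

/* taskq_cancel_id()-style removal: unlink the entry if still pending. */
static delayed_task_t *
delay_cancel(delayed_task_t **head, uint64_t id)
{
	for (delayed_task_t **pp = head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			delayed_task_t *t = *pp;

			*pp = t->next;
			t->next = NULL;
			return (t);
		}
	}
	return (NULL);
}

/*
 * One dispatcher wake-up: hand back the head entry only if it is due.
 * A real dispatcher would sleep until (*head)->wake_tick (or until an
 * insert or cancel signals it) and dispatch what this returns.
 */
static delayed_task_t *
delay_pop_due(delayed_task_t **head, uint64_t now)
{
	delayed_task_t *t = *head;

	if (t == NULL || t->wake_tick > now)
		return (NULL);
	*head = t->next;
	t->next = NULL;
	return (t);
}
```

Because only the head is ever due first, the dispatcher needs a single timed sleep regardless of how many tasks are pending; the cost moves to the O(n) sorted insert.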
Tag macos-2.0.0rc7 macOS: If we don't set SOURCES it makes up zfs.c from nowhere macOS: remove warning macOS: compile fixes after rebase macOS: connect SEEK_HOLE SEEK_DATA to ioctl macOS: Only call vnode_specrdev() when valid macOS: Use VNODE_RELOAD in iterate in the hopes of avoiding ZFS call back in VNOP_INACTIVE macOS: zfs_kmod_fini() calls taskq_cancel_id() so we must call system_taskq_fini() after the call to zfs_kmod_fini() macOS: shellcheck error macOS: Setting landmines causes panic on M1 "panicString" : "panic(cpu 1 caller 0xfffffe001db72dc8): Break 0xC470 instruction exception from kernel. Ptrauth failure with IA key resulted in 0x2000000000000001 at pc 0xfffffe001c630880, lr 0x8afcfe001c630864 (saved state: 0xfffffe309386b180) macOS: vget should only lookup direct IDs macOS: rootzp left z_projid uninitialised Causing z_projid to have "0xBADDCAFEBADDCAFE" initially, and zfs_link() to return EXDEV due to differing z_projid, presenting the user with "Cross-device link". Would only happen after loading kext, on the root znode. macOS: Update installer rtf macOS: update and correct the kext_version macOS: Update copyright, fix url and versions macOS ARC memory improvements and old code removal macOS_pure "purification" in spl-[kv]mem coupled with the new dynamics of trying to contain the split between inuse and allocated in the ABD vmem arena produces less memory greed, so we don't have to do as much policing of memory consumption, and lets us rely on some more common/cross-platform code for a number of commonplace calculations and adjustments of ARC variables. Additionally: * Greater niceness in spl_free_thread: when we see pages are wanted (but no xnu pressure), react more strongly. Notably if we are within 64MB of zfs's memory ceiling, clamp spl_free to a maximum of 32MB.
* following recent fixes to abd_os.c, revert to KMC_NOTOUCH at abd_chunk kmem cache creation time, to turn off BUFTAG|CONTENTS|LITE, thus avoiding allocations of many, many extra 4k chunks in DEBUG builds. * Double prepopulation of kmem_taskq entries: kmem_cache_applyall() makes this busy, and we want at least as many entries as we have kmem caches at kmem_reap() time. macOS: more work Upstream: zfs_log can't VN_HOLD a possibly unlinked vp Follow in FreeBSD's footsteps, and avoid the first call to VN_HOLD in case it is unlinked, as that can deadlock waiting in vnode_iocount(). Walk up the xattr_parent.
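The spl_free_thread clamp from the "macOS ARC memory improvements" commit reduces to a single comparison. A minimal sketch, assuming zfs's current usage and its ceiling are available as byte counts; clamp_spl_free() is a hypothetical name, spl_free is modeled as a signed value (an assumption, so a shortfall can be represented), and the separate "pages wanted" trigger is not modeled.

```c
#include <stdint.h>

#define	MiB	(1024LL * 1024LL)

/*
 * Hypothetical helper: apply the "within 64MB of the ceiling" rule to an
 * already-computed spl_free value. Being past the ceiling also counts as
 * "within", since the difference is then negative.
 */
static int64_t
clamp_spl_free(int64_t spl_free, int64_t zfs_in_use, int64_t zfs_ceiling)
{
	const int64_t near_ceiling = 64 * MiB;
	const int64_t max_when_near = 32 * MiB;

	if (zfs_ceiling - zfs_in_use <= near_ceiling &&
	    spl_free > max_when_near)
		return (max_when_near);
	return (spl_free);
}
```

Values already at or below 32MB, including negative ones, pass through unchanged, so the clamp can only make the reported headroom more conservative.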
Add all files required for the macOS port. Add new cmd/os/ for tools which are only expected to be used on macOS. This has support for all macOS versions up to Catalina (not Big Sur). Signed-off-by: Jorgen Lundman <lundman@lundman.net> macOS: big uio change over. Make uio an internal (ZFS) struct, possibly referring to the supplied (XNU) uio from the kernel. This means zio_crypto.c can now be identical to upstream. Update for draid, and other changes macOS: Use SET_ERROR with uiomove. [squash] macOS: they went and added vdev_draid macOS: compile fixes from rebase macOS: oh cstyle, how you vex me so macOS: They added new methods - squash macOS: arc_register_hotplug for userland too Upstream: avoid warning zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers] kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t)); macOS: Update zfs_acl.c to latest This includes commits like: 65c7cc4 1b376d1 cfdc432 716b53d a741b38 485b50b macOS: struct vdev changes macOS: cstyle, how you vex me [squash] Upstream: booo Werror booo Upstream: squash baby Not defined gives warnings. Upstream: Include all Makefiles Signed-off-by: Jorgen Lundman <lundman@lundman.net> double draid! macOS: large commit macOS: Use APPLE approved kmem_alloc() macOS: large commit WIP: remove reliance on zfs.exports The memory-pressure has been nerfed, and will not run well until we can find other solutions. The kext symbol lookup we can live without, used only for debug and panic. Use lldb to look up symbols. leaner! leanerr! remove zfs.export dependency cont. export reduction cont. cont.
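The zio_crypt.c warning quoted above arises because kmem_free() takes a plain void * while uio_iov points to const iovecs. One common way to make the intentional const removal explicit is to cast through uintptr_t. The sketch below uses userland stand-ins for kmem_alloc()/kmem_free() and a cut-down, hypothetical uio struct; it illustrates the idiom and is not necessarily how the commit resolved the warning.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/uio.h>	/* struct iovec */

/* Userland stand-ins for the SPL allocator (illustration only). */
static void *
kmem_alloc(size_t size, int kmflags)
{
	(void) kmflags;
	return (malloc(size));
}

static void
kmem_free(void *buf, size_t size)
{
	(void) size;
	free(buf);
}

/* Hypothetical cut-down uio whose iovec array is exposed read-only. */
typedef struct {
	const struct iovec	*uio_iov;
	int			uio_iovcnt;
} zfs_uio_sketch_t;

static void
uio_free_iov(zfs_uio_sketch_t *uio)
{
	/*
	 * Passing uio->uio_iov directly discards 'const' and triggers
	 * -Wincompatible-pointer-types-discards-qualifiers. Going through
	 * uintptr_t states that the const removal is deliberate.
	 */
	kmem_free((void *)(uintptr_t)uio->uio_iov,
	    uio->uio_iovcnt * sizeof (struct iovec));
	uio->uio_iov = NULL;
	uio->uio_iovcnt = 0;
}
```

This is only safe because the array was originally allocated as non-const memory; the const on uio_iov is a view restriction, not a property of the storage.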
Corrective tweaks for building Correct vnode_iocount() Cleanup pipe wrap code, use pthreads, handle multiple streams latest pipe send with threads sort of works, but bad timing can deadlock macOS: work out corner case starvation issue in cv_wait_sig() Fix -C in zfs send/recv cv_wait_sig squash Also wrap zfs send resume Implement VOP_LOOKUP for snowflake Finder Don't change date when setting size. Seems to be a weird requirement with Linux, so model it after the FreeBSD version macOS: correct xattr checks for uio Fix a noisy source of misleading-indentation warnings Fix "make install" ln -s failures Fix a noisy source of misleading-indentation warnings Fix "make install" ln -s failures fix ASSERT: don't try to peer into opaque vp structure Import non-panicking ASSERT from old spl/include/sys/debug.h Guard with MACOS_ASSERT_SHOULD_PANIC which will do what Linux and FreeBSD do: redefine ASSERTs as VERIFYs. The panic report line will say VERIFY, obscuring the problem, and a system panic is harsher (and more dangerous) on MacOS than a zfs-module panic on Linux. ASSERTions: declare assfail in debug.h Build and link spl-debug.c Eliminate spurious "off" variable, use position+offset range Make sure we hold the correct range to avoid panic in dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug). zvol_log_write the range we have written, not the future range silence very noisy and dubious ASSERT macOS: M1 fixes for arm64. sysctl needs to use OID2 Allocs need to be IOMalloc_aligned Initial spl-vmem memory area needs to be aligned to 16KB No cpu_number() for arm64. macOS: change zvol locking, add zvol symlinks macOS: Return error on UF_COMPRESSED This means bsdtar will be rather noisy, but we prefer noise over corrupt files (all files would be 0-sized).
usr/bin/zprint: Failed to set file flags~ -rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint usr/bin/zprint: Failed to set file flags -rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint Actually include zedlet for zvols macOS: Fix Finder crash on quickview, SMB error codes xattr=sa would return a negative return code, a holdover from ZOL code. Only set size if passed a ptr. Convert negative error codes back to normal. Add LIBTOOLFLAGS for macports toolchain This will replace PR#23 macOS zpool import fixes The new codebase uses a mixture of thread pools and lio_listio async io. On macOS the aio limits are low, and when they are reached, lio_listio() returns EAGAIN while probing several prospective leaf vdevs concurrently, looking for labels. We should not abandon probing a vdev in this case, and can usually recover by trying again after a short delay. (We continue to treat other errnos as unrecoverable for that vdev, and only try to recover from EAGAIN a few times). Additionally, take logic from old o3x and don't probe a variety of devices commonly found in /dev/XXX as they either produce side-effects or are simply wasted effort. Finally, add a trailing / that FreeBSD and Linux both have. listxattr may not expose com.apple.system xattr=sa We need to ask IOMallocAligned for the enclosing POW2 vmem_create() arenas want at least natural alignment for the spans they import, and will panic if they don't get it. For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE. Otherwise align on the enclosing power of two for any osif_malloc allocation up to 2^32. Anything that asks osif_malloc() for more than that is almost certainly a bug, but we can try aligning on PAGESIZE anyway, rather than extend the enclosing-power-of-two device to handle 64-bit allocations. Simplify the creation of bucket arenas, and adjust their quanta.
This results in handing back considerably more (and smaller) chunks of memory to osif_free if there is pressure, and reduces waits in xnu_alloc_throttled(), so is a performance win for a busy memory-constrained system. Finally, uncomment some valid code that might be used by future callers of vmem_xcreate(). use vmem_xalloc to match the vmem_xfree of initial dynamic alloc vmem_alloc() breaks the initial large vmem_add() allocation into smaller chunks in an effort to have a large number of vmem segments in the arena. This arena does not benefit from that. Additionally, in vmem_fini() we call vmem_xfree() to return the initial allocation because it is done after almost everything has been pulled down. Unfortunately vmem_xfree() returns the entire initial allocation as a single span. IOFree() checks a variable maintained by the IOMalloc* allocators which tracks the largest allocation made so far, and will panic when (as is almost always the case) the initial large span is handed to it. This usually manifests as a panic or hang on kext unload, or a hang at reboot. Consequently, we will now use vmem_xalloc() for this initial allocation; vmem_xalloc() also lets us explicitly specify the natural alignment we want for it. zfs_rename SA_ADDTIME may grow SA Avoid: zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2 -> 674 panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n", 675 (u_longlong_t)db->db.db_object, db->db_level, 676 (u_longlong_t)db->db_blkid); zfs diff also needs to be wrapped. Replace call to pipe() with a couple of open(mkfifo) instead. Upstream: cstyle zfs_fm.c macOS: cstyle baby IOMallocAligned() should call IOFreeAligned() macOS: zpool_disable_volumes v1 When exporting, also kick mounted zvols offline macOS: zpool_disable_volumes v2 When exporting zvols, check IOReg for the BSDName, instead of using readlink on the ZVOL symlinks. Also check if apfs has made any synthesized disks, and ask them to unmount first.
./scripts/cmd-macos.sh zpool export BOOM Exporting 'BOOM/volume' ... asking apfs to eject 'disk5' Unmount of all volumes on disk5 was successful ... asking apfs to eject 'disk5s1' Unmount of all volumes on disk5 was successful ... asking ZVOL to export 'disk4' Unmount of all volumes on disk4 was successful zpool_disable_volume: exit macOS: Add libdiskmgt and call inuse checks macOS: compile fixes from rebase macOS: oh cstyle, how you vex me so macOS: They added new methods - squash macOS: arc_register_hotplug for userland too macOS: minor tweaks for libdiskmgt macOS: getxattr size==0 is to lookup size Also skip the ENOENT return for "zero" finderinfo, as we do not skip over them in listxattr. macOS: 10.9 compile fixes macOS: go to rc2 macOS: kstat string handling should copyin. cstyle baby macOS: Initialise ALL quota types projectid, userobj, groupobj and projectobj quotas were missed. macOS: error check sysctl for older macOS Wooo cstyle, \o/ Make arc sysctl tunables work (#27) * use an IOMemAligned for a PAGE_SIZE allocation * we should call arc_kstat_update_osx() Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do anything because arc_kstat_update_osx() was removed at the same time the (obsoleted by upstream) arc_kstat_update() was removed from zfs_kstat_osx.c. Put it back. * when we sysctl arc tunables, call arc_tuning_update() * rely on upstream's sanity checking Simplification which also avoids spurious CMN_WARN messages caused by setting the arcstat variables here, when upstream's arc_tuning_update() checks that they differ from the tunable variables. * add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent both are in upstream's arc_tuning_update() zfs_arc_sys_free controls the amount of memory that ARC will leave free, which is roughly what lundman wants for putting some sort of cap on memory use. * cstyle macOS: set UIO direction, to receive xattr from XNU macOS: ensure uio is zeroed in case XNU uio is NULL.
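The osif_malloc alignment rule from the IOMallocAligned commit above (page alignment below PAGESIZE, the enclosing power of two up to 2^32, and PAGESIZE again beyond that) can be written as a small pure function. osif_alignment_for() and enclosing_pow2() are hypothetical names, and the page size is passed in rather than assumed, since it differs between Intel (4 KiB) and Apple Silicon (16 KiB) kernels.

```c
#include <stdint.h>

/* Smallest power of two >= n; callers only pass n <= 2^32 here. */
static uint64_t
enclosing_pow2(uint64_t n)
{
	uint64_t p = 1;

	while (p < n)
		p <<= 1;
	return (p);
}

/*
 * Hypothetical helper mirroring the rule described in the commit:
 * vmem_create() arenas want natural alignment for imported spans.
 */
static uint64_t
osif_alignment_for(uint64_t size, uint64_t pagesize)
{
	if (size < pagesize)
		return (pagesize);		/* sub-page: page alignment */
	if (size <= (1ULL << 32))
		return (enclosing_pow2(size));	/* natural alignment */
	return (pagesize);			/* > 4 GiB: almost surely a bug */
}
```

Capping the power-of-two path at 2^32 keeps the shift arithmetic well inside 64 bits, which matches the commit's choice not to extend the scheme to 64-bit allocation sizes.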
Fix zfs_vnop_getxattr (openzfs#28) "xattr -l <file>" would return inconsistent garbage, especially from non-com.apple.FinderInfo xattrs. The UIO_WRITE was a 0 (UIO_READ) in the previous commit, change it. Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c, for cheap extra safety. launch `zpool import` through launchd in the startup script (#26) Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com> cstyle macOS: correct dataset_kstat_ logic and kstat leak. dataset_kstat_create() will allocate a string and set it before calling kstat_create() - so we cannot set strings to NULL. Likewise, we cannot bulk-free strings on unload; we have to rely on the caller of kstat to do so. (Which is proper). Add calls to dataset_kstat for datasets and zvol. kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM kstat.zfs/BOOM.dataset.objset-0x36.writes: 0 kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0 kstat.zfs/BOOM.dataset.objset-0x36.reads: 11 kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810 kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0 kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0 macOS: remove no previous prototype for function macOS: correct openat wrapper build fixes re TargetConditionals.h (openzfs#30) AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include. Memory fixes on macOS_pure (openzfs#31) * Improve memory handling on macOS * remove obsolete/unused zfs_file_data/zfs_metadata caching * In the new code base, we use upstream's zio.c without modification, and so the special zio caching code became entirely vestigial, and likely counterproductive. * and make busy ABD better behaved on busy macOS box Post-ABD, we no longer gained much benefit in the old code base from the complicated special handling for the caches created in zio.c. As there's only really one size of ABD allocation, we do not need a qcache layer as in 1.9.
Instead use an arena with VMC_NO_QCACHE set to ask for 256k chunks. * don't reap extra caches in arc_kmem_reap_now() KMF_LITE in DEBUG build is OK * build fixes re TargetConditionals.h AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include. Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33) * other minor changes in vdev_disk Thread and taskq fixing (openzfs#32) Highlights: * thread names for spindump * some taskq_d is safe and useful * reduce thread priorities * use throughput & latency QOS * TIMESHARE scheduling * passivate some IO * Pull in relevant changes from old taskq_fixing branch 1.9 experimentation pulled into 2.x * add throttle_set_thread_io_policy to zfs.exports * selectively re-enable TASKQ_DYNAMIC also drop wr_iss zio taskqs even further in priority (cf freebsd) * reduce zvol taskq priority * make system_taskq dynamic * experimentally allow three more taskq_d * lower thread priorities overall. On an M1 with no zfs whatsoever, the highest priority threads are in the mid 90s, with most kernel threads at priority 81 (basepri). With so many maxclsyspri threads in zfs, we would starve out important things like vm_pageout_scan (pri 91), sched_maintenance_thread (pri 95), and numerous others. Moreover, ifnet_start_{interfaces} are all priority 82. We should drop minclsyspri below 81, have defclsyspri at no more than 81, and make sure we have few threads above 89.
* some tidying up of lowering of priority Thread and taskq fixing * fix old code pulled into spa.c, and further lower priorities * Thread and taskq fixing drop xnu priorities by one update a comment block set USER_INITIATED throughput QOS on TIMESHARE taskq threads don't boost taskq threads accidentally don't let taskq threads be pri==81 don't let o3x threads have importance > 0 apply xnu thread policies to taskq_d threads too assuming this works, it calls out for DRY refactoring with the other two flavours, that operate on current_thread(). simplify in spa.c make practically all the taskqs TIMESHARE Revert "apply xnu thread policies to taskq_d threads too" Panic in VM This reverts commit 39f93be. Revert "Revert "apply xnu thread policies to taskq_d threads too"" I see what happened now. This reverts commit 75619f0. adjust thread not the magic number refactor setting thread qos make DRY refactor rebuild this includes userland TASKQ_REALLY_DYNAMIC fixes fix typo set thread names for spindump visibility cstyle Upstream: Add --enable-macos-impure to autoconf Controls -DMACOS_IMPURE Signed-off-by: Jorgen lundman <lundman@lundman.net> macOS: Add --enable-macos-impure switch to missing calls. Call the wrapped spl_throttle_set_thread_io_policy Add spl_throttle_set_thread_io_policy to headers macOS: vdev_file should use file_taskq Also cleanup spl-taskq to have taskq_wait_outstanding() in preparation for one day implementing it. Change alloc to zalloc in zfs_ctldir.c Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34) macOS: change both alloc to zalloc macOS: mutex_tryenter can be used while holding zstd uses mutex_tryenter() to check if it already is holding the mutex. 
Can't find any implementations that object to it, so we change our spl-mutex.c accordingly. Tag zfs-2.0.0rc4 macOS: return error from uiomove instead of panic macOS: Skip known /dev entry which hangs macOS: Give better error msg when features are needed for crypto Using 1.9.4 crypto datasets now requires the userobj and projectquota features. Alert the user to activate said features to mount a crypto dataset. There is no going back to 1.9.4 after the features are enabled. macOS: Revert to pread() over AIO due to platform issues. We see waves of EAGAIN errors from lio_listio() on BigSur (but not Catalina) which could stem from recent changes to AIO in XNU. For now, we will go with the classic label read. Re-introduce a purified memory pressure handling mechanism (openzfs#35) * Introduce pure pressure-detecting-and-reacting system * "pure" -- no zfs.exports requirement * plumb in mach_vm_pressure_level_monitor() and mach_vm_pressure_monitor() calls to maintain a reduced set of inputs into the previous signalling into (increasingly shared with upstream) arc growth or shrinking policy * introduce mach_vm_pressure kstats which can be compared with userland-only sysctls: kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0 kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0 kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0 vm.page_free_wanted: 0 vm.page_free_count: 25,545 vm.page_speculative_count: 148,572 * and a start on tidying and obsolete code elimination * make arc_default_max() much bigger Optional: can be squashed into main pressure commit, or omitted. Users can use zsysctl.conf or manual setting of kstat.zfs.darwin.tunable.zfs_arc_max to override whichever default is chosen (this one, or the one it replaces). Allmem is already deflated during initialization, so this patch raises the un-sysctled ARC maximum from 1/6 to 1/2 of physmem.
* handle (vmem) abd_cache fragmentation after arc shrink When arc shrinks due to a significant pressure event, the abd_chunk kmem cache will free slabs back to the vmem abd_cache, and this memory can be several gigabytes. Unfortunately, multi-threaded concurrent kmem_cache allocation in the first place, and a priori unpredictable arc object lifetimes, mean that abds held by arc objects may be scattered across multiple slabs, with different objects interleaved within slabs. Thus after a moderate free, the vmem cache can be fragmented, and this is seen by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much smaller than (sysctl) kstat.vmem.vmem.abd_cache.mem_import, the latter of which may even be stuck at approximately the same value as before the arc free and kmem_cache reap. When there is a large difference between import and inuse, we set arc_no_grow in hopes that ongoing arc activity will defragment organically. This works better with more arc read/write activity after the free, and hardly at all if there is almost no activity after the free. We also experimentally add the BESTFIT policy to abd_arena. BESTFIT: look harder to place an abd chunk in a slab rather than placing it in the first slot that is definitely large enough, which breaks the vmem constant-time allocation guarantee, although that is less important for this particular vmem arena because of the strong modality of allocations from the abd_chunk cache (its only client). Additionally, reduce the abd_cache arena import size to 128k from 256k; the increase in allocation and free traffic between it and the heap is small compared to the gain under this new anti-fragmentation scheme. * some additional tidying in arc_os.c Tag macos-2.0.0-rc5 abd_cache fragmentation mitigation (openzfs#36) * printf->dprintf HFS_GET_BOOT_INFO Periodically there will be huge numbers of these printfs, and they are not really useful except when debugging vnops.
* Mitigate fragmentation in vmem.abd_cache In macOS_pure the abd_chunk kmem cache is parented to the abd_cache vmem arena to avoid sometimes-heavy ARC allocation and free stress on the main kmem cache, and because abd_chunk has such a strongly modal page-sized allocation size. Additionally, abd_chunk allocations and frees come in gangs, often with high multi-thread concurrency. It is that latter property which is the primary source of arena fragmentation, and it will affect any vmem arena directly underneath the abd_chunk kmem cache. Because we have a vmem parent solely for abd_chunk, we can monitor that parent for various patterns and react to them. This patch monitors the difference between the variables exported as kstat.vmem.vmem.abd_cache.mem_inuse and kstat.vmem.vmem.abd_cache.mem_import, watching for a large gap between the two, which can arise after an ARC shrink returns many slabs from the abd_chunk kmem cache to the abd_cache arena, as vmem segments still contain slabs which hold still-alive abds. When there is a significant gap, we turn on arc_no_grow and hope that organic ARC activity reduces the gap. If after several minutes this is not the case, a small arc_reduce_target_size() is applied. In comparison with previous behaviour, ARC equilibrium sizes will tend slightly -- but not enormously -- lower because the arc target size reduction is made fairly frequently. However, this is offset by the benefit of less *long-term* abd_cache fragmentation, and fewer complete collapses of ARC in the face of system memory pressure (since less is "stuck" in vmem). ARC consequently will stay at its equilibrium more often than near its minimum. This is demonstrated by a generally lower overall total held memory (kstat.spl.misc.spl_misc.os_mem_alloc) except on systems with essentially no memory pressure, or systems which have been sysctl-tuned for different behaviour.
macOS: Additional 10.9 fixes that missed the boat
Tidying nvram zfs_boot=pool (openzfs#37)
If zfs_boot is set we run a long-lived zfs_boot_import_thread, which can stay running until the kernel module is running _fini() functions at unload or shutdown.
This patch dispatches it on a zfs_boot() taskq, to avoid causing a hang at the taskq_wait_outstanding(system_taskq, 0) in zvol.c's zvol_create_minors_recursive(), which would prevent pool imports finishing if the pool contained zvols. (Symptoms: "zpool import" does not exit for any pool, system does not see any zvols).
This exposed a long-term race condition in our zfs_boot.cpp: the notifier can cause the mutex_enter(&pools->lock) in zfs_boot_probe_media to be reached before the mutex_enter() after the notifier was created. The use of the system_taskq was masking that, by quietly imposing a serialization choke.
Moving the mutex and cv initialization earlier (in particular, before the notifier is created) eliminates the race.
Further tidying in zfs_boot.cpp, including some cstyling, switching to _Atomic instead of volatile. Volatile is for effectively random reads; _Atomic is for when we want many readers to have a consistent view after the variable is written.
Finally, we need TargetConditionals.h in front of AvailabilityMacros.h in order to build.
Add includes to build on Big Sur with macports-clang-11 (openzfs#38)
* TargetConditionals.h before all AvailabilityMacros.h
* add several TargetConditionals.h and AvailabilityMacros.h
Satisfy picky macports-clang-11 toolchain on Big Sur.
macOS: clean up large build, indicate errors.
Fix debug
macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit
macOS: rename net.lundman. -> org.openzfsonosx.
macOS: Tag va_mode for upstream ASSERTS
XNU sets va_type = VDIR, but does not bother with va_mode. However ZFS checks to confirm S_ISDIR is set in mkdir.
macOS: Fix zfs_ioc_osx_proxy_dataset for datasets
It was defined as a _pool() ioctl.
While we are changing things, make it a new-style ioctl instead. This should fix non-root datasets mounting as a proxy (devdisk=on).
cstyle
macOS: setxattr debug prints left in
macOS: don't create DYNAMIC with _ent taskq
macOS: Also uninstall new /usr/local/zfs before install
macos-2.0.0-rc6
macOS: strcmp deprecated after macOS 11
macOS: pkg needs to notarize at the end
macOS: strdup strings in getmntent
mirrored on FreeBSD.
macOS: remove debug print
macOS: unload zfs, not openzfs
macOS: actually include the volume icon file as well
also update to PR
macOS: prefer disk over rdisk
macOS: devdisk=off mimic=on needs to check for dataset
Datasets with devdisk=on will be in ioreg; with it off and mimic=on, it needs to handle:
BOOM/fs1 /Volumes/BOOM/fs1
by testing if "BOOM/fs1" is a valid dataset.
fixifx
macOS: doubled up "int rc" losing returncode
Causing misleading messages
macOS: zfsctl was sending from IDs
macOS: let zfs mount as user succeed
If the "mkdir" can succeed (home dir etc, as opposed to /Volumes) then let the mount happen.
macOS: Attempt to implement taskq_dispatch_delay()
frequently used with taskq_cancel_id() to stop taskq from calling `func()` before the timeout expires.
Currently implemented by the taskq sleeping in cv_timedwait() until the timeout expires, or it is signalled by taskq_cancel_id().
Seems a little undesirable; could we build an ordered list of delayed taskqs, and only place them to run once the timeout has expired, leaving the taskq available to work instead of delaying?
macOS: Separate unmount and proxy_remove
When proxy_remove is called at the tail end of unmount, we get the alert about "ejecting before disconnecting device".
To mirror the proxy create, we make it a separate ioctl, and issue it after unmount completes.
macOS: explicitly call setsize with O_TRUNC
It appears O_TRUNC does nothing, like the goggles.
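The zfs_boot.cpp race fix in the "Tidying nvram zfs_boot=pool" commit comes down to ordering: initialize the lock and condition variable before anything that can call into them exists. The sketch below shows that ordering with POSIX threads standing in for the IOKit notifier and XNU mutexes, plus a C11 `atomic_bool` read outside the lock in place of a `volatile` flag. All names (`boot_pools_t`, `probe_media()`, `boot_import()`) are hypothetical, not the zfs_boot.cpp code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the zfs_boot pools state. */
typedef struct boot_pools {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;
	int		disks_seen;	/* protected by lock */
	atomic_bool	terminating;	/* read without taking the lock */
} boot_pools_t;

/* Stand-in for the media-arrival notifier callback. */
static void *
probe_media(void *arg)
{
	boot_pools_t *pools = arg;

	if (atomic_load(&pools->terminating))
		return (NULL);
	/* Safe: lock and cv were initialized before this thread existed. */
	pthread_mutex_lock(&pools->lock);
	pools->disks_seen++;
	pthread_cond_signal(&pools->cv);
	pthread_mutex_unlock(&pools->lock);
	return (NULL);
}

/* Returns how many disks the waiter saw, or -1 on error. */
static int
boot_import(void)
{
	boot_pools_t pools = { .disks_seen = 0 };
	pthread_t notifier;
	int seen;

	/* 1. Initialize synchronization primitives first... */
	pthread_mutex_init(&pools.lock, NULL);
	pthread_cond_init(&pools.cv, NULL);
	atomic_init(&pools.terminating, false);

	/* 2. ...and only then create the notifier that may use them. */
	if (pthread_create(&notifier, NULL, probe_media, &pools) != 0) {
		pthread_cond_destroy(&pools.cv);
		pthread_mutex_destroy(&pools.lock);
		return (-1);
	}

	pthread_mutex_lock(&pools.lock);
	while (pools.disks_seen == 0)
		pthread_cond_wait(&pools.cv, &pools.lock);
	seen = pools.disks_seen;
	pthread_mutex_unlock(&pools.lock);

	/* Atomic store: other threads may read it without the lock. */
	atomic_store(&pools.terminating, true);
	pthread_join(notifier, NULL);
	pthread_cond_destroy(&pools.cv);
	pthread_mutex_destroy(&pools.lock);
	return (seen);
}
```

Swapping steps 1 and 2 reintroduces the race: the notifier thread could lock a mutex that has not been initialized yet.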
macOS: Add O_APPEND to zfs_file_t
It is currently not used, but since it was written for a test case, we might as well keep it.
macOS: Pass fd_offset between kernel and userland.
macOS: Missing return in non-void function
macOS: finally fix taskq_dispatch_delay()
you find a bug, you own the bug.
macOS: add missing kstats
macOS: restore the default system_delay_taskq
macOS: don't call taskq_wait in taskq_cancel
macOS: fix taskq_cancel_id()
We need to make sure the taskq has finished before returning in taskq_cancel_id(), so that the taskq doesn't get a chance to run after.
macOS: correct 'hz' to 100.
sysctl kern.clockrate: 100
sleeping for 1 second.
bolt: 681571
sleep() 35
bolt: 681672: diff 101
'hz' is definitely 100.
macOS: implement taskq_delay_dispatch()
Implement delayed taskq by adding them to a list, sorted by wake-up time, and a dispatcher thread which sleeps until the soonest taskq is due.
taskq_cancel_id() will remove the task from the list if present.
macOS: ensure to use 1024 version of struct statfs
and avoid coredump if passed zhp == NULL.
macOS: fix memory leak in xattr_list
macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat
getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE
This is automatically set by default in userland if the deployment target is > 10.5
macOS: Fix watchdog unload and delay()
macOS: improve handling of invariant disks
Don't prepend /dev to all paths not starting with /dev, as InvariantDisks places its symlinks in /var/run/disk/by-* not /dev/disk/by-*.
Also, merge in some tweaks from Linux's zpool_vdev_os.c such as only using O_EXCL with spares.
macOS: remove zfs_unmount_006_pos from large.
Results in KILLED.
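The "implement taskq_delay_dispatch()" commit describes a list sorted by wake-up time, drained by a dispatcher that sleeps until the head entry is due, with cancellation removing a still-pending entry. A single-threaded sketch of that data structure follows; the dispatcher thread, locking and the hand-off to a real taskq are omitted, and every name here is hypothetical rather than taken from spl-taskq.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pending entry, kept in wake_time order. */
typedef struct delayed_ent {
	uint64_t		id;
	int64_t			wake_time;	/* in ticks */
	void			(*func)(void *);
	void			*arg;
	struct delayed_ent	*next;
} delayed_ent_t;

typedef struct delayed_list {
	delayed_ent_t	*head;		/* soonest entry first */
	uint64_t	next_id;
} delayed_list_t;

/* Insert in wake-time order; returns an id for cancellation (0 on failure). */
static uint64_t
delay_dispatch(delayed_list_t *dl, void (*func)(void *), void *arg,
    int64_t wake_time)
{
	delayed_ent_t *e = malloc(sizeof (*e));
	delayed_ent_t **pp = &dl->head;

	if (e == NULL)
		return (0);
	e->id = ++dl->next_id;
	e->wake_time = wake_time;
	e->func = func;
	e->arg = arg;
	/* Skip entries due at or before us, so ties stay FIFO. */
	while (*pp != NULL && (*pp)->wake_time <= wake_time)
		pp = &(*pp)->next;
	e->next = *pp;
	*pp = e;
	return (e->id);
}

/* Remove a pending entry; returns true if it was still queued. */
static bool
delay_cancel(delayed_list_t *dl, uint64_t id)
{
	for (delayed_ent_t **pp = &dl->head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			delayed_ent_t *e = *pp;
			*pp = e->next;
			free(e);
			return (true);
		}
	}
	return (false);
}

/*
 * One dispatcher pass: run every entry that is due and return how long
 * to sleep until the next one (-1 if the list is empty). A real
 * dispatcher would hand func/arg to a taskq instead of calling it.
 */
static int64_t
delay_run_due(delayed_list_t *dl, int64_t now)
{
	while (dl->head != NULL && dl->head->wake_time <= now) {
		delayed_ent_t *e = dl->head;
		dl->head = e->next;
		e->func(e->arg);
		free(e);
	}
	return (dl->head == NULL ? -1 : dl->head->wake_time - now);
}

/* Example callback for the usage checks: counts invocations. */
static void
count_call(void *arg)
{
	(*(int *)arg)++;
}
```

Keeping the list sorted makes insertion linear, but lets the dispatcher compute its sleep time from the head alone. A real taskq_cancel_id() must also wait for a task that is already running, as the "fix taskq_cancel_id()" commit notes.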
Tag macos-2.0.0rc7
macOS: If we don't set SOURCES it makes up zfs.c from nowhere
macOS: remove warning
macOS: compile fixes after rebase
macOS: connect SEEK_HOLE SEEK_DATA to ioctl
macOS: Only call vnode_specrdev() when valid
macOS: Use VNODE_RELOAD in iterate
in the hopes of avoiding ZFS call back in VNOP_INACTIVE
macOS: zfs_kmod_fini() calls taskq_cancel_id()
so we must unload system_taskq_fini() after the call to zfs_kmod_fini()
macOS: shellcheck error
macOS: Setting landmines cause panic on M1
"panicString" : "panic(cpu 1 caller 0xfffffe001db72dc8): Break 0xC470 instruction exception from kernel. Ptrauth failure with IA key resulted in 0x2000000000000001 at pc 0xfffffe001c630880, lr 0x8afcfe001c630864 (saved state: 0xfffffe309386b180)
macOS: vget should only lookup direct IDs
macOS: rootzp left z_projid uninitialised
Causing z_projid to have "0xBADDCAFEBADDCAFE" initially, and zfs_link() to return EXDEV due to differing z_projid, presenting the user with "Cross-device link".
Would only happen after loading kext, on the root znode.
macOS: Update installer rtf
macOS: update and correct the kext_version
macOS: Update copyright, fix url and versions
macOS ARC memory improvements and old code removal
macOS_pure "purification" in spl-[kv]mem, coupled with the new dynamics of trying to contain the split between inuse and allocated in the ABD vmem arena, produces less memory-greed. So we don't have to do as much policing of memory consumption, and we can rely on more common/cross-platform code for a number of commonplace calculations and adjustments of ARC variables.
Additionally:
* Greater niceness in spl_free_thread: when we see pages are wanted (but no xnu pressure), react more strongly. Notably, if we are within 64MB of zfs's memory ceiling, clamp spl_free to a maximum of 32MB.
* following recent fixes to abd_os.c, revert to KMC_NOTOUCH at abd_chunk kmem cache creation time, to turn off BUFTAG|CONTENTS|LITE, thus avoiding allocations of many, many extra 4k chunks in DEBUG builds.
* Double prepopulation of kmem_taskq entries: kmem_cache_applyall() makes this busy, and we want at least as many entries as we have kmem caches at kmem_reap() time.
macOS: more work
Upstream: zfs_log can't VN_HOLD a possibly unlinked vp
Follow in FreeBSD's steps, and avoid the first call to VN_HOLD in case it is unlinked, as that can deadlock waiting in vnode_iocount(). Walk up the xattr_parent.
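The spl_free_thread clamp from the "macOS ARC memory improvements" commit can be reduced to a one-line rule. This hedged sketch assumes spl_free is a signed byte count and that zfs's current allocation and its memory ceiling are available as plain numbers; `clamp_spl_free()` and its parameters are illustrative names, not the spl-kmem code.

```c
#include <assert.h>
#include <stdint.h>

#define	MiB	(1024LL * 1024LL)

/*
 * Hypothetical sketch: spl_free is the amount the SPL reports as free
 * for ARC to grow into; total_alloc and ceiling are zfs's current
 * memory use and its limit, all in bytes.
 */
static int64_t
clamp_spl_free(int64_t spl_free, int64_t total_alloc, int64_t ceiling)
{
	/* Within 64 MiB of the ceiling: advertise at most 32 MiB. */
	if (ceiling - total_alloc <= 64 * MiB && spl_free > 32 * MiB)
		return (32 * MiB);
	return (spl_free);
}
```

Negative values (memory that ARC should give back) pass through unchanged; the clamp only limits how optimistic the signal can be near the ceiling.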
Add all files required for the macOS port. Add new cmd/os/ for tools which are only expected to be used on macOS.
This has support for all macOS versions up to Catalina (not Big Sur).
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
macOS: big uio change over.
Make uio an internal (ZFS) struct, possibly referring to the supplied (XNU) uio from the kernel. This means zio_crypt.c can now be identical to upstream.
Update for draid, and other changes
macOS: Use SET_ERROR with uiomove. [squash]
macOS: they went and added vdev_draid
macOS: compile fixes from rebase
macOS: oh cstyle, how you vex me so
macOS: They added new methods - squash
macOS: arc_register_hotplug for userland too
Upstream: avoid warning
zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
macOS: Update zfs_acl.c to latest
This includes commits like:
65c7cc4 1b376d1 cfdc432 716b53d a741b38 485b50b
macOS: struct vdev changes
macOS: cstyle, how you vex me [squash]
Upstream: booo Werror booo
Upstream: squash baby
Not defined gives warnings.
Upstream: Include all Makefiles
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
double draid!
macOS: large commit
macOS: Use APPLE approved kmem_alloc()
macOS: large commit
WIP: remove reliance on zfs.exports
The memory-pressure has been nerfed, and will not run well until we can find other solutions.
The kext symbol lookup we can live without, used only for debug and panic. Use lldb to lookup symbols.
leaner! leanerr!
remove zfs.export dependency cont.
export reduction cont. cont.
Corrective tweaks for building
Correct vnode_iocount()
Cleanup pipe wrap code, use pthreads, handle multiple streams
latest pipe send with threads sort of works, but bad timing can be deadlock
macOS: work out corner case starvation issue in cv_wait_sig()
Fix -C in zfs send/recv
cv_wait_sig squash
Also wrap zfs send resume
Implement VOP_LOOKUP for snowflake Finder
Don't change date when setting size.
Seems to be a weird requirement with Linux, so model after the FreeBSD version
macOS: correct xattr checks for uio
Fix a noisy source of misleading-indentation warnings
Fix "make install" ln -s failures
Fix a noisy source of misleading-indentation warnings
Fix "make install" ln -s failures
fix ASSERT: don't try to peer into opaque vp structure
Import non-panicking ASSERT from old spl/include/sys/debug.h
Guard with MACOS_ASSERT_SHOULD_PANIC, which will do what Linux and FreeBSD do: redefine ASSERTs as VERIFYs. The panic report line will say VERIFY, obscuring the problem, and a system panic is harsher (and more dangerous) on macOS than a zfs-module panic on Linux.
ASSERTions: declare assfail in debug.h
Build and link spl-debug.c
Eliminate spurious "off" variable, use position+offset range
Make sure we hold the correct range to avoid panic in dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug).
zvol_log_write the range we have written, not the future range
silence very noisy and dubious ASSERT
macOS: M1 fixes for arm64.
sysctl needs to use OID2
Allocs need to be IOMallocAligned
Initial spl-vmem memory area needs to be aligned to 16KB
No cpu_number() for arm64.
macOS: change zvol locking, add zvol symlinks
macOS: Return error on UF_COMPRESSED
This means bsdtar will be rather noisy, but we prefer noise over corrupt files (all files would be 0-sized).
usr/bin/zprint: Failed to set file flags~
-rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint
usr/bin/zprint: Failed to set file flags
-rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint
Actually include zedlet for zvols
macOS: Fix Finder crash on quickview, SMB error codes
xattr=sa would return negative returncode, hangover from ZOL code.
Only set size if passed a ptr.
Convert negative error codes back to normal.
Add LIBTOOLFLAGS for macports toolchain
This will replace PR#23
macOS zpool import fixes
The new codebase uses a mixture of thread pools and lio_listio async io, and macOS has low aio limits. When those are reached, lio_listio() returns EAGAIN when probing several prospective leaf vdevs concurrently, looking for labels. We should not abandon probing a vdev in this case, and can usually recover by trying again after a short delay. (We continue to treat other errnos as unrecoverable for that vdev, and only try to recover from EAGAIN a few times).
Additionally, take logic from old o3x and don't probe a variety of devices commonly found in /dev/XXX, as they either produce side-effects or are simply wasted effort.
Finally, add a trailing / that FreeBSD and Linux both have.
listxattr may not expose com.apple.system xattr=sa
We need to ask IOMallocAligned for the enclosing POW2
vmem_create() arenas want at least natural alignment for the spans they import, and will panic if they don't get it.
For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE. Otherwise align on the enclosing power of two for any osif_malloc allocation up to 2^32. Anything that asks osif_malloc() for more than that is almost certainly a bug, but we can try aligning on PAGESIZE anyway, rather than extend the enclosing-power-of-two device to handle 64-bit allocations.
Simplify the creation of bucket arenas, and adjust their quanta.
This results in handing back considerably more (and smaller) chunks of memory to osif_free if there is pressure, and reduces waits in xnu_alloc_throttled(), so it is a performance win for a busy memory-constrained system.
Finally, uncomment some valid code that might be used by future callers of vmem_xcreate().
use vmem_xalloc to match the vmem_xfree of initial dynamic alloc
vmem_alloc() breaks the initial large vmem_add() allocation into smaller chunks in an effort to have a large number of vmem segments in the arena. This arena does not benefit from that. Additionally, in vmem_fini() we call vmem_xfree() to return the initial allocation because it is done after almost everything has been pulled down. Unfortunately vmem_xfree() returns the entire initial allocation as a single span. IOFree() checks a variable maintained by the IOMalloc* allocators which tracks the largest allocation made so far, and will panic when (as is almost always the case) the initial large span is handed to it. This usually manifests as a panic or hang on kext unload, or a hang at reboot.
Consequently, we will now use vmem_xalloc() for this initial allocation; vmem_xalloc() also lets us explicitly specify the natural alignment we want for it.
zfs_rename SA_ADDTIME may grow SA
Avoid:
zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2
-> 674 panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n",
675 (u_longlong_t)db->db.db_object, db->db_level,
676 (u_longlong_t)db->db_blkid);
zfs diff also needs to be wrapped.
Replace call to pipe() with a couple of open(mkfifo) instead.
Upstream: cstyle zfs_fm.c
macOS: cstyle baby
IOMallocAligned() should call IOFreeAligned()
macOS: zpool_disable_volumes v1
When exporting, also kick mounted zvols offline
macOS: zpool_disable_volumes v2
When exporting zvols, check IOReg for the BSDName, instead of using readlink on the ZVOL symlinks.
Also check if apfs has made any synthesized disks, and ask them to unmount first.
./scripts/cmd-macos.sh zpool export BOOM
Exporting 'BOOM/volume'
... asking apfs to eject 'disk5'
Unmount of all volumes on disk5 was successful
... asking apfs to eject 'disk5s1'
Unmount of all volumes on disk5 was successful
... asking ZVOL to export 'disk4'
Unmount of all volumes on disk4 was successful
zpool_disable_volume: exit
macOS: Add libdiskmgt and call inuse checks
macOS: compile fixes from rebase
macOS: oh cstyle, how you vex me so
macOS: They added new methods - squash
macOS: arc_register_hotplug for userland too
macOS: minor tweaks for libdiskmgt
macOS: getxattr size==0 is to lookup size
Also skip the ENOENT return for "zero" finderinfo, as we do not skip over them in listxattr.
macOS: 10.9 compile fixes
macOS: go to rc2
macOS: kstat string handling should copyin.
cstyle baby
macOS: Initialise ALL quota types
projectid, userobj, groupobj and projectobj quotas were missed.
macOS: error check sysctl for older macOS
Wooo cstyle, \o/
Make arc sysctl tunables work (#27)
* use an IOMemAligned for a PAGE_SIZE allocation
* we should call arc_kstat_update_osx()
Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do anything because arc_kstat_update_osx() was removed at the same time the (obsoleted by upstream) arc_kstat_update() was removed from zfs_kstat_osx.c. Put it back.
* when we sysctl arc tunables, call arc_tuning_update()
* rely on upstream's sanity checking
Simplification which also avoids spurious CMN_WARN messages caused by setting the arcstat variables here, when upstream's arc_tuning_update() checks that they differ from the tunable variables.
* add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent
both are in upstream's arc_tuning_update()
zfs_arc_sys_free controls the amount of memory that ARC will leave free, which is roughly what lundman wants for putting some sort of cap on memory use.
* cstyle
macOS: set UIO direction, to receive xattr from XNU
macOS: ensure uio is zeroed
in case XNU uio is NULL.
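The osif_malloc() alignment rule from the "We need to ask IOMallocAligned for the enclosing POW2" commit can be written as a small pure function. This is a sketch, not the spl-vmem code: it assumes a 4 KiB page (`ASSUMED_PAGESIZE`, a hypothetical name; Apple Silicon uses 16 KiB pages) and treats a request of exactly 2^32 bytes as inside the power-of-two range.

```c
#include <assert.h>
#include <stdint.h>

#define	ASSUMED_PAGESIZE	4096ULL	/* assumption; 16 KiB on arm64 macOS */

/*
 * Alignment to request for an osif_malloc()-style allocation of `size`
 * bytes:
 *  - below a page: align on the page size;
 *  - up to 2^32: align on the smallest power of two >= size, so that
 *    vmem arenas importing the span get natural alignment;
 *  - above 2^32 (almost certainly a bug): fall back to the page size.
 */
static uint64_t
osif_alignment(uint64_t size)
{
	uint64_t align = 1;

	if (size < ASSUMED_PAGESIZE || size > (1ULL << 32))
		return (ASSUMED_PAGESIZE);
	while (align < size)
		align <<= 1;
	/* The loop always yields a power of two. */
	assert((align & (align - 1)) == 0);
	return (align);
}
```

For example, a 5000-byte request gets 8192-byte alignment, while a 100-byte request gets page alignment.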
Fix zfs_vnop_getxattr (openzfs#28)
"xattr -l <file>" would return inconsistent garbage, especially from non-com.apple.FinderInfo xattrs.
The UIO_WRITE was a 0 (UIO_READ) in the previous commit; change it.
Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c, for cheap extra safety.
launch `zpool import` through launchd in the startup script (#26)
Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com>
cstyle
macOS: correct dataset_kstat_ logic and kstat leak.
dataset_kstat_create() will allocate a string and set it before calling kstat_create(), so we cannot set strings to NULL. Likewise, we cannot bulk free strings on unload; we have to rely on the caller of kstat to do so (which is proper).
Add calls to dataset_kstat for datasets and zvol.
kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM
kstat.zfs/BOOM.dataset.objset-0x36.writes: 0
kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0
kstat.zfs/BOOM.dataset.objset-0x36.reads: 11
kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810
kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0
kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0
macOS: remove no previous prototype for function
macOS: correct openat wrapper
build fixes re TargetConditionals.h (openzfs#30)
AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include.
Memory fixes on macOS_pure (openzfs#31)
* Improve memory handling on macOS
* remove obsolete/unused zfs_file_data/zfs_metadata caching
* In the new code base, we use upstream's zio.c without modification, and so the special zio caching code became entirely vestigial, and likely counterproductive.
* and make busy ABD better behaved on busy macOS box
Post-ABD we no longer gained much benefit in the old code base from the complicated special handling for the caches created in zio.c.
As there's only really one size of ABD allocation, we do not need a qcache layer as in 1.9.
Instead use an arena with VMC_NO_QCACHE set to ask for 256k chunks.
* don't reap extra caches in arc_kmem_reap_now()
KMF_LITE in DEBUG build is OK
* build fixes re TargetConditionals.h
AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include.
Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33)
* other minor changes in vdev_disk
Thread and taskq fixing (openzfs#32)
Highlights:
* thread names for spindump
* some taskq_d is safe and useful
* reduce thread priorities
* use throughput & latency QOS
* TIMESHARE scheduling
* passivate some IO
* Pull in relevant changes from old taskq_fixing branch
1.9 experimentation pulled into 2.x
* add throttle_set_thread_io_policy to zfs.exports
* selectively re-enable TASKQ_DYNAMIC
also drop wr_iss zio taskqs even further in priority (cf freebsd)
* reduce zvol taskq priority
* make system_taskq dynamic
* experimentally allow three more taskq_d
* lower thread priorities overall
On an M1 with no zfs whatsoever, the highest priority threads are in the mid 90s, with most kernel threads at priority 81 (basepri).
With so many maxclsyspri threads in zfs, we would starve out important things like vm_pageout_scan (pri 91), sched_maintenance_thread (pri 95), and numerous others. Moreover, ifnet_start_{interfaces} are all priority 82.
We should drop minclsyspri below 81, have defclsyspri at no more than 81, and make sure we have few threads above 89.
* some tidying up of lowering of priority
Thread and taskq fixing
* fix old code pulled into spa.c, and further lower priorities
* Thread and taskq fixing
drop xnu priorities by one
update a comment block
set USER_INITIATED throughput QOS on TIMESHARE taskq threads
don't boost taskq threads accidentally
don't let taskq threads be pri==81
don't let o3x threads have importance > 0
apply xnu thread policies to taskq_d threads too
assuming this works, it calls out for DRY refactoring with the other two flavours, that operate on current_thread().
simplify in spa.c
make practically all the taskqs TIMESHARE
Revert "apply xnu thread policies to taskq_d threads too"
Panic in VM
This reverts commit 39f93be.
Revert "Revert "apply xnu thread policies to taskq_d threads too""
I see what happened now.
This reverts commit 75619f0.
adjust thread not the magic number
refactor setting thread qos
make DRY refactor rebuild
this includes userland TASKQ_REALLY_DYNAMIC fixes
fix typo
set thread names for spindump visibility
cstyle
Upstream: Add --enable-macos-impure to autoconf
Controls -DMACOS_IMPURE
Signed-off-by: Jorgen lundman <lundman@lundman.net>
macOS: Add --enable-macos-impure switch to missing calls.
Call the wrapped spl_throttle_set_thread_io_policy
Add spl_throttle_set_thread_io_policy to headers
macOS: vdev_file should use file_taskq
Also cleanup spl-taskq to have taskq_wait_outstanding() in preparation for one day implementing it.
Change alloc to zalloc in zfs_ctldir.c
Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34)
macOS: change both alloc to zalloc
macOS: mutex_tryenter can be used while holding
zstd uses mutex_tryenter() to check if it already is holding the mutex.
Can't find any implementations that object to it, so changing our spl-mutex.c
Tag zfs-2.0.0rc4
macOS: return error from uiomove instead of panic
macOS: Skip known /dev entry which hangs
macOS: Give better error msg when features are needed for crypto
Using 1.9.4 crypto datasets now requires userobj and projectquota. Alert the user to activate said features to mount crypt dataset. There is no going back to 1.9.4 after features are enabled.
macOS: Revert to pread() over AIO due to platform issues.
We see waves of EAGAIN errors from lio_listio() on Big Sur (but not Catalina) which could stem from recent changes to AIO in XNU. For now, we will go with the classic read label.
Re-introduce a purified memory pressure handling mechanism (openzfs#35)
* Introduce pure pressure-detecting-and-reacting system
* "pure": no zfs.exports requirement
* plumb in mach_vm_pressure_level_monitor() and mach_vm_pressure_monitor() calls to maintain reduced set of inputs into previous signalling into (increasingly shared with upstream) arc growth or shrinking policy
* introduce mach_vm_pressure kstats which can be compared with userland-only sysctls:
kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0
kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0
kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0
vm.page_free_wanted: 0
vm.page_free_count: 25,545
vm.page_speculative_count: 148,572
* and a start on tidying and obsolete code elimination
* make arc_default_max() much bigger
Optional: can be squashed into main pressure commit, or omitted.
Users can use zsysctl.conf or manual setting of kstat.zfs.darwin.tunable.zfs_arc_max to override whichever default is chosen (this one, or the one it replaces).
Allmem is already deflated during initialization, so this patch raises the un-sysctled ARC maximum from 1/6 to 1/2 of physmem.
* handle (vmem) abd_cache fragmentation after arc shrink When arc shrinks due to a significant pressure event, the abd_chunk kmem cache will free slabs back to the vmem abd_cache, and this memory can be several gigabytes. Unfortunately multi-threaded concurrent kmem_cache allocation in the first place, and a priori unpredicatble arc object lifetimes means that abds held by arc objects may be scattered across multiple slabs, with different objects interleaved within slabs. Thus after a moderate free, the vmem cache can be fragmented and this is seen by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much smaller than (sysctl) kstat.vmem.vmem.abd_cache.mem_import, the latter of which may even be stuck at approximately the same value as before the arc free and kmem_cache reap. When there is a large difference between import and inuse, we set arc_no_grow in hopes that ongoing arc activity will defragment organically. This works better with more arc read/write activity after the free, and almost not at all if after the free there is almost no activity. We also add BESTFIT policy to abd_arena experimentally BESTFIT: look harder to place an abd chunk in a slab rather than place in the first slot that is definitely large enough which breaks the vmem constant-time allocation guarantee, although that is less important for this particular vmem arena because of the strong modality of allocations from the abd_chunk cache (its only client). Additionally reduce the abd_cache arena import size to 128k from 256k; the increase in allocation and free traffic between it and the heap is small compared to the gain under this new anti-fragmentation scheme. * some additional tidying in arc_os.c Tag macos-2.0.0-rc5 abd_cache fragmentation mitigation (openzfs#36) * printf->dprintf HFS_GET_BOOT_INFO periodically there will be huge numbers of these printfs, and they are not really useful except when debugging vnops. 
* Mitigate fragmentation in vmem.abd_cache In macOS_pure the abd_chunk kmem cache is parented to the abd_cache vmem arena to avoid sometimes-heavy ARC allocation and free stress on the main kmem cache, and because abd_chunk has such a strongly modal page-sized allocation size. Additionally, abd_chunk allocations and frees come in gangs, often with high multi-thread concurrency. It is that latter property which is the primary source of arena fragmentation, and it will affect any vmem arena directly underneath the abd_chunk kmem cache. Because we have a vmeme parent solely for abd_chunk, we can monitor that parent for various patterns and react to them. This patch monitors the difference between the variables exported as kstat.vmem.vmem.abd_cache.mem_inuse and kstat.vmem.vmem.abd_cache.mem_import, watching for a large gap between the two, which can arise after an ARC shrink returns many slabs from the arc_chunk kmem cache to the abd_cache arena, as vmem segments still contain slabs which hold still-alive abds. When there is a significant gap, we turn on arc_no_grow and hope that organic ARC activity reduces the gap. If after several minutes this is not the case, a small arc_reduce_target_size() is applied. In comparison with previous behaviour, ARC equilibrium sizes will tend slightly -- but not neormously -- lower because the arc target size reduction is made fairly frequently. However, this is offset by the benefit of less *long-term* abd_cache fragmentation, and less complete collapses of ARC in the face of system memory pressure (since less is "stuck" in vmem). ARC consequently will stay at its equilibrium more often than near its minimum. This is demonstrated by a generally lower overall total held memory (kstat.spl.misc.spl_misc.os_mem_alloc) except on systems with essentially no memory pressure, or systems which have been sysctl-tuned for different behaviour. 
macOS: Additional 10.9 fixes that missed the boat Tidying nvram zfs_boot=pool (openzfs#37) If zfs_boot is set we run a long-lived zfs_boot_import_thread, which can stay running until the kernel module is running _fini() functions at unload or shutdown. This patch dispatches it on a zfs_boot() taskq, to avoid causing a hang at the taskq_wait_outstanding(system_taskq, 0) in zvol.c's zvol_create_minors_recursive(), which would prevent pool imports finishing if the pool contained zvols. (Symptoms: "zpool import" does not exit for any pool, system does not see any zvols). This exposed a long-term race condition in our zfs_boot.cpp: the notifier can cause the mutex_enter(&pools->lock) in zfs_boot_probe_media to be reached before the mutex_enter() after the notifier was created. The use of the system_taskq was masking that, by quietly imposing a serialization choke. Moving the mutex and cv initialization earlier -- in particular before the notifier is created -- eliminates the race. Further tidying in zfs_boot.cpp, including some cstyling, switching to _Atomic instead of volatile. Volatile is for effectively random reads; _Atomic is for when we want many readers to have a consistent view after the variable is written. Finally, we need TargetConditionals.h in front of AvailabilityMacros.h in order to build. Add includes to build on Big Sur with macports-clang-11 (openzfs#38) * TargetConditionals.h before all AvailabilityMacros.h * add several TargetConditionals.h and AvaialbilityMacros.h Satisfy picky macports-clang-11 toolchain on Big Sur. macOS: clean up large build, indicate errors. Fix debug macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit macOS: rename net.lundman. -> org.openzfsonosx. macOS: Tag va_mode for upstream ASSERTS XNU sets va_type = VDIR, but does not bother with va_mode. However ZFS checks to confirm S_ISDIR is set in mkdir. macOS: Fix zfs_ioc_osx_proxy_dataset for datasets It was defined as a _pool() ioctl. 
While we are here changing things change it into a new-style ioctl instead. This should fix non-root datasets mounting as a proxy (devdisk=on). cstyle macOS: setxattr debug prints left in macOS: don't create DYNAMIC with _ent taskq macOS: Also uninstall new /usr/local/zfs before install macos-2.0.0-rc6 macOS: strcmp deprecated after macOS 11 macOS: pkg needs to notarize at the end macOS: strdup strings in getmntent mirrored on FreeBSD. macOS: remove debug print macOS: unload zfs, not openzfs macOS: actually include the volume icon file as well also update to PR macOS: prefer disk over rdisk macOS: devdisk=off mimic=on needs to check for dataset Datasets with devdisks=on will be in ioreg, with it off and mimic=on then it needs to handle: BOOM/fs1 /Volumes/BOOM/fs1 by testing if "BOOM/fs1" is a valid dataset. fixifx macOS: doubled up "int rc" losing returncode Causing misleading messages macOS: zfsctl was sending from IDs macOS: let zfs mount as user succeed If the "mkdir" can succeed (home dir etc, as opposed to /Volumes) then let the mount be able to happen. macOS: Attempt to implement taskq_dispatch_delay() frequently used with taskq_cancel_id() to stop taskq from calling `func()` before the timeout expires. Currently implemented by the taskq sleeping in cv_timedwait() until timeout expires, or it is signalled by taskq_cancel_id(). Seems a little undesirable, could we build an ordered list of delayed taskqs, and only place them to run once timeout has expired, leaving the taskq available to work instead of delaying. macOS: Separate unmount and proxy_remove When proxy_remove is called at the tail end of unmount, we get the alert about "ejecting before disconnecting device". To mirror the proxy create, we make it a separate ioctl, and issue it after unmount completes. macOS: explicitly call setsize with O_TRUNC It appears O_TRUNC does nothing, like the goggles. 
macOS: Add O_APPEND to zfs_file_t It is currently not used, but since it was written for a test case, we might as well keep it. macOS: Pass fd_offset between kernel and userland. macOS: Missing return in non-void function macOS: finally fix taskq_dispatch_delay() you find a bug, you own the bug. macOS: add missing kstats macOS: restore the default system_delay_taskq macOS: dont call taskq_wait in taskq_cancel macOS: fix taskq_cancel_id() We need to make sure the taskq has finished before returning in taskq_cancel_id(), so that the taskq doesn't get a chance to run after. macOS: correct 'hz' to 100. sysctl kern.clockrate: 100 sleeping for 1 second. bolt: 681571 sleep() 35 bolt: 681672: diff 101 'hz' is definitely 100. macOS: implement taskq_delay_dispatch() Implement delayed taskq by adding them to a list, sorted by wake-up time, and a dispatcher thread which sleeps until the soonest taskq is due. taskq_cancel_id() will remove task from list if present. macOS: ensure to use 1024 version of struct statfs and avoid coredump if passed zhp == NULL. macOS: fix memory leak in xattr_list macOS: must define D__DARWIN_64_BIT_INO_T for working getfsstat getmntany: don't set _DARWIN_FEATURE_64_BIT_INODE This is automatically set by default in userland if the deployment target is > 10.5 macOS: Fix watchdog unload and delay() macOS: improve handling of invariant disks Don't prepend /dev to all paths not starting with /dev as InvariantDisks places its symlinks in /var/run/disk/by-* not /dev/disk/by-*. Also, merge in some tweaks from Linux's zpool_vdev_os.c such as only using O_EXCL with spares. macOS: remove zfs_unmount_006_pos from large. Results in KILLED. 
Tag macos-2.0.0rc7 macOS: If we don't set SOURCES it makes up zfs.c from nowhere macOS: remove warning macOS: compile fixes after rebase macOS: connect SEEK_HOLE SEEK_DATA to ioctl macOS: Only call vnode_specrdev() when valid macOS: Use VNODE_RELOAD in iterate in the hopes of avoiding ZFS call back in VNOP_INACTIVE macOS: zfs_kmod_fini() calls taskq_cancel_id() so we must unload system_taskq_fini() after the call to zfs_kmod_fini() macOS: shellcheck error macOS: Setting landmines causes panic on M1 "panicString" : "panic(cpu 1 caller 0xfffffe001db72dc8): Break 0xC470 instruction exception from kernel. Ptrauth failure with IA key resulted in 0x2000000000000001 at pc 0xfffffe001c630880, lr 0x8afcfe001c630864 (saved state: 0xfffffe309386b180) macOS: vget should only lookup direct IDs macOS: rootzp left z_projid uninitialised Causing z_projid to have "0xBADDCAFEBADDCAFE" initially, and zfs_link() to return EXDEV due to differing z_projid, presenting the user with "Cross-device link". Would only happen after loading kext, on the root znode. macOS: Update installer rtf macOS: update and correct the kext_version macOS: Update copyright, fix url and versions macOS ARC memory improvements and old code removal macOS_pure "purification" in spl-[kv]mem coupled with the new dynamics of trying to contain the split between inuse and allocated in the ABD vmem arena produce less memory-greed, so we don't have to do as much policing of memory consumption, and lets us rely on some more common/cross-platform code for a number of commonplace calculation and adjustment of ARC variables. Additionally: * Greater niceness in spl_free_thread : when we see pages are wanted (but no xnu pressure), react more strongly. Notably if we are within 64MB of zfs's memory ceiling, clamp spl_free to a maximum of 32MB.
* following recent fixes to abd_os.c, revert to KMC_NOTOUCH at abd_chunk kmem cache creation time, to turn off BUFTAG|CONTENTS|LITE, thus avoiding allocations of many many extra 4k chunks in DEBUG builds. * Double prepopulation of kmem_taskq entries: kmem_cache_applyall() makes this busy, and we want at least as many entries as we have kmem caches at kmem_reqp() time. macOS: more work Upstream: zfs_log can't VN_HOLD a possibly unlinked vp Follow in FreeBSD steps, and avoid the first call to VN_HOLD in case it is unlinked, as that can deadlock waiting in vnode_iocount(). Walk up the xattr_parent.
Add all files required for the macOS port. Add new cmd/os/ for tools which are only expected to be used on macOS. This has support for all macOS version up to Catalina. (Not BigSur). Signed-off-by: Jorgen Lundman <lundman@lundman.net> macOS: big uio change over. Make uio be internal (ZFS) struct, possibly referring to supplied (XNU) uio from kernel. This means zio_crypto.c can now be identical to upstream. Update for draid, and other changes macOS: Use SET_ERROR with uiomove. [squash] macOS: they went and added vdev_draid macOS: compile fixes from rebase macOS: oh cstyle, how you vex me so macOS: They added new methods - squash macOS: arc_register_hotplug for userland too Upstream: avoid warning zio_crypt.c:1302:3: warning: passing 'const struct iovec *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers] kmem_free(uio->uio_iov, uio->uio_iovcnt * sizeof (iovec_t)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ macOS: Update zfs_acl.c to latest This includes commits like: 65c7cc4 1b376d1 cfdc432 716b53d a741b38 485b50b macOS: struct vdev changes macOS: cstyle, how you vex me [squash] Upstream: booo Werror booo Upstream: squash baby Not defined gives warnings. Upstream: Include all Makefiles Signed-off-by: Jorgen Lundman <lundman@lundman.net> double draid! macOS: large commit macOS: Use APPLE approved kmem_alloc() macOS: large commit WIP: remove reliance on zfs.exports The memory-pressure has been nerfed, and will not run well until we can find other solutions. The kext symbol lookup we can live without, used only for debug and panic. Use lldb to lookup symbols. leaner! leanerr! remove zfs.export dependency cont. export reduction cont. cont. 
Corrective tweaks for building Correct vnode_iocount() Cleanup pipe wrap code, use pthreads, handle multiple streams latest pipe send with threads sort of works, but bad timing can deadlock macOS: work out corner case starvation issue in cv_wait_sig() Fix -C in zfs send/recv cv_wait_sig squash Also wrap zfs send resume Implement VOP_LOOKUP for snowflake Finder Don't change date when setting size. Seems to be a weird requirement with Linux, so model after FreeBSD version macOS: correct xattr checks for uio Fix a noisy source of misleading-indentation warnings Fix "make install" ln -s failures Fix a noisy source of misleading-indentation warnings Fix "make install" ln -s failures fix ASSERT: don't try to peer into opaque vp structure Import non-panicking ASSERT from old spl/include/sys/debug.h Guard with MACOS_ASSERT_SHOULD_PANIC which will do what Linux and FreeBSD do: redefine ASSERTs as VERIFYs. The panic report line will say VERIFY obscuring the problem, and a system panic is harsher (and more dangerous) on MacOS than a zfs-module panic on Linux. ASSERTions: declare assfail in debug.h Build and link spl-debug.c Eliminate spurious "off" variable, use position+offset range Make sure we hold the correct range to avoid panic in dmu_tx_dirty_buf (macro DMU_TX_DIRTY_BUF defined with --enable-debug). zvol_log_write the range we have written, not the future range silence very noisy and dubious ASSERT macOS: M1 fixes for arm64. sysctl needs to use OID2 Allocs need to be IOMalloc_aligned Initial spl-vmem memory area needs to be aligned to 16KB No cpu_number() for arm64. macOS: change zvol locking, add zvol symlinks macOS: Return error on UF_COMPRESSED This means bsdtar will be rather noisy, but we prefer noise over corrupt files (all files would be 0-sized).
usr/bin/zprint: Failed to set file flags~ -rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint usr/bin/zprint: Failed to set file flags -rwxr-xr-x 1 root wheel 47024 Mar 17 2020 /Volumes/BOOM/usr/bin/zprint Actually include zedlet for zvols macOS: Fix Finder crash on quickview, SMB error codes xattr=sa would return negative returncode, hangover from ZOL code. Only set size if passed a ptr. Convert negative error codes back to normal. Add LIBTOOLFLAGS for macports toolchain This will replace PR#23 macOS zpool import fixes The new codebase uses a mixture of thread pools and lio_listio async io, and on macOS there are low aio limits, and when those are reached lio_listio() returns EAGAIN when probing several prospective leaf vdevs concurrently, looking for labels. We should not abandon probing a vdev in this case, and can usually recover by trying again after a short delay. (We continue to treat other errnos as unrecoverable for that vdev, and only try to recover from EAGAIN a few times). Additionally, take logic from old o3x and don't probe a variety of devices commonly found in /dev/XXX as they either produce side-effects or are simply wasted effort. Finally, add a trailing / that FreeBSD and Linux both have. listxattr may not expose com.apple.system xattr=sa We need to ask IOMallocAligned for the enclosing POW2 vmem_create() arenas want at least natural alignment for the spans they import, and will panic if they don't get it. For sub-PAGESIZE calls to osif_malloc, align on PAGESIZE. Otherwise align on the enclosing power of two for any osif_malloc allocation up to 2^32. Anything that asks osif_malloc() for more than that is almost certainly a bug, but we can try aligning on PAGESIZE anyway, rather than extend the enclosing-power-of-two device to handle 64-bit allocations. Simplify the creation of bucket arenas, and adjust their quanta.
This results in handing back considerably more (and smaller) chunks of memory to osif_free if there is pressure, and reduces waits in xnu_alloc_throttled(), so is a performance win for a busy memory-constrained system. Finally, uncomment some valid code that might be used by future callers of vmem_xcreate(). use vmem_xalloc to match the vmem_xfree of initial dynamic alloc vmem_alloc() breaks the initial large vmem_add() allocation into smaller chunks in an effort to have a large number of vmem segments in the arena. This arena does not benefit from that. Additionally, in vmem_fini() we call vmem_xfree() to return the initial allocation because it is done after almost everything has been pulled down. Unfortunately vmem_xfree() returns the entire initial allocation as a single span. IOFree() checks a variable maintained by the IOMalloc* allocators which tracks the largest allocation made so far, and will panic when (as is almost always the case) the initial large span is handed to it. This usually manifests as a panic or hang on kext unload, or a hang at reboot. Consequently, we will now use vmem_xalloc() for this initial allocation; vmem_xalloc() also lets us explicitly specify the natural alignment we want for it. zfs_rename SA_ADDTIME may grow SA Avoid: zfs`dmu_tx_dirty_buf(tx=0xffffff8890a56e40, db=0xffffff8890ae8cd0) at dmu_tx.c:674:2 -> 674 panic("dirtying dbuf obj=%llx lvl=%u blkid=%llx but not tx_held\n", 675 (u_longlong_t)db->db.db_object, db->db_level, 676 (u_longlong_t)db->db_blkid); zfs diff also needs to be wrapped. Replace call to pipe() with a couple of open(mkfifo) instead. Upstream: cstyle zfs_fm.c macOS: cstyle baby IOMallocAligned() should call IOFreeAligned() macOS: zpool_disable_volumes v1 When exporting, also kick mounted zvols offline macOS: zpool_disable_volumes v2 When exporting zvols, check IOReg for the BSDName, instead of using readlink on the ZVOL symlinks. Also check if apfs has made any synthesized disks, and ask them to unmount first.
./scripts/cmd-macos.sh zpool export BOOM Exporting 'BOOM/volume' ... asking apfs to eject 'disk5' Unmount of all volumes on disk5 was successful ... asking apfs to eject 'disk5s1' Unmount of all volumes on disk5 was successful ... asking ZVOL to export 'disk4' Unmount of all volumes on disk4 was successful zpool_disable_volume: exit macOS: Add libdiskmgt and call inuse checks macOS: compile fixes from rebase macOS: oh cstyle, how you vex me so macOS: They added new methods - squash macOS: arc_register_hotplug for userland too macOS: minor tweaks for libdiskmgt macOS: getxattr size==0 is to lookup size Also skip the ENOENT return for "zero" finderinfo, as we do not skip over them in listxattr. macOS: 10.9 compile fixes macOS: go to rc2 macOS: kstat string handling should copyin. cstyle baby macOS: Initialise ALL quota types projectid, userobj, groupobj and projectobj, quotas were missed. macOS: error check sysctl for older macOS Wooo cstyle, \o/ Make arc sysctl tunables work (#27) * use an IOMemAligned for a PAGE_SIZE allocation * we should call arc_kstat_update_osx() Changing kstat.zfs.darwin.tunable.zfs_arc_min doesn't do anything because arc_kstat_update_osx() was removed at the same time the (obsoleted by upstream) arc_kstat_update() was removed from zfs_kstat_osx.c. Put it back. * when we sysctl arc tunables, call arc_tuning_update() * rely on upstream's sanity checking Simplification which also avoids spurious CMN_WARN messages caused by setting the arcstat variables here, when upstream's arc_tuning_update() checks that they differ from the tunable variables. * add tunable zfs_arc_sys_free and fix zfs_arc_lotsfree_percent both are in upstream's arc_tuning_update() zfs_arc_sys_free controls the amount of memory that ARC will leave free, which is roughly what lundman wants for putting some sort of cap on memory use. * cstyle macOS: set UIO direction, to receive xattr from XNU macOS: ensure uio is zeroed in case XNU uio is NULL.
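The `osif_malloc()` alignment rule from the IOMallocAligned commit above reduces to a small pure function. Requests below PAGESIZE get page alignment, requests up to 2^32 get the enclosing power of two, and anything larger falls back to page alignment. Below is a sketch of that rule. `SKETCH_PAGESIZE` is an assumed 4 KiB page (arm64 macOS uses 16 KiB pages, consistent with the 16KB note in the M1 fixes), and `enclosing_pow2()` and `osif_alignment_for()` are hypothetical helper names, not functions from the port.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. arm64 macOS uses 16 KiB pages. */
#define SKETCH_PAGESIZE ((uint64_t)4096)
#define SKETCH_POW2_LIMIT ((uint64_t)1 << 32)

/* Smallest power of two that is >= n, for 1 <= n <= 2^32. */
static uint64_t
enclosing_pow2(uint64_t n)
{
	assert(n >= 1 && n <= SKETCH_POW2_LIMIT);
	uint64_t p = 1;
	while (p < n)
		p <<= 1;
	return (p);
}

/* Alignment to request for an osif_malloc()-style allocation of size bytes. */
static uint64_t
osif_alignment_for(uint64_t size)
{
	if (size < SKETCH_PAGESIZE)
		return (SKETCH_PAGESIZE);        /* sub-page: align on PAGESIZE */
	if (size <= SKETCH_POW2_LIMIT)
		return (enclosing_pow2(size));   /* natural alignment for vmem spans */
	return (SKETCH_PAGESIZE);            /* > 2^32: almost certainly a bug */
}
```

For example, a 5000-byte request is aligned on 8192 and a 3 MiB request on 4 MiB. That is the "natural alignment" vmem_create() arenas expect for the spans they import.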
Fix zfs_vnop_getxattr (openzfs#28) "xattr -l <file>" would return inconsistent garbage, especially from non-com.apple.FinderInfo xattrs. The UIO_WRITE was a 0 (UIO_READ) in the previous commit, change it. Also turn kmem_alloc -> kmem_zalloc in zfs_vnops_osx.c, for cheap extra safety. launch `zpool import` through launchd in the startup script (#26) Signed-off-by: Guillaume Lessard <glessard@tffenterprises.com> cstyle macOS: correct dataset_kstat_ logic and kstat leak. dataset_kstat_create() will allocate a string and set it before calling kstat_create() - so we can not set strings to NULL. Likewise, we can not bulk free strings on unload, we have to rely on the caller of kstat to do so. (Which is proper). Add calls to dataset_kstat for datasets and zvol. kstat.zfs/BOOM.dataset.objset-0x36.dataset_name: BOOM kstat.zfs/BOOM.dataset.objset-0x36.writes: 0 kstat.zfs/BOOM.dataset.objset-0x36.nwritten: 0 kstat.zfs/BOOM.dataset.objset-0x36.reads: 11 kstat.zfs/BOOM.dataset.objset-0x36.nread: 10810 kstat.zfs/BOOM.dataset.objset-0x36.nunlinks: 0 kstat.zfs/BOOM.dataset.objset-0x36.nunlinked: 0 macOS: remove no previous prototype for function macOS: correct openat wrapper build fixes re TargetConditionals.h (openzfs#30) AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include. Memory fixes on macOS_pure (openzfs#31) * Improve memory handling on macOS * remove obsolete/unused zfs_file_data/zfs_metadata caching * In the new code base, we use upstream's zio.c without modification, and so the special zio caching code became entirely vestigial, and likely counterproductive. * and make busy ABD better behaved on busy macOS box Post-ABD we no longer gained much benefit in the old code base from the complicated special handling for the caches created in zio.c. As there's only really one size of ABD allocation, we do not need a qcache layer as in 1.9.
Instead use an arena with VMC_NO_QCACHE set to ask for 256k chunks. * don't reap extra caches in arc_kmem_reap_now() KMF_LITE in DEBUG build is OK * build fixes re TargetConditionals.h AvailabilityMacros.h needs TargetConditionals.h definitions in picky modern compilers. Add them to sysmacros.h, and fix a missing sysmacros.h include. Use barrier synchronization and IO Priority in ldi_iokit.cpp (openzfs#33) * other minor changes in vdev_disk Thread and taskq fixing (openzfs#32) Highlights: * thread names for spindump * some taskq_d is safe and useful * reduce thread priorities * use throughput & latency QOS * TIMESHARE scheduling * passivate some IO * Pull in relevant changes from old taskq_fixing branch 1.9 experimentation pulled into 2.x * add throttle_set_thread_io_policy to zfs.exports * selectively re-enable TASKQ_DYNAMIC also drop wr_iss zio taskqs even further in priority (cf freebsd) * reduce zvol taskq priority * make system_taskq dynamic * experimentally allow three more taskq_d * lower thread priorities overall on an M1 with no zfs whatsoever, the highest priority threads are in the mid 90s, with most kernel threads at priority 81 (basepri). with so many maxclsyspri threads in zfs, we would starve out important things like vm_pageout_scan (pri 91), sched_maintenance_thread (pri 95), and numerous others. moreover, ifnet_start_{interfaces} are all priority 82. we should drop minclsyspri below 81, have defclsyspri at no more than 81, and make sure we have few threads above 89.
* some tidying up of lowering of priority Thread and taskq fixing * fix old code pulled into spa.c, and further lower priorities * Thread and taskq fixing drop xnu priorities by one update a comment block set USER_INITIATED throughput QOS on TIMESHARE taskq threads don't boost taskq threads accidentally don't let taskq threads be pri==81 don't let o3x threads have importance > 0 apply xnu thread policies to taskq_d threads too assuming this works, it calls out for DRY refactoring with the other two flavours, that operate on current_thread(). simplify in spa.c make practically all the taskqs TIMESHARE Revert "apply xnu thread policies to taskq_d threads too" Panic in VM This reverts commit 39f93be. Revert "Revert "apply xnu thread policies to taskq_d threads too"" I see what happened now. This reverts commit 75619f0. adjust thread not the magic number refactor setting thread qos make DRY refactor rebuild this includes userland TASKQ_REALLY_DYNAMIC fixes fix typo set thread names for spindump visibility cstyle Upstream: Add --enable-macos-impure to autoconf Controls -DMACOS_IMPURE Signed-off-by: Jorgen lundman <lundman@lundman.net> macOS: Add --enable-macos-impure switch to missing calls. Call the wrapped spl_throttle_set_thread_io_policy Add spl_throttle_set_thread_io_policy to headers macOS: vdev_file should use file_taskq Also cleanup spl-taskq to have taskq_wait_outstanding() in preparation for one day implementing it. Change alloc to zalloc in zfs_ctldir.c Call wrap_zstd_init() and wrap_zstd_fini() (openzfs#34) macOS: change both alloc to zalloc macOS: mutex_tryenter can be used while holding zstd uses mutex_tryenter() to check if it already is holding the mutex. 
Can't find any implementations that object to it, so changing our spl-mutex.c Tag zfs-2.0.0rc4 macOS: return error from uiomove instead of panic macOS: Skip known /dev entry which hangs macOS: Give better error msg when features are needed for crypto Using 1.9.4 crypto datasets now requires userobj and projectquota. Alert the user to activate said features to mount a crypto dataset. There is no going back to 1.9.4 after features are enabled. macOS: Revert to pread() over AIO due to platform issues. We see waves of EAGAIN errors from lio_listio() on BigSur (but not Catalina) which could stem from recent changes to AIO in XNU. For now, we will go with the classic read label. Re-introduce a purified memory pressure handling mechanism (openzfs#35) * Introduce pure pressure-detecting-and-reacting system * "pure" -- no zfs.exports requirement * plumb in mach_vm_pressure_level_monitor() and mach_vm_pressure_monitor() calls to maintain reduced set of inputs into previous signalling into (increasingly shared with upstream) arc growth or shrinking policy * introduce mach_vm_pressure kstats which can be compared with userland-only sysctls: kstat.spl.misc.spl_misc.spl_vm_pages_reclaimed: 0 kstat.spl.misc.spl_misc.spl_vm_pages_wanted: 0 kstat.spl.misc.spl_misc.spl_vm_pressure_level: 0 vm.page_free_wanted: 0 vm.page_free_count: 25,545 vm.page_speculative_count: 148,572 * and a start on tidying and obsolete code elimination * make arc_default_max() much bigger Optional: can be squashed into main pressure commit, or omitted. Users can use zsysctl.conf or manual setting of kstat.zfs.darwin.tunable.zfs_arc_max to override whichever default is chosen (this one, or the one it replaces). Allmem is already deflated during initialization, so this patch raises the un-sysctled ARC maximum from 1/6 to 1/2 of physmem.
* handle (vmem) abd_cache fragmentation after arc shrink When arc shrinks due to a significant pressure event, the abd_chunk kmem cache will free slabs back to the vmem abd_cache, and this memory can be several gigabytes. Unfortunately multi-threaded concurrent kmem_cache allocation in the first place, and a priori unpredictable arc object lifetimes mean that abds held by arc objects may be scattered across multiple slabs, with different objects interleaved within slabs. Thus after a moderate free, the vmem cache can be fragmented and this is seen by (sysctl) kstat.vmem.vmem.abd_cache.mem_inuse being much smaller than (sysctl) kstat.vmem.vmem.abd_cache.mem_import, the latter of which may even be stuck at approximately the same value as before the arc free and kmem_cache reap. When there is a large difference between import and inuse, we set arc_no_grow in hopes that ongoing arc activity will defragment organically. This works better with more arc read/write activity after the free, and almost not at all if after the free there is almost no activity. We also add BESTFIT policy to abd_arena experimentally BESTFIT: look harder to place an abd chunk in a slab rather than place in the first slot that is definitely large enough which breaks the vmem constant-time allocation guarantee, although that is less important for this particular vmem arena because of the strong modality of allocations from the abd_chunk cache (its only client). Additionally reduce the abd_cache arena import size to 128k from 256k; the increase in allocation and free traffic between it and the heap is small compared to the gain under this new anti-fragmentation scheme. * some additional tidying in arc_os.c Tag macos-2.0.0-rc5 abd_cache fragmentation mitigation (openzfs#36) * printf->dprintf HFS_GET_BOOT_INFO periodically there will be huge numbers of these printfs, and they are not really useful except when debugging vnops.
* Mitigate fragmentation in vmem.abd_cache In macOS_pure the abd_chunk kmem cache is parented to the abd_cache vmem arena to avoid sometimes-heavy ARC allocation and free stress on the main kmem cache, and because abd_chunk has such a strongly modal page-sized allocation size. Additionally, abd_chunk allocations and frees come in gangs, often with high multi-thread concurrency. It is that latter property which is the primary source of arena fragmentation, and it will affect any vmem arena directly underneath the abd_chunk kmem cache. Because we have a vmem parent solely for abd_chunk, we can monitor that parent for various patterns and react to them. This patch monitors the difference between the variables exported as kstat.vmem.vmem.abd_cache.mem_inuse and kstat.vmem.vmem.abd_cache.mem_import, watching for a large gap between the two, which can arise after an ARC shrink returns many slabs from the abd_chunk kmem cache to the abd_cache arena, as vmem segments still contain slabs which hold still-alive abds. When there is a significant gap, we turn on arc_no_grow and hope that organic ARC activity reduces the gap. If after several minutes this is not the case, a small arc_reduce_target_size() is applied. In comparison with previous behaviour, ARC equilibrium sizes will tend slightly -- but not enormously -- lower because the arc target size reduction is made fairly frequently. However, this is offset by the benefit of less *long-term* abd_cache fragmentation, and less complete collapses of ARC in the face of system memory pressure (since less is "stuck" in vmem). ARC consequently will stay at its equilibrium more often than near its minimum. This is demonstrated by a generally lower overall total held memory (kstat.spl.misc.spl_misc.os_mem_alloc) except on systems with essentially no memory pressure, or systems which have been sysctl-tuned for different behaviour.
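The abd_cache monitoring in openzfs#36 is essentially a small control loop. A significant gap between `mem_import` and `mem_inuse` turns on arc_no_grow, and a gap that persists for several minutes triggers a small ARC target reduction. The plain-C model below shows that loop. The 25% threshold, the three-minute window and the 64 MiB step are invented placeholders, since the commit gives no numbers. `frag_monitor_t` and `frag_monitor_tick()` are hypothetical names, and clearing the no-grow request once the gap closes is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder tuning values: the commit does not state the real ones. */
#define GAP_THRESHOLD_PCT  25                   /* gap > 25% of mem_import */
#define GAP_PATIENCE_SECS  (3 * 60)             /* "several minutes" */
#define GAP_SHRINK_BYTES   ((uint64_t)64 << 20) /* a "small" reduction */

typedef struct {
	bool want_no_grow;   /* what the monitor would set arc_no_grow to */
	bool gap_active;     /* a significant gap is being tracked */
	uint64_t gap_since;  /* seconds timestamp when tracking (re)started */
} frag_monitor_t;

/*
 * Feed one sample of abd_cache mem_import / mem_inuse. Returns how many
 * bytes the ARC target should shrink by (0 = no reduction this tick).
 */
static uint64_t
frag_monitor_tick(frag_monitor_t *m, uint64_t mem_import,
    uint64_t mem_inuse, uint64_t now_secs)
{
	uint64_t gap = mem_import > mem_inuse ? mem_import - mem_inuse : 0;
	bool significant = gap * 100 > mem_import * GAP_THRESHOLD_PCT;

	if (!significant) {
		/* Assumption: let the ARC grow again once the gap closes. */
		m->want_no_grow = false;
		m->gap_active = false;
		return (0);
	}

	/* Significant gap: stop ARC growth and hope organic activity helps. */
	m->want_no_grow = true;
	if (!m->gap_active) {
		m->gap_active = true;
		m->gap_since = now_secs;
		return (0);
	}

	/* Still fragmented after the patience window: shrink a little. */
	if (now_secs - m->gap_since >= GAP_PATIENCE_SECS) {
		m->gap_since = now_secs;  /* restart the window for the next step */
		return (GAP_SHRINK_BYTES);
	}
	return (0);
}
```

Restarting the window after each reduction keeps the shrinks small and spaced out. This matches the commit's observation that equilibrium ARC size ends up only slightly lower.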
macOS: Additional 10.9 fixes that missed the boat Tidying nvram zfs_boot=pool (openzfs#37) If zfs_boot is set we run a long-lived zfs_boot_import_thread, which can stay running until the kernel module is running _fini() functions at unload or shutdown. This patch dispatches it on a zfs_boot() taskq, to avoid causing a hang at the taskq_wait_outstanding(system_taskq, 0) in zvol.c's zvol_create_minors_recursive(), which would prevent pool imports finishing if the pool contained zvols. (Symptoms: "zpool import" does not exit for any pool, system does not see any zvols). This exposed a long-term race condition in our zfs_boot.cpp: the notifier can cause the mutex_enter(&pools->lock) in zfs_boot_probe_media to be reached before the mutex_enter() after the notifier was created. The use of the system_taskq was masking that, by quietly imposing a serialization choke. Moving the mutex and cv initialization earlier -- in particular before the notifier is created -- eliminates the race. Further tidying in zfs_boot.cpp, including some cstyling, switching to _Atomic instead of volatile. Volatile is for effectively random reads; _Atomic is for when we want many readers to have a consistent view after the variable is written. Finally, we need TargetConditionals.h in front of AvailabilityMacros.h in order to build. Add includes to build on Big Sur with macports-clang-11 (openzfs#38) * TargetConditionals.h before all AvailabilityMacros.h * add several TargetConditionals.h and AvailabilityMacros.h Satisfy picky macports-clang-11 toolchain on Big Sur. macOS: clean up large build, indicate errors. Fix debug macOS: Retire MNTTYPE_ZFS_SUBTYPE lookup zfs in iokit macOS: rename net.lundman. -> org.openzfsonosx. macOS: Tag va_mode for upstream ASSERTS XNU sets va_type = VDIR, but does not bother with va_mode. However ZFS checks to confirm S_ISDIR is set in mkdir. macOS: Fix zfs_ioc_osx_proxy_dataset for datasets It was defined as a _pool() ioctl.
When importing a zpool, I can see the zvols in zfs, but no special files are created in /dev/<zpoolname>/. If I create a new zvol, its special file is created and I can mkfs a filesystem there.
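According to the maintainer's reply above, device nodes were only created at module load time and when a new zvol was created, so an import never created them. The missing step is a walk over the imported pool's datasets that creates a node for each volume. The stack trace above shows the module doing that walk at load time via `dmu_objset_find_spa()` and `zvol_create_minors_cb()`. The user-space C model below illustrates only the walk. `dataset_t`, `in_pool()` and `create_minors_on_import()` are hypothetical, and the `/dev/<dataset name>` path layout is taken from the issue title.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEV_PATH_MAX 256

/* Hypothetical, simplified dataset record (name plus "is it a zvol?"). */
typedef struct {
	const char *name;   /* e.g. "tank/vol1" */
	bool is_volume;
} dataset_t;

/* True if the dataset name belongs to pool: exactly "pool" or "pool/...". */
static bool
in_pool(const char *name, const char *pool)
{
	size_t plen = strlen(pool);
	return (strncmp(name, pool, plen) == 0 &&
	    (name[plen] == '\0' || name[plen] == '/'));
}

/*
 * Import-time counterpart of the load-time scan: visit every dataset of the
 * pool just imported and record a /dev/<dataset> node for each zvol.
 * Returns the number of node paths written into paths.
 */
static size_t
create_minors_on_import(const char *pool, const dataset_t *ds, size_t n,
    char paths[][DEV_PATH_MAX], size_t max_paths)
{
	size_t created = 0;
	for (size_t i = 0; i < n && created < max_paths; i++) {
		if (!ds[i].is_volume || !in_pool(ds[i].name, pool))
			continue;
		snprintf(paths[created], DEV_PATH_MAX, "/dev/%s", ds[i].name);
		created++;
	}
	return (created);
}
```

The maintainer's workaround of unloading and reloading the module works because it re-runs the existing load-time scan. Hooking an equivalent scan into the import path removes the need for the reload.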