
Slow mount times due to enormous ZFS delete queue object #5916

Closed
DeHackEd opened this issue Mar 23, 2017 · 2 comments

@DeHackEd
Contributor

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | CentOS |
| Distribution Version | 7 |
| Linux Kernel | 4.9.11-1.el7.elrepo.x86_64 |
| Architecture | x86_64 |
| ZFS Version | v0.7.0-rc3_119_g912e2ba |
| SPL Version | v0.7.0-rc3_3_g8d5feec |

Describe the problem you're observing

When mounting one specific filesystem, the mount command appears to hang for an extended period. This is very reproducible on this system. Even after sending/receiving the dataset to a remote system, the remote experiences long mount times, though not as long as this one (15 seconds vs. 10 minutes).

root     27857  0.0  0.0 125640  1820 ?        S    06:34   0:00 /bin/mount -t zfs -o defaults,noatime,dev,exec,rw,suid,nomand,zfsutil pool/fs /srv/fs
root     27858  0.3  0.0 153640  4744 ?        D    06:34   0:00 /sbin/mount.zfs pool/fs /srv/fs -o rw,noatime,zfsutil

Stack trace of mount command:

[<ffffffffa0483133>] cv_wait_common+0xb3/0x130 [spl]
[<ffffffffa0483208>] __cv_wait_io+0x18/0x20 [spl]
[<ffffffffa105f84f>] zio_wait+0xef/0x190 [zfs]
[<ffffffffa0fb0934>] dbuf_read+0x2f4/0x8e0 [zfs]
[<ffffffffa0fbbc40>] dmu_buf_hold_by_dnode+0x50/0x80 [zfs]
[<ffffffffa10245eb>] zap_get_leaf_byblk.isra.11+0x8b/0x240 [zfs]
[<ffffffffa1024869>] zap_deref_leaf+0xc9/0xe0 [zfs]
[<ffffffffa10268eb>] fzap_cursor_retrieve+0x13b/0x260 [zfs]
[<ffffffffa102a374>] zap_cursor_retrieve+0x64/0x290 [zfs]
[<ffffffffa10337ee>] zfs_unlinked_drain+0x5e/0xe0 [zfs]
[<ffffffffa10471b5>] zfs_sb_setup+0x115/0x150 [zfs]
[<ffffffffa1048b7f>] zfs_domount+0x29f/0x330 [zfs]
[<ffffffffa10679dc>] zpl_fill_super+0x2c/0x40 [zfs]
[<ffffffff8122e23d>] mount_nodev+0x4d/0xa0
[<ffffffffa1067e52>] zpl_mount+0x52/0x80 [zfs]
[<ffffffff8122edd9>] mount_fs+0x39/0x160
[<ffffffff8124c277>] vfs_kern_mount+0x67/0x100
[<ffffffff8124e862>] do_mount+0x1e2/0xca0
[<ffffffff8124f633>] SyS_mount+0x83/0xd0
[<ffffffff81003a47>] do_syscall_64+0x67/0x180
[<ffffffff81757cab>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff
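The trace shows the mount blocked in `zfs_unlinked_drain`, which cursors over the delete-queue ZAP (`zap_cursor_retrieve` → `dbuf_read` → `zio_wait`), so each leaf block can cost a synchronous disk read. A toy cost model (plain Python, not ZFS code; the 20 ms average random-read latency is an assumption, not a measurement) reproduces the observed magnitude:

```python
# Toy cost model (not ZFS code): zfs_unlinked_drain() must read every
# leaf block of the delete-queue ZAP, even leaves holding zero entries,
# so drain time scales with allocated leaves rather than live entries.
# read_ms is an assumed average random-read latency, not a measurement.

def drain_cost_seconds(leaf_blocks, read_ms=20.0):
    """Estimated time to cursor over a ZAP with `leaf_blocks` leaves."""
    return leaf_blocks * read_ms / 1000.0

# Leaf count from the zdb output below: 24982 leaves, 0 entries.
print(drain_cost_seconds(24982))  # ~500 s, the same order of magnitude
                                  # as the multi-minute stall seen here
```

Under that assumption, the minutes-long mount follows directly from the ~25,000 leaf blocks, independent of the entry count.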
# time zdb -ddddd pool/fs 3|head -n 35
Dataset pool/fs [ZPL], ID 110, cr_txg 696, 48.7T, 64318820 objects, rootbp DVA[0]=<2:6a451ace000:3000> DVA[1]=<3:1ab19b35d000:3000> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=5094441L/5094441P fill=64318820 cksum=f2994723e:5f373a0c800:13583087450e5:2b4e1e6496d98b

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
         3    3    16K    16K   360M     512   391M   99.94  ZFS delete queue
	dnode flags: USED_BYTES USERUSED_ACCOUNTED 
	dnode maxblkid: 25013
	Fat ZAP stats:
		Pointer table:
			32768 elements
			zt_blk: 16400
			zt_numblks: 16
			zt_shift: 15
			zt_blks_copied: 0
			zt_nextblk: 0
		ZAP entries: 0
		Leaf blocks: 24982
		Total blocks: 25014
		zap_block_type: 0x8000000000000001
		zap_magic: 0x2f52ab2ab
		zap_salt: 0x7a559d52f
		Leafs with 2^n pointers:
			  0:  17196 ****************************************
			  1:   7786 *******************
		Blocks with n*5 entries:
			  0:  24982 ****************************************
		Blocks n/10 full:
			  1:  24982 ****************************************
		Entries with n chunks:
		Buckets with n entries:
			  0: 12790784 ****************************************

Indirect blocks:
               0 L2   4:ff246f97200:2a00 4000L/1c00P F=16369 B=3864811/3864811
               0  L1  1:cdde52fd000:6000 4000L/1e00P F=128 B=3864811/3864811
               0   L0 1:cdde52f1000:6000 4000L/1400P F=1 B=3864811/3864811

real	8m23.293s
user	0m8.725s
sys	0m8.642s

In contrast, using zdb to read the root directory listing is much faster:

# time zdb -ddddd pool/fs 4 | head -n 35
    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
         4    3    16K    16K  7.84M     512  4.02M  100.00  ZFS directory
                                               168   bonus  System attributes
	dnode flags: USED_BYTES USERUSED_ACCOUNTED 
	dnode maxblkid: 256
	path	/
	uid     500
	gid     500
	atime	Fri Jul 15 20:44:17 2016
	mtime	Thu Mar 23 08:54:54 2017
	ctime	Thu Mar 23 08:54:54 2017
	crtime	Fri Jul 15 20:44:17 2016
	gen	696
	mode	40755
	size	23217
	parent	4
	links	23217
	pflags	40800000144
	Fat ZAP stats:
		Pointer table:
			1024 elements
			zt_blk: 0
			zt_numblks: 0
			zt_shift: 10
			zt_blks_copied: 0
			zt_nextblk: 0
		ZAP entries: 23215
		Leaf blocks: 256
		Total blocks: 257
		zap_block_type: 0x8000000000000001
		zap_magic: 0x2f52ab2ab
		zap_salt: 0x7a559d681
		Leafs with 2^n pointers:

real	0m18.963s
user	0m4.931s
sys	0m5.508s
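For scale, the two runs above can be compared directly (rough arithmetic only; the wall-clock times also include fixed pool-open overhead, so the scaling is not expected to be perfectly linear):

```python
# Ratio of ZAP size to zdb wall-clock time for the two objects dumped
# above (delete queue vs. root directory), using the reported figures.
delete_queue = {"blocks": 25014, "seconds": 8 * 60 + 23.293}
root_dir = {"blocks": 257, "seconds": 18.963}

block_ratio = delete_queue["blocks"] / root_dir["blocks"]
time_ratio = delete_queue["seconds"] / root_dir["seconds"]
print(f"{block_ratio:.0f}x the blocks, {time_ratio:.0f}x the wall-clock time")
```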

Describe how to reproduce the problem

I don't know how the delete queue got so big, but now that it is, I'm in the unpleasant situation that there's nothing I can do about the delay at mount time.

The large delete queue appears to travel with a send/recv, but the enormous size of the dataset and the sensitivity of its contents make it impractical to share.

Include any warning/errors/backtraces from the system logs

@loli10K
Contributor

loli10K commented Mar 23, 2017

Duplicate of #3814? How is this different?

@DeHackEd
Contributor Author

Interesting. It's probably close enough to call this a duplicate.

The main "issue" is that my delete queue is empty: there's no work to be done. But because the ZAP data structure is so huge and (probably) fragmented, it takes forever to load into memory and finish the mount. And AFAIK it never shrinks.
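The "grows but never shrinks" behavior can be illustrated with a toy hash-directory model (plain Python, not ZFS code; the leaf capacity and growth rule are invented for illustration): deletes empty the leaves but never compact the structure, so a full cursor pass still touches every allocated leaf.

```python
# Toy model of a fat ZAP that grows on insert but is never compacted
# on delete (capacities here are invented for illustration).

class ToyFatZap:
    LEAF_CAPACITY = 5  # assumed per-leaf entry limit to force growth

    def __init__(self):
        self.leaves = [set()]  # leaf blocks; deletes never free them

    def insert(self, key):
        self.leaves[hash(key) % len(self.leaves)].add(key)
        if any(len(leaf) > self.LEAF_CAPACITY for leaf in self.leaves):
            # Grow: double the leaf array and rehash (a stand-in for
            # real leaf splits and pointer-table expansion).
            keys = [k for leaf in self.leaves for k in leaf]
            self.leaves = [set() for _ in range(2 * len(self.leaves))]
            for k in keys:
                self.leaves[hash(k) % len(self.leaves)].add(k)

    def delete(self, key):
        self.leaves[hash(key) % len(self.leaves)].discard(key)

    def cursor_pass(self):
        """Full traversal: every leaf is visited, full or empty."""
        return sum(1 for _ in self.leaves)

zap = ToyFatZap()
for i in range(10000):
    zap.insert(i)
for i in range(10000):
    zap.delete(i)

# The queue is now empty, yet a traversal still walks every leaf.
print(len(zap.leaves), zap.cursor_pass())
```

This mirrors the zdb output above: "ZAP entries: 0" alongside ~25,000 allocated leaf blocks.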
