Kernel Panic During rsync: spl_kmem_cache_alloc #2701

Closed
necouchman opened this issue Sep 15, 2014 · 8 comments

@necouchman

Using ZFS on CentOS 6.5, with just about all packages up to date. My disks are FC-attached (32 x 500GB), and my pool splits them into four eight-disk raidz1 vdevs. I have 64GB of RAM in a Dell PowerEdge 1950-III chassis with 2 x 4-core CPUs. Not the latest and greatest, but it should be enough for what I'm trying to do. I'll note up front that I've tested the RAM with memtest86+ and found no bad memory, so I don't think bad RAM is to blame here.

I'm trying to use the system to rsync data from several SAN volumes, as a sort of disk-based backup system. I'm using iSCSI to attach the SAN volumes, most of which are formatted with XFS. I've scripted it down to a process that goes roughly like this (a rough shell sketch follows the list):

  • Connect the SAN snapshot to the server via iSCSI.
  • Mount the XFS volume at a specific location (ro,noatime).
  • Kick off the rsync command (-aADH --inplace --delete) to sync the contents.
  • Unmount the XFS volume.
  • Disconnect the SAN snapshot.
  • Create ZFS snapshots.

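For concreteness, a rough shell sketch of one pass of that loop; the target IQN, portal address, device path, mountpoint, and dataset names below are placeholders, not the real ones:

    # hypothetical per-volume backup pass; adjust names and paths accordingly
    TARGET_IQN="iqn.2014-09.com.example:san-snap01"
    PORTAL="192.168.10.20:3260"
    MNT="/mnt/san-snap"
    DEST="/tank/backup/vol01"

    iscsiadm -m node -T "$TARGET_IQN" -p "$PORTAL" --login
    mount -t xfs -o ro,noatime /dev/disk/by-path/<iscsi-lun> "$MNT"

    rsync -aADH --inplace --delete "$MNT/" "$DEST/"

    umount "$MNT"
    iscsiadm -m node -T "$TARGET_IQN" -p "$PORTAL" --logout

    zfs snapshot tank/backup/vol01@$(date +%Y%m%d)
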
On ZFS, I'm using the following settings (example commands after the list):

  • Deduplication = On
  • Compression = lz4
  • acltype = posixacl
  • xattr = sa
  • atime = off

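Expressed as commands (a sketch; the dataset name tank/backup is a placeholder), that amounts to roughly:

    zfs set dedup=on tank/backup
    zfs set compression=lz4 tank/backup
    zfs set acltype=posixacl tank/backup
    zfs set xattr=sa tank/backup
    zfs set atime=off tank/backup
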
I'm running into a fairly consistent, repeatable kernel panic during the rsync. I have vmcore and vmcore-dmesg output (text posted below) from the crashes. It doesn't crash in exactly the same place every time, but it seems to happen fairly early in the rsync process (~7TB of data total; usually within the first 1-2TB).

I've searched around a bit - hopefully this isn't a duplicate of an existing bug.

==vmcore-dmesg.txt==

<1>BUG: unable to handle kernel paging request at 0000000000008068
<1>IP: [<ffffffffa035c03c>] spl_kmem_cache_alloc+0x3c/0x1070 [spl]
<4>PGD f63955067 PUD f63953067 PMD 0 
<4>Oops: 0002 [#1] SMP 
<4>last sysfs file: /sys/devices/pci0000:00/0000:00:04.0/0000:0a:00.1/host2/rport-2:0-10/target2:0:10/2:0:10:0/state
<4>CPU 6 
<4>Modules linked in: xfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext3 jbd dm_round_robin dm_multipath microcode iTCO_wdt iTCO_vendor_support dcdbas serio_raw lpc_ich mfd_core igb dca ptp pps_core i5000_edac edac_core i5k_amb zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate bnx2 sg ses enclosure shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix qla2xxx scsi_transport_fc scsi_tgt megaraid_sas radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 6598, comm: rsync Tainted: P           ---------------    2.6.32-431.23.3.el6.x86_64 #1 Dell Inc. PowerEdge 1950/0M788G
<4>RIP: 0010:[<ffffffffa035c03c>]  [<ffffffffa035c03c>] spl_kmem_cache_alloc+0x3c/0x1070 [spl]
<4>RSP: 0018:ffff880f63e17518  EFLAGS: 00010246
<4>RAX: 00000007bda10800 RBX: 0000000000000000 RCX: ffffffffa04fa6e0
<4>RDX: 0000000000000030 RSI: 0000000000000230 RDI: 0000000000000000
<4>RBP: ffff880f63e17628 R08: ffff880e91cba7c0 R09: 0000000000000000
<4>R10: ffff880f63e17608 R11: ffff880fca1d80b0 R12: ffff880e92678f20
<4>R13: 0000000000000230 R14: 0000000000000001 R15: 0000000000008068
<4>FS:  00007ff01fd82700(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 0000000000008068 CR3: 0000000f6396f000 CR4: 00000000000407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process rsync (pid: 6598, threadinfo ffff880f63e16000, task ffff880fa2663500)
<4>Stack:
<4> 0000000000000000 0000000000000000 0000000001000000 ffff8800283943c0
<4><d> 0000000000000001 ffff880028396840 ffff880fa6ec8e00 00000000ffffffff
<4><d> 0000000000016840 ffff880fa6ec8e00 00000000ffffffff ffff880fa6ec8e00
<4>Call Trace:
<4> [<ffffffffa002a190>] ? avl_find+0x60/0xb0 [zavl]
<4> [<ffffffffa04bd863>] zio_buf_alloc+0x23/0x30 [zfs]
<4> [<ffffffffa041fb58>] arc_get_data_buf+0x338/0x4d0 [zfs]
<4> [<ffffffffa0420540>] arc_buf_alloc+0xf0/0x130 [zfs]
<4> [<ffffffffa0420a08>] arc_read+0x458/0x8e0 [zfs]
<4> [<ffffffff8152a1fe>] ? mutex_lock+0x1e/0x50
<4> [<ffffffffa043d2ca>] ? dnode_block_freed+0x14a/0x160 [zfs]
<4> [<ffffffffa04242d0>] ? dbuf_read_done+0x0/0x110 [zfs]
<4> [<ffffffffa0424b7b>] dbuf_read+0x1eb/0x740 [zfs]
<4> [<ffffffffa042d29f>] dmu_spill_hold_by_dnode+0x4f/0x160 [zfs]
<4> [<ffffffffa042d573>] dmu_spill_hold_existing+0x163/0x170 [zfs]
<4> [<ffffffff8152a8f6>] ? down_read+0x16/0x30
<4> [<ffffffffa045eabf>] sa_get_spill+0x3f/0x80 [zfs]
<4> [<ffffffffa045f97e>] sa_attr_op+0x8e/0x3e0 [zfs]
<4> [<ffffffffa0460033>] sa_lookup_impl+0x13/0x20 [zfs]
<4> [<ffffffffa04600f2>] sa_lookup+0x42/0x60 [zfs]
<4> [<ffffffffa0495d91>] zfs_dirent_lock+0x4c1/0x540 [zfs]
<4> [<ffffffffa036870c>] ? xdr_dec_string+0x8c/0xf0 [spl]
<4> [<ffffffffa0495e9a>] zfs_get_xattrdir+0x8a/0x180 [zfs]
<4> [<ffffffffa03e9d30>] ? nvs_operation+0x150/0x2e0 [znvpair]
<4> [<ffffffffa03eb081>] ? nvlist_common+0x111/0x1f0 [znvpair]
<4> [<ffffffff8152a1fe>] ? mutex_lock+0x1e/0x50
<4> [<ffffffffa04afdb5>] zfs_lookup+0x215/0x340 [zfs]
<4> [<ffffffffa04c686b>] __zpl_xattr_get+0x8b/0x200 [zfs]
<4> [<ffffffff811a4167>] ? __d_lookup+0xa7/0x150
<4> [<ffffffffa04c7af3>] zpl_xattr_get+0x73/0x150 [zfs]
<4> [<ffffffff811a458a>] ? dput+0x9a/0x150
<4> [<ffffffffa04c7e66>] zpl_get_acl+0xd6/0x220 [zfs]
<4> [<ffffffffa04c8008>] zpl_xattr_acl_get+0x58/0x90 [zfs]
<4> [<ffffffffa04c8074>] zpl_xattr_acl_get_access+0x14/0x20 [zfs]
<4> [<ffffffff811aece7>] generic_getxattr+0x87/0x90
<4> [<ffffffff811b0e3e>] vfs_getxattr+0xae/0xe0
<4> [<ffffffff811b0ef0>] getxattr+0x80/0x170
<4> [<ffffffff8119a21b>] ? path_walk+0x7b/0xe0
<4> [<ffffffff81196996>] ? final_putname+0x26/0x50
<4> [<ffffffff81196f9b>] ? putname+0x2b/0x40
<4> [<ffffffff8119b552>] ? user_path_at+0x62/0xa0
<4> [<ffffffff811b10c1>] sys_getxattr+0x61/0xa0
<4> [<ffffffff810e1ee7>] ? audit_syscall_entry+0x1d7/0x200
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 00 00 00 0f 1f 44 00 00 f6 05 08 40 01 00 01 48 89 fb 41 89 f5 74 0d f6 05 f2 3f 01 00 08 0f 85 a3 01 00 00 4c 8d bb 68 80 00 00 <f0> ff 83 68 80 00 00 80 bb 48 80 00 00 00 0f 89 80 00 00 00 4c 
<1>RIP  [<ffffffffa035c03c>] spl_kmem_cache_alloc+0x3c/0x1070 [spl]
<4> RSP <ffff880f63e17518>
<4>CR2: 0000000000008068
@dweeezil
Contributor

Given your description above and the stack trace, this sounds like another case of a corrupted spill block. See #2700 and #2663.

@necouchman
Author

Okay... is there any more info I can provide, or should I just follow those other issues?

@dweeezil
Contributor

@necouchman A few questions following this panic:

  • Have you been able to locate a file on the target filesystem which triggers the problem? If so, could you please run zdb -dddd <pool>/<filesystem> <inode_number> on it?
  • Do you think there is any/much memory pressure on the system while running the rsync?
  • If you can reproduce it easily, would you be willing to capture the output from strace -ff -o <file> rsync ... and make it available, or at least grep it for the name of one of the corrupted files/directories?
  • Does the problem happen without dedup enabled (at all, sooner, later)?
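
For reference, a rough sketch of gathering that data (dataset name and file path are placeholders; on ZFS a file's inode number is its object id, so stat can supply it to zdb):

    # object/inode number of a suspect file
    stat -c %i /tank/backup/path/to/suspect-file

    # dump that object's dnode, bonus buffer and spill/xattr info
    zdb -dddd tank/backup <inode_number>

    # per-process syscall traces for the whole rsync run
    strace -ff -o /tmp/rsync-trace rsync -aADH --inplace --delete /mnt/san-snap/ /tank/backup/vol01/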

@necouchman
Author

@dweeezil I was not able to locate the exact file and, unfortunately, I needed to get the system running again, so I blew away the zpool and re-created it, this time leaving xattr=on instead of xattr=sa. I can try to set up a test system and recreate the issue if that would help. As for memory pressure: there is some, but most of it is consumed by buffer/cache, not actual process usage. I don't know about dedup on/off - I have dedup on (I really need it for this particular system) and have not tried with xattr=sa and dedup off.

@dweeezil
Contributor

@necouchman I've not been able to reproduce this problem yet but if you can, I'd love to know about it. Unfortunately, I've not had any time this week to look into it further.

@necouchman
Author

Nice... is there a schedule for the next RPM build (for CentOS 6) that would include this commit? I can build from source, but I'd prefer to use the release RPMs.

Thanks! - Nick

@behlendorf
Contributor

@necouchman likely a few months. There are still a handful of fixes and features we want in the next tag. You could use the zfs-testing repository to track master, or just grab the latest release from the repository with the fix and stay there until a new tag is made.
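
A sketch of the repo route, assuming the stock zfs-release package layout where the zfs-testing section of /etc/yum.repos.d/zfs.repo ships disabled by default:

    # one-off update of the ZFS/SPL packages from the testing repo
    # (exact package set depends on whether you use the dkms or kmod packages)
    yum --enablerepo=zfs-testing update spl zfs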

ryao pushed a commit to ryao/zfs that referenced this issue Nov 29, 2014
If a spill block's dbuf hasn't yet been written when a spill block is
freed, the unwritten version will still be written.  This patch handles
the case in which a spill block's dbuf is freed and undirties it to
prevent it from being written.

The most common case in which this could happen is when xattr=sa is being
used and a long xattr is immediately replaced by a short xattr as in:

	setfattr -n user.test -v very_very_very..._long_value  <file>
	setfattr -n user.test -v short_value  <file>

The first value must be sufficiently long that a spill block is generated
and the second value must be short enough to not require a spill block.
In practice, this would typically happen due to internal xattr operations
as a result of setting acltype=posixacl.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#2663
Closes openzfs#2700
Closes openzfs#2701
Closes openzfs#2717
Closes openzfs#2863
Closes openzfs#2884

Conflicts:
	module/zfs/dbuf.c
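
A self-contained variant of that reproduction, as a sketch (dataset and file names are placeholders, and the exact length needed to force a spill block depends on how much bonus buffer space is free):

    zfs create -o xattr=sa -o acltype=posixacl tank/test
    touch /tank/test/file

    # long enough that the SA overflows into a spill block...
    setfattr -n user.test -v "$(head -c 2048 /dev/zero | tr '\0' x)" /tank/test/file

    # ...then short enough that the spill block is freed again
    setfattr -n user.test -v short_value /tank/test/file
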
behlendorf pushed a commit that referenced this issue Dec 23, 2014
If a spill block's dbuf hasn't yet been written when a spill block is
freed, the unwritten version will still be written.  This patch handles
the case in which a spill block's dbuf is freed and undirties it to
prevent it from being written.

The most common case in which this could happen is when xattr=sa is being
used and a long xattr is immediately replaced by a short xattr as in:

	setfattr -n user.test -v very_very_very..._long_value  <file>
	setfattr -n user.test -v short_value  <file>

The first value must be sufficiently long that a spill block is generated
and the second value must be short enough to not require a spill block.
In practice, this would typically happen due to internal xattr operations
as a result of setting acltype=posixacl.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #2663
Closes #2700
Closes #2701
Closes #2717
Closes #2863
Closes #2884
@necouchman
Author

Looks like this fix was committed for the 0.6.3-1.2 tag, which has now been packaged as an RPM (I got it in my last CentOS 6 upgrade)? Assuming that's true, I should have it now.
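
To confirm, a quick check (package and module names per the standard ZoL CentOS packaging):

    # installed package versions and the loaded module version
    rpm -q zfs spl
    cat /sys/module/zfs/version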
