Kernel Oops when rsyncing #1822

Closed
rfehren opened this issue Oct 31, 2013 · 6 comments
@rfehren

rfehren commented Oct 31, 2013

Hi,

I have the following configuration:

  • Kernel 3.12-rc5 (with the removal of the old shrinker code reverted, so that ZFS compiles)
  • SPL/ZFS 0.6.2
  • 16GB RAM
  • ZFS module options:
    options zfs zfs_arc_min=536870912
    options zfs zfs_arc_max=4294967296

$ zpool list -v
NAME                                          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
ssd                                          6.94T   379G  6.57T     5%  1.00x  ONLINE  -
  raidz1                                     6.94T   379G  6.57T      -
    scsi-SATA_Crucial_CT960M5_13250940E411       -      -      -      -
    scsi-SATA_Crucial_CT960M5_13250940E45D       -      -      -      -
    scsi-SATA_Crucial_CT960M5_13250940E431       -      -      -      -
    scsi-SATA_Crucial_CT960M5_132609429A2E       -      -      -      -
    scsi-SATA_Crucial_CT960M5_132609429D6B       -      -      -      -
    scsi-SATA_Crucial_CT960M5_132909464FF2       -      -      -      -
    scsi-SATA_Crucial_CT960M5_13310947CE48       -      -      -      -
    scsi-SATA_Crucial_CT960M5_13260942678B

While rsyncing, I eventually get the following oops. It is reproducible, although the time until it occurs varies:

[14587.648348] BUG: unable to handle kernel NULL pointer dereference at (null)
[14587.656885] IP: < (null)>
[14587.662285] PGD 0
[14587.664696] Oops: 0010 [#1] SMP
[14587.668373] Modules linked in: zfs(PO) zcommon(PO) znvpair(PO) zavl(PO) zunicode(PO) spl(O) nfsd exportfs ipv6 mptspi scsi_transport_spi mptscsih mptbase igb i2c_algo_bit dm_mod hid_generic usbhid x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul crc32c_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 acpi_cpufreq lpc_ich mfd_core ahci libahci mpt2sas microcode scsi_transport_sas ehci_pci xhci_hcd ehci_hcd usbcore usb_common ixgbe mdio ipmi_si thermal ipmi_msghandler fan processor
[14587.717108] CPU: 2 PID: 2459 Comm: arc_adapt Tainted: P O 3.12.0-rc5-ql-generic-8 #1
[14587.726372] Hardware name: Supermicro X10SL7-F/X10SL7-F, BIOS 1.1 07/19/2013
[14587.733756] task: ffff88007e3663c0 ti: ffff880402baa000 task.ti: ffff880402baa000
[14587.741872] RIP: 0010:[<0000000000000000>] < (null)>
[14587.750017] RSP: 0018:ffff880402babd50 EFLAGS: 00010246
[14587.755647] RAX: 0000000000000000 RBX: ffff880402ecc368 RCX: 0000000000000000
[14587.763121] RDX: ffffffffa056bd4e RSI: ffff880402babd58 RDI: ffff8804056c3b58
[14587.770748] RBP: ffff880402babd98 R08: 0000000000000000 R09: 0000000000049959
[14587.778219] R10: 0000000000000000 R11: 0140000000000000 R12: ffff880402babdb4
[14587.785683] R13: ffff880402ecc000 R14: ffff8804056c3800 R15: ffff880402babe10
[14587.793143] FS: 0000000000000000(0000) GS:ffff88041fc80000(0000) knlGS:0000000000000000
[14587.801883] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14587.807962] CR2: 0000000000000000 CR3: 000000000169c000 CR4: 00000000001407e0
[14587.815423] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[14587.822932] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[14587.830393] Stack:
[14587.832730] ffffffffa0541a62 00000000000000d0 0000000000000400 0000000000000000
[14587.840863] 0000000000000000 ffffffffa055f4d0 ffff8804056c3800 ffff8804056c3868
[14587.848985] 0000000000000000 ffff880402babdb8 ffffffffa055f4eb 0000000000000064
[14587.857106] Call Trace:
[14587.859908] [] ? zfs_sb_prune+0xb2/0xd0 [zfs]
[14587.866248] [] ? zpl_inode_alloc+0x70/0x70 [zfs]
[14587.872852] [] zpl_prune_sb+0x1b/0x20 [zfs]
[14587.879023] [] iterate_supers_type+0xae/0xd0
[14587.885277] [] ? zpl_prune_sb+0x20/0x20 [zfs]
[14587.891626] [] zpl_prune_sbs+0x27/0x30 [zfs]
[14587.897878] [] arc_adjust_meta+0x119/0x1e0 [zfs]
[14587.904477] [] ? arc_adjust_meta+0x1e0/0x1e0 [zfs]
[14587.911255] [] ? arc_adjust_meta+0x1e0/0x1e0 [zfs]
[14587.918035] [] arc_adapt_thread+0x5f/0x160 [zfs]
[14587.924642] [] thread_generic_wrapper+0x73/0x90 [spl]
[14587.931679] [] ? __thread_create+0x300/0x300 [spl]
[14587.938522] [] kthread+0xbb/0xc0
[14587.943729] [] ? kthread_freezable_should_stop+0x70/0x70
[14587.951021] [] ret_from_fork+0x7c/0xb0
[14587.956747] [] ? kthread_freezable_should_stop+0x70/0x70
[14587.964043] Code: Bad RIP value.
[14587.967729] RIP < (null)>
[14587.973217] RSP
[14587.977028] CR2: 0000000000000000
[14587.981248] ---[ end trace 9342161bfa5076eb ]---

After the oops, memory usage keeps growing until the OOM killer starts killing system processes and the machine becomes unusable. Here are the arcstats from after the oops, as memory pressure starts to appear:

$ cat /proc/spl/kstat/zfs/arcstats |grep c_
c_min            4    536870912
c_max            4    4294967296
arc_no_grow      4    0
arc_tempreserve  4    0
arc_loaned_bytes 4    0
arc_prune        4    0
arc_meta_used    4    7581447024
arc_meta_limit   4    1073741824
arc_meta_max     4    7685258800
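To track how far metadata usage has run past its limit, the two relevant counters can be pulled out of arcstats programmatically. The following is a minimal user-space C sketch, assuming the three-column `name type value` layout shown above; `arcstat_value`, `arc_meta_from_file`, and `meta_overshoot` are hypothetical helpers, not part of SPL/ZFS.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one arcstats line of the form
 * "name type value" and, if the name equals `want`, store the value.
 * Lines that do not fit this shape (such as the "name type data"
 * header) are rejected by sscanf or by the name comparison.
 * Returns 1 on a match, 0 otherwise. */
static int arcstat_value(const char *line, const char *want,
                         unsigned long long *out)
{
    char name[64];
    unsigned int type;
    unsigned long long value;

    if (sscanf(line, "%63s %u %llu", name, &type, &value) != 3)
        return 0;
    if (strcmp(name, want) != 0)
        return 0;
    *out = value;
    return 1;
}

/* Hypothetical helper: scan a whole arcstats stream for
 * arc_meta_used and arc_meta_limit. Returns 1 if both were found. */
static int arc_meta_from_file(FILE *f, unsigned long long *used,
                              unsigned long long *limit)
{
    char line[256];
    int found = 0;

    while (fgets(line, sizeof line, f) != NULL) {
        if (arcstat_value(line, "arc_meta_used", used))
            found |= 1;
        else if (arcstat_value(line, "arc_meta_limit", limit))
            found |= 2;
    }
    return found == 3;
}

/* Ratio of metadata in use to the limit (0 if the limit is 0). */
static double meta_overshoot(unsigned long long used,
                             unsigned long long limit)
{
    return limit != 0 ? (double)used / (double)limit : 0.0;
}
```

On a live system the stream would come from `fopen("/proc/spl/kstat/zfs/arcstats", "r")`. With the values above, `meta_overshoot` comes out at roughly 7.06, i.e. arc_meta_used is about seven times arc_meta_limit.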

Thanks,

Roland

@behlendorf
Contributor

It's likely this is related to the shrinker changes you needed to make. We'll certainly get this sorted once we add 3.12 support.

@rfehren
Author

rfehren commented Oct 31, 2013

Thanks for the fast reply. I didn't think it was related to my change, since I only reverted the commit that removed the compat shrinker interface (torvalds/linux@a0b0213), so I'm using the 3.11 shrinker API. Can you tell from the call trace that it has to do with the shrinker?

@behlendorf
Contributor

@rfehren If that was the only change, then perhaps this is unrelated. The stack just happens to go down a similar path, one that can also be reached via the shrinker, although in this case I see it was reached via arc_adapt.

@rfehren
Author

rfehren commented Nov 2, 2013

I switched this machine back to our old 2.6.32 kernel (same SPL/ZFS version), and things are running stably now. So it's either a general problem with 3.12, or it is indeed related to my shrinker change. I'll test again once you've ported to the new shrinker API.

@behlendorf
Contributor

@rfehren Official 3.12 support has been merged. Can you verify that you're unable to reproduce this issue using the latest code from master and a 3.12 kernel?

@rfehren
Author

rfehren commented Nov 9, 2013

Yup, it works without a crash now (stock 3.12 kernel, with the old shrinker API removed). Thanks a lot for the fix. There still seems to be a problem with the size of the ARC metadata cache, though: it consumes all available memory, far beyond arc_meta_limit:

$ cat /proc/spl/kstat/zfs/arcstats |grep c_
c_min            4    1049497600
c_max            4    8395980800
arc_no_grow      4    0
arc_tempreserve  4    0
arc_loaned_bytes 4    0
arc_prune        4    532895
arc_meta_used    4    6948429328
arc_meta_limit   4    2098995200
arc_meta_max     4    7197867184
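A quick arithmetic check on the two arcstats dumps (plain division of the values shown, nothing more): in both, arc_meta_limit is exactly one quarter of c_max, and arc_meta_used exceeds that limit by a factor of about 7 on Oct 31 and about 3.3 on Nov 9. Note also that c_min and c_max in the second dump do not match the zfs_arc_min/zfs_arc_max options listed in the original report.

```latex
\begin{aligned}
\text{Oct 31:}\quad & c_{\max}/4 = 4294967296/4 = 1073741824 = \texttt{arc\_meta\_limit},
  && 7581447024 / 1073741824 \approx 7.06 \\
\text{Nov 9:}\quad  & c_{\max}/4 = 8395980800/4 = 2098995200 = \texttt{arc\_meta\_limit},
  && 6948429328 / 2098995200 \approx 3.31
\end{aligned}
```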

In case it makes any difference: the filesystem being rsynced has compression=lz4. There are no other special settings.

FransUrbo pushed a commit to FransUrbo/spl-crypto that referenced this issue Dec 20, 2013
torvalds/linux@24f7c6 introduced a new shrinker API while
torvalds/linux@a0b021 dropped support for the old shrinker API.
This patch adds support for the new shrinker API by wrapping
the old one with the new one.
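The idea of wrapping the old callback can be modeled in plain user-space C. This is a hypothetical sketch, not the SPL code: the `model_*` types and functions stand in for the kernel's `struct shrink_control`, the `count_objects()`/`scan_objects()` callbacks and `SHRINK_STOP`, and `demo_old_shrink()` is an invented cache. In the old API a single callback called with `nr_to_scan == 0` reports the object count; in the new API that query becomes `count_objects()`, and the actual freeing becomes `scan_objects()`, which reports how many objects were freed.

```c
/* User-space stand-ins, modeled on (not copied from) the kernel's
 * struct shrink_control and SHRINK_STOP. */
struct model_shrink_control {
    unsigned long nr_to_scan;   /* old API: 0 means "only report the count" */
};

#define MODEL_SHRINK_STOP (~0UL)

/* Old-style single callback: returns objects remaining, or -1 on failure. */
typedef int (*model_old_shrink_fn)(struct model_shrink_control *sc);

/* count_objects(): query the old callback by passing nr_to_scan = 0. */
static unsigned long model_count_objects(model_old_shrink_fn fn,
                                         struct model_shrink_control *sc)
{
    struct model_shrink_control query = *sc;
    int ret;

    query.nr_to_scan = 0;
    ret = fn(&query);
    return ret < 0 ? 0 : (unsigned long)ret;   /* errors become "nothing" */
}

/* scan_objects(): let the old callback free up to sc->nr_to_scan objects
 * and report how many were freed, or MODEL_SHRINK_STOP on failure. */
static unsigned long model_scan_objects(model_old_shrink_fn fn,
                                        struct model_shrink_control *sc)
{
    unsigned long before = model_count_objects(fn, sc);
    int after = fn(sc);

    if (after < 0)
        return MODEL_SHRINK_STOP;
    return before > (unsigned long)after ? before - (unsigned long)after : 0;
}

/* Hypothetical cache of 100 objects used to exercise the wrappers. */
static int demo_cache_objects = 100;

static int demo_old_shrink(struct model_shrink_control *sc)
{
    if (sc->nr_to_scan > 0) {
        int n = sc->nr_to_scan > (unsigned long)demo_cache_objects
                    ? demo_cache_objects : (int)sc->nr_to_scan;
        demo_cache_objects -= n;
    }
    return demo_cache_objects;
}
```

Deriving the freed count from a before/after pair keeps the old callback unchanged while giving `scan_objects()` the return value the new API expects.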

This change also reorganizes the autotools checks on the shrinker
API such that the configure script will fail early if an unknown
API is encountered in the future.
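The commit implements the fail-early behavior in the autotools configure script. As a rough C-preprocessor analogue (a swapped-in technique for illustration only), a build can refuse to proceed when none of the known shrinker APIs was detected. The macro names `HAVE_SPLIT_SHRINKER_CALLBACK` and `HAVE_SINGLE_SHRINK_CALLBACK` are hypothetical stand-ins for whatever configure would define.

```c
/* Hypothetical configure result, set by hand in this sketch.
 * Remove it (and define nothing else) to see the build fail. */
#define HAVE_SPLIT_SHRINKER_CALLBACK 1

enum shrinker_api_kind { SHRINKER_API_SINGLE = 1, SHRINKER_API_SPLIT = 2 };

#if defined(HAVE_SPLIT_SHRINKER_CALLBACK)
/* Linux 3.12+: separate count_objects() and scan_objects() callbacks. */
static const enum shrinker_api_kind shrinker_api = SHRINKER_API_SPLIT;
#elif defined(HAVE_SINGLE_SHRINK_CALLBACK)
/* Older kernels: one shrink() callback; nr_to_scan == 0 means "count". */
static const enum shrinker_api_kind shrinker_api = SHRINKER_API_SINGLE;
#else
#error "Unknown shrinker API; refusing to build"
#endif
```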

Support for the set_shrinker() API which was used by Linux 2.6.22
and older has been dropped.  As a general rule compatibility is
only maintained back to Linux 2.6.26.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs/zfs#1732
Closes openzfs/zfs#1822
Closes #293
Closes #307
dajhorn pushed a commit to zfsonlinux/pkg-spl that referenced this issue Apr 9, 2014
ryao added a commit to ryao/spl that referenced this issue Apr 10, 2014
FransUrbo pushed a commit to zfsonlinux/pkg-spl that referenced this issue Jul 18, 2015