
zfs arc_freed causes kernel bug in 0.7.0 RC3 + PR5824 build #6069

Closed · koplover opened this issue Apr 26, 2017 · 3 comments
Labels: Status: Stale (no recent activity for issue)

Comments

koplover commented Apr 26, 2017

I created a build of 0.7.0 RC3 with PR5824 to investigate the PR's impact on our performance problems (see #4880 for the performance impact).

We run a Xen virtualised system with ZFS hosted in our driver domain, providing zvols for each of our hosted guests. This performance testing had been running for approximately one week; last night all access to the server through the virtualised networking was lost, and all VMs were 'frozen'.

This is akin to the lock-up ZFS errors we have seen with 0.6.4.2, but analysing the virtual CPUs assigned to our disk domain did not show any issues. However, the cause was found in kern.log on the next reboot; see the trace below:

Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213467] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213482] IP: [] buf_hash_find+0xc1/0x130 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213519] PGD 1a50bc067 PUD 25b0f4067 PMD 0
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213527] Oops: 0000 [#1] SMP
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213532] Modules linked in: xen_blkback xen_netback drbd xt_addrtype xt_owner xt_multiport xt_hl xt_tcpudp xt_conntrack xt_NFLOG nfnetlink_log nfnetlink ip6table_mangle iptable_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables x_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel dm_multipath aes_x86_64 lrw gf128mul scsi_dh glue_helper ablk_helper cryptd lru_cache libcrc32c xenfs xen_privcmd zfs(POE) zunicode(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) zavl(POE) dm_mirror dm_region_hash dm_log raid0 multipath linear dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid10 xor raid6_pq nvme ahci libahci [last unloaded: drbd]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213620] CPU: 2 PID: 23967 Comm: z_wr_int_5 Tainted: P    B      OE  3.19.0-39-zdomu #44~14.04.1
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213627] task: ffff880284de93a0 ti: ffff8802265c4000 task.ti: ffff8802265c4000
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213632] RIP: e030:[]  [] buf_hash_find+0xc1/0x130 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213658] RSP: e02b:ffff8802265c7b18  EFLAGS: 00010202
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213663] RAX: 0000000000000007 RBX: ffff8802265c7b70 RCX: 972d0935c2d85410
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213669] RDX: 0000000000000008 RSI: 0000000000025083 RDI: 000000009f7c4a10
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213675] RBP: ffff8802265c7b58 R08: 972d0935c2d85410 R09: 000000018040003a
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213680] R10: ffffea0009ee6e80 R11: ffffffffc02f1dca R12: ffff8801a005db18
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213686] R13: ffffffffc058d5f0 R14: 009e7e0ecceee9e6 R15: 000000000006ea7a
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213696] FS:  0000000000000000(0000) GS:ffff880294d00000(0000) knlGS:0000000000504d80
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213703] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213707] CR2: 0000000000000007 CR3: 00000002417ba000 CR4: 0000000000002660
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213712] Stack:
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213715]  ffff880286ee7400 0000000000025083 0000000000000000 ffff8801a005db18
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213723]  ffff8802866d4000 0000000000000000 00000000000254ac 0000000000000000
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213731]  ffff8802265c7b88 ffffffffc0400e65 ffff8802866d4000 0000000000000000
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213739] Call Trace:
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213768]  [] arc_freed+0x25/0x90 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213823]  [] zio_free_sync+0x45/0x140 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213834]  [] ? __slab_free+0x101/0x2c1
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213882]  [] zio_free+0xc3/0x120 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213919]  [] dsl_free+0x11/0x20 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213954]  [] dsl_dataset_block_kill+0x248/0x450 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.213984]  [] dbuf_write_done+0x172/0x1c0 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214009]  [] arc_write_done+0x236/0x3d0 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214055]  [] zio_done+0x2f9/0xc10 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214070]  [] ? spl_kmem_free+0x2a/0x40 [spl]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214115]  [] zio_execute+0x9c/0x100 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214128]  [] taskq_thread+0x243/0x480 [spl]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214137]  [] ? wake_up_state+0x20/0x20
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214148]  [] ? taskq_cancel_id+0x120/0x120 [spl]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214155]  [] kthread+0xc9/0xe0
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214162]  [] ? kthread_create_on_node+0x1c0/0x1c0
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214171]  [] ret_from_fork+0x58/0x90
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214178]  [] ? kthread_create_on_node+0x1c0/0x1c0
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214182] Code: 38 c1 48 8b 05 c9 a2 16 00 4a 8b 04 f8 48 85 c0 74 40 49 8b 14 24 48 8b 75 c8 eb 0f 66 0f 1f 44 00 00 48 8b 40 20 48 85 c0 74 27 <48> 39 10 75 f2 49 8b 7c 24 08 48 39 78 08 75 e7 48 3b 70 10 75
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214247] RIP  [] buf_hash_find+0xc1/0x130 [zfs]
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214272]  RSP
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214276] CR2: 0000000000000007
Apr 25 20:08:19 zdiskdd0000-0026-00-00 kernel: [538931.214283] ---[ end trace 071f472bc78f6c36 ]---
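
For anyone reading the oops: CR2 = 0000000000000007 together with RAX = 0000000000000007 means the fault came from dereferencing a pointer whose value was literally 7 while buf_hash_find walked an ARC hash chain, i.e. a chain link had been overwritten with garbage. Below is a minimal sketch of that kind of walk; the struct layout and field names are assumptions chosen only to match the Code: dump above (next pointer loaded from offset 0x20, first compared field at offset 0), not the actual OpenZFS source.

```c
/* Simplified sketch, NOT verbatim OpenZFS source: a hash-bucket chain
 * walk of the kind buf_hash_find performs.  Layout is hypothetical,
 * arranged so b_hash_next sits at offset 0x20 to match the
 * "mov rax,[rax+0x20]" / "cmp [rax],rdx" pattern in the Code: dump. */
#include <stdint.h>
#include <stddef.h>

typedef struct arc_buf_hdr arc_buf_hdr_t;
struct arc_buf_hdr {
	uint64_t	b_dva_word[2];	/* block identity, offsets 0x00/0x08 */
	uint64_t	b_birth;	/* offset 0x18 */
	uint64_t	b_spa;		/* offset 0x18... assumed 0x18-0x20 region */
	arc_buf_hdr_t	*b_hash_next;	/* offset 0x20: the chain link */
};

static arc_buf_hdr_t *
buf_hash_find_sketch(arc_buf_hdr_t *bucket, uint64_t spa,
    const uint64_t dva[2], uint64_t birth)
{
	/* Walk the collision chain for this hash bucket. */
	for (arc_buf_hdr_t *hdr = bucket; hdr != NULL;
	    hdr = hdr->b_hash_next) {
		/* If a freed or scribbled header left b_hash_next = 0x7,
		 * this first load dereferences address 7: RAX = CR2 = 7. */
		if (hdr->b_dva_word[0] == dva[0] &&
		    hdr->b_dva_word[1] == dva[1] &&
		    hdr->b_birth == birth && hdr->b_spa == spa)
			return (hdr);
	}
	return (NULL);
}
```

A valid kernel pointer is never a small integer like 0x7, which fits the memory-corruption diagnosis in the next comment.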

behlendorf (Contributor) commented

@koplover thanks for filing the issue. This looks like some kind of memory corruption, but it's not clear how it could happen. This area of the code is also largely unrelated to the changes from PR5824. Definitely let us know if you see this again.
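
As an aside on chasing corruption of this kind: one blunt option is to validate each chain link before following it, so the walk panics at the corrupt header instead of oopsing on a garbage address later. The helper below is purely hypothetical, not something in the ZFS tree or proposed in this thread; VERIFY is the SPL hard-assertion macro and PAGE_SIZE the usual kernel constant.

```c
/* Hypothetical debugging aid, NOT in the OpenZFS tree: sanity-check a
 * hash-chain link before it is followed, so corruption trips a named
 * assertion at the corrupt header rather than a later NULL-ish oops. */
static inline arc_buf_hdr_t *
buf_hash_next_checked(arc_buf_hdr_t *hdr)
{
	arc_buf_hdr_t *next = hdr->b_hash_next;

	/* Real kernel pointers are never tiny integers such as 0x7 and
	 * are pointer-aligned; anything else indicates corruption. */
	VERIFY(next == NULL ||
	    ((uintptr_t)next >= PAGE_SIZE &&
	    ((uintptr_t)next & (sizeof (void *) - 1)) == 0));
	return (next);
}
```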

koplover (Author) commented May 3, 2017

@behlendorf no problem, I'll keep an eye on it. I'd prefer to try PR5824 against the 0.6.5.x stream, as that is more stable than the 0.7.0 RCs; issues such as this may not occur there, and if they do they are much more critical, since people will really be putting that stream into production.

When I tried initially, merging the PR onto 0.6.5.9 was not straightforward. I'll take another look and see whether it merges cleanly now against the daily builds for the 0.6.5.x stream.

stale bot commented Aug 25, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the Status: Stale label on Aug 25, 2020
stale bot closed this as completed on Nov 23, 2020