zfs deadlock #2244

Closed
tomposmiko opened this issue Apr 7, 2014 · 3 comments
@tomposmiko

Apr 7 11:42:06 lxc06 kernel: [ 4677.739744] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 7 11:42:06 lxc06 kernel: [ 4677.739846] [] kthread+0xc0/0xd0
Apr 7 11:42:06 lxc06 kernel: [ 4677.739856] INFO: task spl_kmem_cache/:274 blocked for more than 120 seconds.
Apr 7 11:42:06 lxc06 kernel: [ 4677.739862] ffff8807dfea179c ffff8807efb49770 00000000ffffffff ffff8807dfea17a0
Apr 7 11:42:06 lxc06 kernel: [ 4677.739882] [] zfs_zinactive+0x4e/0xd0 [zfs]
Apr 7 11:42:06 lxc06 kernel: [ 4677.739914] [] prune_icache_sb+0x185/0x350
Apr 7 11:42:06 lxc06 kernel: [ 4677.739922] [] __alloc_pages_nodemask+0x5a9/0x920
Apr 7 11:42:06 lxc06 kernel: [ 4677.739932] [] vmap_page_range_noflush+0x307/0x370
Apr 7 11:42:06 lxc06 kernel: [ 4677.739947] [] ? kv_alloc.isra.9+0x49/0x50 [spl]
Apr 7 11:42:06 lxc06 kernel: [ 4677.739964] [] taskq_thread+0x237/0x4b0 [spl]
Apr 7 11:42:06 lxc06 kernel: [ 4677.739976] [] ? kthread_create_on_node+0x120/0x120
Apr 7 11:42:06 lxc06 kernel: [ 4677.739997] irqbalance D ffff88081fdd4580 0 1298 1 0x00000000
Apr 7 11:42:06 lxc06 kernel: [ 4677.740006] [] schedule_preempt_disabled+0x29/0x70
Apr 7 11:42:06 lxc06 kernel: [ 4677.740056] [] zpl_evict_inode+0x24/0x30 [zfs]
Apr 7 11:42:06 lxc06 kernel: [ 4677.740064] [] shrink_slab+0x165/0x300
Apr 7 11:42:06 lxc06 kernel: [ 4677.740073] [] handle_pte_fault+0x73b/0xab0
Apr 7 11:42:06 lxc06 kernel: [ 4677.740084] [] ? kmem_cache_alloc_trace+0x38/0x130
Apr 7 11:42:06 lxc06 kernel: [ 4677.740093] [] ? terminate_walk+0x51/0x60
Apr 7 11:42:06 lxc06 kernel: [ 4677.740101] [] page_fault+0x28/0x30
Apr 7 11:42:06 lxc06 kernel: [ 4677.740110] [] SyS_read+0x49/0xa0
Apr 7 11:42:06 lxc06 kernel: [ 4677.740117] ffff8807df7b36a0 0000000000000046 ffff8807df7b3fd8 0000000000014580
Apr 7 11:42:06 lxc06 kernel: [ 4677.740125] [] __mutex_lock_slowpath+0x13f/0x1c0
Apr 7 11:42:06 lxc06 kernel: [ 4677.740181] [] shrink_slab+0x165/0x300
Apr 7 11:42:06 lxc06 kernel: [ 4677.740190] [] handle_pte_fault+0x73b/0xab0
Apr 7 11:42:06 lxc06 kernel: [ 4677.740200] [] handle_mm_fault+0x299/0x670
Apr 7 11:42:06 lxc06 kernel: [ 4677.740207] [] ? pointer.isra.15+0x3b4/0x400
Apr 7 11:42:06 lxc06 kernel: [ 4677.740215] [] ? seq_read+0x27b/0x390
Apr 7 11:42:06 lxc06 kernel: [ 4677.740235] INFO: task java:6285 blocked for more than 120 seconds.
Apr 7 11:42:06 lxc06 kernel: [ 4677.740241] ffff8807ef904b58 ffff8807ef904b60 ffff880785bfbf58 ffff8807ef904b00
Apr 7 11:42:06 lxc06 kernel: [ 4677.740250] [] ? down_read+0x20/0x30
Apr 7 11:42:06 lxc06 kernel: [ 4677.740261] [] do_page_fault+0x2c/0x50
Apr 7 11:42:06 lxc06 kernel: [ 4677.740267] ffff88071efb9750 0000000000000046 ffff88071efb9fd8 0000000000014580
Apr 7 11:42:06 lxc06 kernel: [ 4677.740275] [] __mutex_lock_slowpath+0x13f/0x1c0
Apr 7 11:42:06 lxc06 kernel: [ 4677.740325] [] evict+0xb6/0x1b0
Apr 7 11:42:06 lxc06 kernel: [ 4677.740333] [] do_try_to_free_pages+0x39a/0x4c0
Apr 7 11:42:06 lxc06 kernel: [ 4677.740342] [] swapin_readahead+0x98/0xe0
Apr 7 11:42:06 lxc06 kernel: [ 4677.740351] [] ? do_futex+0x102/0x620
Apr 7 11:42:06 lxc06 kernel: [ 4677.740359] [] page_fault+0x28/0x30
Apr 7 11:42:06 lxc06 kernel: [ 4677.740364] ffff88072e2b1dd8 0000000000000046 ffff88072e2b1fd8 0000000000014580
Apr 7 11:42:06 lxc06 kernel: [ 4677.740368] Call Trace:
Apr 7 11:42:06 lxc06 kernel: [ 4677.740373] [] ? __do_page_fault+0x1f4/0x530
Apr 7 11:42:06 lxc06 kernel: [ 4677.740379] [] ? down_write+0x2d/0x30
Apr 7 11:42:06 lxc06 kernel: [ 4677.740384] [] ? do_page_fault+0x2c/0x50
Apr 7 11:42:06 lxc06 kernel: [ 4677.740389] INFO: task java:8810 blocked for more than 120 seconds.
Apr 7 11:42:06 lxc06 kernel: [ 4677.740392] ffff88071391fd68 0000000000000046 ffff88071391ffd8 0000000000014580
Apr 7 11:42:06 lxc06 kernel: [ 4677.740395] ffff8807ef904b58 ffff8807ef904b60 ffff88071391ff58 ffff8807ef904b00
Apr 7 11:42:06 lxc06 kernel: [ 4677.740398] [] schedule+0x29/0x70
Apr 7 11:42:06 lxc06 kernel: [ 4677.740402] [] call_rwsem_down_read_failed+0x14/0x30
Apr 7 11:42:06 lxc06 kernel: [ 4677.740405] [] __do_page_fault+0x1b4/0x530
Apr 7 11:42:06 lxc06 kernel: [ 4677.740409] [] ? sock_ioctl+0x1f0/0x2c0
Apr 7 11:42:06 lxc06 kernel: [ 4677.740412] [] ? finish_task_switch+0x50/0xf0
Apr 7 11:42:06 lxc06 kernel: [ 4677.740415] [] page_fault+0x28/0x30
Apr 7 11:42:06 lxc06 kernel: [ 4677.740419] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 7 11:42:06 lxc06 kernel: [ 4677.740421] ffff8806d2749d68 0000000000000046 ffff8806d2749fd8 0000000000014580
Apr 7 11:42:06 lxc06 kernel: [ 4677.740424] ffff8807ef904b58 ffff8807ef904b60 ffff8806d2749f58 ffff8807ef904b00
Apr 7 11:42:06 lxc06 kernel: [ 4677.740428] [] schedule+0x29/0x70
Apr 7 11:42:06 lxc06 kernel: [ 4677.740430] [] rwsem_down_read_failed+0xf5/0x130
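
For triage, the tasks reported as blocked in a log like the one above can be listed with a short script. This is only a minimal sketch and not part of the original report: the function name `hung_tasks` and the regular expression are assumptions based on the "INFO: task ... blocked for more than ... seconds." message format visible in the log.

```python
import re

# Assumed message format, taken from the log lines in this report, e.g.:
#   "... INFO: task java:6285 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def hung_tasks(log_lines):
    """Return (task name, pid, seconds) for each hung-task report, in log order."""
    found = []
    for line in log_lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            found.append(
                (match.group("name"), int(match.group("pid")), int(match.group("secs")))
            )
    return found


if __name__ == "__main__":
    # Three lines copied from the log above; only two are hung-task reports.
    sample = [
        "Apr 7 11:42:06 lxc06 kernel: [ 4677.739856] INFO: task spl_kmem_cache/:274 blocked for more than 120 seconds.",
        "Apr 7 11:42:06 lxc06 kernel: [ 4677.739882] [] zfs_zinactive+0x4e/0xd0 [zfs]",
        "Apr 7 11:42:06 lxc06 kernel: [ 4677.740235] INFO: task java:6285 blocked for more than 120 seconds.",
    ]
    for name, pid, secs in hung_tasks(sample):
        print(f"{name} (pid {pid}) blocked for more than {secs}s")
```

Running the same function over a full syslog file (for example, passing an open file object as `log_lines`) gives a quick overview of which tasks were stuck before reading the individual stack traces.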

root@lxc06:# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 13.10
Release: 13.10
Codename: saucy
root@lxc06:# uname -a
Linux lxc06 3.11.0-19-generic #33-Ubuntu SMP Tue Mar 11 18:48:34 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

zpool status

pool: tank
state: ONLINE
scan: scrub in progress since Mon Apr 7 12:21:15 2014
49.4M scanned out of 412G at 791K/s, 151h37m to go
0 repaired, 0.01% done
config:

NAME                                          STATE     READ WRITE CKSUM
tank                                          ONLINE       0     0     0
  mirror-0                                    ONLINE       0     0     0
    ata-WDC_WD10JFCX-68N6GN0_WD-WX21EA3D4747  ONLINE       0     0     0
    ata-WDC_WD10JFCX-68N6GN0_WD-WXK1AA3Z2181  ONLINE       0     0     0
logs
  sda3                                        ONLINE       0     0     0
cache
  sda4                                        ONLINE       0     0     0
  zram1                                       ONLINE       0     0     0
  zram2                                       ONLINE       0     0     0
  zram3                                       ONLINE       0     0     0
  zram4                                       ONLINE       0     0     0

errors: No known data errors

zfs 0.6.2-1~saucy

2014-02-04.22:39:40 zpool create tank -f -o ashift=12 -O atime=off mirror ata-WDC_WD10JFCX-68N6GN0_WD-WX21EA3D4747 ata-WDC_WD10JFCX-68N6GN0_WD-WXK1AA3Z2181 log sda3 cache sda4

ZFS is used only for data storage; the system itself is on an SSD (sda).

The machine has 32 GB of RAM.

The system had been running fine for weeks, then it was hard reset by accident. After it came back up, it crashed after about 1-2 hours. It wasn't a full lockup: I could still type at the command line, but the command 'w' produced no output.

Let me know if you need more information.

behlendorf added this to the 0.6.4 milestone Apr 14, 2014
behlendorf added the Bug label Apr 14, 2014
@behlendorf
Contributor

Thanks for filing this. Hopefully this is enough information.

@ryao
Contributor

ryao commented Apr 25, 2014

Could this have been fixed by the following commits?

8ac6729
6f9548c

@tomposmiko
Author

It's probably fixed as @ryao suggested.
At least I don't see these messages now.
