
panic(cpu 1 caller 0xffffff80002438d8): "zalloc: \"kalloc.1024\" (100535 elements) retry fail 3, kfree_nop_count: 0"@/SourceCache/xnu/xnu-2050.7.9/osfmk/kern/zalloc.c:1826 #30

Closed
lundman opened this issue Jun 4, 2013 · 9 comments



lundman commented Jun 4, 2013

Running a large (~200GB) iozone on an HDD results in the following panic:

Backtrace (CPU 1), Frame : Return Address
0xffffff8040eb32b0 : 0xffffff800021d5f6 mach_kernel : _panic + 0xc6
0xffffff8040eb3320 : 0xffffff80002438d8 mach_kernel : _zalloc_canblock + 0x428
0xffffff8040eb3400 : 0xffffff80002245bd mach_kernel : _kalloc_canblock + 0x7d
0xffffff8040eb3430 : 0xffffff8000224c39 mach_kernel : _OSMalloc + 0x89
0xffffff8040eb3460 : 0xffffff7f80e8e7df net.lundman.spl : _kmem_cache_alloc + 0x2f
0xffffff8040eb34a0 : 0xffffff7f80f79673 net.lundman.zfs : _zio_create + 0xc3
0xffffff8040eb35a0 : 0xffffff7f80f79fe9 net.lundman.zfs : _zio_write + 0x139
0xffffff8040eb36b0 : 0xffffff7f80e9ff30 net.lundman.zfs : _arc_write + 0x180
0xffffff8040eb37a0 : 0xffffff7f80eac5c8 net.lundman.zfs : _dbuf_write + 0x628
0xffffff8040eb3960 : 0xffffff7f80eabc6b net.lundman.zfs : _dbuf_sync_leaf + 0x3cb
0xffffff8040eb39d0 : 0xffffff7f80eab737 net.lundman.zfs : _dbuf_sync_list + 0x87
0xffffff8040eb3a00 : 0xffffff7f80eab87e net.lundman.zfs : _dbuf_sync_indirect + 0x12e
0xffffff8040eb3a40 : 0xffffff7f80eab725 net.lundman.zfs : _dbuf_sync_list + 0x75
0xffffff8040eb3a70 : 0xffffff7f80eab87e net.lundman.zfs : _dbuf_sync_indirect + 0x12e
0xffffff8040eb3ab0 : 0xffffff7f80eab725 net.lundman.zfs : _dbuf_sync_list + 0x75
0xffffff8040eb3ae0 : 0xffffff7f80eab87e net.lundman.zfs : _dbuf_sync_indirect + 0x12e
0xffffff8040eb3b20 : 0xffffff7f80eab725 net.lundman.zfs : _dbuf_sync_list + 0x75
0xffffff8040eb3b50 : 0xffffff7f80ecea6e net.lundman.zfs : _dnode_sync + 0x60e
0xffffff8040eb3bb0 : 0xffffff7f80eb96a6 net.lundman.zfs : _dmu_objset_sync_dnodes + 0x96
0xffffff8040eb3bf0 : 0xffffff7f80eb93ed net.lundman.zfs : _dmu_objset_sync + 0x44d
0xffffff8040eb3d30 : 0xffffff7f80ed4ef1 net.lundman.zfs : _dsl_dataset_sync + 0x51
0xffffff8040eb3d60 : 0xffffff7f80ee326c net.lundman.zfs : _dsl_pool_sync + 0xfc
0xffffff8040eb3e50 : 0xffffff7f80f09970 net.lundman.zfs : _spa_sync + 0x4e0
0xffffff8040eb3f20 : 0xffffff7f80f16362 net.lundman.zfs : _txg_sync_thread + 0x332
0xffffff8040eb3fb0 : 0xffffff80002b2677 mach_kernel : _call_continuation + 0x17
      Kernel Extensions in backtrace:
System uptime in nanoseconds: 2309433338440
vm objects:4441920
pv_list:2420736
vm pages:36918072
kalloc.16:25956352
kalloc.32:3534848
kalloc.64:8560640
kalloc.128:78135296
kalloc.256:303702016
kalloc.512:222441472
kalloc.1024:102961152
kalloc.8192:1392640
vnodes:4689432
namecache:2040000
HFS node:5917776
HFS fork:1773568
buf.8192:24805376
Kernel Stacks:3833856
PageTables:13852672
Kalloc.Large:31064098

Backtrace suspected of leaking: (outstanding bytes: 94208)
0xffffff8000243589
0xffffff80002245bd
0xffffff8000224c39
0xffffff7f80e8e7df
0xffffff7f80f79673
0xffffff7f80f79fe9
0xffffff7f80e9ff30
0xffffff7f80eac5c8
0xffffff7f80eabc6b
0xffffff7f80eab737
0xffffff7f80eab87e
0xffffff7f80eab725
0xffffff7f80eab87e
0xffffff7f80eab725
0xffffff7f80eab87e
(gdb) zprint
ZONE                   COUNT   TOT_SZ   MAX_SZ   ELT_SZ ALLOC_SZ         TOT_ALLOC         TOT_FREE NAME
0xffffff8002a89000       178    1a290    1c800      592     5000               178                0 zones X$
0xffffff8002a8a970     18669   43c740   5b2000      224     4000           1134656          1115987 vm objects CX$
0xffffff8002a8a720     13248    817e0    c0000       40     1000             14655             1407 vm object hash entries CX$
0xffffff8002a8a4d0        91     6e90     a000      232     2000              2681             2590 maps X$
0xffffff8002a8a280      8135    c0f30   100000       80     5000           1532690          1524555 VM map entries CX$
0xffffff8002a8a030        33    cb3e0   280000       80     1000             79812            79779 Reserved VM map entries $
0xffffff8002a89de0         0      ff0     4000       80     1000             11503            11503 VM map copies CX$
0xffffff8002a89b90        81     6000    19000      256     1000               904              823 pmap CX$
0xffffff8002a89940        81    51000    81bf1     4096     1000               904              823 pagetable anchors CX$
0xffffff8002a896f0     50200   24f000   255600       48     3000             50200                0 pv_list CX$
0xffffff8002a894a0    512442  2335338        0       72     2000           2299521          2291262 vm pages HC
0xffffff8002a89250   1620133  18c1000  22a3599       16     1000         125203838        123583705 kalloc.16 CX
0xffffff8006306c50    110335   35f000   4ce300       32     1000          13634985         13524650 kalloc.32 CX
0xffffff8006306a00    133584   82a000   e6a900       64     1000          26510120         26376536 kalloc.64 CX
0xffffff80063067b0    610090  4a84000  614f4c0      128     1000          50524515         49914425 kalloc.128 CX
0xffffff8006306560   1070398 121a2000 1b5e4d60      256     1000          72534632         71464234 kalloc.256 CX
0xffffff8006306310    399302  d423000  daf26b0      512     1000          39231204         38831902 kalloc.512 CX
0xffffff80063060c0    100404  6231000  c29e980     1024     1000          22949693         22849289 kalloc.1024 CX
0xffffff8006305e70       292    9a000   200000     2048     1000          77633725         77633433 kalloc.2048 CX
0xffffff8006305c20       131    83000   900000     4096     1000             47564            47433 kalloc.4096 CX
0xffffff80063059d0       170   154000  2000000     8192     2000             29553            29383 kalloc.8192 CX
0xffffff8006305780     13248    34000    48000       16     1000             14655             1407 mem_obj_control CX$

This makes kalloc.256 a likely candidate for the memory leak.

I suspect the panic in kalloc.1024, coming from zio_create(), just happens to be the call that runs into trouble first, and is probably not where we actually leak.


lundman commented Jun 5, 2013

Booting with zlog=kalloc.256 and using the findoldest, zstack, and countpcs gdb macros, we find the following situation:

(gdb) findoldest                                                                
oldest record is at log index 393:

--------------- ALLOC  0xffffff803276ec00 : index 393  :  ztime 21643824 -------------
0xffffff800024352e <zalloc_canblock+78>:        mov    %eax,-0xcc(%rbp)
0xffffff80002245bd <get_zone_search+23>:        jmpq   0xffffff80002246d8 <KALLOC_ZINFO_SALLOC+35>
0xffffff8000224c39 <OSMalloc+89>:       mov    %rax,-0x18(%rbp)
0xffffff7f80e847df <zfs_kmem_alloc+15>: mov    %rax,%r15
0xffffff7f80e90649 <arc_buf_alloc+41>:  mov    %rax,-0x28(%rbp)
(gdb) countpcs 0xffffff7f80e90649   # arc_buf_alloc
occurred 2390 times in log (59% of records)

So it is less of a leak and more a question of where the ARC reclaim is.

Other stacks are:

--------------- ALLOC  0xffffff8012568300 : index 700  :  ztime 22130651 -------------
0xffffff800024352e <zalloc_canblock+78>:        mov    %eax,-0xcc(%rbp)
0xffffff80002245bd <get_zone_search+23>:        jmpq   0xffffff80002246d8 <KALLOC_ZINFO_SALLOC+35>
0xffffff8000224c39 <OSMalloc+89>:       mov    %rax,-0x18(%rbp)
0xffffff7f80e847df <zfs_kmem_alloc+15>: mov    %rax,%r15
0xffffff7f80ea0cbb <dbuf_create+59>:    mov    %rax,-0x40(%rbp)
0xffffff7f80ea0900 <__dbuf_hold_impl+1360>:     mov    -0x68(%rbp),%rcx
0xffffff7f80ea02fc <dbuf_hold_impl+140>:        mov    $0x780,%rsi
0xffffff7f80ea0bf2 <dbuf_hold+50>:      mov    %eax,-0x24(%rbp)
0xffffff7f80ea83bb <dmu_buf_hold_array_by_dnode+619>:   mov    %rax,-0x98(%rbp)
0xffffff7f80ea99d9 <dmu_write_uio_dnode+121>:   mov    %eax,-0x38(%rbp)
0xffffff7f80ea993c <dmu_write_uio_dbuf+108>:    mov    %eax,-0x3c(%rbp)
0xffffff7f80f5494b <zfs_write+2923>:    mov    %eax,-0xa0(%rbp)
0xffffff7f80f605e8 <zfs_vnop_write+88>: mov    %eax,-0x10(%rbp)
0xffffff8000311d62 <VNOP_WRITE+18>:     mov    %eax,%ebx
0xffffff8000308139 <vn_write+617>:      mov    %eax,%r14d

But as it is a dmu buffer hold, it should be short-lived.

(gdb) countpcs dbuf_create+59
occurred 1578 times in log (39% of records)


lundman commented Jun 5, 2013

OK, it seems to be just the ARC running wild. With arc_max / 2 it still happens, but with arc_max / 4 iozone completes.

My VM has 2GB of RAM, so that is quite conservative (a ~250MB ARC?).
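For context, a minimal sketch of what such a cap could look like, assuming the ZOL-style arc_c_max / arc_c_min globals; the helper name arc_apply_small_vm_cap() is hypothetical and this is not the actual change:

#include <sys/types.h>

extern uint64_t arc_c_max;	/* ARC target maximum, in bytes */
extern uint64_t arc_c_min;	/* ARC target minimum, in bytes */

static void
arc_apply_small_vm_cap(void)
{
	/* shrink the default target to a quarter; on this 2GB VM that is roughly 250MB */
	arc_c_max /= 4;
	if (arc_c_max < arc_c_min)
		arc_c_max = arc_c_min;
}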


lundman commented Jun 27, 2013

It would seem that OS X tracks memory allocations per size class: kalloc.256, kalloc.512, kalloc.1024, kalloc.4096, etc.

Even though we stay under our self-imposed limit, we can in fact run out of a specific size class well before then, usually 512 or 1024. We may need to keep an internal tally per size class as well (or explore a way to ask Darwin for those statistics); a rough sketch of such a tally follows below.

Currently all ZFS memory is kmem. Can Darwin do Linux-style vmem allocations as well, especially now that we are linking with IOKit?
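As a rough illustration only, one way to keep such a per-size-class tally in the SPL allocation wrappers; the bucket boundaries mirror Darwin's kalloc zones, and names like spl_ksize_tally() and ksize_outstanding[] are hypothetical, not existing SPL code:

#include <sys/types.h>
#include <libkern/OSAtomic.h>

#define	KSIZE_BUCKETS	8

/* outstanding bytes per power-of-two class: <=128, 256, 512, 1024, 2048, 4096, 8192, larger */
static volatile SInt64 ksize_outstanding[KSIZE_BUCKETS];

static int
spl_ksize_bucket(size_t size)
{
	size_t cap = 128;
	int i;

	for (i = 0; i < KSIZE_BUCKETS - 1; i++, cap <<= 1)
		if (size <= cap)
			return (i);
	return (KSIZE_BUCKETS - 1);	/* everything larger */
}

/* called from the zfs_kmem_alloc()/zfs_kmem_free() wrappers; sign is +1 or -1 */
static void
spl_ksize_tally(size_t size, int sign)
{
	OSAddAtomic64((SInt64)sign * (SInt64)size,
	    &ksize_outstanding[spl_ksize_bucket(size)]);
}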


BjoKaSH commented Jun 27, 2013

I had a look into this, and I think I have an idea of how to fix it, but I need to check a bit more. Hopefully I can say more over the weekend.

ghost assigned BjoKaSH Jun 27, 2013

lundman commented Jul 24, 2013

Basically, we don't appear to be reclaiming the ARC as we should. We are still using the arc.c from ZOL, which relies on the SPL issuing a reclaim callback. Clearly we don't do that either, so the current thought is to also port the FreeBSD arc.c over to the project.

I added a "total_allocated" counter to the SPL layer (since all our memory allocations go through there), as well as dumping the ARC stats every second, and the graphs look like this.

Note the very first drop over on the left of the graph; that is the ARC having warmed up and doing its first reclaim. The top is pretty much the line we should never go over.

[screenshot 2013-07-24 14:39:23: graph of total_allocated and ARC size over time]
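A minimal sketch of that instrumentation, assuming a spl_total_allocated counter maintained by the SPL alloc/free wrappers and the ARC's arc_size global; the stats thread below is illustrative, not the committed code:

#include <sys/types.h>
#include <IOKit/IOLib.h>	/* IOLog, IOSleep */

extern volatile int64_t spl_total_allocated;	/* bytes outstanding, updated in alloc/free */
extern uint64_t arc_size;			/* current ARC size, from arc.c */

static void
spl_stats_thread(void *arg)
{
	for (;;) {
		IOLog("SPL: total_allocated=%lld arc_size=%llu\n",
		    (long long)spl_total_allocated,
		    (unsigned long long)arc_size);
		IOSleep(1000);	/* roughly one sample per second */
	}
}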


lundman commented Jul 25, 2013

OK, I'm 40% sure this might have something to do with it:

void kmem_cache_reap_now(kmem_cache_t *cache)
{
	/* empty stub: no cache reap is performed at all */
}


lundman commented Jul 31, 2013

OK, kmem_cache_reap_now() is not really part of the reclaim. Reclaim is handled by calls to arc_adjust(), which calls arc_evict(). We were in fact evicting fine, and all six lists (mru/mfu/ghosts × 2) went to zero. It was zfs.arc_meta_used that kept climbing, most likely due to a sa_buf_hold() that is never released.
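To illustrate the suspicion (sa_buf_hold()/sa_buf_rele() are from upstream ZFS; the wrapper helpers are purely hypothetical): every hold taken when a znode is set up needs a matching release at teardown, otherwise the held bonus buffers keep arc_meta_used growing:

#include <sys/dmu.h>
#include <sys/sa.h>

/* hypothetical helpers showing the required pairing */
static int
znode_sa_setup(objset_t *os, uint64_t obj, void *tag, dmu_buf_t **dbp)
{
	return (sa_buf_hold(os, obj, tag, dbp));	/* takes a hold on the bonus buffer */
}

static void
znode_sa_teardown(dmu_buf_t *db, void *tag)
{
	sa_buf_rele(db, tag);	/* the release suspected to be missing */
}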

At the moment, the ARC appears to behave as expected, keeping a 2GB machine up by evicting.


lundman commented Aug 2, 2013

The reclaim thread is working surprisingly well; we probably should not have waited this long to implement it.

We do have a lingering issue with the ARC: sometimes zfs.arc_meta_used will climb way out of control. We are trying to evict, so presumably we are leaking buffers or something similar. It can trigger the panic in this issue.


lundman commented Aug 6, 2013

With the latest commit 3c65e48 we have managed to clean up most issues with the ARC; it no longer balloons. We most likely had one sa_buf_free() missed in the znode code, and the reclaim thread would stop. Changing cv_signal() to cv_broadcast() solved that issue; we should investigate cv_signal().
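A minimal sketch of the wake-up change, assuming the ZOL-style arc_reclaim_thr_lock/arc_reclaim_thr_cv pair; the helper name arc_reclaim_kick() is illustrative:

#include <sys/mutex.h>
#include <sys/condvar.h>

extern kmutex_t arc_reclaim_thr_lock;
extern kcondvar_t arc_reclaim_thr_cv;

static void
arc_reclaim_kick(void)
{
	mutex_enter(&arc_reclaim_thr_lock);
	/* was cv_signal(&arc_reclaim_thr_cv): wakes at most one waiter */
	cv_broadcast(&arc_reclaim_thr_cv);	/* wake every waiter so the reclaim thread is not skipped */
	mutex_exit(&arc_reclaim_thr_lock);
}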

For now, even two rsyncs from the root directory will complete, with the ARC staying level. We may need to revisit the ARC work in the future for tweaks, but I consider the original issue avoided.

lundman closed this as completed Aug 6, 2013
lundman pushed a commit that referenced this issue Nov 12, 2015
…by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com>

The begin record contains useful information that the caller might
want to have access to, so that it can be used to derive values
for the 'snapname' and 'origin' parameters.
A new lzc_receive_with_header function is added.

Closes #30
rottegift pushed a commit to rottegift/zfs that referenced this issue Nov 12, 2015
…by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Paul Dagnelie <pcd@delphix.com>

The begin record contains useful information that the caller might
want to have access to, so that it can be used to derive values
for the 'snapname' and 'origin' parameters.
A new lzc_receive_with_header function is added.

Closes openzfsonosx#30