panic(cpu 1 caller 0xffffff80002438d8): "zalloc: \"kalloc.1024\" (100535 elements) retry fail 3, kfree_nop_count: 0"@/SourceCache/xnu/xnu-2050.7.9/osfmk/kern/zalloc.c:1826 #30
Booting with:
So it is less of a leak, and more of a "where is the ARC reclaim" situation. Other stacks are:
But as it is a dmu_hold, it should be short lived.
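For context on why a DMU hold should be short lived: the usual pattern takes the hold, uses the buffer, and releases it on the same code path. A minimal sketch using the stock DMU API (the surrounding function is hypothetical; error handling abbreviated):

```c
#include <sys/dmu.h>

/*
 * Sketch of the usual short-lived hold pattern: take the hold, use
 * the buffer, release it promptly.  A stack parked in dmu_buf_hold()
 * for a long time is the anomaly discussed above.
 */
static int
read_block_sketch(objset_t *os, uint64_t object, uint64_t offset)
{
    dmu_buf_t *db;
    int err;

    err = dmu_buf_hold(os, object, offset, FTAG, &db, 0);
    if (err != 0)
        return (err);

    /* ... copy out of db->db_data ... */

    dmu_buf_rele(db, FTAG);     /* release promptly: holds pin ARC data */
    return (0);
}
```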
Ok, seems to be just ARC running wild. If I take arc_max / 2, it still happens, but arc_max / 4 has completed iozone. My VM has 2GB, so that is quite conservative (250MB ARC?).
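For the arithmetic above: assuming a default ARC ceiling of about half of RAM, a quarter of that on a 2GB VM comes to roughly 250MB. A tiny illustration of that cap, with hypothetical variable names (not the project's actual arc_init code):

```c
#include <stdint.h>

/* Hypothetical illustration of the arc_max / 4 experiment above. */
int
main(void)
{
    uint64_t physmem = 2ULL << 30;      /* 2GB VM */
    uint64_t arc_max = physmem / 2;     /* assumed default ceiling: RAM/2 */
    uint64_t tested  = arc_max / 4;     /* the value that survived iozone */

    /* tested == 256MB, matching the ~250MB figure above */
    return (tested == (256ULL << 20)) ? 0 : 1;
}
```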
It would seem that OSX keeps the memory allocations per size: kalloc.256, kalloc.512, kalloc.1024, kalloc.4096 and so on. Even though we are staying under our self-imposed limit, we can in fact run out of a specific size class well before then. Usually 512 or 1024. We may need to keep an internal tally of the sizes as well (or explore a way to ask Darwin for those statistics); a sketch of such a tally follows. Currently all ZFS memory is kmem; can Darwin do Linux-style vmem allocations as well? Especially now that we are linking with IOKit.
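A minimal sketch of that kind of per-size tally, assuming a hypothetical wrapper in the SPL allocation path (bucket boundaries and all names here are illustrative, chosen to mirror Darwin's power-of-two kalloc zones):

```c
#include <sys/types.h>
#include <libkern/OSAtomic.h>

/*
 * Hypothetical per-zone tally: mirror Darwin's power-of-two kalloc
 * zones so we can see which size class is filling up before the
 * kernel panics in zalloc().  Names are illustrative only.
 */
#define SPL_NBUCKETS 16          /* 16B .. 512KB in power-of-two steps */

static volatile SInt64 spl_bucket_bytes[SPL_NBUCKETS];

static int
spl_size_to_bucket(size_t size)
{
    int b = 0;
    size_t s = 16;               /* smallest size class we track */
    while (s < size && b < SPL_NBUCKETS - 1) {
        s <<= 1;
        b++;
    }
    return (b);
}

/* Call these from the SPL's kmem_alloc()/kmem_free() wrappers. */
void
spl_tally_alloc(size_t size)
{
    OSAddAtomic64((SInt64)size, &spl_bucket_bytes[spl_size_to_bucket(size)]);
}

void
spl_tally_free(size_t size)
{
    OSAddAtomic64(-(SInt64)size, &spl_bucket_bytes[spl_size_to_bucket(size)]);
}
```

Dumping these counters alongside the panic's zone name (kalloc.1024) would show directly which size class is being exhausted.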
I had a look into this, and I think I have an idea of how to fix it, but I need to check a bit more. Hopefully I can say more over the weekend.
Basically, we don't appear to be reclaiming ARC as we should. We are still using the arc.c from ZOL, which relies on the SPL doing a reclaim callback. Clearly we don't do this either, so the current thought is to also port over the FreeBSD arc.c to the project. I added a "total_allocated" counter to the SPL layer (since all our memory allocations go through there) as well as dumping the ARC stats every second, and the graphs look like this. Note the very first drop way over on the left axis: that is the ARC having warmed up and doing its first reclaim. The top is pretty much where the line should be, the level we never go over.
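A rough sketch of the shape such a reclaim path could take, assuming a dedicated SPL thread that watches total allocations against the self-imposed limit and kicks ARC eviction. All names here are hypothetical stand-ins, not the project's actual symbols:

```c
#include <sys/types.h>

/*
 * Hypothetical reclaim loop: wake periodically, compare what the SPL
 * has handed out against our self-imposed ceiling, and ask the ARC
 * to shrink when we are over.  arc_shrink_hypothetical() stands in
 * for whatever eviction entry point the port ends up exposing.
 */
extern uint64_t spl_total_allocated;   /* maintained by the SPL wrappers */
extern uint64_t spl_memory_limit;      /* our self-imposed ceiling */

extern void arc_shrink_hypothetical(uint64_t bytes_wanted);
extern void reclaim_sleep_one_second(void);

static void
spl_reclaim_thread(void *unused)
{
    for (;;) {
        if (spl_total_allocated > spl_memory_limit) {
            /* Ask ARC to give back the overshoot, plus some slack. */
            uint64_t over = spl_total_allocated - spl_memory_limit;
            arc_shrink_hypothetical(over + (over >> 2));
        }
        reclaim_sleep_one_second();
    }
}
```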
Ok, I'm 40% sure this might have something to do with it;
Ok, at the moment ARC appears to behave as expected, keeping a 2GB machine up by evicting.
The reclaim thread is working surprisingly well, and we probably should not have waited this long to implement it. We do have a lingering issue with ARC: sometimes
With the latest commit 3c65e48 we have managed to clean up most issues with ARC. It no longer balloons. We most likely had one sa_buf_free() missed in znode, which would stop the reclaim thread (the missed-release failure mode is sketched below). For now, even 2 rsyncs from root will complete, with ARC staying level. We may need to revisit the ARC work in future for tweaks, but I consider the original issue to be avoided.
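To illustrate why one missed release stalls reclaim, here is the hold/release pairing as it looks with the stock sa_buf_hold()/sa_buf_rele() API; the sa_buf_free() named above may be a port-specific variant, and the surrounding function is a hypothetical sketch:

```c
#include <sys/sa.h>
#include <sys/zfs_znode.h>

/*
 * Illustrative only: the failure mode described above.  An SA bonus
 * buffer held via sa_buf_hold() but never released keeps its ARC
 * buffer pinned, and the reclaim thread ends up stalled behind
 * buffers it cannot evict.  Only the sa_buf_hold()/sa_buf_rele()
 * pairing is the real ZFS API.
 */
static int
znode_buf_lifecycle_sketch(objset_t *os, uint64_t obj)
{
    dmu_buf_t *db;
    int err;

    err = sa_buf_hold(os, obj, NULL, &db);
    if (err != 0)
        return (err);

    /* ... set up or tear down the znode using db ... */

    /*
     * The missed step: skip this release on any path and the buffer
     * stays referenced forever, ARC cannot shrink, and reclaim stops.
     */
    sa_buf_rele(db, NULL);
    return (0);
}
```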
Running a large iozone on HDD (~200GB) results in the panic shown in the title. Which makes kalloc.256 a likely candidate for memory leaks. I suspect that the panic at kalloc.1024, from zio_create(), just happens to be the call that runs into trouble first, and maybe not the place we leak.