ARC consuming 38% CPU for no reason #6531

Closed
brendangregg opened this issue Aug 18, 2017 · 9 comments
Labels: Component: Memory Management, Status: Stale

Comments

@brendangregg

brendangregg commented Aug 18, 2017

This is a production system that has ZFS installed, but is not using ZFS. No pools, datasets, or ARC buffers.

It suffered a performance loss because ZFS was consuming 38% CPU system-wide on this 4-CPU system. Here is the bottom of a CPU flame graph:

[Screenshot: bottom of the CPU flame graph]

Zooming into the arc_reclaim thread:

[Screenshot: flame graph zoomed into the arc_reclaim thread]

This multilist work is new to me, but... do we really need to be selecting eviction lists using an entropy-based random function? Could this just be round robin?
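For comparison with an entropy-based random pick, a round-robin starting index needs only a shared counter. The user-space C11 sketch below is illustrative only: `sketch_multilist_t` and `sketch_get_rr_index()` are hypothetical names, not the ZFS multilist API, and whether round robin spreads eviction as evenly as a random start under concurrent callers is an assumption, not something established in this thread.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a multilist: only the sublist count and a
 * shared cursor matter for choosing where an eviction pass starts. */
typedef struct {
    unsigned int num_sublists;
    atomic_uint  next;      /* round-robin cursor shared by all callers */
} sketch_multilist_t;

/*
 * Return the sublist index the next eviction pass should start from.
 * Successive callers get 0, 1, 2, ... modulo num_sublists, so passes
 * still start on different sublists without drawing any random bytes.
 * A relaxed fetch-add suffices: only the spread of starting points
 * matters, not ordering relative to other memory operations.
 */
static unsigned int
sketch_get_rr_index(sketch_multilist_t *ml)
{
    assert(ml->num_sublists != 0);
    unsigned int n = atomic_fetch_add_explicit(&ml->next, 1u,
        memory_order_relaxed);
    return n % ml->num_sublists;
}
```

With `num_sublists = 4` and the cursor set to 0 via `atomic_init()`, five calls return 0, 1, 2, 3, 0. When the counter wraps at `UINT_MAX` the sequence jumps if `num_sublists` is not a power of two, which is harmless when all that is needed is a varying starting point.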

There's also the shrink_zone CPU consumer on the right, which I'd guess is related to the ARC holding onto locks while in arc_adjust().

This system was almost running at memory capacity (about 99%), so I would think it is frequently entering arc_reclaim() and shrink_zone(). The workaround has been to reduce the Java heap size by a tiny bit.

System information

| Type | Version/Name |
|---|---|
| Distribution Name | Ubuntu |
| Distribution Version | Xenial |
| Linux Kernel | 4.4.0-87-generic |
| Architecture | x86_64 |
| ZFS Version | 0.6.5.6-0ubuntu16 |
| SPL Version | 0.6.5.6-0ubuntu4 |
@scotte

scotte commented Aug 18, 2017

Here's what top showed when we caught one of the systems @brendangregg described above in this state:

KiB Mem : 16431044 total,   215356 free, 15894868 used,   320820 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   190228 avail Mem

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
5014 www-data  20   0 18.837g 0.014t  52596 S 231.9 94.3 308:41.48 java
  51 root      20   0       0      0      0 R 100.0  0.0  42:06.98 kswapd1
 748 root      20   0       0      0      0 R  36.5  0.0  17:29.39 arc_reclaim
1250 root      20   0    9512    124      0 S   8.3  0.0   6:20.67 rngd
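The "about 99%" memory figure in the original report can be cross-checked against the top output above. The hypothetical helper below just divides two of top's KiB figures; which definition of "used" the report had in mind is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: percentage of total RAM represented by part_kib,
 * using the KiB figures exactly as top prints them. */
static double
pct_of_total(uint64_t part_kib, uint64_t total_kib)
{
    assert(total_kib > 0);
    return 100.0 * (double)part_kib / (double)total_kib;
}
```

`pct_of_total(15894868, 16431044)` (top's "used") is about 96.7%, while counting everything except "avail Mem", `pct_of_total(16431044 - 190228, 16431044)`, gives about 98.8%, which is consistent with the report's "about 99%".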

@bunder2015
Contributor

Just curious, does this still happen on 0.7.x?

@gaurkuma
Contributor

gaurkuma commented Aug 18, 2017

I think we should not even call arc_adjust() when the ARC size is at or below c_min:

    if (arc_size > arc_c_min)
        arc_adjust();

@loli10K loli10K added the Component: Memory Management kernel memory management label Aug 19, 2017
@brendangregg
Author

@gaurkuma right. At the very least, arc_adjust() should bail early (or not be called) if the arc_size is zero.
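A minimal sketch of the guard discussed above, assuming the ARC size and c_min can be read as plain counters. `sketch_arc_t`, `sketch_arc_adjust()` and `sketch_reclaim_pass()` are hypothetical stand-ins, not the real ZFS symbols; the point is only that one comparison keeps the expensive eviction walk from running at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ARC counters discussed in this thread. */
typedef struct {
    uint64_t size;          /* bytes currently cached (cf. arc_size)         */
    uint64_t c_min;         /* floor the ARC is not shrunk below (arc_c_min) */
    uint64_t adjust_calls;  /* how many times the expensive path ran         */
} sketch_arc_t;

/* Stand-in for arc_adjust(): the expensive eviction walk. */
static void
sketch_arc_adjust(sketch_arc_t *arc)
{
    arc->adjust_calls++;
    /* ... eviction work would happen here ... */
}

/*
 * One pass of a reclaim loop with the proposed guard. Because c_min is
 * unsigned, "size <= c_min" also covers the empty ARC (size == 0), so the
 * eviction walk is skipped entirely on a system with no ARC buffers.
 * Returns true if the eviction path ran.
 */
static bool
sketch_reclaim_pass(sketch_arc_t *arc)
{
    assert(arc != NULL);
    if (arc->size <= arc->c_min)
        return false;
    sketch_arc_adjust(arc);
    return true;
}
```

Folding the "size is zero" case into the c_min comparison keeps a single check on the hot path rather than two.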

@ironMann
Contributor

spa_get_random() switched to using a PRNG in 0.7. But I agree, the reclaim thread should be parked if there's nothing in the ARC.
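"Parking" the reclaim thread means letting it sleep until there is something to reclaim, instead of waking periodically to scan an empty cache. The sketch below illustrates the idea in user space with POSIX condition variables; the kernel code uses SPL primitives instead, and `sketch_reclaim_t`, `sketch_reclaim_thread()`, `sketch_add()` and `sketch_shutdown()` are all hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared state between producers and a reclaim worker. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    uint64_t        size;      /* bytes cached; 0 means nothing to reclaim */
    bool            shutdown;
    unsigned        passes;    /* reclaim passes actually performed */
} sketch_reclaim_t;

/* Worker: parks on the condition variable while the cache is empty, so it
 * uses no CPU until someone adds data or asks it to exit. */
static void *
sketch_reclaim_thread(void *arg)
{
    sketch_reclaim_t *r = arg;

    pthread_mutex_lock(&r->lock);
    for (;;) {
        /* The while loop re-checks the predicate after spurious wakeups. */
        while (r->size == 0 && !r->shutdown)
            pthread_cond_wait(&r->cv, &r->lock);
        if (r->shutdown)
            break;
        r->passes++;   /* stand-in for one eviction pass */
        r->size = 0;   /* pretend the pass evicted everything */
    }
    pthread_mutex_unlock(&r->lock);
    return NULL;
}

/* Producer side: record new cached data and wake the parked worker. */
static void
sketch_add(sketch_reclaim_t *r, uint64_t bytes)
{
    assert(r != NULL);
    pthread_mutex_lock(&r->lock);
    r->size += bytes;
    pthread_cond_signal(&r->cv);
    pthread_mutex_unlock(&r->lock);
}

/* Ask the worker to exit; it must be woken in case it is parked. */
static void
sketch_shutdown(sketch_reclaim_t *r)
{
    pthread_mutex_lock(&r->lock);
    r->shutdown = true;
    pthread_cond_signal(&r->cv);
    pthread_mutex_unlock(&r->lock);
}
```

On a system like the one in this report, where nothing is ever added, the worker would sit in `pthread_cond_wait()` indefinitely and consume no CPU.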

@richardelling
Contributor

I believe the test in the arc_reclaim_thread() loop should be whether spa_namespace_avl is empty.

@loli10K
Contributor

loli10K commented Aug 19, 2017

> This is a production system that has ZFS installed, but is not using ZFS. No pools, datasets, or ARC buffers.

@brendangregg, is zed (ZFS Event Daemon) running on this system? If so, can you still reproduce this issue after stopping it?

@gaurkuma
Contributor

@richardelling, a spa_namespace_avl check may not be sufficient, because there can be unused pools. For example, in our use case we create pools up front on multiple nodes in a cluster, and some of them may not be used for quite some time.
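The difference between the two proposed checks can be shown with a toy model. `sketch_state_t` and both predicates are hypothetical; the "idle pools" case assumes an imported but unused pool has little or nothing cached in the ARC, which may not hold exactly in practice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of what each proposed check would look at. */
typedef struct {
    unsigned imported_pools;  /* what a spa_namespace_avl-emptiness test sees */
    uint64_t arc_size;        /* bytes actually cached in the ARC */
} sketch_state_t;

/* Check based on the pool namespace: run reclaim if any pool exists. */
static bool
should_reclaim_by_pools(const sketch_state_t *s)
{
    assert(s != NULL);
    return s->imported_pools > 0;
}

/* Check based on ARC contents: run reclaim only if something is cached. */
static bool
should_reclaim_by_size(const sketch_state_t *s)
{
    assert(s != NULL);
    return s->arc_size > 0;
}
```

Both predicates agree for the system in the original report (no pools, nothing cached), but they diverge for pre-created, unused pools: the pool-based check would keep waking the reclaim thread, while the size-based check would leave it idle.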

@stale

stale bot commented Aug 25, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Aug 25, 2020
@stale stale bot closed this as completed Nov 23, 2020
7 participants