spl_kmem_cache uses 100% CPU; processes blocked in spl_kmem_cache_alloc #210
Comments
@akorn Could you try resizing the spl_kmem_cache taskq? This was one significant change between 0.6.0.90 and 0.6.0.91. Basically, these allocations used to be handled by the kworker threads (1 per cpu) but were moved to a single taskq (1 thread by default), so it's possible this might be a bottleneck on a larger or busy system. You should be able to make this change to the DKMS sources on your Ubuntu system and just have DKMS rebuild everything. It increases the number of threads to 1 per cpu and effectively removes the queue depth limit from the taskq.

```diff
diff --git a/module/spl/spl-kmem.c b/module/spl/spl-kmem.c
index cc5961e..1f8839f 100644
--- a/module/spl/spl-kmem.c
+++ b/module/spl/spl-kmem.c
@@ -2409,7 +2409,8 @@ spl_kmem_init(void)
 	init_rwsem(&spl_kmem_cache_sem);
 	INIT_LIST_HEAD(&spl_kmem_cache_list);
 	spl_kmem_cache_taskq = taskq_create("spl_kmem_cache",
-	    1, maxclsyspri, 1, 32, TASKQ_PREPOPULATE);
+	    100, maxclsyspri, 1, INT_MAX,
+	    TASKQ_PREPOPULATE | TASKQ_THREADS_CPU_PCT);
 	spl_register_shrinker(&spl_kmem_cache_shrinker);
```
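For reference, a toy user-space sketch of how a "percent of CPUs" thread count like the one in the patch works out, assuming TASKQ_THREADS_CPU_PCT simply scales the thread count by the number of online CPUs (so 100 means one thread per CPU). This is only an illustration, not the SPL implementation:

```c
/*
 * Toy illustration: compute a taskq thread count from a CPU percentage,
 * as the TASKQ_THREADS_CPU_PCT behaviour is described above.
 * Not the SPL code; user-space only.
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);	/* online CPUs */
	int pct = 100;					/* value passed as "nthreads" in the patch */
	long nthreads = (ncpus * pct) / 100;		/* scale by percentage */

	if (nthreads < 1)
		nthreads = 1;				/* always keep at least one worker */

	printf("%ld CPUs at %d%% -> %ld taskq threads\n", ncpus, pct, nthreads);
	return 0;
}
```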
Thanks; I applied this change and so far things are looking better. I'll let you know how the system fares if left running for a few days. (It's a Debian sid system, not Ubuntu, btw. :)
Sorry, the patch didn't help after all:
Sometimes a third spl_kmem_cache thread appears in top, with a CPU usage of ~7%, but it's gone quickly. However, sorting by CPU time I see:
So apparently the other two threads use comparable CPU time and are simply not spinning at this precise moment. Some sysrq-w output:
Oh, it was too long. Do you need the rest?
@akorn Can you please apply 27d26338ddc37ee9ee6a9a53a78873398da80b54 to your spl source and set the new module option
Thanks, will try (likely tomorrow). Is this also expected to help avoid blocked processes like the above?
OK, I applied the patch and set the module option. I'm not sure what's happening. spl_kmem_cache isn't using much CPU anymore, but the txg_sync thread is:
This has now been going on for hours. sysrq-w says:
Meanwhile, nothing is happening in the pool:
(I know the pool has an unusual layout; this has historical reasons.) iostat also doesn't report any disk activity:
(I don't know why it only reports ~50% idle; I see no activity even if I include all block devices.) The two rsync processes appear well and truly stuck.
@akorn that's encouraging news about the spl_kmem_cache cpu usage; it now appears you're hitting one of the internal ZFS I/O throttles. Can you post the contents of
Sure:
That's why. ZFS believes (perhaps incorrectly) that you have less reclaimable memory than the maximum-sized txg, so it starts throttling. Something about your workload is probably triggering this. You can try setting the

Honestly, long term these internal throttles are something we'll probably want to remove from ZoL. The evidence is mounting that it's probably preferable to rely on the existing Linux throttling mechanisms.
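As a rough illustration of the heuristic described in that comment: if the memory ZFS believes it can reclaim cannot cover the largest txg it might have to write out, new writers get delayed. The sketch below is hypothetical (the function and variable names are placeholders, not the ZoL source); it only shows the shape of the decision:

```c
/*
 * Hypothetical sketch of the write throttle heuristic described above.
 * Names are placeholders; this is not the actual ZoL code.
 */
#include <stdint.h>
#include <stdio.h>

/* Throttle when the memory believed reclaimable cannot cover the
 * maximum-sized txg that may need to be written out. */
static int should_throttle(uint64_t reclaimable_bytes, uint64_t max_txg_bytes)
{
	return reclaimable_bytes < max_txg_bytes;
}

int main(void)
{
	/* Example numbers only. */
	uint64_t reclaimable = 512ULL << 20;	/* 512 MiB believed reclaimable */
	uint64_t max_txg = 1024ULL << 20;	/* 1 GiB worst-case txg */

	printf("throttle writes: %s\n",
	    should_throttle(reclaimable, max_txg) ? "yes" : "no");
	return 0;
}
```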
With zfs_write_limit_shift=4 I had the same behaviour. Now, with 5, I can reliably cause the kernel to panic by rsyncing a specific fs to the box. Unfortunately I have so far been unable to capture the panic message in its entirety because it's not written to netconsole and I couldn't get ramoops to work. The end of the stack trace is identical to part of the above:
I'm typing the rest from a physical screenshot taken with a camera, so there may be typos:
It then doesn't always boot on the first try, but I'm still looking into that. I'll try to have a serial console hooked up so that we may capture all of the error output.
Cache aging was implemented because it was part of the default Solaris kmem_cache behavior. The idea is that per-cpu objects which haven't been accessed in several seconds should be returned to the cache. On the other hand, Linux slabs never move objects back to the slabs unless there is memory pressure on the system.

This behavior is now configurable through the 'spl_kmem_cache_expire' module option. The value is a bit mask with the following meaning:

0x1 - Solaris style cache aging eviction is enabled.
0x2 - Linux style low memory eviction is enabled.

Both methods may be safely enabled simultaneously, but by default both are disabled. It has never been clear if the kmem cache aging (which has been around from day one) actually does any good. It has however been the source of numerous bugs so I wouldn't mind retiring it entirely.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#210
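A minimal sketch of how a bit-mask option like this can be interpreted; the spl_kmem_cache_expire name and the 0x1/0x2 values come from the commit message above, while the macro names and program structure here are placeholders, not the SPL source:

```c
/*
 * Illustrative sketch of testing the bit-mask option described in the
 * commit message above. Macro names are placeholders.
 */
#include <stdio.h>

#define EXPIRE_AGE 0x1	/* Solaris-style cache aging eviction */
#define EXPIRE_MEM 0x2	/* Linux-style low-memory eviction */

static unsigned int spl_kmem_cache_expire = 0;	/* both disabled by default */

int main(void)
{
	/* Enabling both methods simultaneously is allowed. */
	spl_kmem_cache_expire = EXPIRE_AGE | EXPIRE_MEM;

	printf("aging eviction:      %s\n",
	    (spl_kmem_cache_expire & EXPIRE_AGE) ? "on" : "off");
	printf("low-memory eviction: %s\n",
	    (spl_kmem_cache_expire & EXPIRE_MEM) ? "on" : "off");
	return 0;
}
```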
The original root causes of this issue have been addressed in the latest master. If you grab the latest source and are still having problems, let's open a new issue on the ZFS tracker for this. This issue is getting a bit too jumbled.
Hi,
since I upgraded to 0.6.0.92 from 0.6.0.91, I see this in top:
(I have four cores and one of them is continually busy with spl_kmem_cache.)
Additionally, some operations take ridiculously long (if they finish at all).
I have some sysrq-w output: