
Linux load average increased since v2.2.0 #15453

Closed
jnyilas opened this issue Oct 25, 2023 · 14 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang) Type: Regression Indicates a functional regression

Comments

@jnyilas

jnyilas commented Oct 25, 2023

System information

Type Version/Name
Distribution Name OpenSUSE Leap
Distribution Version 15.5
Kernel Version 5.14.21
Architecture x86_64
OpenZFS Version 2.2.0-1

Describe the problem you're observing

I recently upgraded to v2.2.0 on my workstation. Since this upgrade, my load average has increased by 3 from what it was before. Since Linux includes uninterruptible tasks in the load calculation, I am pretty sure this is the cause, as I have three new threads in uninterruptible sleep:

[jnyilas@photon:~]  > uptime
 16:12:25  up   5:20,  3 users,  load average: 3.29, 3.32, 3.19

[jnyilas@photon:~] 2 > ps aux | grep -w D\<
root       880  0.0  0.0      0     0 ?        D<   10:52   0:00 [vdev_autotrim]
root       978  0.0  0.0      0     0 ?        D<   10:52   0:00 [vdev_autotrim]
root       979  0.0  0.0      0     0 ?        D<   10:52   0:00 [vdev_autotrim]

Any ideas about what changed on the autotrim code path? I have been using this feature for quite a long time.
I am not seeing any abnormal latency or excessive autotrim ops on my two zpools, and performance remains good despite the increased load average.
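As a cross-check, counting tasks by state shows the three D-state entries that account for the +3 (a generic one-liner, nothing ZFS-specific):

[jnyilas@photon:~] > ps -eo s= | sort | uniq -c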

[jnyilas@photon:~] > zpool iostat -w

photon       total_wait     disk_wait    syncq_wait    asyncq_wait
latency      read  write   read  write   read  write   read  write  scrub   trim  rebuild
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
1ns             0      0      0      0      0      0      0      0      0      0      0
3ns             0      0      0      0      0      0      0      0      0      0      0
7ns             0      0      0      0      0      0      0      0      0      0      0
15ns            0      0      0      0      0      0      0      0      0      0      0
31ns            0      0      0      0      0      0      0      0      0      0      0
63ns            0      0      0      0      0      0      0      0      0      0      0
127ns           0      0      0      0      0      0      0      0      0      0      0
255ns           0      0      0      0    168  2.47K     37  1.65K      2     17      0
511ns           0      0      0      0  2.90K  10.2K    603  11.3K     74    666      0
1us             0      0      0      0  2.57K  5.27K    500  18.5K    137    134      0
2us             0      0      0      0    411  2.09K     52  5.39K     33     87      0
4us             0      0      0      0     54    350      1  4.40K      8      5      0
8us             0      0      0      0      5      6      9  8.51K      7      0      0
16us            2  18.7K     26  82.0K      3     15     16  19.7K     24      2      0
32us            2  42.1K    141   104K     27      9     20  25.9K     48      0      0
65us          114  52.8K  2.25K  44.2K     58      7     52  35.8K     73      0      0
131us       4.92K  62.1K  57.2K  20.0K     51     18    115  48.9K    143      0      0
262us       2.63K  37.2K   402K  7.14K     61     26    158  22.9K    352      0      0
524us       1.14K  2.99K  38.8K    785     23     30    165  1.88K    608    352      0
1ms         1.41K  3.00K    617  1.73K      2     50    126  2.25K  1.03K      0      0
2ms         1.76K  2.29K     36    247      1     60     33  2.00K  1.61K    335      0
4ms         2.47K  1.76K      6     46      0    114     49  1.56K  2.40K    699      0
8ms         3.77K  44.4K    127  8.29K      0    596     49  35.6K  3.52K  19.3K      0
16ms         414K  1.40K      0      0      0    485      0    757   417K    822      0
33ms        59.5K      5      0      0      0      0      0      5  56.3K      2      0
67ms        5.19K      0      0      0      0      0      0      0  5.17K      0      0
134ms       3.89K      0      0      0      0      0      0      0  3.89K      0      0
268ms         227      0      0      0      0      0      0      0    224      0      0
536ms           0      0      0      0      0      0      0      0      0      0      0
1s              0      0      0      0      0      0      0      0      0      0      0
2s              0      0      0      0      0      0      0      0      0      0      0
4s              0      0      0      0      0      0      0      0      0      0      0
8s              0      0      0      0      0      0      0      0      0      0      0
17s             0      0      0      0      0      0      0      0      0      0      0
34s             0      0      0      0      0      0      0      0      0      0      0
68s             0      0      0      0      0      0      0      0      0      0      0
137s            0      0      0      0      0      0      0      0      0      0      0
---------------------------------------------------------------------------------------

photon-dr    total_wait     disk_wait    syncq_wait    asyncq_wait
latency      read  write   read  write   read  write   read  write  scrub   trim  rebuild
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
1ns             0      0      0      0      0      0      0      0      0      0      0
3ns             0      0      0      0      0      0      0      0      0      0      0
7ns             0      0      0      0      0      0      0      0      0      0      0
15ns            0      0      0      0      0      0      0      0      0      0      0
31ns            0      0      0      0      0      0      0      0      0      0      0
63ns            0      0      0      0      0      0      0      0      0      0      0
127ns           0      0      0      0      0      0      0      0      0      0      0
255ns           0      0      0      0     11    132      0      0      3      0      0
511ns           0      0      0      0     73    851      1    354    182      0      0
1us             0      0      0      0    298    607      4  2.18K    731      0      0
2us             0      0      0      0     37    286      0    395    463      0      0
4us             0      0      0      0      0     26      0     28     31      0      0
8us             0      0      0      0      0      0      0     39      0      0      0
16us            0      5      0      5      0      6      0     76     16      0      0
32us            1      8      1      8      0      0      0    186      3      0      0
65us           36     12     79     35      0      0      0    442      6      0      0
131us         187  1.55K    378  9.54K      0      0      0  1.37K     26      0      0
262us         172  3.48K    309  10.4K      0      0      0  2.07K     59      0      0
524us         196  5.19K    393  3.79K      0      0      0  4.29K    161      0      0
1ms            12  8.09K    674    458      0      0      1  7.37K     10      0      0
2ms            26  4.90K  3.86K     16      0      0      0  2.95K     33      0      0
4ms           191    282  11.4K     37      0      0      0    178    184      0      0
8ms           608    374  25.2K     83      0      0      0    299    485      0      0
16ms        1.39K    360  41.5K     79      0      0      1    263  1.16K      0      0
33ms        1.11K    239   307K     28      0      0      7    203  1.04K      0      0
67ms        2.51K      0  1.79K      0      2      0     18      0  2.79K      0      0
134ms        307K     64    638     50      1      0      0     14   313K      0      0
268ms       29.9K      6     50      6      0      0      0      0  23.7K      0      0
536ms       14.8K      0     20      0      0      0      0      0  14.1K      0      0
1s          10.1K      0     11      0      0      0      0      0  9.92K      0      0
2s          8.38K      0     11      0      0      0      0      0  8.28K      0      0
4s          11.0K      0      0      0      0      0      0      0  11.0K      0      0
8s          4.85K      0      0      0      0      0      0      0  4.82K      0      0
17s           974      0      0      0      0      0      0      0    968      0      0
34s             0      0      0      0      0      0      0      0      0      0      0
68s             0      0      0      0      0      0      0      0      0      0      0
137s            0      0      0      0      0      0      0      0      0      0      0
---------------------------------------------------------------------------------------
[jnyilas@photon:~] > zpool iostat -l 5
              capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim  rebuild
pool        alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait   wait
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
photon      45.2G   411G     26     14  2.43M   234K   14ms    1ms  202us  236us    5us  464us  365us    1ms   15ms    6ms      -
photon-dr    313G   615G     20      1  16.6M  15.1K  365ms    1ms   21ms  539us  477us  739ns   34ms    1ms  361ms      -      -
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
photon      45.2G   411G      0     22      0   534K      -    1ms      -  155us      -  336ns      -    1ms      -      -      -
photon-dr    313G   615G      0      0      0      0      -      -      -      -      -      -      -      -      -      -      -
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----

[jnyilas@photon:~] > zpool iostat -q
              capacity     operations     bandwidth    syncq_read    syncq_write   asyncq_read  asyncq_write   scrubq_read   trimq_write  rebuildq_write
pool        alloc   free   read  write   read  write   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
photon      45.2G   411G     26     14  2.41M   234K      0      0      0      0      0      0      0      0      0      0      0      0      0      0
photon-dr    313G   615G     20      1  16.4M  15.0K      0      0      0      0      0      0      0      0      0      0      0      0      0      0

Describe how to reproduce the problem

Enable autotrim on a pool backed by TRIM-capable vdevs, then look for uninterruptible ZFS kernel threads contributing to the load average increase.
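A minimal reproduction sketch (hypothetical device names; assumes a file-backed loop device, which honors discard on recent kernels):

$ truncate -s 1G /tmp/trimtest.img
$ sudo losetup -f --show /tmp/trimtest.img    # prints e.g. /dev/loop0
$ sudo zpool create trimtest /dev/loop0
$ sudo zpool set autotrim=on trimtest
$ ps -eo state,pid,comm | grep '^D'           # expect vdev_autotrim in D state
$ uptime                                      # load settles ~1 higher per TRIMable vdev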

Include any warning/errors/backtraces from the system logs

N/A

@jnyilas jnyilas added the Type: Defect Incorrect behavior (e.g. crash, hang) label Oct 25, 2023
@behlendorf behlendorf added the Type: Regression Indicates a functional regression label Oct 25, 2023
@stuartthebruce

stuartthebruce commented Oct 28, 2023

I have also seen this on an idle RL8.8 system with autotrim enabled, after upgrading from 2.1.12 to 2.2.0 and rebooting:

[root@repo2 ~]# uptime
 12:54:01 up 11 min,  1 user,  load average: 1.00, 0.93, 0.58

[root@repo2 ~]# zpool get autotrim repo2
NAME   PROPERTY  VALUE     SOURCE
repo2  autotrim  on        local

[root@repo2 ~]# ps aux | grep -w D\<
root        1264  0.0  0.0      0     0 ?        D<   12:42   0:00 [vdev_autotrim]

[root@repo2 ~]# cat /etc/redhat-release 
Rocky Linux release 8.8 (Green Obsidian)

[root@repo2 ~]# uname -a
Linux repo2 4.18.0-477.27.1.el8_8.x86_64 #1 SMP Wed Sep 20 15:55:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

[root@repo2 ~]# cat /sys/module/zfs/version 
2.2.0-1

In addition, disabling autotrim did not get rid of the vdev_autotrim kernel thread:

[root@repo2 ~]# zpool set autotrim=off repo2

[root@repo2 ~]# zpool get autotrim repo2
NAME   PROPERTY  VALUE     SOURCE
repo2  autotrim  off       default

[root@repo2 ~]# ps aux | grep -w D\<
root        1264  0.0  0.0      0     0 ?        D<   12:42   0:00 [vdev_autotrim]

However, rebooting does get rid of that kernel thread and restores the system to a load average below 1:

[root@repo2 ~]# uptime
 13:03:02 up 1 min,  2 users,  load average: 0.15, 0.12, 0.05

[root@repo2 ~]# ps aux | grep -w D\<
root        3276  0.0  0.0  12144  1100 pts/1    S+   13:03   0:00 grep --color=auto -w D<

[root@repo2 ~]# zpool get autotrim repo2
NAME   PROPERTY  VALUE     SOURCE
repo2  autotrim  off       default

This raises the question: is autotrim actually working in 2.2.0?
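One way to check whether trims are actually being issued (a sketch: zpool status -t reports per-vdev TRIM state, and zpool iostat -r includes trim columns in its request-size histograms):

[root@repo2 ~]# zpool status -t repo2
[root@repo2 ~]# zpool iostat -r repo2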

@stuartthebruce

Note that there are no extra lines in dbgmsg after enabling autotrim, which starts a D-state kernel thread:

[root@repo2 ~]# cat /proc/spl/kstat/zfs/dbgmsg
...
1698523656   spa_history.c:294:spa_history_log_sync(): command: zpool set autotrim=on repo2

[root@repo2 ~]# ps aux | grep -w D\<
root        7193  0.0  0.0      0     0 ?        D<   13:07   0:00 [vdev_autotrim]

@jnyilas
Author

jnyilas commented Oct 29, 2023

It seems to me that autotrim is working with v2.2.0, but differently than before. At least the request counts (-r) and latencies (-w) reported by zpool-iostat(8) show that the disk trim I/O is not blocking. Why the kernel thread appears to be waiting for an I/O to finish needs further investigation.

@jnyilas
Author

jnyilas commented Oct 30, 2023

All three have the same stack:

# cat /proc/880/stack 
[<0>] cv_wait_common+0x117/0x130 [spl]
[<0>] vdev_autotrim_wait_kick+0x63/0xb0 [zfs]
[<0>] vdev_autotrim_thread+0x6d8/0x7c0 [zfs]
[<0>] thread_generic_wrapper+0x59/0x70 [spl]
[<0>] kthread+0x156/0x180
[<0>] ret_from_fork+0x22/0x30
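For anyone collecting the same data, a small loop over the matching kernel threads (assumes pgrep; the thread comm is vdev_autotrim):

# for p in $(pgrep vdev_autotrim); do echo "== PID $p =="; cat /proc/$p/stack; done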

@behlendorf
Contributor

This looks like it's related to the change in #12194, which slightly adjusts how autotrim works. @jxdking, could you look into this?

@mailinglists35

mailinglists35 commented Nov 9, 2023

Getting the same behaviour after 2.1 -> 2.2 upgrade.

OL9, default UEK kernel

$ ps -axo state,pid,command | grep ^D
D   92094 [vdev_autotrim]
D   92095 [vdev_autotrim]

$ top -b 1 | head
top - 17:13:53 up 19:29,  3 users,  load average: 2.20, 2.16, 3.23
Tasks: 354 total,   1 running, 352 sleeping,   1 stopped,   0 zombie
%Cpu0  :  0.0 us,  6.2 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
GiB Mem :     15.6 total,      8.9 free,      3.5 used,      3.8 buff/cache
GiB Swap:     40.0 total,     39.8 free,      0.2 used.     12.1 avail Mem

$ systool -avm zfs|grep trim
    l2arc_trim_ahead    = "0"
    zfs_trim_extent_bytes_max= "134217728"
    zfs_trim_extent_bytes_min= "32768"
    zfs_trim_metaslab_skip= "0"
    zfs_trim_queue_limit= "10"
    zfs_trim_txg_batch  = "32"
    zfs_vdev_trim_max_active= "2"
    zfs_vdev_trim_min_active= "1"
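If systool isn't available, the same tunables can be read straight from sysfs:

$ grep -H . /sys/module/zfs/parameters/*trim*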

$ echo t |sudo tee /proc/sysrq-trigger
t

$ journalctl -kb --no-host
[...]
Nov 09 17:08:03 kernel:  </TASK>
Nov 09 17:08:03 kernel: task:vdev_autotrim   state:D stack:    0 pid:92094 ppid:     2 flags:0x00004000
Nov 09 17:08:03 kernel: Call Trace:
Nov 09 17:08:03 kernel:  <TASK>
Nov 09 17:08:03 kernel:  __schedule+0x224/0x588
Nov 09 17:08:03 kernel:  schedule+0x5f/0xdf
Nov 09 17:08:03 kernel:  cv_wait_common+0x106/0x140 [spl]
Nov 09 17:08:03 kernel:  ? finish_wait+0x90/0x89
Nov 09 17:08:03 kernel:  vdev_autotrim_thread+0x65e/0x950 [zfs]
Nov 09 17:08:03 kernel:  ? set_next_task_fair+0x30/0xdd
Nov 09 17:08:03 kernel:  ? vdev_trim_thread+0x460/0x460 [zfs]
Nov 09 17:08:03 kernel:  thread_generic_wrapper+0x59/0x70 [spl]
Nov 09 17:08:03 kernel:  ? spl_taskq_fini+0x80/0x80 [spl]
Nov 09 17:08:03 kernel:  kthread+0x127/0x144
Nov 09 17:08:03 kernel:  ? set_kthread_struct+0x60/0x52
Nov 09 17:08:03 kernel:  ret_from_fork+0x22/0x2d
Nov 09 17:08:03 kernel:  </TASK>
Nov 09 17:08:03 kernel: task:vdev_autotrim   state:D stack:    0 pid:92095 ppid:     2 flags:0x00004000
Nov 09 17:08:03 kernel: Call Trace:
Nov 09 17:08:03 kernel:  <TASK>
Nov 09 17:08:03 kernel:  __schedule+0x224/0x588
Nov 09 17:08:03 kernel:  schedule+0x5f/0xdf
Nov 09 17:08:03 kernel:  cv_wait_common+0x106/0x140 [spl]
Nov 09 17:08:03 kernel:  ? finish_wait+0x90/0x89
Nov 09 17:08:03 kernel:  vdev_autotrim_thread+0x65e/0x950 [zfs]
Nov 09 17:08:03 kernel:  ? set_next_task_fair+0x30/0xdd
Nov 09 17:08:03 kernel:  ? vdev_trim_thread+0x460/0x460 [zfs]
Nov 09 17:08:03 kernel:  thread_generic_wrapper+0x59/0x70 [spl]
Nov 09 17:08:03 kernel:  ? spl_taskq_fini+0x80/0x80 [spl]
Nov 09 17:08:03 kernel:  kthread+0x127/0x144
Nov 09 17:08:03 kernel:  ? set_kthread_struct+0x60/0x52
Nov 09 17:08:03 kernel:  ret_from_fork+0x22/0x2d
Nov 09 17:08:03 kernel:  </TASK>

@giacomocariello

Same here, NixOS 23.11, brand new.

@Gendra13

This looks like it's related to the change in #12194, which slightly adjusts how autotrim works. @jxdking, could you look into this?

@behlendorf Is there any new insight into this problem? Unfortunately, it doesn't look like jxdking has been active recently.

In my own tests, I was able to reproduce this behavior on Ubuntu 22.04 (with zfs-2.2.2 installed manually via DKMS) as well as on Ubuntu 23.10 and the pre-release version of 24.04, both of which ship with zfs-2.2.x built in.

It looks like the load is permanently increased by one for each pool that has autotrim set to on, even if there is no other activity on the drives.

@jnyilas
Author

jnyilas commented Dec 31, 2023

It looks like the load is permanently increased by one for each pool that has autotrim set to on, even if there is no other activity on the drives.

My interpretation of the data is different. The number of threads in uninterruptible sleep is actually the number of TRIMable underlying storage devices across all zpools. I have two zpools built on three NVMe vdevs. My load is +3, and I can see one autotrim thread in D state for each vdev.

In other words, the load will scale with the number of underlying storage devices capable of TRIM, so this is a bigger problem, not a smaller one.

@maxpoulin64

Also just noticed this on my system. I have two pools on SSDs with autotrim enabled, and my load will only ever go down to 2.0, matching the others' descriptions of one [vdev_autotrim] per TRIMable pool. The HDD-backed pool is not affected, since HDDs don't TRIM.

Fully updated Arch Linux, with ZFS compiled from the AUR (not archzfs).

I might try building with that commit reverted and see if it still does it.

@jnyilas
Author

jnyilas commented Jan 9, 2024

I have two pools on SSDs with autotrim enabled, and my load will only ever go down to 2.0, matching the others' descriptions of one [vdev_autotrim] per TRIMable pool. The HDD-backed pool is not affected, since HDDs don't TRIM.

No. The correct description is that you have two TRIMable vdevs in total across your zpools.

The number of zpools is not a factor. Each vdev participating in autotrim gets its own uninterruptible thread, and that is the deciding factor driving the load increase. This is why this is a bigger problem than it appears: the bigger and more complex your system is, the worse it gets. The load will scale with the number of vdevs participating in autotrim.

E.g., here is a single zpool consisting of 4 SSD vdevs, with autotrim enabled on the pool.
Guess what the load is when we have 4 threads in uninterruptible sleep?

# ps aux | grep D\[\<]
root     19937  0.0  0.0      0     0 ?        D<   23:48   0:00 [vdev_autotrim]
root     19938  0.0  0.0      0     0 ?        D<   23:48   0:00 [vdev_autotrim]
root     19939  0.0  0.0      0     0 ?        D<   23:48   0:00 [vdev_autotrim]
root     19940  0.0  0.0      0     0 ?        D<   23:48   0:00 [vdev_autotrim]

# zpool status trimtest
  pool: trimtest
 state: ONLINE
config:

	NAME        STATE     READ WRITE CKSUM
	trimtest    ONLINE       0     0     0
	  sda6      ONLINE       0     0     0
	  sda7      ONLINE       0     0     0
	  sda8      ONLINE       0     0     0
	  sda9      ONLINE       0     0     0

# w
 00:01:31 up 14:37,  1 user,  load average: 4.25, 3.92, 2.51 <<<<

QED
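A compact way to verify the vdev-to-thread correspondence (a sketch): the autotrim thread count should match the number of TRIMable leaf vdevs, 4 in this case.

# pgrep -c vdev_autotrim
# zpool list -v trimtest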

@jxdking
Contributor

jxdking commented Jan 16, 2024

Here is an attempt to fix the issue: #15781.

My test environment was torn down months ago in order to work on other projects, so I have no way to test it myself. Please give feedback on whether it fixes the issue. Thanks.

@Gendra13

Thank you very much.

I did a quick test in my test environment running Ubuntu 22.04 LTS and can confirm that the high load average is fixed and the uninterruptible threads are gone.

@behlendorf
Contributor

Fixed by #15781
