Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CPU thread 100% #47

Closed
opalenet-sysadmin opened this issue May 24, 2022 · 11 comments · Fixed by #48
Closed

CPU thread 100% #47

opalenet-sysadmin opened this issue May 24, 2022 · 11 comments · Fixed by #48

Comments

@opalenet-sysadmin
Copy link

opalenet-sysadmin commented May 24, 2022

Hi,
Just updated the OS and restarted my Lemur Pro and the daemon uses 100% of one of the CPU threads.
I let it run 15 minutes then stopped the service.
Our company owns 5 identical machines, they all had the same issue with execsnoop-bpfcc

/usr/bin/system76-scheduler daemon
_ [execsnoop-bpfcc] < defunct >

System76 Lemur Pro/Lemur Pro, BIOS 2021-07-20_93c2809 07/20/2021

NAME="Pop!_OS"
VERSION="22.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 22.04 LTS"
VERSION_ID="22.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=jammy
UBUNTU_CODENAME=jammy
LOGO=distributor-logo-pop-os

@skrap
Copy link

skrap commented May 24, 2022

Same issue here. perf top on the spinning process yields:

Samples: 36K of event 'cycles', Event count (approx.): 36120382738
Overhead  Command          Shared Object       Symbol
  30.65%  system76-schedu  [kernel.kallsyms]   [k] syscall_exit_to_user_mode
  18.02%  system76-schedu  [kernel.kallsyms]   [k] syscall_return_via_sysret
  15.23%  system76-schedu  [kernel.kallsyms]   [k] __entry_text_start
   3.36%  system76-schedu  libc.so.6           [.] __read
   3.31%  system76-schedu  [kernel.kallsyms]   [k] __fget_light
   2.65%  system76-schedu  [kernel.kallsyms]   [k] apparmor_file_permission
   2.33%  system76-schedu  [kernel.kallsyms]   [k] pipe_read
   1.89%  system76-schedu  [kernel.kallsyms]   [k] fput_many
   1.54%  system76-schedu  [kernel.kallsyms]   [k] mutex_lock
   1.45%  system76-schedu  [kernel.kallsyms]   [k] exit_to_user_mode_prepare
   1.39%  system76-schedu  [kernel.kallsyms]   [k] entry_SYSCALL_64_safe_stack
   1.36%  system76-schedu  [kernel.kallsyms]   [k] new_sync_read
   1.36%  system76-schedu  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
   1.25%  system76-schedu  [kernel.kallsyms]   [k] mutex_unlock
   0.99%  system76-schedu  [kernel.kallsyms]   [k] ksys_read
   0.97%  system76-schedu  [kernel.kallsyms]   [k] vfs_read
   0.86%  system76-schedu  [kernel.kallsyms]   [k] do_syscall_64
   0.75%  system76-schedu  system76-scheduler  [.] 0x00000000000c10f8
   0.58%  system76-schedu  [kernel.kallsyms]   [k] aa_file_perm
   0.57%  system76-schedu  [kernel.kallsyms]   [k] __x64_sys_read
   0.56%  system76-schedu  [kernel.kallsyms]   [k] __rcu_read_lock
   0.51%  system76-schedu  libc.so.6           [.] __pthread_disable_asynccancel
   0.46%  system76-schedu  [kernel.kallsyms]   [k] security_file_permission
   0.45%  system76-schedu  [kernel.kallsyms]   [k] fsnotify_perm.part.0
   0.42%  system76-schedu  [kernel.kallsyms]   [k] syscall_enter_from_user_mode
   0.39%  system76-schedu  [kernel.kallsyms]   [k] __static_call_text_end
   0.33%  system76-schedu  [kernel.kallsyms]   [k] __cond_resched
   0.30%  system76-schedu  [kernel.kallsyms]   [k] __fdget_pos
   0.29%  system76-schedu  [kernel.kallsyms]   [k] fpregs_assert_state_consistent
   0.27%  system76-schedu  [kernel.kallsyms]   [k] rw_verify_area
   0.27%  system76-schedu  [kernel.kallsyms]   [k] __rcu_read_unlock
   0.20%  system76-schedu  libc.so.6           [.] __memmove_avx_unaligned_erms
   0.16%  system76-schedu  [kernel.kallsyms]   [k] fput
   0.16%  system76-schedu  libc.so.6           [.] __pthread_enable_asynccancel
   0.14%  system76-schedu  [kernel.kallsyms]   [k] kill_fasync
   0.14%  system76-schedu  [kernel.kallsyms]   [k] iov_iter_init
   0.11%  system76-schedu  system76-scheduler  [.] 0x00000000000c10fe
   0.10%  system76-schedu  libc.so.6           [.] __memchr_avx2
   0.09%  system76-schedu  system76-scheduler  [.] 0x000000000012eef5
   0.09%  system76-schedu  system76-scheduler  [.] 0x00000000000c1379
   0.08%  system76-schedu  system76-scheduler  [.] 0x00000000000c110c
   0.08%  system76-schedu  system76-scheduler  [.] 0x00000000000c1183
   0.08%  system76-schedu  system76-scheduler  [.] 0x0000000000438b93
   0.08%  system76-schedu  system76-scheduler  [.] 0x00000000000c12f7
   0.07%  system76-schedu  system76-scheduler  [.] 0x00000000000c11a0
   0.07%  system76-schedu  system76-scheduler  [.] 0x00000000000c12f1

@mmstick
Copy link
Member

mmstick commented May 24, 2022

Try installing #48

@opalenet-sysadmin
Copy link
Author

Hi,
I compiled branch execsnoop with rust (cargo build --release) and now the service runs. No more high load.
Yet, the subprocess is still defunc and a zombie.

84083 ? SNsl 0:00 /usr/bin/system76-scheduler daemon
84086 ? ZN 0:01 \ _ [execsnoop-bpfcc] < defunct>

@mmstick
Copy link
Member

mmstick commented May 24, 2022

Check the latest change. You may also want to see why execsnoop-bpfcc is failing on this system.

@opalenet-sysadmin
Copy link
Author

No more defunct subprocess, but indeed execsnoop-bpfcc fails to start.

`● com.system76.Scheduler.service - Automatically configure CPU scheduler for responsiveness on AC
Loaded: loaded (/lib/systemd/system/com.system76.Scheduler.service; disabled; vendor preset: enabled)
Active: active (running) since Tue 2022-05-24 13:27:59 EDT; 25s ago
Main PID: 91883 (system76-schedu)
Tasks: 3 (limit: 149999)
Memory: 1.6M
CPU: 1.662s
CGroup: /system.slice/com.system76.Scheduler.service
└─91883 /usr/bin/system76-scheduler daemon

May 24 13:27:59 fbriand.opale.net systemd[1]: Starting Automatically configure CPU scheduler for responsiveness on AC...
May 24 13:27:59 fbriand.opale.net system76-scheduler[91883]: INFO system76_scheduler: starting daemon service
May 24 13:27:59 fbriand.opale.net systemd[1]: Started Automatically configure CPU scheduler for responsiveness on AC.
May 24 13:27:59 fbriand.opale.net system76-scheduler[91883]: INFO system76_scheduler::config: /usr/share/system76-scheduler/assignments/default.ron:
May 24 13:27:59 fbriand.opale.net system76-scheduler[91883]: (Assignment(CpuPriority(-9), BestEffort(PriorityLevel(0))), {"easyeffects"})
May 24 13:27:59 fbriand.opale.net system76-scheduler[91883]: (Assignment(CpuPriority(-5), BestEffort(PriorityLevel(4))), {"Xorg", "amsynth", "gnome-shell", "kwin", "steam", "sway", >
May 24 13:27:59 fbriand.opale.net system76-scheduler[91883]: (Assignment(CpuPriority(0), Standard), {"dbus", "sshd", "systemd"})
May 24 13:27:59 fbriand.opale.net system76-scheduler[91883]: (Assignment(CpuPriority(9), Idle), {"ModemManager", "NetworkManager", "accounts-daemon", "acpid", "automount", "avahi-da>
May 24 13:27:59 fbriand.opale.net system76-scheduler[91883]: (Assignment(CpuPriority(19), Idle), {"FAHClient", "FAHCoreWrapper", "apt", "apt-get", "boinc", "c++", "cargo", "clang", >
 `

` execsnoop-bpfcc

In file included from :2:
In file included from /virtual/include/bcc/bpf.h:12:
In file included from include/linux/types.h:6:
In file included from include/uapi/linux/types.h:14:
In file included from include/uapi/linux/posix_types.h:5:
In file included from include/linux/stddef.h:5:
In file included from include/uapi/linux/stddef.h:5:
In file included from include/linux/compiler_types.h:80:
include/linux/compiler-clang.h:41:9: warning: 'HAVE_BUILTIN_BSWAP32' macro redefined [-Wmacro-redefined]
#define HAVE_BUILTIN_BSWAP32
^
:4:9: note: previous definition is here
#define HAVE_BUILTIN_BSWAP32 1
^
In file included from :2:
In file included from /virtual/include/bcc/bpf.h:12:
In file included from include/linux/types.h:6:
In file included from include/uapi/linux/types.h:14:
In file included from include/uapi/linux/posix_types.h:5:
In file included from include/linux/stddef.h:5:
In file included from include/uapi/linux/stddef.h:5:
In file included from include/linux/compiler_types.h:80:
include/linux/compiler-clang.h:42:9: warning: 'HAVE_BUILTIN_BSWAP64' macro redefined [-Wmacro-redefined]
#define HAVE_BUILTIN_BSWAP64
^
:5:9: note: previous definition is here
#define HAVE_BUILTIN_BSWAP64 1
^
In file included from :2:
In file included from /virtual/include/bcc/bpf.h:12:
In file included from include/linux/types.h:6:
In file included from include/uapi/linux/types.h:14:
In file included from include/uapi/linux/posix_types.h:5:
In file included from include/linux/stddef.h:5:
In file included from include/uapi/linux/stddef.h:5:
In file included from include/linux/compiler_types.h:80:
include/linux/compiler-clang.h:43:9: warning: 'HAVE_BUILTIN_BSWAP16' macro redefined [-Wmacro-redefined]
#define HAVE_BUILTIN_BSWAP16
^
:3:9: note: previous definition is here
#define HAVE_BUILTIN_BSWAP16 1
^
3 warnings generated.
cannot attach kprobe, probe entry may not exist
Traceback (most recent call last):
File "/usr/sbin/execsnoop-bpfcc", line 229, in
b.attach_kprobe(event=execve_fnname, fn_name="syscall__execve")
File "/usr/lib/python3/dist-packages/bcc/init.py", line 683, in attach_kprobe
raise Exception("Failed to attach BPF program %s to kprobe %s" %
Exception: Failed to attach BPF program b'syscall__execve' to kprobe b'sys_execve'`

@mmstick
Copy link
Member

mmstick commented May 24, 2022

Would you happen to be using a non-default kernel?

@opalenet-sysadmin
Copy link
Author

Linux fbriand.opale.net 5.17.5-76051705-generic #202204271406165150484022.04~63e51bd SMP PREEMPT Mon May 2 15: x86_64 x86_64 x86_64 GNU/Linux

@cmm
Copy link
Contributor

cmm commented Apr 29, 2023

Sorry about necro-bumping, but with 2.0.0 I'm seeing periodic CPU spikes of the scheduler when run in execsnoop mode. According to (my reading of) strace, there is nothing "wrong", the daemon is just choking on the volume of new execs. This seems to happen when certain shell scripts run (there's a lot of short-lived process creation going on there, like math done with ex and that sort of mildly-stupid but totally legitimate stuff).

The spikes wear off by themselves, but while they happen the daemon is non-responsive enough that dbus calls from the desktop to it time out.

And regardless of those spikes, the daemon in execsnoop mode just consumes illogical amounts of CPU, like 5 minutes an hour, according to Systemd. :(

I've turned execsnoop off now, and things are much better. This is on an i5 lemp11 BTW, which is not exactly puny resource-wise.

I wonder if it would be more efficient to use execsnoop as just a trigger for the much more efficient process list refresh -- like refresh as soon as there were N (configurable) or more new execsnoop notifications, in addition to doing that every "refresh-rate" seconds.

(BTW "refresh-rate" should be "refresh-interval"...)

@mmstick
Copy link
Member

mmstick commented Apr 30, 2023

@cmm A process list refresh should be more time-consuming, and execsnoop's output is required to identify when a process was exec'd to detect if it was launched from nice or ionice.

I did some profiling with cargo-flamegraph and heaptrack after enabling debug symbols and lowering the opt-level to 1, and saw that most of the time was being spent in the get_pid function, and in some heap allocations that could be reused. I'll have some commits for this in the scheduler branch soon.

@mmstick
Copy link
Member

mmstick commented Apr 30, 2023

Changes have been pushed. Try out the scheduler branch.

@cmm
Copy link
Contributor

cmm commented Apr 30, 2023

indeed, CPU spikes are gone and overall CPU consumption looks sane, thanks!
it seems that the pipewire priority boosting logic broke, though?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants