
Fix issues in sync thread (eBPF plugin) #15174

Merged
merged 6 commits into from Jun 19, 2023


@thiagoftsm (Contributor) commented Jun 10, 2023

Summary

Fixes #15103

This PR fixes issues found on Rocky Linux. More details will follow once testing confirms that other distributions are not affected.

Test Plan
  1. Compile the branch.
  2. Make sure the sync thread is enabled before running:
     $ cd /etc/netdata
     $ ./edit-config ebpf.d.conf
     and set:
     [ebpf programs]
         sync = yes
  3. Start netdata (the sync thread is enabled by default).
  4. Wait a few minutes and query https://localhost:19999/api/v1/data?chart=mem.file_sync&after=-30 . You should see data for this thread.
Additional Information

This PR was tested on:

Hardware    Linux distribution  Kernel                            File
Bare metal  Slackware Current   6.1.31                            slackware_6_1.txt
Vagrant     Arch Linux          6.3.6-arch1-1                     arck_6_3.txt
Vagrant     Ubuntu 22.04        5.15.0-69-generic                 ubuntu_5_15.txt
Vagrant     Oracle 8.6          5.15.0-101.103.2.1.el8uek.x86_64  oracle_5_15.txt
Vagrant     Oracle 9            5.14.0-284.11.1.el9_2.x86_64      oracle_5_14.txt
VMware      Rocky 9.2           5.14.0-284.11.1.el9_2.x86_64      rocky_5_14.txt
Vagrant     Alma 9              5.14.0-284.11.1.el9_2.x86_64      alma_5_14.txt
Vagrant     Corel 9             5.14.0-319.el9.x86_64             corel_5_14.txt
Vagrant     Debian 11           5.10.0-22-amd64                   debian_5_10.txt
Vagrant     Ubuntu 20.04        5.4.0-146-generic                 ubuntu_5_4.txt
QEMU        Slackware Current   5.4.210                           slackware_5_4.txt
Vagrant     Alma 8.6            4.18.0-477.13.1.el8_8.x86_64      alma_4_18.txt
QEMU        Slackware Current   4.14.290                          slackware_4_14.txt

You can get all logs using this link.

For users: How does this change affect me?

@thiagoftsm marked this pull request as draft June 10, 2023 00:30
@github-actions bot added the area/collectors (Everything related to data collection) and collectors/ebpf labels Jun 10, 2023
@thiagoftsm marked this pull request as ready for review June 12, 2023 01:45
@stelfrag (Collaborator) commented:

I applied this on top of #15146

On shutdown, I see:

2023-06-12 09:59:24: netdata INFO  : MAIN : SERVICE CONTROL: the following 1 service(s) [ COLLECTORS ] take too long to exit: 'PD[ebpf]' (413770); giving up on them...
2023-06-12 09:59:24: netdata INFO  : MAIN : NETDATA SHUTDOWN: in   10138 ms, (TIMEOUT) stop all remaining worker threads - next: cancel main threads

ebpf.plugin is still running:

root      413773  3.9  0.0 738304 49752 ?        SN   09:58   0:03 /home/stelios/netdata-journal/netdata/usr/libexec/netdata/plugins.d/ebpf.plugin 1
#0  futex_wait (private=0, expected=2, futex_word=0x56019edad860 <ebpf_exit_cleanup>) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=futex@entry=0x56019edad860 <ebpf_exit_cleanup>, private=0) at ./nptl/lowlevellock.c:49
#2  0x00007fe6c5a36082 in lll_mutex_lock_optimized (mutex=0x56019edad860 <ebpf_exit_cleanup>) at ./nptl/pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x56019edad860 <ebpf_exit_cleanup>) at ./nptl/pthread_mutex_lock.c:93
#4  0x000056019ebc94f6 in ebpf_stop_threads (sig=15) at collectors/ebpf.plugin/ebpf.c:730
#5  <signal handler called>
#6  0x00007fe6c5ab313b in __GI___close (fd=17) at ../sysdeps/unix/sysv/linux/close.c:27
#7  0x000056019ec6cf03 in bpf_link_perf_detach ()
#8  0x000056019ec6cae7 in bpf_link.destroy ()
#9  0x000056019ebc9001 in ebpf_unload_legacy_code (objects=0x7fe698000e20, probe_links=0x7fe698004640) at collectors/ebpf.plugin/ebpf.c:590
#10 0x000056019ebc9190 in ebpf_unload_unique_maps () at collectors/ebpf.plugin/ebpf.c:614
#11 0x000056019ebc96de in ebpf_stop_threads (sig=2) at collectors/ebpf.plugin/ebpf.c:771
#12 <signal handler called>
#13 0x00007fe6c5a83868 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7ffff9ddf950, rem=0x7ffff9ddf940) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
#14 0x00007fe6c5a886e7 in __GI___nanosleep (req=<optimized out>, rem=<optimized out>) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
#15 0x000056019ec1296d in sleep_usec_with_now (usec=999885, started_ut=1686553149451115) at libnetdata/clocks/clocks.c:365
#16 0x000056019ec12573 in heartbeat_next (hb=0x7ffff9ddfa10, tick=1000000) at libnetdata/clocks/clocks.c:320
#17 0x000056019ebcd767 in main (argc=2, argv=0x7ffff9ddfb48) at collectors/ebpf.plugin/ebpf.c:2650

@underhood (Contributor) left a comment


Assuming other PRs will fix the already reported issues.

@Dim-P (Contributor) left a comment


No issues found, tested on:

  • Debian 11
  • Ubuntu 22.04
  • Rocky Linux 9.2
  • Fedora 37

@thiagoftsm On a separate note, unrelated to this PR: I noticed that the fsync(2) and fdatasync(2) chart appears under Memory. Wouldn't Filesystem be more suitable?

@thiagoftsm reopened this Jun 19, 2023
@thiagoftsm merged commit 35884c7 into netdata:master Jun 19, 2023
250 of 252 checks passed
@thiagoftsm deleted the check_ebpf_syscall branch June 19, 2023 15:33
Labels: area/collectors (Everything related to data collection), collectors/ebpf
Development

Successfully merging this pull request may close these issues.

[Bug]: EBPF Plugin Kernel SEGFAULT
4 participants