[Bug]: Running netdata increases oom_kill #17013

@dev-zero

Bug description

Standard installation of Netdata v1.44.3 from the provided repo, with streaming:

/etc/netdata # cat stream.conf

[stream]
enabled = yes
destination = ...
api key = ...

on 3x x86_64 nodes (one of them being the parent) and 4x ARM nodes.
All of the systems are booted with systemd.unified_cgroup_hierarchy=1 (cgroup v2).

The oom_kill counter in /proc/vmstat increases by ~2000 every couple of minutes on the two x86_64 child nodes (with the corresponding warning in the dashboard).

Stopping netdata on the child nodes stops oom_kill from increasing (verified with watch -g grep oom_kill /proc/vmstat).
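
To put a number on the increase rather than only detecting a change with watch -g, the counter can be sampled at a fixed interval. The following is a minimal sketch, not part of netdata; the function names (read_vmstat_counter, counter_rate, watch) and the 10-second default interval are illustrative assumptions.

```python
import time


def read_vmstat_counter(text, name="oom_kill"):
    """Return one counter's value from /proc/vmstat-style 'name value' lines."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == name:
            return int(parts[1])
    raise KeyError(name)


def counter_rate(before, after, seconds):
    """Average increase per second between two samples."""
    return (after - before) / seconds


def watch(path="/proc/vmstat", interval=10.0):
    """Print the oom_kill value and its delta every `interval` seconds."""
    with open(path) as f:
        previous = read_vmstat_counter(f.read())
    while True:  # runs until interrupted with Ctrl-C
        time.sleep(interval)
        with open(path) as f:
            current = read_vmstat_counter(f.read())
        print(f"oom_kill={current} delta={current - previous}")
        previous = current
```

Calling watch() once with netdata running and once with it stopped would give comparable per-interval deltas for both cases.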

Unfortunately, tracing oom_kill_process does not produce any output, nor does dmesg list any killed process.
The only evidence I have that netdata itself is causing oom_kill to increase is that the counter stops increasing when I stop it.

The only thing I could trace is do_send_sig_info, which then lists the go.d plugin or swapper/... as the process name.
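
Since dmesg shows nothing, one further way to narrow this down on cgroup v2 might be the per-cgroup memory.events files, which carry their own oom_kill field. The sketch below walks the hierarchy and lists cgroups with a non-zero value. It assumes the unified hierarchy is mounted at /sys/fs/cgroup and that the memory controller is enabled; the function names are illustrative, and whether the increments seen in /proc/vmstat show up in these files on the affected nodes has not been verified.

```python
import os


def parse_memory_events(text):
    """Parse 'key value' lines from a cgroup v2 memory.events file."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            events[parts[0]] = int(parts[1])
    return events


def cgroups_with_oom_kills(root="/sys/fs/cgroup"):
    """Map cgroup path (relative to root) to its non-zero oom_kill count."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if "memory.events" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "memory.events")) as f:
                count = parse_memory_events(f.read()).get("oom_kill", 0)
        except OSError:
            continue  # cgroup disappeared or is not readable
        if count > 0:
            hits[os.path.relpath(dirpath, root)] = count
    return hits
```

By default memory.events is hierarchical, so ancestors of an affected cgroup would normally report the same kills; the deepest path in the result is the most specific hint.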

Expected behavior

oom_kill not increasing just because I am running netdata.

Steps to reproduce

  1. Install netdata v1.44.3 from the binary packages on openSUSE Leap 15.5 with cgroup v2 and configure stream.conf
  2. Wait for the dashboard to show the warning and/or watch /proc/vmstat

Installation method

manual setup of official DEB/RPM packages

System info

Linux infra1 5.14.21-150500.55.44-default #1 SMP PREEMPT_DYNAMIC Mon Jan 15 10:03:40 UTC 2024 (cc7d8b6) x86_64 x86_64 x86_64 GNU/Linux
/etc/os-release:NAME="openSUSE Leap"
/etc/os-release:VERSION="15.5"
/etc/os-release:ID="opensuse-leap"
/etc/os-release:ID_LIKE="suse opensuse"
/etc/os-release:VERSION_ID="15.5"
/etc/os-release:PRETTY_NAME="openSUSE Leap 15.5"
/etc/os-release:ANSI_COLOR="0;32"
/etc/os-release:CPE_NAME="cpe:/o:opensuse:leap:15.5"
/etc/os-release:LOGO="distributor-logo-Leap"

Netdata build info

Packaging:
    Netdata Version ____________________________________________ : v1.44.3
    Installation Type __________________________________________ : binpkg-rpm
    Package Architecture _______________________________________ : x86_64
    Package Distro _____________________________________________ :
    Configure Options __________________________________________ :  '--host=x86_64-suse-linux-gnu' '--build=x86_64-suse-linux-gnu' '--program-prefix=' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--datadir=/usr/share' '--includedir=/usr/include' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=/usr/libexec' '--libdir=/usr/lib' '--with-zlib' '--with-math' '--with-user=netdata' '--disable-dependency-tracking' 'build_alias=x86_64-suse-linux-gnu' 'host_alias=x86_64-suse-linux-gnu' 'CFLAGS=-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables' 'CXXFLAGS=-O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables' 'PKG_CONFIG_PATH=:/usr/lib/pkgconfig:/usr/share/pkgconfig'
Default Directories:
    User Configurations ________________________________________ : /etc/netdata
    Stock Configurations _______________________________________ : /usr/lib/netdata/conf.d
    Ephemeral Databases (metrics data, metadata) _______________ : /var/cache/netdata
    Permanent Databases ________________________________________ : /var/lib/netdata
    Plugins ____________________________________________________ : /usr/libexec/netdata/plugins.d
    Static Web Files ___________________________________________ : /usr/share/netdata/web
    Log Files __________________________________________________ : /var/log/netdata
    Lock Files _________________________________________________ : /var/lib/netdata/lock
    Home _______________________________________________________ : /var/lib/netdata
Operating System:
    Kernel _____________________________________________________ : Linux
    Kernel Version _____________________________________________ : 5.14.21-150500.55.44-default
    Operating System ___________________________________________ : openSUSE Leap
    Operating System ID ________________________________________ : opensuse-leap
    Operating System ID Like ___________________________________ : suse opensuse
    Operating System Version ___________________________________ : 15.5
    Operating System Version ID ________________________________ : none
    Detection __________________________________________________ : /etc/os-release
Hardware:
    CPU Cores __________________________________________________ : 32
    CPU Frequency ______________________________________________ : 3000000000
    RAM Bytes __________________________________________________ : 540754960384
    Disk Capacity ______________________________________________ : 1440301105152
    CPU Architecture ___________________________________________ : x86_64
    Virtualization Technology __________________________________ : none
    Virtualization Detection ___________________________________ : systemd-detect-virt
Container:
    Container __________________________________________________ : none
    Container Detection ________________________________________ : systemd-detect-virt
    Container Orchestrator _____________________________________ : none
    Container Operating System _________________________________ : none
    Container Operating System ID ______________________________ : none
    Container Operating System ID Like _________________________ : none
    Container Operating System Version _________________________ : none
    Container Operating System Version ID ______________________ : none
    Container Operating System Detection _______________________ : none
Features:
    Built For __________________________________________________ : Linux
    Netdata Cloud ______________________________________________ : YES
    Health (trigger alerts and send notifications) _____________ : YES
    Streaming (stream metrics to parent Netdata servers) _______ : YES
    Back-filling (of higher database tiers) ____________________ : YES
    Replication (fill the gaps of parent Netdata servers) ______ : YES
    Streaming and Replication Compression ______________________ : YES (zstd lz4 gzip)
    Contexts (index all active and archived metrics) ___________ : YES
    Tiering (multiple dbs with different metrics resolution) ___ : YES (5)
    Machine Learning ___________________________________________ : YES
Database Engines:
    dbengine ___________________________________________________ : YES
    alloc ______________________________________________________ : YES
    ram ________________________________________________________ : YES
    map ________________________________________________________ : YES
    save _______________________________________________________ : YES
    none _______________________________________________________ : YES
Connectivity Capabilities:
    ACLK (Agent-Cloud Link: MQTT over WebSockets over TLS) _____ : YES
    static (Netdata internal web server) _______________________ : YES
    h2o (web server) ___________________________________________ : YES
    WebRTC (experimental) ______________________________________ : NO
    Native HTTPS (TLS Support) _________________________________ : YES
    TLS Host Verification ______________________________________ : YES
Libraries:
    LZ4 (extremely fast lossless compression algorithm) ________ : YES
    ZSTD (fast, lossless compression algorithm) ________________ : YES
    zlib (lossless data-compression library) ___________________ : YES
    Judy (high-performance dynamic arrays and hashtables) ______ : YES (bundled)
    dlib (robust machine learning toolkit) _____________________ : YES (bundled)
    protobuf (platform-neutral data serialization protocol) ____ : YES (system)
    OpenSSL (cryptography) _____________________________________ : YES
    libdatachannel (stand-alone WebRTC data channels) __________ : NO
    JSON-C (lightweight JSON manipulation) _____________________ : YES
    libcap (Linux capabilities system operations) ______________ : NO
    libcrypto (cryptographic functions) ________________________ : YES
    libm (mathematical functions) ______________________________ : YES
    jemalloc ___________________________________________________ : NO
    TCMalloc ___________________________________________________ : NO
Plugins:
    apps (monitor processes) ___________________________________ : YES
    cgroups (monitor containers and VMs) _______________________ : YES
    cgroup-network (associate interfaces to CGROUPS) ___________ : YES
    proc (monitor Linux systems) _______________________________ : YES
    tc (monitor Linux network QoS) _____________________________ : YES
    diskspace (monitor Linux mount points) _____________________ : YES
    freebsd (monitor FreeBSD systems) __________________________ : NO
    macos (monitor MacOS systems) ______________________________ : NO
    statsd (collect custom application metrics) ________________ : YES
    timex (check system clock synchronization) _________________ : YES
    idlejitter (check system latency and jitter) _______________ : YES
    bash (support shell data collection jobs - charts.d) _______ : YES
    debugfs (kernel debugging metrics) _________________________ : YES
    cups (monitor printers and print jobs) _____________________ : YES
    ebpf (monitor system calls) ________________________________ : YES
    freeipmi (monitor enterprise server H/W) ___________________ : YES
    nfacct (gather netfilter accounting) _______________________ : YES
    perf (collect kernel performance events) ___________________ : YES
    slabinfo (monitor kernel object caching) ___________________ : YES
    Xen ________________________________________________________ : NO
    Xen VBD Error Tracking _____________________________________ : NO
    Logs Management ____________________________________________ : YES
Exporters:
    AWS Kinesis ________________________________________________ : NO
    GCP PubSub _________________________________________________ : NO
    MongoDB ____________________________________________________ : NO
    Prometheus (OpenMetrics) Exporter __________________________ : YES
    Prometheus Remote Write ____________________________________ : YES
    Graphite ___________________________________________________ : YES
    Graphite HTTP / HTTPS ______________________________________ : YES
    JSON _______________________________________________________ : YES
    JSON HTTP / HTTPS __________________________________________ : YES
    OpenTSDB ___________________________________________________ : YES
    OpenTSDB HTTP / HTTPS ______________________________________ : YES
    All Metrics API ____________________________________________ : YES
    Shell (use metrics in shell scripts) _______________________ : YES
Debug/Developer Features:
    Trace All Netdata Allocations (with charts) ________________ : NO
    Developer Mode (more runtime checks, slower) _______________ : NO

Additional info

No response
