[Bug]: Netdata service restarting randomly in one of the server #16871

Closed
karangaj opened this issue Jan 29, 2024 · 6 comments

Comments

@karangaj

Bug description

The Netdata service on one of my servers restarts at random. Each time it does, I get an "unreachable" alert for about a minute, even though the server is online.

Expected behavior

The Netdata service restarts randomly: I get an alert that the server is unreachable, and about a minute later it shows as reachable again. I found some error logs that recur around each restart (before, during, and after it), but I am not sure what the issue is.
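One way to check whether the roughly one-minute "unreachable" window matches an actual agent restart is to compare the timestamp of the `NETDATA SHUTDOWN: completed` line with the next `Netdata agent version ... is starting` line. Below is a minimal Python sketch of that, assuming the logs are in journalctl's default short output format (which omits the year, so one is passed in). `restart_gaps` and `parse_stamp` are hypothetical helper names, not part of Netdata.

```python
import re
from datetime import datetime

# Assumed format: journalctl's default short output, e.g.
# "Jan 26 02:45:29 n2 netdata[810983]: <message>".
# Only lines from processes named `netdata` match; plugin lines are skipped.
LINE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ netdata\[\d+\]: (.*)$")


def parse_stamp(stamp: str, year: int) -> datetime:
    # Hypothetical helper. The short format has no year, so the caller supplies one.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def restart_gaps(lines, year=2024):
    """Hypothetical helper: (shutdown_done, next_start, gap) for each shutdown followed by a start."""
    gaps = []
    shutdown_done = None
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        stamp, msg = parse_stamp(m.group(1), year), m.group(2)
        if msg.startswith("NETDATA SHUTDOWN: completed"):
            shutdown_done = stamp
        elif msg.startswith("Netdata agent version") and msg.endswith("is starting"):
            if shutdown_done is not None:
                gaps.append((shutdown_done, stamp, stamp - shutdown_done))
                shutdown_done = None
    return gaps


if __name__ == "__main__":
    sample = [
        "Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: completed in 22026 ms - netdata is now exiting - bye bye...",
        'Jan 26 02:45:59 n2 netdata[1440077]: Netdata agent version "v1.44.1" is starting',
    ]
    for done, start, gap in restart_gaps(sample):
        print(f"stopped {done:%H:%M:%S}, started {start:%H:%M:%S}, down for {gap}")
```

Fed the two relevant lines from the logs in this report (shutdown completed at 02:45:29, next start at 02:45:59), it finds a 30-second gap. The shutdown itself took 22026 ms, so the agent was down for roughly 52 seconds in total, which is consistent with the one-minute alert.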

Steps to reproduce

The restarts happen unexpectedly, so I have no steps to reproduce them.

Installation method

kickstart.sh

System info

root@n2:~# uname -a; grep -HvE "^#|URL" /etc/*release
Linux n2 6.2.16-3-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z) x86_64 GNU/Linux
/etc/os-release:PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
/etc/os-release:NAME="Debian GNU/Linux"
/etc/os-release:VERSION_ID="12"
/etc/os-release:VERSION="12 (bookworm)"
/etc/os-release:VERSION_CODENAME=bookworm
/etc/os-release:ID=debian

Netdata build info

root@n2:~# netdata -W buildinfo
Packaging:
    Netdata Version ____________________________________________ : v1.44.1
    Installation Type __________________________________________ : binpkg-deb
    Package Architecture _______________________________________ : x86_64
    Package Distro _____________________________________________ :
    Configure Options __________________________________________ :  '--build=x86_64-linux-gnu' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libdir=/usr/lib' '--libexecdir=/usr/libexec' '--with-user=netdata' '--with-math' '--with-zlib' '--with-webdir=/var/lib/netdata/www' '--disable-dependency-tracking' 'build_alias=x86_64-linux-gnu' 'CFLAGS=-g -O2 -ffile-prefix-map=/usr/src/netdata=. -fstack-protector-strong -Wformat -Werror=format-security' 'LDFLAGS=-Wl,-z,relro' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O2 -ffile-prefix-map=/usr/src/netdata=. -fstack-protector-strong -Wformat -Werror=format-security'
Default Directories:
    User Configurations ________________________________________ : /etc/netdata
    Stock Configurations _______________________________________ : /usr/lib/netdata/conf.d
    Ephemeral Databases (metrics data, metadata) _______________ : /var/cache/netdata
    Permanent Databases ________________________________________ : /var/lib/netdata
    Plugins ____________________________________________________ : /usr/libexec/netdata/plugins.d
    Static Web Files ___________________________________________ : /var/lib/netdata/www
    Log Files __________________________________________________ : /var/log/netdata
    Lock Files _________________________________________________ : /var/lib/netdata/lock
    Home _______________________________________________________ : /var/lib/netdata
Operating System:
    Kernel _____________________________________________________ : Linux
    Kernel Version _____________________________________________ : 6.2.16-3-pve
    Operating System ___________________________________________ : Debian GNU/Linux
    Operating System ID ________________________________________ : debian
    Operating System ID Like ___________________________________ : unknown
    Operating System Version ___________________________________ : 12 (bookworm)
    Operating System Version ID ________________________________ : none
    Detection __________________________________________________ : /etc/os-release
Hardware:
    CPU Cores __________________________________________________ : 12
    CPU Frequency ______________________________________________ : 3201000000
    RAM Bytes __________________________________________________ : 67306582016
    Disk Capacity ______________________________________________ : 5761150230528
    CPU Architecture ___________________________________________ : x86_64
    Virtualization Technology __________________________________ : none
    Virtualization Detection ___________________________________ : systemd-detect-virt
Container:
    Container __________________________________________________ : none
    Container Detection ________________________________________ : systemd-detect-virt
    Container Orchestrator _____________________________________ : none
    Container Operating System _________________________________ : none
    Container Operating System ID ______________________________ : none
    Container Operating System ID Like _________________________ : none
    Container Operating System Version _________________________ : none
    Container Operating System Version ID ______________________ : none
    Container Operating System Detection _______________________ : none
Features:
    Built For __________________________________________________ : Linux
    Netdata Cloud ______________________________________________ : YES
    Health (trigger alerts and send notifications) _____________ : YES
    Streaming (stream metrics to parent Netdata servers) _______ : YES
    Back-filling (of higher database tiers) ____________________ : YES
    Replication (fill the gaps of parent Netdata servers) ______ : YES
    Streaming and Replication Compression ______________________ : YES (zstd lz4 gzip)
    Contexts (index all active and archived metrics) ___________ : YES
    Tiering (multiple dbs with different metrics resolution) ___ : YES (5)
    Machine Learning ___________________________________________ : YES
Database Engines:
    dbengine ___________________________________________________ : YES
    alloc ______________________________________________________ : YES
    ram ________________________________________________________ : YES
    map ________________________________________________________ : YES
    save _______________________________________________________ : YES
    none _______________________________________________________ : YES
Connectivity Capabilities:
    ACLK (Agent-Cloud Link: MQTT over WebSockets over TLS) _____ : YES
    static (Netdata internal web server) _______________________ : YES
    h2o (web server) ___________________________________________ : YES
    WebRTC (experimental) ______________________________________ : NO
    Native HTTPS (TLS Support) _________________________________ : YES
    TLS Host Verification ______________________________________ : YES
Libraries:
    LZ4 (extremely fast lossless compression algorithm) ________ : YES
    ZSTD (fast, lossless compression algorithm) ________________ : YES
    zlib (lossless data-compression library) ___________________ : YES
    Judy (high-performance dynamic arrays and hashtables) ______ : YES (bundled)
    dlib (robust machine learning toolkit) _____________________ : YES (bundled)
    protobuf (platform-neutral data serialization protocol) ____ : YES (system)
    OpenSSL (cryptography) _____________________________________ : YES
    libdatachannel (stand-alone WebRTC data channels) __________ : NO
    JSON-C (lightweight JSON manipulation) _____________________ : YES
    libcap (Linux capabilities system operations) ______________ : NO
    libcrypto (cryptographic functions) ________________________ : YES
    libm (mathematical functions) ______________________________ : YES
    jemalloc ___________________________________________________ : NO
    TCMalloc ___________________________________________________ : NO
Plugins:
    apps (monitor processes) ___________________________________ : YES
    cgroups (monitor containers and VMs) _______________________ : YES
    cgroup-network (associate interfaces to CGROUPS) ___________ : YES
    proc (monitor Linux systems) _______________________________ : YES
    tc (monitor Linux network QoS) _____________________________ : YES
    diskspace (monitor Linux mount points) _____________________ : YES
    freebsd (monitor FreeBSD systems) __________________________ : NO
    macos (monitor MacOS systems) ______________________________ : NO
    statsd (collect custom application metrics) ________________ : YES
    timex (check system clock synchronization) _________________ : YES
    idlejitter (check system latency and jitter) _______________ : YES
    bash (support shell data collection jobs - charts.d) _______ : YES
    debugfs (kernel debugging metrics) _________________________ : YES
    cups (monitor printers and print jobs) _____________________ : YES
    ebpf (monitor system calls) ________________________________ : YES
    freeipmi (monitor enterprise server H/W) ___________________ : YES
    nfacct (gather netfilter accounting) _______________________ : YES
    perf (collect kernel performance events) ___________________ : YES
    slabinfo (monitor kernel object caching) ___________________ : YES
    Xen ________________________________________________________ : NO
    Xen VBD Error Tracking _____________________________________ : NO
    Logs Management ____________________________________________ : YES
Exporters:
    AWS Kinesis ________________________________________________ : NO
    GCP PubSub _________________________________________________ : NO
    MongoDB ____________________________________________________ : YES
    Prometheus (OpenMetrics) Exporter __________________________ : YES
    Prometheus Remote Write ____________________________________ : YES
    Graphite ___________________________________________________ : YES
    Graphite HTTP / HTTPS ______________________________________ : YES
    JSON _______________________________________________________ : YES
    JSON HTTP / HTTPS __________________________________________ : YES
    OpenTSDB ___________________________________________________ : YES
    OpenTSDB HTTP / HTTPS ______________________________________ : YES
    All Metrics API ____________________________________________ : YES
    Shell (use metrics in shell scripts) _______________________ : YES
Debug/Developer Features:
    Trace All Netdata Allocations (with charts) ________________ : NO
    Developer Mode (more runtime checks, slower) _______________ : NO

Additional info

root@n2:~# journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata
Jan 26 02:45:59 n2 netdata[1440077]: Netdata agent version "v1.44.1" is starting
Jan 26 02:45:59 n2 netdata[1440077]: IEEE754: system is using IEEE754 DOUBLE PRECISION values
Jan 26 02:45:59 n2 netdata[1440077]: TIMEZONE: using the contents of /etc/timezone
Jan 26 02:45:59 n2 netdata[1440077]: TIMEZONE: fixed as 'Asia/Bangkok'
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: next: initialize ML
Jan 26 02:45:59 n2 netdata[1440077]: ml database version is 2 (no migration needed)
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 3 ms, initialize ML - next: initialize signals
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 0 ms, initialize signals - next: initialize static threads
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 0 ms, initialize static threads - next: initialize web server
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 0 ms, initialize web server - next: initialize h2o server
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 0 ms, initialize h2o server - next: set resource limits
Jan 26 02:45:59 n2 netdata[1440077]: resources control: allowed file descriptors: soft = 1024, max = 524288
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 0 ms, set resource limits - next: become daemon
Jan 26 02:45:59 n2 netdata[1440077]: Failed to open pidfile '/var/run/netdata/netdata.pid'.
Jan 26 02:45:59 n2 netdata[1440077]: Out-Of-Memory (OOM) score is already set to the wanted value 0
Jan 26 02:45:59 n2 netdata[1440077]: Adjusted netdata scheduling policy to batch (3), with priority 0.
Jan 26 02:45:59 n2 netdata[1440077]: Running with process scheduling policy 'batch', nice level 19
Jan 26 02:45:59 n2 netdata[1440077]: Cannot chown '/var/run/netdata/netdata.pid' to 109:116
Jan 26 02:45:59 n2 netdata[1440077]: netdata started on pid 1440077.
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 2 ms, become daemon - next: initialize threads after fork
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 0 ms, initialize threads after fork - next: initialize registry
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 0 ms, initialize registry - next: fork the spawn server
Jan 26 02:45:59 n2 netdata[1440077]: Initializing spawn client.
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 0 ms, fork the spawn server - next: collecting system info
Jan 26 02:45:59 n2 netdata[1440079]: Spawn server is up.
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 108 ms, collecting system info - next: initialize RRD structures
Jan 26 02:45:59 n2 netdata[1440077]: SQLite database /var/cache/netdata/netdata-meta.db initialization
Jan 26 02:45:59 n2 netdata[1440077]: metadata database version is 15 (no migration needed)
Jan 26 02:45:59 n2 netdata[1440077]: SQLite database initialization completed
Jan 26 02:45:59 n2 netdata[1440077]: SQLite database /var/cache/netdata/context-meta.db initialization
Jan 26 02:45:59 n2 netdata[1440077]: context database version is 1 (no migration needed)
Jan 26 02:45:59 n2 netdata[1440077]: Cannot open the file /var/lib/netdata/health.silencers.json, so Netdata will work with the default health configuration.
Jan 26 02:45:59 n2 netdata[1440077]: CONFIG: cannot load user config '/etc/netdata/stream.conf'. Will try stock config.
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: found 68 files in path /var/cache/netdata/dbengine-tier1
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: found 35 files in path /var/cache/netdata/dbengine-tier2
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: found 122 files in path /var/cache/netdata/dbengine
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: loading 12 data/journal of tier 2...
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: loading 23 data/journal of tier 1...
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: loading 41 data/journal of tier 0...
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: populating retention to MRG from 12 journal files of tier 2, using 4 threads...
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: populating retention to MRG from 41 journal files of tier 0, using 4 threads...
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: populating retention to MRG from 23 journal files of tier 1, using 4 threads...
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: tier 0 is ready for data collection and queries
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: tier 1 is ready for data collection and queries
Jan 26 02:45:59 n2 netdata[1440077]: DBENGINE: tier 2 is ready for data collection and queries
Jan 26 02:45:59 n2 netdata[1440077]: Host 'cirrus.n2' (at registry as 'cirrus.n2') with guid '3cb84fc6-a974-11ee-9b6e-3cecefbf5390' initialized, os 'linux', timezone 'Asia/Bangkok', tags '',>
Jan 26 02:45:59 n2 netdata[1440077]: Creating archived hosts
Jan 26 02:45:59 n2 netdata[1440077]: Created 0 archived hosts
Jan 26 02:45:59 n2 netdata[1440077]: ACLK sync initialization completed
Jan 26 02:45:59 n2 netdata[1440077]: Starting ACLK synchronization thread
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 79 ms, initialize RRD structures - next: check for incomplete shutdown
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 0 ms, check for incomplete shutdown - next: collect claiming info
Jan 26 02:45:59 n2 netdata[1440077]: File '/var/lib/netdata/cloud.d/claimed_id' was found. Setting state to AGENT_CLAIMED.
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 0 ms, collect claiming info - next: collect host labels
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 1 ms, collect host labels - next: start the static threads
Jan 26 02:45:59 n2 netdata[1440077]: CONFIG: cannot load user exporting config '/etc/netdata/exporting.conf'. Will try the stock version.
Jan 26 02:45:59 n2 netdata[1440077]: To use encryption it is necessary to set "ssl certificate" and "ssl key" in [web] !
Jan 26 02:45:59 n2 netdata[1440077]: starting worker 2
Jan 26 02:45:59 n2 netdata[1440077]: Waiting for Cloud to be enabled
Jan 26 02:45:59 n2 netdata[1440077]: starting worker 3
Jan 26 02:45:59 n2 netdata[1440077]: starting worker 4
Jan 26 02:45:59 n2 netdata[1440077]: starting worker 5
Jan 26 02:45:59 n2 netdata[1440077]: starting worker 6
Jan 26 02:45:59 n2 netdata[1440077]: No connector instances to activate
Jan 26 02:45:59 n2 netdata[1440077]: EXPORTING: no exporting connectors configured
Jan 26 02:45:59 n2 netdata[1440077]: cleaning up...
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 0 ms, start the static threads - next: initialize commands API
Jan 26 02:45:59 n2 netdata[1440077]: Initializing command server.
Jan 26 02:45:59 n2 netdata[1440077]: STATSD collector thread started with taskid 1440566
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: in 1 ms, initialize commands API - next: ready
Jan 26 02:45:59 n2 netdata[1440077]: NETDATA STARTUP: completed in 199 ms. Enjoy real-time performance monitoring!
Jan 26 02:45:59 n2 netdata[1440077]: use unified cgroups true
Jan 26 02:45:59 n2 perf.plugin[1440544]: no charts enabled - nothing to do.
Jan 26 02:45:59 n2 netdata[1440077]: PLUGINSD: plugin called DISABLE. Disabling it.
Jan 26 02:45:59 n2 apps.plugin[1440564]: PROCFILE: Cannot open file '/etc/netdata/apps_groups.conf'
Jan 26 02:45:59 n2 apps.plugin[1440564]: Cannot read process groups configuration file '/etc/netdata/apps_groups.conf'. Will try '/usr/lib/netdata/conf.d/apps_groups.conf'
Jan 26 02:45:59 n2 apps.plugin[1440564]: Loaded config file '/usr/lib/netdata/conf.d/apps_groups.conf'
Jan 26 02:45:59 n2 apps.plugin[1440564]: started on pid 1440564
Jan 26 02:45:59 n2 netdata[1440077]: child pid 1440544 exited with code 1.
Jan 26 02:45:59 n2 netdata[1440077]: PLUGINSD: 'host:cirrus.n2', '/usr/libexec/netdata/plugins.d/perf.plugin' (pid 1440544) exited with error code 1 and haven't collected any data. Disabling>
Jan 26 02:45:59 n2 ebpf.plugin[1440585]: Does not have a configuration file inside `/etc/netdata/ebpf.d.conf. It will try to load stock file.
Jan 26 02:45:59 n2 tc-qos-helper.sh[1440596]: FireQOS is not installed on this system. Use FireQOS to apply traffic QoS and expose the class names to netdata. Check https://github.com/netdat>
Jan 26 02:45:59 n2 netdata[1440077]: PLUGINSD: plugin called DISABLE. Disabling it.
Jan 26 02:45:59 n2 netdata[1440077]: child pid 1440594 exited with code 1.
Jan 26 02:45:59 n2 netdata[1440077]: PLUGINSD: 'host:cirrus.n2', '/usr/libexec/netdata/plugins.d/ioping.plugin' (pid 1440594) exited with error code 1 and haven't collected any data. Disabli>
Jan 26 02:45:59 n2 systemd-journal.plugin[1440538]: heartbeat randomness of 337000 is too big for a tick of 100000 - setting it to 29000
Jan 26 02:45:59 n2 ebpf.plugin[1440585]: Name resolution is disabled, collector will not parse "hostnames" list.
Jan 26 02:45:59 n2 tc-qos-helper.sh[1440617]: Cannot find file '/usr/lib/netdata/conf.d/tc-qos-helper.conf'.
Jan 26 02:45:59 n2 ebpf.plugin[1440585]: Cannot read process groups configuration file '/etc/netdata/apps_groups.conf'. Will try '/usr/lib/netdata/conf.d/apps_groups.conf'
Jan 26 02:45:59 n2 tc-qos-helper.sh[1440630]: Cannot find file '/etc/netdata/tc-qos-helper.conf'.
Jan 26 02:45:59 n2 netdata[1440534]: level=info msg="env HTTP_PROXY '', HTTPS_PROXY ''" plugin=go.d component=agent
Jan 26 02:45:59 n2 netdata[1440534]: level=info msg="instance is started" plugin=go.d component=agent
Jan 26 02:45:59 n2 netdata[1440534]: level=info msg="loading config file" plugin=go.d component=agent
Jan 26 02:45:59 n2 netdata[1440534]: level=info msg="found '/usr/lib/netdata/conf.d/go.d.conf" plugin=go.d component=agent
Jan 26 02:45:59 n2 netdata[1440534]: level=info msg="config successfully loaded" plugin=go.d component=agent

Additional logs

root@n2:~# journalctl -u netdata --namespace=netdata
Jan 26 02:45:14 n2 netdata[810983]: Deleting chart 'cgroup_qemu_minbu-domain-com.cpu_limit' ('cgroup_qemu_minbu-domain-com.cpu_limit') from disk...
Jan 26 02:45:14 n2 netdata[810983]: NETDATA SHUTDOWN: in 0 ms, clean rrdhost database - next: stop aclk threads
Jan 26 02:45:14 n2 netdata[810983]: NETDATA SHUTDOWN: in 0 ms, stop aclk threads - next: stop all remaining worker threads
Jan 26 02:45:14 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 2 services [ COLLECTORS ANALYTICS ] to exit: 'P[cgroups]' (811518), 'ANALYTICS' (814453)
Jan 26 02:45:14 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:15 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:16 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:17 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:18 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:19 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:20 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:21 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:22 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:23 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:24 n2 netdata[810983]: SERVICE CONTROL: waiting for the following 1 services [ COLLECTORS ] to exit: 'P[cgroups]' (811518)
Jan 26 02:45:24 n2 netdata[810983]: SERVICE CONTROL: the following 1 service(s) [ COLLECTORS ] take too long to exit: 'P[cgroups]' (811518); giving up on them...
Jan 26 02:45:24 n2 netdata[810983]: NETDATA SHUTDOWN: in 10117 ms, (TIMEOUT) stop all remaining worker threads - next: cancel main threads
Jan 26 02:45:24 n2 netdata[810983]: EXIT: Stopping main thread: DYNCFG
Jan 26 02:45:24 n2 netdata[810983]: EXIT: Stopping main thread: P[cgroups]
Jan 26 02:45:24 n2 netdata[810983]: Waiting 2 threads to finish...
Jan 26 02:45:24 n2 netdata[810983]: cleaning up...
Jan 26 02:45:24 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:24 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:24 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:24 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:24 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:25 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:26 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:26 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:26 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:26 n2 netdata[810983]: Waiting 1 threads to finish...
Jan 26 02:45:29 n2 netdata[810983]: Main thread P[cgroups] takes too long to exit. Giving up...
Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: in 5005 ms, cancel main threads - next: close SQL context db
Jan 26 02:45:29 n2 netdata[810983]: Closing context SQLite database
Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: in 0 ms, close SQL context db - next: closed SQL main db
Jan 26 02:45:29 n2 netdata[810983]: Closing SQLite database
Jan 26 02:45:29 n2 netdata[810983]: No statements pending to finalize
Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: in 0 ms, closed SQL main db - next: remove pid file
Jan 26 02:45:29 n2 netdata[810983]: EXIT: cannot unlink pidfile '/var/run/netdata/netdata.pid'.
Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: in 0 ms, remove pid file - next: free openssl structures
Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: in 0 ms, free openssl structures - next: remove incomplete shutdown file
Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: in 0 ms, remove incomplete shutdown file - next: exit
Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: completed in 22026 ms - netdata is now exiting - bye bye...
Jan 26 02:45:29 n2 netdata[811011]: EOF found in spawn pipe.
Jan 26 02:45:29 n2 netdata[811011]: Shutting down spawn server event loop.
Jan 26 02:45:29 n2 netdata[811011]: Shutting down spawn server loop complete.
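The shutdown above spends most of its time waiting for the 'P[cgroups]' collector: the "stop all remaining worker threads" step ends in a TIMEOUT after 10117 ms, and "cancel main threads" gives up after 5005 ms. A short Python sketch that pulls the `NETDATA SHUTDOWN: in N ms, ... - next: ...` lines out of a log and ranks the steps by duration is shown below. The regex is inferred only from the lines in this report, and `shutdown_steps`/`slowest` are hypothetical names. It summarizes timing only; it does not tell you what triggered the shutdown.

```python
import re

# Pattern inferred from lines such as
# "NETDATA SHUTDOWN: in 10117 ms, (TIMEOUT) stop all remaining worker threads - next: cancel main threads"
STEP = re.compile(r"NETDATA SHUTDOWN: in (\d+) ms, (\(TIMEOUT\) )?(.+?) - next: ")


def shutdown_steps(lines):
    """Hypothetical helper: (step, duration_ms, timed_out) for every shutdown step found."""
    steps = []
    for line in lines:
        m = STEP.search(line)
        if m:
            steps.append((m.group(3), int(m.group(1)), m.group(2) is not None))
    return steps


def slowest(steps, n=3):
    """Hypothetical helper: the n longest steps, longest first."""
    return sorted(steps, key=lambda s: s[1], reverse=True)[:n]


if __name__ == "__main__":
    log = [
        "Jan 26 02:45:14 n2 netdata[810983]: NETDATA SHUTDOWN: in 0 ms, stop aclk threads - next: stop all remaining worker threads",
        "Jan 26 02:45:24 n2 netdata[810983]: NETDATA SHUTDOWN: in 10117 ms, (TIMEOUT) stop all remaining worker threads - next: cancel main threads",
        "Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: in 5005 ms, cancel main threads - next: close SQL context db",
        "Jan 26 02:45:29 n2 netdata[810983]: NETDATA SHUTDOWN: completed in 22026 ms - netdata is now exiting - bye bye...",
    ]
    for step, ms, timed_out in slowest(shutdown_steps(log)):
        flag = " (TIMEOUT)" if timed_out else ""
        print(f"{ms:>6} ms  {step}{flag}")
```

The final "completed in" summary line is deliberately not matched, so it is not counted as a step.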

@karangaj added the bug and needs triage labels Jan 29, 2024
@netdata-community-bot

This issue has been mentioned on the Netdata Community Forums. There might be relevant details there:

https://community.netdata.cloud/t/netdata-service-restarting-randomly-in-one-of-the-server/5076/8

@karangaj
Author

I originally raised this on the Netdata Community Forums, and they asked me to open a bug report here.

@ilyam8
Member

ilyam8 commented Jan 30, 2024

Hi, @karangaj. I see nothing in your logs that would indicate a problem. Can you provide the full logs from Netdata start to restart? Not just the parts you find common, but the complete logs.

@ilyam8 added the need feedback label and removed the needs triage label Jan 30, 2024
@karangaj
Author

karangaj commented Feb 2, 2024

Hello @ilyam8,
As of now, we have implemented some changes on our end, and the service hasn't restarted since. I will monitor the situation over the next week. If the issue persists, I will provide the complete logs for further investigation. If the problem does not recur during this period, I will close this issue.

@Arslan374

Hi @karangaj ,
I am facing the same issue on some of my servers. If you have resolved it, please share how.

@karangaj
Author

karangaj commented Feb 4, 2024

Hi @Arslan374,
We made a lot of changes during this time and couldn't pinpoint which one fixed it, and we can't revert them to check at this point. We did face another issue with Netdata (not related to this topic): it showed the server as unreachable even though neither the server nor the service had restarted. We identified that one by checking the logs and fixed it on the server.

If possible, try the commands below to identify the issue, and share the output with the team:
journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata
journalctl -u netdata --namespace=netdata
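Note that `systemctl show --value --property=InvocationID netdata` returns the ID of the current run, so the first command only shows logs since the most recent start, not the run that stopped. To share a complete log from one start to the next restart, one option is to export the journal as JSON and group the entries per invocation. This is a sketch assuming the netdata namespace entries carry `_SYSTEMD_INVOCATION_ID` (as the first command implies); `group_by_invocation` and the file names are hypothetical.

```python
import json
import sys


def group_by_invocation(json_lines):
    """Hypothetical helper: group `journalctl -o json` entries by _SYSTEMD_INVOCATION_ID."""
    runs = {}  # dicts keep insertion order, so runs come out in journal order
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        invocation = entry.get("_SYSTEMD_INVOCATION_ID", "unknown")
        msg = entry.get("MESSAGE")
        if msg is None:
            # journalctl writes null for fields larger than 4096 bytes unless --all is given.
            msg = ""
        elif isinstance(msg, list):
            # Non-UTF-8 messages are exported as an array of byte values.
            msg = bytes(msg).decode("utf-8", errors="replace")
        runs.setdefault(invocation, []).append(msg)
    return runs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed file names):
    #   journalctl -u netdata --namespace=netdata -o json > netdata.json
    #   python3 runs.py netdata.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        for invocation, messages in group_by_invocation(fh).items():
            print(f"{invocation}: {len(messages)} lines")
            print("  first:", messages[0])
            print("  last: ", messages[-1])
```

Each group then holds one run from its startup banner to its last message, which is the "full logs from start to restart" view requested above.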

@karangaj karangaj closed this as completed Feb 5, 2024

4 participants