[Bug]: Docker container rapidly fills log file with apcaccess error #12413

Closed
Xalaxis opened this issue Mar 15, 2022 · 4 comments · Fixed by #12418
Labels: area/collectors, bug, collectors/charts.d, priority/high

Comments

@Xalaxis
Contributor

Xalaxis commented Mar 15, 2022

Bug description

I auto-update my Docker containers daily. This morning my Netdata container started rapidly filling my filesystem until it was full (approximately 600GB of data). After some investigation, I found that a temporary file is being created that contains millions of lines of "charts.d: : apcupsd: command 'apcaccess status 127.0.0.1:3551 ' failed with code 1". This file is created every time the container is restarted, and once it gets going it grows at roughly 3GB per minute on an SSD.

Issue #12346 feels related given the timing; perhaps one of the changes has introduced the bug?

/tmp/.netdata-charts.d-XXXXmPHbmH # ls -lh
total 87G    
-rw-rw----    1 netdata  netdata    87.4G Mar 15 14:27 run.295

/tmp/.netdata-charts.d-XXXXmPHbmH # tail run.295 
2022-03-15 13:53:07: charts.d: : apcupsd: command 'apcaccess status 127.0.0.1:3551 ' failed with code 1:
 --- BEGIN TRACE ---
2022-03-15 13:53:07: charts.d: : apcupsd: command 'apcaccess status 127.0.0.1:3551 ' failed with code 1:
 --- BEGIN TRACE ---
2022-03-15 13:53:07: charts.d: : apcupsd: command 'apcaccess status 127.0.0.1:3551 ' failed with code 1:
 --- BEGIN TRACE ---
2022-03-15 13:53:07: charts.d: : apcupsd: command 'apcaccess status 127.0.0.1:3551 ' failed with code 1:
 --- BEGIN TRACE ---
2022-03-15 13:53:07: charts.d: : apcupsd: command 'apcaccess status 127.0.0.1:3551 ' failed with code 1:

Expected behavior

The log file output should be capped or rate-limited, otherwise this issue could occur again with other faults.
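
As a rough illustration of the capping idea (this is not Netdata's logging code; the function name `append_capped` and the byte limit are hypothetical), a writer could simply refuse to append once the file reaches a size limit:

```shell
#!/usr/bin/env bash
# Hypothetical sketch only: not how charts.d writes its run files.

# append_capped FILE MAX_BYTES LINE
# Appends LINE to FILE unless FILE has already reached MAX_BYTES.
# Returns 0 if the line was written, 1 if it was dropped because of the cap.
append_capped() {
  local file="$1" max_bytes="$2" line="$3"
  local size=0
  if [ -f "$file" ]; then
    # Arithmetic expansion strips any padding some wc implementations emit.
    size=$(( $(wc -c < "$file") ))
  fi
  if [ "$size" -ge "$max_bytes" ]; then
    return 1
  fi
  printf '%s\n' "$line" >> "$file"
}

# Example (hypothetical path, 10 MiB cap):
#   append_capped /tmp/example-run.log 10485760 "apcupsd: command failed with code 1"
```

The cap is checked before each write, so the file can exceed the limit by at most one line. A real implementation would more likely rotate or rate-limit than silently drop, but the bound on disk usage is the point here.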

To resolve this issue, there should probably be a delay before attempting to reconnect to apcupsd, and, if the command has never succeeded, possibly the check should be cancelled altogether?
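
A minimal sketch of that retry behaviour, assuming a plain shell wrapper around the check (the names `run_with_backoff` and `SLEEP_CMD` are made up for illustration; this is not the charts.d code and not the change merged in #12418):

```shell
#!/usr/bin/env bash
# Hypothetical sketch only: this is not charts.d code and not the fix from #12418.

# Command used to pause between attempts. Overridable so the retry logic can
# be exercised without real delays (for example SLEEP_CMD=:).
SLEEP_CMD="${SLEEP_CMD:-sleep}"

# run_with_backoff MAX_ATTEMPTS INITIAL_DELAY MAX_DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds. After each failure it logs one line, waits,
# and doubles the delay (capped at MAX_DELAY). Gives up after MAX_ATTEMPTS
# failures and returns 1 so the caller can cancel the check.
run_with_backoff() {
  local max_attempts="$1" delay="$2" max_delay="$3"
  shift 3
  local attempt=1 rc
  while [ "$attempt" -le "$max_attempts" ]; do
    rc=0
    "$@" || rc=$?
    if [ "$rc" -eq 0 ]; then
      return 0
    fi
    echo "attempt ${attempt}/${max_attempts}: '$*' failed with code ${rc}" >&2
    if [ "$attempt" -lt "$max_attempts" ]; then
      "$SLEEP_CMD" "$delay"
      delay=$(( delay * 2 ))
      if [ "$delay" -gt "$max_delay" ]; then
        delay="$max_delay"
      fi
    fi
    attempt=$(( attempt + 1 ))
  done
  return 1
}

# Example: try apcaccess at most 5 times, waiting 1s, 2s, 4s and 8s in between,
# and report the check as disabled if it never succeeds:
#   run_with_backoff 5 1 60 apcaccess status 127.0.0.1:3551 || echo "apcupsd check disabled" >&2
```

With this shape, a permanently failing command produces at most MAX_ATTEMPTS log lines instead of an unbounded stream.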

Steps to reproduce

  1. Start Netdata
  2. Visit the Netdata interface, then close it
  3. Watch the contents of /tmp increase rapidly

Installation method

docker

System info

/tmp/.netdata-charts.d-XXXXmPHbmH # uname -a; grep -HvE "^#|URL" /etc/*release
Linux unraid 5.15.27-Unraid #1 SMP Wed Mar 9 11:15:51 PST 2022 x86_64 Linux
/etc/alpine-release:3.15.0
/etc/os-release:NAME=Slackware
/etc/os-release:VERSION="15.0"
/etc/os-release:ID=slackware
/etc/os-release:VERSION_ID=15.0
/etc/os-release:PRETTY_NAME="Slackware 15.0 x86_64"
/etc/os-release:ANSI_COLOR="0;34"
/etc/os-release:CPE_NAME="cpe:/o:slackware:slackware_linux:15.0"
/etc/os-release:VERSION_CODENAME=stable

Netdata build info

/tmp/.netdata-charts.d-XXXXmPHbmH # netdata -W buildinfo
Version: netdata v1.33.1-165-nightly
Configure options:  '--prefix=/usr' '--sysconfdir=/etc' '--localstatedir=/var' '--libexecdir=/usr/libexec' '--libdir=/usr/lib' '--with-zlib' '--with-math' '--with-user=netdata' '--without-bundled-protobuf' '--with-bundled-libJudy' '--disable-ebpf' 'CFLAGS=' 'LDFLAGS='
Install type: unknown
Features:
    dbengine:                   YES
    Native HTTPS:               YES
    Netdata Cloud:              YES 
    ACLK Next Generation:       YES
    ACLK-NG New Cloud Protocol: YES
    ACLK Legacy:                NO
    TLS Host Verification:      YES
    Machine Learning:           YES
    Stream Compression:         YES
Libraries:
    protobuf:                YES (system)
    jemalloc:                NO
    JSON-C:                  YES
    libcap:                  NO
    libcrypto:               YES
    libm:                    YES
    tcalloc:                 NO
    zlib:                    YES
Plugins:
    apps:                    YES
    cgroup Network Tracking: YES
    CUPS:                    NO
    EBPF:                    NO
    IPMI:                    YES
    NFACCT:                  NO
    perf:                    YES
    slabinfo:                YES
    Xen:                     NO
    Xen VBD Error Tracking:  NO
Exporters:
    AWS Kinesis:             NO
    GCP PubSub:              NO
    MongoDB:                 YES
    Prometheus Remote Write: YES

Additional info

Issue also being followed at https://forums.unraid.net/topic/47828-support-data-monkey-netdata/page/13/

Xalaxis added the bug and needs triage labels Mar 15, 2022
@MrZammler
Contributor

Hi @Xalaxis, thanks for your report. We will look into it.

ilyam8 added the priority/high label Mar 15, 2022
@Ferroin
Member

Ferroin commented Mar 15, 2022

Looks like a bug in the APCUPSD collector in the charts.d plugin. We recently added the tooling this collector requires to our Docker images, and the plugin is enabled by default and seems to (incorrectly) assume that if the tooling is present it’s correctly configured.

Short-term fix is to manually disable the APCUPSD plugin. This can be done by creating a file called /etc/netdata/charts.d.conf in the container with the following contents (or appending the following line to such a file if it already exists):

apcupsd=no
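
One way to apply that workaround without duplicating the line on repeated runs might look like the following. The helper name `disable_charts_d_module` is hypothetical; only the file path and the `apcupsd=no` line come from the comment above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: not part of Netdata.

# disable_charts_d_module CONF_FILE MODULE
# Ensures CONF_FILE contains the line "MODULE=no". The file is created if it
# does not exist, and the line is appended only if that exact line is not
# already present. Any other existing lines are left untouched.
disable_charts_d_module() {
  local conf_file="$1" module="$2"
  local line="${module}=no"
  touch "$conf_file"
  # -x: match whole lines only, -F: treat the pattern as a fixed string.
  if ! grep -qxF "$line" "$conf_file"; then
    printf '%s\n' "$line" >> "$conf_file"
  fi
}

# Example (run inside the container):
#   disable_charts_d_module /etc/netdata/charts.d.conf apcupsd
```

The sketch does not rewrite a conflicting entry such as `apcupsd=yes`; if one exists, it is safer to edit the file by hand.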

We’re working on a long-term fix as I’m writing this.

ilyam8 added the area/collectors and area/external labels and removed the needs triage label Mar 15, 2022
ilyam8 self-assigned this Mar 15, 2022
@Ferroin
Member

Ferroin commented Mar 15, 2022

A Docker image with a temporary fix has been published with the latest and edge tags. A more permanent fix is being worked on and will be published as part of the regular nightly builds.

@Xalaxis
Contributor Author

Xalaxis commented Mar 15, 2022

Much appreciated, thanks for all your rapid work on this!

4 participants