Disk input gives error which disappears on restart #1352

gunnaraasen · 2016-06-08T19:41:40Z

NOTE: this is currently attributed to a systemd issue: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1555760

Bug report

System info:

Telegraf version 0.13.1
CoreOS version 1010.5.0
Many plugins running, including the disk input.

Steps to reproduce:

We can reproduce the issue in our environment.

Expected behavior:

The Telegraf disk input should work correctly on startup.

Actual behavior:

The Telegraf disk input reports the following error on every collection cycle:

2016/06/08 18:59:20 Error in input [disk]: error getting disk usage info: too many levels of symbolic links

The input starts working correctly when Telegraf is restarted.


sparrc · 2016-06-14T17:14:36Z

Can you provide your configuration?

sstarcher · 2016-07-05T21:34:57Z

@sparrc I get it while running https://github.com/deis/monitor/tree/master/telegraf

wegel · 2016-07-19T19:22:29Z

+1

sparrc · 2016-07-20T11:25:12Z

Does anyone have steps to reproduce this?

sstarcher · 2016-07-20T12:44:13Z

@sparrc My guess is that it's related to Telegraf running early in the CoreOS startup process, since I have only reproduced it on newly booted machines.

sstarcher · 2016-07-20T13:04:18Z

What I know so far:

- I have reproduced it while running Telegraf on boot inside a Docker container.
- I have also reproduced it running Telegraf inside a Kubernetes pod as a DaemonSet.
- Once it happens, it never goes away, and nothing else is logged.
- From a shell inside the container, all mounted directories look fine.

sstarcher · 2016-07-20T13:05:02Z

My configuration

Setting Agent Hostname to: ip-10-0-20-45.ec2.internal
Setting PROMETHEUS_URLS to: "https://192.168.3.1:443/api/v1/proxy/nodes/ip-10-0-20-45.ec2.internal/metrics", "https://192.168.3.1:443/metrics"
Building config.toml!
Finished building toml...
###########################################
###########################################
# Set Tag Configuration
[tags]
# Set Agent Configuration
[agent]
  interval = "10s"
  round_interval = true
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  debug = false
  quiet = true
  flush_buffer_when_full = true
  hostname = "ip-10-0-20-45.ec2.internal"
# Set output configuration
[[outputs.influxdb]]
  urls = ["http://influxdb.service.consul:8086"]
  database = "telegraf"
  precision = "ns"
  timeout = "5s"




# Set Input Configuration
[[inputs.netstat]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.mem]]
[[inputs.cpu]]
  percpu = true
  totalcpu = true
[[inputs.disk]]

[[inputs.diskio]]


[[inputs.net]]

# Set Service Input Configuration
###########################################
###########################################
2016/07/01 02:46:14 Starting Telegraf (version 1.0.0-beta1-1-g2211231)
2016/07/01 02:46:14 Loaded outputs: influxdb
2016/07/01 02:46:14 Loaded inputs: disk diskio net netstat swap system mem cpu
2016/07/01 02:46:14 Tags enabled: host=ip-10-0-20-45.ec2.internal
2016/07/01 02:46:14 Agent Config: Interval:10s, Debug:false, Quiet:true, Hostname:"ip-10-0-20-45.ec2.internal", Flush Interval:10s
2016/07/01 02:46:20 ERROR in input [disk]: error getting disk usage info: too many levels of symbolic links
2016/07/01 02:46:30 ERROR in input [disk]: error getting disk usage info: too many levels of symbolic links

sparrc · 2016-07-20T13:09:10Z

@sstarcher Does the entire Telegraf process hang when this happens?

sstarcher · 2016-07-20T13:10:53Z

@sparrc The other inputs are still being shipped, so I doubt it's hanging. I could turn debug on and quiet off to see if anything else is happening.

sstarcher · 2016-07-20T13:11:23Z

Disk usage and inode metrics are not collected or shipped, but CPU, memory, and network metrics are.

sstarcher · 2016-07-20T13:12:00Z

The first time Telegraf starts, I see the following in my journal logs:

Jul 01 02:46:20 ip-10-0-20-45.ec2.internal systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 3782 (telegraf)
Jul 01 02:46:20 ip-10-0-20-45.ec2.internal systemd[1]: Mounting Arbitrary Executable File Formats File System...
Jul 01 02:46:20 ip-10-0-20-45.ec2.internal systemd[1]: Mounted Arbitrary Executable File Formats File System.

olalonde · 2016-08-03T22:18:01Z

I just noticed this in my Papertrail logs, and Google brought me here:

Aug 03 15:15:41 localhost fluentd:  2016-08-03T22:15:40Z    fluentd {"log":"2016/08/03 22:15:40 ERROR in input [disk]: error getting disk usage info: too many levels of symbolic links\n","stream":"stderr","docker":{"container_id":"427f86914f90644f9d56770a379cfe5b48f9c7c30598092dade1b40bd8f3830a"},"kubernetes":{"namespace_name":"deis","pod_id":"6164e10f-530c-11e6-a5e2-068755380eff","pod_name":"deis-monitor-telegraf-x5e7f","container_name":"deis-monitor-telegraf","labels":{"app":"deis-monitor-telegraf"},"host":"ip-172-20-0-131.us-west-1.compute.internal"}}

2016/08/03 22:15:40 ERROR in input [disk]: error getting disk usage info: too many levels of symbolic links

It repeats every 10s.

felixbuenemann · 2016-08-12T20:32:07Z

I'm seeing the same issue with all Telegraf pods created by Deis Workflow 2.3.0 on Kubernetes 1.3.4, with Telegraf 1.0.0-beta2-18-g755b2ec and CoreOS stable as the host OS.

I think @sstarcher is on the right track. If I attach to a Telegraf container and traverse the rootfs with find, I get the following:

find / -type s
find: '/rootfs/proc/sys/fs/binfmt_misc': Too many levels of symbolic links

The error only occurs for /rootfs/proc/sys/fs/binfmt_misc while /proc/sys/fs/binfmt_misc can be listed just fine.

The output of mount shows:

systemd-1 on /rootfs/proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=31,pgrp=0,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11343)
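To spot this condition from inside a container without eyeballing `mount`, the mount table can be filtered for `autofs` entries below the host prefix. This is a minimal sketch, assuming the host root is mounted at `/rootfs` as in the configurations in this thread; it reads lines in `/proc/self/mounts` format (device, mountpoint, fstype, options, dump, pass) rather than parsing `mount` output, and the function name `find_autofs_under` is hypothetical. Every path it prints is only a candidate for this problem.

```shell
#!/bin/sh
# find_autofs_under PREFIX
# Hypothetical helper: print every autofs mountpoint at or below PREFIX.
# Reads mount-table lines in /proc/self/mounts format from stdin:
#   device mountpoint fstype options dump pass
find_autofs_under() {
  awk -v p="$1" '
    $3 == "autofs" && ($2 == p || index($2, p "/") == 1) { print $2 }
  '
}

# Example, run inside the Telegraf container:
#   find_autofs_under "${HOST_MOUNT_PREFIX:-/rootfs}" < /proc/self/mounts
```

The prefix check requires either an exact match or a trailing `/`, so `/rootfs` does not accidentally match a sibling such as `/rootfs2`.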

felipejfc · 2016-08-12T20:47:52Z

same as @felixbuenemann

felixbuenemann · 2016-08-12T20:59:19Z

I am pretty sure that all of the affected host systems (not the containers) are running systemd.

I found this related ubuntu issue: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1555760

Both CoreOS stable (what I am running) and Ubuntu Xenial (listed in the report) use systemd 229.

felixbuenemann · 2016-08-12T21:28:35Z

OK, I think the issue is that /proc/sys/fs/binfmt_misc on the host is only mounted on first access, by the systemd automount. But the Telegraf container is started before that happens, with a read-only mount of the host /proc at /rootfs/proc.

If I run ls /proc/sys/fs/binfmt_misc on the CoreOS host and then kill the Telegraf pod so it restarts, the error goes away in the new pod.
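That manual workaround only helps if binfmt_misc is really mounted on the host before the container starts. A minimal host-side check, assuming the systemd automount at /proc/sys/fs/binfmt_misc: once the automount has been triggered, the mount table typically shows a `binfmt_misc` entry at that path on top of the `autofs` one. The function name `binfmt_state` is hypothetical.

```shell
#!/bin/sh
# binfmt_state [PATH]
# Hypothetical helper: classify PATH (default /proc/sys/fs/binfmt_misc) by
# reading /proc/self/mounts-format lines from stdin. Prints one of:
#   mounted      - a real binfmt_misc filesystem is mounted there
#   autofs-only  - only the (untriggered) autofs mountpoint is present
#   absent       - neither appears in the mount table
binfmt_state() {
  awk -v p="${1:-/proc/sys/fs/binfmt_misc}" '
    $2 == p && $3 == "binfmt_misc" { real = 1 }
    $2 == p && $3 == "autofs"      { auto = 1 }
    END {
      if (real)      print "mounted"
      else if (auto) print "autofs-only"
      else           print "absent"
    }
  '
}

# Example, run on the host before (re)starting the Telegraf container:
#   if [ "$(binfmt_state < /proc/self/mounts)" = "autofs-only" ]; then
#     ls /proc/sys/fs/binfmt_misc > /dev/null   # accessing the path triggers the automount
#   fi
```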

felixbuenemann · 2016-08-12T22:22:26Z

I have been able to work around the issue on CoreOS by making docker.service want (and start after) the proc-sys-fs-binfmt_misc.mount unit, which causes /proc/sys/fs/binfmt_misc on the host to be mounted before Docker starts.

This is the cloud-config userdata to define the drop-in:

coreos:
  units:
    - name: docker.service
      drop-ins:
        - name: 60-mount-binfmt-misc.conf
          content: |
            [Unit]
            Wants=proc-sys-fs-binfmt_misc.mount
            After=proc-sys-fs-binfmt_misc.mount

The same could be done on other systems by editing or extending the docker systemd service.
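On other systemd hosts, the same drop-in can be installed as a plain file. A minimal sketch, assuming Docker runs as `docker.service` and local drop-ins live in `/etc/systemd/system/docker.service.d/` (the usual location for overrides); the helper name `write_binfmt_dropin` is hypothetical, and the directory is a parameter so the function can be tried out without root.

```shell
#!/bin/sh
# write_binfmt_dropin DIR
# Hypothetical helper: write the same [Unit] drop-in as the cloud-config
# above into DIR, creating DIR if needed.
write_binfmt_dropin() {
  dir="$1"
  mkdir -p "$dir" || return 1
  cat > "$dir/60-mount-binfmt-misc.conf" <<'EOF'
[Unit]
Wants=proc-sys-fs-binfmt_misc.mount
After=proc-sys-fs-binfmt_misc.mount
EOF
}

# Example, as root on the host:
#   write_binfmt_dropin /etc/systemd/system/docker.service.d
#   systemctl daemon-reload
```

The new dependency takes effect the next time docker.service is started (for example at the next boot).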

dvdred · 2016-12-01T15:30:35Z

CentOS 7.2, docker 1.12.3, telegraf 1.1.1 (in container)
Same issue.

docker-compose.yml

version: '2'
services:

  telegraf:
    image: telegraf
    environment:
      HOST_PROC: /rootfs/proc
      HOST_SYS: /rootfs/sys
      HOST_ETC: /rootfs/etc
    hostname: host
    volumes:
     - ./telegraf.conf:/etc/telegraf/telegraf.conf:ro
     - /var/run/docker.sock:/var/run/docker.sock:ro
     - /sys:/rootfs/sys:ro
     - /proc:/rootfs/proc:ro
     - /etc:/rootfs/etc:ro

telegraf.conf:

...
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs"]
[[inputs.diskio]]
...

m4ce · 2017-01-08T13:05:32Z

I'm having the same issue (#1544).

m4ce · 2017-01-11T11:46:13Z

I solved this issue by creating a shell script that wraps Telegraf:

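#!/bin/sh
# FIXME: for some reason, this doesn't work in the entrypoint :(
if [ -n "$HOST_MOUNT_PREFIX" ]; then
  # Unmount the autofs mountpoint inherited from the host's /proc.
  # This solves telegraf#1352 (too many levels of symbolic links).
  # grep -q instead of bash-only "&>" so this also works under a POSIX /bin/sh.
  while mount | grep -q "$HOST_MOUNT_PREFIX/proc/sys/fs/binfmt_misc"; do
    # Stop instead of looping forever if umount fails (e.g. missing privileges).
    umount "$HOST_MOUNT_PREFIX/proc/sys/fs/binfmt_misc" || break
  done
fi

# exec so Telegraf replaces the shell and receives signals directly.
exec /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS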

I've changed the Dockerfile's CMD statement to use the script, and the issue is gone for good.

sparrc · 2017-01-11T12:50:46Z

Has anyone been able to confirm that this is the same issue as https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1555760?

Dark0096 · 2017-02-06T15:34:07Z

Host OS Information
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial

Docker OS Information
Docker Base Image : alpine 3.4

Description
I have experienced the same problem. Briefly: the Ubuntu host itself had no problem, but the issue occurred inside the Docker container. Telegraf logged the following:

2017-02-06T15:32:40Z E! ERROR in input [inputs.disk]: error getting disk usage info: too many levels of symbolic links

Dark0096 · 2017-02-06T15:37:03Z

My input configuration looks like this:

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "overlay"]

[[inputs.diskio]]

There is something strange: after restarting the container, disk collection works normally.

Dark0096 · 2017-02-07T01:56:15Z

@gunnaraasen Can you tell me how you resolved the issue?

Dark0096 · 2017-02-07T08:26:24Z

I found the cause. In my case, binfmt_misc on the host is an automount by default, so it is only mounted when it is first used.

sudo systemctl status proc-sys-fs-binfmt_misc.automount

Initially, the status looks like this:

 proc-sys-fs-binfmt_misc.automount - Arbitrary Executable File Formats File System Automount Point
   Loaded: loaded (/lib/systemd/system/proc-sys-fs-binfmt_misc.automount; static; vendor preset: enabled)
   Active: active (waiting) since Mon 2017-02-06 19:24:19 KST; 22h ago
    Where: /proc/sys/fs/binfmt_misc
     Docs: https://www.kernel.org/doc/Documentation/binfmt_misc.txt
           http://www.freedesktop.org/wiki/Software/systemd/APIFileSystems

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

After Telegraf has started, it looks like this:

proc-sys-fs-binfmt_misc.automount - Arbitrary Executable File Formats File System Automount Point
   Loaded: loaded (/lib/systemd/system/proc-sys-fs-binfmt_misc.automount; static; vendor preset: enabled)
   Active: active (running) since Mon 2017-02-06 19:24:20 KST; 22h ago
    Where: /proc/sys/fs/binfmt_misc
     Docs: https://www.kernel.org/doc/Documentation/binfmt_misc.txt
           http://www.freedesktop.org/wiki/Software/systemd/APIFileSystems

Feb 07 00:58:50 ip-xxx systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 28579 (telegraf)
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

I had never used binfmt_misc on my host, so it had never been mounted. Telegraf reading the mount information is what first triggers the automount. From then on, the real mount is included in the container's volume mapping as well, so once the container is restarted, Telegraf can collect disk information without problems.

jeremydenoun · 2017-02-20T18:03:07Z

Same issue. I made a quick fix by running umount /proc/sys/fs/binfmt_misc just after Telegraf starts.

danielnelson · 2018-11-12T21:44:59Z

Closing; please use the workaround provided by @Dark0096.

bacongobbler mentioned this issue on Jan 4, 2017, in deis/monitor#163 (open): "deis-monitor-telegraf shows - error getting disk usage info: too many levels of symbolic links"

sparrc added the bug and help wanted labels on Feb 9, 2017

sparrc added this to the Future Milestone on Feb 9, 2017

danielnelson removed this from the Future Milestone on Jun 14, 2017

danielnelson closed this as completed on Nov 12, 2018
