Disk input gives error which disappears on restart #1352
Comments
can you provide your configuration? |
@sparrc I get it while running - https://github.com/deis/monitor/tree/master/telegraf |
+1 |
does anyone have steps to reproduce this? |
@sparrc my guess is it's related to telegraf running early in the startup process of CoreOS, as I have only reproduced it on newly booted boxes. |
The things I know
|
My configuration
|
@sstarcher the entire telegraf process hangs when this happens? |
@sparrc I still get the other inputs shipped so I doubt it's hanging. I could turn debug on and quiet off to see if anything else is happening. |
Disk usage and iNodes are not collected and shipped, but cpu/memory/network is being shipped. |
The first time telegraf starts, I see the following in my journal logs:
|
Just noticed this in my papertrail logs and google brought me here:
Repeats every 10s |
I'm seeing the same issue with all telegraf pods created by deis workflow 2.3.0 on kubernetes 1.3.4, 1.0.0-beta2-18-g755b2ec and CoreOS stable as the host OS. I think @sstarcher is on the right track. If I attach to a telegraf container and traverse the rootfs with find / -type s, I get:
find: '/rootfs/proc/sys/fs/binfmt_misc': Too many levels of symbolic links
The error only occurs for The output of
|
same as @felixbuenemann |
I am pretty sure that all of the affected host systems (not the containers) are running systemd. I found this related ubuntu issue: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1555760 Both coreos stable (what I am running) and ubuntu xenial (listed in the report) are using systemd 229. |
OK, I think the issue at hand is that the If I do |
I have been able to work around the issue on CoreOS by requiring the proc-sys-fs-binfmt_misc.mount unit from docker.service through a drop-in. This is the cloud-config userdata to define the drop-in:
coreos:
  units:
    - name: docker.service
      drop-ins:
        - name: 60-mount-binfmt-misc.conf
          content: |
            [Unit]
            Wants=proc-sys-fs-binfmt_misc.mount
            After=proc-sys-fs-binfmt_misc.mount
The same could be done on other systems by editing or extending the docker systemd service. |
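On a host that is not provisioned through cloud-config, the same drop-in could be written by hand. The following is only a minimal sketch: it assumes the usual systemd layout where drop-ins for docker.service live under /etc/systemd/system/docker.service.d, and the function name write_binfmt_dropin is hypothetical. The file name and unit contents are copied from the cloud-config in the comment above.

```shell
#!/bin/sh
# Sketch: write the drop-in from the cloud-config above as a plain file.
# write_binfmt_dropin is a hypothetical helper name; the default directory
# is an assumption about where docker.service drop-ins live on the host.
write_binfmt_dropin() {
    dropin_dir="${1:-/etc/systemd/system/docker.service.d}"
    mkdir -p "$dropin_dir" || return 1
    # Make docker.service pull in the binfmt_misc mount and start after it.
    cat > "$dropin_dir/60-mount-binfmt-misc.conf" <<'EOF'
[Unit]
Wants=proc-sys-fs-binfmt_misc.mount
After=proc-sys-fs-binfmt_misc.mount
EOF
}
```

After calling write_binfmt_dropin as root, a systemctl daemon-reload and a restart of docker.service would be needed before the drop-in takes effect.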
CentOS 7.2, docker 1.12.3, telegraf 1.1.1 (in container)
docker-compose.yml:
telegraf.conf:
|
Having the same issue #1544 |
I personally solved this issue by creating a shell script that wraps telegraf:
# FIXME: for some reason, this doesn't work in the entrypoint :(
if [ -n "$HOST_MOUNT_PREFIX" ]; then
    while mount | grep -q "$HOST_MOUNT_PREFIX/proc/sys/fs/binfmt_misc"; do
        # this solves #TELEGRAF-1352 (too many levels of symbolic links)
        # stop retrying if umount fails, so the loop cannot spin forever
        umount "$HOST_MOUNT_PREFIX/proc/sys/fs/binfmt_misc" || break
    done
fi
/usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS
I've changed the Dockerfile's CMD statement to use the script and the issue is gone for good. |
has anyone been able to confirm that this is the same issue as https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1555760? |
Host OS Information
Docker OS Information
Description
2017-02-06T15:32:40Z E! ERROR in input [inputs.disk]: error getting disk usage info: too many levels of symbolic links |
@gunnaraasen Can you tell me how you resolved the issue? |
I found the cause. In my case, binfmt_misc on the host is an automount, so it is only mounted at the point of first use.
At first, you can see the following logs.
After running telegraf, you can see the following logs.
I had never used binfmt_misc on my host before, which means it had never been mounted. However, once binfmt_misc has been mounted (accessing the mount information from telegraf triggers this the first time), the volume mapping information includes it as well, so telegraf can collect disk information without problems. |
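To check which of the two states a host is in, the mount table can be inspected for the binfmt_misc mount point. The sketch below makes two assumptions: that the mount table uses the usual /proc/mounts field order (device, mount point, filesystem type), and that a systemd automount shows up with type autofs while the real filesystem shows up with type binfmt_misc. The function name binfmt_mount_types is hypothetical.

```shell
#!/bin/sh
# Sketch: print the filesystem type of every entry mounted at a path ending
# in /proc/sys/fs/binfmt_misc (this also matches a prefixed path such as
# /rootfs/proc/sys/fs/binfmt_misc inside a container).
# binfmt_mount_types is a hypothetical helper name; the mount table path
# defaults to /proc/mounts and can be overridden with the first argument.
binfmt_mount_types() {
    mounts_file="${1:-/proc/mounts}"
    awk '$2 ~ /\/proc\/sys\/fs\/binfmt_misc$/ { print $3 }' "$mounts_file"
}
```

Under these assumptions, output of only autofs would mean the automount has not been triggered yet, while an additional binfmt_misc line would mean the filesystem has actually been mounted.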
same issue, I made a quick fix with |
Closing, please use the workaround provided by @Dark0096 |
NOTE this is currently attributed to an issue with systemd: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1555760
Bug report
System info:
Telegraf version 0.13.1
CoreOS version 1010.5.0
Many plugins running, including the disk input.
Steps to reproduce:
We can reproduce the issue in our environment.
Expected behavior:
Expect the Telegraf disk input to work correctly on startup.
Actual behavior:
The Telegraf disk input reports the following errors every collection cycle.
The input starts working correctly when Telegraf is restarted.