Unable to launch telegraf docker after capabilities changes #561

Closed
ptxmac opened this issue Dec 20, 2021 · 14 comments · Fixed by #562

@ptxmac

ptxmac commented Dec 20, 2021

Running the latest telegraf Docker image fails with the following error:

# docker run --rm telegraf
Failed to set capabilities on file `/usr/bin/telegraf' (Operation not supported)
The value of the capability argument is not permitted for a file. Or the file is not a regular (non-symlink) file

The most recent version that works correctly is 1.20.3.

@powersj
Contributor

powersj commented Dec 20, 2021

Can you provide me with some more details on how you are running docker?

❯ docker run --rm telegraf
2021-12-20T18:19:05Z I! Starting Telegraf 1.21.1
2021-12-20T18:19:05Z I! Using config file: /etc/telegraf/telegraf.conf
2021-12-20T18:19:05Z I! Loaded inputs: cpu disk diskio kernel mem processes swap system
2021-12-20T18:19:05Z I! Loaded aggregators: 
2021-12-20T18:19:05Z I! Loaded processors: 
2021-12-20T18:19:05Z I! Loaded outputs: influxdb
2021-12-20T18:19:05Z I! Tags enabled: host=ae8379d08244
2021-12-20T18:19:05Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"ae8379d08244", Flush Interval:10s
2021-12-20T18:19:05Z W! [outputs.influxdb] When writing to [http://localhost:8086]: database "telegraf" creation failed: Post "http://localhost:8086/query": dial tcp 127.0.0.1:8086: connect: connection refused

@ptxmac
Author

ptxmac commented Dec 20, 2021

Yes, it's on a Synology-based system:

# uname -a
Linux Orion 3.10.108 #42218 SMP Mon Oct 18 19:16:10 CST 2021 x86_64 GNU/Linux synology_avoton_1517+

I'm running Docker as root.

Docker info:

# docker info
Client:
Context: default
Debug Mode: false

Server:
Containers: 19
Running: 19
Paused: 0
Stopped: 0
Images: 176
Server Version: 20.10.3
Storage Driver: aufs
Root Dir: /volume5/@docker/aufs
Backing Filesystem: extfs
Dirs: 1195
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs db fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: ea3508454ff2268c32720eb4d2fc9816d6f75f88
runc version: 31cc25f16f5eba4d0f53e35374532873744f4b31
init version: ed96d00 (expected: de40ad0)
Security Options:
apparmor
Kernel Version: 3.10.108
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67GiB
Name: Orion
ID: KUUS:FXZ5:INHY:ZSPV:ESAM:AZYG:7NH3:G5VT:S4OE:3HE4:PMHN:X2NI
Docker Root Dir: /volume5/@docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

WARNING: No kernel memory TCP limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
WARNING: No blkio weight support
WARNING: No blkio weight_device support
WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support
WARNING: the aufs storage-driver is deprecated, and will be removed in a future release.


@powersj
Contributor

powersj commented Dec 20, 2021

This looks like influxdata/telegraf#10302.

Can you grab the logs from the container as well as the Docker daemon logs?

Also, can you get a shell in the container and run stat /usr/bin/telegraf? This part of the error doesn't look right:

The value of the capability argument is not permitted for a file. Or the file is not a regular (non-symlink) file

@ptxmac
Author

ptxmac commented Dec 20, 2021

It looks like a regular file, all right:

# docker run -it --rm --entrypoint bash telegraf
root@a49fc3873eee:/# stat /usr/bin/telegraf
  File: /usr/bin/telegraf
  Size: 133312512 	Blocks: 260384     IO Block: 4096   regular file
Device: 10000dh/1048589d	Inode: 71          Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-12-16 22:46:43.000000000 +0000
Modify: 2021-12-16 22:46:43.000000000 +0000
Change: 2021-12-20 17:20:04.301263329 +0000
 Birth: -

But setcap/getcap both fail:

# getcap  /usr/bin/telegraf
Failed to get capabilities of file `/usr/bin/telegraf' (Operation not supported)

The same happens even with --privileged:

# docker run --privileged -it --rm --entrypoint bash telegraf
root@d4949d4bf9dc:/# getcap /usr/bin/telegraf
Failed to get capabilities of file `/usr/bin/telegraf' (Operation not supported)

@ptxmac
Author

ptxmac commented Dec 20, 2021

docker.log doesn't contain anything interesting:

2021-12-20T19:39:30+01:00 Orion docker[21455]: time="2021-12-20T19:39:30.013074890+01:00" level=warning msg="Could not get operating system name: Error opening /usr/lib/os-release: open /usr/lib/os-release: no such file or directory"
2021-12-20T19:39:30+01:00 Orion docker[21455]: time="2021-12-20T19:39:30.014036475+01:00" level=warning msg="Could not get operating system version: Error opening /usr/lib/os-release: open /usr/lib/os-release: no such file or directory"
2021-12-20T19:39:40+01:00 Orion docker[21455]: time="2021-12-20T19:39:40.011775403+01:00" level=warning msg="Could not get operating system name: Error opening /usr/lib/os-release: open /usr/lib/os-release: no such file or directory"
2021-12-20T19:39:40+01:00 Orion docker[21455]: time="2021-12-20T19:39:40.012653189+01:00" level=warning msg="Could not get operating system version: Error opening /usr/lib/os-release: open /usr/lib/os-release: no such file or directory"
2021-12-20T19:39:40+01:00 Orion docker[21455]: time="2021-12-20T19:39:40.488962061+01:00" level=warning msg="Failed to delete conntrack state for 172.17.0.2: invalid argument"

@powersj
Contributor

powersj commented Dec 20, 2021

Now I'm wondering whether the kernel even supports CAP_NET_BIND_SERVICE, or the other kernel options required for this to work. Can you run the following and check that each returns 0:

capsh --supports=cap_net_bind_service; echo $?
capsh --supports=cap_net_raw; echo $?
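The two checks above can be wrapped in a small loop. This is a sketch assuming a POSIX shell with capsh from libcap available (on Debian it ships in libcap2-bin); the function name check_caps is hypothetical and only reports what capsh --supports returns for each capability.

```sh
#!/bin/sh
# Hypothetical helper: report whether the running kernel supports each
# capability the telegraf image tries to set. capsh --supports exits 0
# when the named capability is supported.
check_caps() {
  if ! command -v capsh >/dev/null 2>&1; then
    echo "capsh not found (install libcap, e.g. libcap2-bin on Debian)" >&2
    return 2
  fi
  missing=0
  for cap in cap_net_bind_service cap_net_raw; do
    if capsh --supports="$cap" >/dev/null 2>&1; then
      echo "$cap: supported"
    else
      echo "$cap: NOT supported"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_caps; echo $? on the host prints one line per capability; an exit status of 0 means both are supported, 1 means at least one is missing, and 2 means capsh is not installed.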

The other thing to check is that Docker is not using aufs:

docker info
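To print only the storage driver instead of the full output, docker info accepts a Go template via --format (the field is .Driver). The helper below is a hypothetical sketch that flags aufs, the driver on which setcap failed in this thread; a non-aufs result does not by itself prove that file capabilities will work.

```sh
#!/bin/sh
# Hypothetical helper: print the Docker storage driver and flag aufs.
# Pass a driver name as $1 to skip querying the Docker daemon.
check_storage_driver() {
  driver="$1"
  if [ -z "$driver" ]; then
    driver="$(docker info --format '{{.Driver}}' 2>/dev/null)"
  fi
  case "$driver" in
    "")
      echo "could not determine storage driver" >&2
      return 2 ;;
    aufs)
      echo "aufs: setting file capabilities inside containers may fail"
      return 1 ;;
    *)
      echo "$driver: not aufs"
      return 0 ;;
  esac
}
```

On the Synology host above, check_storage_driver with no argument would be expected to report aufs, matching "Storage Driver: aufs" in the docker info output.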

@ptxmac
Author

ptxmac commented Dec 20, 2021

I posted docker info above: #561 (comment)

Unfortunately, Docker IS using aufs on DSM. Whether Synology will ever switch to a better default configuration is unknown; sadly, they have historically been very slow to fix anything Docker-related.

In more positive news, the caps are supported:

# capsh --supports=cap_net_bind_service; echo $?
0
# capsh --supports=cap_net_raw; echo $?
0

@powersj
Contributor

powersj commented Dec 20, 2021

Ah sorry I didn't see the drop down in that comment!

At this point I think this means that setting these capabilities, while supported by the kernel, fails because of aufs. I'm thinking we can at least wrap the setcap call with an error message if it fails to run.

@ptxmac
Author

ptxmac commented Dec 21, 2021

> I'm thinking we can at least wrap the setcap call with an error message if it fails to run.

I think that would be a fine solution. The capabilities are only needed for sending ping packets while not running as root, right?

@powersj
Contributor

powersj commented Dec 22, 2021

> I think that would be a fine solution. The capabilities are only needed for sending ping packets while not running as root, right?

One capability (CAP_NET_RAW) allows sending ping packets, and the other (CAP_NET_BIND_SERVICE) allows binding to privileged ports (below 1024).

I have a PR up with the fix, but I'd like someone to review and bikeshed the error message with me.
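A minimal sketch of the non-fatal approach, assuming the entrypoint runs under set -e, where a bare failing setcap aborts the whole script. This is not the code from #562; the function name set_caps_or_warn and the warning text are hypothetical. Testing setcap inside an if keeps set -e from treating its failure as fatal.

```sh
#!/bin/sh
# Hypothetical helper (not the actual #562 change): try to grant telegraf
# CAP_NET_RAW (ping) and CAP_NET_BIND_SERVICE (ports < 1024), but only warn
# if the filesystem cannot store file capabilities (e.g. aufs or a
# --read-only root filesystem).
set_caps_or_warn() {
  bin="${1:-/usr/bin/telegraf}"
  # A command used as an if condition does not trigger set -e on failure.
  if ! setcap cap_net_raw,cap_net_bind_service+ep "$bin"; then
    echo "Could not set capabilities on $bin; plugins that need raw sockets or privileged ports may not work" >&2
  fi
  return 0
}
```

An entrypoint would call set_caps_or_warn /usr/bin/telegraf before dropping root privileges; telegraf then still starts on aufs or a read-only root, and only the ping-style plugins or privileged-port listeners may be affected when running as the unprivileged user.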

@WS-Dave

WS-Dave commented Dec 22, 2021

I had what may be a similar issue using a newer version of telegraf with Docker on a Synology DiskStation.

As suggested in influxdata/telegraf#10302, I added the user flag
-u "telegraf"
to the docker run command that creates the container,
and It Just Works.

Hope this helps!

@Foxlik

Foxlik commented Dec 31, 2021

I'm running into the same issue because I use

      --read-only                      Mount the container's root filesystem as read only

and I provide the needed capabilities via

      --cap-add list                   Add Linux capabilities

Can we introduce an environment variable that controls whether setcap runs?
Or maybe something like:

    setcap cap_net_raw,cap_net_bind_service+ep /usr/bin/telegraf || echo "^^ Could not set capabilities, some networking plugins may not work as expected." > /dev/stderr

@powersj
Contributor

powersj commented Jan 4, 2022

@Foxlik please see #562 for not outright failing. I would like to avoid creating/adding any environment variables for now.

@Foxlik

Foxlik commented Jan 4, 2022

@powersj Great! Works for me. I just did not dig deep enough to find that PR. Thanks.


5 participants