[processor/resourcedetection], [receiver/dockerstats] Collector cannot query Docker socket in official contrib images #11791

Open
mx-psi opened this issue Jun 29, 2022 · 18 comments
Labels
bug (Something isn't working), never stale (Issues marked with this label will be never staled and automatically removed), priority:p2 (Medium), processor/resourcedetection (Resource detection processor), receiver/dockerstats

Comments

@mx-psi
Member

mx-psi commented Jun 29, 2022

Describe the bug

The docker detector from the resource detection processor and the dockerstats receiver do not work on official opentelemetry-collector-contrib images, or any other image that runs the Collector under a user other than root.

Steps to reproduce

Run the resource detection processor docker detector or the dockerstats receiver, while mounting the /var/run/docker.sock socket:

docker run -v /var/run/docker.sock:/var/run/docker.sock:ro -v <mount config here> otel/opentelemetry-collector-contrib

What did you expect to see?

The Docker detector should add the host.name of the host machine, and its operating system.

The Docker stats receiver should produce valid metrics.

What did you see instead?

Both components fail because the Collector process lacks permission to access the Docker socket.

What version did you use?

Can be reproduced on the latest version, happens since v0.40.0 (more specifically, since #6380).

What config did you use?

For both components the default configuration on the README can reproduce this; see e.g. the resource detection processor:

processors:
  resourcedetection/docker:
    detectors: [env, docker]
    timeout: 2s
    override: false

Environment

This happens on every Docker version and every Collector image since v0.40.0.

Additional context

This has happened since #6380 because of a permissions issue: the mounted socket is only readable by root. AFAICT, Docker does not currently allow mounting volumes with permissions for a specific user (see moby/moby#2259), and we can't chown the socket at build time, so we have to choose between running as a non-root user and supporting these components.
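To check this on a particular host, the socket's ownership and mode can be inspected directly. The following is a minimal sketch, assuming GNU coreutils `stat` (the `stat -c` form); the function name `socket_perms` is hypothetical, and `/var/run/docker.sock` is just the path used in the reproduction steps above.

```shell
#!/bin/sh
# Print "<uid>:<gid> <octal mode>" for a path (defaults to the Docker socket).
# Assumes GNU coreutils stat; BSD/macOS stat uses different flags.
socket_perms() {
  stat -c '%u:%g %a' "${1:-/var/run/docker.sock}"
}

# Example (run on the Docker host):
# socket_perms /var/run/docker.sock
```

If the UID the Collector runs as is not the owner, and none of its groups match the printed GID, the mode bits for "other" decide whether the socket can be opened at all.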

This is not a problem on downstream or custom distributions that run as root.

To get the hostname with the Docker detector, a workaround is to override the OS hostname of the container using something like --hostname $(hostname). I don't know of a workaround for getting the host's operating system or for getting metrics from the dockerstats receiver.
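A rough sketch of the hostname workaround as a reusable shell function. The function name `run_collector_with_host_hostname` and the in-container config path `/etc/otelcol/config.yaml` are illustrative assumptions, not something the image requires; only `--hostname $(hostname)` comes from the workaround itself.

```shell
#!/bin/sh
# Start the contrib Collector with the container's OS hostname set to the
# host's hostname. The function name and the in-container config path are
# illustrative assumptions; adjust them to your setup.
run_collector_with_host_hostname() {
  config_path="$1"  # path to the Collector config on the host
  docker run \
    --hostname "$(hostname)" \
    -v "${config_path}:/etc/otelcol/config.yaml:ro" \
    otel/opentelemetry-collector-contrib \
    --config=file:/etc/otelcol/config.yaml
}

# Example:
# run_collector_with_host_hostname "$PWD/otelcol-config.yaml"
```

This only changes what the container reports as its own hostname; it does not give the Collector access to the Docker socket.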

@mx-psi mx-psi added bug Something isn't working processor/resourcedetection Resource detection processor labels Jun 29, 2022
@mx-psi
Member Author

mx-psi commented Jun 29, 2022

@open-telemetry/collector-contrib-maintainer I assume we want to keep running under a non-root user. If we don't find a solution that works when not running as root, should the docker detector be deprecated and eventually removed? This would still be useful on downstream distros that run as root, but I don't know if that is a common case.

@mx-psi
Member Author

mx-psi commented Jun 30, 2022

It is possible to override the user by doing docker run -u 0, but I don't feel very comfortable telling people to run as root if our official policy is to run as non-root.

@TylerHelmuth
Member

pinging @jrcamp @pmm-sumo @Aneurysm9 @dashpole as code owners

@TylerHelmuth TylerHelmuth added the priority:p2 Medium label Jul 1, 2022
@dashpole
Contributor

Using the docker socket is a really high level of privilege generally, and I agree with it not being the recommended configuration. Mounting individual files (e.g. /etc/hostname) seems like a better way to get some of the information you are interested in than fetching it from docker. I haven't looked into it at all, but I wonder if the system detector with a few files mounted readonly would work the same as the docker detector.

@github-actions
Contributor

github-actions bot commented Nov 9, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Nov 9, 2022
@mx-psi mx-psi removed the Stale label Nov 9, 2022
@mx-psi
Member Author

mx-psi commented Nov 9, 2022

The dockerstatsreceiver also queries the Docker socket and thus suffers from the same problem https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/dockerstatsreceiver#configuration

@ErvalhouS

> The dockerstatsreceiver also queries the Docker socket and thus suffers from the same problem https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/dockerstatsreceiver#configuration

This is also why I'm here

@mx-psi mx-psi changed the title from "[processor/resourcedetection] 'docker' detector does not work in official contrib images" to "[processor/resourcedetection], [receiver/dockerstats] Collector cannot query Docker socket in official contrib images" on Dec 28, 2022
@github-actions
Contributor

Pinging code owners for receiver/dockerstats: @rmfitzpatrick @jamesmoessis. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@mx-psi
Member Author

mx-psi commented Dec 29, 2022

I think realistically we only have two options here:

  • (Option 1) deprecate and eventually remove these components
  • (Option 2) tell people to run the Collector as root via e.g. docker run -u 0 and explain the security implications

I feel like, at least for the dockerstats receiver, option 1 would cause a lot of pain, so I would rather work on option 2.

@rmfitzpatrick
Contributor

This is a general docker concern and the container user needs to be in the host's docker group:

$ docker run -v /var/run/docker.sock:/var/run/docker.sock:ro --group-add $(stat -c '%g' /var/run/docker.sock) otel/opentelemetry-collector-contrib <...>
# or if specifying the user:group directly
$ docker run -v /var/run/docker.sock:/var/run/docker.sock:ro --user "some.user:$(stat -c '%g' /var/run/docker.sock)" otel/opentelemetry-collector-contrib <...>

@github-actions github-actions bot added the Stale label Mar 6, 2023
@dmitryax dmitryax removed the Stale label Mar 6, 2023
@gbbr
Member

gbbr commented Mar 28, 2023

I don't want to make a strong promise, but I am interested in working on this and will try out the proposal above and report back. Hopefully I can reproduce.

@github-actions github-actions bot added the Stale label May 29, 2023
@carlreid

@gbbr I see you're making a lot of nice progress relating to the dockerstats receiver, but I also just hit this permission issue when trying to mount the docker.sock. Did you manage to figure something out that works without needing to run as privileged? Especially since we'd like to take advantage of #22149.

@gbbr
Member

gbbr commented Jun 20, 2023

@carlreid on this front specifically no. I am not aware of a better method unfortunately.

@github-actions github-actions bot removed the Stale label Jun 21, 2023
@R-Sommer

In docker-compose.yml, group_add can be set to the ID of the host's docker group, e.g.:

group_add:
  - "998"

Double quotes are necessary; otherwise this error occurs:

* 'group_add[0]' expected type 'string', got unconvertible type 'int', value: '998'

Using "docker" instead of its ID results in this error:

Error response from daemon: Unable to find group docker: no matching entries in group file
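Because the group name cannot be resolved inside the container, the numeric ID has to come from the host. A minimal sketch of one way to look it up and pass it to Compose through an environment variable; the function name `docker_socket_gid` and the variable name `DOCKER_GID` are hypothetical, and the sketch assumes GNU coreutils `stat`.

```shell
#!/bin/sh
# Print the numeric group ID that owns the given path
# (defaults to the Docker socket). Assumes GNU coreutils stat.
docker_socket_gid() {
  stat -c '%g' "${1:-/var/run/docker.sock}"
}

# Hypothetical usage: export the GID so docker-compose.yml can reference it
# through variable substitution, e.g.
#   group_add:
#     - "${DOCKER_GID}"
# DOCKER_GID="$(docker_socket_gid)" docker compose up
```

Keeping the substituted value inside double quotes in the compose file preserves the string type that group_add expects.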

@github-actions github-actions bot added the Stale label Sep 25, 2023
@mx-psi mx-psi added never stale Issues marked with this label will be never staled and automatically removed and removed Stale labels Sep 25, 2023
@schewara

As already mentioned in #11791 (comment), setting the user parameter with the required <UID>:<GID> is definitely the most straightforward solution.

In my opinion, the permission issue is a common theme for all containers trying to access the Docker socket or other files on the host system, and it is actually a good thing: it forces everyone to take a step back and rethink whether the access is really needed.

I also think this is outside of the Collector's scope, as you can never foresee what the runtime environment looks like.

Consider a scenario where the Docker engine is running in rootless mode: the individual permissions of the user on the host OS will most certainly break everything all over again.

Just for completeness, here is the snippet from our compose file, which works for us without any issues.

    ...
    volumes: 
      - type: bind
        source: /path/to/otelcol-config.yaml
        target: /etc/otelcol/config.yaml
      - type: bind 
        source: /var/run/docker.sock
        target: /var/run/docker.sock
        read_only: True
    user: 10001:998
    command: ['--config=file:/etc/otelcol/config.yaml']
    ...
