Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[elastic-agent] Evaluate whether root is the correct user to run the elastic-agent docker image as #27648

Closed
andrewvc opened this issue Aug 30, 2021 · 11 comments
Labels
docs enhancement Team:Elastic-Agent Label for the Agent team Team:obs-ds-hosted-services Label for the Observability Hosted Services team v7.16.0

Comments

@andrewvc
Copy link
Contributor

andrewvc commented Aug 30, 2021

There's currently some confusion about which user to run dockerized `elastic-agent containers as. This has popped up in elastic/cloud-on-k8s#4794

We currently do create an elastic-agent user in the Dockerfile for agent, but since the docker agent isn't documented we have no established best practice here.

We also have two beats in conflict, Heartbeat does not allow script based monitors to run as the root user. This has created confusion for users who do want to run agent as a root user as in elastic/cloud-on-k8s#4794 . It seems that the main rationale for root is that metricbeat requires root for hostpath volumes, which are recommended against for security reasons. There may be other reasons for root I'm not aware.

So, we need to resolve these problems:

  1. We need clear advice, either we tell users to run as root or not. If we do prefer root there's no need for the elastic-agent user.
  2. We need to ensure that the choose made in 1. is secure. Since container root is not a 'real' root, AFAICT there's not a huge risk in running as UID 0 in a docker container. Esp. if the container daemon is set to run as a regular user (which is the best defense).
  3. If running as non-root we have issues with setcap, as covered in [elastic-agent][heartbeat] Heartbeat binary should have setcap privs for ICMP ping #27651

Would appreciate thoughts from @ruflin @blakerouse and others.

@andrewvc andrewvc added enhancement docs Team:obs-ds-hosted-services Label for the Observability Hosted Services team [zube]: Investigate Team:Elastic-Agent Label for the Agent team labels Aug 30, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/agent (Team:Agent)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/uptime (Team:Uptime)

@andrewvc andrewvc changed the title [elastic-agent] Stop recommending docker container be run as root [elastic-agent] Evaluate whether root is the correct user to run the elastic-agent docker image as Aug 30, 2021
@joshbressers
Copy link

It is considered a best practice to not run as root. Google has a nice blog explaining this
https://cloud.google.com/architecture/best-practices-for-operating-containers#avoid_running_as_root

@andrewvc
Copy link
Contributor Author

@joshbressers makes sense, +1 on running as non-root.

I've also opened this companion issue to cover the issues we have using linux capabilities in agent: #27651

@blakerouse
Copy link
Contributor

I think we need to ensure that running as non-root allows access to all the same contents the Elastic Agent needs. If the hostfs is bind mounted into the container and its not running as root will it still be able to collect all the needed system logs and metrics? How will this work once we can run Endpoint Security in the container? Will Elastic Agent be able to spawn that not being root? Can we still inject eBPF without being root?

There is a lot of open questions that I think we need product to be involved into before we can just drop root.

@andrewvc
Copy link
Contributor Author

According to this relatively this LWN article only operations on sockets can be done in an unprivileged manner.

That said, there's some middle ground here. We could run the agent as root, but specify else where which user each beat runs as then run the binary as the less privileged elastic-agent user when we execve to beats that don't need that privilege.

@joshbressers
Copy link

@blakerouse ACK, it's a journey that often takes a long time.

FWIW, there is a CAP_BPF capability. I admit I've never used it though

@andrewvc
Copy link
Contributor Author

More detail on CAP_BPF is available here . Does this satisfy the needs of metricbeat @blakerouse ?

@blakerouse
Copy link
Contributor

We currently do not have support for running Endpoint in the container at the moment. It's just ensuring that we do not switch to one way to then need to switch it back later. That is why I think it's important to get product management involved so that we don't rush to a decision and change it, without knowing the future vision of the project and what permissions we will need.

I would prefer that Elastic Agent does not run as root in a container as well, but we need to know the side-effects of that as of today and in the future. Elastic Agent does run as root on an installed host, and that cannot change (because of Endpoint). So keeping them at the same level in containers is also another idea instead of having 2 levels of permissions (1 for installed on a host vs inside of a container).

The CAP_BPF might work in the future for Endpoint once its starts protection of container systems like Kubernetes, but again that is yet to be known.

As for metricbeat now, I think we would need to just try it and see if running as not root with a bind mounted hostfs will work. That is something that I do not know enough about to give enough context to give a conclusion.

@ruflin
Copy link
Member

ruflin commented Sep 1, 2021

In general I like the idea of not having to run as root. If I remember correctly, the root part is likely coming from the early days of Beats and we took it over to Elastic Agent. My biggest concern is that we don't know yet in detail what will break and if anything will break without having root and we need to investigate this before making any decision. @andrewkroh You might have some more insights here?

@andrewvc
Copy link
Contributor Author

andrewvc commented Sep 14, 2021

Closing in favor of elastic/elastic-agent#147 which has more detail

@zube zube bot added [zube]: Done and removed [zube]: Ready labels Sep 14, 2021
andrewvc added a commit that referenced this issue Oct 13, 2021
…#27878)

partial fix for #27648 , this PR:

Detects if the user is running as root then:
Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present
Attempts to Setuid , Setgid and Setgroups to that user / groups
Invokes setcap to drop all privileges except NET_RAW+ep
This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that.

With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root.
mergify bot pushed a commit that referenced this issue Oct 13, 2021
…#27878)

partial fix for #27648 , this PR:

Detects if the user is running as root then:
Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present
Attempts to Setuid , Setgid and Setgroups to that user / groups
Invokes setcap to drop all privileges except NET_RAW+ep
This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that.

With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root.

(cherry picked from commit a78a980)

# Conflicts:
#	NOTICE.txt
#	dev-tools/packaging/packages.yml
#	go.mod
newly12 pushed a commit to newly12/beats that referenced this issue Oct 13, 2021
…elastic#27878)

partial fix for elastic#27648 , this PR:

Detects if the user is running as root then:
Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present
Attempts to Setuid , Setgid and Setgroups to that user / groups
Invokes setcap to drop all privileges except NET_RAW+ep
This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that.

With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root.
andrewvc pushed a commit that referenced this issue Oct 14, 2021
…abilities when possible (#28377)

* [Heartbeat] Setuid to regular user / lower capabilities when possible (#27878)

partial fix for #27648 , this PR:

Detects if the user is running as root then:
Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present
Attempts to Setuid , Setgid and Setgroups to that user / groups
Invokes setcap to drop all privileges except NET_RAW+ep
This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that.

With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root.
fearful-symmetry pushed a commit that referenced this issue Oct 20, 2021
* singleton sysinfo host to avoid frequently collecting host info

* add Host object to Stats object

* update changelog

* set procStats.host to nil if any error calling sysinfo.Host()

* Update aws-lambda-go library version to 1.13.3 (#28236)

* [cloud][docker] use the private docker namespace (#28286)

* [7.x] [DOCS] Update api_key example on elasticsearch output (#28288)

* packetbeat/protos/dns: don't render missing A and AAAA addresses from truncated records (#28297)

* seccomp: allow clone3 syscall for x86 (#28117)

clone3 is a linux syscall that is now used by glibc starting version
2.34. It is used when pthread_create() gets called. Current seccomp
filters do not allow this syscall leading to crashes like
runtime/cgo: pthread_create failed: Operation not permitted

See elastic/apm-server#6238 for more details

* Osquerybeat: Improve handling of osquery.autoload file, allow customizations (#28289)

Previously the osquery.autoload file was overwritten every time on
osquerybeat start and stamped with our extension.
After the change we check the content of the file and do not overwrite it on
each osquerybeat start. This allows the user to deploy their own
extensions if their want and start osquery with that.

* Osquerybeat: Runner and Fetcher unit tests (#28290)

* Runner and Fetcher unit tests

* Fix header formatting

* Tweak test

* Update go release version 1.17.1 (#27543)

* format of conditional build tags has changed
* matching of * in regexes was fixed, thus breaking some of our code: golang/go#46123
* iproute package was missing from the new Golang Docker image, thus, we had to add it for our tests
* go.mod file contains separate require directive for transitive dependencies

* Move labels and annotations under kubernetes.namespace. (#27917)

* Move labels and annotations under kubernetes.namespace.

* Remove GCP support from Functionbeat (#28253)

* Fix build tags for Go 1.17 (#28338)

* [Elastic Agent] Add ability to communicate with Kibana through service token (#28096)

* Add ability to communicate with Kibana through service token. Add ability to pass service token to container subcommand.

* Add changelog entry.

* Fix go fmt.

* Add username to ASA Security negotiation log (#26975)

* Add username to ASA Security negotiation log

I added the username user.name field to ASA Security negotiation log line.

* adding support for both formats

* adding changelog entry

* updating geo fields in expected output files

* reverse formatting

* reverting to older version of file

* reverting formatting again

* regenrate golden files again

* remove formatting, ready for review

* fixing missing message due to no newline

* fix dissect pattern to fit correctly

Co-authored-by: Marius Iversen <marius.iversen@elastic.co>

* x-pack/filebeat/module/cisco: loosen time parsing and add group and session type capture (#28325)

* Redis: remove deprecated fields (#28246)

* Redis: remove deprecated fields

* Disable generator tests temporarily (#28362)

* Windows/perfmon metricset -  remove deprecated perfmon.counters configuration (#28282)

* remove deprecated config

* changelog

* [Filebeat] - S3 Input - Add support for only iterating/accessing only… (#28252)

* [Filebeat] - S3 Input - Add support for only iterating/accessing only specific folders or datapaths

* Breaking change for 8.0, namespace_annotations replaced by namespace.annotations (#28230)

* Breaking change for 8.0, namespace_annotations replaced by namespace.annotations

* Take care of namespace being nil

* [Heartbeat] Setuid to regular user / lower capabilities when possible (#27878)

partial fix for #27648 , this PR:

Detects if the user is running as root then:
Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present
Attempts to Setuid , Setgid and Setgroups to that user / groups
Invokes setcap to drop all privileges except NET_RAW+ep
This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that.

With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root.

* mage fmt

Co-authored-by: kaiyan-sheng <kaiyan.sheng@elastic.co>
Co-authored-by: Victor Martinez <victormartinezrubio@gmail.com>
Co-authored-by: Ugo Sangiorgi <ugo.sangiorgi@elastic.co>
Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>
Co-authored-by: Arnaud Lefebvre <a.lefebvre@outlook.fr>
Co-authored-by: Aleksandr Maus <aleksandr.maus@elastic.co>
Co-authored-by: apmmachine <58790750+apmmachine@users.noreply.github.com>
Co-authored-by: Michael Katsoulis <michaelkatsoulis88@gmail.com>
Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
Co-authored-by: LaZyDK <dennisperto@gmail.com>
Co-authored-by: Marius Iversen <marius.iversen@elastic.co>
Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
Co-authored-by: Mariana Dima <mariana@elastic.co>
Co-authored-by: Andrew Cholakian <andrew@andrewvc.com>
Icedroid pushed a commit to Icedroid/beats that referenced this issue Nov 1, 2021
…elastic#27878)

partial fix for elastic#27648 , this PR:

Detects if the user is running as root then:
Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present
Attempts to Setuid , Setgid and Setgroups to that user / groups
Invokes setcap to drop all privileges except NET_RAW+ep
This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that.

With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root.
Icedroid pushed a commit to Icedroid/beats that referenced this issue Nov 1, 2021
* singleton sysinfo host to avoid frequently collecting host info

* add Host object to Stats object

* update changelog

* set procStats.host to nil if any error calling sysinfo.Host()

* Update aws-lambda-go library version to 1.13.3 (elastic#28236)

* [cloud][docker] use the private docker namespace (elastic#28286)

* [7.x] [DOCS] Update api_key example on elasticsearch output (elastic#28288)

* packetbeat/protos/dns: don't render missing A and AAAA addresses from truncated records (elastic#28297)

* seccomp: allow clone3 syscall for x86 (elastic#28117)

clone3 is a linux syscall that is now used by glibc starting version
2.34. It is used when pthread_create() gets called. Current seccomp
filters do not allow this syscall leading to crashes like
runtime/cgo: pthread_create failed: Operation not permitted

See elastic/apm-server#6238 for more details

* Osquerybeat: Improve handling of osquery.autoload file, allow customizations (elastic#28289)

Previously the osquery.autoload file was overwritten every time on
osquerybeat start and stamped with our extension.
After the change we check the content of the file and do not overwrite it on
each osquerybeat start. This allows the user to deploy their own
extensions if their want and start osquery with that.

* Osquerybeat: Runner and Fetcher unit tests (elastic#28290)

* Runner and Fetcher unit tests

* Fix header formatting

* Tweak test

* Update go release version 1.17.1 (elastic#27543)

* format of conditional build tags has changed
* matching of * in regexes was fixed, thus breaking some of our code: golang/go#46123
* iproute package was missing from the new Golang Docker image, thus, we had to add it for our tests
* go.mod file contains separate require directive for transitive dependencies

* Move labels and annotations under kubernetes.namespace. (elastic#27917)

* Move labels and annotations under kubernetes.namespace.

* Remove GCP support from Functionbeat (elastic#28253)

* Fix build tags for Go 1.17 (elastic#28338)

* [Elastic Agent] Add ability to communicate with Kibana through service token (elastic#28096)

* Add ability to communicate with Kibana through service token. Add ability to pass service token to container subcommand.

* Add changelog entry.

* Fix go fmt.

* Add username to ASA Security negotiation log (elastic#26975)

* Add username to ASA Security negotiation log

I added the username user.name field to ASA Security negotiation log line.

* adding support for both formats

* adding changelog entry

* updating geo fields in expected output files

* reverse formatting

* reverting to older version of file

* reverting formatting again

* regenrate golden files again

* remove formatting, ready for review

* fixing missing message due to no newline

* fix dissect pattern to fit correctly

Co-authored-by: Marius Iversen <marius.iversen@elastic.co>

* x-pack/filebeat/module/cisco: loosen time parsing and add group and session type capture (elastic#28325)

* Redis: remove deprecated fields (elastic#28246)

* Redis: remove deprecated fields

* Disable generator tests temporarily (elastic#28362)

* Windows/perfmon metricset -  remove deprecated perfmon.counters configuration (elastic#28282)

* remove deprecated config

* changelog

* [Filebeat] - S3 Input - Add support for only iterating/accessing only… (elastic#28252)

* [Filebeat] - S3 Input - Add support for only iterating/accessing only specific folders or datapaths

* Breaking change for 8.0, namespace_annotations replaced by namespace.annotations (elastic#28230)

* Breaking change for 8.0, namespace_annotations replaced by namespace.annotations

* Take care of namespace being nil

* [Heartbeat] Setuid to regular user / lower capabilities when possible (elastic#27878)

partial fix for elastic#27648 , this PR:

Detects if the user is running as root then:
Checks to see if an environment variable BEAT_SETUID_AS (set in our Docker.tmpl) is present
Attempts to Setuid , Setgid and Setgroups to that user / groups
Invokes setcap to drop all privileges except NET_RAW+ep
This PR also fixes the broken syscall filtering in heartbeat, some non-syscall strings were breaking that.

With the changes here elastic-agent can still run as root, but the subprocesses can lower their privileges ASAP. This should also make it possible for heartbeat to safely run ICMP pings and synthetics. Synthetics must run as non-root, but ICMP requires NET_RAW. This lets us be consistent in our docs with the recommendation that elastic-agent run as root.

* mage fmt

Co-authored-by: kaiyan-sheng <kaiyan.sheng@elastic.co>
Co-authored-by: Victor Martinez <victormartinezrubio@gmail.com>
Co-authored-by: Ugo Sangiorgi <ugo.sangiorgi@elastic.co>
Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>
Co-authored-by: Arnaud Lefebvre <a.lefebvre@outlook.fr>
Co-authored-by: Aleksandr Maus <aleksandr.maus@elastic.co>
Co-authored-by: apmmachine <58790750+apmmachine@users.noreply.github.com>
Co-authored-by: Michael Katsoulis <michaelkatsoulis88@gmail.com>
Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
Co-authored-by: LaZyDK <dennisperto@gmail.com>
Co-authored-by: Marius Iversen <marius.iversen@elastic.co>
Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
Co-authored-by: Mariana Dima <mariana@elastic.co>
Co-authored-by: Andrew Cholakian <andrew@andrewvc.com>
@zube zube bot removed the [zube]: Done label Dec 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs enhancement Team:Elastic-Agent Label for the Agent team Team:obs-ds-hosted-services Label for the Observability Hosted Services team v7.16.0
Projects
None yet
Development

No branches or pull requests

6 participants