Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review prometheusexecreceiver from security perspective #6722

Closed
tigrannajaryan opened this issue Dec 13, 2021 · 16 comments
Closed

Review prometheusexecreceiver from security perspective #6722

tigrannajaryan opened this issue Dec 13, 2021 · 16 comments

Comments

@tigrannajaryan
Copy link
Member

prometheusexecreceiver allows executing prometheus exporters and accepts the command line to execute.

This is potentially a security problem, especially coupled with upcoming remote configuration capabilities. We need to make sure the Collector cannot be compelled to execute arbitrary code.

@tigrannajaryan
Copy link
Member Author

@open-telemetry/collector-contrib-approvers @keitwb I think we have a dangerous functionality in prometheusexecreceiver. It allows executing arbitrary commands with arbitrary parameters.

Some possible ways we can reduce the risks:

  • Disallow executing arbitrary commands. Limit them to a certain directory location and/or filename pattern, which can match only Prometheus exporters.
  • Disallow passing command line parameters. Is this necessary for this receiver? Do exporters need parameters?

A more radical approach if we do not think the above is sufficient:

  • Remove the component from the default build. If someone needs it they can custom-build a Collector and take the responsibility for the consequences.

I would also like to hear from the users of this feature.

@jpkrohling
Copy link
Member

Same as the fluentbit: I'll review and provide my feedback soon.

@keitwb
Copy link
Contributor

keitwb commented Dec 13, 2021

Another possibility is to make these receivers work when the config is hardcoded into the config file used directly by the collector, but disable them when there is any kind of remote config going on. Not sure how easy this would be to distinguish but it would mitigate concerns for potential remote config exploits. Presumably if somebody can alter the config file on disk directly, they probably have nearly the same level of access they would need to execute arbitrary binaries anyway.

@Aneurysm9
Copy link
Member

I'm not sure that we can distinguish between configuration sources at an individual configuration item level. Even if we could, I'm not sure that it is helpful given that network filesystems exist. As for whether the ability to modify the configuration is equivalent to the ability to run executables, I'm not sure that is the case. In the event that the collector is running as a user with elevated privileges but the configuration can be modified without those privileges, this could lead to privilege escalation. Would that require some level of misconfiguration on the part of the operator? Probably, but I think it is a scenario that can't be discounted when considering the risk posed by this capability.

@bertysentry
Copy link
Contributor

I understand the security concerns with remote config, but this remote config feature (as I understand it) is not present yet.

WRT the security issue of having a file that defines commands to be executed, I fail to see how this is any different than any shell script, which is literally a text file containing commands to be executed. It's the sysadmin's responsibility to make sure these files cannot be edited by non-authorized people. Now, we would need to make that extremely clear in the documentation of prometheusexecreceiver, fluentbitextension and the jmx receiver.

Note that if a user has the rights to edit the config file, it may also have the rights to edit the collector's executable, and make it run anything possible. Yes, that will require some x64 assembly knowledge, but that's definitely feasible.

A local file (Otel's config file in our case) can be considered trustworthy, while any remote input data (anything received by the oltp-grpc receiver for example) must be considered as a potential threat. That's where you guys need to draw the line, imho.

Last note: to help sysadmins enforce security best practices, it may be helpful to ship the collector as an installable package, with "traditional" directories:

  • ./otel-collector/bin/otel-collector (the executable)
  • ./otel-collector/config/otel-config.yaml (the default config)
  • ./otel-collector/LICENSE
  • etc.

This will help avoid cases of users running /tmp/download/otel-collector --config=/tmp/test.yaml... 😉

@jpkrohling
Copy link
Member

WRT the security issue of having a file that defines commands to be executed

My focus on the review is to make sure that users cannot trick the collector into running untrusted code. Components that spawn processes are intrinsically more prone to errors in this area.

@tigrannajaryan
Copy link
Member Author

I understand the security concerns with remote config, but this remote config feature (as I understand it) is not present yet.

@bertysentry This is currently work in progress, with 2 separate initiatives likely bringing some sort of ability to get configuration from a remote source. We need to prepare for this.

WRT the security issue of having a file that defines commands to be executed, I fail to see how this is any different than any shell script, which is literally a text file containing commands to be executed. It's the sysadmin's responsibility to make sure these files cannot be edited by non-authorized people.

If you look at this issue in isolation, only as an ability to specify commands in a file, sure, it appears equivalent to a shell script. However combined with remote configuration it is a different story. If our code downloads a file containing executable instructions I believe it is our responsibility to make it safe.

A local file (Otel's config file in our case) can be considered trustworthy, while any remote input data (anything received by the oltp-grpc receiver for example) must be considered as a potential threat. That's where you guys need to draw the line, imho.

This is a good point. However, it is not clear what the solution looks like. A couple options I can think of:

  • Don't receive configuration from remote sources. Sure, this remove the threat, but users want this capability. I would prefer to find a solution that allows this feature to exits.
  • Sanitize configuration received from remote sources before it is applied. I am not sure how to do this reliably. Perhaps we have a list of components which cannot be enabled if remote configuration is used or have some other form of sanitization.

Last note: to help sysadmins enforce security best practices, it may be helpful to ship the collector as an installable package, with "traditional" directories

What does this achieve? Are you suggesting to remove --config command line option and hard-code the location of the config file?

@tigrannajaryan
Copy link
Member Author

My focus on the review is to make sure that users cannot trick the collector into running untrusted code. Components that spawn processes are intrinsically more prone to errors in this area.

@jpkrohling Are you thinking about users who have access to local configuration files of the Collector? This scenario was not part of the security threats I was thinking about. Are you assuming a user who has less privileges than the Collector process but who can edit the Collector's config file? Isn't this a very bad sysadmin practice to allow editing a config file of a privileged process? Do we need to try to protect from this?

@jpkrohling
Copy link
Member

Are you thinking about users who have access to local configuration files of the Collector?

No, I'm thinking about non-sanitized input consumed by those components.

@tigrannajaryan
Copy link
Member Author

Are you thinking about users who have access to local configuration files of the Collector?

No, I'm thinking about non-sanitized input consumed by those components.

I see. That's a different valid attack vector. I am not sure why you think externally executed processes are more prone to errors to validate/sanitize their input compared to the code that lives in the Collector process, can you clarify?

@jpkrohling
Copy link
Member

jpkrohling commented Dec 15, 2021

When using user-provided input as part of a string that goes to, say, a log body, there's no problem if it's concatenated with safe input. When loading files or executing external processes (via os.Exec), a non-sanitized input might mean that a different file will be loaded/executed than intended. It's quite possible that none of those components are mixing trusted with untrusted inputs when dealing with files and URLs, but that's what I have in mind to check.

@bertysentry
Copy link
Contributor

bertysentry commented Dec 15, 2021

If you look at this issue in isolation, only as an ability to specify commands in a file, sure, it appears equivalent to a shell script. However combined with remote configuration it is a different story. If our code downloads a file containing executable instructions I believe it is our responsibility to make it safe.

We are on the same page!

This is a good point. However, it is not clear what the solution looks like. A couple options I can think of:

  • Don't receive configuration from remote sources. Sure, this remove the threat, but users want this capability. I would prefer to find a solution that allows this feature to exits.

Personally, I think that remote configuration introduces serious potential security breaches for a monitoring/observability tool, that are not limited to trigger the execution of an arbitrary command line, like for example:

  • stop logs pipelines so that the attackers actions are not visible
  • declare a new exporter to send trace data to the attacker, to analyze the internals of an application, maybe extract credentials or tokens from span attributes
  • enable components that have known vulnerabilities (like one would enable log4j

Remote configuration must be limited to authorized users (Edit: I haven't read the design documentation for this feature, so I don't know what you guys planned for authentication and authorization Now that I've read this doc, I copied this comment over there, and I think mTLS should be strongly encouraged).

Last note: to help sysadmins enforce security best practices, it may be helpful to ship the collector as an installable package, with "traditional" directories
What does this achieve? Are you suggesting to remove --config command line option and hard-code the location of the config file?

You're right: in theory it doesn't achieve much. It's just a practical way to lower the bar to ensure security best practices: you place files where they are usually expected, with the default secured permissions (not 0777 on the default config file, e.g.). And while the --config option should be kept imo, it would be good that it's not required, in which case the collector would use ../config/otel-config.yaml for example. So that in 99% of the cases, the collector is started without any argument, thus uses the configuration file stored in a location that is secure by default, and thus avoiding malicious edits of the config file.

@gouthamve
Copy link
Member

Hi, @jpkrohling asked me to take a look at this, and I came up with two other privilege escalations that have not been covered here:

  1. If a user cannot edit the configuration file, but if they can edit the executable, then they can make the collector run arbitrary code. For example: exec: ./run_exporter.sh is a valid configuration for the receiver. If the user can edit or replace run_exporter.sh, then it is also privilege escalation.

  2. If there is a component inside the collector that writes things out to disk (like a file based log-exporter), user might be able to override the content of ./run_exporter.sh even when they don’t have the ability to change the file themselves. While it is unlikely to change the contents of an arbitrary file, operators need to be careful.

Beyond that, there are some semantic issues like If the exporter service dies, it is automatically restarted by the Otel collector. This means if there is a bug or panic in the exporter, a new process starts and some of the metrics / counters exported are reset. It is usually better to fail scrapes in most cases which would alert the operators who might fix issues.

Finally, imo, I think it should be removed from the default build as suggested by @tigrannajaryan. Managing processes should be out of scope for OTel Collector. There are already process managers like systemd or kubernetes to make sure processes are run and restarted correctly. While I understand the reason for the receiver, it has a lot of non-obvious security holes and I think we should not bundle it in the default release.

I've written a small design doc on this topic to get some clarity, might be useful: https://docs.google.com/document/d/1o-1pYl_NDVCThMDIKciUX8xtBHJFfZo0dJRqpwa-Lew/edit

@bertysentry
Copy link
Contributor

Hi, as @tigrannajaryan mentioned, if a non-privileged user can edit ./run_exporter.sh and have a privileged user execute it, it's not the fault of the OpenTelemetry Collector and prometheusexecreceiver: it's really just a poorly secured system.

Like if a non-privileged user can edit the collector binary file (with some x64 asm knowledge), we're screwed anyway ;-)

If you guys decide to drop the process execution feature altogether for security reasons, don't forget to also remove jmxreceiver and fluentdextension.

@jpkrohling
Copy link
Member

jpkrohling commented Feb 9, 2022

if a non-privileged user can edit ./run_exporter.sh and have a privileged user execute it, it's not the fault of the OpenTelemetry Collector and prometheusexecreceiver: it's really just a poorly secured system.

That's true, although a system is typically compromised by exploiting multiple pieces. The components mentioned here are highly vulnerable ones, easy to be leveraged in an attack.

I'm in favor of not having any of those components as part of any official binaries. It's easy to build a small binary that exposes an OpenMetrics endpoint or sends OTLP to a collector somewhere, getting it managed via systemd or some other orchestrator. If people need an example on how to do it, we can certainly provide it.

@jpkrohling
Copy link
Member

As discussed during the SIG meeting yesterday, the code owners are expected to present a design for making this component secure. If none is provided by 0.46.0, we'll place this component behind a feature gate, eventually removing it.

@bogdandrutu also mentioned that, for the short-term, it might make sense to add a log entry when this component is constructed, stating that this component is under security review and that it should be used with caution.

@dmitryax mentioned that he might work on this if the code owners aren't available.

codeboten pushed a commit to codeboten/opentelemetry-collector-contrib that referenced this issue Mar 31, 2022
After many discussions it seems the community is leaning towards removing the components that execute subprocesses. As such, marking the prom_exec receiver as deprecated.

Fixes open-telemetry#6722
codeboten pushed a commit to codeboten/opentelemetry-collector-contrib that referenced this issue Mar 31, 2022
After many discussions it seems the community is leaning towards removing the components that execute subprocesses. As such, marking the prom_exec receiver as deprecated.

Fixes open-telemetry#6722
codeboten pushed a commit to codeboten/opentelemetry-collector-contrib that referenced this issue Mar 31, 2022
After many discussions it seems the community is leaning towards removing the components that execute subprocesses. As such, marking the prom_exec receiver as deprecated.

Fixes open-telemetry#6722
povilasv referenced this issue in coralogix/opentelemetry-collector-contrib Dec 19, 2022
…#6722)

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
TylerHelmuth added a commit that referenced this issue Aug 3, 2023
…24740)

**Description:**
Removes the deprecated prometheus_exec receiver. This component was
[deprecated over a year
ago](#9058),
even before our now [documented `two minor
releases`](https://github.com/open-telemetry/opentelemetry-collector#deprecated)
timeline.

**Link to tracking Issue:** <Issue number if applicable>

Related to
#6722

**Testing:**
Confirmed `make otelcontribcol` works
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants