Conversation

@swiatekm
Contributor

@swiatekm swiatekm commented Nov 15, 2025

What does this PR do?

It ensures that the prometheus metrics input we use to monitor the otel collector always runs in the otel collector itself. This is required because the input relies on an environment variable that we inject only into the otel collector process, not into the metricbeat process.
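To make the environment variable constraint concrete, here is a minimal sketch of per-process environment injection. It is not the agent's actual code: the variable name `EXAMPLE_COLLECTOR_METRICS_ENDPOINT`, the endpoint value, and the helpers `childEnv` and `lookup` are all hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// monitoringEnvKey is a hypothetical variable name used only in this sketch.
const monitoringEnvKey = "EXAMPLE_COLLECTOR_METRICS_ENDPOINT"

// childEnv builds the environment for a child process. Only the collector
// process receives the extra variable; every other process gets a copy of
// the base environment unchanged.
func childEnv(base []string, isCollector bool, endpoint string) []string {
	env := append([]string{}, base...) // copy so the caller's slice is untouched
	if isCollector {
		env = append(env, monitoringEnvKey+"="+endpoint)
	}
	return env
}

// lookup returns the value of key in a slice of KEY=VALUE pairs.
func lookup(env []string, key string) (string, bool) {
	for _, kv := range env {
		if k, v, ok := strings.Cut(kv, "="); ok && k == key {
			return v, true
		}
	}
	return "", false
}

func main() {
	base := []string{"PATH=/usr/bin"}
	endpoint := "http://localhost:8888/metrics" // placeholder value

	collector := childEnv(base, true, endpoint)
	beat := childEnv(base, false, endpoint)

	_, inCollector := lookup(collector, monitoringEnvKey)
	_, inBeat := lookup(beat, monitoringEnvKey)
	fmt.Println("collector sees variable:", inCollector)
	fmt.Println("beat sees variable:", inBeat)
}
```

An input that reads the variable at startup would find it in the first environment and not in the second, which is why it cannot run correctly outside the collector.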

The fix is quite hacky, but I think this is acceptable for two reasons:

  • We're redoing how this self-monitoring works in 9.3, and the problem will disappear.
  • This is a symptom of a broader issue: the self-monitoring config generation doesn't know which runtime a component will actually run in, only which one we want it to run in. The fix for that is more involved and probably shouldn't go into a patch release.
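The gap between the desired and the actual runtime can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the code in this PR: the `Runtime` type, the `plan` function, and the second component ID are hypothetical, every component is assumed to want the otel runtime, and the only input is whether the monitoring output is supported there.

```go
package main

import "fmt"

// Runtime names the process a component runs in. Illustrative type only.
type Runtime string

const (
	RuntimeOtel    Runtime = "otel"
	RuntimeProcess Runtime = "process"
)

// collectorMonitoringID is the component name mentioned in the testing notes
// of this PR; its use as a lookup key here is an assumption of the sketch.
const collectorMonitoringID = "prometheus/metrics-monitoring"

// plan decides where a monitoring component runs. It returns ok == false
// when the component should be skipped entirely.
func plan(componentID string, outputSupportsOtel bool) (rt Runtime, ok bool) {
	if outputSupportsOtel {
		return RuntimeOtel, true
	}
	if componentID == collectorMonitoringID {
		// Falling back would put this input in a beat process, where the
		// environment variable it needs is absent, so skip it instead.
		return "", false
	}
	return RuntimeProcess, true
}

func main() {
	ids := []string{collectorMonitoringID, "example/other-monitoring"}
	for _, id := range ids {
		for _, supported := range []bool{true, false} {
			rt, ok := plan(id, supported)
			fmt.Printf("%s outputSupportsOtel=%v -> runtime=%q run=%v\n", id, supported, rt, ok)
		}
	}
}
```

Special-casing a single component ID is the kind of shortcut the description calls hacky; a broader fix would pass the resolved runtime into config generation rather than guessing it.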

Why is it important?

If a user has a monitoring output which isn't supported by the otel runtime, their agent will become unhealthy for no good reason.

Checklist

  • [x] I have read and understood the pull request guidelines of this project.
  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [x] I have added an entry in ./changelog/fragments using the changelog tool
  • [x] I have added an integration test or an E2E test

How to test this PR locally

Build the agent locally and run it with the following configuration. It falls back to the process runtime because the indices setting on the elasticsearch output isn't supported by the otel runtime:

agent:
  logging:
    to_stderr: true
  monitoring:
    enabled: true
inputs:
- data_stream:
    namespace: default
  id: unique-system-metrics-input
  streams:
  - data_stream:
      dataset: system.cpu
    metricsets:
    - cpu
  type: system/metrics
  use_output: default
outputs:
  default:
    username: elastic
    password: xxx
    hosts:
    - 127.0.0.1:9200
    type: elasticsearch
    indices: []

You should see a log line about the prometheus/metrics-monitoring output being skipped, and it shouldn't show up in status.

Related issues

@mergify
Contributor

mergify bot commented Nov 15, 2025

This pull request does not have a backport label. Could you fix it @swiatekm? 🙏
To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-\d.\d is the label that automatically backports to the 8.\d branch, where \d is a digit.
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@swiatekm swiatekm added the backport-9.2 label Nov 15, 2025
@swiatekm swiatekm force-pushed the fix/prometheus-self-monitoring-output branch from 7070c29 to 8bf8c48 Compare November 17, 2025 12:26
@swiatekm swiatekm added the bug and Team:Elastic-Agent-Control-Plane labels Nov 17, 2025
@swiatekm swiatekm marked this pull request as ready for review November 17, 2025 12:37
@swiatekm swiatekm requested a review from a team as a code owner November 17, 2025 12:37
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@swiatekm swiatekm force-pushed the fix/prometheus-self-monitoring-output branch 2 times, most recently from e65d38b to d52cfcd Compare November 17, 2025 18:26
@swiatekm swiatekm force-pushed the fix/prometheus-self-monitoring-output branch from d52cfcd to 0c87598 Compare November 19, 2025 14:04
@swiatekm swiatekm requested a review from cmacknz November 19, 2025 14:05
Member

@pchila pchila left a comment

Couple of questions, nothing major

@elasticmachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

History

cc @swiatekm

@swiatekm swiatekm requested a review from pchila November 19, 2025 17:15
@swiatekm swiatekm added the backport-8.19 label Nov 19, 2025
@swiatekm swiatekm merged commit a9f4420 into main Nov 20, 2025
23 checks passed
@swiatekm swiatekm deleted the fix/prometheus-self-monitoring-output branch November 20, 2025 11:00
mergify bot pushed a commit that referenced this pull request Nov 20, 2025
…1204)

* Ensure monitoring the Otel collector never runs in a beat process

* Add changelog entry

* Move log lines to constants

(cherry picked from commit a9f4420)
swiatekm added a commit that referenced this pull request Nov 20, 2025
…1204)

* Ensure monitoring the Otel collector never runs in a beat process

* Add changelog entry

* Move log lines to constants

(cherry picked from commit a9f4420)
swiatekm added a commit that referenced this pull request Nov 20, 2025
…1204) (#11283)

* Ensure monitoring the Otel collector never runs in a beat process

* Add changelog entry

* Move log lines to constants

(cherry picked from commit a9f4420)

Co-authored-by: Mikołaj Świątek <mail@mikolajswiatek.com>
swiatekm added a commit that referenced this pull request Nov 21, 2025
…1204) (#11284)

* Ensure monitoring the Otel collector never runs in a beat process

* Add changelog entry

* Move log lines to constants

(cherry picked from commit a9f4420)

Co-authored-by: Mikołaj Świątek <mail@mikolajswiatek.com>
hayotbisonai pushed a commit to hayotbisonai/elastic-agent that referenced this pull request Nov 23, 2025
…astic#11204)

* Ensure monitoring the Otel collector never runs in a beat process

* Add changelog entry

* Move log lines to constants
swiatekm added a commit that referenced this pull request Nov 24, 2025
…1204)

* Ensure monitoring the Otel collector never runs in a beat process

* Add changelog entry

* Move log lines to constants
Labels

  • backport-8.19: Automated backport to the 8.19 branch
  • backport-9.2: Automated backport to the 9.2 branch
  • bug: Something isn't working
  • Team:Elastic-Agent-Control-Plane: Label for the Agent Control Plane team

Projects

None yet

5 participants