
Fix missing Metricbeat events when kube-state-metrics denylists *_created and reduce ProcessMetrics CPU #51255

Merged
MichaelKatsoulis merged 13 commits into elastic:main from MichaelKatsoulis:fix/ksm-state-service-storageclass-event-anchor
Jun 22, 2026

Conversation

@MichaelKatsoulis

MichaelKatsoulis commented Jun 15, 2026

Contributor

Proposed commit message

1. Fix missing state_service and state_storageclass events on OpenShift

When kube-state-metrics is configured with --metric-denylist=.*_created
(the default on OpenShift), the state_service and state_storageclass
metricsets produce zero events, because kube_service_created and
kube_storageclass_created are the only event-anchoring metrics in their
mappings.
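A minimal Go sketch of the filtering, for illustration only. It assumes the denylist pattern must match the whole metric name (so it is wrapped in ^(?:...)$ before compiling), and the series names are examples from the kube_service_* and kube_storageclass_* families. It shows that only the *_created series are removed, while the info and spec gauges survive.

```go
package main

import (
	"fmt"
	"regexp"
)

// keptMetrics returns the metric names that survive a denylist.
// Assumption: the pattern must match the whole name, so it is anchored.
func keptMetrics(denylist string, names []string) []string {
	re := regexp.MustCompile("^(?:" + denylist + ")$")
	var kept []string
	for _, n := range names {
		if !re.MatchString(n) {
			kept = append(kept, n)
		}
	}
	return kept
}

func main() {
	names := []string{
		"kube_service_created",
		"kube_service_info",
		"kube_service_spec_type",
		"kube_storageclass_created",
		"kube_storageclass_info",
	}
	// The event-anchoring *_created series are dropped; the label-carrying gauges remain.
	fmt.Println(keptMetrics(".*_created", names))
}
```

With only the info and spec gauges left, a mapping that anchors events on *_created has nothing to build events from.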

Promote InfoMetric data to standalone events in the Prometheus helper
when no event-creating metrics produce results, so label metadata from
_info and _spec_type gauges is not silently dropped.
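The promotion step can be sketched in Go roughly as follows. This is not the helper's actual code: infoMetric, labelKey and promoteInfoMetrics are hypothetical names, and plain maps stand in for mapstr.M. The sketch groups info metrics by their identifying label set, merges the metadata of each group into one event and applies extra fields, but only when no event-creating metric produced an event.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// infoMetric is a hypothetical stand-in for one InfoMetric result: the labels
// that identify the object plus the metadata fields the info gauge carries.
type infoMetric struct {
	Labels map[string]string
	Meta   map[string]any
}

// labelKey builds a deterministic key from a label set so info metrics that
// describe the same object share one event (sketch: separators are not escaped).
func labelKey(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		b.WriteString(k + "=" + labels[k] + ",")
	}
	return b.String()
}

// promoteInfoMetrics returns events unchanged when event-creating metrics
// produced results. Only when events is empty does it build one event per
// distinct label set, merge every group's metadata, and add extraFields.
func promoteInfoMetrics(events map[string]map[string]any, infos []infoMetric, extraFields map[string]any) map[string]map[string]any {
	if len(events) > 0 {
		return events
	}
	promoted := make(map[string]map[string]any)
	for _, info := range infos {
		key := labelKey(info.Labels)
		ev, ok := promoted[key]
		if !ok {
			ev = make(map[string]any)
			for k, v := range info.Labels {
				ev[k] = v
			}
			promoted[key] = ev
		}
		for k, v := range info.Meta {
			ev[k] = v
		}
	}
	for _, ev := range promoted {
		for k, v := range extraFields {
			ev[k] = v
		}
	}
	return promoted
}

func main() {
	web := map[string]string{"namespace": "default", "service": "web"}
	infos := []infoMetric{
		{Labels: web, Meta: map[string]any{"cluster_ip": "10.0.0.1"}}, // e.g. from an _info gauge
		{Labels: web, Meta: map[string]any{"type": "ClusterIP"}},      // e.g. from a _spec_type gauge
	}
	// No *_created series survived the denylist, so events starts empty.
	events := promoteInfoMetrics(map[string]map[string]any{}, infos, nil)
	for _, ev := range events {
		fmt.Println(ev)
	}
}
```

Running it prints one merged event for the web service. Passing a non-empty events map returns it unchanged, matching the stated intent that promotion only happens when nothing else created events.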

2. Reduce CPU usage in ProcessMetrics InfoMetric merging

mapstr.Flatten() was called inside the inner events loop for every
infoMetric, re-allocating and re-traversing the same map on every
iteration. Move the Flatten() call out of the inner events loop so it
runs once per infoMetric instead of once per (infoMetric, event) pair.
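A simplified Go sketch of the hoisting. flatten is a hypothetical stand-in for the helper's Flatten() call, instrumented with a call counter, and the label matching the real helper performs before merging into an event is omitted. With 2 info metrics and 3 events, the old loop shape calls flatten 6 times and the hoisted shape calls it twice.

```go
package main

import "fmt"

// flattenCalls counts flatten invocations to make the saving visible.
var flattenCalls int

// flatten is a hypothetical stand-in for Flatten(): nested maps become dotted keys.
func flatten(m map[string]any) map[string]any {
	flattenCalls++
	out := make(map[string]any)
	flattenInto("", m, out)
	return out
}

func flattenInto(prefix string, m map[string]any, out map[string]any) {
	for k, v := range m {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		if nested, ok := v.(map[string]any); ok {
			flattenInto(key, nested, out)
			continue
		}
		out[key] = v
	}
}

// mergeBefore re-flattens the same info labels once per event.
func mergeBefore(infos, events []map[string]any) {
	for _, info := range infos {
		for _, ev := range events {
			for k, v := range flatten(info) {
				ev[k] = v
			}
		}
	}
}

// mergeAfter flattens each info map once, outside the inner events loop.
func mergeAfter(infos, events []map[string]any) {
	for _, info := range infos {
		flat := flatten(info)
		for _, ev := range events {
			for k, v := range flat {
				ev[k] = v
			}
		}
	}
}

func main() {
	infos := []map[string]any{
		{"labels": map[string]any{"pod": "a"}},
		{"labels": map[string]any{"node": "n1"}},
	}
	events := []map[string]any{{}, {}, {}}

	flattenCalls = 0
	mergeBefore(infos, events)
	fmt.Println("before:", flattenCalls) // 2 infos x 3 events = 6

	flattenCalls = 0
	mergeAfter(infos, events)
	fmt.Println("after:", flattenCalls) // once per info = 2
}
```

The merged fields are identical in both shapes; only the number of allocations and traversals changes, which is where the benchmark gain comes from.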

Benchmark results (state_container-like workload):

| Containers | Before (ns/op) | After (ns/op) | Improvement |
| --- | --- | --- | --- |
| 100 | 2,353,000 | 1,481,000 | 37% faster |
| 500 | 42,038,000 | 18,314,000 | 56% faster |
| 1000 | 168,472,000 | 65,860,000 | 61% faster |
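The Improvement column is the relative reduction in time per operation, (before - after) / before. A small Go check that recomputes it from the ns/op figures above, rounded to whole percent:

```go
package main

import (
	"fmt"
	"math"
)

// improvement returns the percentage reduction in ns/op, rounded to the nearest whole percent.
func improvement(before, after float64) int {
	return int(math.Round((before - after) / before * 100))
}

func main() {
	rows := []struct {
		containers    int
		before, after float64
	}{
		{100, 2_353_000, 1_481_000},
		{500, 42_038_000, 18_314_000},
		{1000, 168_472_000, 65_860_000},
	}
	for _, r := range rows {
		fmt.Printf("%d containers: %d%% faster (%.2fx speedup)\n",
			r.containers, improvement(r.before, r.after), r.before/r.after)
	}
}
```

The growing gap with container count is consistent with removing work that scaled with infoMetric × event.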

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works.
  • I have added an entry in ./changelog/fragments using the changelog tool.

Related issues

MichaelKatsoulis requested a review from a team as a code owner June 15, 2026 10:26
botelastic Bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jun 15, 2026
MichaelKatsoulis marked this pull request as draft June 15, 2026 10:26
@github-actions

Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • /test : Run the Buildkite pipeline.

MichaelKatsoulis added the Team:obs-ds-hosted-services Label for the Observability Hosted Services team label Jun 15, 2026
botelastic Bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jun 15, 2026
@mergify

mergify Bot commented Jun 15, 2026

Contributor

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @MichaelKatsoulis? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8.\d is the label to automatically backport to the 8.\d branch. \d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Contributor


Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 5f701ec0-ae9d-4a0f-a474-1087b0466c47

📥 Commits

Reviewing files that changed from the base of the PR and between ad9ef20 and 10e3514.

📒 Files selected for processing (1)
  • metricbeat/helper/prometheus/prometheus.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • metricbeat/helper/prometheus/prometheus.go

📝 Walkthrough


ProcessMetrics in metricbeat/helper/prometheus/prometheus.go optimizes info-label flattening by precomputing info.Labels.Flatten() once per info item instead of repeatedly in the event loop. It then adds a conditional block: when the events map is empty after processing all metric families, accumulated infoMetrics are promoted to standalone events by grouping and merging metadata for matching label sets and applying mapping.ExtraFields. Two Prometheus text fixtures and a new table-driven test TestInfoMetricPromotionWhenNoEventsCreated verify promotion occurs only when no other metrics produce events and that promoted events omit the created field. A performance helper and benchmark exercise the promotion path, and a changelog fragment documents the bug fix.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | PR addresses both objectives from issue #34074: promoting InfoMetric data to standalone events when event-creating metrics are filtered out [#34074], and optimizing ProcessMetrics performance by moving Flatten() outside inner loops [#34074]. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the stated objectives: changelog fragment for bug documentation, Prometheus helper modifications for InfoMetric promotion and Flatten() optimization, and comprehensive test coverage for the new behavior. |


MichaelKatsoulis added the backport-active-all Automated backport with mergify to all the active branches label Jun 16, 2026
MichaelKatsoulis marked this pull request as ready for review June 16, 2026 07:56
@infra-vault-gh-plugin-prod


Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

MichaelKatsoulis requested a review from gizas June 16, 2026 13:06
pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Jun 16, 2026
@infra-vault-gh-plugin-prod


Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

pierrehilbert requested a review from orestisfl June 17, 2026 06:44
MichaelKatsoulis changed the title from "Fix missing Metricbeat events when kube-state-metrics denylists *_created" to "Fix missing Metricbeat events when kube-state-metrics denylists *_created and reduce ProcessMetrics CPU" Jun 18, 2026
@MichaelKatsoulis

Contributor Author

@orestisfl Can I have a review on this one?

Comment thread metricbeat/helper/prometheus/prometheus.go
Comment thread metricbeat/helper/prometheus/prometheus.go
@github-actions

Contributor

TL;DR

The failing Buildkite step only exposes Error: failed modules: cockroachdb; the captured log does not include the CockroachDB test failure body. I don’t see evidence that this PR’s ProcessMetrics change caused it: the CockroachDB status metricset is wired through the Prometheus collector metricset, not the helper mapping path changed in this PR.

Remediation

  • Retry the failed :ubuntu: x-pack/metricbeat: Go Integration Tests (Module) job, or inspect the uploaded x-pack/metricbeat/build/TEST-go-integration-cockroachdb.out.json / .xml artifacts for the actual TestFetch failure.
  • If it reproduces, run the narrow integration test with verbose output: cd x-pack/metricbeat && go test -v -tags=oracle,integration ./module/cockroachdb/status -run TestFetch.
Investigation details

Root Cause

Inconclusive from the available log data. The prefetched Buildkite log ends with only the module-level failure summary, not the CockroachDB assertion/error output.

The changed PR code is in metricbeat/helper/prometheus/prometheus.go:234-283, specifically ProcessMetrics info-metric merging/promotion. The failing module’s manifest uses the Prometheus collector path instead: x-pack/metricbeat/module/cockroachdb/status/manifest.yml:1-6 contains input.module: prometheus, metricset: collector, and metrics_path: /_status/vars.

That collector path fetches Prometheus families directly in metricbeat/module/prometheus/collector/data.go:149 and converts them via GeneratePromEvents in metricbeat/module/prometheus/collector/collector.go:52-178; it does not call ProcessMetrics.

Evidence

  • Build: https://buildkite.com/elastic/beats/builds/48005
  • Failed job/step: :ubuntu: x-pack/metricbeat: Go Integration Tests (Module) in beats-xpack-metricbeat build 33498
  • Key log excerpt from /tmp/gh-aw/buildkite-logs/beats-xpack-metricbeat-ubuntu-x-packmetricbeat-go-integration-tests-module.txt:17-22:
Error: failed modules: cockroachdb

^^^ +++
🚨 Error: The command exited with status 1
^^^ +++
user command error: exit status 1
  • The same log shows CockroachDB artifacts were uploaded but not prefetched, including x-pack/metricbeat/build/TEST-go-integration-cockroachdb.xml and x-pack/metricbeat/build/TEST-go-integration-cockroachdb.out.json at lines 67 and 133.

Verification

  • Ran cd x-pack/metricbeat && go test -run '^$' -tags=oracle,integration ./module/cockroachdb/status; the package compiled successfully with ok ... [no tests to run].
  • Full TestFetch was not run here because this environment does not support Docker-in-Docker, which the integration test requires through compose.EnsureUp(t, "cockroachdb") at x-pack/metricbeat/module/cockroachdb/status/status_integration_test.go:31-35.

Follow-up

If the retry fails again, the next actionable data is the CockroachDB JSON/XML test artifact rather than the top-level Buildkite log.


What is this? | From workflow: PR Buildkite Detective


@MichaelKatsoulis

Contributor Author

/test

MichaelKatsoulis enabled auto-merge (squash) June 22, 2026 12:53
MichaelKatsoulis merged commit a84fb68 into elastic:main Jun 22, 2026
46 checks passed
@github-actions

Contributor

@Mergifyio backport 9.4 9.3 8.19

@mergify

mergify Bot commented Jun 22, 2026

Contributor

MichaelKatsoulis added a commit that referenced this pull request Jun 22, 2026
…ated and reduce ProcessMetrics CPU (#51255) (#51411)

* Fix missing Metricbeat events when kube-state-metrics denylists *_created

* add changelog

* linter error

* Reduce CPU in ProcessMetrics by hoisting Flatten out of inner loop

* linter error

* add inline comment about dedup key metrics

(cherry picked from commit a84fb68)

Co-authored-by: Michalis Katsoulis <michaelkatsoulis88@gmail.com>
MichaelKatsoulis added a commit that referenced this pull request Jun 26, 2026
…ated and reduce ProcessMetrics CPU (#51255) (#51410)

* Fix missing Metricbeat events when kube-state-metrics denylists *_created

* add changelog

* linter error

* Reduce CPU in ProcessMetrics by hoisting Flatten out of inner loop

* linter error

* add inline comment about dedup key metrics

(cherry picked from commit a84fb68)

Co-authored-by: Michalis Katsoulis <michaelkatsoulis88@gmail.com>
MichaelKatsoulis added a commit that referenced this pull request Jun 29, 2026
…ated and reduce ProcessMetrics CPU (#51255) (#51409)

* Fix missing Metricbeat events when kube-state-metrics denylists *_created

* add changelog

* linter error

* Reduce CPU in ProcessMetrics by hoisting Flatten out of inner loop

* linter error

* add inline comment about dedup key metrics

(cherry picked from commit a84fb68)

Co-authored-by: Michalis Katsoulis <michaelkatsoulis88@gmail.com>

Labels

backport-active-all Automated backport with mergify to all the active branches Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team Team:obs-ds-hosted-services Label for the Observability Hosted Services team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing two metricsets from kube-state-metrics of the kubernetes module

4 participants