fix(filebeat/cel): avoid counting degraded runs as success#48734

Merged
andrewkroh merged 4 commits into elastic:main from
andrewkroh:filebeat/fix/cel-otel-success-metric
Feb 17, 2026

Conversation

@andrewkroh (Member) commented Feb 6, 2026

Proposed commit message

The `input.cel.periodic.program.run.success` OTel metric was incorrectly
counting CEL program evaluation failures as successes, resulting in
artificially inflated success rates. This occurred because success was
recorded when event publication succeeded, ignoring the `isDegraded` flag
that tracks actual evaluation failures.

This change ensures program success is recorded only when publication
succeeds and no degraded execution occurred, and adds a regression test
covering error-event runs via the OTel exporter.

Fixes #48714

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool.

Related issues

Fixes elastic#48714
@andrewkroh andrewkroh added the Filebeat, bugfix, Team:Security-Service Integrations, and backport-9.3 labels Feb 6, 2026
@andrewkroh andrewkroh marked this pull request as ready for review February 6, 2026 17:07
@andrewkroh andrewkroh requested review from a team as code owners February 6, 2026 17:07
@elasticmachine (Contributor)

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane label Feb 7, 2026
@elasticmachine (Contributor)

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@andrewkroh andrewkroh merged commit b7ddc3d into elastic:main Feb 17, 2026
19 of 20 checks passed
mergify bot pushed a commit that referenced this pull request Feb 17, 2026
Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
(cherry picked from commit b7ddc3d)
andrewkroh added a commit that referenced this pull request Feb 17, 2026
…48894)

(cherry picked from commit b7ddc3d)

Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
Labels

backport-9.3, bugfix, Filebeat, Team:Elastic-Agent-Data-Plane, Team:Security-Service Integrations

Development

Successfully merging this pull request may close these issues.

[Filebeat] CEL input: input.cel.periodic.program.run.success metric counts evaluation failures as successes

5 participants