otelcol: drop unparseable container logs quietly + filter nil container.name #15

Merged
samcm merged 1 commit into master from samcm/otelcol-drop-quiet
May 19, 2026

Member

@samcm samcm commented May 19, 2026

The filelog 'container' operator hits add_metadata_from_filepath failures during container churn and logs them at error level. otelcol then tails its own stderr via filelog and re-ships those errors, amplifying each failure roughly 10x because of the stack traces. Setting on_error: drop_quiet silences and drops the failed entry; the updated filter expr also drops entries with nil container.name as a safety net. These errors were ~18% of blob-devnet-0 log volume.

…er.name

The filelog 'container' operator hits add_metadata_from_filepath failures during container churn and logs them at error. Those errors then re-enter the pipeline via filelog tailing otelcol's own stderr, multiplying volume ~10x per failure (stack traces). Adds on_error: drop_quiet to silence and drop the failed entry; updates the exclude filter to also drop entries with nil container.name as a safety net.
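
For orientation, the two changes described above might look roughly like the fragment below. This is a sketch under stated assumptions, not the contents of all.yaml: the include path and the excluded container names are hypothetical, container.name is assumed to live on the resource, and the filter is written in the drop-on-match form that the description implies.

```yaml
receivers:
  filelog:
    include:
      # Hypothetical path; the real glob is not shown in this PR.
      - /var/log/pods/*/*/*.log
    operators:
      # Parse the container runtime log format. With drop_quiet, an entry
      # that fails here is dropped without being logged at error level,
      # so the failure no longer re-enters the pipeline via stderr.
      - type: container
        on_error: drop_quiet
      # The filter operator drops entries for which expr evaluates to true.
      # Hypothetical expression: drop entries that have no container.name,
      # plus entries whose name is on an exclude list (names are made up).
      - type: filter
        expr: 'resource["container.name"] == nil or resource["container.name"] matches "^(otelcol|example-noisy)$"'
```

The nil check comes first so the regex is only evaluated when a name is actually present.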

qu0b-reviewer Bot commented May 19, 2026

🤖 qu0b-reviewer

Summary

Two changes to the filelog operator pipeline: (1) on_error: drop_quiet added to the container operator so malformed Docker JSON log lines don't propagate errors; (2) the filter expr changed from != nil and matches to == nil or matches, intentionally allowing nil container.name through alongside the allowlist. The logic is correct — both nil and matched names are now kept.

Issues

  • 🟢 all.yaml:362 — The negated expression flips the polarity correctly, but the diff swaps a plain matches (whole-string anchor implied by Otelcol regex semantics) for an explicit ^...$ anchored pattern at the same time. That's a pre-existing style improvement, not part of this PR's intent. No functional issue, but worth confirming the anchoring wasn't accidentally lost if there were edge cases before.
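
On the anchoring question, a quick way to reason about it: as far as I know, the filter operator evaluates expr with the expr-lang library, whose matches operator is backed by Go's regexp package and succeeds on a partial match unless the pattern is anchored. That is worth verifying against the collector version in use. If it holds, the two hypothetical expressions below behave differently; the name geth is an illustration only and is not taken from all.yaml.

```yaml
operators:
  # Unanchored (assumed semantics): true for "geth", but also for
  # "geth-sidecar" or "my-geth", since any substring match counts.
  - type: filter
    expr: 'resource["container.name"] != nil and resource["container.name"] matches "geth"'
  # Anchored: true only when the whole name is exactly "geth".
  - type: filter
    expr: 'resource["container.name"] != nil and resource["container.name"] matches "^geth$"'
```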

Suggestions

  • 🟢 all.yaml:359 — on_error: drop_quiet is good. If it becomes necessary to debug why logs are being dropped silently in production, there's no alerting surface for this. Consider adding a log operator immediately after the container one (or a second branch) with on_error: send to count drops, even if they don't get forwarded — so the drop rate is observable without surfacing errors upstream.

Reviewed @ 46d96585
"First law of programming: whatever the bug, blame DNS."
