fix(logs): Recover log containers with lone Unicode surrogates#5833

Closed
antonis wants to merge 2 commits into master from antonis/fix-lone-surrogate-log-container

@antonis antonis commented Apr 14, 2026

Summary

  • When a log container payload contains JSON-escaped lone surrogates (\uD800–\uDFFF), serde_json rejects the entire payload, discarding all logs in the batch, not just the malformed one
  • Adds a fallback in log container parsing: on deserialization failure, scans the raw payload for lone surrogates, replaces them with \uFFFD (Unicode replacement character), and re-parses
  • Zero overhead on the happy path — sanitization only runs when parsing already failed
  • Scoped to log containers only (spans and trace metrics are not affected)
  • Emits a logs.container.surrogate_sanitized metric when sanitization is triggered
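
The scan-and-replace step described above could look roughly like the following. This is a minimal, std-only sketch and not the code from this PR: the helper `escape_at` and the exact pairing rules are assumptions made for illustration. It works on the raw JSON text, treats `\\` as an escaped backslash so that `\\uD800` is left alone, and rewrites only unpaired `\uD800`–`\uDFFF` escapes to `\uFFFD`.

```rust
/// Returns the code unit of a `\uXXXX` escape that starts at `bytes[i]`,
/// or `None` if there is no such escape at that position.
/// (Hypothetical helper, named here for illustration only.)
fn escape_at(bytes: &[u8], i: usize) -> Option<u16> {
    if bytes.get(i) != Some(&b'\\') || bytes.get(i + 1) != Some(&b'u') {
        return None;
    }
    let hex = bytes.get(i + 2..i + 6)?;
    if !hex.iter().all(u8::is_ascii_hexdigit) {
        return None;
    }
    u16::from_str_radix(std::str::from_utf8(hex).ok()?, 16).ok()
}

fn is_high(unit: u16) -> bool {
    (0xD800..=0xDBFF).contains(&unit)
}

fn is_low(unit: u16) -> bool {
    (0xDC00..=0xDFFF).contains(&unit)
}

/// Replaces JSON-escaped lone surrogates in `raw` with the escaped
/// replacement character, leaving valid high+low pairs untouched.
pub fn sanitize_lone_surrogates(raw: &str) -> String {
    const REPLACEMENT: &str = "\\uFFFD";
    let bytes = raw.as_bytes();
    let mut out = String::with_capacity(raw.len());
    let mut copied = 0; // start of the span not yet copied into `out`
    let mut i = 0;

    while i < bytes.len() {
        if bytes[i] != b'\\' {
            i += 1;
            continue;
        }
        match escape_at(bytes, i) {
            // A high surrogate directly followed by a low one is a valid pair.
            Some(unit)
                if is_high(unit)
                    && matches!(escape_at(bytes, i + 6), Some(next) if is_low(next)) =>
            {
                i += 12;
            }
            // Any other surrogate is unpaired: emit the replacement instead.
            Some(unit) if is_high(unit) || is_low(unit) => {
                out.push_str(&raw[copied..i]);
                out.push_str(REPLACEMENT);
                i += 6;
                copied = i;
            }
            // A non-surrogate `\uXXXX` escape is copied through unchanged.
            Some(_) => i += 6,
            // Any other escape (`\\`, `\n`, ...): skip the backslash and the
            // character after it so `\\uD800` is not misread as an escape.
            None => i += 2,
        }
    }
    out.push_str(&raw[copied..]);
    out
}
```

Because the function only rewrites escape sequences and never decodes them, the output is the same length as the input and all other bytes of the payload are preserved as-is.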

Ref: getsentry/sentry-react-native#5186
Related JS SDK fix: getsentry/sentry-javascript#20245 — this Relay fix covers all SDKs (Python, Go, Ruby, etc.) at the ingestion boundary, so the JS SDK fix could be dropped in favour of this

Test plan

  • Unit tests for sanitize_lone_surrogates: lone high/low surrogates, valid pairs preserved, mixed cases, boundary conditions
  • Integration tests for expand_log_container: full fallback path with lone surrogate, clean data unchanged
  • cargo fmt passes
  • cargo clippy passes (no warnings)
  • cargo test --all-features passes (24/24 relevant tests)

🤖 Generated with Claude Code

When a log container payload contains JSON-escaped lone surrogates
(\uD800–\uDFFF), serde_json rejects the entire payload, discarding
all logs in the batch. This is a data loss amplification issue where
a single malformed log entry causes the entire container to be dropped.

This adds a fallback in log container parsing: on deserialization
failure, the raw payload is scanned for lone surrogates and they are
replaced with the Unicode replacement character (\uFFFD). The
sanitized payload is then re-parsed. This only runs on the error path,
so there is zero overhead on valid payloads.
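
The "error path only" behaviour can be sketched as a small wrapper. This is an illustration under assumptions, not the PR's code: `parse_with_fallback` is a hypothetical name, and the parser and sanitizer are passed in as closures so the sketch stays independent of serde_json. The returned flag tells the caller whether sanitization was needed, which is where a metric could be emitted.

```rust
/// Tries `parse` on the raw payload first. Only if that fails is `sanitize`
/// run and the cleaned payload re-parsed.
///
/// Returns the parsed value plus `true` when the fallback was used.
/// (Hypothetical helper; name and signature are assumptions.)
pub fn parse_with_fallback<T, E>(
    raw: &str,
    parse: impl Fn(&str) -> Result<T, E>,
    sanitize: impl FnOnce(&str) -> String,
) -> Result<(T, bool), E> {
    match parse(raw) {
        // Happy path: the sanitizer is never invoked.
        Ok(value) => Ok((value, false)),
        Err(first_error) => {
            let cleaned = sanitize(raw);
            if cleaned.as_str() == raw {
                // Nothing was replaced, so re-parsing would fail the same way.
                return Err(first_error);
            }
            // Report the original error if the cleaned payload still fails.
            parse(&cleaned)
                .map(|value| (value, true))
                .map_err(|_| first_error)
        }
    }
}
```

Returning the first error when the retry also fails is a design choice in this sketch: it keeps error reporting identical to the behaviour without the fallback.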

Scoped to log containers only. Spans and trace metrics are not affected.

Ref: getsentry/sentry-react-native#5186

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 568b972.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@antonis antonis marked this pull request as ready for review April 14, 2026 13:39
@antonis antonis requested a review from a team as a code owner April 14, 2026 13:39
@loewenheim
Contributor

Sorry, but we won't be moving forward with this. We require payloads to be valid UTF-8 after JSON decoding in Relay. SDKs sending non-compliant data should be fixed at the SDK level.

However, this PR and the linked issue raise a very valid point: we should probably not discard an entire item container because of one malformed item. I've opened #5837 to track improving this.

@loewenheim loewenheim closed this Apr 14, 2026
@antonis
Author

antonis commented Apr 14, 2026

Thank you for the feedback @loewenheim 🙇 Makes sense 👍
I'll handle this on the SDK side.

2 participants