fix(logs): Recover log containers with lone Unicode surrogates#5833
Closed
fix(logs): Recover log containers with lone Unicode surrogates#5833
Conversation
When a log container payload contains JSON-escaped lone surrogates (\uD800–\uDFFF), serde_json rejects the entire payload, discarding all logs in the batch. This is a data loss amplification issue where a single malformed log entry causes the entire container to be dropped. This adds a fallback in log container parsing: on deserialization failure, the raw payload is scanned for lone surrogates and they are replaced with the Unicode replacement character (\uFFFD). The sanitized payload is then re-parsed. This only runs on the error path, so there is zero overhead on valid payloads. Scoped to log containers only. Spans and trace metrics are not affected. Ref: getsentry/sentry-react-native#5186 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
22b98ad to
568b972
Compare
Open
5 tasks
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 568b972. Configure here.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
|
Sorry, but we won't be moving forward with this. We require payloads to be valid UTF-8 after JSON decoding in Relay. SDKs sending non-compliant data should be fixed at the SDK level. However, this PR and the linked issue raise a very valid point: we should probably not discard an entire item container because of one malformed item. I've opened #5837 to track improving this. |
Author
|
Thank you for the feedback @loewenheim 🙇 Makes sense 👍 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
\uD800–\uDFFF),serde_jsonrejects the entire payload, discarding all logs in the batch — not just the malformed one\uFFFD(Unicode replacement character), and re-parseslogs.container.surrogate_sanitizedmetric when sanitization is triggeredRef: getsentry/sentry-react-native#5186
Related JS SDK fix: getsentry/sentry-javascript#20245 — this Relay fix covers all SDKs (Python, Go, Ruby, etc.) at the ingestion boundary, so the JS SDK fix could be dropped in favour of this
Test plan
sanitize_lone_surrogates: lone high/low surrogates, valid pairs preserved, mixed cases, boundary conditionsexpand_log_container: full fallback path with lone surrogate, clean data unchangedcargo fmtpassescargo clippypasses (no warnings)cargo test --all-featurespasses (24/24 relevant tests)🤖 Generated with Claude Code