Fix UTF-8 decoding error for multi-byte sequence at chunk boundary #13

Merged
k-lukas merged 1 commit into rewrite from karnowski/fix-utf8-chunk-boundary
Mar 18, 2026
k-lukas commented Mar 18, 2026

Summary

HTTP chunked transfer encoding can split multi-byte UTF-8 characters (such as box-drawing characters, which take three bytes each in UTF-8) across chunk boundaries. The previous code decoded each chunk independently, which caused spurious "Invalid UTF-8 in response" errors on perfectly valid output, particularly with server-side formats like PSQL that use box-drawing characters in table borders.
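A minimal Rust sketch of the failure mode, assuming a Rust codebase (the diff isn't shown here); `decode_each_chunk_independently` is a hypothetical stand-in for the old behaviour, not the PR's code. The box-drawing character `│` (U+2502) encodes as the three bytes `E2 94 82`. If a chunk boundary falls after the first byte, neither half is valid UTF-8 on its own, so per-chunk decoding rejects output that is valid as a whole. The standard library's `Utf8Error::error_len()` already distinguishes the two cases: `None` means the input ended partway through a sequence, `Some(n)` means the bytes are genuinely invalid.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical stand-in for the pre-fix behaviour: every chunk must be
/// valid UTF-8 on its own, so a character split across chunks is rejected
/// even though the concatenated bytes are valid.
fn decode_each_chunk_independently(chunks: &[&[u8]]) -> Result<String, Utf8Error> {
    let mut out = String::new();
    for chunk in chunks {
        // Fails as soon as any single chunk is not valid UTF-8 by itself.
        out.push_str(str::from_utf8(chunk)?);
    }
    Ok(out)
}
```

The same three bytes decode cleanly when they arrive in one chunk, so whether the error appeared depended only on where the chunk boundaries happened to fall.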

This PR:

  • Extracts a ChunkDecoder struct that carries incomplete multi-byte sequences across chunk boundaries, only reporting an error when bytes are genuinely invalid
  • Improves the error message for real UTF-8 errors to include the byte position, a hex dump around the bad bytes, and a text preview, which makes it much easier to tell whether the issue is corrupted data, the wrong encoding, or unexpected binary content
  • Adds 11 unit tests covering split 2/3/4-byte characters, byte-at-a-time feeding, truncated streams, invalid bytes, and global position tracking
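The carry-over approach and the richer error message described above can be sketched as follows. This is a minimal illustration in Rust under stated assumptions, not the PR's implementation: the method names `feed` and `finish`, the `DecodeError` fields, the `make_error` helper, and the 8-byte context window are all hypothetical. It relies only on `str::from_utf8` and the standard `Utf8Error::valid_up_to()` / `error_len()` methods: when `error_len()` is `None`, the tail is an incomplete but so far valid sequence, so it is buffered for the next chunk instead of reported.

```rust
use std::fmt;
use std::str;

/// Hypothetical error type; the PR's real message format isn't shown.
#[derive(Debug)]
pub struct DecodeError {
    pub position: usize,     // global byte offset of the first bad byte
    pub context_hex: String, // hex dump of the bytes around it
    pub preview: String,     // lossy text preview of the same bytes
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "Invalid UTF-8 in response at byte {}: [{}] preview {:?}",
            self.position, self.context_hex, self.preview
        )
    }
}

impl std::error::Error for DecodeError {}

/// Builds an error for the bad byte at `local` in `buf`, where `buf`
/// starts at global offset `base` in the stream.
fn make_error(buf: &[u8], local: usize, base: usize) -> DecodeError {
    let start = local.saturating_sub(8);
    let end = (local + 8).min(buf.len());
    let window = &buf[start..end];
    let context_hex = window.iter().map(|b| format!("{b:02x}")).collect::<Vec<_>>().join(" ");
    let preview = String::from_utf8_lossy(window).into_owned();
    DecodeError { position: base + local, context_hex, preview }
}

#[derive(Default)]
pub struct ChunkDecoder {
    pending: Vec<u8>, // incomplete trailing sequence (at most 3 bytes)
    consumed: usize,  // bytes already decoded, for global positions
}

impl ChunkDecoder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Decodes as much of `pending + chunk` as possible, carrying an
    /// incomplete trailing sequence over to the next call.
    pub fn feed(&mut self, chunk: &[u8]) -> Result<String, DecodeError> {
        let mut buf = std::mem::take(&mut self.pending);
        buf.extend_from_slice(chunk);
        match str::from_utf8(&buf) {
            Ok(s) => {
                self.consumed += buf.len();
                Ok(s.to_owned())
            }
            Err(e) => {
                let valid = e.valid_up_to();
                match e.error_len() {
                    // Input stopped mid-character: not an error yet.
                    None => {
                        let text = str::from_utf8(&buf[..valid]).unwrap().to_owned();
                        self.pending = buf[valid..].to_vec();
                        self.consumed += valid;
                        Ok(text)
                    }
                    // Genuinely invalid bytes: report with context.
                    Some(_) => Err(make_error(&buf, valid, self.consumed)),
                }
            }
        }
    }

    /// Call at end of stream; leftover bytes mean the stream was truncated.
    pub fn finish(self) -> Result<(), DecodeError> {
        if self.pending.is_empty() {
            Ok(())
        } else {
            Err(make_error(&self.pending, 0, self.consumed))
        }
    }
}
```

Because `pending` never holds more than three bytes, the carry-over costs almost nothing, and the separate `finish` step is what lets a truncated stream still surface as an error rather than being silently dropped.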

@k-lukas k-lukas changed the base branch from main to rewrite March 18, 2026 09:48
@k-lukas k-lukas requested a review from tobias-fire March 18, 2026 09:51
k-lukas (author) commented Mar 18, 2026

@cursor review

cursor bot commented Mar 18, 2026

Skipping Bugbot: Bugbot is disabled for this repository. Visit the Bugbot dashboard to update your settings.

@k-lukas k-lukas merged commit 0c19d51 into rewrite Mar 18, 2026
3 checks passed
@k-lukas k-lukas deleted the karnowski/fix-utf8-chunk-boundary branch March 18, 2026 09:55