fix(node-adapter): handle utf-8 characters split across stream chunks #1496
dinwwwh merged 3 commits into middleapi:main
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request improves how the server processes incoming request bodies by addressing an issue where multi-byte UTF-8 characters could be corrupted if they were split across data chunks during streaming. By adopting an incremental decoding mechanism, the change ensures the integrity of text-based data, particularly internationalized content, improving the reliability and correctness of data handling.
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough

Replaces naive chunk-to-string concatenation with incremental UTF-8 decoding via TextDecoder at runtime, and adds tests that verify multi-byte UTF-8 characters split across chunk boundaries, including a case with an intentionally incomplete final byte sequence yielding the replacement character.
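The incremental decoding described above can be sketched as follows (an illustrative standalone example, not the actual `body.ts` implementation; `decodeChunks` is a hypothetical name):

```typescript
// Decode a stream of byte chunks incrementally so multi-byte UTF-8
// sequences split across chunk boundaries are reassembled correctly.
function decodeChunks(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder('utf-8')
  let text = ''
  for (const chunk of chunks) {
    // { stream: true } buffers any trailing incomplete byte sequence
    // instead of emitting a replacement character for it
    text += decoder.decode(chunk, { stream: true })
  }
  // the final call flushes the buffer; an incomplete tail becomes U+FFFD
  text += decoder.decode()
  return text
}

// each character of '长江' is 3 bytes in UTF-8; cut inside the second one
const bytes = new TextEncoder().encode('长江')
const parts = [bytes.slice(0, 4), bytes.slice(4)]
console.log(decodeChunks(parts)) // → '长江'
```

`TextDecoder` is available as a global in Node.js as well as other runtimes, which is what makes this variant runtime-agnostic.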
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Code Review
This pull request correctly addresses an issue with decoding UTF-8 characters that are split across stream chunks by using StringDecoder. The added tests are a great way to verify the fix and prevent future regressions. I found a minor issue in one of the new test cases that will cause it to fail, and I've provided a suggestion to fix it. Otherwise, the changes look good.
🧹 Nitpick comments (2)
packages/standard-server-node/src/body.test.ts (2)
77-91: Optional: tighten JSON expectation to strict structural equality.

Because JSON serialization drops undefined fields, this test can be made stricter and less matcher-dependent.

Suggested tightening:
```diff
 it('json with utf-8 characters split across chunk boundaries', async () => {
   const original = {
     json: {
       text: '滚滚长江东逝水',
     },
-    meta: undefined,
   }
@@
-  expect(standardBody).toEqual(original)
+  expect(standardBody).toStrictEqual(original)
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/standard-server-node/src/body.test.ts` around lines 77 - 91, The test expects the original object to equal the parsed JSON but includes a field with value undefined (meta) which JSON.stringify drops, so update the expectation to match the actual parsed structure; for example, remove the meta: undefined from the original test data or replace the expect(standardBody).toEqual(original) with expect(standardBody).toEqual(JSON.parse(JSON.stringify(original))). Locate the test using splitBufferInsideCharacter, createChunkedRequest and toStandardBody and make the expected value strictly reflect JSON.parse(JSON.stringify(original)) so the assertion uses structural equality with no undefined fields.
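The serialization behavior behind this nitpick can be demonstrated in isolation (a standalone sketch; `original` and `roundTripped` are illustrative names mirroring the test fixture):

```typescript
// JSON.stringify drops properties whose value is undefined, so the
// parsed result is no longer strictly equal to the original object.
const original = { json: { text: '滚滚长江东逝水' }, meta: undefined }
const roundTripped = JSON.parse(JSON.stringify(original))

console.log('meta' in original)     // → true
console.log('meta' in roundTripped) // → false
```

This is why `toStrictEqual` (which distinguishes `{ meta: undefined }` from `{}`) fails against the parsed body while `toEqual` passes.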
27-40: Optional: enforce multi-byte split intent in helper.

splitBufferInsideCharacter only truly splits inside a character when splitCharacter is multi-byte. A guard would prevent accidentally weak test setups.

Suggested hardening:
```diff
 function splitBufferInsideCharacter(text: string, splitCharacter: string): Buffer[] {
   const buffer = Buffer.from(text)
   const splitBytes = Buffer.from(splitCharacter)
+  if (splitBytes.length < 2) {
+    throw new Error(`split character must be multi-byte UTF-8: ${splitCharacter}`)
+  }
   const splitIndex = buffer.indexOf(splitBytes)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/standard-server-node/src/body.test.ts` around lines 27 - 40, The helper splitBufferInsideCharacter currently allows single-byte splitCharacter which means it doesn't actually exercise "splitting inside a character"; update splitBufferInsideCharacter to assert or throw if the provided splitCharacter encodes to a single byte (e.g. check Buffer.from(splitCharacter).length <= 1) so callers must pass a multi-byte split token, and adjust tests to supply a multi-byte splitCharacter where needed; keep the same return behavior and error message for split not found but add the new guard early in the function.
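Put together, the hardened helper could look like this (a sketch of the suggested hardening; the function name matches the test file, but the body here is an assumption, not the actual source):

```typescript
// Split a string's UTF-8 bytes so the boundary falls inside the given
// character, forcing the decoder to handle a partial multi-byte sequence.
function splitBufferInsideCharacter(text: string, splitCharacter: string): Buffer[] {
  const buffer = Buffer.from(text)
  const splitBytes = Buffer.from(splitCharacter)

  // guard: a single-byte character can never be split "inside"
  if (splitBytes.length < 2) {
    throw new Error(`split character must be multi-byte UTF-8: ${splitCharacter}`)
  }

  const splitIndex = buffer.indexOf(splitBytes)
  if (splitIndex === -1) {
    throw new Error(`split character not found: ${splitCharacter}`)
  }

  // cut one byte into the split character so it straddles the boundary
  const cut = splitIndex + 1
  return [buffer.subarray(0, cut), buffer.subarray(cut)]
}

const [head, tail] = splitBufferInsideCharacter('滚滚长江东逝水', '江')
console.log(Buffer.concat([head, tail]).toString('utf8')) // → '滚滚长江东逝水'
```

Decoding `head` and `tail` independently would produce replacement characters, which is exactly the condition the tests need to exercise.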
ℹ️ Review info

⚙️ Configuration used: defaults | Review profile: CHILL | Plan: Pro | Run ID: 175ec3a2-8ea4-4c4e-a1a0-2396ee0a697f

📒 Files selected for processing (2):

- packages/standard-server-node/src/body.test.ts
- packages/standard-server-node/src/body.ts
The issue was caused by decoding each incoming stream chunk independently with chunk.toString(). For UTF-8 text, a single character can span multiple chunks, so a multi-byte sequence cut at a chunk boundary was decoded into replacement characters. This change fixes that by using a streaming UTF-8 decoder (StringDecoder) so incomplete multi-byte sequences are buffered until the remaining bytes arrive.
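A minimal illustration of the failure mode and the buffering fix (a standalone sketch using Node's `string_decoder`, not the actual `body.ts` code):

```typescript
import { StringDecoder } from 'node:string_decoder'

// '水' is 3 bytes in UTF-8 (0xE6 0xB0 0xB4); split it mid-character
const bytes = Buffer.from('水')
const chunks = [bytes.subarray(0, 2), bytes.subarray(2)]

// naive per-chunk decoding emits replacement characters
const naive = chunks.map(c => c.toString('utf8')).join('')
console.log(naive === '水') // → false

// StringDecoder buffers the incomplete tail until the rest arrives
const decoder = new StringDecoder('utf8')
const safe = chunks.map(c => decoder.write(c)).join('') + decoder.end()
console.log(safe === '水') // → true
```

`decoder.write()` returns only complete characters and holds back trailing partial bytes; `decoder.end()` flushes whatever remains, substituting U+FFFD for any incomplete sequence.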
Thanks @zxhyc131 for the awesome PR! I made a small change: instead of using StringDecoder, the merged version decodes incrementally with TextDecoder.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Summary

- Use StringDecoder to decode stream content safely across chunk boundaries

Problem

Direct chunk.toString() decoding can corrupt UTF-8 characters when multi-byte sequences are split across stream chunks.

Testing
Summary by CodeRabbit

Bug Fixes

- Fixed corruption of multi-byte UTF-8 characters split across stream chunks by decoding incrementally instead of converting each chunk independently.

Tests

- Added tests verifying multi-byte UTF-8 characters split across chunk boundaries, including an intentionally incomplete final byte sequence yielding the replacement character.