fix(core): decode API error messages from raw bytes to readable text#23283
fix(core): decode API error messages from raw bytes to readable text#23283cRAN-cg wants to merge 15 commits intogoogle-gemini:mainfrom
Conversation
Detect comma-separated byte values from Uint8Array.toString() in API error messages and decode them to UTF-8 text. Fixes google-gemini#19851
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical bug where API error messages, particularly those originating from the @google/genai SDK, were presented to users as unreadable byte sequences. By introducing a robust byte-decoding utility and integrating it into the core error parsing mechanisms, the change significantly enhances the clarity and usability of error messages, improving the developer and user experience when encountering API failures. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request effectively addresses an issue where API error messages were being displayed as raw byte strings. The core change introduces utility functions to detect and decode these byte-coded strings into readable text, and applies this decoding in various error handling paths. The added unit tests are comprehensive and cover the new functionality well.
My main feedback is regarding code duplication. The new byte-decoding logic has been implemented identically in two separate files (errors.ts and googleErrors.ts). I've left comments suggesting this logic be extracted into a shared utility file to improve maintainability and avoid future inconsistencies.
| /** | ||
| * Detects if a string looks like comma-separated byte values (e.g. from a | ||
| * Uint8Array that was coerced to string via its default toString()) and | ||
| * decodes it to readable UTF-8 text. | ||
| * | ||
| * Example input: "91,123,10,32,32,34,101,114,114,111,114,34,58,32,123,10" | ||
| * Example output: '[{\n "error": {\n' | ||
| */ | ||
| export function decodeByteCodedString(value: string): string { | ||
| if (!value || !value.includes(',')) { | ||
| return value; | ||
| } | ||
|
|
||
| // The message may have a prefix like "got status: 429 Too Many Requests. " | ||
| // followed by the byte-coded body. Try to find where the byte codes start. | ||
| const decoded = tryDecodeBytes(value); | ||
| if (decoded !== null) { | ||
| return decoded; | ||
| } | ||
|
|
||
| // Try splitting on ". " to find a prefix + byte-coded body | ||
| const dotIndex = value.lastIndexOf('. '); | ||
| if (dotIndex !== -1) { | ||
| const prefix = value.substring(0, dotIndex + 2); | ||
| const rest = value.substring(dotIndex + 2); | ||
| const decodedRest = tryDecodeBytes(rest); | ||
| if (decodedRest !== null) { | ||
| return prefix + decodedRest; | ||
| } | ||
| } | ||
|
|
||
| return value; | ||
| } | ||
|
|
||
| /** | ||
| * Attempts to decode a string of comma-separated byte values into UTF-8 text. | ||
| * Returns the decoded string, or null if the input doesn't look like byte codes. | ||
| */ | ||
| function tryDecodeBytes(value: string): string | null { | ||
| const parts = value.split(','); | ||
| // Require at least a few parts to avoid false positives on normal text with commas | ||
| if (parts.length < 4) { | ||
| return null; | ||
| } | ||
| const bytes: number[] = []; | ||
| for (const part of parts) { | ||
| const trimmed = part.trim(); | ||
| // Each part must be a non-negative integer in the byte range 0-255 | ||
| if (!/^\d{1,3}$/.test(trimmed)) { | ||
| return null; | ||
| } | ||
| const num = Number(trimmed); | ||
| if (num > 255 || (trimmed.length > 1 && trimmed.startsWith('0'))) { | ||
| return null; | ||
| } | ||
| bytes.push(num); | ||
| } | ||
| try { | ||
| return String(friendlyError); | ||
| return new TextDecoder('utf-8').decode(new Uint8Array(bytes)); | ||
| } catch { | ||
| return 'Failed to get error details'; | ||
| return null; | ||
| } | ||
| } |
There was a problem hiding this comment.
There is significant code duplication between the newly added decodeByteCodedString and tryDecodeBytes functions here, and the tryDecodeByteCodedMessage and tryDecodeByteString functions in packages/core/src/utils/googleErrors.ts. The logic is identical.
To improve maintainability and avoid having to fix bugs in two places, this shared logic should be extracted into a new, single utility file (e.g., packages/core/src/utils/byteDecoder.ts). Both errors.ts and googleErrors.ts can then import and use this common utility, which would eliminate the duplicated code.
Given that errors.ts already depends on googleErrors.ts, placing the new utility in a separate file is necessary to prevent circular dependencies.
There was a problem hiding this comment.
please re - review, have resolved the comments
There was a problem hiding this comment.
Thank you for addressing the code duplication. I've reviewed the changes, and the extraction of decodeByteCodedString and tryDecodeBytes into packages/core/src/utils/byteDecoder.ts successfully resolves the issue. The relevant files now correctly import and utilize this shared utility, improving maintainability and preventing circular dependencies as suggested.
| /** | ||
| * Detects if a string looks like comma-separated byte values (e.g. from a | ||
| * Uint8Array.toString()) and decodes it to readable UTF-8 text. This handles | ||
| * API error responses where the body is returned as raw byte codes instead of | ||
| * decoded text. | ||
| * | ||
| * The message may have a prefix (e.g., "got status: 429 Too Many Requests. ") | ||
| * followed by the byte-coded body. | ||
| */ | ||
| function tryDecodeByteCodedMessage(value: string): string { | ||
| if (!value || !value.includes(',')) { | ||
| return value; | ||
| } | ||
|
|
||
| const decoded = tryDecodeByteString(value); | ||
| if (decoded !== null) { | ||
| return decoded; | ||
| } | ||
|
|
||
| // Try splitting on ". " to find a prefix + byte-coded body | ||
| const dotIndex = value.lastIndexOf('. '); | ||
| if (dotIndex !== -1) { | ||
| const prefix = value.substring(0, dotIndex + 2); | ||
| const rest = value.substring(dotIndex + 2); | ||
| const decodedRest = tryDecodeByteString(rest); | ||
| if (decodedRest !== null) { | ||
| return prefix + decodedRest; | ||
| } | ||
| } | ||
|
|
||
| return value; | ||
| } | ||
|
|
||
| /** | ||
| * Attempts to decode a string of comma-separated byte values into UTF-8 text. | ||
| * Returns the decoded string, or null if the input doesn't look like byte codes. | ||
| */ | ||
| function tryDecodeByteString(value: string): string | null { | ||
| const parts = value.split(','); | ||
| if (parts.length < 4) { | ||
| return null; | ||
| } | ||
| const bytes: number[] = []; | ||
| for (const part of parts) { | ||
| const trimmed = part.trim(); | ||
| if (!/^\d{1,3}$/.test(trimmed)) { | ||
| return null; | ||
| } | ||
| const num = Number(trimmed); | ||
| if (num > 255 || (trimmed.length > 1 && trimmed.startsWith('0'))) { | ||
| return null; | ||
| } | ||
| bytes.push(num); | ||
| } | ||
| try { | ||
| return new TextDecoder('utf-8').decode(new Uint8Array(bytes)); | ||
| } catch { | ||
| return null; | ||
| } | ||
| } |
There was a problem hiding this comment.
The new functions tryDecodeByteCodedMessage and tryDecodeByteString are identical to decodeByteCodedString and tryDecodeBytes in packages/core/src/utils/errors.ts. This code duplication can lead to maintenance issues in the future.
I recommend extracting this logic into a new, shared utility file. This new file can then be imported by both googleErrors.ts and errors.ts, ensuring the logic is defined in a single place. This will make the code easier to maintain and prevent potential inconsistencies.
There was a problem hiding this comment.
please re - review, have resolved the comments
There was a problem hiding this comment.
The code duplication issue has been successfully resolved. The logic for decoding byte-coded messages has been extracted into the new packages/core/src/utils/byteDecoder.ts utility file, and errorParsing.ts, errors.ts, and googleErrors.ts now correctly import and utilize this shared functionality. This is a great improvement for maintainability.
Move decodeByteCodedString and tryDecodeBytes into a dedicated byteDecoder.ts module, eliminating identical implementations in errors.ts and googleErrors.ts. Both files now import from the shared utility, preventing future divergence. Fixes google-gemini#19851
|
/assign |
|
/gemini review |
There was a problem hiding this comment.
Code Review
The pull request introduces a new decodeByteCodedString utility to convert comma-separated byte strings (often from Uint8Array.toString()) into readable UTF-8 text, including cases with non-byte-coded prefixes. This utility is integrated into the core error parsing functions (parseAndFormatApiError, getErrorMessage, parseGoogleApiError) to improve the readability of error messages, and new tests were added to cover these decoding scenarios. A review comment highlights that the current prefix detection logic in decodeByteCodedString is too specific and brittle, suggesting a more robust regex-based approach to handle diverse prefix formats.
| // Try splitting on ". " to find a prefix + byte-coded body | ||
| // (e.g., "got status: 429 Too Many Requests. 91,123,10,...") | ||
| const dotIndex = value.lastIndexOf('. '); | ||
| if (dotIndex !== -1) { | ||
| const prefix = value.substring(0, dotIndex + 2); | ||
| const rest = value.substring(dotIndex + 2); | ||
| const decodedRest = tryDecodeBytes(rest); | ||
| if (decodedRest !== null) { | ||
| return prefix + decodedRest; | ||
| } | ||
| } |
There was a problem hiding this comment.
The current logic for detecting a prefix by searching for . is too specific and brittle. It will fail to decode byte-coded strings that have different prefixes, such as API Error: 91,123,....
A more robust approach would be to use a regular expression to find a potential byte sequence at the end of the string. This would handle various prefix formats without being tied to a specific separator.
| // Try splitting on ". " to find a prefix + byte-coded body | |
| // (e.g., "got status: 429 Too Many Requests. 91,123,10,...") | |
| const dotIndex = value.lastIndexOf('. '); | |
| if (dotIndex !== -1) { | |
| const prefix = value.substring(0, dotIndex + 2); | |
| const rest = value.substring(dotIndex + 2); | |
| const decodedRest = tryDecodeBytes(rest); | |
| if (decodedRest !== null) { | |
| return prefix + decodedRest; | |
| } | |
| } | |
| // If the whole string isn't bytes, it might have a prefix. Try to find | |
| // a byte-like sequence at the end of the string. | |
| const match = value.match(/((?:\d{1,3},)+\d{1,3})$/); | |
| if (match) { | |
| const byteString = match[1]; | |
| const decodedRest = tryDecodeBytes(byteString); | |
| if (decodedRest !== null) { | |
| const prefix = value.substring(0, value.length - byteString.length); | |
| return prefix + decodedRest; | |
| } | |
| } |
… ". " The ". " split was too brittle and missed prefixes like "API Error: " or "status 429 ". Use a regex to match a trailing comma-separated number sequence, handling any prefix format.
- Raise minimum byte count from 8 to 16 (smallest real JSON error
is 14+ bytes)
- Use TextDecoder({ fatal: true }) to reject invalid UTF-8 sequences
- Add isPrintableText check to reject decoded output dominated by
control characters
- Add tests for invalid UTF-8, control char rejection, and threshold
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a robust solution for decoding API error messages that are incorrectly formatted as comma-separated byte strings. The new decodeByteCodedString utility is well-designed with several heuristics to prevent false positives and is accompanied by a comprehensive suite of unit tests. The utility is correctly integrated into the existing error handling pathways in errorParsing.ts, errors.ts, and googleErrors.ts, ensuring that these garbled error messages are now presented in a human-readable format. The changes are of high quality and effectively address the issue.
Note: Security Review is unavailable for this PR.
Summary
91,123,10,32,32,34,101,114,...) instead of human-readable text when the@google/genaiSDK'sApiErrorcontains a response body coerced fromUint8Arrayvia.toString()decodeByteCodedString()utility that detects comma-separated byte patterns and decodes them to UTF-8errorParsing.ts,errors.ts, andgoogleErrors.tsbefore displaying or parsing error messagesFixes #19851
Test plan
errors.test.tsfordecodeByteCodedString()— pure byte strings, prefixed byte strings, normal strings unchanged, empty stringserrors.test.tsforgetErrorMessage()— byte-coded ApiError messages, prefixed byte messagesgoogleErrors.test.tsforparseGoogleApiError()— byte-coded error parsing, status-prefixed byte messageserrors.test.ts: 36,googleErrors.test.ts: 17)