Skip to content

fix(core): filter unsupported multimodal types from tool responses#26352

Merged
aishaneeshah merged 2 commits intomainfrom
fix/issue-25214-audio-tool-response-400
May 4, 2026
Merged

fix(core): filter unsupported multimodal types from tool responses#26352
aishaneeshah merged 2 commits intomainfrom
fix/issue-25214-audio-tool-response-400

Conversation

@aishaneeshah
Copy link
Copy Markdown
Contributor

@aishaneeshah aishaneeshah commented May 1, 2026

Summary

This PR addresses a critical protocol limitation where the Gemini API returns a 400 Bad Request when binary audio or video data (e.g., audio/mpeg, video/mp4) is included in a functionResponse part.

The fix implements an automated "One-Go" Synthetic Turn Exchange for the read_file and read_many_files tools. This allows the agent to analyze multimodal content in a single interaction without user intervention or protocol violations.

Details

Problem

When a tool (like read_file) returns binary audio/video content, the CLI currently attempts to pass that data directly into the functionResponse part of the next turn. The Gemini API explicitly rejects these types in this specific protocol context, causing a 400 error and triggering infinite retry loops in autonomous mode.

Solution

  1. Automated "One-Go" Flow (read_file / read_many_files):
    • Implemented a Synthetic Turn Exchange protocol. When these tools return binary data, the CLI automatically expands the conversation history before sending it to the API.
    • Sequence: User (Tool Result) -> Model (Synthetic Acknowledgment) -> User (Binary Data).
    • This ensures the model receives the data in a standard user turn where it is fully supported by the API.
  2. System-Wide Safety Filtering:
    • For all other tools, binary audio/video data is filtered and replaced with a minimal system note. This prevents system-wide 400 errors while maintaining stability.
  3. Idiomatic Implementation:
    • Refactored extraction logic to use clean TypeScript patterns.
    • Zero eslint-disable: The final implementation satisfies all strict linting rules without suppressions.
  4. Preserved Test Integrity:
    • Restored original tests and added new additive test cases in generateContentResponseUtilities.test.ts and geminiChat.test.ts.

Related Issues

Fixes #25214

How to Validate

  1. Run the integrated unit and integration tests:
    npm run test -w @google/gemini-cli-core -- src/utils/generateContentResponseUtilities.test.ts src/core/geminiChat.test.ts
  2. Verify that the new test cases for automated binary injection and filtering pass.
  3. Perform a full workspace check:
    npm run build -w @google/gemini-cli-core
    npm run lint -w @google/gemini-cli-core

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • Linux
      • npm run

@aishaneeshah aishaneeshah requested review from a team as code owners May 1, 2026 20:26
@aishaneeshah aishaneeshah force-pushed the fix/issue-25214-audio-tool-response-400 branch from b4c5a79 to 77c7d7a Compare May 1, 2026 20:27
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical issue where the Gemini API rejects tool responses containing binary audio or video data, leading to infinite retry loops in autonomous mode. The solution introduces a filtering mechanism for these unsupported MIME types and provides explicit instructions to the agent on how to correctly reference such files for multimodal analysis, thereby improving API compatibility and agent behavior.

Highlights

  • Multimodal Type Filtering: Implemented a 'Smart Redirect' mechanism to filter unsupported binary audio and video MIME types (e.g., audio/mpeg, video/mp4) from functionResponse parts to prevent Gemini API 400 Bad Request errors.
  • Agent Steering Message: Injected a [SYSTEM ERROR: PROTOCOL_LIMITATION] message into tool responses when unsupported multimodal data is filtered, guiding the agent to use the @path/to/file syntax for analysis.
  • Robust Verification: Added new unit tests to ensure the filtering logic correctly identifies and handles unsupported audio/video MIME types and injects the appropriate steering message.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces logic to filter unsupported audio and video MIME types from tool responses, replacing them with a steering message that instructs the model to request the files using standard multimodal syntax. Feedback includes a critical security recommendation to sanitize MIME types to prevent prompt injection, a suggestion to use case-insensitive matching for MIME type filtering, and a request to revert extensive unrelated changes in package-lock.json to comply with the repository's style guide regarding PR focus.

I am having trouble creating individual review comments. Click here to see my feedback.

packages/core/src/utils/generateContentResponseUtilities.ts (109)

security-high high

The convertToFunctionResponse function extracts the mimeType from tool responses and includes it directly in a steeringMessage that is sent to the LLM. If a tool returns a malicious MIME type containing newline characters and instructions (e.g., audio/mpeg\n\n[SYSTEM INSTRUCTION: ...]), these instructions will be injected into the prompt. This allows for prompt injection attacks where a tool can manipulate the LLM's behavior. To mitigate this, sanitize the MIME types to ensure they only contain valid characters and no newlines.

    const uniqueMimes = Array.from(new Set(unsupportedMimeTypes)).map((m) => m.replace(/[^\w/+. -]/g, '')).join(', ');
References
  1. Sanitize data from LLM-driven tools before injecting it into a system prompt to prevent prompt injection. At a minimum, remove newlines and context-breaking characters (e.g., ']').

package-lock.json (452-453)

high

The package-lock.json file contains extensive unrelated changes, specifically adding "peer": true to numerous packages across the monorepo. This violates the repository style guide's requirement to keep pull requests focused and small. Please revert these environmental changes to ensure the PR only contains the necessary fix for multimodal type filtering.

References
  1. Pull Requests: Keep PRs small, focused, and linked to an existing issue. (link)

packages/core/src/utils/generateContentResponseUtilities.ts (97-101)

high

MIME types should be checked case-insensitively to ensure that all variations (e.g., AUDIO/MPEG) are correctly filtered. This prevents potential 400 Bad Request errors from the Gemini API if a tool returns non-lowercase MIME types.

    const mimeType = part.inlineData?.mimeType;
    const lowerMime = mimeType?.toLowerCase();
    if (
      lowerMime?.startsWith('audio/') || lowerMime?.startsWith('video/')
    ) {

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

Size Change: +2.56 kB (+0.01%)

Total Size: 33.9 MB

Filename Size Change
./bundle/chunk-4GNSP4BA.js 0 B -657 kB (removed) 🏆
./bundle/chunk-66RGOXLC.js 0 B -3.8 kB (removed) 🏆
./bundle/chunk-6TVQPL5E.js 0 B -14.7 MB (removed) 🏆
./bundle/chunk-FINZ6FKL.js 0 B -12.5 kB (removed) 🏆
./bundle/chunk-HTRKXPKC.js 0 B -2.72 MB (removed) 🏆
./bundle/chunk-LZRZAXJW.js 0 B -19.5 kB (removed) 🏆
./bundle/chunk-U7TMYQFN.js 0 B -3.43 kB (removed) 🏆
./bundle/chunk-Y5IPEF6V.js 0 B -49.2 kB (removed) 🏆
./bundle/core-NDRGGIDT.js 0 B -48.5 kB (removed) 🏆
./bundle/devtoolsService-BFNUIBRQ.js 0 B -28 kB (removed) 🏆
./bundle/gemini-OTJE2HI2.js 0 B -582 kB (removed) 🏆
./bundle/interactiveCli-BD63SNDG.js 0 B -1.33 MB (removed) 🏆
./bundle/liteRtServerManager-7MV4XTJY.js 0 B -2.11 kB (removed) 🏆
./bundle/oauth2-provider-KYGC3MCI.js 0 B -9.16 kB (removed) 🏆
./bundle/chunk-523WTSEH.js 2.72 MB +2.72 MB (new file) 🆕
./bundle/chunk-7QPRZ5O3.js 12.5 kB +12.5 kB (new file) 🆕
./bundle/chunk-D2K67WJR.js 3.8 kB +3.8 kB (new file) 🆕
./bundle/chunk-DHFMXNLN.js 19.5 kB +19.5 kB (new file) 🆕
./bundle/chunk-E7Z4IQ5A.js 14.7 MB +14.7 MB (new file) 🆕
./bundle/chunk-IWPFCRLE.js 3.43 kB +3.43 kB (new file) 🆕
./bundle/chunk-LHK5HXQW.js 657 kB +657 kB (new file) 🆕
./bundle/chunk-PLWDWDOM.js 49.2 kB +49.2 kB (new file) 🆕
./bundle/core-YLRPNZSB.js 48.5 kB +48.5 kB (new file) 🆕
./bundle/devtoolsService-XMRI2PSH.js 28 kB +28 kB (new file) 🆕
./bundle/gemini-XUV7L2HS.js 582 kB +582 kB (new file) 🆕
./bundle/interactiveCli-MXMZL6K4.js 1.33 MB +1.33 MB (new file) 🆕
./bundle/liteRtServerManager-ZOBYVHEV.js 2.11 kB +2.11 kB (new file) 🆕
./bundle/oauth2-provider-PDPSJMFT.js 9.16 kB +9.16 kB (new file) 🆕
ℹ️ View Unchanged
Filename Size Change
./bundle/bundled/third_party/index.js 8 MB 0 B
./bundle/chunk-34MYV7JD.js 2.45 kB 0 B
./bundle/chunk-5AUYMPVF.js 858 B 0 B
./bundle/chunk-5PS3AYFU.js 1.18 kB 0 B
./bundle/chunk-664ZODQF.js 124 kB 0 B
./bundle/chunk-DAHVX5MI.js 206 kB 0 B
./bundle/chunk-DD4MWEAB.js 1.97 MB 0 B
./bundle/chunk-IUUIT4SU.js 56.5 kB 0 B
./bundle/chunk-RJTRUG2J.js 39.8 kB 0 B
./bundle/cleanup-IUJL64TV.js 0 B -932 B (removed) 🏆
./bundle/devtools-36NN55EP.js 696 kB 0 B
./bundle/dist-T73EYRDX.js 356 B 0 B
./bundle/events-XB7DADIJ.js 418 B 0 B
./bundle/examples/hooks/scripts/on-start.js 188 B 0 B
./bundle/examples/mcp-server/example.js 1.43 kB 0 B
./bundle/gemini.js 5.1 kB 0 B
./bundle/getMachineId-bsd-TXG52NKR.js 1.55 kB 0 B
./bundle/getMachineId-darwin-7OE4DDZ6.js 1.55 kB 0 B
./bundle/getMachineId-linux-SHIFKOOX.js 1.34 kB 0 B
./bundle/getMachineId-unsupported-5U5DOEYY.js 1.06 kB 0 B
./bundle/getMachineId-win-6KLLGOI4.js 1.72 kB 0 B
./bundle/memoryDiscovery-HRURE3F3.js 980 B 0 B
./bundle/multipart-parser-KPBZEGQU.js 11.7 kB 0 B
./bundle/node_modules/@google/gemini-cli-devtools/dist/client/main.js 222 kB 0 B
./bundle/node_modules/@google/gemini-cli-devtools/dist/src/_client-assets.js 229 kB 0 B
./bundle/node_modules/@google/gemini-cli-devtools/dist/src/index.js 13.4 kB 0 B
./bundle/node_modules/@google/gemini-cli-devtools/dist/src/types.js 132 B 0 B
./bundle/sandbox-macos-permissive-open.sb 890 B 0 B
./bundle/sandbox-macos-permissive-proxied.sb 1.31 kB 0 B
./bundle/sandbox-macos-restrictive-open.sb 3.36 kB 0 B
./bundle/sandbox-macos-restrictive-proxied.sb 3.56 kB 0 B
./bundle/sandbox-macos-strict-open.sb 4.82 kB 0 B
./bundle/sandbox-macos-strict-proxied.sb 5.02 kB 0 B
./bundle/src-QVCVGIUX.js 47 kB 0 B
./bundle/start-AWD5RU2B.js 0 B -652 B (removed) 🏆
./bundle/tree-sitter-7U6MW5PS.js 274 kB 0 B
./bundle/tree-sitter-bash-34ZGLXVX.js 1.84 MB 0 B
./bundle/cleanup-THPAE4HC.js 932 B +932 B (new file) 🆕
./bundle/start-WYTTYNXO.js 652 B +652 B (new file) 🆕

compressed-size-action

@gemini-cli gemini-cli Bot added the area/agent Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality label May 1, 2026
@aishaneeshah aishaneeshah force-pushed the fix/issue-25214-audio-tool-response-400 branch 5 times, most recently from 861bc90 to b8ca3be Compare May 4, 2026 18:13
@aishaneeshah aishaneeshah marked this pull request as draft May 4, 2026 18:25
This change prevents 400 Bad Request errors from the Gemini API when a tool returns binary audio or video data. It implements a 'Smart Redirect' that filters these types and instructs the agent to use the @ syntax instead. Permanent unit and integration tests have been integrated and verified.
@aishaneeshah aishaneeshah force-pushed the fix/issue-25214-audio-tool-response-400 branch from b8ca3be to 9083553 Compare May 4, 2026 18:49
@aishaneeshah aishaneeshah removed the request for review from a team May 4, 2026 18:50
@aishaneeshah aishaneeshah marked this pull request as ready for review May 4, 2026 18:53
Copy link
Copy Markdown
Contributor

@akh64bit akh64bit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a 'Synthetic Turn Exchange' mechanism to handle binary data (audio/video) from tools like read_file. When binary content is detected, the history is expanded with a cleaned tool response, a synthetic model acknowledgment, and a new user turn containing the binary data. Feedback highlights critical security concerns regarding prompt injection, as the current implementation does not validate the types of injected parts or sanitize tool data. Furthermore, the synthetic turns bypass the ChatRecordingService, leading to incomplete audit logs. There are also suggestions to improve type safety and remove unnecessary eslint-disable comments.

I am having trouble creating individual review comments. Click here to see my feedback.

packages/core/src/core/geminiChat.ts (523-547)

security-high high

The extractBinaryInjections method is vulnerable to prompt injection as it extracts parts from the __binary_injection__ key without validating the content. Per repository guidelines (Rule 2), data from LLM-driven tools must be sanitized (e.g., removing newlines and context-breaking characters like ']') before injection into prompts. Additionally, the response object being typed as object leads to compilation errors and an eslint-disable on line 536, which contradicts the PR's claim of 'Zero eslint-disable'. To mitigate this, validate the content of __binary_injection__ and cast response to Record<string, unknown> for type-safe access.

      if (response && BINARY_INJECTION_KEY in response) {
        const responseObj = response as Record<string, unknown>;
        const binaryParts = responseObj[BINARY_INJECTION_KEY] as Part[];
        delete responseObj[BINARY_INJECTION_KEY];
References
  1. Sanitize data from LLM-driven tools before injecting it into a system prompt to prevent prompt injection. At a minimum, remove newlines and context-breaking characters (e.g., ']').

packages/core/src/core/geminiChat.ts (355-377)

security-high high

The implementation of the 'Synthetic Turn Exchange' mechanism for binary data injection is vulnerable to prompt injection and audit log bypass.

  1. Prompt Injection: The extractBinaryInjections method (lines 523-547) extracts any Part objects associated with the __binary_injection__ key without validating their type. Since sendMessageStream accepts the message parameter from untrusted sources, an attacker can craft a message containing a functionResponse with this key to inject arbitrary text parts into the conversation history. These injected parts are then pushed as a new user turn (lines 373-376), which the model will treat as a fresh instruction. Per Rule 2, tool data must be sanitized before injection.

  2. Logging Bypass: The synthetic turns (the model acknowledgment and the injected user data) are pushed directly to agentHistory (lines 358, 361, 379) but bypass the ChatRecordingService.recordMessage call (lines 331-352). This allows an attacker to inject prompts that do not appear in the audit logs, facilitating stealthy attacks.

Remediation:

  • In extractBinaryInjections, strictly validate that extracted parts are only multimodal types (e.g., inlineData or fileData) and explicitly reject text parts.
  • Ensure that data from LLM-driven tools is sanitized (e.g., removing newlines and context-breaking characters) before injection.
  • Ensure that all turns in the synthetic exchange are properly recorded by the ChatRecordingService to maintain audit integrity.
  • Verify that the __binary_injection__ key is only processed when it originates from a trusted tool execution flow, handled at the tool or subagent level (Rule 1).
References
  1. Prompt injection sanitization should be handled at the tool or subagent level, not on a per-tool basis within the agent executor.
  2. Sanitize data from LLM-driven tools before injecting it into a system prompt to prevent prompt injection. At a minimum, remove newlines and context-breaking characters (e.g., ']').

@aishaneeshah aishaneeshah added this pull request to the merge queue May 4, 2026
Merged via the queue into main with commit 4d1ca92 May 4, 2026
27 checks passed
@aishaneeshah aishaneeshah deleted the fix/issue-25214-audio-tool-response-400 branch May 4, 2026 20:45
TirthNaik-99 pushed a commit to TirthNaik-99/gemini-cli that referenced this pull request May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/agent Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: API Error 400 when processing video (audio/mpeg not supported in function_response)

2 participants