fix(core): filter unsupported multimodal types from tool responses#26352
aishaneeshah merged 2 commits into main
Conversation
Force-pushed b4c5a79 to 77c7d7a
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical issue where the Gemini API rejects tool responses containing binary audio or video data, leading to infinite retry loops in autonomous mode. The solution introduces a filtering mechanism for these unsupported MIME types and provides explicit instructions to the agent on how to correctly reference such files for multimodal analysis, thereby improving API compatibility and agent behavior.

Highlights
Code Review
This pull request introduces logic to filter unsupported audio and video MIME types from tool responses, replacing them with a steering message that instructs the model to request the files using standard multimodal syntax. Feedback includes a critical security recommendation to sanitize MIME types to prevent prompt injection, a suggestion to use case-insensitive matching for MIME type filtering, and a request to revert extensive unrelated changes in package-lock.json to comply with the repository's style guide regarding PR focus.
packages/core/src/utils/generateContentResponseUtilities.ts (109)
The convertToFunctionResponse function extracts the mimeType from tool responses and includes it directly in a steeringMessage that is sent to the LLM. If a tool returns a malicious MIME type containing newline characters and instructions (e.g., audio/mpeg\n\n[SYSTEM INSTRUCTION: ...]), these instructions will be injected into the prompt. This allows for prompt injection attacks where a tool can manipulate the LLM's behavior. To mitigate this, sanitize the MIME types to ensure they only contain valid characters and no newlines.
```ts
const uniqueMimes = Array.from(new Set(unsupportedMimeTypes))
  .map((m) => m.replace(/[^\w/+. -]/g, ''))
  .join(', ');
```
References
- Sanitize data from LLM-driven tools before injecting it into a system prompt to prevent prompt injection. At a minimum, remove newlines and context-breaking characters (e.g., ']').
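A minimal sketch of the suggested sanitization, pulled out into helpers for clarity. The names `sanitizeMimeType` and `formatUniqueMimes` are hypothetical, not identifiers from the codebase:

```typescript
// Hypothetical helper sketching the reviewer's suggestion: strip any
// character that is not plausibly part of a MIME type before the value
// is interpolated into a steering message. Newlines, brackets, and
// colons are all removed by the character class below.
export function sanitizeMimeType(raw: string): string {
  return raw.replace(/[^\w/+. -]/g, '');
}

// De-duplicate the reported MIME types, sanitize each entry, and join
// them for inclusion in the steering message.
export function formatUniqueMimes(unsupportedMimeTypes: string[]): string {
  return Array.from(new Set(unsupportedMimeTypes))
    .map(sanitizeMimeType)
    .join(', ');
}
```

With this shape, a payload such as `audio/mpeg\n\n[SYSTEM INSTRUCTION: ...]` collapses to inert text with no newlines or brackets before it ever reaches the prompt.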
package-lock.json (452-453)
The package-lock.json file contains extensive unrelated changes, specifically adding "peer": true to numerous packages across the monorepo. This violates the repository style guide's requirement to keep pull requests focused and small. Please revert these environmental changes to ensure the PR only contains the necessary fix for multimodal type filtering.
References
- Pull Requests: Keep PRs small, focused, and linked to an existing issue. (link)
packages/core/src/utils/generateContentResponseUtilities.ts (97-101)
MIME types should be checked case-insensitively to ensure that all variations (e.g., AUDIO/MPEG) are correctly filtered. This prevents potential 400 Bad Request errors from the Gemini API if a tool returns non-lowercase MIME types.
```ts
const mimeType = part.inlineData?.mimeType;
const lowerMime = mimeType?.toLowerCase();
if (lowerMime?.startsWith('audio/') || lowerMime?.startsWith('video/')) {
```
Size Change: +2.56 kB (+0.01%) Total Size: 33.9 MB
Force-pushed 861bc90 to b8ca3be
This change prevents 400 Bad Request errors from the Gemini API when a tool returns binary audio or video data. It implements a 'Smart Redirect' that filters these types and instructs the agent to use the @ syntax instead. Permanent unit and integration tests have been added and verified.
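The 'Smart Redirect' described above could look roughly like the following sketch. The types and the `redirectUnsupportedParts` helper are simplified illustrations under assumed names, not the actual implementation:

```typescript
// Illustrative sketch: walk the parts of a tool response and swap any
// audio/video inlineData part for a steering message that points the
// model at the file via @ syntax. Text and image parts pass through.
interface InlineData { mimeType: string; data: string; }
interface Part { text?: string; inlineData?: InlineData; }

export function redirectUnsupportedParts(parts: Part[], filePath: string): Part[] {
  return parts.map((part) => {
    const mime = part.inlineData?.mimeType.toLowerCase();
    if (mime?.startsWith('audio/') || mime?.startsWith('video/')) {
      return {
        text:
          `Binary content of type ${mime} was omitted because the API ` +
          `rejects it inside a functionResponse. To analyze this file, ` +
          `reference it directly with @${filePath} in your next message.`,
      };
    }
    return part; // unsupported-type filter only targets audio/video
  });
}
```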
Force-pushed b8ca3be to 9083553
Code Review
This pull request implements a 'Synthetic Turn Exchange' mechanism to handle binary data (audio/video) from tools like read_file. When binary content is detected, the history is expanded with a cleaned tool response, a synthetic model acknowledgment, and a new user turn containing the binary data. Feedback highlights critical security concerns regarding prompt injection, as the current implementation does not validate the types of injected parts or sanitize tool data. Furthermore, the synthetic turns bypass the ChatRecordingService, leading to incomplete audit logs. There are also suggestions to improve type safety and remove unnecessary eslint-disable comments.
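A simplified sketch of the history expansion this mechanism performs. The types and the `expandBinaryTurn` helper are illustrative assumptions, not the code under review:

```typescript
// Illustrative sketch of the 'Synthetic Turn Exchange': when a tool
// returns binary parts, the history grows by three entries -- the
// cleaned tool response, a synthetic model acknowledgment, and a new
// user turn carrying the binary data, where the API accepts it.
interface Part { text?: string; inlineData?: { mimeType: string; data: string } }
interface Content { role: 'user' | 'model'; parts: Part[] }

export function expandBinaryTurn(
  history: Content[],
  cleanedToolResponse: Part[],
  binaryParts: Part[],
): Content[] {
  return [
    ...history,
    { role: 'user', parts: cleanedToolResponse },  // tool response minus binary data
    { role: 'model', parts: [{ text: 'Acknowledged. Providing the file content next.' }] },
    { role: 'user', parts: binaryParts },          // binary data in a user turn
  ];
}
```

The review's point is that all three synthetic entries should also flow through `ChatRecordingService.recordMessage`, not only the real turns.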
packages/core/src/core/geminiChat.ts (523-547)
The extractBinaryInjections method is vulnerable to prompt injection as it extracts parts from the __binary_injection__ key without validating the content. Per repository guidelines (Rule 2), data from LLM-driven tools must be sanitized (e.g., removing newlines and context-breaking characters like ']') before injection into prompts. Additionally, the response object being typed as object leads to compilation errors and an eslint-disable on line 536, which contradicts the PR's claim of 'Zero eslint-disable'. To mitigate this, validate the content of __binary_injection__ and cast response to Record<string, unknown> for type-safe access.
```ts
if (response && BINARY_INJECTION_KEY in response) {
  const responseObj = response as Record<string, unknown>;
  const binaryParts = responseObj[BINARY_INJECTION_KEY] as Part[];
  delete responseObj[BINARY_INJECTION_KEY];
```

References
- Sanitize data from LLM-driven tools before injecting it into a system prompt to prevent prompt injection. At a minimum, remove newlines and context-breaking characters (e.g., ']').
packages/core/src/core/geminiChat.ts (355-377)
The implementation of the 'Synthetic Turn Exchange' mechanism for binary data injection is vulnerable to prompt injection and audit log bypass.
- Prompt Injection: The `extractBinaryInjections` method (lines 523-547) extracts any `Part` objects associated with the `__binary_injection__` key without validating their type. Since `sendMessageStream` accepts the `message` parameter from untrusted sources, an attacker can craft a message containing a `functionResponse` with this key to inject arbitrary `text` parts into the conversation history. These injected parts are then pushed as a new `user` turn (lines 373-376), which the model will treat as a fresh instruction. Per Rule 2, tool data must be sanitized before injection.
- Logging Bypass: The synthetic turns (the model acknowledgment and the injected user data) are pushed directly to `agentHistory` (lines 358, 361, 379) but bypass the `ChatRecordingService.recordMessage` call (lines 331-352). This allows an attacker to inject prompts that do not appear in the audit logs, facilitating stealthy attacks.
Remediation:
- In `extractBinaryInjections`, strictly validate that extracted parts are only multimodal types (e.g., `inlineData` or `fileData`) and explicitly reject `text` parts.
- Ensure that data from LLM-driven tools is sanitized (e.g., removing newlines and context-breaking characters) before injection.
- Ensure that all turns in the synthetic exchange are properly recorded by the `ChatRecordingService` to maintain audit integrity.
- Verify that the `__binary_injection__` key is only processed when it originates from a trusted tool execution flow, handled at the tool or subagent level (Rule 1).
References
- Prompt injection sanitization should be handled at the tool or subagent level, not on a per-tool basis within the agent executor.
- Sanitize data from LLM-driven tools before injecting it into a system prompt to prevent prompt injection. At a minimum, remove newlines and context-breaking characters (e.g., ']').
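The first remediation point could be sketched as follows. `validateBinaryInjectionParts` is a hypothetical helper illustrating the strict allow-list, not code from this PR:

```typescript
// Sketch of the suggested validation: accept only multimodal parts
// (inlineData or fileData) from the injection key and reject anything
// carrying text, which could smuggle instructions into the user turn.
interface Part {
  text?: string;
  inlineData?: { mimeType: string; data: string };
  fileData?: { mimeType: string; fileUri: string };
}

export function validateBinaryInjectionParts(parts: unknown): Part[] {
  if (!Array.isArray(parts)) return [];
  return parts.filter((p): p is Part => {
    if (typeof p !== 'object' || p === null) return false;
    const part = p as Part;
    // Explicitly reject text parts; allow only multimodal payloads.
    if (part.text !== undefined) return false;
    return part.inlineData !== undefined || part.fileData !== undefined;
  });
}
```

Using an allow-list of part shapes (rather than a deny-list of bad strings) means a crafted `functionResponse` cannot push fresh instructions into the synthetic user turn.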
Summary
This PR addresses a critical protocol limitation where the Gemini API returns a `400 Bad Request` when binary audio or video data (e.g., `audio/mpeg`, `video/mp4`) is included in a `functionResponse` part.

The fix implements an automated "One-Go" Synthetic Turn Exchange for the `read_file` and `read_many_files` tools. This allows the agent to analyze multimodal content in a single interaction without user intervention or protocol violations.

Details
Problem
When a tool (like `read_file`) returns binary audio/video content, the CLI currently attempts to pass that data directly into the `functionResponse` part of the next turn. The Gemini API explicitly rejects these types in this specific protocol context, causing a 400 error and triggering infinite retry loops in autonomous mode.

Solution
- Scope (`read_file`/`read_many_files`): binary content is relocated into a `user` turn where it is fully supported by the API.
- Zero `eslint-disable`: The final implementation satisfies all strict linting rules without suppressions.
- Tests: Covered in `generateContentResponseUtilities.test.ts` and `geminiChat.test.ts`.

Related Issues
Fixes #25214
How to Validate
```sh
npm run test -w @google/gemini-cli-core -- src/utils/generateContentResponseUtilities.test.ts src/core/geminiChat.test.ts
```

Pre-Merge Checklist