fix(shell): force UTF-8 encoding for PowerShell commands on Windows#25262
fix(shell): force UTF-8 encoding for PowerShell commands on Windows#25262enjoykumawat wants to merge 2 commits intogoogle-gemini:mainfrom
Conversation
Add POWERSHELL_UTF8_GUARD preamble to force [Console]::OutputEncoding and $OutputEncoding to UTF-8 before each PowerShell command, ensuring non-ASCII output is correctly captured through Node.js pipes. Also rename ensurePromptvarsDisabled to applyShellPreamble to reflect its broader responsibility, and skip getCachedEncodingForBuffer when shell is powershell so the TextDecoder always uses UTF-8 regardless of the system code page. Fixes google-gemini#20661
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses character encoding issues encountered when capturing PowerShell output on Windows systems with non-UTF-8 default code pages. By forcing UTF-8 encoding at the shell level and updating the output decoding logic, the changes ensure that non-ASCII characters are correctly captured and displayed. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a shell preamble to force UTF-8 output encoding in PowerShell, ensuring consistent non-ASCII character handling across different system locales. It also updates the output decoder to skip system encoding detection when PowerShell is used. A critical issue was identified where the shell variable is referenced in the handleOutput closure but is not defined within that scope, which would lead to a ReferenceError at runtime.
| const encoding = getCachedEncodingForBuffer(data); | ||
| // PowerShell preamble forces UTF-8 output; skip system encoding detection. | ||
| const encoding = | ||
| shell === 'powershell' ? 'utf-8' : getCachedEncodingForBuffer(data); |
There was a problem hiding this comment.
A ReferenceError is introduced in the childProcessFallback method. The variable shell is used on line 632 within the handleOutput closure, but it is not defined in the scope of childProcessFallback or the closure itself. This will cause the application to crash with a ReferenceError whenever handleOutput is invoked, leading to a reliable Denial of Service (DoS) for any shell execution that falls back to childProcessFallback. To remediate, ensure the shell variable is available in the scope of childProcessFallback, for example, by calling getShellConfiguration() here. Additionally, please note that executeWithPty (around line 1127) appears to be missing this logic, which means PTY-based executions might still suffer from encoding issues on Windows.
const { shell } = getShellConfiguration();
const encoding =
shell === 'powershell' ? 'utf-8' : getCachedEncodingForBuffer(data);
Fixes #20661
Problem
On Windows with a non-UTF-8 system code page (e.g. CP936 for Chinese locale), PowerShell uses the system default encoding for console output. When gemini-cli captures this output through Node.js pipes, non-ASCII characters (e.g. Chinese text) are decoded incorrectly and appear garbled.
Two issues combine to cause this:
[Console]::OutputEncodingdefaults to the system code page, so bytes written to stdout use that encoding.getCachedEncodingForBufferdetects the system code page viachcpand uses it for theTextDecoder, but since we now force UTF-8 output, using the system code page decoder produces garbled text.Fix
Add a
POWERSHELL_UTF8_GUARDpreamble prepended to every PowerShell command:This forces PowerShell to write UTF-8 bytes to stdout/stderr before the user's command runs.
Rename
ensurePromptvarsDisabled→applyShellPreambleto reflect its broader responsibility (it now handles both bash shopt options and PowerShell UTF-8 setup).In
handleOutput, skipgetCachedEncodingForBufferwhenshell === 'powershell'and always construct theTextDecoderwith'utf-8', ensuring the forced UTF-8 output is decoded correctly regardless of the system code page.The guard is idempotent — if the command already starts with it, it is not prepended again.