Body
Version: gws v0.22.5 (Windows x86_64 binary from GitHub Releases)
Environment:
- OS: Windows 10/11
- Shell / invocation: cmd /c gws ... (also reproducible from a plain cmd.exe prompt)
- Default console codepage: CP-1252 (en-US, "Windows-1252")
Summary
Non-ASCII UTF-8 bytes in API response data are emitted to stdout as if each byte were a CP-1252 character. The classic â€™ and â€” patterns appear in place of smart quotes and em-dashes respectively.
This is the standard "UTF-8 bytes interpreted as CP-1252" mojibake signature. The underlying API response is correct UTF-8; the corruption is introduced in gws's output pipeline on Windows when the console codepage is not UTF-8.
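The signature can be reproduced offline with a minimal sketch that decodes UTF-8 bytes one at a time as CP-1252. The names `CP1252_HIGH` and `decode_cp1252` are illustrative and not part of gws; the table assumes the Windows mapping in which the five undefined CP-1252 bytes fall through to C1 control characters.

```rust
/// Unicode characters for CP-1252 bytes 0x80..=0x9F. The five bytes the
/// code page leaves undefined map to the matching C1 control characters.
const CP1252_HIGH: [char; 32] = [
    '\u{20AC}', '\u{0081}', '\u{201A}', '\u{0192}', '\u{201E}', '\u{2026}', '\u{2020}', '\u{2021}',
    '\u{02C6}', '\u{2030}', '\u{0160}', '\u{2039}', '\u{0152}', '\u{008D}', '\u{017D}', '\u{008F}',
    '\u{0090}', '\u{2018}', '\u{2019}', '\u{201C}', '\u{201D}', '\u{2022}', '\u{2013}', '\u{2014}',
    '\u{02DC}', '\u{2122}', '\u{0161}', '\u{203A}', '\u{0153}', '\u{009D}', '\u{017E}', '\u{0178}',
];

/// Decode bytes as CP-1252, producing exactly one character per byte.
fn decode_cp1252(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            0x80..=0x9F => CP1252_HIGH[(b - 0x80) as usize],
            // ASCII and 0xA0..=0xFF coincide with Latin-1 code points.
            _ => char::from(b),
        })
        .collect()
}

fn main() {
    // U+2019 is the right single quote, U+2014 the em-dash.
    let original = "it\u{2019}s a balanced \u{2014} considered \u{2014} approach";
    // Misread the UTF-8 bytes as CP-1252, as a CP-1252 console would.
    let garbled = decode_cp1252(original.as_bytes());
    println!("{}", garbled);
}
```

Each three-byte UTF-8 sequence becomes three separate characters, which is why one smart quote turns into three glyphs in the output.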
Reproduction
Any gws command whose response contains smart punctuation or other non-ASCII characters will reproduce this. A reliable test case:
gws docs documents get --document-id <ID> --fields "body(content(paragraph(elements(textRun(content)))))"
on a Google Doc containing AI-generated prose (which typically includes smart quotes and em-dashes).
Expected:
it’s a balanced — considered — approach
Actual:
itâ€™s a balanced â€” considered â€” approach
Confirmation the source data is correct:
- Open the same document in the Google Docs web UI: characters display correctly.
- Fetch the same document via any UTF-8-aware HTTP client: characters are correct UTF-8.
- Run the same gws command after executing chcp 65001 in the same shell session: output is correct.
Why the obvious workarounds don't fully solve it
chcp 65001 — works only for persistent sessions
chcp 65001
gws docs documents get ...
This resolves the issue for interactive sessions. However, any automation layer that spawns a fresh cmd /c per command gets a new child process with the default codepage. The chcp change does not persist across cmd /c invocations.
PowerShell with console encoding set
pwsh -Command "[Console]::OutputEncoding = [System.Text.Encoding]::UTF8; gws docs documents get ..."
Resolves the mojibake in some scenarios, but introduces different reliability issues in automated environments (stdout capture failures, timeouts). Not a clean substitute for a fix in gws itself.
Hypothesis
On Windows, Rust's default stdout writer honors the console's output codepage. When the codepage is CP-1252 and the program writes UTF-8 bytes, each byte above 0x7F is re-rendered through the CP-1252 glyph table. gws should either:
- Call SetConsoleOutputCP(65001) at startup (via windows-sys or equivalent), or
- Write to stdout through a binary-mode handle that bypasses codepage translation entirely.
Both approaches are well-established for Rust CLIs targeting Windows and should be wrapped in #[cfg(target_os = "windows")] to leave Unix builds unaffected.
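A minimal sketch of the first option, assuming a direct FFI declaration instead of the windows-sys crate so that the example has no dependencies. `force_utf8_console_output` is a hypothetical helper name, not an existing gws function, and the `unsafe extern` syntax assumes Rust 1.82 or newer.

```rust
/// Windows code page identifier for UTF-8.
#[cfg(windows)]
const CP_UTF8: u32 = 65001;

#[cfg(windows)]
#[link(name = "kernel32")]
unsafe extern "system" {
    // Win32 prototype: BOOL SetConsoleOutputCP(UINT wCodePageID);
    fn SetConsoleOutputCP(code_page_id: u32) -> i32;
}

/// Ask the attached console to interpret output bytes as UTF-8.
/// Returns false if the call fails, for example when no console is attached.
#[cfg(windows)]
fn force_utf8_console_output() -> bool {
    // SAFETY: plain Win32 call that takes an integer and no pointers.
    unsafe { SetConsoleOutputCP(CP_UTF8) != 0 }
}

/// Other platforms have no console code page, so there is nothing to do.
#[cfg(not(windows))]
fn force_utf8_console_output() -> bool {
    true
}

fn main() {
    // Hypothetical placement: first thing in main, before any output.
    if !force_utf8_console_output() {
        eprintln!("warning: could not switch the console to UTF-8");
    }
    println!("it\u{2019}s a balanced \u{2014} considered \u{2014} approach");
}
```

A failed call is treated as non-fatal here, on the assumption that stdout may be redirected to a pipe or file with no console attached.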
Impact
Affects every read operation whose response includes non-ASCII characters:
- Google Docs content (smart quotes and em-dashes are common in AI-assisted writing)
- Gmail message bodies (smart punctuation in signatures, quoted replies, non-English correspondence)
- Contact names with accented characters
- Calendar event titles and descriptions with non-ASCII characters
- File names containing non-ASCII characters
The corruption is silent — no error, no warning, exit 0. Downstream consumers that store or forward gws output will persist the corrupted bytes.
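Because nothing signals the corruption, a downstream consumer would have to detect it heuristically. The sketch below is one possible check and is not part of gws: it re-encodes the text as CP-1252 and accepts the result only if those bytes form valid UTF-8. `encode_cp1252` and `repair_mojibake` are hypothetical names, and legitimate text that happens to look like mojibake would be rewritten too.

```rust
/// Unicode characters for CP-1252 bytes 0x80..=0x9F (undefined bytes map
/// to the matching C1 control characters, as Windows does).
const CP1252_HIGH: [char; 32] = [
    '\u{20AC}', '\u{0081}', '\u{201A}', '\u{0192}', '\u{201E}', '\u{2026}', '\u{2020}', '\u{2021}',
    '\u{02C6}', '\u{2030}', '\u{0160}', '\u{2039}', '\u{0152}', '\u{008D}', '\u{017D}', '\u{008F}',
    '\u{0090}', '\u{2018}', '\u{2019}', '\u{201C}', '\u{201D}', '\u{2022}', '\u{2013}', '\u{2014}',
    '\u{02DC}', '\u{2122}', '\u{0161}', '\u{203A}', '\u{0153}', '\u{009D}', '\u{017E}', '\u{0178}',
];

/// Encode text as CP-1252, or return None if a character has no CP-1252 byte.
fn encode_cp1252(text: &str) -> Option<Vec<u8>> {
    text.chars()
        .map(|c| {
            if let Some(i) = CP1252_HIGH.iter().position(|&h| h == c) {
                Some(0x80 + i as u8)
            } else if (c as u32) < 0x80 || (0xA0..=0xFF).contains(&(c as u32)) {
                Some(c as u8) // ASCII and Latin-1 range map to themselves
            } else {
                None
            }
        })
        .collect()
}

/// Return the repaired string if `text` looks like UTF-8 misread as CP-1252.
fn repair_mojibake(text: &str) -> Option<String> {
    if text.is_ascii() {
        return None; // pure ASCII cannot contain this corruption
    }
    let bytes = encode_cp1252(text)?;
    // Correctly decoded text almost never re-encodes to valid UTF-8.
    String::from_utf8(bytes).ok()
}

fn main() {
    let stored = "it\u{00E2}\u{20AC}\u{2122}s a balanced \u{00E2}\u{20AC}\u{201D} considered";
    match repair_mojibake(stored) {
        Some(fixed) => println!("repaired: {}", fixed),
        None => println!("no CP-1252 mojibake detected"),
    }
}
```

Correctly decoded text such as "café" or a lone smart quote re-encodes to bytes that are not valid UTF-8, so the function leaves it alone.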
Requested behavior
On Windows, force UTF-8 output regardless of the user's console codepage. Two concrete options:
- Call SetConsoleOutputCP(65001) at startup. Minor downside: modifies global console state, which could surprise users with a specific codepage set for other tools.
- Write stdout as raw bytes, bypassing codepage translation. Use GetStdHandle(STD_OUTPUT_HANDLE) + WriteFile, or write to std::io::stdout().lock() after confirming the underlying handle mode. More correct long-term, since it does not modify global state.
Either approach should be conditional on cfg(target_os = "windows").
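One way to soften the global-state downside of the first option is to restore the previous codepage on exit. The following is a sketch under assumptions, not a proposed patch: `Utf8ConsoleGuard` and the `console` module are hypothetical names, the Win32 functions are declared by hand instead of via windows-sys, and `unsafe extern` assumes Rust 1.82 or newer.

```rust
/// Windows code page identifier for UTF-8.
const CP_UTF8: u32 = 65001;

#[cfg(windows)]
mod console {
    #[link(name = "kernel32")]
    unsafe extern "system" {
        // UINT GetConsoleOutputCP(void); returns 0 on failure.
        fn GetConsoleOutputCP() -> u32;
        // BOOL SetConsoleOutputCP(UINT wCodePageID);
        fn SetConsoleOutputCP(code_page_id: u32) -> i32;
    }

    pub fn output_cp() -> u32 {
        // SAFETY: Win32 call with no arguments.
        unsafe { GetConsoleOutputCP() }
    }

    pub fn set_output_cp(cp: u32) -> bool {
        // SAFETY: Win32 call taking a plain integer.
        unsafe { SetConsoleOutputCP(cp) != 0 }
    }
}

#[cfg(not(windows))]
mod console {
    // Stand-ins so the sketch builds elsewhere: they report "no console".
    pub fn output_cp() -> u32 {
        0
    }
    pub fn set_output_cp(_cp: u32) -> bool {
        false
    }
}

/// Switches the console to UTF-8 and restores the old codepage when dropped.
struct Utf8ConsoleGuard {
    previous: Option<u32>,
}

impl Utf8ConsoleGuard {
    fn new() -> Self {
        let current = console::output_cp();
        // Only switch when a console exists and is not already UTF-8.
        let switched =
            current != 0 && current != CP_UTF8 && console::set_output_cp(CP_UTF8);
        Utf8ConsoleGuard {
            previous: if switched { Some(current) } else { None },
        }
    }
}

impl Drop for Utf8ConsoleGuard {
    fn drop(&mut self) {
        if let Some(cp) = self.previous {
            console::set_output_cp(cp); // best effort; failure is ignored
        }
    }
}

fn main() {
    let _guard = Utf8ConsoleGuard::new(); // lives until main returns
    println!("it\u{2019}s a balanced \u{2014} considered \u{2014} approach");
}
```

The guard does not run if the process is terminated abnormally or calls `std::process::exit`, so the codepage could still be left changed in those cases.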