Non-ASCII characters in API responses are mangled on Windows when the console codepage is CP-1252 (default) #742

@hoyt-harness

Version: gws v0.22.5 (Windows x86_64 binary from GitHub Releases)

Environment:

  • OS: Windows 10/11
  • Shell / invocation: cmd /c gws ... (also reproducible from a plain cmd.exe prompt)
  • Default console codepage: CP-1252 (en-US, "Windows-1252")

Summary

Non-ASCII UTF-8 bytes in API response data are emitted to stdout as if each byte were a CP-1252 character. The classic â€™ and â€" patterns appear in place of smart quotes and em-dashes, respectively.

This is the standard "UTF-8 bytes interpreted as CP-1252" mojibake signature. The underlying API response is correct UTF-8; the corruption is introduced in gws's output pipeline on Windows when the console codepage is not UTF-8.
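
To make the signature concrete, here is a minimal sketch that reproduces it outside of gws: it takes a string's UTF-8 bytes and decodes each byte as a CP-1252 character. The function names are hypothetical and none of this is gws code; it only illustrates the misinterpretation described above.

```rust
/// Decode a single byte the way Windows-1252 would display it.
fn cp1252_char(byte: u8) -> char {
    // Bytes 0x80..=0x9F differ from Latin-1. The five slots CP-1252 leaves
    // undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are mapped to the matching
    // C1 control code points here.
    const HIGH: [char; 32] = [
        '\u{20AC}', '\u{0081}', '\u{201A}', '\u{0192}',
        '\u{201E}', '\u{2026}', '\u{2020}', '\u{2021}',
        '\u{02C6}', '\u{2030}', '\u{0160}', '\u{2039}',
        '\u{0152}', '\u{008D}', '\u{017D}', '\u{008F}',
        '\u{0090}', '\u{2018}', '\u{2019}', '\u{201C}',
        '\u{201D}', '\u{2022}', '\u{2013}', '\u{2014}',
        '\u{02DC}', '\u{2122}', '\u{0161}', '\u{203A}',
        '\u{0153}', '\u{009D}', '\u{017E}', '\u{0178}',
    ];
    match byte {
        0x80..=0x9F => HIGH[(byte - 0x80) as usize],
        // Every other byte maps to the code point with the same value.
        _ => byte as char,
    }
}

/// Hypothetical helper: reinterpret the UTF-8 bytes of `text` as CP-1252.
fn misread_as_cp1252(text: &str) -> String {
    text.bytes().map(cp1252_char).collect()
}

fn main() {
    // U+2019 is the smart apostrophe, U+2014 the em-dash.
    let original = "it\u{2019}s a balanced \u{2014} considered \u{2014} approach";
    println!("{}", misread_as_cp1252(original));
}
```

Each non-ASCII character in the input is three bytes in UTF-8, so it comes out as three CP-1252 characters. The smart apostrophe (bytes E2 80 99) becomes â€™.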

Reproduction

Any gws command whose response contains smart punctuation or other non-ASCII characters will reproduce this. A reliable test case:

gws docs documents get --document-id <ID> --fields "body(content(paragraph(elements(textRun(content)))))"

on a Google Doc containing AI-generated prose (which typically includes smart quotes and em-dashes).

Expected:

it’s a balanced — considered — approach

Actual:

itâ€™s a balanced â€" considered â€" approach

Confirmation the source data is correct:

  1. Open the same document in the Google Docs web UI — characters display correctly.
  2. Fetch the same document via any UTF-8-aware HTTP client — characters are correct UTF-8.
  3. Run the same gws command after executing chcp 65001 in the same shell session — output is correct.

Why the obvious workarounds don't fully solve it

chcp 65001 — works only for persistent sessions

chcp 65001
gws docs documents get ...

This resolves the issue for interactive sessions. However, any automation layer that spawns a fresh cmd /c per command gets a new child process with the default codepage. The chcp change does not persist across cmd /c invocations.

PowerShell with console encoding set

pwsh -Command "[Console]::OutputEncoding = [System.Text.Encoding]::UTF8; gws docs documents get ..."

Resolves the mojibake in some scenarios, but introduces different reliability issues in automated environments (stdout capture failures, timeouts). Not a clean substitute for a fix in gws itself.

Hypothesis

On Windows, Rust's default stdout writer honors the console's output codepage. When the codepage is CP-1252 and the program writes UTF-8 bytes, each byte above 0x7F is re-rendered through the CP-1252 glyph table. gws should either:

  • Call SetConsoleOutputCP(65001) at startup (via windows-sys or equivalent), or
  • Write to stdout through a binary-mode handle that bypasses codepage translation entirely.

Both approaches are well-established for Rust CLIs targeting Windows and should be wrapped in #[cfg(target_os = "windows")] to leave Unix builds unaffected.
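
A rough sketch of the first option, with the platform gating described above. Assumptions: the Win32 function is declared by hand rather than imported from windows-sys so that the snippet builds with the standard library alone, and `force_utf8_console` is a hypothetical name, not an existing gws function. Whether this call alone removes the corruption has not been verified against the gws output path.

```rust
// Hand-written declaration of the Win32 call (assumption: a real patch
// would more likely import it from windows-sys).
// On edition 2024 this block must be written as `unsafe extern "system"`.
#[cfg(target_os = "windows")]
#[link(name = "kernel32")]
extern "system" {
    fn SetConsoleOutputCP(code_page_id: u32) -> i32;
}

/// Hypothetical helper: ask the attached console to use UTF-8 (65001)
/// as its output codepage. Returns true if the call succeeded.
#[cfg(target_os = "windows")]
fn force_utf8_console() -> bool {
    const CP_UTF8: u32 = 65001;
    // SAFETY: the function takes a plain integer and has no
    // memory-safety preconditions. It returns nonzero on success.
    unsafe { SetConsoleOutputCP(CP_UTF8) != 0 }
}

/// Unix builds are unaffected: nothing to change, so report success.
#[cfg(not(target_os = "windows"))]
fn force_utf8_console() -> bool {
    true
}

fn main() {
    // Call once at startup, before anything is written to stdout.
    if !force_utf8_console() {
        eprintln!("warning: could not switch console output to UTF-8");
    }
    println!("it\u{2019}s a balanced \u{2014} considered \u{2014} approach");
}
```

The call can fail, for example when no console is attached to the process, so the sketch treats failure as a warning rather than a fatal error.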

Impact

Affects every read operation whose response includes non-ASCII characters:

  • Google Docs content (smart quotes and em-dashes are common in AI-assisted writing)
  • Gmail message bodies (smart punctuation in signatures, quoted replies, non-English correspondence)
  • Contact names with accented characters
  • Calendar event titles and descriptions with non-ASCII characters
  • File names containing non-ASCII characters

The corruption is silent — no error, no warning, exit 0. Downstream consumers that store or forward gws output will persist the corrupted bytes.

Requested behavior

On Windows, force UTF-8 output regardless of the user's console codepage. Two concrete options:

  1. Call SetConsoleOutputCP(65001) at startup. Minor downside: modifies global console state, which could surprise users with a specific codepage set for other tools.
  2. Write stdout as raw bytes, bypassing codepage translation. Use GetStdHandle(STD_OUTPUT_HANDLE) + WriteFile, or write to std::io::stdout().lock() after confirming the underlying handle mode. More correct long-term — does not modify global state.

Either approach should be conditional on cfg(target_os = "windows").
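
One way to soften the downside noted for option 1 is to remember the previous codepage and put it back on exit. The sketch below is an illustration under assumptions, not a proposed patch: `Utf8ConsoleGuard` and its methods are hypothetical names, and the Win32 functions are declared by hand instead of coming from windows-sys.

```rust
// Hand-written Win32 declarations (assumption: windows-sys would normally
// provide these). On edition 2024 write `unsafe extern "system"`.
#[cfg(target_os = "windows")]
#[link(name = "kernel32")]
extern "system" {
    fn GetConsoleOutputCP() -> u32;
    fn SetConsoleOutputCP(code_page_id: u32) -> i32;
}

const CP_UTF8: u32 = 65001;

/// Switch to UTF-8 and return the previous codepage if a switch happened.
#[cfg(target_os = "windows")]
fn switch_to_utf8() -> Option<u32> {
    // A return of 0 is treated as failure (for example, no console attached).
    let previous = unsafe { GetConsoleOutputCP() };
    if previous == 0 || previous == CP_UTF8 {
        return None;
    }
    let switched = unsafe { SetConsoleOutputCP(CP_UTF8) } != 0;
    if switched { Some(previous) } else { None }
}

#[cfg(not(target_os = "windows"))]
fn switch_to_utf8() -> Option<u32> {
    None // nothing to do outside Windows
}

#[cfg(target_os = "windows")]
fn restore_codepage(code_page: u32) {
    // Best effort: a failure here is ignored.
    unsafe { SetConsoleOutputCP(code_page) };
}

#[cfg(not(target_os = "windows"))]
fn restore_codepage(_code_page: u32) {}

/// Hypothetical guard: UTF-8 console output for as long as it is alive.
struct Utf8ConsoleGuard {
    previous: Option<u32>,
}

impl Utf8ConsoleGuard {
    fn new() -> Self {
        Utf8ConsoleGuard { previous: switch_to_utf8() }
    }

    /// True if the guard changed the codepage and will restore it.
    fn changed(&self) -> bool {
        self.previous.is_some()
    }
}

impl Drop for Utf8ConsoleGuard {
    fn drop(&mut self) {
        if let Some(code_page) = self.previous.take() {
            restore_codepage(code_page);
        }
    }
}

fn main() {
    let guard = Utf8ConsoleGuard::new();
    println!("codepage changed: {}", guard.changed());
    println!("it\u{2019}s a balanced \u{2014} considered \u{2014} approach");
    // `guard` is dropped here, restoring the previous codepage if it changed.
}
```

The restore only runs if the guard is dropped normally. A call to `std::process::exit` or a killed process would leave the console in UTF-8, so this narrows the global-state concern but does not remove it.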
