PR comment bodies posted from Windows PowerShell 5.x can be mojibake-corrupted #255

@abeltrano

Description

Summary

Templates that post comments to GitHub PRs (respond-to-pr-comments, review-pull-request) and to Azure DevOps PRs are vulnerable to silent character corruption (mojibake) when the agent runs on Windows PowerShell 5.x and the comment body contains non-ASCII characters (em-dashes, smart quotes, accented names, currency symbols, etc.).

Reproduction

Observed during work on PR #254. The PR description was edited via:

gh pr view 254 --json body --jq .body | Out-File -Encoding utf8 pr-body.md
# ...sed-style replace...
gh pr edit 254 --body-file pr-body.md

After the round-trip, em-dashes (U+2014) appeared in the GitHub UI as דÇù-style garbage, and curly apostrophes (U+2019) were mangled the same way.
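The corruption can be reproduced byte-for-byte outside PowerShell. The sketch below is only an illustration and makes some assumptions. It assumes the console decodes native-command output with CP437, the OEM code page on US-English Windows; other locales use other code pages. It also models `Out-File -Encoding utf8` as "UTF-8 with a BOM". The helper `simulate_round_trip` is hypothetical.

```python
# Byte-level simulation of the PowerShell 5.x round-trip from the reproduction.
# Assumption: the console decodes native-command output as CP437 (US-English
# OEM code page); pass another codec name to model other locales.

BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark

def simulate_round_trip(body: str, console_codepage: str = "cp437") -> bytes:
    """Return the bytes that would end up in pr-body.md (hypothetical helper)."""
    # 1. gh writes the PR body to stdout as UTF-8 bytes.
    gh_stdout = body.encode("utf-8")
    # 2. PowerShell decodes those bytes with the console code page, so each
    #    multi-byte UTF-8 sequence becomes several unrelated characters.
    decoded = gh_stdout.decode(console_codepage)
    # 3. Out-File -Encoding utf8 re-encodes the garbled string as UTF-8 plus a BOM.
    return BOM + decoded.encode("utf-8")

if __name__ == "__main__":
    original = "Thanks \u2014 that\u2019s fixed."
    written = simulate_round_trip(original)
    print(written.decode("utf-8-sig"))  # Thanks ΓÇö thatΓÇÖs fixed.
```

The text is now valid UTF-8, but it contains the wrong characters. Posting it through `gh pr edit --body-file` therefore publishes the garbage faithfully instead of rejecting it. This is why the corruption is silent.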

Root cause

Two interacting issues on Windows PowerShell 5.x:

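  1. Pipe decoding: in gh ... | Out-File, PowerShell decodes gh's UTF-8 output using the console output encoding ([Console]::OutputEncoding, a legacy OEM code page such as CP437 on US-English systems) and then re-encodes the resulting string when writing. The result is classic UTF-8 → OEM code page → UTF-8 mojibake.
  2. Cmdlet encoding defaults: Out-File defaults to UTF-16LE with a BOM ("Unicode"), and Set-Content defaults to the system ANSI code page (Windows-1252 on US-English systems); neither defaults to UTF-8. Out-File -Encoding utf8 writes UTF-8 with a BOM, which some tools and APIs misinterpret.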
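To see why each write path matters, the sketch below compares the bytes that each one would produce for the same text. The mapping from cmdlets to Python codecs is an assumption made for illustration. "cp1252" stands in for the ANSI code page of a US-English system, and `encode_like` is a hypothetical helper.

```python
# Bytes each Windows PowerShell 5.x write path would produce for one string.
# Assumption: cmdlet defaults are modelled with Python codecs; "cp1252" stands
# in for the system ANSI code page on a US-English machine.

def encode_like(text: str) -> dict[str, bytes]:
    return {
        # Out-File with no -Encoding: UTF-16LE with a BOM ("Unicode").
        "Out-File (default)": b"\xff\xfe" + text.encode("utf-16-le"),
        # Set-Content with no -Encoding: the ANSI code page, one byte per char.
        "Set-Content (default)": text.encode("cp1252"),
        # Out-File -Encoding utf8: UTF-8 preceded by the EF BB BF BOM.
        "Out-File -Encoding utf8": text.encode("utf-8-sig"),
        # WriteAllText with UTF8Encoding($false): plain UTF-8, no BOM.
        "WriteAllText + UTF8Encoding($false)": text.encode("utf-8"),
    }

if __name__ == "__main__":
    for name, data in encode_like("caf\u00e9 \u2014 ok").items():
        print(f"{name:38} {data.hex(' ')}")
```

Only the last variant is plain UTF-8 that a UTF-8 consumer will read back unchanged. The ANSI output is not valid UTF-8 at all: an em-dash becomes the single byte 0x97.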

bash, zsh, and PowerShell 7+ default to UTF-8 and are not affected by these cmdlet defaults.

Impact

Any template that drafts user-voice text and posts it via gh api, gh pr edit, or az rest is at risk:

  • respond-to-pr-comments (GitHub + ADO reply paths)
  • review-pull-request (inline comments + overall review summary)
  • Future templates that post comments, descriptions, or release notes

The risk is amplified by the new human-voice-fidelity protocol (PR #254), which encourages calibrated voice, including punctuation habits such as em-dashes (for users who write them) and accented names in greetings.

Proposed fix

Add an "Encoding for external posts" subsection to protocols/guardrails/operational-constraints.md:

  • Always write comment / reply / description bodies to a temp file before passing to gh / az rest.
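  • On Windows PowerShell 5.x, use [System.IO.File]::WriteAllText($path, $content, [System.Text.UTF8Encoding]::new($false)), which writes UTF-8 without a BOM. Pass an absolute $path, because .NET resolves relative paths against the process working directory rather than PowerShell's current location. Never use Out-File or Set-Content for non-ASCII bodies.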
  • Never round-trip existing comment text through gh pr view --jq | Out-File for editing — write fresh content in clean UTF-8.
  • bash / zsh / PowerShell 7+: default UTF-8 is fine.
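The rules above can be sketched in Python as an illustration. The body is written to a fresh temp file as BOM-less UTF-8 and checked before posting. It then reaches gh only through `--body-file` and never through a pipe. `write_body_file`, `check_clean_utf8`, and `gh_pr_edit_argv` are hypothetical helpers, not part of any template. The check is a guardrail that this issue does not propose.

```python
import os
import tempfile
from pathlib import Path

def write_body_file(body: str) -> Path:
    """Write `body` to a new temp file as UTF-8 without a BOM (hypothetical helper)."""
    fd, name = tempfile.mkstemp(suffix=".md")
    # newline="" stops Python from translating "\n" into "\r\n" on Windows.
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
        f.write(body)
    return Path(name)

def check_clean_utf8(path: Path) -> None:
    """Raise if the file starts with a UTF-8 BOM or is not valid UTF-8."""
    data = path.read_bytes()
    if data.startswith(b"\xef\xbb\xbf"):
        raise ValueError(f"{path} starts with a UTF-8 BOM")
    data.decode("utf-8")  # raises UnicodeDecodeError on invalid byte sequences

def gh_pr_edit_argv(pr_number: int, body_path: Path) -> list[str]:
    """Build the gh command line; the body travels by file, never by pipe."""
    return ["gh", "pr", "edit", str(pr_number), "--body-file", str(body_path)]

if __name__ == "__main__":
    path = write_body_file("Thanks Ren\u00e9e \u2014 fixed in the latest push.\n")
    try:
        check_clean_utf8(path)
        # To actually post, pass this argv to subprocess.run(..., check=True)
        # with an authenticated gh CLI.
        print(gh_pr_edit_argv(254, path))
    finally:
        path.unlink()
```

Passing the command as an argument list, rather than a shell string, also prevents any shell from re-decoding the body on its way to gh.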

The protocol is already included by both PR-posting templates, so this single edit fixes all current consumers and any future ones that include the protocol.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests