Problem
On Windows with PowerShell 5.1, UTF-8 Markdown files sometimes showed up as mojibake in the agent/tool context even though the files themselves were valid UTF-8 on disk.
Typical example:
This made patch-based edits unreliable, because the tool seemed to be matching against the misdecoded text it was seeing rather than the real file contents.
What I observed
The strange part was that the file was not actually corrupted.
- Normal agent/shell context could show mojibake
- Get-Content -Encoding utf8 <file> showed the file correctly
- Hex inspection confirmed the file bytes were valid UTF-8
So the problem looked less like “the file was broken” and more like “the tool/context path was decoding it incorrectly.”
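The effect can be reproduced outside the tool. The sketch below is only an illustration, not the tool's actual code path: it assumes the misdecoding is equivalent to interpreting UTF-8 bytes as code page 437 (the initial code page listed under Environment), and the sample string is made up.

```python
# Valid UTF-8 bytes for a short line containing an em dash (U+2014).
# The sample text is hypothetical; any non-ASCII character behaves similarly.
raw = "Notes \u2014 draft".encode("utf-8")

# Decoding the same bytes two ways: the bytes never change, only the decoder.
correct = raw.decode("utf-8")   # the text as written
garbled = raw.decode("cp437")   # the text as a CP437 consumer would see it

print(repr(correct))
print(repr(garbled))

# The bytes on disk are still intact: re-encoding the garbled text with
# the same wrong code page gives back the original UTF-8 bytes.
print(garbled.encode("cp437") == raw)
```

This matches the observation above: the file is fine, and the damage exists only in the decoded view.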
Environment
- Windows
- PowerShell 5.1
- Initial code page: 437
- Initial console input encoding: IBM437
- Initial $OutputEncoding: us-ascii
Why this matters
When the displayed context is wrong, patch/edit operations can fail even though the file itself is fine.
This was especially painful with large .md files, because the agent could end up seeing mojibaked context, fail to match hunks, and behave as if the file content were different from what was really on disk.
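A minimal sketch of why hunk matching fails under these conditions. It assumes the patch context is built from the misdecoded view and then compared line by line against the correctly decoded file. `find_context` is a hypothetical helper for illustration, not the tool's real matcher.

```python
# What is actually on disk: valid UTF-8 containing an em dash (U+2014).
disk_bytes = "# Title\n\nNotes \u2014 draft\n".encode("utf-8")

real_text = disk_bytes.decode("utf-8")   # true file contents
seen_text = disk_bytes.decode("cp437")   # assumed misdecoded agent view

# A hunk context line copied from what the agent saw.
hunk_context = seen_text.splitlines()[2]


def find_context(haystack, needle):
    """Return the index of the first line equal to needle, or None."""
    for index, line in enumerate(haystack.splitlines()):
        if line == needle:
            return index
    return None


# The context line exists in the misdecoded view but not in the real file,
# so an exact-match patch step has nothing to anchor to.
match_in_seen = find_context(seen_text, hunk_context)
match_in_real = find_context(real_text, hunk_context)
print(match_in_seen, match_in_real)
```

In a large file the chance that at least one context line contains a non-ASCII character goes up, which is consistent with large .md files being hit hardest.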
How I worked around it
These local environment changes helped a lot:
- chcp 65001
- set console input/output encoding to UTF-8
- set $OutputEncoding to UTF-8
After that, UTF-8 reads and edit verification behaved much more reliably.
Expected behavior
The tool should read UTF-8 files correctly for context and patch matching regardless of Windows shell/codepage defaults, or at least use a deterministic UTF-8 file-reading path instead of inheriting console encoding behavior.
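As a rough illustration of what a "deterministic UTF-8 file-reading path" could look like, the sketch below reads raw bytes and decodes them explicitly, so the result does not depend on the console code page or locale. The function name `read_utf8_exact` and the sample file are assumptions for this example. Nothing here reflects how the tool is actually implemented.

```python
import tempfile
from pathlib import Path


def read_utf8_exact(path):
    """Read a file as strict UTF-8, independent of console or locale settings."""
    # Reading bytes and decoding explicitly avoids any implicit default
    # encoding. It also leaves CRLF/LF line endings exactly as stored.
    return Path(path).read_bytes().decode("utf-8")


# Small self-check against a temporary file with a non-ASCII character
# and a Windows-style line ending.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.md"
    sample.write_bytes("Notes \u2014 draft\r\n".encode("utf-8"))
    text = read_utf8_exact(sample)

print(repr(text))
```

Strict decoding raises `UnicodeDecodeError` on invalid input instead of silently substituting characters, which makes encoding problems visible rather than letting them surface later as failed hunks.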
Possible related issue
There may also be a separate Windows issue where editing Markdown can normalize line endings in touched regions. That was much less serious than the mojibake problem, but it showed up during the same debugging session.