Summary
The mcp-publisher CLI on Windows mangles non-ASCII bytes in the description field of server.json somewhere between reading the local file and sending the publish request to the registry. The result: published entries contain invalid UTF-16 surrogates and broken text that JSON parsers flag as malformed.
Environment
- OS: Windows 11
- Shell: Git Bash (MSYS2) on Windows
- System locale: zh-CN (default Windows codepage CP-936 / GBK)
mcp-publisher binary: mcp-publisher_windows_amd64.tar.gz from the official release
- Auth method tried: both DNS-based and GitHub OIDC — same outcome
Reproducer
-
mcp-publisher init and edit the description to include an em-dash (U+2014, UTF-8 E2 80 94):
{
"name": "org.example/my-server",
"description": "Turn rough requests into rigorously structured prompts — for any coding agent.",
"version": "0.1.3",
"...": "..."
}
-
Verify the local file is correct UTF-8:
$ xxd server.json | grep e28094
... e2 80 94 ... # em-dash present as expected
-
mcp-publisher login ... and mcp-publisher publish. The publish step reports success.
-
Query the registry:
curl 'https://registry.modelcontextprotocol.io/v0.1/servers?search=<name>'
-
The returned description no longer contains —. It contains a mangled sequence with an invalid lone low surrogate.
Concrete evidence — live registry entry
A real broken entry is still queryable (we had to bump version to ship a fix):
$ curl -s 'https://registry.modelcontextprotocol.io/v0.1/servers?search=org.sciscale/agentforge-mcp' \
| python -m json.tool
Version 0.1.3 returns:
"description": "Turn rough requests into rigorously structured prompts 鈥\udc94 for any coding agent."
Version 0.1.4 (em-dash removed as workaround) is clean:
"description": "Turn rough requests into rigorously structured prompts, for any coding agent."
Note the mangling pattern: the three UTF-8 bytes E2 80 94 came out as 鈥\udc94:
E2 80 decoded as CP-936 (GBK) → 鈥 (a valid CJK character, 硥)
- The orphan
94 byte → \udc94, a lone low surrogate (invalid UTF-16; Go's typical way of round-tripping an unrepresentable byte via the "surrogateescape"-like scheme)
This is the unmistakable signature of UTF-8 bytes being reinterpreted as the Windows ANSI / OEM codepage somewhere in the pipeline, then the orphan byte being escaped into a surrogate.
Likely root cause
Something in the publish path is going through a Windows codepage API instead of treating the JSON body as opaque UTF-8 bytes. Candidates worth auditing in mcp-publisher:
syscall.UTF16FromString / UTF16ToString on anything that touches the description string
- File reading that goes through a layer doing
windows.MultiByteToWideChar(CP_ACP, ...) instead of os.ReadFile
os.Args decoding on Windows (less likely here since the field is in a file)
- Any logging / printing step that re-serializes the body and then sends the re-serialized version on the wire
For context, we observed the same exact \udc94-style mangling with gh api and curl when posting non-ASCII bodies from Git Bash on this machine. Switching the same request to Python's urllib.request (which uses Windows-native sockets and doesn't pass the body through the MSYS / Git Bash layer) preserved bytes correctly. So the issue may not even be in mcp-publisher's Go code per se — it could be that the binary inherits a mangled environment from Git Bash / MSYS. Either way, the publisher should be robust to this.
Impact
- Any Windows publisher with non-ASCII in
description (em-dash, en-dash, smart quotes, emoji, accented characters, CJK text) will silently ship mangled metadata to the official registry.
- Directory listings on
registry.modelcontextprotocol.io and downstream consumers display broken text.
- Strict JSON parsers refuse to load the response at all (Python's
json.dumps rejects lone surrogates by default).
Compounding problem
The registry refuses to republish the same version, so a publisher who hits this can't simply re-run publish after spotting the bug — they have to bump the package version (and re-publish to npm / PyPI / wherever) just to fix a cosmetic display issue. In our case we had to bump npm 0.1.3 → 0.1.4 with no actual code change other than removing the em-dash from one string.
Suggested fix direction
- Validate before sending. Right before transmitting the publish request body, run
utf8.Valid(jsonBytes) and a check that no 0xDC00..0xDFFF lone surrogates appear in any string field. Fail loudly with a clear message naming the field if either check fails — better to refuse to publish than to silently ship garbage.
- Audit the read path. Confirm
server.json is read with os.ReadFile (which gives you raw bytes) and that the JSON decoder never round-trips through syscall.UTF16FromString or any CP_ACP-aware Windows API.
- Consider sniffing the environment. If
mcp-publisher detects it's running under Git Bash / MSYS with a non-UTF-8 ANSI codepage, it could print a one-line warning at startup pointing at this issue.
Offer
Happy to test patches against this exact repro — we have a Windows 11 + Git Bash + zh-CN setup that reliably reproduces the bug.
Affected package
- npm package:
agentforge-mcp
- Registry name:
org.sciscale/agentforge-mcp
- Broken version:
0.1.3 (still queryable — see evidence above)
- Workaround version:
0.1.4 (em-dash removed; not a real fix, just avoidance)
Summary
The
mcp-publisherCLI on Windows mangles non-ASCII bytes in thedescriptionfield ofserver.jsonsomewhere between reading the local file and sending the publish request to the registry. The result: published entries contain invalid UTF-16 surrogates and broken text that JSON parsers flag as malformed.Environment
mcp-publisherbinary:mcp-publisher_windows_amd64.tar.gzfrom the official releaseReproducer
mcp-publisher initand edit the description to include an em-dash (U+2014, UTF-8E2 80 94):{ "name": "org.example/my-server", "description": "Turn rough requests into rigorously structured prompts — for any coding agent.", "version": "0.1.3", "...": "..." }Verify the local file is correct UTF-8:
mcp-publisher login ...andmcp-publisher publish. The publish step reports success.Query the registry:
The returned
descriptionno longer contains—. It contains a mangled sequence with an invalid lone low surrogate.Concrete evidence — live registry entry
A real broken entry is still queryable (we had to bump version to ship a fix):
Version
0.1.3returns:Version
0.1.4(em-dash removed as workaround) is clean:Note the mangling pattern: the three UTF-8 bytes
E2 80 94came out as鈥\udc94:E2 80decoded as CP-936 (GBK) →鈥(a valid CJK character, 硥)94byte →\udc94, a lone low surrogate (invalid UTF-16; Go's typical way of round-tripping an unrepresentable byte via the "surrogateescape"-like scheme)This is the unmistakable signature of UTF-8 bytes being reinterpreted as the Windows ANSI / OEM codepage somewhere in the pipeline, then the orphan byte being escaped into a surrogate.
Likely root cause
Something in the publish path is going through a Windows codepage API instead of treating the JSON body as opaque UTF-8 bytes. Candidates worth auditing in
mcp-publisher:syscall.UTF16FromString/UTF16ToStringon anything that touches the description stringwindows.MultiByteToWideChar(CP_ACP, ...)instead ofos.ReadFileos.Argsdecoding on Windows (less likely here since the field is in a file)For context, we observed the same exact
\udc94-style mangling withgh apiandcurlwhen posting non-ASCII bodies from Git Bash on this machine. Switching the same request to Python'surllib.request(which uses Windows-native sockets and doesn't pass the body through the MSYS / Git Bash layer) preserved bytes correctly. So the issue may not even be inmcp-publisher's Go code per se — it could be that the binary inherits a mangled environment from Git Bash / MSYS. Either way, the publisher should be robust to this.Impact
description(em-dash, en-dash, smart quotes, emoji, accented characters, CJK text) will silently ship mangled metadata to the official registry.registry.modelcontextprotocol.ioand downstream consumers display broken text.json.dumpsrejects lone surrogates by default).Compounding problem
The registry refuses to republish the same version, so a publisher who hits this can't simply re-run
publishafter spotting the bug — they have to bump the package version (and re-publish to npm / PyPI / wherever) just to fix a cosmetic display issue. In our case we had to bump npm 0.1.3 → 0.1.4 with no actual code change other than removing the em-dash from one string.Suggested fix direction
utf8.Valid(jsonBytes)and a check that no0xDC00..0xDFFFlone surrogates appear in any string field. Fail loudly with a clear message naming the field if either check fails — better to refuse to publish than to silently ship garbage.server.jsonis read withos.ReadFile(which gives you raw bytes) and that the JSON decoder never round-trips throughsyscall.UTF16FromStringor any CP_ACP-aware Windows API.mcp-publisherdetects it's running under Git Bash / MSYS with a non-UTF-8 ANSI codepage, it could print a one-line warning at startup pointing at this issue.Offer
Happy to test patches against this exact repro — we have a Windows 11 + Git Bash + zh-CN setup that reliably reproduces the bug.
Affected package
agentforge-mcporg.sciscale/agentforge-mcp0.1.3(still queryable — see evidence above)0.1.4(em-dash removed; not a real fix, just avoidance)