Pass ensure_ascii=False when serializing MCP tool results#7730
Open
adityasingh2400 wants to merge 1 commit into
Open
Pass ensure_ascii=False when serializing MCP tool results#7730adityasingh2400 wants to merge 1 commit into
adityasingh2400 wants to merge 1 commit into
Conversation
json.dumps defaults to ensure_ascii=True, which mangles every non-ASCII string into \u escapes before the result is sent back to the model. Powerful models technically un-escape these, but the extra tokens hurt cost and quality for tools that return Japanese/Chinese/Arabic text. Pass ensure_ascii=False so the original characters survive. Fixes microsoft#6995
Author
|
@microsoft-github-policy-service agree |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
McpToolAdapter.return_value_as_stringbuilds the string that flows back to the model viajson.dumps(...).json.dumpsdefaults toensure_ascii=True, so every non-ASCII character (Japanese, Chinese, Arabic, Hebrew, emoji, ...) is escaped into\uXXXXsequences before the model ever sees it.That has two effects on the LLM side:
日本語is 1 visible character per glyph but 6 tokens per glyph after escaping (日etc.). For tools returning paragraphs of CJK text this is a real cost and latency tax.Passing
ensure_ascii=Falsekeeps the original characters intact. JSON is still valid (UTF-8 JSON is the default in every spec since RFC 8259) and downstream consumers that already expect UTF-8 are unaffected.What changed
python/packages/autogen-ext/src/autogen_ext/tools/mcp/_base.py: one-line change,json.dumps(..., ensure_ascii=False)on the result serialization path.python/packages/autogen-ext/tests/tools/test_mcp_tools.py: new regression testtest_return_value_as_string_preserves_non_ascii_textthat round-trips Japanese and ChineseTextContentthroughreturn_value_as_stringand asserts the original characters survive (and that no\uescapes leak through).Fixes #6995.
Verification
146 passed, 1 skipped. The existing
test_adapter_from_server_params_with_return_value_as_string(which exercises the same code path with ASCII-only payloads) still passes unchanged, confirming no behavior change for the ASCII case.uv run ruff checkanduv run ruff formatare both clean on the touched files.Notes on PR #7256
There is an older PR (#7256) that touches the same line plus an additional
json.dumpsin_host/_elicitation.py. That PR has had no reviewer activity since February. This PR keeps the change minimal and focused on the user-facing path called out in the issue (tool result serialization back to the model). The elicitationjson.dumpsformats a schema for a local human prompt rather than something sent to the LLM, so it is intentionally not changed here; happy to follow up if reviewers want it included.