fix(litellm): parse DeepSeek-V3 proprietary inline tool-call tokens #5654
fuchun1010 wants to merge 5 commits into
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
c319bae to e91b1f6
DeepSeek-V3 emits tool calls using proprietary special tokens (<|tool▁calls▁begin|>…<|tool▁call▁begin|>function<|tool▁sep|>NAME) embedded in the content field. When LiteLLM does not translate these into structured tool_calls (which happens intermittently), the existing fallback JSON parser rejects the payload because the function name is stored inside the tokens rather than as a 'name' key in the JSON object.

Add _parse_deepseek_tool_calls_from_text, which detects the proprietary token format, extracts the function name and arguments, and emits standard ChatCompletionMessageToolCall objects. Integrate it into the existing _parse_tool_calls_from_text pipeline.

Also add the _extract_json_from_deepseek_args helper to handle the optional Markdown code fences (```json … ```) that DeepSeek wraps around the arguments payload.

Closes google#5024
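To make the approach in the commit message concrete, here is a minimal sketch of such a parser. It is not the code from this PR: the function names, the two closing tokens, and the assumption that the function name ends at a newline, backtick, or opening brace are all illustrative, and calls are returned as plain dicts rather than ChatCompletionMessageToolCall objects.

```python
import json
import re

# Token spellings copied from the commit message. The two *end* tokens are an
# assumption; the message only shows the begin and separator tokens.
CALLS_BEGIN = "<|tool▁calls▁begin|>"
CALLS_END = "<|tool▁calls▁end|>"
CALL_BEGIN = "<|tool▁call▁begin|>"
CALL_END = "<|tool▁call▁end|>"
SEP = "<|tool▁sep|>"

_DECODER = json.JSONDecoder()

# One call: begin token, call type, separator, function name, arguments, end token.
_CALL_RE = re.compile(
    re.escape(CALL_BEGIN) + r".*?" + re.escape(SEP)
    + r"(?P<name>[^\n`{]+)(?P<args>.*?)" + re.escape(CALL_END),
    re.DOTALL,
)


def extract_json_args(args_text):
    """Return the first JSON object in args_text as a string, or None.

    Searching for the first "{" skips an optional Markdown code fence.
    """
    start = args_text.find("{")
    if start == -1:
        return None
    try:
        _, end = _DECODER.raw_decode(args_text, start)
    except json.JSONDecodeError:
        return None
    return args_text[start:end]


def parse_deepseek_tool_calls(text):
    """Return (calls, remainder) for text that may contain DeepSeek tokens."""
    if CALL_BEGIN not in text:  # quick guard: skip the regex on normal replies
        return [], text
    calls = []

    def _consume(match):
        name = match.group("name").strip()
        args = extract_json_args(match.group("args"))
        if not name or args is None:
            return match.group(0)  # keep unparseable calls in the remainder
        calls.append({"name": name, "arguments": args})
        return ""

    remainder = _CALL_RE.sub(_consume, text)
    if not calls:
        return [], text  # e.g. truncated output: leave the text untouched
    remainder = remainder.replace(CALLS_BEGIN, "").replace(CALLS_END, "")
    return calls, remainder.strip()
```

In this sketch a call that is cut off before its end token never matches the pattern, so the whole input comes back unchanged as remainder text.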
e91b1f6 to 08e864e
Hi @fuchun1010, thank you for your contribution! We appreciate you taking the time to submit this pull request. Your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share.
Hi @xuanyang15, can you please review this?
@GWeale Could you please help review?
fuchun1010 left a comment
Review Summary
Thanks for this PR! The DeepSeek inline tool-call format is a real pain point when LiteLLM's translation is inconsistent, and this parser is a clean solution.
What Works Well ✅
- Well-documented: Clear references to DeepSeek API docs and inline comments explaining the token format
- Comprehensive test coverage: 8 test cases covering single/multi calls, plain JSON args (no code fences), surrounding text, mixed formats (DeepSeek + standard inline JSON), empty/whitespace-only input, and integration with the generic parser
- Clean remainder handling: Surrounding text is correctly preserved and returned, matching the existing `_parse_tool_calls_from_text` contract
- Recursive mixed-format support: When both DeepSeek tokens and standard inline JSON appear in the same text, the fallback recursion in `_parse_tool_calls_from_text` handles both correctly (nice touch)
- Quick guard optimization: The `_DS_TCALLS_BEGIN not in text_block and _DS_TCALL_BEGIN not in text_block` check avoids regex overhead on normal responses
Suggestions / Questions
- `_extract_json_from_deepseek_args` round-trip: The function does `json.loads(raw_decode(...))` → `json.dumps(candidate, ensure_ascii=False)`. While functionally correct (JSON objects are unordered by spec), this round-trip could theoretically reorder keys. Is there a reason not to return the raw substring from `raw_decode`? Something like:

      candidate, end = _JSON_DECODER.raw_decode(args_text, open_brace)
      return args_text[open_brace:end]

  This preserves the original formatting and avoids the serialize/deserialize cycle.
- Edge case (truncated tokens): What happens when the model output is cut off mid-token (e.g., partial `<|tool▁call▁begin|` due to max_tokens)? The current code appends the unparsed text to `remainder_parts` via the `end_idx == -1` branches, which seems correct: the partial token becomes remainder text. Worth adding a test for this scenario?
- Thread safety of `_JSON_DECODER`: The module-level `_JSON_DECODER` is used in `_extract_json_from_deepseek_args`. `json.JSONDecoder` instances are generally thread-safe for read-only operations (`raw_decode` doesn't mutate state AFAIK), but worth double-checking since `lite_llm.py` may be used in async/threaded contexts.
- Minor (test helper deduplication): `_DS_BEGIN_CALLS` etc. are redefined as module-level constants in the test file with the same values as in `lite_llm.py`. Consider importing them from the source module to avoid drift, though I understand this may be intentional to keep tests independent of implementation details.
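For reference, the difference between the two options in the first suggestion can be seen with a small standalone sketch. The helper names here are hypothetical and are not the functions in `lite_llm.py`; the only library calls used are `json.JSONDecoder.raw_decode` and `json.dumps`.

```python
import json

_decoder = json.JSONDecoder()


def raw_json_slice(text):
    """Return the first JSON object in text exactly as written, or None."""
    start = text.find("{")
    if start == -1:
        return None
    try:
        # raw_decode returns (object, index just past the parsed document).
        _, end = _decoder.raw_decode(text, start)
    except json.JSONDecodeError:
        return None
    return text[start:end]


def round_tripped_json(text):
    """Return the first JSON object in text after a parse/serialize cycle."""
    start = text.find("{")
    if start == -1:
        return None
    try:
        obj, _ = _decoder.raw_decode(text, start)
    except json.JSONDecodeError:
        return None
    return json.dumps(obj, ensure_ascii=False)


payload = '```json\n{"b":1,"a":1e2}\n```'
print(raw_json_slice(payload))      # the substring, byte for byte
print(round_tripped_json(payload))  # re-serialized: spacing and number format change
```

Both results decode to the same object. In this example the round trip keeps the key order but rewrites whitespace and the number literal, so the slice is the option that leaves the model's output byte-for-byte intact.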
Verdict
LGTM overall. The suggestions above are non-blocking — the core logic is solid and the test coverage is thorough. Happy to approve once the questions above are addressed (or dismissed).
Closes #5024
Problem
DeepSeek-V3 emits tool calls using proprietary special tokens embedded in the content field:
When LiteLLM does not translate these into structured `tool_calls` (intermittent), ADK's fallback JSON parser finds the JSON object but rejects it because the function name (`analysis_input`) is embedded in the tokens (`<|tool▁sep|>analysis_input`) rather than as a `name` key inside the JSON payload.

Result: the tool call is silently dropped and the raw tokens appear as text content.
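The failure mode can be reproduced with a toy stand-in for a generic inline-JSON fallback. This is a hypothetical sketch, not ADK's actual parser: it assumes the fallback expects an object with `name` and `arguments` keys, and the argument values are made up for illustration.

```python
import json


def generic_inline_call(text):
    """Hypothetical generic fallback: accept only JSON objects with a 'name' key."""
    start = text.find("{")
    if start == -1:
        return None
    try:
        obj, _ = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or "name" not in obj:
        return None  # JSON was found, but nothing says which function to call
    return {"name": obj["name"], "arguments": json.dumps(obj.get("arguments", {}))}


# DeepSeek style: the function name lives in the tokens, not in the JSON.
deepseek_content = (
    "<|tool▁call▁begin|>function<|tool▁sep|>analysis_input\n"
    '```json\n{"query": "q3 revenue"}\n```'
)
# Standard inline style: the name is a key of the JSON object itself.
standard_content = '{"name": "analysis_input", "arguments": {"query": "q3 revenue"}}'

print(generic_inline_call(deepseek_content))  # None: the call is dropped
print(generic_inline_call(standard_content))
```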
Solution
- `_parse_deepseek_tool_calls_from_text`: detects the proprietary token format, extracts function name + arguments, and emits standard `ChatCompletionMessageToolCall` objects
- `_extract_json_from_deepseek_args` helper: handles optional Markdown code fences (```json ```) around the arguments payload
- Integrated into `_parse_tool_calls_from_text` as the first-pass parser, with fallback to generic inline JSON parsing

Testing Plan
Unit Tests: Added 8 new tests covering:
`_parse_tool_calls_from_text`

Regression: Full `test_litellm.py`: 264 passed, 0 failed

Files Changed
- `src/google/adk/models/lite_llm.py`
- `tests/unittests/models/test_litellm.py`