fix(python-backends): parse tool-call arguments for chat templates and split implicit reasoning blocks by localai-bot · Pull Request #10658 · mudler/LocalAI

localai-bot · 2026-07-03T08:07:43Z

Two verified bugs broke OpenAI-style tool calling on the MLX backend, and on any other Python backend that shares backend/python/common. Both were reproduced, and their fixes verified end-to-end, on a real M4 Mac (16GB) running LocalAI v4.5.5 with the metal-mlx backend and the model mlx-community/Qwen3.5-2B-MLX-8bit, driven through Dante Desktop (nib). The same code is still present on LocalAI master.

Bug 1 — tool-call round-trip crashes: "Can only get item pairs from a mapping"

backend/python/common/python_utils.py → messages_to_dicts did d["tool_calls"] = json.loads(msg.tool_calls) but left each tool call's function.arguments as a JSON string, which is how the OpenAI wire format encodes it. Hugging Face chat templates (e.g. Qwen3.5's) iterate arguments as a mapping (.items()), so any request whose history contained a prior assistant tool_calls message failed with:

HTTP 500 — Generation failed: Can only get item pairs from a mapping.
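The following minimal, stdlib-only illustration shows why the template fails. It assumes that msg.tool_calls carries an OpenAI-style JSON array; the payload and the get_weather tool name are made up for the example. A single json.loads decodes the outer list, but each function.arguments value is still a JSON-encoded string, and a string has no .items().

```python
import json

# Hypothetical OpenAI wire-format tool_calls payload, as it might arrive in msg.tool_calls.
# Note that "arguments" is itself a JSON-encoded string, not an object.
raw_tool_calls = json.dumps([
    {
        "id": "call_0",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Rome\"}"},
    }
])

# Decoding the outer payload once (what messages_to_dicts did before the fix)...
tool_calls = json.loads(raw_tool_calls)
arguments = tool_calls[0]["function"]["arguments"]

# ...still leaves arguments as a str, so a template that expects a mapping cannot iterate it.
print(type(arguments).__name__)     # str
print(hasattr(arguments, "items"))  # False
```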

This broke every agent loop on its second turn. Fix: after decoding tool_calls, decode each function.arguments string back into a dict so the template sees a mapping. The step is idempotent (arguments that are already a dict are left as-is), and arguments that are not valid JSON are left untouched.
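Here is a minimal sketch of that decoding step. It assumes the tool calls have already been decoded from msg.tool_calls into a list of OpenAI-style dicts. decode_tool_call_arguments is a hypothetical name, not the actual helper in python_utils.py. Replacing a string only when it decodes to a JSON object is also an assumption of this sketch.

```python
import json
from typing import Any


def decode_tool_call_arguments(tool_calls: list[Any]) -> list[Any]:
    """Hypothetical helper: turn each function.arguments JSON string into a dict.

    Mutates the decoded tool_calls list in place and also returns it.
    """
    for call in tool_calls:
        # Skip entries that are not objects or have no "function" object.
        if not isinstance(call, dict):
            continue
        function = call.get("function")
        if not isinstance(function, dict):
            continue

        arguments = function.get("arguments")
        # Already a mapping (or missing): nothing to do, which keeps this idempotent.
        if not isinstance(arguments, str):
            continue

        try:
            decoded = json.loads(arguments)
        except json.JSONDecodeError:
            # Invalid JSON: leave the original string untouched.
            continue

        # Assumption: only replace when the string decodes to an object, since the
        # chat template needs a mapping it can call .items() on.
        if isinstance(decoded, dict):
            function["arguments"] = decoded
    return tool_calls


calls = [
    {"function": {"name": "get_weather", "arguments": "{\"city\": \"Rome\"}"}},
    {"function": {"name": "noop", "arguments": {"already": "a dict"}}},
    {"function": {"name": "broken", "arguments": "{not json"}},
    {"id": "call_without_function"},
]
decode_tool_call_arguments(calls)
print(calls[0]["function"]["arguments"])  # {'city': 'Rome'}
```

Running the helper a second time on the same list changes nothing, which matches the idempotence the PR describes.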

Bug 2 — reasoning leaks into content when the opening think tag is implicit

backend/python/common/mlx_utils.py → split_reasoning returned ("", text) whenever think_start was absent. But models like Qwen3.5 open the assistant turn already inside thinking: the generated text contains only the closing </think>, never the opener, so the whole chain of thought leaked into content. Observed:

content = "The user is asking ...\n</think>\n\nThe weather in Rome ..."

Fix: when think_start is absent but think_end is present, treat everything before think_end as reasoning and the remainder as content.
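Below is a sketch of the corrected splitting logic, using the observed Qwen3.5 output as input. split_reasoning_sketch is a hypothetical stand-in, not the real split_reasoning in mlx_utils.py. The default tag strings, the whitespace stripping, and the handling of an opener without a closer are all assumptions of this sketch.

```python
def split_reasoning_sketch(
    text: str, think_start: str = "<think>", think_end: str = "</think>"
) -> tuple[str, str]:
    """Hypothetical sketch: split generated text into (reasoning, content)."""
    # Explicit opener present: reasoning sits between the two tags.
    if think_start and think_start in text:
        before, _, rest = text.partition(think_start)
        if think_end and think_end in rest:
            reasoning, _, after = rest.partition(think_end)
            return reasoning.strip(), (before + after).strip()
        # Assumption: an unclosed block means the model is still thinking.
        return rest.strip(), before.strip()

    # The fix: no opener, but a closer is present, so the turn started inside
    # thinking and everything before the closer is reasoning.
    if think_end and think_end in text:
        reasoning, _, content = text.partition(think_end)
        return reasoning.strip(), content.strip()

    # Neither tag found (or empty tag strings / empty text): everything is content.
    return "", text


reasoning, content = split_reasoning_sketch(
    "The user is asking ...\n</think>\n\nThe weather in Rome ..."
)
print(repr(reasoning))  # 'The user is asking ...'
print(repr(content))    # 'The weather in Rome ...'
```

Guarding on empty tag strings matters because str.partition raises ValueError on an empty separator.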

Verification

Reproduced and fixed end-to-end on LocalAI v4.5.5 / metal-mlx / mlx-community/Qwen3.5-2B-MLX-8bit (M4 Mac 16GB, via Dante Desktop/nib): tool calls are emitted correctly, the round-trip with a tool result no longer returns HTTP 500, and reasoning now lands in the reasoning field instead of leaking into content.

Tests

Adds platform-independent unit tests under backend/python/common (stdlib-only, no MLX/venv required, following the existing parent_watch_test.py pattern). The canonical helper tests in backend/python/mlx/test.py::TestSharedHelpers can't run off-Mac because that module imports grpc/backend_pb2 at import time. TDD: both new assertions fail against the unpatched code and pass with the fix.

python_utils_test.py: arguments string → mapping; arguments already a dict (idempotent); invalid-JSON arguments left as string; tool call without a function key; invalid tool_calls JSON dropped.
mlx_utils_test.py: implicit opener (only </think> present); both tags (unchanged); neither tag; empty think_end with no opener match; empty text.

Run:

cd backend/python/common && python3 -m unittest python_utils_test mlx_utils_test
# Ran 13 tests — OK

🤖 Generated with Claude Code

…d split implicit reasoning blocks

Two bugs broke OpenAI-style tool calling on the MLX backend (and any Python backend sharing backend/python/common), reproduced end-to-end on LocalAI v4.5.5 with the metal-mlx backend and mlx-community/Qwen3.5-2B-MLX-8bit.

messages_to_dicts left each tool call's function.arguments as the raw OpenAI-wire JSON string. HuggingFace chat templates (e.g. Qwen3.5) iterate arguments as a mapping (.items()), so any request whose history contained a prior assistant tool_calls message failed with HTTP 500 "Generation failed: Can only get item pairs from a mapping.", breaking every agent loop on its second turn. Decode the string back into a dict so the template sees a mapping.

split_reasoning returned ("", text) whenever the opening think tag was absent. Models like Qwen3.5 open the assistant turn already inside thinking, so the generated text carries only the closing </think>; the whole chain-of-thought leaked into content. When the opener is missing but the closer is present, treat everything before the closer as reasoning.

Adds platform-independent unit tests under backend/python/common (stdlib-only, no MLX/venv required, following parent_watch_test.py).

Assisted-by: Claude Code:claude-opus-4-8

mudler approved these changes Jul 3, 2026


mudler merged commit 7a3583b into master Jul 3, 2026
58 of 60 checks passed

mudler deleted the fix/mlx-tool-call-roundtrip branch July 3, 2026 10:13
