Description
_continue_generate() in ms_agent/llm/openai_llm.py creates invalid conversation history when a truncated assistant response contains tool_calls. The method appends the partial message (with dangling tool_calls) to the message history without executing the tools first, then makes another API call. Providers that strictly validate the OpenAI spec reject the resulting conversation state.
Command
The deep_research/v2 pipeline with any LLM that generates long responses (tested with deepseek-v4-pro). The reporter sub-agent generates a response containing both text content and tool_calls; when the response hits finish_reason: length, the continue-generation path corrupts the conversation.
What happened
The DeepSeek API returns:
openai.BadRequestError: Error code: 400 - {
    'error': {
        'message': "An assistant message with 'tool_calls' must be followed by tool messages
                    responding to each 'tool_call_id'. (insufficient tool messages following
                    tool_calls message)",
        'type': 'invalid_request_error'
    }
}
The request is retried 3 times (all attempts fail identically), then the sub-agent crashes with RuntimeError: Sub-agent reporter_tool failed.
What was expected
When an assistant message has tool_calls, those tools should be executed and their responses appended to the conversation history BEFORE any subsequent LLM calls. The continue-gen path should exit early and let the normal tool execution loop handle the tool_calls.
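The expected ordering can be sketched as follows. This is a minimal illustration of the invariant, not MS-Agent's actual step() implementation; handle_tool_calls and execute_tool are hypothetical names introduced for the example:

```python
# Minimal sketch of the expected flow: execute every tool call and append
# its result BEFORE issuing the next LLM request. `execute_tool` is a
# hypothetical stand-in for whatever dispatches the agent's tools.
def handle_tool_calls(messages, assistant_msg, execute_tool):
    messages.append(assistant_msg)
    for call in assistant_msg.get("tool_calls", []):
        result = execute_tool(call["function"]["name"],
                              call["function"].get("arguments", "{}"))
        # One tool message per tool_call_id -- this is what the OpenAI
        # spec (and DeepSeek's validator) requires.
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": result,
        })
    return messages
```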
Root cause
In openai_llm.py, _continue_generate (lines 504-541) and _stream_continue_generate (lines 274-364) check finish_reason but never check whether new_message.tool_calls is non-empty:
# _continue_generate, lines 524-535:
new_message = self._format_output_message(completion)
if completion.choices[0].finish_reason in ['length', 'null'] and ...:
    completion = self._call_llm_for_continue_gen(
        messages, new_message, tools, **kwargs)
_call_llm_for_continue_gen (lines 487-502) appends new_message (with its tool_calls) to messages, then calls _call_llm. The API receives:
assistant: {"role": "assistant", "content": "I'll write the report...",
            "tool_calls": [{"id": "call_abc", "function": {"name": "write_file", ...}}]}  # APPENDED
# NO tool response with tool_call_id="call_abc"
# Next: user/system message, or another assistant message
DeepSeek validates the entire message list and rejects it.
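The rule DeepSeek enforces can be approximated with a small check. This is an illustrative sketch of the spec invariant, not DeepSeek's actual validation code; the message dicts mirror the example above:

```python
# Illustrative check for the OpenAI-spec invariant that DeepSeek enforces:
# every assistant message carrying tool_calls must be immediately followed
# by one "tool" message per tool_call_id.
def has_dangling_tool_calls(messages):
    for i, msg in enumerate(messages):
        if msg.get("role") != "assistant" or not msg.get("tool_calls"):
            continue
        pending = {call["id"] for call in msg["tool_calls"]}
        for follower in messages[i + 1:]:
            if follower.get("role") != "tool":
                break
            pending.discard(follower.get("tool_call_id"))
        if pending:
            return True  # at least one tool_call_id has no response
    return False

# The history _call_llm_for_continue_gen produces fails the check:
bad = [
    {"role": "assistant", "content": "I'll write the report...",
     "tool_calls": [{"id": "call_abc",
                     "function": {"name": "write_file", "arguments": "{}"}}]},
    {"role": "user", "content": "continue"},
]
# Appending the matching tool response makes the history valid again:
good = bad[:1] + [{"role": "tool", "tool_call_id": "call_abc",
                   "content": "written"}]
```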
Affected code
ms_agent/llm/openai_llm.py - _continue_generate() (lines 504-541)
ms_agent/llm/openai_llm.py - _stream_continue_generate() (lines 274-364)
ms_agent/llm/openai_llm.py - _call_llm_for_continue_gen() (lines 487-502)
Reproduction
The bug triggers reliably when ALL of these conditions hold:
- The model generates a response that includes both content text and tool_calls
- The response exceeds the model's max_tokens, causing finish_reason: length
- The continue-generation logic fires (max_continue_runs not yet exhausted)
This is most likely to occur with agents that produce long mixed text+tool responses (report writers, code generators with tool calls mid-response).
Suggested fix
Before entering the continue-gen path, check if the truncated message has tool_calls. If it does, return the message as-is and let the normal step() loop handle tool execution. The tool calls will be executed, responses appended, and the next LLM call will have a valid conversation.
# In _continue_generate and _stream_continue_generate:
new_message = self._format_output_message(completion)
if new_message.tool_calls:
    # Let tool execution handle this - don't try to continue
    return new_message
if completion.choices[0].finish_reason in ['length', 'null'] and ...:
    # safe to continue - no dangling tool_calls
    ...
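The guard can also be expressed as a standalone predicate, which makes the fix easy to unit-test. This is a hedged sketch: the function name and parameters mirror the identifiers in this report, not the actual MS-Agent API:

```python
# Sketch of the proposed guard as a pure function: continue generation
# only when the response was truncated AND carries no tool_calls.
def should_continue_generation(finish_reason, tool_calls, runs, max_continue_runs):
    if tool_calls:
        # Dangling tool_calls: bail out and let the step() loop
        # execute the tools first.
        return False
    return finish_reason in ('length', 'null') and runs < max_continue_runs
```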
Workaround
None. The bug is in the core continue-generation logic and cannot be worked around via config. Any agent that generates long tool-calling responses will eventually hit it.
Versions / Dependencies
- MS-Agent: v0.11.0 (PyPI, installed via pip install 'ms-agent[research]')
- Python: 3.11
- OS: Linux (Docker python:3.11-slim)
- OpenAI SDK: 2.33.0
- LLM: deepseek-v4-pro (via openai_base_url: https://api.deepseek.com/v1)