
[framework] _continue_generate corrupts conversation when truncated response contains tool_calls #909

@lobstersyrup

Description


_continue_generate() in ms_agent/llm/openai_llm.py creates invalid conversation history when a truncated assistant response contains tool_calls. The method appends the partial message (with dangling tool_calls) to the message history without executing the tools first, then makes another API call. Providers that strictly validate the OpenAI spec reject the resulting conversation state.

Command

Run the deep_research/v2 pipeline with any LLM that generates long responses (tested with deepseek-v4-pro). The reporter sub-agent produces a response containing both text content and tool_calls; when the response hits finish_reason: length, the continue-generation path corrupts the conversation.

What happened

The DeepSeek API returns:

openai.BadRequestError: Error code: 400 - {
  'error': {
    'message': "An assistant message with 'tool_calls' must be followed by tool messages
                responding to each 'tool_call_id'. (insufficient tool messages following
                tool_calls message)",
    'type': 'invalid_request_error'
  }
}

The request is retried 3 times (all retries fail identically), then the sub-agent crashes with RuntimeError: Sub-agent reporter_tool failed.

What was expected

When an assistant message has tool_calls, those tools should be executed and their responses appended to the conversation history BEFORE any subsequent LLM calls. The continue-gen path should exit early and let the normal tool execution loop handle the tool_calls.
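For reference, the ordering the OpenAI chat spec expects can be sketched as plain message dicts (the identifiers call_abc and write_file are illustrative, reused from the trace below):

```python
# Hedged sketch of a spec-valid message sequence: every tool_call id on an
# assistant message is answered by a role="tool" message before the next LLM call.
valid = [
    {"role": "assistant", "content": "I'll write the report...",
     "tool_calls": [{"id": "call_abc", "type": "function",
                     "function": {"name": "write_file", "arguments": "{}"}}]},
    # Tool response matching the pending tool_call_id -- this is the message
    # the continue-generation path skips, producing the 400 below.
    {"role": "tool", "tool_call_id": "call_abc", "content": "file written"},
]
```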

Root cause

In openai_llm.py, _continue_generate (lines 504-541) and _stream_continue_generate (lines 274-364) check finish_reason but never check whether new_message.tool_calls is non-empty:

# _continue_generate, lines 524-535:
new_message = self._format_output_message(completion)
if completion.choices[0].finish_reason in ['length', 'null'] and ...:
    completion = self._call_llm_for_continue_gen(
        messages, new_message, tools, **kwargs)

_call_llm_for_continue_gen (line 487-502) appends new_message (with its tool_calls) to messages, then calls _call_llm. The API receives:

assistant: {"role": "assistant", "content": "I'll write the report...",
            "tool_calls": [{"id": "call_abc", "function": {"name": "write_file", ...}}]}  # APPENDED
# NO tool response with tool_call_id="call_abc"
# Next: user/system message, or another assistant message

DeepSeek validates the entire message list and rejects it.
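The check DeepSeek appears to apply (inferred from the 400 message; this is an approximation, not the provider's actual code) can be sketched as a scan over the message list:

```python
def find_dangling_tool_calls(messages):
    """Approximate the strict-spec check implied by the 400 error: every
    tool_call id on an assistant message must be answered by a following
    role == "tool" message before any other message appears."""
    dangling = []
    pending = set()  # tool_call ids awaiting a tool response
    for msg in messages:
        if msg.get("role") == "tool":
            pending.discard(msg.get("tool_call_id"))
        else:
            dangling.extend(pending)  # ids left unanswered before this message
            pending = {tc["id"] for tc in (msg.get("tool_calls") or [])}
    dangling.extend(pending)  # ids unanswered at end of list
    return dangling

# The history _call_llm_for_continue_gen builds trips this check:
corrupted = [
    {"role": "assistant", "content": "I'll write the report...",
     "tool_calls": [{"id": "call_abc",
                     "function": {"name": "write_file", "arguments": "{}"}}]},
    {"role": "user", "content": "continue"},  # no tool response in between
]
```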

Affected code

  • ms_agent/llm/openai_llm.py - _continue_generate() (lines 504-541)
  • ms_agent/llm/openai_llm.py - _stream_continue_generate() (lines 274-364)
  • ms_agent/llm/openai_llm.py - _call_llm_for_continue_gen() (lines 487-502)

Reproduction

The bug triggers reliably when ALL of these conditions hold:

  1. The model generates a response that includes both content text and tool_calls
  2. The response exceeds the model's max_tokens, causing finish_reason: length
  3. The continue-generation logic fires (max_continue_runs not yet exhausted)

This is most likely to occur with agents that produce long mixed text+tool responses (report writers, code generators with tool calls mid-response).

Suggested fix

Before entering the continue-gen path, check if the truncated message has tool_calls. If it does, return the message as-is and let the normal step() loop handle tool execution. The tool calls will be executed, responses appended, and the next LLM call will have a valid conversation.

# In _continue_generate and _stream_continue_generate:
new_message = self._format_output_message(completion)
if new_message.tool_calls:
    # Let tool execution handle this - don't try to continue
    return new_message
if completion.choices[0].finish_reason in ['length', 'null'] and ...:
    # safe to continue - no dangling tool_calls
    ...

Workaround

None. The bug is in the core continue-generation logic and cannot be worked around via config. Any agent that generates long tool-calling responses will eventually hit it.

Versions / Dependencies

  • MS-Agent: v0.11.0 (PyPI, installed via pip install 'ms-agent[research]')
  • Python: 3.11
  • OS: Linux (Docker python:3.11-slim)
  • OpenAI SDK: 2.33.0
  • LLM: deepseek-v4-pro (via openai_base_url: https://api.deepseek.com/v1)
