Description
_continue_generate() in ms_agent/llm/openai_llm.py creates invalid conversation history when a truncated assistant response contains tool_calls. The method appends the partial message (with dangling tool_calls) to the message history without executing the tools first, then makes another API call. Providers that strictly validate the OpenAI spec reject the resulting conversation state.
Command
The deep_research/v2 pipeline with any LLM that generates long responses (tested with deepseek-v4-pro). The reporter sub-agent generates a response containing both text content and tool_calls; when the response hits finish_reason: length, the continue-generation path corrupts the conversation.
What happened
The DeepSeek API returns:
openai.BadRequestError: Error code: 400 - {
    'error': {
        'message': "An assistant message with 'tool_calls' must be followed by tool messages
                    responding to each 'tool_call_id'. (insufficient tool messages following
                    tool_calls message)",
        'type': 'invalid_request_error'
    }
}
The request is retried 3 times (all attempts fail identically), then the sub-agent crashes with RuntimeError: Sub-agent reporter_tool failed.
What was expected
When an assistant message has tool_calls, those tools should be executed and their responses appended to the conversation history BEFORE any subsequent LLM calls. The continue-gen path should exit early and let the normal tool execution loop handle the tool_calls.
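The expected ordering can be sketched as follows. This is a minimal illustration of the invariant, not MS-Agent's actual step() implementation; handle_tool_calls and execute_tool are hypothetical names introduced for the example:

```python
# Minimal sketch of the expected flow: execute every tool call and append
# its result BEFORE issuing the next LLM request. `execute_tool` is a
# hypothetical stand-in for whatever dispatches the agent's tools.
def handle_tool_calls(messages, assistant_msg, execute_tool):
    messages.append(assistant_msg)
    for call in assistant_msg.get("tool_calls", []):
        result = execute_tool(call["function"]["name"],
                              call["function"].get("arguments", "{}"))
        # One tool message per tool_call_id -- this is what the OpenAI
        # spec (and DeepSeek's validator) requires.
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": result,
        })
    return messages
```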
Root cause
In openai_llm.py, _continue_generate (lines 504-541) and _stream_continue_generate (lines 274-364) check finish_reason but never check whether new_message.tool_calls is non-empty:
# _continue_generate, lines 524-535:
new_message = self._format_output_message(completion)
if completion.choices[0].finish_reason in ['length', 'null'] and ...:
    completion = self._call_llm_for_continue_gen(
        messages, new_message, tools, **kwargs)
_call_llm_for_continue_gen (lines 487-502) appends new_message (with its tool_calls) to messages, then calls _call_llm. The API receives:
assistant: {"role": "assistant", "content": "I'll write the report...",
            "tool_calls": [{"id": "call_abc", "function": {"name": "write_file", ...}}]}  # APPENDED
# NO tool response with tool_call_id="call_abc"
# Next: user/system message, or another assistant message
DeepSeek validates the entire message list and rejects it.
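The rule DeepSeek enforces can be approximated with a small check. This is an illustrative sketch of the spec invariant, not DeepSeek's actual validation code; the message dicts mirror the example above:

```python
# Illustrative check for the OpenAI-spec invariant that DeepSeek enforces:
# every assistant message carrying tool_calls must be immediately followed
# by one "tool" message per tool_call_id.
def has_dangling_tool_calls(messages):
    for i, msg in enumerate(messages):
        if msg.get("role") != "assistant" or not msg.get("tool_calls"):
            continue
        pending = {call["id"] for call in msg["tool_calls"]}
        for follower in messages[i + 1:]:
            if follower.get("role") != "tool":
                break
            pending.discard(follower.get("tool_call_id"))
        if pending:
            return True  # at least one tool_call_id has no response
    return False

# The history _call_llm_for_continue_gen produces fails the check:
bad = [
    {"role": "assistant", "content": "I'll write the report...",
     "tool_calls": [{"id": "call_abc",
                     "function": {"name": "write_file", "arguments": "{}"}}]},
    {"role": "user", "content": "continue"},
]
# Appending the matching tool response makes the history valid again:
good = bad[:1] + [{"role": "tool", "tool_call_id": "call_abc",
                   "content": "written"}]
```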
Affected code
ms_agent/llm/openai_llm.py - _continue_generate() (lines 504-541)
ms_agent/llm/openai_llm.py - _stream_continue_generate() (lines 274-364)
ms_agent/llm/openai_llm.py - _call_llm_for_continue_gen() (lines 487-502)
Reproduction
The bug triggers reliably when ALL of these conditions hold:
- The model generates a response that includes both content text and tool_calls
- The response exceeds the model's max_tokens, causing finish_reason: length
- The continue-generation logic fires (max_continue_runs not yet exhausted)
This is most likely to occur with agents that produce long mixed text+tool responses (report writers, code generators with tool calls mid-response).
Suggested fix
Before entering the continue-gen path, check if the truncated message has tool_calls. If it does, return the message as-is and let the normal step() loop handle tool execution. The tool calls will be executed, responses appended, and the next LLM call will have a valid conversation.
# In _continue_generate and _stream_continue_generate:
new_message = self._format_output_message(completion)
if new_message.tool_calls:
    # Let tool execution handle this - don't try to continue
    return new_message
if completion.choices[0].finish_reason in ['length', 'null'] and ...:
    # safe to continue - no dangling tool_calls
    ...
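The guard can also be expressed as a standalone predicate, which makes the fix easy to unit-test. This is a hedged sketch: the function name and parameters mirror the identifiers in this report, not the actual MS-Agent API:

```python
# Sketch of the proposed guard as a pure function: continue generation
# only when the response was truncated AND carries no tool_calls.
def should_continue_generation(finish_reason, tool_calls, runs, max_continue_runs):
    if tool_calls:
        # Dangling tool_calls: bail out and let the step() loop
        # execute the tools first.
        return False
    return finish_reason in ('length', 'null') and runs < max_continue_runs
```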
Workaround
None. The bug is in the core continue-generation logic and cannot be worked around via config. Any agent that generates long tool-calling responses will eventually hit it.
Versions / Dependencies
- MS-Agent: v0.11.0 (PyPI, installed via pip install 'ms-agent[research]')
- Python: 3.11
- OS: Linux (Docker python:3.11-slim)
- OpenAI SDK: 2.33.0
- LLM: deepseek-v4-pro (via openai_base_url: https://api.deepseek.com/v1)