fix(llm): convert per-turn instructions on the very first turn too by theomonnom · Pull Request #5828 · livekit/agents

theomonnom · 2026-05-24T06:17:26Z

Summary

convert_mid_conversation_instructions only rewrote system messages to a user-role <instructions> turn after the conversation had already seen a user/assistant turn (the seen_non_system flag).

On the very first generate_reply(instructions=...) of a session, the chat context contains:

system: agent's base prompt
system: per-turn instructions just appended by generate_reply

…and no user/assistant turn yet. Because seen_non_system is never set, the conversion is skipped and both messages stay system. Providers that require the conversation to end on a user/tool turn (Gemini, Anthropic, AWS) then fall back to inject_dummy_user_message, which appends a literal {"role": "user", "parts": [{"text": "."}]}. The model (Gemini especially) sees the lone "." and responds with things like "You haven't said anything yet, did you mean to ask something?"
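A minimal sketch of this failure, assuming messages are plain dicts rather than the library's ChatMessage objects. `convert_old` and `to_provider_turns` are hypothetical stand-ins for the pre-fix convert_mid_conversation_instructions and for the provider formatting plus inject_dummy_user_message; the real signatures are not shown in this description.

```python
# Hypothetical simplification: dict messages, not livekit ChatMessage objects.
TEMPLATE = "<instructions>\n{text}\n</instructions>"
SYSTEM_ROLES = ("system", "developer")


def convert_old(messages):
    """Pre-fix rule: convert a system message only after a non-system turn."""
    out = []
    seen_non_system = False
    for msg in messages:
        if msg["role"] in SYSTEM_ROLES:
            if seen_non_system:
                text = TEMPLATE.format(text=msg["text"])
                out.append({"role": "user", "text": text})
            else:
                out.append(msg)  # treated as part of the preamble
        else:
            seen_non_system = True
            out.append(msg)
    return out


def to_provider_turns(messages):
    """Assumed provider step: system messages leave the turn list, and a
    dummy "." user turn is added if the list does not end on user/tool."""
    turns = [m for m in messages if m["role"] not in SYSTEM_ROLES]
    if not turns or turns[-1]["role"] not in ("user", "tool"):
        turns.append({"role": "user", "text": "."})
    return turns


first_turn = [
    {"role": "system", "text": "You are a helpful agent."},  # base prompt
    {"role": "system", "text": "Speak first."},  # per-turn instructions
]
old_result = to_provider_turns(convert_old(first_turn))
print(old_result)
```

Under these assumptions the per-turn instructions never reach the turn list, so the only thing the model is asked to respond to is the dummy message.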

Fix

Drop the seen_non_system tracking and apply a single rule: the first system/developer message is the preamble, every subsequent system/developer message is per-turn instructions and gets converted using the existing role + template.

This keeps:

the "only one base system prompt, nothing else" path unchanged (still falls back to the dummy ".", which is the intentional behaviour when there is genuinely nothing for the model to respond to);
the existing mid-conversation path unchanged (per-turn instructions added after the user has spoken are still rewritten);

…and adds the new case: base + per-turn before any user turn now also gets converted, so providers never have to inject the dummy.
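The single rule described above can be sketched as follows. This is an illustration on plain dicts, not the merged implementation: `convert_new` is a hypothetical name, and only the `first_system_seen` flag and the `<instructions>` template are taken from this PR.

```python
# Hypothetical simplification: dict messages, not livekit ChatMessage objects.
TEMPLATE = "<instructions>\n{text}\n</instructions>"
SYSTEM_ROLES = ("system", "developer")


def convert_new(messages):
    """First system/developer message is the preamble; later ones are
    rewritten as user-role <instructions> turns."""
    out = []
    first_system_seen = False
    for msg in messages:
        if msg["role"] in SYSTEM_ROLES and first_system_seen:
            text = TEMPLATE.format(text=msg["text"])
            out.append({"role": "user", "text": text})
            continue
        if msg["role"] in SYSTEM_ROLES:
            first_system_seen = True
        out.append(msg)
    return out


base = {"role": "system", "text": "You are a helpful agent."}

# Only a base prompt: nothing to convert.
case_a = convert_new([base])

# Base prompt plus per-turn instructions, no user turn yet: now converted.
case_b = convert_new([base, {"role": "system", "text": "Speak first."}])

# Base prompt plus a user turn: unchanged.
case_d = convert_new([base, {"role": "user", "text": "hi"}])
```

In this sketch the converted list for the base + per-turn case already ends on a user turn, so a provider that requires one would have no reason to inject the dummy.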

Smoke test

A: only base system
  google: [{'role': 'user', 'parts': [{'text': '.'}]}]   # unchanged

B: base + per-turn, no user yet
  google: [{'role': 'user', 'parts': [{'text': '<instructions>\nSpeak first.\n</instructions>'}]}]   # was '.'

C: mid-conv per-turn (system after user/assistant)
  google: [...user, model, user(<instructions>)]   # unchanged

D: base + user
  google: [{'role': 'user', 'parts': [{'text': 'hi'}]}]   # unchanged

Benefits Anthropic, AWS, and Google providers (all three call convert_mid_conversation_instructions). OpenAI handles mid-conversation system messages natively and isn't affected.

Test plan

Existing tests still pass.
Manual: a voice agent that calls session.generate_reply(instructions=...) as its first action no longer responds with "you didn't say anything" under Gemini.

`convert_mid_conversation_instructions` previously only rewrote system/developer messages to a user-role `<instructions>` turn AFTER the first user or assistant turn had landed (`seen_non_system`). On the very first `generate_reply(instructions=...)` of a session, the chat context contains only the agent's base prompt plus the freshly appended per-turn instructions, both system, with no user/assistant content yet. The condition fails, both stay `system`, and providers that require the conversation to end on a `user`/`tool` turn fall back to `inject_dummy_user_message` — a literal `"."` user message the model frequently answers with "you didn't say anything". Simpler rule: the first system message is the preamble, every subsequent system message is per-turn instructions and gets converted. Same template, same target role.

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.


devin-ai-integration

Devin Review found 1 new potential issue.


devin-ai-integration · 2026-05-24T17:01:50Z

+            if item.type == "message" and item.role in ("system", "developer"):
+                first_system_seen = True


🟡 Mid-conversation system message not converted when no system message precedes it

The new first_system_seen flag only starts converting system messages after encountering the first system/developer message. If a ChatContext has no leading system message but has one mid-conversation (e.g., [user, system, assistant]), the old code would convert that system message to a user-role message (preserving its positional context), but the new code treats it as the "first" system message and keeps it as-is.

Trace through the regression scenario

With [user("hello"), system("be concise"), assistant("hi")]:

Old code: user → seen_non_system=True, system → converted to user role (position preserved), assistant → kept.

New code: user → kept (not system, first_system_seen stays False), system → falls to else branch since first_system_seen is False → sets first_system_seen=True, kept as-is. assistant → kept.

Downstream formatters (google.py:33-35, anthropic.py:36-38) extract all remaining system messages into a preamble system_messages list, so the mid-conversation system message loses its positional context entirely — it gets hoisted to the preamble instead of staying inline as a user-role message.
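To illustrate the hoisting effect, here is a rough sketch of what such a formatter step might look like. `split_preamble` is a hypothetical helper on plain dicts; the actual code in google.py and anthropic.py is not reproduced in this review and may differ.

```python
# Hypothetical simplification of a provider formatter's preamble extraction.
SYSTEM_ROLES = ("system", "developer")


def split_preamble(messages):
    """Collect every remaining system message into a preamble list and
    return the other messages as the conversation turns."""
    system_messages = [m["text"] for m in messages if m["role"] in SYSTEM_ROLES]
    turns = [m for m in messages if m["role"] not in SYSTEM_ROLES]
    return system_messages, turns


# The regression scenario after the new rule leaves the system message as-is.
kept_as_is = [
    {"role": "user", "text": "hello"},
    {"role": "system", "text": "be concise"},
    {"role": "assistant", "text": "hi"},
]
system_messages, turns = split_preamble(kept_as_is)
print(system_messages)
print([t["role"] for t in turns])
```

The instruction still reaches the model, but as part of the preamble rather than between the user and assistant turns where it was written.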

This affects any caller that passes a ChatContext without a leading system/developer message to Google, Anthropic, or AWS LLM plugins (which all call convert_mid_conversation_instructions). In the standard Agent flow this is unlikely because update_instructions(add_if_missing=True) always inserts a system message at index 0, but direct LLM usage with custom ChatContext objects can trigger it.

Prompt for agents

The root issue is that `first_system_seen` conflates two distinct concerns: (1) identifying the preamble system message to preserve, and (2) detecting that we are past the preamble. The old code used seen_non_system which correctly handled mid-conversation system messages even when no system message appeared at the start. A possible fix is to combine both signals: keep a system message as-is only if it is both the first system message AND no non-system item has been seen yet. For example, track both `first_system_seen` and `seen_non_system`, and only skip conversion when `not first_system_seen and not seen_non_system`. Alternatively, explicitly check that a system/developer message is at position 0 or part of the leading system block. The relevant function is `convert_mid_conversation_instructions` in `livekit-agents/livekit/agents/llm/_provider_format/utils.py`.
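A possible shape for the combined check suggested above, sketched on plain dicts. `convert_combined` is a hypothetical name and this is not the code that was merged; it only shows that tracking both flags covers the first-turn case and the no-leading-system case together.

```python
# Hypothetical simplification: dict messages, not livekit ChatMessage objects.
TEMPLATE = "<instructions>\n{text}\n</instructions>"
SYSTEM_ROLES = ("system", "developer")


def convert_combined(messages):
    """Keep a system message only if it is the first one AND no non-system
    item has been seen yet; convert every other system message."""
    out = []
    first_system_seen = False
    seen_non_system = False
    for msg in messages:
        if msg["role"] not in SYSTEM_ROLES:
            seen_non_system = True
            out.append(msg)
        elif not first_system_seen and not seen_non_system:
            first_system_seen = True
            out.append(msg)  # leading preamble, kept as-is
        else:
            text = TEMPLATE.format(text=msg["text"])
            out.append({"role": "user", "text": text})
    return out


# Regression scenario from the trace: no leading system message.
regression = convert_combined([
    {"role": "user", "text": "hello"},
    {"role": "system", "text": "be concise"},
    {"role": "assistant", "text": "hi"},
])

# First-turn scenario this PR targets: base prompt plus per-turn instructions.
first_turn = convert_combined([
    {"role": "system", "text": "base prompt"},
    {"role": "system", "text": "Speak first."},
])
```

The alternative mentioned above, checking for a leading system block by position, would treat several consecutive leading system messages as preamble, which would bring back the first-turn problem this PR fixes; the two-flag version avoids that.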


chenghao-mou requested a review from a team May 24, 2026 06:17

devin-ai-integration Bot reviewed May 24, 2026


fix(llm): inline type narrowing for ChatMessage

03b6ca4

devin-ai-integration Bot reviewed May 24, 2026


toubatbrian approved these changes May 24, 2026


davidzhao approved these changes May 24, 2026


theomonnom merged commit 1445a44 into main May 24, 2026
26 checks passed

theomonnom deleted the fix/convert-instructions-no-user-yet branch May 24, 2026 21:41

theomonnom merged 2 commits into main from fix/convert-instructions-no-user-yet
