fix(llm): convert per-turn instructions on the very first turn too #5828

Merged
theomonnom merged 2 commits into main from fix/convert-instructions-no-user-yet on May 24, 2026

@theomonnom
Member

Summary

convert_mid_conversation_instructions only rewrote system messages to a user-role <instructions> turn after the conversation had already seen a user/assistant turn (the seen_non_system flag).

On the very first generate_reply(instructions=...) of a session, the chat context contains:

  • system: agent's base prompt
  • system: per-turn instructions just appended by generate_reply

…and no user/assistant turn yet. The condition fails, both stay system, and the providers that require the conversation to end on a user/tool turn (Gemini, Anthropic, AWS) fall back to inject_dummy_user_message, which appends a literal {"role": "user", "parts": [{"text": "."}]}. The model (Gemini especially) sees the lone . and responds with things like "You haven't said anything yet, did you mean to ask something?"
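The failure mode can be reproduced with a minimal sketch. This is not the library's code: messages are plain dicts, `convert_old` and `to_provider_turns` are hypothetical stand-ins for `convert_mid_conversation_instructions` and the provider formatter plus `inject_dummy_user_message`, and the base prompt text is invented for illustration.

```python
SYSTEM_ROLES = ("system", "developer")


def convert_old(messages):
    """Pre-fix rule (simplified): rewrite a system message only after a non-system turn."""
    out = []
    seen_non_system = False
    for msg in messages:
        if msg["role"] in SYSTEM_ROLES and seen_non_system:
            text = f"<instructions>\n{msg['content']}\n</instructions>"
            out.append({"role": "user", "content": text})
        else:
            if msg["role"] not in SYSTEM_ROLES:
                seen_non_system = True
            out.append(msg)
    return out


def to_provider_turns(messages):
    """Hypothetical formatter: hoist system messages, pad with "." if needed."""
    turns = [m for m in messages if m["role"] not in SYSTEM_ROLES]
    # Providers such as Gemini need the turn list to end on a user/tool turn.
    if not turns or turns[-1]["role"] not in ("user", "tool"):
        turns.append({"role": "user", "content": "."})
    return turns


first_turn = [
    {"role": "system", "content": "You are a helpful voice agent."},  # assumed base prompt
    {"role": "system", "content": "Speak first."},  # per-turn instructions
]
result = to_provider_turns(convert_old(first_turn))
print(result)  # [{'role': 'user', 'content': '.'}]
```

Both system messages are hoisted out, so the only turn the model receives is the dummy ".".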

Fix

Drop the seen_non_system tracking and apply a single rule: the first system/developer message is the preamble, every subsequent system/developer message is per-turn instructions and gets converted using the existing role + template.

This keeps:

  • the "only one base system prompt, nothing else" path unchanged (still falls back to the dummy ".", which is the intentional behaviour when there is genuinely nothing for the model to respond to);
  • the existing mid-conversation path unchanged (per-turn instructions added after the user has spoken are still rewritten);

…and adds the new case: base + per-turn before any user turn now also gets converted, so providers never have to inject the dummy.
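The single rule described above could look roughly like the following. It is a simplified sketch over plain dicts rather than the real `ChatContext` items; `convert_new` is a hypothetical name, and the base prompt text is an assumption. The `<instructions>` wrapper matches the output shown in the smoke test.

```python
SYSTEM_ROLES = ("system", "developer")


def convert_new(messages):
    """Post-fix rule (simplified): first system message is the preamble,
    every later system/developer message becomes a user-role <instructions> turn."""
    out = []
    first_system_seen = False
    for msg in messages:
        if msg["role"] in SYSTEM_ROLES and first_system_seen:
            text = f"<instructions>\n{msg['content']}\n</instructions>"
            out.append({"role": "user", "content": text})
        else:
            if msg["role"] in SYSTEM_ROLES:
                first_system_seen = True
            out.append(msg)
    return out


base = {"role": "system", "content": "You are a helpful voice agent."}  # assumed base prompt

# Only the base prompt: nothing is converted.
case_a = convert_new([base])

# Base prompt plus per-turn instructions before any user turn: second one is converted.
case_b = convert_new([base, {"role": "system", "content": "Speak first."}])
print(case_b[1])
```

With this rule the converted list already ends on a user turn in the base + per-turn case, so a formatter has no reason to append the dummy ".".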

Smoke test

A: only base system
  google: [{'role': 'user', 'parts': [{'text': '.'}]}]   # unchanged

B: base + per-turn, no user yet
  google: [{'role': 'user', 'parts': [{'text': '<instructions>\nSpeak first.\n</instructions>'}]}]   # was '.'

C: mid-conv per-turn (system after user/assistant)
  google: [...user, model, user(<instructions>)]   # unchanged

D: base + user
  google: [{'role': 'user', 'parts': [{'text': 'hi'}]}]   # unchanged

Benefits Anthropic, AWS, and Google providers (all three call convert_mid_conversation_instructions). OpenAI handles mid-conversation system messages natively and isn't affected.

Test plan

  • Existing tests still pass.
  • Manual: a voice agent that calls session.generate_reply(instructions=...) as its first action no longer responds with "you didn't say anything" under Gemini.

`convert_mid_conversation_instructions` previously only rewrote
system/developer messages to a user-role `<instructions>` turn AFTER
the first user or assistant turn had landed (`seen_non_system`). On
the very first `generate_reply(instructions=...)` of a session, the
chat context contains only the agent's base prompt plus the freshly
appended per-turn instructions, both system, with no user/assistant
content yet. The condition fails, both stay `system`, and providers
that require the conversation to end on a `user`/`tool` turn fall
back to `inject_dummy_user_message`, a literal `"."` user message
the model frequently answers with "you didn't say anything".

Simpler rule: the first system message is the preamble, every
subsequent system message is per-turn instructions and gets
converted. Same template, same target role.

@chenghao-mou chenghao-mou requested a review from a team May 24, 2026 06:17
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

Comment on lines +49 to +50
if item.type == "message" and item.role in ("system", "developer"):
    first_system_seen = True

🟡 Mid-conversation system message not converted when no system message precedes it

The new first_system_seen flag only starts converting system messages after encountering the first system/developer message. If a ChatContext has no leading system message but has one mid-conversation (e.g., [user, system, assistant]), the old code would convert that system message to a user-role message (preserving its positional context), but the new code treats it as the "first" system message and keeps it as-is.

Trace through the regression scenario

With [user("hello"), system("be concise"), assistant("hi")]:

  • Old code: user → seen_non_system=True, system → converted to user role (position preserved), assistant → kept.
  • New code: user → kept (not system, first_system_seen stays False), system → falls to else branch since first_system_seen is False → sets first_system_seen=True, kept as-is. assistant → kept.

Downstream formatters (google.py:33-35, anthropic.py:36-38) extract all remaining system messages into a preamble system_messages list, so the mid-conversation system message loses its positional context entirely: it gets hoisted to the preamble instead of staying inline as a user-role message.

This affects any caller that passes a ChatContext without a leading system/developer message to Google, Anthropic, or AWS LLM plugins (which all call convert_mid_conversation_instructions). In the standard Agent flow this is unlikely because update_instructions(add_if_missing=True) always inserts a system message at index 0, but direct LLM usage with custom ChatContext objects can trigger it.

Prompt for agents
The root issue is that `first_system_seen` conflates two distinct concerns: (1) identifying the preamble system message to preserve, and (2) detecting that we are past the preamble. The old code used seen_non_system which correctly handled mid-conversation system messages even when no system message appeared at the start. A possible fix is to combine both signals: keep a system message as-is only if it is both the first system message AND no non-system item has been seen yet. For example, track both `first_system_seen` and `seen_non_system`, and only skip conversion when `not first_system_seen and not seen_non_system`. Alternatively, explicitly check that a system/developer message is at position 0 or part of the leading system block. The relevant function is `convert_mid_conversation_instructions` in `livekit-agents/livekit/agents/llm/_provider_format/utils.py`.
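
One way to read the suggestion above is the sketch below, which tracks both flags and keeps a system message as-is only when it is the first one and nothing non-system has been seen yet. It is an illustration over plain dicts, not the merged code; `convert_combined` is a hypothetical name, and the base prompt text is an assumption.

```python
SYSTEM_ROLES = ("system", "developer")


def convert_combined(messages):
    """Keep only a leading first system message; convert every other system message."""
    out = []
    first_system_seen = False
    seen_non_system = False
    for msg in messages:
        is_system = msg["role"] in SYSTEM_ROLES
        if is_system and not first_system_seen and not seen_non_system:
            # Leading preamble: leave it for the formatter to hoist.
            first_system_seen = True
            out.append(msg)
        elif is_system:
            # Per-turn or mid-conversation instructions: keep position, change role.
            text = f"<instructions>\n{msg['content']}\n</instructions>"
            out.append({"role": "user", "content": text})
        else:
            seen_non_system = True
            out.append(msg)
    return out


# Regression scenario from the review: no leading system message.
regression = convert_combined([
    {"role": "user", "content": "hello"},
    {"role": "system", "content": "be concise"},
    {"role": "assistant", "content": "hi"},
])

# First-turn scenario from the PR: base prompt plus per-turn instructions.
first_turn = convert_combined([
    {"role": "system", "content": "You are a helpful voice agent."},  # assumed base prompt
    {"role": "system", "content": "Speak first."},
])
print(regression[1]["role"], first_turn[1]["role"])  # user user
```

Under these assumptions the mid-conversation system message stays inline as a user-role turn, and the first-turn case from the PR is still converted.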

@theomonnom theomonnom merged commit 1445a44 into main May 24, 2026
26 checks passed
@theomonnom theomonnom deleted the fix/convert-instructions-no-user-yet branch May 24, 2026 21:41