fix: Make judge runners non-multi-turn #1383
Merged
Add a multiTurn parameter (default true) on provider model runners and RunnerFactory.createModel. When false, the runner does not persist the user prompt and assistant reply back into its conversation history, so each run() call starts fresh from the seeded config messages.

Judges now construct their underlying runner with multiTurn=false so successive evaluate() calls on a shared Judge instance do not see each other's prompts and responses. Without this, every evaluation after the first contaminated the judge model's input with prior conversations and concurrent evaluations raced on the mutable history.

Also fix Judge.evaluateMessages to render messages as "<role>: <content>" joined by newlines, preserving speaker identity in the message history section the judge model receives.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ca4aacd to ce73d8c
andrewklatzke approved these changes on May 14, 2026
jsonbailey pushed a commit that referenced this pull request on May 19, 2026
🤖 I have created a release *beep* *boop*

---

<details><summary>browser: 0.1.21</summary>

## [0.1.21](browser-v0.1.20...browser-v0.1.21) (2026-05-19)

### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk bumped from 4.6.5 to 4.7.0
</details>

<details><summary>jest: 1.0.16</summary>

## [1.0.16](jest-v1.0.15...jest-v1.0.16) (2026-05-19)

### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/react-native-client-sdk bumped from ~10.17.4 to ~10.17.5
</details>

<details><summary>js-client-sdk: 4.7.0</summary>

## [4.7.0](js-client-sdk-v4.6.5...js-client-sdk-v4.7.0) (2026-05-19)

### Features

* wire registerDebugOverrides through client common ([#1368](#1368)) ([9011c2a](9011c2a))

### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk-common bumped from 1.26.3 to 1.27.0
</details>

<details><summary>js-client-sdk-common: 1.27.0</summary>

## [1.27.0](js-client-sdk-common-v1.26.3...js-client-sdk-common-v1.27.0) (2026-05-19)

### Features

* wire registerDebugOverrides through client common ([#1368](#1368)) ([9011c2a](9011c2a))
</details>

<details><summary>react-native-client-sdk: 10.17.5</summary>

## [10.17.5](react-native-client-sdk-v10.17.4...react-native-client-sdk-v10.17.5) (2026-05-19)

### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk-common bumped from 1.26.3 to 1.27.0
</details>

<details><summary>react-sdk: 4.0.2</summary>

## [4.0.2](react-sdk-v4.0.1...react-sdk-v4.0.2) (2026-05-19)

### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk bumped from ^4.6.5 to ^4.7.0
</details>

<details><summary>server-sdk-ai: 1.0.0</summary>

## [1.0.0](server-sdk-ai-v0.20.0...server-sdk-ai-v1.0.0) (2026-05-19)

### ⚠ BREAKING CHANGES

* Remove bedrock-specific tracker method ([#1385](#1385))
* Remove `LDAIClient.agent`; use `LDAIClient.agentConfig` instead
* Remove `LDAIClient.agents`; use `LDAIClient.agentConfigs` instead
* Remove `LDAIClient.createChat`; use `LDAIClient.createModel` instead
* Remove `LDAIClient.initChat`; use `LDAIClient.createModel` instead
* Remove `ChatResponse` type and the `api/chat` module; use `RunnerResult` from `api/model` instead
* Change `Judge.evaluateMessages` parameter type from `ChatResponse` to `RunnerResult` (method retained per AI SDK spec Requirement 1.1.3)
* Remove `evaluationMetricKeys` (plural) field from `LDAIJudgeConfig` and `LDAIJudgeConfigDefault`; use `evaluationMetricKey` (singular) instead
* Remove `LDAIConfigTracker.trackOpenAIMetrics`; use `tracker.trackMetricsOf(getAIMetricsFromResponse, fn)` from `@launchdarkly/server-sdk-ai-openai` instead
* Remove `LDAIConfigTracker.trackVercelAISDKGenerateTextMetrics`; use `tracker.trackMetricsOf(getAIMetricsFromResponse, fn)` from `@launchdarkly/server-sdk-ai-vercel` instead
* Remove `createOpenAiUsage` helper; use `getAIMetricsFromResponse` from `@launchdarkly/server-sdk-ai-openai` instead
* Remove `createVercelAISDKTokenUsage` helper; use `getAIMetricsFromResponse` from `@launchdarkly/server-sdk-ai-vercel` instead
* Remove `LDAIClient.config`; use `LDAIClient.completionConfig` instead

### Features

* Change `Judge.evaluateMessages` parameter type from `ChatResponse` to `RunnerResult` (method retained per AI SDK spec Requirement 1.1.3) ([86951b0](86951b0))
* Remove `ChatResponse` type and the `api/chat` module; use `RunnerResult` from `api/model` instead ([86951b0](86951b0))
* Remove `createOpenAiUsage` helper; use `getAIMetricsFromResponse` from `@launchdarkly/server-sdk-ai-openai` instead ([86951b0](86951b0))
* Remove `createVercelAISDKTokenUsage` helper; use `getAIMetricsFromResponse` from `@launchdarkly/server-sdk-ai-vercel` instead ([86951b0](86951b0))
* Remove `evaluationMetricKeys` (plural) field from `LDAIJudgeConfig` and `LDAIJudgeConfigDefault`; use `evaluationMetricKey` (singular) instead ([86951b0](86951b0))
* Remove `LDAIClient.agent`; use `LDAIClient.agentConfig` instead ([86951b0](86951b0))
* Remove `LDAIClient.agents`; use `LDAIClient.agentConfigs` instead ([86951b0](86951b0))
* Remove `LDAIClient.config`; use `LDAIClient.completionConfig` instead ([86951b0](86951b0))
* Remove `LDAIClient.createChat`; use `LDAIClient.createModel` instead ([86951b0](86951b0))
* Remove `LDAIClient.initChat`; use `LDAIClient.createModel` instead ([86951b0](86951b0))
* Remove `LDAIConfigTracker.trackOpenAIMetrics`; use `tracker.trackMetricsOf(getAIMetricsFromResponse, fn)` from `@launchdarkly/server-sdk-ai-openai` instead ([86951b0](86951b0))
* Remove `LDAIConfigTracker.trackVercelAISDKGenerateTextMetrics`; use `tracker.trackMetricsOf(getAIMetricsFromResponse, fn)` from `@launchdarkly/server-sdk-ai-vercel` instead ([86951b0](86951b0))
* Remove bedrock-specific tracker method ([#1385](#1385)) ([f7dbee8](f7dbee8))

### Bug Fixes

* Make judge runners non-multi-turn ([#1383](#1383)) ([3d8f488](3d8f488))
* Move ManagedAgentGraph alongside other managed types ([#1384](#1384)) ([22dd76d](22dd76d))
</details>

<details><summary>server-sdk-ai-langchain: 0.8.0</summary>

## [0.8.0](server-sdk-ai-langchain-v0.7.0...server-sdk-ai-langchain-v0.8.0) (2026-05-19)

### Features

* Support conversation history directly in AI Provider model runners ([#1371](#1371)) ([b246631](b246631))

### Bug Fixes

* Make judge runners non-multi-turn ([#1383](#1383)) ([3d8f488](3d8f488))

### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0
</details>

<details><summary>server-sdk-ai-openai: 0.7.0</summary>

## [0.7.0](server-sdk-ai-openai-v0.6.0...server-sdk-ai-openai-v0.7.0) (2026-05-19)

### Features

* Support conversation history directly in AI Provider model runners ([#1371](#1371)) ([b246631](b246631))

### Bug Fixes

* Make judge runners non-multi-turn ([#1383](#1383)) ([3d8f488](3d8f488))

### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0
</details>

<details><summary>server-sdk-ai-vercel: 0.7.0</summary>

## [0.7.0](server-sdk-ai-vercel-v0.6.0...server-sdk-ai-vercel-v0.7.0) (2026-05-19)

### Features

* Support conversation history directly in AI Provider model runners ([#1371](#1371)) ([b246631](b246631))

### Bug Fixes

* Make judge runners non-multi-turn ([#1383](#1383)) ([3d8f488](3d8f488))

### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0
</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Primarily a release/metadata PR (version bumps and changelog updates), but it includes a `@launchdarkly/server-sdk-ai` major version bump to `1.0.0`, which signals breaking API changes for downstream consumers.
>
> **Overview**
> **Release-please version rollup.** Updates `.release-please-manifest.json`, package `version` fields, and associated `CHANGELOG.md` entries across the monorepo.
>
> Notable bumps include `@launchdarkly/server-sdk-ai` to **`1.0.0`** (breaking-change release per changelog) and propagation of dependency bumps (`@launchdarkly/js-client-sdk-common` to `1.27.0`, browser SDK to `4.7.0`, React Native to `10.17.5`, React SDK to `4.0.2`, and AI provider packages to `0.7.x/0.8.0`), along with updating embedded SDK/wrapper version strings (e.g., `BrowserInfo`, `PlatformInfo`, `LDReactClient`).
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit bebd031. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
AIC-2544
Relates: https://github.com/launchdarkly/sdk-specs/pull/166
Problem
After #1371 moved conversation history into the provider model runners, every successful `run()` call mutates the runner's internal `_history`/`_chatHistory`. This breaks judges: a `Judge` shares one runner across successive `evaluate(...)` calls, so each evaluation after the first sees the previous evaluation's prompt and response in its history. Concurrent evaluations on the same judge also race on the mutable state.
Fix
Add a `multiTurn: boolean = true` parameter on every provider model runner constructor (`OpenAIModelRunner`, `LangChainModelRunner`, `VercelModelRunner`) and thread it through each provider's `*RunnerFactory.createModel` and the central `RunnerFactory.createModel`.

When `multiTurn` is `true` (the default), behavior is unchanged: the runner persists the user prompt and the assistant reply to its history on successful calls. When `multiTurn` is `false`, history is treated as read-only: every `run()` starts from the seeded config messages plus the current input.

`LDAIClientImpl._createJudgeInstance` constructs its runner with `multiTurn=false`. `createModel` (chat) keeps the default.

Because `multiTurn` defaults to `true`, this is additive (not breaking). It restores the pre-#1371 contract for judges, and the new parameter is opt-in for any other stateless use case.
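
To make the two modes concrete, here is a minimal sketch of the behavior described above. It is not the SDK's implementation: `SketchRunner`, `HistoryMessage`, and `FakeModel` are hypothetical names, the model call is a stub that only reports how many messages it received, and `run()` is synchronous for brevity where a real provider call would be asynchronous.

```typescript
// Hypothetical types and names throughout; this is not the SDK's implementation.
type Role = "system" | "user" | "assistant";

interface HistoryMessage {
  role: Role;
  content: string;
}

// Stand-in for a provider call: reports how many messages it was given.
type FakeModel = (messages: HistoryMessage[]) => string;

class SketchRunner {
  private history: HistoryMessage[];
  private readonly model: FakeModel;
  private readonly multiTurn: boolean;

  constructor(seeded: HistoryMessage[], model: FakeModel, multiTurn: boolean = true) {
    // Copy so the caller's seeded array is never mutated.
    this.history = [...seeded];
    this.model = model;
    this.multiTurn = multiTurn;
  }

  // Synchronous for brevity; a real provider call would be asynchronous.
  run(prompt: string): string {
    const userMessage: HistoryMessage = { role: "user", content: prompt };
    const input = [...this.history, userMessage];
    const reply = this.model(input);
    if (this.multiTurn) {
      // Multi-turn: persist both sides of the exchange for the next call.
      this.history.push(userMessage, { role: "assistant", content: reply });
    }
    // Non-multi-turn: history is left untouched, so the next run() starts
    // again from the seeded messages plus its own input.
    return reply;
  }

  historyLength(): number {
    return this.history.length;
  }
}

const seeded: HistoryMessage[] = [{ role: "system", content: "You are a judge." }];
const countingModel: FakeModel = (messages) => `saw ${messages.length} messages`;

const chatRunner = new SketchRunner(seeded, countingModel); // multiTurn defaults to true
const judgeRunner = new SketchRunner(seeded, countingModel, false);

// The chat runner's second call sees 4 messages (seed + prior exchange + new prompt);
// both judge calls see only 2 (seed + current prompt).
const chatReplies = [chatRunner.run("first"), chatRunner.run("second")];
const judgeReplies = [judgeRunner.run("first"), judgeRunner.run("second")];
```

In this sketch the stateless runner never writes to its history after construction, which is also why concurrent calls have nothing to race on.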
Also in this PR
`Judge.evaluateMessages` now renders each message as `<role>: <content>` joined by newlines instead of dropping the role and joining contents with `\r\n`. The judge model now sees who said what in the message history section.
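
The difference between the two renderings can be sketched as follows. The function names and the `JudgeMessage` shape are assumptions made for illustration, not the SDK's actual code.

```typescript
// Hypothetical message shape and helper names, for illustration only.
interface JudgeMessage {
  role: string;
  content: string;
}

// New format as described above: "<role>: <content>", one message per line.
function renderRolePrefixed(messages: JudgeMessage[]): string {
  return messages.map((m) => `${m.role}: ${m.content}`).join("\n");
}

// Previous format as described above: roles dropped, contents joined with \r\n.
function renderContentsOnly(messages: JudgeMessage[]): string {
  return messages.map((m) => m.content).join("\r\n");
}

const sample: JudgeMessage[] = [
  { role: "user", content: "What is 2 + 2?" },
  { role: "assistant", content: "4" },
];

const rolePrefixed = renderRolePrefixed(sample);
// "user: What is 2 + 2?\nassistant: 4"
const contentsOnly = renderContentsOnly(sample);
// "What is 2 + 2?\r\n4"
```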
Test plan
* Provider runner tests cover `multiTurn=false` (no history accumulation, internal history length stays at the seeded length) and confirm the default still appends.
* `RunnerFactory.test.ts` verifies `multiTurn` (defaulting to `true`) is forwarded to the provider factory, and that `multiTurn=false` passes through.
* `LDAIClientImpl.test.ts` confirms `createJudge` invokes `RunnerFactory.createModel` with `multiTurn=false`.
* `Judge.test.ts` adds coverage for the role-prefixed `evaluateMessages` format and verifies successive `evaluate(...)` calls receive clean, independent input strings.
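
The forwarding behavior that the factory tests check can be illustrated with a small recording stub. This is a sketch under assumptions: `SketchProviderFactory`, `SketchConfig`, and `createModelSketch` are hypothetical stand-ins, and the real `RunnerFactory.createModel` signature may differ.

```typescript
// Hypothetical shapes; the real factory signatures may differ.
interface SketchConfig {
  modelName: string;
}

interface SketchProviderFactory {
  createModel(config: SketchConfig, multiTurn: boolean): { multiTurn: boolean };
}

// Records every multiTurn value it receives so the forwarding can be inspected.
const forwarded: boolean[] = [];
const recordingProvider: SketchProviderFactory = {
  createModel(_config, multiTurn) {
    forwarded.push(multiTurn);
    return { multiTurn };
  },
};

// Central factory stand-in: defaults multiTurn to true and passes it on unchanged.
function createModelSketch(
  provider: SketchProviderFactory,
  config: SketchConfig,
  multiTurn: boolean = true,
): { multiTurn: boolean } {
  return provider.createModel(config, multiTurn);
}

// A chat-style caller omits the flag; a judge-style caller passes false.
const chatModel = createModelSketch(recordingProvider, { modelName: "example-model" });
const judgeModel = createModelSketch(recordingProvider, { modelName: "example-model" }, false);
// forwarded is now [true, false]
```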
🤖 Generated with Claude Code
Note
Medium Risk
Touches runner construction and `RunnerFactory.createModel` signatures across all server AI providers; while defaults preserve existing chat behavior, incorrect threading of the new flag could change history semantics for callers (notably judges). Also changes judge prompt formatting, which can affect evaluation outputs.

Overview
Adds a `multiTurn` (default `true`) option to `OpenAIModelRunner`, `LangChainModelRunner`, and `VercelModelRunner`, and threads it through each provider `*RunnerFactory.createModel`, the base `AIProvider.createModel`, and the central `RunnerFactory.createModel`; when `false`, successful `run()` calls no longer persist user/assistant messages to internal history.

Updates judge initialization (`LDAIClientImpl.createJudge`/`_createJudgeInstance`) to create runners with `multiTurn=false`, preventing cross-evaluation history bleed/races. `Judge.evaluateMessages` now formats history as `"<role>: <content>"` joined by `\n` instead of dropping roles.

Extends unit tests across providers and SDK (`RunnerFactory`, `LDAIClientImpl`, `Judge`) to cover stateless behavior and default `multiTurn=true` forwarding.

Reviewed by Cursor Bugbot for commit ce73d8c. Bugbot is set up for automated code reviews on this repo.