fix: Make judge runners non-multi-turn by jsonbailey · Pull Request #1383 · launchdarkly/js-core

jsonbailey · 2026-05-14T18:13:38Z

Relates: https://github.com/launchdarkly/sdk-specs/pull/166

Problem

After #1371 moved conversation history into the provider model runners,
every successful run() call mutates the runner's internal _history /
_chatHistory. This breaks judges: a Judge shares one runner across
successive evaluate(...) calls, so each evaluation after the first
sees the previous evaluation's prompt and response in its history.
Concurrent evaluations on the same judge also race on the mutable state.
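
To make the failure mode concrete, here is a minimal sketch of the history bleed. `Message`, `echoModel`, and `AlwaysAppendingRunner` are hypothetical stand-ins, not the SDK's runner classes, and the model call is synchronous for brevity even though the real run() is asynchronous.

```typescript
// Hypothetical stand-ins for illustration; not the SDK's actual types.
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Fake model: replies with the number of messages it was sent.
function echoModel(messages: Message[]): string {
  return `saw ${messages.length} messages`;
}

// Mimics a runner that always persists each exchange to its history.
class AlwaysAppendingRunner {
  private history: Message[];

  constructor(seed: Message[]) {
    this.history = [...seed];
  }

  run(input: string): string {
    const userMessage: Message = { role: 'user', content: input };
    const reply = echoModel([...this.history, userMessage]);
    // Every successful call mutates the shared history.
    this.history.push(userMessage, { role: 'assistant', content: reply });
    return reply;
  }
}

// One runner shared across two unrelated evaluations, as a judge would do.
const sharedRunner = new AlwaysAppendingRunner([
  { role: 'system', content: 'You are a judge.' },
]);
const firstReply = sharedRunner.run('evaluate response A');
const secondReply = sharedRunner.run('evaluate response B');

console.log(firstReply); // saw 2 messages
console.log(secondReply); // saw 4 messages
```

The second evaluation is sent four messages instead of two because the first evaluation's prompt and reply are still in the history.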

Fix

Add a multiTurn: boolean = true parameter on every provider model
runner constructor (OpenAIModelRunner, LangChainModelRunner,
VercelModelRunner) and thread it through each provider's
*RunnerFactory.createModel and the central RunnerFactory.createModel.

When multiTurn is true (the default), behavior is unchanged — the
runner persists the user prompt and the assistant reply to its history
on successful calls. When multiTurn is false, history is treated as
read-only: every run() starts from the seeded config messages plus the
current input.
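
The two modes can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `SketchRunner`, `ModelCall`, and `historyLength` are hypothetical names, and the model call is synchronous to keep the example short.

```typescript
// Hypothetical types for illustration; not the SDK's actual API.
type Message = { role: 'system' | 'user' | 'assistant'; content: string };
type ModelCall = (messages: Message[]) => string;

class SketchRunner {
  private history: Message[];
  private call: ModelCall;
  private multiTurn: boolean;

  // multiTurn defaults to true, so existing callers keep their behavior.
  constructor(seed: Message[], call: ModelCall, multiTurn: boolean = true) {
    this.history = [...seed];
    this.call = call;
    this.multiTurn = multiTurn;
  }

  run(input: string): string {
    const userMessage: Message = { role: 'user', content: input };
    const reply = this.call([...this.history, userMessage]);
    // Only a multi-turn runner persists the exchange; otherwise the
    // history is treated as read-only.
    if (this.multiTurn) {
      this.history.push(userMessage, { role: 'assistant', content: reply });
    }
    return reply;
  }

  historyLength(): number {
    return this.history.length;
  }
}

const seed: Message[] = [{ role: 'system', content: 'You are helpful.' }];
const countMessages: ModelCall = (messages) => `saw ${messages.length} messages`;

const chatRunner = new SketchRunner(seed, countMessages);
chatRunner.run('hello');
chatRunner.run('again');

const statelessRunner = new SketchRunner(seed, countMessages, false);
statelessRunner.run('hello');
const statelessSecondReply = statelessRunner.run('again');

console.log(chatRunner.historyLength()); // 5
console.log(statelessRunner.historyLength()); // 1
console.log(statelessSecondReply); // saw 2 messages
```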

LDAIClientImpl._createJudgeInstance constructs its runner with
multiTurn=false. createModel (chat) keeps the default.
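
The threading of the flag might look something like the sketch below. `RecordingProviderFactory`, `createChatRunner`, and `createJudgeRunner` are hypothetical names used only to show the shape of the call path; the real signatures in the SDK may differ.

```typescript
// Hypothetical shapes for illustration; not the SDK's actual signatures.
type RunnerOptions = { modelName: string; multiTurn: boolean };

// Stand-in for a provider factory that records the options it receives.
class RecordingProviderFactory {
  received: RunnerOptions[] = [];

  createModel(modelName: string, multiTurn: boolean = true): RunnerOptions {
    const options: RunnerOptions = { modelName, multiTurn };
    this.received.push(options);
    return options;
  }
}

// Chat path: relies on the default, so it stays multi-turn.
function createChatRunner(factory: RecordingProviderFactory, modelName: string): RunnerOptions {
  return factory.createModel(modelName);
}

// Judge path: explicitly opts out of history persistence.
function createJudgeRunner(factory: RecordingProviderFactory, modelName: string): RunnerOptions {
  return factory.createModel(modelName, false);
}

const factory = new RecordingProviderFactory();
const chatOptions = createChatRunner(factory, 'example-model');
const judgeOptions = createJudgeRunner(factory, 'example-model');

console.log(chatOptions.multiTurn); // true
console.log(judgeOptions.multiTurn); // false
```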

Because multiTurn defaults to true, this is additive (not breaking).
It restores the pre-#1371 contract for judges and the new parameter is
opt-in for any other stateless use case.

Also in this PR

Judge.evaluateMessages now renders each message as <role>: <content>
joined by newlines instead of dropping the role and joining contents
with \r\n. The judge model now sees who said what in the message
history section.
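
As a rough illustration of the formatting change, the sketch below contrasts the two renderings. `renderWithoutRoles` and `renderWithRoles` are hypothetical helper names, not functions from the SDK.

```typescript
// Hypothetical message shape for illustration.
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Previous behavior as described above: roles dropped, contents joined with \r\n.
function renderWithoutRoles(messages: Message[]): string {
  return messages.map((m) => m.content).join('\r\n');
}

// New behavior as described above: "<role>: <content>" joined with \n.
function renderWithRoles(messages: Message[]): string {
  return messages.map((m) => `${m.role}: ${m.content}`).join('\n');
}

const conversation: Message[] = [
  { role: 'user', content: 'What is the capital of France?' },
  { role: 'assistant', content: 'Paris.' },
];

const before = renderWithoutRoles(conversation);
const after = renderWithRoles(conversation);

console.log(after);
// user: What is the capital of France?
// assistant: Paris.
```

With the role prefix, the judge model can tell which line is the question and which is the answer it is meant to score.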

Test plan

* OpenAI / LangChain / Vercel runner tests cover multiTurn=false
  (no history accumulation, internal history length stays at the
  seeded length) and confirm the default still appends.
* RunnerFactory.test.ts verifies multiTurn (defaulting to true)
  is forwarded to the provider factory, and that multiTurn=false
  passes through.
* LDAIClientImpl.test.ts confirms createJudge invokes
  RunnerFactory.createModel with multiTurn=false.
* Judge.test.ts adds coverage for the role-prefixed
  evaluateMessages format and verifies successive evaluate(...)
  calls receive clean, independent input strings.

🤖 Generated with Claude Code

Note

Medium Risk
Touches runner construction and RunnerFactory.createModel signatures across all server AI providers; while defaults preserve existing chat behavior, incorrect threading of the new flag could change history semantics for callers (notably judges). Also changes judge prompt formatting, which can affect evaluation outputs.

Overview
Adds a multiTurn (default true) option to OpenAIModelRunner, LangChainModelRunner, and VercelModelRunner, and threads it through each provider *RunnerFactory.createModel, the base AIProvider.createModel, and the central RunnerFactory.createModel; when false, successful run() calls no longer persist user/assistant messages to internal history.

Updates judge initialization (LDAIClientImpl.createJudge/_createJudgeInstance) to create runners with multiTurn=false, preventing cross-evaluation history bleed/races. Judge.evaluateMessages now formats history as "<role>: <content>" joined by \n instead of dropping roles.

Extends unit tests across providers and SDK (RunnerFactory, LDAIClientImpl, Judge) to cover stateless behavior and default multiTurn=true forwarding.

Reviewed by Cursor Bugbot for commit ce73d8c. Bugbot is set up for automated code reviews on this repo.

Add a multiTurn parameter (default true) on provider model runners and RunnerFactory.createModel. When false, the runner does not persist the user prompt and assistant reply back into its conversation history, so each run() call starts fresh from the seeded config messages.

Judges now construct their underlying runner with multiTurn=false so successive evaluate() calls on a shared Judge instance do not see each other's prompts and responses. Without this, every evaluation after the first contaminated the judge model's input with prior conversations and concurrent evaluations raced on the mutable history.

Also fix Judge.evaluateMessages to render messages as "<role>: <content>" joined by newlines, preserving speaker identity in the message history section the judge model receives.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-05-14T20:05:06Z

@launchdarkly/js-sdk-common size report
This is the brotli compressed size of the ESM build.
Compressed size: 26208 bytes
Compressed size limit: 29000
Uncompressed size: 128789 bytes

github-actions · 2026-05-14T20:05:26Z

@launchdarkly/js-client-sdk size report
This is the brotli compressed size of the ESM build.
Compressed size: 31906 bytes
Compressed size limit: 34000
Uncompressed size: 113658 bytes

github-actions · 2026-05-14T20:05:27Z

@launchdarkly/browser size report
This is the brotli compressed size of the ESM build.
Compressed size: 179498 bytes
Compressed size limit: 200000
Uncompressed size: 830837 bytes

github-actions · 2026-05-14T20:05:34Z

@launchdarkly/js-client-sdk-common size report
This is the brotli compressed size of the ESM build.
Compressed size: 38487 bytes
Compressed size limit: 39000
Uncompressed size: 211236 bytes

🤖 I have created a release *beep* *boop*

---

<details><summary>browser: 0.1.21</summary>

## [0.1.21](browser-v0.1.20...browser-v0.1.21) (2026-05-19)

### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk bumped from 4.6.5 to 4.7.0

</details>

<details><summary>jest: 1.0.16</summary>

## [1.0.16](jest-v1.0.15...jest-v1.0.16) (2026-05-19)

### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/react-native-client-sdk bumped from ~10.17.4 to ~10.17.5

</details>

<details><summary>js-client-sdk: 4.7.0</summary>

## [4.7.0](js-client-sdk-v4.6.5...js-client-sdk-v4.7.0) (2026-05-19)

### Features

* wire registerDebugOverrides through client common ([#1368](#1368)) ([9011c2a](9011c2a))

### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk-common bumped from 1.26.3 to 1.27.0

</details>

<details><summary>js-client-sdk-common: 1.27.0</summary>

## [1.27.0](js-client-sdk-common-v1.26.3...js-client-sdk-common-v1.27.0) (2026-05-19)

### Features

* wire registerDebugOverrides through client common ([#1368](#1368)) ([9011c2a](9011c2a))

</details>

<details><summary>react-native-client-sdk: 10.17.5</summary>

## [10.17.5](react-native-client-sdk-v10.17.4...react-native-client-sdk-v10.17.5) (2026-05-19)

### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk-common bumped from 1.26.3 to 1.27.0

</details>

<details><summary>react-sdk: 4.0.2</summary>

## [4.0.2](react-sdk-v4.0.1...react-sdk-v4.0.2) (2026-05-19)

### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk bumped from ^4.6.5 to ^4.7.0

</details>

<details><summary>server-sdk-ai: 1.0.0</summary>

## [1.0.0](server-sdk-ai-v0.20.0...server-sdk-ai-v1.0.0) (2026-05-19)

### ⚠ BREAKING CHANGES

* Remove bedrock-specific tracker method ([#1385](#1385))
* Remove `LDAIClient.agent`; use `LDAIClient.agentConfig` instead
* Remove `LDAIClient.agents`; use `LDAIClient.agentConfigs` instead
* Remove `LDAIClient.createChat`; use `LDAIClient.createModel` instead
* Remove `LDAIClient.initChat`; use `LDAIClient.createModel` instead
* Remove `ChatResponse` type and the `api/chat` module; use `RunnerResult` from `api/model` instead
* Change `Judge.evaluateMessages` parameter type from `ChatResponse` to `RunnerResult` (method retained per AI SDK spec Requirement 1.1.3)
* Remove `evaluationMetricKeys` (plural) field from `LDAIJudgeConfig` and `LDAIJudgeConfigDefault`; use `evaluationMetricKey` (singular) instead
* Remove `LDAIConfigTracker.trackOpenAIMetrics`; use `tracker.trackMetricsOf(getAIMetricsFromResponse, fn)` from `@launchdarkly/server-sdk-ai-openai` instead
* Remove `LDAIConfigTracker.trackVercelAISDKGenerateTextMetrics`; use `tracker.trackMetricsOf(getAIMetricsFromResponse, fn)` from `@launchdarkly/server-sdk-ai-vercel` instead
* Remove `createOpenAiUsage` helper; use `getAIMetricsFromResponse` from `@launchdarkly/server-sdk-ai-openai` instead
* Remove `createVercelAISDKTokenUsage` helper; use `getAIMetricsFromResponse` from `@launchdarkly/server-sdk-ai-vercel` instead
* Remove `LDAIClient.config`; use `LDAIClient.completionConfig` instead

### Features

* Change `Judge.evaluateMessages` parameter type from `ChatResponse` to `RunnerResult` (method retained per AI SDK spec Requirement 1.1.3) ([86951b0](86951b0))
* Remove `ChatResponse` type and the `api/chat` module; use `RunnerResult` from `api/model` instead ([86951b0](86951b0))
* Remove `createOpenAiUsage` helper; use `getAIMetricsFromResponse` from `@launchdarkly/server-sdk-ai-openai` instead ([86951b0](86951b0))
* Remove `createVercelAISDKTokenUsage` helper; use `getAIMetricsFromResponse` from `@launchdarkly/server-sdk-ai-vercel` instead ([86951b0](86951b0))
* Remove `evaluationMetricKeys` (plural) field from `LDAIJudgeConfig` and `LDAIJudgeConfigDefault`; use `evaluationMetricKey` (singular) instead ([86951b0](86951b0))
* Remove `LDAIClient.agent`; use `LDAIClient.agentConfig` instead ([86951b0](86951b0))
* Remove `LDAIClient.agents`; use `LDAIClient.agentConfigs` instead ([86951b0](86951b0))
* Remove `LDAIClient.config`; use `LDAIClient.completionConfig` instead ([86951b0](86951b0))
* Remove `LDAIClient.createChat`; use `LDAIClient.createModel` instead ([86951b0](86951b0))
* Remove `LDAIClient.initChat`; use `LDAIClient.createModel` instead ([86951b0](86951b0))
* Remove `LDAIConfigTracker.trackOpenAIMetrics`; use `tracker.trackMetricsOf(getAIMetricsFromResponse, fn)` from `@launchdarkly/server-sdk-ai-openai` instead ([86951b0](86951b0))
* Remove `LDAIConfigTracker.trackVercelAISDKGenerateTextMetrics`; use `tracker.trackMetricsOf(getAIMetricsFromResponse, fn)` from `@launchdarkly/server-sdk-ai-vercel` instead ([86951b0](86951b0))
* Remove bedrock-specific tracker method ([#1385](#1385)) ([f7dbee8](f7dbee8))

### Bug Fixes

* Make judge runners non-multi-turn ([#1383](#1383)) ([3d8f488](3d8f488))
* Move ManagedAgentGraph alongside other managed types ([#1384](#1384)) ([22dd76d](22dd76d))

</details>

<details><summary>server-sdk-ai-langchain: 0.8.0</summary>

## [0.8.0](server-sdk-ai-langchain-v0.7.0...server-sdk-ai-langchain-v0.8.0) (2026-05-19)

### Features

* Support conversation history directly in AI Provider model runners ([#1371](#1371)) ([b246631](b246631))

### Bug Fixes

* Make judge runners non-multi-turn ([#1383](#1383)) ([3d8f488](3d8f488))

### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0

</details>

<details><summary>server-sdk-ai-openai: 0.7.0</summary>

## [0.7.0](server-sdk-ai-openai-v0.6.0...server-sdk-ai-openai-v0.7.0) (2026-05-19)

### Features

* Support conversation history directly in AI Provider model runners ([#1371](#1371)) ([b246631](b246631))

### Bug Fixes

* Make judge runners non-multi-turn ([#1383](#1383)) ([3d8f488](3d8f488))

### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0

</details>

<details><summary>server-sdk-ai-vercel: 0.7.0</summary>

## [0.7.0](server-sdk-ai-vercel-v0.6.0...server-sdk-ai-vercel-v0.7.0) (2026-05-19)

### Features

* Support conversation history directly in AI Provider model runners ([#1371](#1371)) ([b246631](b246631))

### Bug Fixes

* Make judge runners non-multi-turn ([#1383](#1383)) ([3d8f488](3d8f488))

### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0

</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---

> [!NOTE]
> **Medium Risk**
> Primarily a release/metadata PR (version bumps and changelog updates), but it includes a `@launchdarkly/server-sdk-ai` major version bump to `1.0.0`, which signals breaking API changes for downstream consumers.
>
> **Overview**
> **Release-please version rollup.** Updates `.release-please-manifest.json`, package `version` fields, and associated `CHANGELOG.md` entries across the monorepo.
>
> Notable bumps include `@launchdarkly/server-sdk-ai` to **`1.0.0`** (breaking-change release per changelog) and propagation of dependency bumps (`@launchdarkly/js-client-sdk-common` to `1.27.0`, browser SDK to `4.7.0`, React Native to `10.17.5`, React SDK to `4.0.2`, and AI provider packages to `0.7.x/0.8.0`), along with updating embedded SDK/wrapper version strings (e.g., `BrowserInfo`, `PlatformInfo`, `LDReactClient`).
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit bebd031. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Base automatically changed from jb/message-history-in-providers to main May 14, 2026 20:00

jsonbailey force-pushed the jb/aic-2544/judge-stateless-runner branch from ca4aacd to ce73d8c Compare May 14, 2026 20:03

andrewklatzke approved these changes May 14, 2026

jsonbailey marked this pull request as ready for review May 14, 2026 20:58

jsonbailey requested a review from a team as a code owner May 14, 2026 20:58

jsonbailey merged commit 3d8f488 into main May 14, 2026
43 checks passed

jsonbailey deleted the jb/aic-2544/judge-stateless-runner branch May 14, 2026 22:41

github-actions Bot mentioned this pull request May 14, 2026

chore: release main #1372

Merged
