
fix: Make judge runners non-multi-turn #1383

Merged
jsonbailey merged 1 commit into main from jb/aic-2544/judge-stateless-runner
May 14, 2026


jsonbailey commented May 14, 2026

AIC-2544

Relates: https://github.com/launchdarkly/sdk-specs/pull/166

Problem

After #1371 moved conversation history into the provider model runners,
every successful run() call mutates the runner's internal _history /
_chatHistory. This breaks judges: a Judge shares one runner across
successive evaluate(...) calls, so each evaluation after the first
sees the previous evaluation's prompt and response in its history.
Concurrent evaluations on the same judge also race on the mutable state.
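To make the failure mode concrete, here is a minimal TypeScript sketch of a runner that always persists each turn, reused across two evaluations the way a Judge reuses its runner. `Message`, `LeakyRunner`, and `fakeModel` are hypothetical names invented for illustration (not SDK types), and the model call is a synchronous stand-in for the real asynchronous provider request.

```typescript
// Hypothetical sketch of the pre-fix behavior. Message, LeakyRunner, and
// fakeModel are illustrative names, not the SDK's real types or classes.
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Stand-in for a provider call: it only reports how many messages it received.
function fakeModel(messages: Message[]): string {
  return `saw ${messages.length} messages`;
}

// Like the post-#1371 runners, this always persists the turn on success.
class LeakyRunner {
  private readonly history: Message[];

  constructor(seed: Message[]) {
    this.history = [...seed];
  }

  run(input: string): { content: string; sent: Message[] } {
    const userMessage: Message = { role: 'user', content: input };
    const sent: Message[] = [...this.history, userMessage];
    const content = fakeModel(sent);
    // Shared mutable state: every later call will see this turn.
    this.history.push(userMessage, { role: 'assistant', content });
    return { content, sent };
  }
}

// One runner reused across evaluations, as a Judge does.
const judgeRunner = new LeakyRunner([{ role: 'system', content: 'You are a judge.' }]);
const first = judgeRunner.run('Evaluate response A'); // sends 2 messages
const second = judgeRunner.run('Evaluate response B'); // sends 4 messages
```

The second request carries evaluation A's prompt and reply in addition to its own input, which is the contamination described in the problem statement.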

Fix

Add a multiTurn: boolean = true parameter to every provider model
runner constructor (OpenAIModelRunner, LangChainModelRunner,
VercelModelRunner) and thread it through each provider's
*RunnerFactory.createModel and the central RunnerFactory.createModel.

When multiTurn is true (the default), behavior is unchanged — the
runner persists the user prompt and the assistant reply to its history
on successful calls. When multiTurn is false, history is treated as
read-only: every run() starts from the seeded config messages plus the
current input.
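A minimal sketch of the multiTurn semantics, assuming a simplified synchronous runner. `SketchModelRunner` and `fakeModel` are hypothetical; the real OpenAIModelRunner, LangChainModelRunner, and VercelModelRunner call their providers asynchronously and keep history in provider-specific fields.

```typescript
// Hypothetical sketch of the multiTurn flag; not the SDK's actual runner code.
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Stand-in for a provider call: reports how many messages it received.
function fakeModel(messages: Message[]): string {
  return `saw ${messages.length} messages`;
}

class SketchModelRunner {
  private readonly history: Message[];
  private readonly multiTurn: boolean;

  constructor(seed: Message[], multiTurn: boolean = true) {
    this.history = [...seed];
    this.multiTurn = multiTurn;
  }

  get historyLength(): number {
    return this.history.length;
  }

  run(input: string): string {
    const userMessage: Message = { role: 'user', content: input };
    // The request is built from a copy, so stored history is only read here.
    const content = fakeModel([...this.history, userMessage]);
    // Reached only if the call succeeded; persist the turn only in multi-turn mode.
    if (this.multiTurn) {
      this.history.push(userMessage, { role: 'assistant', content });
    }
    return content;
  }
}

const seed: Message[] = [{ role: 'system', content: 'You are a judge.' }];

// Stateless: every call starts from the seeded messages plus the current input.
const stateless = new SketchModelRunner(seed, false);
const a = stateless.run('Evaluate response A'); // "saw 2 messages"
const b = stateless.run('Evaluate response B'); // "saw 2 messages"

// Default (multiTurn = true): turns accumulate as before.
const chat = new SketchModelRunner(seed);
const c1 = chat.run('Hello'); // "saw 2 messages"
const c2 = chat.run('And then?'); // "saw 4 messages"
```

Because the history write happens after the model call returns, a failed call persists nothing in either mode. With multiTurn set to false the stored history is never written after construction, so concurrent calls on a shared runner only read it.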

LDAIClientImpl._createJudgeInstance constructs its runner with
multiTurn=false. createModel (chat) keeps the default.

Because multiTurn defaults to true, this change is additive (not breaking).
It restores the pre-#1371 contract for judges, and the new parameter is
opt-in for any other stateless use case.
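The threading can be sketched as follows, under heavy simplification. `ModelConfig`, `Runner`, `ProviderRunnerFactory`, `providerFactories`, and the standalone `createModel` function are hypothetical stand-ins for the provider *RunnerFactory.createModel methods and the central RunnerFactory.createModel, whose real signatures take more arguments; the model name is a placeholder.

```typescript
// Hypothetical sketch of threading multiTurn through the factories.
// Real createModel methods take additional arguments omitted here.
interface ModelConfig {
  provider: string;
  model: string;
}

interface Runner {
  readonly provider: string;
  readonly multiTurn: boolean;
}

interface ProviderRunnerFactory {
  createModel(config: ModelConfig, multiTurn?: boolean): Runner;
}

// Each provider factory accepts the flag and hands it to its runner.
const providerFactories: Record<string, ProviderRunnerFactory> = {
  openai: {
    createModel: (config, multiTurn = true) => ({ provider: config.provider, multiTurn }),
  },
};

// Central factory: forwards the flag unchanged, defaulting to true.
function createModel(config: ModelConfig, multiTurn: boolean = true): Runner {
  const factory = providerFactories[config.provider];
  if (factory === undefined) {
    throw new Error(`No runner factory for provider "${config.provider}"`);
  }
  return factory.createModel(config, multiTurn);
}

const exampleConfig: ModelConfig = { provider: 'openai', model: 'example-model' };

// Chat path: existing call sites compile unchanged and stay multi-turn.
const chatRunner = createModel(exampleConfig);
// Judge path: opts out so each evaluation starts from the seeded messages.
const judgeRunner = createModel(exampleConfig, false);
```

Making the flag a trailing optional parameter that defaults to true is what keeps the change additive: existing call sites keep their multi-turn behavior without edits, and only the judge path passes false.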

Also in this PR

Judge.evaluateMessages now renders each message as <role>: <content>
joined by newlines instead of dropping the role and joining contents
with \r\n. The judge model now sees who said what in the message
history section.
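The formatting change amounts to the difference between these two hypothetical helpers. They operate on a simplified `{ role, content }` interface assumed for illustration; the real Judge.evaluateMessages works with the SDK's own message and result types.

```typescript
// Hypothetical helpers contrasting the old and new rendering of message history.
interface Message {
  role: string;
  content: string;
}

// Before: roles dropped, contents joined with CRLF.
function formatWithoutRoles(messages: Message[]): string {
  return messages.map((m) => m.content).join('\r\n');
}

// After: each message rendered as "<role>: <content>", joined with LF.
function formatWithRoles(messages: Message[]): string {
  return messages.map((m) => `${m.role}: ${m.content}`).join('\n');
}

const history: Message[] = [
  { role: 'user', content: 'What is the capital of France?' },
  { role: 'assistant', content: 'Paris.' },
];

const before = formatWithoutRoles(history); // "What is the capital of France?\r\nParis."
const after = formatWithRoles(history); // "user: What is the capital of France?\nassistant: Paris."
```

With roles dropped, the judge cannot tell a user turn from an assistant turn; the prefixed form keeps that distinction in the prompt.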

Test plan

  • OpenAI / LangChain / Vercel runner tests cover multiTurn=false
    (no history accumulation, internal history length stays at the
    seeded length) and confirm the default still appends.
  • RunnerFactory.test.ts verifies multiTurn (defaulting to true)
    is forwarded to the provider factory, and that multiTurn=false
    passes through.
  • LDAIClientImpl.test.ts confirms createJudge invokes
    RunnerFactory.createModel with multiTurn=false.
  • Judge.test.ts adds coverage for the role-prefixed
    evaluateMessages format and verifies successive evaluate(...)
    calls receive clean, independent input strings.

🤖 Generated with Claude Code


Note

Medium Risk
Touches runner construction and RunnerFactory.createModel signatures across all server AI providers; while defaults preserve existing chat behavior, incorrect threading of the new flag could change history semantics for callers (notably judges). Also changes judge prompt formatting, which can affect evaluation outputs.

Overview
Adds a multiTurn (default true) option to OpenAIModelRunner, LangChainModelRunner, and VercelModelRunner, and threads it through each provider *RunnerFactory.createModel, the base AIProvider.createModel, and the central RunnerFactory.createModel; when false, successful run() calls no longer persist user/assistant messages to internal history.

Updates judge initialization (LDAIClientImpl.createJudge/_createJudgeInstance) to create runners with multiTurn=false, preventing cross-evaluation history bleed/races. Judge.evaluateMessages now formats history as "<role>: <content>" joined by \n instead of dropping roles.

Extends unit tests across providers and SDK (RunnerFactory, LDAIClientImpl, Judge) to cover stateless behavior and default multiTurn=true forwarding.

Reviewed by Cursor Bugbot for commit ce73d8c. Bugbot is set up for automated code reviews on this repo.

Base automatically changed from jb/message-history-in-providers to main May 14, 2026 20:00
Add a multiTurn parameter (default true) on provider model runners and
RunnerFactory.createModel. When false, the runner does not persist the
user prompt and assistant reply back into its conversation history, so
each run() call starts fresh from the seeded config messages.

Judges now construct their underlying runner with multiTurn=false so
successive evaluate() calls on a shared Judge instance do not see each
other's prompts and responses. Without this, every evaluation after the
first contaminated the judge model's input with prior conversations and
concurrent evaluations raced on the mutable history.

Also fix Judge.evaluateMessages to render messages as "<role>: <content>"
joined by newlines, preserving speaker identity in the message history
section the judge model receives.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jsonbailey force-pushed the jb/aic-2544/judge-stateless-runner branch from ca4aacd to ce73d8c May 14, 2026 20:03
@github-actions

@launchdarkly/js-sdk-common size report
This is the brotli compressed size of the ESM build.
Compressed size: 26208 bytes
Compressed size limit: 29000
Uncompressed size: 128789 bytes

@github-actions

@launchdarkly/js-client-sdk size report
This is the brotli compressed size of the ESM build.
Compressed size: 31906 bytes
Compressed size limit: 34000
Uncompressed size: 113658 bytes

@github-actions

@launchdarkly/browser size report
This is the brotli compressed size of the ESM build.
Compressed size: 179498 bytes
Compressed size limit: 200000
Uncompressed size: 830837 bytes

@github-actions

@launchdarkly/js-client-sdk-common size report
This is the brotli compressed size of the ESM build.
Compressed size: 38487 bytes
Compressed size limit: 39000
Uncompressed size: 211236 bytes

jsonbailey marked this pull request as ready for review May 14, 2026 20:58
jsonbailey requested a review from a team as a code owner May 14, 2026 20:58
jsonbailey merged commit 3d8f488 into main May 14, 2026
43 checks passed
jsonbailey deleted the jb/aic-2544/judge-stateless-runner branch May 14, 2026 22:41
github-actions bot mentioned this pull request May 14, 2026
jsonbailey pushed a commit that referenced this pull request May 19, 2026
🤖 I have created a release *beep* *boop*
---


<details><summary>browser: 0.1.21</summary>

## [0.1.21](browser-v0.1.20...browser-v0.1.21) (2026-05-19)


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk bumped from 4.6.5 to 4.7.0
</details>

<details><summary>jest: 1.0.16</summary>

## [1.0.16](jest-v1.0.15...jest-v1.0.16) (2026-05-19)


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/react-native-client-sdk bumped from ~10.17.4 to ~10.17.5
</details>

<details><summary>js-client-sdk: 4.7.0</summary>

## [4.7.0](js-client-sdk-v4.6.5...js-client-sdk-v4.7.0) (2026-05-19)


### Features

* wire registerDebugOverrides through client common
([#1368](#1368))
([9011c2a](9011c2a))


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk-common bumped from 1.26.3 to 1.27.0
</details>

<details><summary>js-client-sdk-common: 1.27.0</summary>

## [1.27.0](js-client-sdk-common-v1.26.3...js-client-sdk-common-v1.27.0) (2026-05-19)


### Features

* wire registerDebugOverrides through client common
([#1368](#1368))
([9011c2a](9011c2a))
</details>

<details><summary>react-native-client-sdk: 10.17.5</summary>

## [10.17.5](react-native-client-sdk-v10.17.4...react-native-client-sdk-v10.17.5) (2026-05-19)


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk-common bumped from 1.26.3 to 1.27.0
</details>

<details><summary>react-sdk: 4.0.2</summary>

## [4.0.2](react-sdk-v4.0.1...react-sdk-v4.0.2) (2026-05-19)


### Dependencies

* The following workspace dependencies were updated
  * dependencies
    * @launchdarkly/js-client-sdk bumped from ^4.6.5 to ^4.7.0
</details>

<details><summary>server-sdk-ai: 1.0.0</summary>

## [1.0.0](server-sdk-ai-v0.20.0...server-sdk-ai-v1.0.0) (2026-05-19)


### ⚠ BREAKING CHANGES

* Remove bedrock-specific tracker method
([#1385](#1385))
* Remove `LDAIClient.agent` — use `LDAIClient.agentConfig` instead
* Remove `LDAIClient.agents` — use `LDAIClient.agentConfigs` instead
* Remove `LDAIClient.createChat` — use `LDAIClient.createModel` instead
* Remove `LDAIClient.initChat` — use `LDAIClient.createModel` instead
* Remove `ChatResponse` type and the `api/chat` module — use
`RunnerResult` from `api/model` instead
* Change `Judge.evaluateMessages` parameter type from `ChatResponse` to
`RunnerResult` (method retained per AI SDK spec Requirement 1.1.3)
* Remove `evaluationMetricKeys` (plural) field from `LDAIJudgeConfig`
and `LDAIJudgeConfigDefault` — use `evaluationMetricKey` (singular)
instead
* Remove `LDAIConfigTracker.trackOpenAIMetrics` — use
`tracker.trackMetricsOf(getAIMetricsFromResponse, fn)` from
`@launchdarkly/server-sdk-ai-openai` instead
* Remove `LDAIConfigTracker.trackVercelAISDKGenerateTextMetrics` — use
`tracker.trackMetricsOf(getAIMetricsFromResponse, fn)` from
`@launchdarkly/server-sdk-ai-vercel` instead
* Remove `createOpenAiUsage` helper — use `getAIMetricsFromResponse`
from `@launchdarkly/server-sdk-ai-openai` instead
* Remove `createVercelAISDKTokenUsage` helper — use
`getAIMetricsFromResponse` from `@launchdarkly/server-sdk-ai-vercel`
instead
* Remove `LDAIClient.config` — use `LDAIClient.completionConfig` instead

### Features

* Change `Judge.evaluateMessages` parameter type from `ChatResponse` to
`RunnerResult` (method retained per AI SDK spec Requirement 1.1.3)
([86951b0](86951b0))
* Remove `ChatResponse` type and the `api/chat` module — use
`RunnerResult` from `api/model` instead
([86951b0](86951b0))
* Remove `createOpenAiUsage` helper — use `getAIMetricsFromResponse`
from `@launchdarkly/server-sdk-ai-openai` instead
([86951b0](86951b0))
* Remove `createVercelAISDKTokenUsage` helper — use
`getAIMetricsFromResponse` from `@launchdarkly/server-sdk-ai-vercel`
instead
([86951b0](86951b0))
* Remove `evaluationMetricKeys` (plural) field from `LDAIJudgeConfig`
and `LDAIJudgeConfigDefault` — use `evaluationMetricKey` (singular)
instead
([86951b0](86951b0))
* Remove `LDAIClient.agent` — use `LDAIClient.agentConfig` instead
([86951b0](86951b0))
* Remove `LDAIClient.agents` — use `LDAIClient.agentConfigs` instead
([86951b0](86951b0))
* Remove `LDAIClient.config` — use `LDAIClient.completionConfig` instead
([86951b0](86951b0))
* Remove `LDAIClient.createChat` — use `LDAIClient.createModel` instead
([86951b0](86951b0))
* Remove `LDAIClient.initChat` — use `LDAIClient.createModel` instead
([86951b0](86951b0))
* Remove `LDAIConfigTracker.trackOpenAIMetrics` — use
`tracker.trackMetricsOf(getAIMetricsFromResponse, fn)` from
`@launchdarkly/server-sdk-ai-openai` instead
([86951b0](86951b0))
* Remove `LDAIConfigTracker.trackVercelAISDKGenerateTextMetrics` — use
`tracker.trackMetricsOf(getAIMetricsFromResponse, fn)` from
`@launchdarkly/server-sdk-ai-vercel` instead
([86951b0](86951b0))
* Remove bedrock-specific tracker method
([#1385](#1385))
([f7dbee8](f7dbee8))


### Bug Fixes

* Make judge runners non-multi-turn
([#1383](#1383))
([3d8f488](3d8f488))
* Move ManagedAgentGraph alongside other managed types
([#1384](#1384))
([22dd76d](22dd76d))
</details>

<details><summary>server-sdk-ai-langchain: 0.8.0</summary>

## [0.8.0](server-sdk-ai-langchain-v0.7.0...server-sdk-ai-langchain-v0.8.0) (2026-05-19)


### Features

* Support conversation history directly in AI Provider model runners
([#1371](#1371))
([b246631](b246631))


### Bug Fixes

* Make judge runners non-multi-turn
([#1383](#1383))
([3d8f488](3d8f488))


### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0
</details>

<details><summary>server-sdk-ai-openai: 0.7.0</summary>

## [0.7.0](server-sdk-ai-openai-v0.6.0...server-sdk-ai-openai-v0.7.0) (2026-05-19)


### Features

* Support conversation history directly in AI Provider model runners
([#1371](#1371))
([b246631](b246631))


### Bug Fixes

* Make judge runners non-multi-turn
([#1383](#1383))
([3d8f488](3d8f488))


### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0
</details>

<details><summary>server-sdk-ai-vercel: 0.7.0</summary>

## [0.7.0](server-sdk-ai-vercel-v0.6.0...server-sdk-ai-vercel-v0.7.0) (2026-05-19)


### Features

* Support conversation history directly in AI Provider model runners
([#1371](#1371))
([b246631](b246631))


### Bug Fixes

* Make judge runners non-multi-turn
([#1383](#1383))
([3d8f488](3d8f488))


### Dependencies

* The following workspace dependencies were updated
  * devDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0
  * peerDependencies
    * @launchdarkly/server-sdk-ai bumped from ^0.20.0 to ^1.0.0
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Primarily a release/metadata PR (version bumps and changelog updates),
but it includes a `@launchdarkly/server-sdk-ai` major version bump to
`1.0.0`, which signals breaking API changes for downstream consumers.
> 
> **Overview**
> **Release-please version rollup.** Updates
`.release-please-manifest.json`, package `version` fields, and
associated `CHANGELOG.md` entries across the monorepo.
> 
> Notable bumps include `@launchdarkly/server-sdk-ai` to **`1.0.0`**
(breaking-change release per changelog) and propagation of dependency
bumps (`@launchdarkly/js-client-sdk-common` to `1.27.0`, browser SDK to
`4.7.0`, React Native to `10.17.5`, React SDK to `4.0.2`, and AI
provider packages to `0.7.x/0.8.0`), along with updating embedded
SDK/wrapper version strings (e.g., `BrowserInfo`, `PlatformInfo`,
`LDReactClient`).
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
bebd031. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>