fix: build judge input as string; strip legacy judge config messages by jsonbailey · Pull Request #165 · launchdarkly/python-server-sdk-ai

jsonbailey · 2026-05-05T19:40:26Z

Summary

String input instead of message list: Judge.evaluate() now calls runner.run(input_str) with a plain string ("MESSAGE HISTORY:\n...\n\nRESPONSE TO EVALUATE:\n...") instead of constructing and passing a List[LDMessage].
Backwards-compatible legacy message stripping: A new _strip_legacy_judge_messages() helper transparently removes the old assistant/user template messages ({{message_history}} / {{response_to_evaluate}}) that appeared in older judge configs. System messages are never stripped.
Runner.run narrowed to str: The Runner protocol and all four runner implementations (OpenAIModelRunner, LangChainModelRunner, OpenAIAgentRunner, LangChainAgentRunner) now declare input: str. The _coerce_input list-handling path in the model runners has been removed.
messages guard removed: The early-exit guard that returned an error when messages was None or empty has been removed — configs without messages (new-style) are now handled gracefully.
chevron / _interpolate_message removed from judge: The Mustache interpolation code is gone from judge/__init__.py; chevron remains a dependency because it is still used in client.py.

Why

Passing a message list through the runner added unnecessary complexity and required all runners to handle both str and List[LDMessage] inputs. Moving to a single string interface simplifies the runner contract, aligns with the direction of new judge configs (system-message-only), and removes the Mustache templating requirement from the judge path.

Test plan

TestStripLegacyJudgeMessages — unit tests for the helper covering: strip assistant/user template messages, preserve system messages even with template vars, pass-through non-template messages, empty list, new-style system-only config
TestJudgeEvaluate::test_evaluate_passes_string_input_to_runner — verifies runner.run() receives a str, not a list
TestJudgeEvaluate::test_evaluate_string_input_format — verifies exact "MESSAGE HISTORY:\n...\n\nRESPONSE TO EVALUATE:\n..." format
TestJudgeEvaluate::test_evaluate_legacy_config_strips_template_messages — verifies legacy configs still produce string input
TestJudgeEvaluate::test_evaluate_succeeds_when_messages_is_none — verifies no early-exit error for None messages

🤖 Generated with Claude Code

Note

Medium Risk
Medium risk because it narrows the Runner.run contract to str and changes how prompts/config messages are assembled, which can break callers that passed List[LDMessage] and alter judge/model prompting behavior.

Overview
Standardizes runner inputs to plain strings. The Runner protocol and OpenAI/LangChain model + agent runners now declare run(input: str, ...), removing support for passing List[LDMessage] and deleting the _coerce_input paths.

Moves config prompts into runner construction. OpenAIRunnerFactory/LangChainRunnerFactory now pass config.messages into their model runners, which prepend these config messages internally; ManagedModel.run() correspondingly passes only the user prompt string.

Simplifies judge prompting and adds legacy-config cleanup. Judge.evaluate() now builds a single formatted evaluation string (MESSAGE HISTORY / RESPONSE TO EVALUATE) instead of interpolating Mustache templates into message lists, and the client strips legacy non-system template messages via new _strip_legacy_judge_messages; tests were updated/added to cover the new behavior.

^{Reviewed by Cursor Bugbot for commit 133bc10. Bugbot is set up for automated code reviews on this repo. Configure here.}

…se_to_evaluate messages Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…to run() Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 133bc10. Configure here.}

🤖 I have created a release *beep* *boop* --- <details><summary>launchdarkly-server-sdk-ai: 0.19.0</summary> ## [0.19.0](launchdarkly-server-sdk-ai-0.18.0...launchdarkly-server-sdk-ai-0.19.0) (2026-05-05) ### ⚠ BREAKING CHANGES * StructuredResponse replaced by RunnerResult with new "parsed" property * AgentResult replaced by RunnerResult and Managed Result * Removed ModelRunner and AgentRunner protocols * Removed invoke_method, invoke_structured_model from AIProvider base class. * ModelResponse was replaced by RunnerResult * Add ManagedResult, RunnerResult, and Runner protocol; rename invoke() to run() ([#148](#148)) * Swap track_metrics_of parameter order to match spec ([#144](#144)) ### Features * Add evaluations support to ManagedAgent.run() ([#153](#153)) ([442f46a](442f46a)) * Add judge evaluation support to agent graphs ([#142](#142)) ([3d5a6a9](3d5a6a9)) * Add ManagedGraphResult, GraphMetricSummary, and AgentGraphRunnerResult types ([#151](#151)) ([301e24c](301e24c)) * Add ManagedResult, RunnerResult, and Runner protocol; rename invoke() to run() ([#148](#148)) ([88d4ddc](88d4ddc)) * Add root-level tools map with customParameters to AI Config types ([#141](#141)) ([f17c535](f17c535)) * bake sampling_rate into Judge at construction; simplify Evaluator to List[Judge] ([#159](#159)) ([86c79e6](86c79e6)) * Update LangChain runners to implement Runner protocol returning RunnerResult ([#150](#150)) ([62a8e25](62a8e25)) ### Bug Fixes * Add runtime DeprecationWarnings to deprecated methods ([#145](#145)) ([2189b81](2189b81)) * AgentResult replaced by RunnerResult and Managed Result ([fbb0b4b](fbb0b4b)) * build judge input as string; strip legacy judge config messages ([#165](#165)) ([e6942a6](e6942a6)) * Fall back to model.parameters.tools when root tools absent ([#146](#146)) ([2c30d75](2c30d75)) * Graph tracking refactor — ManagedAgentGraph drives tracking for new runner shape ([#154](#154)) ([20a5020](20a5020)) * ModelResponse was replaced by RunnerResult ([fbb0b4b](fbb0b4b)) * parse model.parameters.tools as list ([#160](#160)) ([fb53e99](fb53e99)) * reference correct PyPI package names in provider load error messages ([#164](#164)) ([48761c9](48761c9)) * Removed invoke_method, invoke_structured_model from AIProvider base class. ([fbb0b4b](fbb0b4b)) * Removed ModelRunner and AgentRunner protocols ([fbb0b4b](fbb0b4b)) * Replace done_callback with coroutine chain for judge tracking ([#147](#147)) ([1e1f36b](1e1f36b)) * StructuredResponse replaced by RunnerResult with new "parsed" property ([fbb0b4b](fbb0b4b)) * Swap track_metrics_of parameter order to match spec ([#144](#144)) ([53db736](53db736)) </details> <details><summary>launchdarkly-server-sdk-ai-langchain: 0.6.0</summary> ## [0.6.0](launchdarkly-server-sdk-ai-langchain-0.5.0...launchdarkly-server-sdk-ai-langchain-0.6.0) (2026-05-05) ### Features * Add judge evaluation support to agent graphs ([#142](#142)) ([3d5a6a9](3d5a6a9)) * Migrate LangGraph runner to AgentGraphRunnerResult; clean up legacy shape detection ([#156](#156)) ([efa8e00](efa8e00)) * Support conversation history directly in AI Provider model runners ([#166](#166)) ([4bb3e78](4bb3e78)) * Update LangChain runners to implement Runner protocol returning RunnerResult ([#150](#150)) ([62a8e25](62a8e25)) ### Bug Fixes * build judge input as string; strip legacy judge config messages ([#165](#165)) ([e6942a6](e6942a6)) </details> <details><summary>launchdarkly-server-sdk-ai-openai: 0.5.0</summary> ## [0.5.0](launchdarkly-server-sdk-ai-openai-0.4.0...launchdarkly-server-sdk-ai-openai-0.5.0) (2026-05-05) ### Features * Add judge evaluation support to agent graphs ([#142](#142)) ([3d5a6a9](3d5a6a9)) * Support conversation history directly in AI Provider model runners ([#166](#166)) ([4bb3e78](4bb3e78)) * Update OpenAI graph runner to return AgentGraphRunnerResult with GraphMetrics ([#155](#155)) ([388b7af](388b7af)) * Update OpenAI runners to implement Runner protocol returning RunnerResult ([#149](#149)) ([382e662](382e662)) ### Bug Fixes * build judge input as string; strip legacy judge config messages ([#165](#165)) ([e6942a6](e6942a6)) </details> --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).  --- > [!NOTE] > **Medium Risk** > Primarily a release/version bump, but it publishes **breaking API changes** (move to unified `Runner.run()`/`RunnerResult` and removal of `invoke_*` methods), which can break downstream integrations. > > **Overview** > Cuts a new release across the core SDK and provider packages: `launchdarkly-server-sdk-ai` to `0.19.0`, LangChain provider to `0.6.0`, and OpenAI provider to `0.5.0`, updating the release manifest and package metadata accordingly. > > Changelogs document the shipped breaking API surface changes (notably removing `invoke_model()`/`invoke_structured_model()` in favor of `run(...)` and standardizing returns on `RunnerResult`) plus accompanying feature/fix entries; the core package version constants/docs (`__version__`, `PROVENANCE.md`) are updated to match. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit a20d7a5. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>  --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: jsonbailey <jbailey@launchdarkly.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

jsonbailey and others added 2 commits May 5, 2026 14:40

fix: build judge input as string; strip legacy message_history/respon…

775aaca

…se_to_evaluate messages Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor: move legacy judge message stripping to client._judge_config()

efaa5b7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

jsonbailey marked this pull request as ready for review May 5, 2026 20:10

jsonbailey requested a review from a team as a code owner May 5, 2026 20:10

jsonbailey and others added 2 commits May 5, 2026 15:10

fix: runners prepend config messages; managed layer passes plain str …

1e8194f

…to run() Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: update runner tests to pass str to run() instead of list[LDMessage]

133bc10

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

jsonbailey mentioned this pull request May 5, 2026

fix: Prevent context attributes from influencing judge template parsing #129

Closed

3 tasks

cursor Bot reviewed May 5, 2026

View reviewed changes

Comment thread packages/sdk/server-ai/src/ldai/managed_model.py

keelerm84 approved these changes May 5, 2026

View reviewed changes

jsonbailey merged commit e6942a6 into main May 5, 2026
45 checks passed

jsonbailey deleted the jb/fix-judge-string-input branch May 5, 2026 20:45

github-actions Bot mentioned this pull request May 5, 2026

chore: release main #143

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: build judge input as string; strip legacy judge config messages#165

fix: build judge input as string; strip legacy judge config messages#165
jsonbailey merged 4 commits intomainfrom
jb/fix-judge-string-input

jsonbailey commented May 5, 2026 •

edited by cursor Bot

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jsonbailey commented May 5, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

Test plan

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jsonbailey commented May 5, 2026 •

edited by cursor Bot

Loading