Skip to content

fix: build judge input as string; strip legacy judge config messages#165

Merged
jsonbailey merged 4 commits intomainfrom
jb/fix-judge-string-input
May 5, 2026
Merged

fix: build judge input as string; strip legacy judge config messages#165
jsonbailey merged 4 commits intomainfrom
jb/fix-judge-string-input

Conversation

@jsonbailey
Copy link
Copy Markdown
Contributor

@jsonbailey jsonbailey commented May 5, 2026

Summary

  • String input instead of message list: Judge.evaluate() now calls runner.run(input_str) with a plain string ("MESSAGE HISTORY:\n...\n\nRESPONSE TO EVALUATE:\n...") instead of constructing and passing a List[LDMessage].
  • Backwards-compatible legacy message stripping: A new _strip_legacy_judge_messages() helper transparently removes the old assistant/user template messages ({{message_history}} / {{response_to_evaluate}}) that appeared in older judge configs. System messages are never stripped.
  • Runner.run narrowed to str: The Runner protocol and all four runner implementations (OpenAIModelRunner, LangChainModelRunner, OpenAIAgentRunner, LangChainAgentRunner) now declare input: str. The _coerce_input list-handling path in the model runners has been removed.
  • messages guard removed: The early-exit guard that returned an error when messages was None or empty has been removed — configs without messages (new-style) are now handled gracefully.
  • chevron / _interpolate_message removed from judge: The Mustache interpolation code is gone from judge/__init__.py; chevron remains a dependency because it is still used in client.py.

Why

Passing a message list through the runner added unnecessary complexity and required all runners to handle both str and List[LDMessage] inputs. Moving to a single string interface simplifies the runner contract, aligns with the direction of new judge configs (system-message-only), and removes the Mustache templating requirement from the judge path.

Test plan

  • TestStripLegacyJudgeMessages — unit tests for the helper covering: strip assistant/user template messages, preserve system messages even with template vars, pass-through non-template messages, empty list, new-style system-only config
  • TestJudgeEvaluate::test_evaluate_passes_string_input_to_runner — verifies runner.run() receives a str, not a list
  • TestJudgeEvaluate::test_evaluate_string_input_format — verifies exact "MESSAGE HISTORY:\n...\n\nRESPONSE TO EVALUATE:\n..." format
  • TestJudgeEvaluate::test_evaluate_legacy_config_strips_template_messages — verifies legacy configs still produce string input
  • TestJudgeEvaluate::test_evaluate_succeeds_when_messages_is_none — verifies no early-exit error for None messages

🤖 Generated with Claude Code


Note

Medium Risk
Medium risk because it narrows the Runner.run contract to str and changes how prompts/config messages are assembled, which can break callers that passed List[LDMessage] and alter judge/model prompting behavior.

Overview
Standardizes runner inputs to plain strings. The Runner protocol and OpenAI/LangChain model + agent runners now declare run(input: str, ...), removing support for passing List[LDMessage] and deleting the _coerce_input paths.

Moves config prompts into runner construction. OpenAIRunnerFactory/LangChainRunnerFactory now pass config.messages into their model runners, which prepend these config messages internally; ManagedModel.run() correspondingly passes only the user prompt string.

Simplifies judge prompting and adds legacy-config cleanup. Judge.evaluate() now builds a single formatted evaluation string (MESSAGE HISTORY / RESPONSE TO EVALUATE) instead of interpolating Mustache templates into message lists, and the client strips legacy non-system template messages via new _strip_legacy_judge_messages; tests were updated/added to cover the new behavior.

Reviewed by Cursor Bugbot for commit 133bc10. Bugbot is set up for automated code reviews on this repo. Configure here.

jsonbailey and others added 2 commits May 5, 2026 14:40
…se_to_evaluate messages

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jsonbailey jsonbailey marked this pull request as ready for review May 5, 2026 20:10
@jsonbailey jsonbailey requested a review from a team as a code owner May 5, 2026 20:10
jsonbailey and others added 2 commits May 5, 2026 15:10
…to run()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 133bc10. Configure here.

Comment thread packages/sdk/server-ai/src/ldai/managed_model.py
@jsonbailey jsonbailey merged commit e6942a6 into main May 5, 2026
45 checks passed
@jsonbailey jsonbailey deleted the jb/fix-judge-string-input branch May 5, 2026 20:45
@github-actions github-actions Bot mentioned this pull request May 5, 2026
jsonbailey added a commit that referenced this pull request May 6, 2026
🤖 I have created a release *beep* *boop*
---


<details><summary>launchdarkly-server-sdk-ai: 0.19.0</summary>

##
[0.19.0](launchdarkly-server-sdk-ai-0.18.0...launchdarkly-server-sdk-ai-0.19.0)
(2026-05-05)


### ⚠ BREAKING CHANGES

* StructuredResponse replaced by RunnerResult with new "parsed" property
* AgentResult replaced by RunnerResult and Managed Result
* Removed ModelRunner and AgentRunner protocols
* Removed invoke_method, invoke_structured_model from AIProvider base
class.
* ModelResponse was replaced by RunnerResult
* Add ManagedResult, RunnerResult, and Runner protocol; rename invoke()
to run()
([#148](#148))
* Swap track_metrics_of parameter order to match spec
([#144](#144))

### Features

* Add evaluations support to ManagedAgent.run()
([#153](#153))
([442f46a](442f46a))
* Add judge evaluation support to agent graphs
([#142](#142))
([3d5a6a9](3d5a6a9))
* Add ManagedGraphResult, GraphMetricSummary, and AgentGraphRunnerResult
types
([#151](#151))
([301e24c](301e24c))
* Add ManagedResult, RunnerResult, and Runner protocol; rename invoke()
to run()
([#148](#148))
([88d4ddc](88d4ddc))
* Add root-level tools map with customParameters to AI Config types
([#141](#141))
([f17c535](f17c535))
* bake sampling_rate into Judge at construction; simplify Evaluator to
List[Judge]
([#159](#159))
([86c79e6](86c79e6))
* Update LangChain runners to implement Runner protocol returning
RunnerResult
([#150](#150))
([62a8e25](62a8e25))


### Bug Fixes

* Add runtime DeprecationWarnings to deprecated methods
([#145](#145))
([2189b81](2189b81))
* AgentResult replaced by RunnerResult and Managed Result
([fbb0b4b](fbb0b4b))
* build judge input as string; strip legacy judge config messages
([#165](#165))
([e6942a6](e6942a6))
* Fall back to model.parameters.tools when root tools absent
([#146](#146))
([2c30d75](2c30d75))
* Graph tracking refactor — ManagedAgentGraph drives tracking for new
runner shape
([#154](#154))
([20a5020](20a5020))
* ModelResponse was replaced by RunnerResult
([fbb0b4b](fbb0b4b))
* parse model.parameters.tools as list
([#160](#160))
([fb53e99](fb53e99))
* reference correct PyPI package names in provider load error messages
([#164](#164))
([48761c9](48761c9))
* Removed invoke_method, invoke_structured_model from AIProvider base
class.
([fbb0b4b](fbb0b4b))
* Removed ModelRunner and AgentRunner protocols
([fbb0b4b](fbb0b4b))
* Replace done_callback with coroutine chain for judge tracking
([#147](#147))
([1e1f36b](1e1f36b))
* StructuredResponse replaced by RunnerResult with new "parsed" property
([fbb0b4b](fbb0b4b))
* Swap track_metrics_of parameter order to match spec
([#144](#144))
([53db736](53db736))
</details>

<details><summary>launchdarkly-server-sdk-ai-langchain: 0.6.0</summary>

##
[0.6.0](launchdarkly-server-sdk-ai-langchain-0.5.0...launchdarkly-server-sdk-ai-langchain-0.6.0)
(2026-05-05)


### Features

* Add judge evaluation support to agent graphs
([#142](#142))
([3d5a6a9](3d5a6a9))
* Migrate LangGraph runner to AgentGraphRunnerResult; clean up legacy
shape detection
([#156](#156))
([efa8e00](efa8e00))
* Support conversation history directly in AI Provider model runners
([#166](#166))
([4bb3e78](4bb3e78))
* Update LangChain runners to implement Runner protocol returning
RunnerResult
([#150](#150))
([62a8e25](62a8e25))


### Bug Fixes

* build judge input as string; strip legacy judge config messages
([#165](#165))
([e6942a6](e6942a6))
</details>

<details><summary>launchdarkly-server-sdk-ai-openai: 0.5.0</summary>

##
[0.5.0](launchdarkly-server-sdk-ai-openai-0.4.0...launchdarkly-server-sdk-ai-openai-0.5.0)
(2026-05-05)


### Features

* Add judge evaluation support to agent graphs
([#142](#142))
([3d5a6a9](3d5a6a9))
* Support conversation history directly in AI Provider model runners
([#166](#166))
([4bb3e78](4bb3e78))
* Update OpenAI graph runner to return AgentGraphRunnerResult with
GraphMetrics
([#155](#155))
([388b7af](388b7af))
* Update OpenAI runners to implement Runner protocol returning
RunnerResult
([#149](#149))
([382e662](382e662))


### Bug Fixes

* build judge input as string; strip legacy judge config messages
([#165](#165))
([e6942a6](e6942a6))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Primarily a release/version bump, but it publishes **breaking API
changes** (move to unified `Runner.run()`/`RunnerResult` and removal of
`invoke_*` methods), which can break downstream integrations.
> 
> **Overview**
> Cuts a new release across the core SDK and provider packages:
`launchdarkly-server-sdk-ai` to `0.19.0`, LangChain provider to `0.6.0`,
and OpenAI provider to `0.5.0`, updating the release manifest and
package metadata accordingly.
> 
> Changelogs document the shipped breaking API surface changes (notably
removing `invoke_model()`/`invoke_structured_model()` in favor of
`run(...)` and standardizing returns on `RunnerResult`) plus
accompanying feature/fix entries; the core package version
constants/docs (`__version__`, `PROVENANCE.md`) are updated to match.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
a20d7a5. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: jsonbailey <jbailey@launchdarkly.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants