fix: build judge input as string; strip legacy judge config messages #165

Merged: jsonbailey merged 4 commits into main on May 5, 2026
Conversation
…se_to_evaluate messages
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…to run()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 133bc10.
keelerm84 approved these changes on May 5, 2026
jsonbailey added a commit that referenced this pull request on May 6, 2026
🤖 I have created a release *beep* *boop*

---

<details><summary>launchdarkly-server-sdk-ai: 0.19.0</summary>

## [0.19.0](launchdarkly-server-sdk-ai-0.18.0...launchdarkly-server-sdk-ai-0.19.0) (2026-05-05)

### ⚠ BREAKING CHANGES

* StructuredResponse replaced by RunnerResult with new "parsed" property
* AgentResult replaced by RunnerResult and Managed Result
* Removed ModelRunner and AgentRunner protocols
* Removed invoke_method, invoke_structured_model from AIProvider base class.
* ModelResponse was replaced by RunnerResult
* Add ManagedResult, RunnerResult, and Runner protocol; rename invoke() to run() ([#148](#148))
* Swap track_metrics_of parameter order to match spec ([#144](#144))

### Features

* Add evaluations support to ManagedAgent.run() ([#153](#153)) ([442f46a](442f46a))
* Add judge evaluation support to agent graphs ([#142](#142)) ([3d5a6a9](3d5a6a9))
* Add ManagedGraphResult, GraphMetricSummary, and AgentGraphRunnerResult types ([#151](#151)) ([301e24c](301e24c))
* Add ManagedResult, RunnerResult, and Runner protocol; rename invoke() to run() ([#148](#148)) ([88d4ddc](88d4ddc))
* Add root-level tools map with customParameters to AI Config types ([#141](#141)) ([f17c535](f17c535))
* bake sampling_rate into Judge at construction; simplify Evaluator to List[Judge] ([#159](#159)) ([86c79e6](86c79e6))
* Update LangChain runners to implement Runner protocol returning RunnerResult ([#150](#150)) ([62a8e25](62a8e25))

### Bug Fixes

* Add runtime DeprecationWarnings to deprecated methods ([#145](#145)) ([2189b81](2189b81))
* AgentResult replaced by RunnerResult and Managed Result ([fbb0b4b](fbb0b4b))
* build judge input as string; strip legacy judge config messages ([#165](#165)) ([e6942a6](e6942a6))
* Fall back to model.parameters.tools when root tools absent ([#146](#146)) ([2c30d75](2c30d75))
* Graph tracking refactor — ManagedAgentGraph drives tracking for new runner shape ([#154](#154)) ([20a5020](20a5020))
* ModelResponse was replaced by RunnerResult ([fbb0b4b](fbb0b4b))
* parse model.parameters.tools as list ([#160](#160)) ([fb53e99](fb53e99))
* reference correct PyPI package names in provider load error messages ([#164](#164)) ([48761c9](48761c9))
* Removed invoke_method, invoke_structured_model from AIProvider base class. ([fbb0b4b](fbb0b4b))
* Removed ModelRunner and AgentRunner protocols ([fbb0b4b](fbb0b4b))
* Replace done_callback with coroutine chain for judge tracking ([#147](#147)) ([1e1f36b](1e1f36b))
* StructuredResponse replaced by RunnerResult with new "parsed" property ([fbb0b4b](fbb0b4b))
* Swap track_metrics_of parameter order to match spec ([#144](#144)) ([53db736](53db736))
</details>

<details><summary>launchdarkly-server-sdk-ai-langchain: 0.6.0</summary>

## [0.6.0](launchdarkly-server-sdk-ai-langchain-0.5.0...launchdarkly-server-sdk-ai-langchain-0.6.0) (2026-05-05)

### Features

* Add judge evaluation support to agent graphs ([#142](#142)) ([3d5a6a9](3d5a6a9))
* Migrate LangGraph runner to AgentGraphRunnerResult; clean up legacy shape detection ([#156](#156)) ([efa8e00](efa8e00))
* Support conversation history directly in AI Provider model runners ([#166](#166)) ([4bb3e78](4bb3e78))
* Update LangChain runners to implement Runner protocol returning RunnerResult ([#150](#150)) ([62a8e25](62a8e25))

### Bug Fixes

* build judge input as string; strip legacy judge config messages ([#165](#165)) ([e6942a6](e6942a6))
</details>

<details><summary>launchdarkly-server-sdk-ai-openai: 0.5.0</summary>

## [0.5.0](launchdarkly-server-sdk-ai-openai-0.4.0...launchdarkly-server-sdk-ai-openai-0.5.0) (2026-05-05)

### Features

* Add judge evaluation support to agent graphs ([#142](#142)) ([3d5a6a9](3d5a6a9))
* Support conversation history directly in AI Provider model runners ([#166](#166)) ([4bb3e78](4bb3e78))
* Update OpenAI graph runner to return AgentGraphRunnerResult with GraphMetrics ([#155](#155)) ([388b7af](388b7af))
* Update OpenAI runners to implement Runner protocol returning RunnerResult ([#149](#149)) ([382e662](382e662))

### Bug Fixes

* build judge input as string; strip legacy judge config messages ([#165](#165)) ([e6942a6](e6942a6))
</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

> [!NOTE]
> **Medium Risk**
> Primarily a release/version bump, but it publishes **breaking API changes** (the move to a unified `Runner.run()`/`RunnerResult` and removal of the `invoke_*` methods), which can break downstream integrations.
>
> **Overview**
> Cuts a new release across the core SDK and provider packages: `launchdarkly-server-sdk-ai` to `0.19.0`, the LangChain provider to `0.6.0`, and the OpenAI provider to `0.5.0`, updating the release manifest and package metadata accordingly.
>
> Changelogs document the shipped breaking API surface changes (notably removing `invoke_model()`/`invoke_structured_model()` in favor of `run(...)` and standardizing returns on `RunnerResult`) plus accompanying feature/fix entries; the core package version constants/docs (`__version__`, `PROVENANCE.md`) are updated to match.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit a20d7a5.</sup>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: jsonbailey <jbailey@launchdarkly.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Summary
- `Judge.evaluate()` now calls `runner.run(input_str)` with a plain string (`"MESSAGE HISTORY:\n...\n\nRESPONSE TO EVALUATE:\n..."`) instead of constructing and passing a `List[LDMessage]`.
- A `_strip_legacy_judge_messages()` helper transparently removes the old assistant/user template messages (`{{message_history}}`/`{{response_to_evaluate}}`) that appeared in older judge configs. System messages are never stripped.
- `Runner.run` narrowed to `str`: the `Runner` protocol and all four runner implementations (`OpenAIModelRunner`, `LangChainModelRunner`, `OpenAIAgentRunner`, `LangChainAgentRunner`) now declare `input: str`. The `_coerce_input` list-handling path in the model runners has been removed.
- `messages` guard removed: the early-exit guard that returned an error when `messages` was `None` or empty has been removed; configs without messages (new-style) are now handled gracefully.
- `chevron`/`_interpolate_message` removed from judge: the Mustache interpolation code is gone from `judge/__init__.py`; `chevron` remains a dependency because it is still used in `client.py`.

Why
Passing a message list through the runner added unnecessary complexity and required all runners to handle both `str` and `List[LDMessage]` inputs. Moving to a single string interface simplifies the runner contract, aligns with the direction of new judge configs (system-message-only), and removes the Mustache templating requirement from the judge path.

Test plan
- `TestStripLegacyJudgeMessages`: unit tests for the helper covering stripping assistant/user template messages, preserving system messages even with template vars, passing through non-template messages, the empty list, and new-style system-only configs
- `TestJudgeEvaluate::test_evaluate_passes_string_input_to_runner`: verifies `runner.run()` receives a `str`, not a list
- `TestJudgeEvaluate::test_evaluate_string_input_format`: verifies the exact `"MESSAGE HISTORY:\n...\n\nRESPONSE TO EVALUATE:\n..."` format
- `TestJudgeEvaluate::test_evaluate_legacy_config_strips_template_messages`: verifies legacy configs still produce string input
- `TestJudgeEvaluate::test_evaluate_succeeds_when_messages_is_none`: verifies no early-exit error for `None` messages

🤖 Generated with Claude Code
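The stripping and string-building behavior described above can be sketched roughly as follows. Note these are simplified, hypothetical stand-ins: `LDMessage` here is a minimal dataclass, and the helper names mirror but do not reproduce the SDK's actual private implementations.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LDMessage:
    # Simplified stand-in for the SDK's LDMessage type.
    role: str      # "system" | "assistant" | "user"
    content: str

# Template variables that mark old-style judge config messages.
LEGACY_TEMPLATE_VARS = ("{{message_history}}", "{{response_to_evaluate}}")

def strip_legacy_judge_messages(messages: Optional[List[LDMessage]]) -> List[LDMessage]:
    """Drop legacy assistant/user template messages; never strip system messages."""
    if not messages:
        return []
    return [
        m for m in messages
        if m.role == "system"
        or not any(var in m.content for var in LEGACY_TEMPLATE_VARS)
    ]

def build_judge_input(message_history: str, response_to_evaluate: str) -> str:
    """Assemble the plain-string judge input in the documented format."""
    return (
        f"MESSAGE HISTORY:\n{message_history}\n\n"
        f"RESPONSE TO EVALUATE:\n{response_to_evaluate}"
    )
```

Note how a system message is kept even when it contains a template variable, matching the "system messages are never stripped" rule from the summary.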
Note
Medium Risk
Medium risk because it narrows the `Runner.run` contract to `str` and changes how prompts/config messages are assembled, which can break callers that passed `List[LDMessage]` and alter judge/model prompting behavior.

Overview

Standardizes runner inputs to plain strings. The `Runner` protocol and the OpenAI/LangChain model and agent runners now declare `run(input: str, ...)`, removing support for passing `List[LDMessage]` and deleting the `_coerce_input` paths.

Moves config prompts into runner construction. `OpenAIRunnerFactory`/`LangChainRunnerFactory` now pass `config.messages` into their model runners, which prepend these config messages internally; `ManagedModel.run()` correspondingly passes only the user prompt string.

Simplifies judge prompting and adds legacy-config cleanup. `Judge.evaluate()` now builds a single formatted evaluation string (MESSAGE HISTORY/RESPONSE TO EVALUATE) instead of interpolating Mustache templates into message lists, and the client strips legacy non-system template messages via the new `_strip_legacy_judge_messages`; tests were updated/added to cover the new behavior.

Reviewed by Cursor Bugbot for commit 133bc10.
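The narrowed runner contract can be illustrated with a minimal `typing.Protocol` sketch. This assumes a synchronous signature and a one-field `RunnerResult`; the SDK's actual protocol and result type are richer (the changelog mentions a `parsed` property, graph results, and metrics), so treat the names below as simplified assumptions.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

@dataclass
class RunnerResult:
    # Simplified stand-in; the real RunnerResult carries more fields.
    output: str

@runtime_checkable
class Runner(Protocol):
    # The narrowed contract: input is a plain str, never List[LDMessage].
    def run(self, input: str) -> RunnerResult: ...

class EchoRunner:
    """Toy runner that satisfies the protocol structurally."""
    def run(self, input: str) -> RunnerResult:
        return RunnerResult(output=input.upper())
```

Because `Runner` is a structural protocol, `EchoRunner` needs no explicit inheritance; any class exposing a matching `run(input: str)` conforms, which is how the four concrete runner implementations can each satisfy the single contract.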