feat!: Rename JudgeResponse to JudgeResult and flatten EvalScore #132
Merged
jsonbailey merged 2 commits into main on Apr 15, 2026
BREAKING CHANGE: `JudgeResponse` and `EvalScore` are removed. Replace with the new flat `JudgeResult` dataclass. `track_judge_response` and `track_eval_scores` on `LDAIConfigTracker` are removed; use `track_judge_result` instead.

- Replace `JudgeResponse` + nested `EvalScore` dict with a flat `JudgeResult` dataclass (`score`, `reasoning`, `metric_key`, `judge_config_key`, `success`, `sampled`, `error_message`)
- Add `sampled: bool` to distinguish skipped-by-sampling-rate from failure
- Rename `error` → `error_message`
- Rename `track_judge_response` → `track_judge_result` on `LDAIConfigTracker`; remove `track_eval_scores`
- Remove `track_judge_response` from `AIGraphTracker` (judges are node-level only)
- `Judge.evaluate()` always returns a `JudgeResult` (never `None`); builds the result progressively so `judge_config_key` is always set
- Simplify `_parse_evaluation_response` to return `(score, reasoning)` tuple

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
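To make the flat shape concrete, here is a minimal sketch of what a `JudgeResult`-style dataclass could look like. The field names are the seven listed in the commit message; the types, defaults, field order, and the example keys (`relevance-judge`, `relevance`) are assumptions for illustration and are not copied from the SDK source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JudgeResult:
    """Stand-in for the SDK type; field names come from the PR, types and defaults are assumed."""
    judge_config_key: Optional[str] = None
    metric_key: Optional[str] = None
    score: Optional[float] = None
    reasoning: Optional[str] = None
    success: bool = False
    sampled: bool = False
    error_message: Optional[str] = None


# A scored evaluation: every value is read straight off the result, no nested dict lookup.
scored = JudgeResult(
    judge_config_key="relevance-judge",  # hypothetical judge config key
    metric_key="relevance",              # hypothetical metric key
    score=0.9,
    reasoning="Answer addresses the question directly.",
    success=True,
)

# A failed evaluation still carries judge_config_key, plus error_message (formerly `error`).
failed = JudgeResult(
    judge_config_key="relevance-judge",
    error_message="model call timed out",
)

print(scored.metric_key, scored.score)
print(failed.success, failed.error_message)
```

Callers that previously indexed into a nested `EvalScore` dict would, under this sketch, read `score`, `reasoning`, and `metric_key` as plain attributes instead.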
andrewklatzke approved these changes on Apr 14, 2026
jsonbailey added a commit that referenced this pull request on Apr 22, 2026
🤖 I have created a release *beep* *boop*

---

<details><summary>launchdarkly-server-sdk-ai: 0.18.0</summary>

## [0.18.0](launchdarkly-server-sdk-ai-0.17.0...launchdarkly-server-sdk-ai-0.18.0) (2026-04-21)

### ⚠ BREAKING CHANGES

* Add per-execution runId, at-most-once tracking, and cross-process tracker resumption ([#133](#133))
* rename track_latency to track_duration on AIGraphTracker ([#138](#138))
* Move graph_key to AIConfigTracker instantiation ([#134](#134))
* Flatten JudgeResponse and EvalScore into new JudgeResult ([#132](#132))

### Features

* Add per-execution runId, at-most-once tracking, and cross-process tracker resumption ([#133](#133)) ([68685cd](68685cd))
* Flatten JudgeResponse and EvalScore into new JudgeResult ([#132](#132)) ([af4e463](af4e463))
* Move graph_key to AIConfigTracker instantiation ([#134](#134)) ([20fff24](20fff24))
* rename track_latency to track_duration on AIGraphTracker ([#138](#138)) ([05758a7](05758a7))

</details>

<details><summary>launchdarkly-server-sdk-ai-langchain: 0.5.0</summary>

## [0.5.0](launchdarkly-server-sdk-ai-langchain-0.4.1...launchdarkly-server-sdk-ai-langchain-0.5.0) (2026-04-21)

### ⚠ BREAKING CHANGES

* Add per-execution runId, at-most-once tracking, and cross-process tracker resumption ([#133](#133))
* rename track_latency to track_duration on AIGraphTracker ([#138](#138))
* Move graph_key to AIConfigTracker instantiation ([#134](#134))

### Features

* Add per-execution runId, at-most-once tracking, and cross-process tracker resumption ([#133](#133)) ([68685cd](68685cd))
* Move graph_key to AIConfigTracker instantiation ([#134](#134)) ([20fff24](20fff24))
* rename track_latency to track_duration on AIGraphTracker ([#138](#138)) ([05758a7](05758a7))

</details>

<details><summary>launchdarkly-server-sdk-ai-openai: 0.4.0</summary>

## [0.4.0](launchdarkly-server-sdk-ai-openai-0.3.0...launchdarkly-server-sdk-ai-openai-0.4.0) (2026-04-21)

### ⚠ BREAKING CHANGES

* Add per-execution runId, at-most-once tracking, and cross-process tracker resumption ([#133](#133))
* rename track_latency to track_duration on AIGraphTracker ([#138](#138))
* Move graph_key to AIConfigTracker instantiation ([#134](#134))

### Features

* Add per-execution runId, at-most-once tracking, and cross-process tracker resumption ([#133](#133)) ([68685cd](68685cd))
* Move graph_key to AIConfigTracker instantiation ([#134](#134)) ([20fff24](20fff24))
* rename track_latency to track_duration on AIGraphTracker ([#138](#138)) ([05758a7](05758a7))

</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Release-only changes, but they publish new versions that include breaking API updates (tracker lifecycle changes, `track_latency` rename, judge result flattening) that can impact downstream consumers.
>
> **Overview**
> Publishes new releases for `launchdarkly-server-sdk-ai` (**0.18.0**) and the LangChain/OpenAI provider packages (**0.5.0** / **0.4.0**), updating the release manifest, package versions, and changelogs.
>
> Updates provider dependencies to require `launchdarkly-server-sdk-ai>=0.18.0`, and refreshes release documentation (`PROVENANCE.md`) and `ldai.__version__` to match the new SDK version.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit eecee01. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: jsonbailey <jbailey@launchdarkly.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- Replaces `JudgeResponse` + nested `EvalScore` dict with a flat `JudgeResult` dataclass: one judge produces one result, so the dict was unnecessary abstraction
- Adds `sampled: bool` to cleanly distinguish skipped-by-sampling-rate from failure (previously returned `None`)
- `Judge.evaluate()` always returns a `JudgeResult`, never `None`; `judge_config_key` is always set on every return path
- Renames `track_judge_response` → `track_judge_result` on `LDAIConfigTracker`; removes `track_eval_scores`
- Removes `track_judge_response` from `AIGraphTracker`: judges are node-level only (spec updated in launchdarkly/sdk-specs#147)

Breaking Changes
- `JudgeResponse` removed; use `JudgeResult`
- `EvalScore` removed; fields are now inline on `JudgeResult` (`score`, `reasoning`, `metric_key`)
- `error` field renamed to `error_message`
- `track_judge_response` removed; use `track_judge_result`
- `track_eval_scores` removed
- `Judge.evaluate()` returns `JudgeResult` instead of `Optional[JudgeResponse]`

Test plan
🤖 Generated with Claude Code
Note
Medium Risk
Breaking API change to judge evaluation return types and tracking hooks; may impact downstream consumers expecting `None`/dict-shaped evaluations or calling removed tracker methods.

Overview
Simplifies judge evaluation results and tracking. Replaces `JudgeResponse` + nested `EvalScore` dict with a single flat `JudgeResult` (score/reasoning/metric key), and updates exports so `JudgeResult` is the public type.

`Judge.evaluate()`/`evaluate_messages()` now always return a `JudgeResult` (never `None`), using `sampled=True` to represent sampling skips and `error_message` for failures. `ModelResponse.evaluations` and `ManagedModel`'s judge dispatch/tracking are updated accordingly, and `LDAIConfigTracker` consolidates judge metric emission into `track_judge_result` while removing the old `track_eval_scores`/`track_judge_response` paths; tests are updated to match the new contract.

Reviewed by Cursor Bugbot for commit 7e23fa2. Bugbot is set up for automated code reviews on this repo.
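Because `evaluate()` no longer returns `None`, downstream code that used a `None` check would instead branch on the result's fields. The sketch below shows one way that branching might look. The dataclass is a local stand-in with assumed types, `describe` is a hypothetical helper, and the reading of `sampled=True` as "skipped by sampling" is taken from the overview text above; confirm the flag's polarity against the SDK before relying on it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JudgeResult:
    """Minimal stand-in with only the fields this sketch reads (types assumed)."""
    judge_config_key: str
    score: Optional[float] = None
    success: bool = False
    sampled: bool = False
    error_message: Optional[str] = None


def describe(result: JudgeResult) -> str:
    """Hypothetical helper: turn a result into a one-line status string."""
    # Assumption taken from the overview: sampled=True marks a sampling skip.
    if result.sampled:
        return f"{result.judge_config_key}: skipped by sampling"
    # A skip is not a failure, so success is only checked after the skip case.
    if not result.success:
        return f"{result.judge_config_key}: failed ({result.error_message})"
    return f"{result.judge_config_key}: score={result.score}"


print(describe(JudgeResult("relevance-judge", sampled=True)))
print(describe(JudgeResult("relevance-judge", error_message="timeout")))
print(describe(JudgeResult("relevance-judge", score=0.5, success=True)))
```

Checking the skip case first keeps sampling skips out of failure counts, which is the distinction the new field is meant to make possible.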