
feat!: Rename JudgeResponse to JudgeResult and flatten EvalScore #132

Merged
jsonbailey merged 2 commits into main from jb/aic-2200/simplify-judge-response on Apr 15, 2026

Conversation

@jsonbailey (Contributor) commented Apr 14, 2026

Summary

  • Replaces JudgeResponse + nested EvalScore dict with a flat JudgeResult dataclass — one judge produces one result, so the dict was unnecessary abstraction
  • Adds sampled: bool to cleanly distinguish skipped-by-sampling-rate from failure (previously returned None)
  • Judge.evaluate() always returns a JudgeResult — never None; judge_config_key is always set on every return path
  • Renames track_judge_response → track_judge_result on LDAIConfigTracker; removes track_eval_scores
  • Removes track_judge_response from AIGraphTracker — judges are node-level only (spec updated in launchdarkly/sdk-specs#147)
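A minimal sketch of the flat result shape described above. Field names come from this PR's description; the defaults, types, and ordering are assumptions, not the SDK's actual definition:

```python
from dataclasses import dataclass
from typing import Optional

# Sketch only: field names follow the PR description; defaults and types
# are assumptions, not the SDK's actual definition.
@dataclass
class JudgeResult:
    judge_config_key: str                # always set, on every return path
    success: bool = False
    sampled: bool = True                 # distinguishes a sampling-rate skip from a failure
    score: Optional[float] = None        # previously nested inside the EvalScore dict
    reasoning: Optional[str] = None
    metric_key: Optional[str] = None
    error_message: Optional[str] = None  # renamed from `error`
```

Because `evaluate()` now returns this type unconditionally, callers can branch on `sampled`, `success`, and `error_message` instead of checking for `None`.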

Breaking Changes

  • JudgeResponse removed — use JudgeResult
  • EvalScore removed — fields are now inline on JudgeResult (score, reasoning, metric_key)
  • error field renamed to error_message
  • track_judge_response removed — use track_judge_result
  • track_eval_scores removed
  • Judge.evaluate() returns JudgeResult instead of Optional[JudgeResponse]
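A hypothetical migration sketch of the new contract. The names mirror this PR's description; `SimpleNamespace` stands in for the real `JudgeResult` type, and the polarity of `sampled` is an assumption:

```python
from types import SimpleNamespace

def describe(result) -> str:
    # Old contract: Judge.evaluate() returned None on a sampling skip and a
    # JudgeResponse wrapping an EvalScore dict otherwise. New contract:
    # always a JudgeResult; branch on its flags instead of on None.
    if not result.sampled:                # assumed polarity: False == skipped
        return "skipped by sampling rate"
    if not result.success:
        return f"judge failed: {result.error_message}"
    return f"{result.metric_key}={result.score}"

ok = SimpleNamespace(sampled=True, success=True, metric_key="relevance",
                     score=0.9, reasoning="on topic", error_message=None)
skipped = SimpleNamespace(sampled=False, success=False, metric_key=None,
                          score=None, reasoning=None, error_message=None)
print(describe(ok))       # relevance=0.9
print(describe(skipped))  # skipped by sampling rate
```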

Test plan

  • All 114 server-ai tests passing

🤖 Generated with Claude Code


Note

Medium Risk
Breaking API change to judge evaluation return types and tracking hooks; may impact downstream consumers expecting None/dict-shaped evaluations or calling removed tracker methods.

Overview
Simplifies judge evaluation results and tracking. Replaces JudgeResponse + nested EvalScore dict with a single flat JudgeResult (score/reasoning/metric key), and updates exports so JudgeResult is the public type.

Judge.evaluate()/evaluate_messages() now always return a JudgeResult (never None), using sampled=True to represent sampling skips and error_message for failures. ModelResponse.evaluations and ManagedModel’s judge dispatch/tracking are updated accordingly, and LDAIConfigTracker consolidates judge metric emission into track_judge_result while removing the old track_eval_scores/track_judge_response paths; tests are updated to match the new contract.

Reviewed by Cursor Bugbot for commit 7e23fa2. Bugbot is set up for automated code reviews on this repo. Configure here.

jsonbailey and others added 2 commits April 14, 2026 09:36
BREAKING CHANGE: `JudgeResponse` and `EvalScore` are removed. Replace with the
new flat `JudgeResult` dataclass. `track_judge_response` and `track_eval_scores`
on `LDAIConfigTracker` are removed; use `track_judge_result` instead.

- Replace `JudgeResponse` + nested `EvalScore` dict with a flat `JudgeResult`
  dataclass (`score`, `reasoning`, `metric_key`, `judge_config_key`, `success`,
  `sampled`, `error_message`)
- Add `sampled: bool` to distinguish skipped-by-sampling-rate from failure
- Rename `error` → `error_message`
- Rename `track_judge_response` → `track_judge_result` on `LDAIConfigTracker`;
  remove `track_eval_scores`
- Remove `track_judge_response` from `AIGraphTracker` (judges are node-level only)
- `Judge.evaluate()` always returns a `JudgeResult` (never `None`); builds the
  result progressively so `judge_config_key` is always set
- Simplify `_parse_evaluation_response` to return `(score, reasoning)` tuple

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jsonbailey jsonbailey marked this pull request as ready for review April 14, 2026 15:27
@jsonbailey jsonbailey requested a review from a team as a code owner April 14, 2026 15:27
@jsonbailey jsonbailey merged commit af4e463 into main Apr 15, 2026
47 checks passed
@jsonbailey jsonbailey deleted the jb/aic-2200/simplify-judge-response branch April 15, 2026 21:50
@github-actions bot mentioned this pull request Apr 15, 2026
jsonbailey added a commit that referenced this pull request Apr 22, 2026
🤖 I have created a release *beep* *boop*
---


<details><summary>launchdarkly-server-sdk-ai: 0.18.0</summary>

## [0.18.0](launchdarkly-server-sdk-ai-0.17.0...launchdarkly-server-sdk-ai-0.18.0) (2026-04-21)


### ⚠ BREAKING CHANGES

* Add per-execution runId, at-most-once tracking, and cross-process
tracker resumption
([#133](#133))
* rename track_latency to track_duration on AIGraphTracker
([#138](#138))
* Move graph_key to AIConfigTracker instantiation
([#134](#134))
* Flatten JudgeResponse and EvalScore into new JudgeResult
([#132](#132))

### Features

* Add per-execution runId, at-most-once tracking, and cross-process
tracker resumption
([#133](#133))
([68685cd](68685cd))
* Flatten JudgeResponse and EvalScore into new JudgeResult
([#132](#132))
([af4e463](af4e463))
* Move graph_key to AIConfigTracker instantiation
([#134](#134))
([20fff24](20fff24))
* rename track_latency to track_duration on AIGraphTracker
([#138](#138))
([05758a7](05758a7))
</details>

<details><summary>launchdarkly-server-sdk-ai-langchain: 0.5.0</summary>

## [0.5.0](launchdarkly-server-sdk-ai-langchain-0.4.1...launchdarkly-server-sdk-ai-langchain-0.5.0) (2026-04-21)


### ⚠ BREAKING CHANGES

* Add per-execution runId, at-most-once tracking, and cross-process
tracker resumption
([#133](#133))
* rename track_latency to track_duration on AIGraphTracker
([#138](#138))
* Move graph_key to AIConfigTracker instantiation
([#134](#134))

### Features

* Add per-execution runId, at-most-once tracking, and cross-process
tracker resumption
([#133](#133))
([68685cd](68685cd))
* Move graph_key to AIConfigTracker instantiation
([#134](#134))
([20fff24](20fff24))
* rename track_latency to track_duration on AIGraphTracker
([#138](#138))
([05758a7](05758a7))
</details>

<details><summary>launchdarkly-server-sdk-ai-openai: 0.4.0</summary>

## [0.4.0](launchdarkly-server-sdk-ai-openai-0.3.0...launchdarkly-server-sdk-ai-openai-0.4.0) (2026-04-21)


### ⚠ BREAKING CHANGES

* Add per-execution runId, at-most-once tracking, and cross-process
tracker resumption
([#133](#133))
* rename track_latency to track_duration on AIGraphTracker
([#138](#138))
* Move graph_key to AIConfigTracker instantiation
([#134](#134))

### Features

* Add per-execution runId, at-most-once tracking, and cross-process
tracker resumption
([#133](#133))
([68685cd](68685cd))
* Move graph_key to AIConfigTracker instantiation
([#134](#134))
([20fff24](20fff24))
* rename track_latency to track_duration on AIGraphTracker
([#138](#138))
([05758a7](05758a7))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Release-only changes, but they publish new versions that include
breaking API updates (tracker lifecycle changes, `track_latency` rename,
judge result flattening) that can impact downstream consumers.
> 
> **Overview**
> Publishes new releases for `launchdarkly-server-sdk-ai` (**0.18.0**)
and the LangChain/OpenAI provider packages (**0.5.0** / **0.4.0**),
updating the release manifest, package versions, and changelogs.
> 
> Updates provider dependencies to require
`launchdarkly-server-sdk-ai>=0.18.0`, and refreshes release
documentation (`PROVENANCE.md`) and `ldai.__version__` to match the new
SDK version.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
eecee01. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: jsonbailey <jbailey@launchdarkly.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
