
feat!: Rename JudgeResponse to JudgeResult and flatten EvalScore #132

Merged
jsonbailey merged 2 commits into main from jb/aic-2200/simplify-judge-response on Apr 15, 2026

Conversation

@jsonbailey (Contributor) commented Apr 14, 2026

Summary

  • Replaces JudgeResponse + nested EvalScore dict with a flat JudgeResult dataclass — one judge produces one result, so the dict was unnecessary abstraction
  • Adds sampled: bool to cleanly distinguish skipped-by-sampling-rate from failure (previously returned None)
  • Judge.evaluate() always returns a JudgeResult — never None; judge_config_key is always set on every return path
  • Renames track_judge_response → track_judge_result on LDAIConfigTracker; removes track_eval_scores
  • Removes track_judge_response from AIGraphTracker — judges are node-level only (spec updated in launchdarkly/sdk-specs#147)
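A minimal sketch of the flat result shape described above. Field names come from this PR's description; the defaults, types, and ordering are assumptions, not the SDK's actual definition:

```python
from dataclasses import dataclass
from typing import Optional

# Sketch only: field names follow the PR description; defaults and types
# are assumptions, not the SDK's actual definition.
@dataclass
class JudgeResult:
    judge_config_key: str                # always set, on every return path
    success: bool = False
    sampled: bool = True                 # distinguishes a sampling-rate skip from a failure
    score: Optional[float] = None        # previously nested inside the EvalScore dict
    reasoning: Optional[str] = None
    metric_key: Optional[str] = None
    error_message: Optional[str] = None  # renamed from `error`
```

Because `evaluate()` now returns this type unconditionally, callers can branch on `sampled`, `success`, and `error_message` instead of checking for `None`.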

Breaking Changes

  • JudgeResponse removed — use JudgeResult
  • EvalScore removed — fields are now inline on JudgeResult (score, reasoning, metric_key)
  • error field renamed to error_message
  • track_judge_response removed — use track_judge_result
  • track_eval_scores removed
  • Judge.evaluate() returns JudgeResult instead of Optional[JudgeResponse]
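A hypothetical migration sketch of the new contract. The names mirror this PR's description; `SimpleNamespace` stands in for the real `JudgeResult` type, and the polarity of `sampled` is an assumption:

```python
from types import SimpleNamespace

def describe(result) -> str:
    # Old contract: Judge.evaluate() returned None on a sampling skip and a
    # JudgeResponse wrapping an EvalScore dict otherwise. New contract:
    # always a JudgeResult; branch on its flags instead of on None.
    if not result.sampled:                # assumed polarity: False == skipped
        return "skipped by sampling rate"
    if not result.success:
        return f"judge failed: {result.error_message}"
    return f"{result.metric_key}={result.score}"

ok = SimpleNamespace(sampled=True, success=True, metric_key="relevance",
                     score=0.9, reasoning="on topic", error_message=None)
skipped = SimpleNamespace(sampled=False, success=False, metric_key=None,
                          score=None, reasoning=None, error_message=None)
print(describe(ok))       # relevance=0.9
print(describe(skipped))  # skipped by sampling rate
```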

Test plan

  • All 114 server-ai tests passing

🤖 Generated with Claude Code


Note

Medium Risk
Breaking API change to judge evaluation return types and tracking hooks; may impact downstream consumers expecting None/dict-shaped evaluations or calling removed tracker methods.

Overview
Simplifies judge evaluation results and tracking. Replaces JudgeResponse + nested EvalScore dict with a single flat JudgeResult (score/reasoning/metric key), and updates exports so JudgeResult is the public type.

Judge.evaluate()/evaluate_messages() now always return a JudgeResult (never None), using sampled=True to represent sampling skips and error_message for failures. ModelResponse.evaluations and ManagedModel’s judge dispatch/tracking are updated accordingly, and LDAIConfigTracker consolidates judge metric emission into track_judge_result while removing the old track_eval_scores/track_judge_response paths; tests are updated to match the new contract.

Reviewed by Cursor Bugbot for commit 7e23fa2. Bugbot is set up for automated code reviews on this repo. Configure here.

jsonbailey and others added 2 commits April 14, 2026 09:36
BREAKING CHANGE: `JudgeResponse` and `EvalScore` are removed. Replace with the
new flat `JudgeResult` dataclass. `track_judge_response` and `track_eval_scores`
on `LDAIConfigTracker` are removed; use `track_judge_result` instead.

- Replace `JudgeResponse` + nested `EvalScore` dict with a flat `JudgeResult`
  dataclass (`score`, `reasoning`, `metric_key`, `judge_config_key`, `success`,
  `sampled`, `error_message`)
- Add `sampled: bool` to distinguish skipped-by-sampling-rate from failure
- Rename `error` → `error_message`
- Rename `track_judge_response` → `track_judge_result` on `LDAIConfigTracker`;
  remove `track_eval_scores`
- Remove `track_judge_response` from `AIGraphTracker` (judges are node-level only)
- `Judge.evaluate()` always returns a `JudgeResult` (never `None`); builds the
  result progressively so `judge_config_key` is always set
- Simplify `_parse_evaluation_response` to return `(score, reasoning)` tuple

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jsonbailey jsonbailey marked this pull request as ready for review April 14, 2026 15:27
@jsonbailey jsonbailey requested a review from a team as a code owner April 14, 2026 15:27
@jsonbailey jsonbailey merged commit af4e463 into main Apr 15, 2026
47 checks passed
@jsonbailey jsonbailey deleted the jb/aic-2200/simplify-judge-response branch April 15, 2026 21:50
@github-actions bot mentioned this pull request Apr 15, 2026
jsonbailey added a commit that referenced this pull request Apr 22, 2026
🤖 I have created a release *beep* *boop*
---


<details><summary>launchdarkly-server-sdk-ai: 0.18.0</summary>

## [0.18.0](launchdarkly-server-sdk-ai-0.17.0...launchdarkly-server-sdk-ai-0.18.0) (2026-04-21)


### ⚠ BREAKING CHANGES

* Add per-execution runId, at-most-once tracking, and cross-process
tracker resumption
([#133](#133))
* rename track_latency to track_duration on AIGraphTracker
([#138](#138))
* Move graph_key to AIConfigTracker instantiation
([#134](#134))
* Flatten JudgeResponse and EvalScore into new JudgeResult
([#132](#132))

### Features

* Add per-execution runId, at-most-once tracking, and cross-process
tracker resumption
([#133](#133))
([68685cd](68685cd))
* Flatten JudgeResponse and EvalScore into new JudgeResult
([#132](#132))
([af4e463](af4e463))
* Move graph_key to AIConfigTracker instantiation
([#134](#134))
([20fff24](20fff24))
* rename track_latency to track_duration on AIGraphTracker
([#138](#138))
([05758a7](05758a7))
</details>

<details><summary>launchdarkly-server-sdk-ai-langchain: 0.5.0</summary>

## [0.5.0](launchdarkly-server-sdk-ai-langchain-0.4.1...launchdarkly-server-sdk-ai-langchain-0.5.0) (2026-04-21)


### ⚠ BREAKING CHANGES

* Add per-execution runId, at-most-once tracking, and cross-process
tracker resumption
([#133](#133))
* rename track_latency to track_duration on AIGraphTracker
([#138](#138))
* Move graph_key to AIConfigTracker instantiation
([#134](#134))

### Features

* Add per-execution runId, at-most-once tracking, and cross-process
tracker resumption
([#133](#133))
([68685cd](68685cd))
* Move graph_key to AIConfigTracker instantiation
([#134](#134))
([20fff24](20fff24))
* rename track_latency to track_duration on AIGraphTracker
([#138](#138))
([05758a7](05758a7))
</details>

<details><summary>launchdarkly-server-sdk-ai-openai: 0.4.0</summary>

## [0.4.0](launchdarkly-server-sdk-ai-openai-0.3.0...launchdarkly-server-sdk-ai-openai-0.4.0) (2026-04-21)


### ⚠ BREAKING CHANGES

* Add per-execution runId, at-most-once tracking, and cross-process
tracker resumption
([#133](#133))
* rename track_latency to track_duration on AIGraphTracker
([#138](#138))
* Move graph_key to AIConfigTracker instantiation
([#134](#134))

### Features

* Add per-execution runId, at-most-once tracking, and cross-process
tracker resumption
([#133](#133))
([68685cd](68685cd))
* Move graph_key to AIConfigTracker instantiation
([#134](#134))
([20fff24](20fff24))
* rename track_latency to track_duration on AIGraphTracker
([#138](#138))
([05758a7](05758a7))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Release-only changes, but they publish new versions that include
breaking API updates (tracker lifecycle changes, `track_latency` rename,
judge result flattening) that can impact downstream consumers.
> 
> **Overview**
> Publishes new releases for `launchdarkly-server-sdk-ai` (**0.18.0**)
and the LangChain/OpenAI provider packages (**0.5.0** / **0.4.0**),
updating the release manifest, package versions, and changelogs.
> 
> Updates provider dependencies to require
`launchdarkly-server-sdk-ai>=0.18.0`, and refreshes release
documentation (`PROVENANCE.md`) and `ldai.__version__` to match the new
SDK version.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
eecee01. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: jsonbailey <jbailey@launchdarkly.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
