fix: Replace done_callback with coroutine chain for judge tracking by jsonbailey · Pull Request #147 · launchdarkly/python-server-sdk-ai

jsonbailey · 2026-04-28T22:50:50Z

Summary

Replaces add_done_callback in ManagedModel._track_judge_results with a proper _run_and_track coroutine wrapped in asyncio.create_task, so that awaiting response.evaluations guarantees both evaluation and tracker.track_judge_result() calls complete in sequence
The old callback approach ran outside the asyncio scheduler and gave no ordering guarantees; the new approach makes the chain explicit and testable
Adds test_managed_model.py with 5 tests covering: return-before-resolve, collect results, tracking fires inside the awaited chain, failed results skip tracking, noop evaluator returns empty list
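The coroutine chain described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SDK's code: `JudgeResult`, `FakeTracker`, `evaluate`, and `track_judge_results` are hypothetical stand-ins. Only `asyncio.create_task`, the `_run_and_track` name, and the overall shape (evaluate, then track, inside one coroutine) come from the PR text.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class JudgeResult:
    # Hypothetical stand-in for the SDK's judge result type.
    success: bool
    score: float = 0.0


class FakeTracker:
    # Hypothetical tracker that records what it was asked to track.
    def __init__(self) -> None:
        self.tracked: List[JudgeResult] = []

    def track_judge_result(self, result: JudgeResult) -> None:
        self.tracked.append(result)


async def evaluate() -> List[JudgeResult]:
    # Placeholder for the real judge evaluation; yields to the loop once.
    await asyncio.sleep(0)
    return [JudgeResult(success=True, score=0.9), JudgeResult(success=False)]


def track_judge_results(tracker: FakeTracker) -> "asyncio.Task[List[JudgeResult]]":
    async def _run_and_track() -> List[JudgeResult]:
        # Evaluation and tracking live in one coroutine, so awaiting the
        # task implies both steps have finished.
        results = await evaluate()
        for r in results:
            if r.success:
                tracker.track_judge_result(r)
        return results

    # Must be called while an event loop is running.
    return asyncio.create_task(_run_and_track())


async def main() -> tuple:
    tracker = FakeTracker()
    evaluations = track_judge_results(tracker)
    tracked_before = len(tracker.tracked)  # the task has not started yet
    results = await evaluations
    return tracked_before, len(results), len(tracker.tracked)


print(asyncio.run(main()))
```

In this sketch the caller gets the task back immediately, and nothing has been tracked until the task is awaited; after the await, only the successful result has been passed to the tracker.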

Test plan

All 138 existing tests pass
5 new test_managed_model.py tests added and passing
e2e validation via hello-python-ai agent

🤖 Generated with Claude Code

Note

Medium Risk
Changes async task orchestration for judge evaluation/tracking, which can affect ordering, exception propagation, and background task behavior. New tests reduce regression risk but subtle event-loop edge cases remain possible.

Overview
Fixes the judge-evaluation tracking chain so that awaiting ModelResponse.evaluations guarantees judge results are tracked in sequence, after evaluation completes.

ManagedModel._track_judge_results replaces the prior add_done_callback approach with an awaited coroutine wrapper (_run_and_track) scheduled via asyncio.create_task, adds warning logs for failed judge results and tracking exceptions, and introduces a focused test_managed_model.py suite covering non-blocking invoke(), result collection, tracking timing, and failure/noop behavior.
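As an illustration of the behaviors the new suite is said to cover, here is a hedged sketch using only the standard library. The `invoke` function below is a hypothetical stand-in for `ManagedModel.invoke()`, and the evaluator and tracker are fakes; the actual tests in `test_managed_model.py` may be structured differently.

```python
import asyncio
import unittest
from types import SimpleNamespace
from unittest.mock import MagicMock


async def invoke(evaluator, tracker):
    # Hypothetical stand-in for ManagedModel.invoke(): returns immediately
    # with `evaluations` as a task that evaluates and then tracks.
    async def _run_and_track():
        results = await evaluator()
        for r in results:
            if r.success:
                tracker.track_judge_result(r)
        return results

    return SimpleNamespace(evaluations=asyncio.create_task(_run_and_track()))


class ManagedModelSketchTest(unittest.IsolatedAsyncioTestCase):
    async def test_returns_before_evaluations_resolve(self):
        gate = asyncio.Event()
        tracker = MagicMock()

        async def evaluator():
            await gate.wait()  # hold evaluation until the test releases it
            return [SimpleNamespace(success=True)]

        response = await invoke(evaluator, tracker)
        self.assertFalse(response.evaluations.done())
        tracker.track_judge_result.assert_not_called()

        gate.set()
        results = await response.evaluations
        self.assertEqual(len(results), 1)
        # Tracking happened inside the awaited chain, not before it.
        tracker.track_judge_result.assert_called_once_with(results[0])

    async def test_failed_results_skip_tracking(self):
        tracker = MagicMock()

        async def evaluator():
            return [SimpleNamespace(success=False)]

        response = await invoke(evaluator, tracker)
        await response.evaluations
        tracker.track_judge_result.assert_not_called()

    async def test_noop_evaluator_returns_empty_list(self):
        tracker = MagicMock()

        async def evaluator():
            return []

        response = await invoke(evaluator, tracker)
        self.assertEqual(await response.evaluations, [])
        tracker.track_judge_result.assert_not_called()
```

Gating the fake evaluator on an `asyncio.Event` keeps the "returns before evaluations resolve" check deterministic, since the task cannot finish until the test releases it.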

Reviewed by Cursor Bugbot for commit 7ae2024. Bugbot is set up for automated code reviews on this repo.

`_track_judge_results` previously used `add_done_callback` to fire `track_judge_result()` after evaluation completed, but callbacks run outside the asyncio task scheduler and can execute at unpredictable times. Replace with a single `_run_and_track` coroutine wrapped in a new `asyncio.create_task`, so that awaiting `response.evaluations` guarantees both evaluation and tracker calls complete in sequence.

Add `test_managed_model.py` covering: invoke() returns before evaluations resolve; awaiting evaluations collects results; tracking fires inside the awaited chain (not before); failed judge results do not trigger tracking; noop evaluator returns an empty list.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Wrap tracker.track_judge_result() in try/except inside _run_and_track so a tracking failure (e.g., shutdown LD client) does not propagate through the wrapper task and destroy the evaluation results that were successfully computed. Restores the resilience of the previous add_done_callback approach, which asyncio handled with isolation. Also log a warning when a judge evaluation fails (r.success is False) so failures are visible rather than silently skipped.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
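The resilience change in this commit can be sketched as below. This is an assumption-laden illustration rather than the merged code: `BrokenTracker`, `run_and_track`, `evaluate`, the logger name, and the warning wording are all hypothetical. The point it shows is that a tracker exception is caught and logged inside the wrapper, so the awaited task still returns the evaluation results.

```python
import asyncio
import logging
from types import SimpleNamespace

log = logging.getLogger("managed_model_sketch")  # hypothetical logger name


class BrokenTracker:
    # Hypothetical tracker that always fails, e.g. because the client is closed.
    def track_judge_result(self, result):
        raise RuntimeError("client is shut down")


async def run_and_track(evaluate, tracker):
    results = await evaluate()
    for r in results:
        if not r.success:
            # Make failed evaluations visible instead of skipping silently.
            log.warning("Judge evaluation failed; skipping tracking")
            continue
        try:
            tracker.track_judge_result(r)
        except Exception as exc:
            # A tracking failure must not discard the computed results.
            log.warning("Failed to track judge result: %s", exc)
    return results


async def evaluate():
    # Hypothetical evaluation producing one success and one failure.
    return [SimpleNamespace(success=True), SimpleNamespace(success=False)]


async def main():
    task = asyncio.create_task(run_and_track(evaluate, BrokenTracker()))
    return await task


print(len(asyncio.run(main())))
```

Without the `try`/`except`, the `RuntimeError` would propagate out of the task and the caller awaiting it would receive the exception instead of the results.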

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 2f0fd53.

jsonbailey marked this pull request as ready for review April 28, 2026 22:57

jsonbailey requested a review from a team as a code owner April 28, 2026 22:57

jsonbailey force-pushed the jb/aic-2174/evaluations branch from a2402ea to 381cf75 Compare April 28, 2026 22:57

jsonbailey changed the title ~~fix(ldai): Replace done_callback with coroutine chain for judge tracking~~ fix: Replace done_callback with coroutine chain for judge tracking Apr 28, 2026

cursor Bot reviewed Apr 28, 2026

Comment thread packages/sdk/server-ai/src/ldai/managed_model.py

This was referenced Apr 28, 2026

feat: Add evaluations support to ManagedAgent.run() #153

Draft

feat: Graph tracking refactor — ManagedAgentGraph drives tracking for new runner shape #154

Draft

jsonbailey force-pushed the jb/aic-2174/evaluations branch from a997b91 to d0b3436 Compare April 29, 2026 13:18

jsonbailey force-pushed the jb/aic-2174/evaluations branch from d0b3436 to e56f69a Compare April 29, 2026 13:21

cursor Bot reviewed Apr 29, 2026

Comment thread packages/sdk/server-ai/src/ldai/managed_model.py Outdated

jsonbailey and others added 2 commits April 29, 2026 08:35

fix: log warning when judge result tracking fails in ManagedModel

ba11503

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: use consistent warning message for judge tracking failures

2f0fd53

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed Apr 29, 2026

Comment thread packages/sdk/server-ai/tests/test_managed_model.py Outdated

fix: remove unused _make_ai_completion_config helper and dead call in…

7ae2024

… tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

keelerm84 approved these changes Apr 29, 2026

jsonbailey merged commit 1e1f36b into main Apr 29, 2026
46 checks passed

jsonbailey deleted the jb/aic-2174/evaluations branch April 29, 2026 16:14

github-actions Bot mentioned this pull request Apr 29, 2026

chore: release main #143

Open

jsonbailey mentioned this pull request Apr 29, 2026

feat!: Add ManagedResult, RunnerResult, and Runner protocol; rename invoke() to run() #148

Open

2 tasks

jsonbailey merged 5 commits into main from jb/aic-2174/evaluations
