fix: Replace done_callback with coroutine chain for judge tracking by jsonbailey · Pull Request #147 · launchdarkly/python-server-sdk-ai

jsonbailey · 2026-04-28T22:50:50Z

Summary

Replaces add_done_callback in ManagedModel._track_judge_results with a proper _run_and_track coroutine wrapped in asyncio.create_task, so that awaiting response.evaluations guarantees both evaluation and tracker.track_judge_result() calls complete in sequence
The old callback approach ran outside the asyncio scheduler and gave no ordering guarantees; the new approach makes the chain explicit and testable
Adds test_managed_model.py with 5 tests covering: return-before-resolve, collect results, tracking fires inside the awaited chain, failed results skip tracking, noop evaluator returns empty list

Test plan

All 138 existing tests pass
5 new test_managed_model.py tests added and passing
e2e validation via hello-python-ai agent

🤖 Generated with Claude Code

Note

Medium Risk
Changes async scheduling for judge evaluations/tracking, which can affect ordering and side effects (metrics emission) during ManagedModel.invoke and when consumers await response.evaluations. Covered by new tests but still touches concurrency behavior.

Overview
Fixes judge-result tracking determinism. ManagedModel._track_judge_results now wraps evaluator execution in an awaited coroutine chain and returns a new asyncio task, so awaiting response.evaluations reliably includes both evaluation completion and tracker.track_judge_result() side effects (instead of relying on a add_done_callback).

Adds a focused test_managed_model.py suite validating that invoke() returns before evaluations finish, that awaiting evaluations yields results, and that tracking fires only for successful judge results (including a noop-evaluator case).

^{Reviewed by Cursor Bugbot for commit 381cf75. Bugbot is set up for automated code reviews on this repo. Configure here.}

`_track_judge_results` previously used `add_done_callback` to fire `track_judge_result()` after evaluation completed, but callbacks run outside the asyncio task scheduler and can execute at unpredictable times. Replace with a single `_run_and_track` coroutine wrapped in a new `asyncio.create_task`, so that awaiting `response.evaluations` guarantees both evaluation and tracker calls complete in sequence. Add `test_managed_model.py` covering: invoke() returns before evaluations resolve; awaiting evaluations collects results; tracking fires inside the awaited chain (not before); failed judge results do not trigger tracking; noop evaluator returns an empty list. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 381cf75. Configure here.}

cursor · 2026-04-28T23:04:12Z

+            for r in results:
                if r.success:
                    tracker.track_judge_result(r)
+            return results


Missing exception handling loses evaluation results on tracking failure

Medium Severity

The _run_and_track coroutine has no try/except around the tracker.track_judge_result(r) call. If tracking raises (e.g., the LD client is shut down or track() fails internally), the exception propagates through the wrapper task, and callers awaiting response.evaluations receive an exception instead of the List[JudgeResult]. The old add_done_callback approach was inherently resilient to this — asyncio catches callback exceptions and logs them without affecting the task's result. The new inline approach loses that isolation, meaning a tracking failure now destroys the evaluation results that were successfully computed.

^{Reviewed by Cursor Bugbot for commit 381cf75. Configure here.}

jsonbailey marked this pull request as ready for review April 28, 2026 22:57

jsonbailey requested a review from a team as a code owner April 28, 2026 22:57

jsonbailey force-pushed the jb/aic-2174/evaluations branch from a2402ea to 381cf75 Compare April 28, 2026 22:57

jsonbailey changed the title ~~fix(ldai): Replace done_callback with coroutine chain for judge tracking~~ fix: Replace done_callback with coroutine chain for judge tracking Apr 28, 2026

cursor Bot reviewed Apr 28, 2026

View reviewed changes

This was referenced Apr 28, 2026

feat: Add evaluations support to ManagedAgent.run() #153

Draft

feat: Graph tracking refactor — ManagedAgentGraph drives tracking for new runner shape #154

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Replace done_callback with coroutine chain for judge tracking#147

fix: Replace done_callback with coroutine chain for judge tracking#147
jsonbailey wants to merge 1 commit intomainfrom
jb/aic-2174/evaluations

jsonbailey commented Apr 28, 2026 •

edited by cursor Bot

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jsonbailey commented Apr 28, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Apr 28, 2026

Choose a reason for hiding this comment

Missing exception handling loses evaluation results on tracking failure

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jsonbailey commented Apr 28, 2026 •

edited by cursor Bot

Loading