
fix: Replace done_callback with coroutine chain for judge tracking #147

Merged
jsonbailey merged 5 commits into main from jb/aic-2174/evaluations
Apr 29, 2026

Conversation

Contributor

@jsonbailey jsonbailey commented Apr 28, 2026

Summary

  • Replaces add_done_callback in ManagedModel._track_judge_results with a proper _run_and_track coroutine wrapped in asyncio.create_task, so that awaiting response.evaluations guarantees both evaluation and tracker.track_judge_result() calls complete in sequence
  • The old callback approach ran outside the asyncio scheduler and gave no ordering guarantees; the new approach makes the chain explicit and testable
  • Adds test_managed_model.py with 5 tests covering: return-before-resolve, collect results, tracking fires inside the awaited chain, failed results skip tracking, noop evaluator returns empty list
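The summary above can be sketched as follows. A minimal illustration of the pattern only: `_run_and_track` and `track_judge_result` are names from the PR, but the result shape, the `tracker` interface, and the free-standing `track_judge_results` helper are assumptions for the sake of a self-contained example, not the actual `ManagedModel` implementation.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def _run_and_track(evaluation_coro, tracker):
    """Await the judge evaluation, then track each successful result in sequence."""
    results = await evaluation_coro
    for r in results:
        if r.success:
            tracker.track_judge_result(r)
        else:
            # Hypothetical logging; the PR adds a warning for failed judge results.
            logger.warning("judge evaluation failed: %r", r)
    return results

def track_judge_results(evaluation_coro, tracker):
    # create_task keeps the caller non-blocking, while anyone awaiting the
    # returned task gets the full evaluate -> track chain in order.
    return asyncio.create_task(_run_and_track(evaluation_coro, tracker))
```

Awaiting the returned task is what gives the ordering guarantee: tracking has provably run by the time the await resolves.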

Test plan

  • All 138 existing tests pass
  • 5 new test_managed_model.py tests added and passing
  • e2e validation via hello-python-ai agent

🤖 Generated with Claude Code


Note

Medium Risk
Changes async task orchestration for judge evaluation/tracking, which can affect ordering, exception propagation, and background task behavior. New tests reduce regression risk but subtle event-loop edge cases remain possible.

Overview
Fixes the judge-evaluation tracking chain so that awaiting ModelResponse.evaluations now guarantees judge results are tracked in-sequence with evaluation completion.

ManagedModel._track_judge_results replaces the prior add_done_callback approach with an awaited coroutine wrapper (_run_and_track) scheduled via asyncio.create_task, and adds warning logs for failed judge results and tracking exceptions. A focused test_managed_model.py suite covers non-blocking invoke(), result collection, tracking timing, and failure/noop behavior.

Reviewed by Cursor Bugbot for commit 7ae2024.

`_track_judge_results` previously used `add_done_callback` to fire
`track_judge_result()` after evaluation completed, but callbacks run
outside the asyncio task scheduler and can execute at unpredictable
times. Replace with a single `_run_and_track` coroutine wrapped in a
new `asyncio.create_task`, so that awaiting `response.evaluations`
guarantees both evaluation and tracker calls complete in sequence.
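The claim that callbacks run at unpredictable times can be demonstrated in isolation. This standalone snippet (not from the PR) shows that awaiting an already-completed future resumes the awaiter before a registered done-callback fires, since callbacks are queued via `loop.call_soon` rather than run inside the awaited chain:

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    order = []

    fut.add_done_callback(lambda f: order.append("callback"))
    fut.set_result(None)   # schedules the callback via loop.call_soon

    await fut              # future already done: the awaiter does not suspend
    order.append("awaiter")

    await asyncio.sleep(0) # only on this later loop iteration does the callback run
    return order
```

Here `order` ends up as `["awaiter", "callback"]`, which is exactly the kind of inversion a coroutine chain rules out.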

Add `test_managed_model.py` covering: invoke() returns before
evaluations resolve; awaiting evaluations collects results; tracking
fires inside the awaited chain (not before); failed judge results do
not trigger tracking; noop evaluator returns an empty list.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jsonbailey jsonbailey marked this pull request as ready for review April 28, 2026 22:57
@jsonbailey jsonbailey requested a review from a team as a code owner April 28, 2026 22:57
@jsonbailey jsonbailey force-pushed the jb/aic-2174/evaluations branch from a2402ea to 381cf75 on April 28, 2026 22:57
@jsonbailey jsonbailey changed the title fix(ldai): Replace done_callback with coroutine chain for judge tracking fix: Replace done_callback with coroutine chain for judge tracking Apr 28, 2026
Comment thread packages/sdk/server-ai/src/ldai/managed_model.py
Wrap tracker.track_judge_result() in try/except inside _run_and_track
so a tracking failure (e.g., shutdown LD client) does not propagate
through the wrapper task and destroy the evaluation results that were
successfully computed. Restores the resilience of the previous
add_done_callback approach, which asyncio handled with isolation.

Also log a warning when a judge evaluation fails (r.success is False)
so failures are visible rather than silently skipped.
@jsonbailey jsonbailey force-pushed the jb/aic-2174/evaluations branch from d0b3436 to e56f69a on April 29, 2026 13:21
Comment thread packages/sdk/server-ai/src/ldai/managed_model.py Outdated
jsonbailey and others added 2 commits April 29, 2026 08:35
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 2f0fd53.

Comment thread packages/sdk/server-ai/tests/test_managed_model.py Outdated
… tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jsonbailey jsonbailey merged commit 1e1f36b into main Apr 29, 2026
46 checks passed
@jsonbailey jsonbailey deleted the jb/aic-2174/evaluations branch April 29, 2026 16:14
@github-actions github-actions Bot mentioned this pull request Apr 29, 2026
