feat!: Add ManagedResult, RunnerResult, and Runner protocol; rename invoke() to run() #148
Merged
jsonbailey merged 15 commits into main on May 1, 2026
jsonbailey commented Apr 29, 2026
keelerm84 approved these changes Apr 30, 2026
…nvoke() to run()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The new track_tool_calls method at line 413 (with summary storage and dedup guard) was being shadowed by the older method at line 559 (which only fired per-tool events). Merge them into a single method that both stores to the summary and fires per-tool events.
Previously, metrics_extractor(result) was called twice — once in the public track_metrics_of/track_metrics_of_async to read duration_ms, and again inside _track_from_metrics_extractor to track success, tokens, and tool calls. Extract metrics once in the public method and pass the resulting metrics + elapsed_ms into the private helper, which now also handles the duration tracking.
ManagedModel and ManagedAgent now require a Runner. The compat shims (_invoke_runner, isinstance(result, RunnerResult) branches, Union type annotations) are removed; result handling is direct on RunnerResult fields. The deprecated ManagedModel.invoke() is preserved for backwards compat but now delegates to run() and adapts the ManagedResult into the legacy ModelResponse shape. ModelRunner and AgentRunner protocol definitions remain in place so downstream provider packages that import them continue to work.
- Drop the inconsistent 'if metrics else None' guard on reported_ms; the next line already dereferences metrics.success unconditionally.
- Use 'is not None' for tool_calls so an explicit empty list still triggers tracking (preserves the distinction between 'not tracked' and 'tracked with no calls').
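The difference between the two `tool_calls` guards can be seen in a minimal sketch. Both helper names below are hypothetical and are not part of the SDK; only the `is not None` comparison mirrors what the commit message describes.

```python
from typing import List, Optional


def tracks_with_truthiness(tool_calls: Optional[List[str]]) -> bool:
    # Truthiness guard: None and [] are both falsy, so both are skipped.
    return bool(tool_calls)


def tracks_with_none_check(tool_calls: Optional[List[str]]) -> bool:
    # Explicit guard: only None means "not tracked"; [] still triggers tracking.
    return tool_calls is not None
```

With the truthiness guard an empty list is indistinguishable from `None`; with the explicit check, "tracked with no calls" survives as its own state.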
Drop the deprecated invoke() method from the managed layer along with its dedicated test class and the warnings/LDAIMetrics/ModelResponse imports that were only needed by it. Type definitions in providers/ remain so downstream provider packages keep building.
…unner] The factory's downstream consumers (ManagedModel, ManagedAgent) now take Runner; aligning the factory's return types lets us drop the type: ignore comments at the ManagedModel/ManagedAgent call sites. Provider package PRs will update their concrete implementations to match. Judge still takes ModelRunner, so its call site picks up the type: ignore[arg-type] in its place — that's resolved later in the cleanup PR when Judge migrates to Runner.
Move the metrics_extractor call inside _track_from_metrics_extractor so extraction errors are caught and logged without bubbling up. When extraction fails or returns None, only the wall-clock duration is tracked — success/error is left untouched since the underlying model call itself succeeded. Also tighten the tool_calls check to access metrics.tool_calls directly, mirroring how metrics.usage is accessed.
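A rough sketch of the behavior this commit describes, under stated assumptions: `SketchMetrics`, `SketchTracker`, and `track_from_extractor` are hypothetical stand-ins rather than the SDK's classes, and the event dictionary only illustrates which values would be tracked. The preference for a reported duration over wall-clock time follows the PR overview.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

log = logging.getLogger(__name__)


@dataclass
class SketchMetrics:
    # Hypothetical stand-in for the metrics object an extractor returns.
    success: bool
    duration_ms: Optional[int] = None
    tool_calls: Optional[List[str]] = None


class SketchTracker:
    def __init__(self) -> None:
        self.events: Dict[str, Any] = {}

    def track_from_extractor(
        self,
        result: Any,
        extractor: Optional[Callable[[Any], Optional[SketchMetrics]]],
        elapsed_ms: int,
    ) -> None:
        metrics = None
        if extractor is not None:
            try:
                # Extraction happens exactly once, inside the guarded helper.
                metrics = extractor(result)
            except Exception:
                # Log and continue; the model call itself already succeeded.
                log.warning("metrics extraction failed", exc_info=True)

        if metrics is None:
            # Only wall-clock duration is tracked; success/error is untouched.
            self.events["duration_ms"] = elapsed_ms
            return

        # Prefer the duration reported by the metrics over wall-clock time.
        reported = metrics.duration_ms
        self.events["duration_ms"] = reported if reported is not None else elapsed_ms
        self.events["success"] = metrics.success
        if metrics.tool_calls is not None:
            self.events["tool_calls"] = list(metrics.tool_calls)
```

An extractor that raises therefore leaves only `duration_ms` in the tracked events, while a successful extraction also records success and any tool calls.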
- Judge now accepts Runner instead of ModelRunner
- evaluate() calls runner.run(output_type=...) instead of invoke_structured_model
- response.parsed replaces StructuredResponse.data; None guard added
- evaluate_messages() accepts RunnerResult instead of ModelResponse
- Tests updated to use RunnerResult and mock_runner.run
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ics]], remove defensive getattr
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit e2e2b6e.
…odel
ManagedModel.run() calls self._model_runner.run(), not invoke_model. The previous mocks were dead code that never exercised the runner.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Summary
Introduces the new managed-layer return type `ManagedResult`, the unified `Runner` protocol, and extends `LDAIMetricSummary` with `tool_calls`, `duration_ms` (renamed from `duration`), and `resumption_token`.
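One common way to keep old callers working across a rename like `duration` to `duration_ms` is a read-only alias that warns on access. The sketch below assumes that pattern; `SummarySketch` is a hypothetical class, and whether the real `LDAIMetricSummary` alias emits a warning is not stated in this PR.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class SummarySketch:
    # New canonical field name.
    duration_ms: Optional[int] = None

    @property
    def duration(self) -> Optional[int]:
        # Deprecated alias: forwards to duration_ms and warns the caller.
        warnings.warn(
            "duration is deprecated; use duration_ms",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.duration_ms
```

Reading `SummarySketch(duration_ms=42).duration` returns the same value as `duration_ms` while flagging the old name to the caller.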
Stack
This is part of the AIC-2388 stacked PR series. Targets `main` (PR #147 merged).
Order: PR 7 ✅ → PR 8 (this) → PR 8-openai → PR 8-langchain → Cleanup → PR 9 → PR 10 → PR 11 → PR 11-openai → PR 11-langchain → PR 12
Test plan
🤖 Generated with Claude Code
Note
Medium Risk
Medium risk due to a breaking API surface change (`invoke()` removed/renamed to `run()`) and refactors across managed model/agent and judge evaluation paths that could impact integrations and metrics tracking.
Overview
Introduces a unified runner interface and new result types. Adds a `Runner` protocol with a single `run()` method returning `RunnerResult`, plus a managed-layer `ManagedResult` that bundles `content`, aggregated `LDAIMetricSummary`, optional `parsed` structured output, and optional async judge `evaluations`.
Updates managed and judge APIs to the new runner/result model. `ManagedModel.invoke()` is replaced by `ManagedModel.run()` (and the client examples updated), `ManagedAgent.run()` now returns `ManagedResult`, and `Judge` switches from `invoke_structured_model`/`StructuredResponse` to `Runner.run(..., output_type=...)` and reads structured output from `RunnerResult.parsed`.
Expands tracking/metrics payloads. `LDAIMetrics` gains `tool_calls` and `duration_ms` (included in `to_dict()`), `LDAIMetricSummary` adds `tool_calls`, `duration_ms` (with deprecated `duration` alias), and eagerly captures `resumption_token`; `LDAIConfigTracker.track_metrics_of(_async)` now supports optional metrics extraction, prefers `metrics.duration_ms` over wall-clock time, and tracks tool-call events once per execution.
Reviewed by Cursor Bugbot for commit 5925da6. Bugbot is set up for automated code reviews on this repo.
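To make the shape of the runner/result model concrete, here is a minimal sketch. Every name ending in `Sketch`, plus `EchoRunner` and `managed_run`, is hypothetical; the real signatures (including whether `run()` is async, and the exact parameter and field types) are not shown in this PR description, so treat this as an illustration of a single-method protocol rather than the SDK's API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Protocol, runtime_checkable


@dataclass
class RunnerResultSketch:
    # What a runner hands back: raw content plus optional structured output.
    content: str
    parsed: Optional[Dict[str, Any]] = None


@runtime_checkable
class RunnerSketch(Protocol):
    # A single run() method; output_type requests structured output.
    def run(self, prompt: str, output_type: Optional[type] = None) -> RunnerResultSketch: ...


@dataclass
class ManagedResultSketch:
    # Managed-layer bundle: content, metrics summary, parsed output, evaluations.
    content: str
    metrics: Dict[str, Any]
    parsed: Optional[Dict[str, Any]] = None
    evaluations: Optional[Any] = None


class EchoRunner:
    # Toy runner that satisfies the protocol structurally (no inheritance).
    def run(self, prompt: str, output_type: Optional[type] = None) -> RunnerResultSketch:
        parsed = {"echo": prompt} if output_type is not None else None
        return RunnerResultSketch(content=prompt, parsed=parsed)


def managed_run(runner: RunnerSketch, prompt: str) -> ManagedResultSketch:
    # The managed layer wraps the runner result and attaches a metrics summary.
    result = runner.run(prompt)
    return ManagedResultSketch(
        content=result.content,
        metrics={"success": True},
        parsed=result.parsed,
    )
```

Because the protocol is structural, any provider object with a compatible `run()` method can be passed to the managed layer without subclassing.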