feat: introduce ManagedResult, RunnerResult, and LDAIMetricSummary#1332
jsonbailey wants to merge 2 commits into next-ai-release
Conversation
Branch force-pushed from 46ab0a4 to c751ce6, then from fe6948b to 192315f.
… (AIC-2388) Adds RunnerProtocol.test.ts to verify that the Runner and AgentGraphRunner interfaces can be implemented as plain objects. The Runner, AgentGraphRunner interfaces, AIProvider deprecation, and providers/index.ts re-exports landed in the parent PR (#1332). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@cursor review
… (AIC-2388) Adds RunnerResult (provider-level result type without evaluations), ManagedResult (managed-layer result with async evaluations promise), and LDAIMetricSummary (flat metric summary including resumptionToken). Adds toolCalls and durationMs to LDAIMetrics. TrackedChat.run() replaces invoke(), returning ManagedResult with LDAIMetricSummary built from the tracker. Adds createModel() to LDAIClient/LDAIClientImpl as the preferred replacement for createChat(). Updates the chat-judge example. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… conversation management

- Add `Runner` and `AgentGraphRunner` interfaces in api/providers/Runner.ts. Runner.run takes a prompt string plus an optional output schema and returns a RunnerResult. AgentGraphRunner.run takes a string and returns an AgentGraphRunnerResult. Re-export both from api/providers/index.ts.
- Add the supporting `GraphMetrics` and `AgentGraphRunnerResult` types to api/graph/types.ts so AgentGraphRunner has its result shape on this branch.
- Rename `TrackedChat` -> `ManagedModel` (file + class). The constructor now takes a `Runner` instead of an `AIProvider`. The class is stateless: it owns no conversation history, and `run(prompt)` forwards the prompt directly to the runner. Drop `invoke()`, `_evaluateWithJudges`, `appendMessages`, `getMessages`, `getJudges`, `getProvider`, and the internal `messages` field.
- Update `LDAIClientImpl.createModel` to construct a `ManagedModel` with a `Runner`. The factory still produces a (deprecated) `AIProvider`, so a small `runnerFromAIProvider` adapter wraps it: it prepends the AIConfig's configured messages to the user prompt to preserve existing system-prompt behavior under the stateless contract.
- Mark `createChat` `@deprecated` (it now delegates to `createModel`); keep `initChat` deprecated. Update the `LDAIClient` interface accordingly.
- Mark the `AIProvider` abstract class `@deprecated` in favor of `Runner`.
- Update the `tracked-chat` and `chat-observability` examples to call `createModel` + `model.run()` instead of `createChat` + `chat.invoke()`.
- Rewrite the test suite for the stateless ManagedModel: the prompt is passed through verbatim, no history is retained, and ManagedResult is built from the RunnerResult plus the tracker's resumption token. Drop the old tests for `appendMessages`/`getMessages`/`invoke`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
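As a rough sketch of the shapes described above (type and field names are taken from this PR's description; anything beyond them is an assumption for illustration), a `Runner` can be satisfied by a plain object, which is what RunnerProtocol.test.ts verifies:

```typescript
// Sketch only: interfaces reconstructed from the PR description, not copied
// from the SDK source. Fields beyond those named in the PR are assumptions.
interface LDAIMetrics {
  success: boolean;
  toolCalls?: number; // added in this PR
  durationMs?: number; // added in this PR
}

interface RunnerResult {
  content: string;
  metrics: LDAIMetrics;
  raw?: unknown;
  parsed?: unknown;
}

interface Runner {
  // Prompt string plus an optional output schema, per the PR description.
  run(prompt: string, outputSchema?: object): Promise<RunnerResult>;
}

// A Runner implemented as a plain object, no class required:
const echoRunner: Runner = {
  async run(prompt: string): Promise<RunnerResult> {
    const start = Date.now();
    return {
      content: `echo: ${prompt}`,
      metrics: { success: true, durationMs: Date.now() - start },
    };
  },
};
```

Because the contract is structural, test doubles and lightweight adapters need no inheritance from a provider base class.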
Branch force-pushed from e163f7d to c906f79.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit e163f7d.
```ts
// Evaluations are wired in a follow-up PR. For now, resolve empty.
const evaluations: Promise<LDJudgeResult[]> = Promise.resolve([]);
```
Evaluator built but never called in run()
High Severity
createModel() builds an Evaluator (initializing judges via async network calls), attaches it to configWithEvaluator, and passes that config to ManagedModel. However, ManagedModel.run() ignores this.aiConfig.evaluator entirely and hardcodes evaluations to Promise.resolve([]). Since the old TrackedChat with working judge evaluations is deleted in this same PR, judge evaluations are silently non-functional. The chat-judge example will always print empty results despite judges being configured.
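A minimal sketch of the wiring this review says is missing, assuming hypothetical `Evaluator`/`JudgeResult` shapes (the real ones live in the SDK): instead of hardcoding an empty promise, `run()` would start the configured evaluator and attach its promise to the result.

```typescript
// Hypothetical shapes; names are assumed from the review comment above.
interface JudgeResult {
  judgeKey: string;
  passed: boolean;
}

interface Evaluator {
  evaluate(content: string): Promise<JudgeResult[]>;
}

// Replacement for `const evaluations = Promise.resolve([])`: kick off the
// evaluator when one is configured (best-effort: failures resolve empty).
function startEvaluations(
  evaluator: Evaluator | undefined,
  content: string,
): Promise<JudgeResult[]> {
  if (!evaluator) return Promise.resolve([]);
  return evaluator.evaluate(content).catch(() => []);
}
```

Returning the promise without awaiting it keeps `run()` fast while still letting callers `await result.evaluations` later.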
```ts
  /**
   * Resumption token for deferred feedback association.
   */
  resumptionToken?: string;
}
```
Conflicting LDAIMetricSummary interfaces with same name
Medium Severity
Two incompatible interfaces named LDAIMetricSummary now exist. The original in config/LDAIConfigTracker.ts has tokens?: LDTokenUsage and success?: boolean; the new one in model/types.ts has usage?: LDTokenUsage and success: boolean (required). The new one is publicly exported, but LDAIConfigTracker.getSummary() returns the old one — users importing LDAIMetricSummary to type the return of getSummary() will get a type mismatch.
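To illustrate the clash (field shapes are taken from the review comment; `LDTokenUsage`'s own fields are assumed), a value built for the old tracker shape does not satisfy the new exported interface:

```typescript
// Field names from the review above; LDTokenUsage internals are assumed.
interface LDTokenUsage {
  total?: number;
  input?: number;
  output?: number;
}

// Old shape, as returned by LDAIConfigTracker.getSummary():
interface TrackerMetricSummary {
  tokens?: LDTokenUsage;
  success?: boolean;
}

// New publicly exported shape in model/types.ts:
interface ModelMetricSummary {
  usage?: LDTokenUsage;
  success: boolean; // required here, optional in the old shape
}

const fromTracker: TrackerMetricSummary = { tokens: { total: 42 } };
// const typed: ModelMetricSummary = fromTracker;
// ^ would not compile: 'success' is required, and 'tokens' is not 'usage'
```

Renaming one of the two interfaces (or re-exporting a single shared shape) would remove the ambiguity for users who import `LDAIMetricSummary` by name.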
Summary

- Adds `RunnerResult` interface (provider-level result: content, metrics, raw?, parsed? — no evaluations)
- Adds `ManagedResult` interface (managed-layer result with `evaluations: Promise<JudgeResult[]>`)
- Adds `LDAIMetricSummary` (flat summary: success, usage?, toolCalls?, durationMs?, resumptionToken?)
- Adds `toolCalls?` and `durationMs?` fields to `LDAIMetrics`
- `TrackedChat.run()` replaces/supplements `invoke()`, returning `ManagedResult` with a metric summary built from the tracker
- Adds `createModel()` to `LDAIClient` and `LDAIClientImpl` as the preferred replacement for `createChat()`
- Updates the `chat-judge` example to use `createModel()` and `run()`

Test plan

- `chat-judge` example updated to use the new API

🤖 Generated with Claude Code
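A hedged usage sketch of the surface summarized above. The exact `createModel()` call shape is not shown in this PR, so a stub stands in for the model it returns; only the result shapes come from the summary:

```typescript
// Result shapes per the summary above; client wiring is a stand-in.
interface JudgeResult {
  judgeKey: string;
  passed: boolean;
}

interface LDAIMetricSummary {
  success: boolean;
  usage?: { total?: number };
  toolCalls?: number;
  durationMs?: number;
  resumptionToken?: string;
}

interface ManagedResult {
  content: string;
  summary: LDAIMetricSummary;
  evaluations: Promise<JudgeResult[]>;
}

// Stub standing in for the model returned by a createModel(...) call:
const model = {
  async run(prompt: string): Promise<ManagedResult> {
    return {
      content: `answer to: ${prompt}`,
      summary: { success: true, durationMs: 12, resumptionToken: 'tok-123' },
      evaluations: Promise.resolve([]), // judges are wired in a follow-up PR
    };
  },
};

async function demo(): Promise<void> {
  const result = await model.run('What is feature flagging?');
  console.log(result.summary.resumptionToken); // flat summary, no tracker call
  const judges = await result.evaluations; // resolves separately from run()
  console.log(judges.length);
}
```

The split between the synchronous `summary` and the asynchronous `evaluations` promise is what lets callers act on metrics immediately while judge results arrive later.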
Note

Medium Risk

Medium risk due to API shape changes (`createModel`/`createChat`, `TrackedChat` removal) and a new provider abstraction (`Runner`) that can affect downstream integrations and prompt/message handling.

Overview

Introduces a new managed invocation layer via `ManagedModel.run()` that returns a `ManagedResult` with a flattened `LDAIMetricSummary` and an asynchronous `evaluations` promise, alongside provider-level `Runner`/`RunnerResult` types.

Replaces the stateful `TrackedChat` API with stateless model execution: `LDAIClient.createModel()` is added as the preferred entry point, while `createChat`/`initChat` are deprecated shims; examples are updated to use `createModel` + `run`. Judge execution is split into a standalone `Evaluator` (parallel, best-effort), and configs now carry an internal `evaluator` reference for the managed layer.

Extends metrics to include optional `toolCalls` and `durationMs`, and adds initial graph runner result/metrics types to support the new runner protocol.

Reviewed by Cursor Bugbot for commit e163f7d.
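The `runnerFromAIProvider` adapter mentioned in the commit notes can be sketched roughly as follows. The provider and message shapes here are assumptions; only the behavior (prepend the AIConfig's configured messages to the user prompt) comes from this PR:

```typescript
// Assumed minimal shapes; the real AIProvider/Runner types live in the SDK.
interface LDMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface AIProviderLike {
  invokeModel(messages: LDMessage[]): Promise<{ content: string; success: boolean }>;
}

interface RunnerLike {
  run(prompt: string): Promise<{ content: string; metrics: { success: boolean } }>;
}

// Wrap a (deprecated) provider as a stateless Runner: configured messages are
// prepended on every call so system-prompt behavior survives without history.
function runnerFromAIProvider(
  provider: AIProviderLike,
  configuredMessages: LDMessage[],
): RunnerLike {
  return {
    async run(prompt: string) {
      const messages: LDMessage[] = [
        ...configuredMessages,
        { role: 'user', content: prompt },
      ];
      const response = await provider.invokeModel(messages);
      return { content: response.content, metrics: { success: response.success } };
    },
  };
}
```

Because the adapter rebuilds the message list per call, it preserves the stateless contract: nothing from a previous `run()` leaks into the next one.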