feat: add Evaluator class for judge orchestration by jsonbailey · Pull Request #1331 · launchdarkly/js-core

jsonbailey · 2026-04-28T22:54:17Z

Summary

Adds Evaluator class that wraps judges and JudgeConfiguration
Evaluator.noop() static factory returns a no-op evaluator (resolves to [])
evaluate(input, output) runs all configured judges in parallel; missing judge key → warning + skip (not error)
Evaluator does NOT call tracker.trackJudgeResult — that belongs in the managed layer
Attaches Evaluator to LDAICompletionConfig and LDAIAgentConfig (populated in createChat/createAgent)

Test plan

All 188 existing tests pass
New Evaluator.test.ts covers noop(), judge evaluation, missing judge warns+skips, error handling, no tracker calls

🤖 Generated with Claude Code

Note

Medium Risk
Introduces new judge orchestration and changes how createChat/createJudge construct and wire judges, which can affect when/if evaluations run and how sampling is applied. Main risk is behavioral drift in judge execution paths due to the new evaluator attachment and updated Judge sampling semantics.

Overview
Adds an internal Evaluator abstraction that runs all configured Judge instances in parallel and returns an array of LDJudgeResults (with a noop() mode that always returns []).

Refactors LDAIClientImpl.createChat to build and attach an Evaluator onto the returned LDAICompletionConfig (and updates config types to carry evaluator), instead of returning a populated judges map to TrackedChat; judge instance creation is centralized via a new _createJudgeInstance helper.

Updates Judge to support a constructor-level default sampling rate (exposed via sampleRate) and changes evaluate/evaluateMessages so samplingRate is optional and falls back to the constructor default. Adds/updates Jest coverage for Evaluator, the new Judge sampling behavior, and the updated createJudge constructor arguments.

^{Reviewed by Cursor Bugbot for commit 12f3657. Bugbot is set up for automated code reviews on this repo. Configure here.}

github-actions · 2026-04-28T22:55:45Z

@launchdarkly/js-sdk-common size report
This is the brotli compressed size of the ESM build.
Compressed size: 25623 bytes
Compressed size limit: 29000
Uncompressed size: 125843 bytes

github-actions · 2026-04-28T22:55:56Z

@launchdarkly/browser size report
This is the brotli compressed size of the ESM build.
Compressed size: 179311 bytes
Compressed size limit: 200000
Uncompressed size: 830815 bytes

github-actions · 2026-04-28T22:56:07Z

@launchdarkly/js-client-sdk size report
This is the brotli compressed size of the ESM build.
Compressed size: 31866 bytes
Compressed size limit: 34000
Uncompressed size: 113634 bytes

github-actions · 2026-04-28T22:56:15Z

@launchdarkly/js-client-sdk-common size report
This is the brotli compressed size of the ESM build.
Compressed size: 38473 bytes
Compressed size limit: 39000
Uncompressed size: 211104 bytes

…-1657) Introduces `Evaluator` wrapping judges and JudgeConfiguration. The evaluator runs all configured judges in parallel, warns+skips on missing judge keys, and intentionally does NOT call tracker.trackJudgeResult — that responsibility belongs in the managed layer. Attaches Evaluator to LDAICompletionConfig and LDAIAgentConfig via createChat/createAgent. Adds Evaluator.noop() static factory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Include judgeConfigKey in error LDJudgeResults emitted from the catch block so error results are attributable to a specific judge config. - Switch from Promise.allSettled to Promise.all in Evaluator.evaluate; the map callback already catches internally so allSettled never sees rejections. Filter out null returns from the missing-judge path. - Mark the Evaluator class @internal and remove it from the public api/judge re-exports. It is consumed only by the managed layer; tests import via the source path. - Mark the evaluator property on LDAICompletionConfig and LDAIAgentConfig as @internal so it is excluded from the published API surface. - Simplify _initializeJudges to return Map<string, Judge> directly, eliminating the Record-then-convert step in _buildEvaluator. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…r to Judge[] - `Judge` constructor now takes a `sampleRate: number = 1.0` between the provider and the optional logger. The rate is stored as `private readonly _sampleRate` with a public `get sampleRate()` getter, in line with the rest of the file's `_xxx` field convention. - `Judge.evaluate` and `Judge.evaluateMessages` now take an optional `samplingRate?: number`; when omitted (or `undefined`) the constructor default is used. Falls through with `samplingRate ?? this._sampleRate` so an explicit `0` overrides correctly. - `Evaluator` is now constructed with just `(judges: Judge[], logger?)` — the `Map<string, Judge>` and `LDJudgeConfiguration` parameters are gone. `evaluate()` iterates the array and calls `judge.evaluate(input, output)` with no per-call rate; sampling is the judge's concern. The missing-judge warn-and-skip path is gone since the array can only contain judges that were successfully created. - `LDAIClientImpl._initializeJudges` is removed. The judge-construction body of `createJudge` is extracted into a private `_createJudgeInstance` helper that does NOT emit `TRACK_USAGE_CREATE_JUDGE`. The public `createJudge` emits the usage event then delegates — matching the existing `completionConfig` / `_completionConfig` and `judgeConfig` / `_judgeConfig` split. The public `createJudge` (and the `LDAIClient` interface) gain an optional `sampleRate?: number = 1.0` so callers can bake the default rate at construction. `_buildEvaluator` now inlines the per-config `Promise.all`, calls `_createJudgeInstance` per judge (no per-judge usage event), and constructs `new Evaluator(judges, this._logger)`. - Drop the legacy `judges: Record<string, Judge>` build in `createChat`. The `TrackedChat` constructor still accepts a `judges` param (its default-empty `{}`), but the client no longer populates it; the deprecated `invoke()` path will see no judges. The `run()` path is unaffected — it relies on the `evaluator` attached to the config. - Update `Evaluator.test.ts` to construct with `Judge[]`. Add new `Judge.test.ts` cases covering sampleRate defaulting, the per-call override (including explicit `0`), and the `undefined` fallthrough. - Update the `LDAIClientImpl.test.ts` `createJudge` expectation to assert the new constructor arity (with the default `1.0`). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…orts Judge.evaluate() never throws — it has its own internal try/catch and always returns an LDJudgeResult. Simplify Evaluator.evaluate() to plain Promise.all, remove the now-unused _logger param from the constructor, and drop the Evaluator export from the public judge index. Update LDAIClientImpl and tests to match. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

devin-ai-integration · 2026-05-01T15:56:44Z

+    // Attach the evaluator to the config for use by the managed layer
+    const configWithEvaluator: LDAICompletionConfig = { ...config, evaluator };
+
+    return new TrackedChat(configWithEvaluator, provider, {}, this._logger);


🔴 TrackedChat receives empty judges record, breaking all judge evaluations

createChat now passes an empty object {} for the judges parameter of TrackedChat, while attaching an evaluator to the config instead. However, TrackedChat._evaluateWithJudges() (TrackedChat.ts:85) still looks up judges from this.judges[judgeConfig.key], which will always return undefined since the record is empty. This means every judge evaluation triggered by TrackedChat.invoke() will fail with the error "Judge configuration is not enabled for ..." instead of actually running the judge. The evaluator on the config is never consumed by TrackedChat.

Prompt for agents

The PR introduces an Evaluator abstraction and attaches it to the config via configWithEvaluator, but TrackedChat was not updated to use it. TrackedChat._evaluateWithJudges() (TrackedChat.ts:77-115) still relies on the this.judges record passed to its constructor, which is now always empty {}. Two approaches to fix this: 1. Update TrackedChat to use the evaluator from the config (this.aiConfig.evaluator) instead of the old this.judges record. This would mean replacing the _evaluateWithJudges method to delegate to the evaluator, and updating the invoke() method accordingly. Note that TrackedChat currently also calls tracker.trackJudgeResult for each result, which still needs to happen. 2. Alternatively, keep populating the judges record as before (reverting the empty {} change) while also attaching the evaluator for other consumers. But this would mean duplicating the judge instances. Approach 1 is cleaner and aligns with the PR's intent. The key files are: - packages/sdk/server-ai/src/LDAIClientImpl.ts (createChat method, lines 296-332) - packages/sdk/server-ai/src/api/chat/TrackedChat.ts (constructor, invoke, _evaluateWithJudges)

Was this helpful? React with 👍 or 👎 to provide feedback.

This is intentional and will be resolved in a subsequent commit

jsonbailey changed the title ~~feat(server-sdk-ai): add Evaluator class for judge orchestration~~ feat: add Evaluator class for judge orchestration Apr 28, 2026

jsonbailey force-pushed the jb/aic-2174/server-ai branch from 46ab0a4 to c751ce6 Compare April 28, 2026 23:12

jsonbailey marked this pull request as ready for review April 28, 2026 23:23

jsonbailey requested a review from a team as a code owner April 28, 2026 23:23

This comment was marked as resolved.

Sign in to view

joker23 reviewed Apr 29, 2026

View reviewed changes

jsonbailey and others added 2 commits April 30, 2026 09:57

This comment was marked as resolved.

Sign in to view

jsonbailey requested a review from joker23 May 1, 2026 15:53

devin-ai-integration Bot reviewed May 1, 2026

View reviewed changes

joker23 approved these changes May 1, 2026

View reviewed changes

jsonbailey changed the base branch from main to next-ai-release May 1, 2026 16:14

jsonbailey merged commit f26c4cc into next-ai-release May 1, 2026
43 checks passed

jsonbailey deleted the jb/aic-2174/server-ai branch May 1, 2026 16:20

jsonbailey mentioned this pull request May 1, 2026

chore: Next AI SDK Release #1347

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add Evaluator class for judge orchestration#1331

feat: add Evaluator class for judge orchestration#1331
jsonbailey merged 4 commits intonext-ai-releasefrom
jb/aic-2174/server-ai

jsonbailey commented Apr 28, 2026 •

edited by cursor Bot

Loading

Uh oh!

github-actions Bot commented Apr 28, 2026

Uh oh!

github-actions Bot commented Apr 28, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 28, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 28, 2026

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

devin-ai-integration Bot May 1, 2026

Uh oh!

jsonbailey May 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jsonbailey commented Apr 28, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

github-actions Bot commented Apr 28, 2026

Uh oh!

github-actions Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Apr 28, 2026

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

jsonbailey May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jsonbailey commented Apr 28, 2026 •

edited by cursor Bot

Loading

github-actions Bot commented Apr 28, 2026 •

edited

Loading

github-actions Bot commented Apr 28, 2026 •

edited

Loading