feat: simplify evaluation schema to flat score/reasoning shape #1286

Merged
jsonbailey merged 6 commits into feat/ai-sdk-next-release from jb/aic-2253/simplify-eval-schema
Apr 17, 2026

Conversation

@jsonbailey
Contributor

@jsonbailey jsonbailey commented Apr 16, 2026

Summary

  • Removed the metric key from the structured output schema. EvaluationSchemaBuilder.build() no longer takes an evaluationMetricKey parameter. Since there is only ever a single evaluation metric key per judge config, it does not need to be embedded in the schema sent to the LLM.
  • Flattened the schema to a top-level {score, reasoning} shape. The old nested structure ({evaluations: {metricKey: {score, reasoning}}}) is replaced with a simple {score: number, reasoning: string} object. This is easier for LLMs to produce correctly and matches the Python SDK (python-server-sdk-ai#105, "fix: Remove evaluation metric key from schema which failed on some LLMs").
  • Updated parsing in Judge.ts. _parseEvaluationResponse now reads score and reasoning directly from the top-level response data. The metric key is still sourced from the judge config's evaluationMetricKey and used to key the result — it just no longer appears in the schema or LLM response.
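
The shape change described above can be sketched in TypeScript as follows. This is an illustration only: the interface names and the `keyByMetric` helper are hypothetical and are not taken from the SDK source; only the field names (`evaluations`, `score`, `reasoning`, `evaluationMetricKey`) come from the summary.

```typescript
// Old nested shape, as described in the summary: results keyed by metric key.
interface NestedEvaluationResponse {
  evaluations: Record<string, { score: number; reasoning: string }>;
}

// New flat shape the LLM is asked to produce.
interface FlatEvaluationResponse {
  score: number;
  reasoning: string;
}

// Hypothetical helper (not from the PR): attach the metric key from the judge
// config after parsing, so the key never has to appear in the LLM response.
function keyByMetric(
  evaluationMetricKey: string,
  response: FlatEvaluationResponse,
): Record<string, FlatEvaluationResponse> {
  return {
    [evaluationMetricKey]: { score: response.score, reasoning: response.reasoning },
  };
}

// Illustrative values only.
const oldResponse: NestedEvaluationResponse = {
  evaluations: { relevance: { score: 0.9, reasoning: 'Answer addresses the question.' } },
};
const newResponse: FlatEvaluationResponse = {
  score: 0.9,
  reasoning: 'Answer addresses the question.',
};

// The keyed result carries the same information the nested response used to.
const keyed = keyByMetric('relevance', newResponse);
console.log(JSON.stringify(keyed) === JSON.stringify(oldResponse.evaluations));
```

The point of the sketch is that the metric key is known from configuration, so asking the model to echo it back only adds a way for the response to be malformed.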

Test plan

  • All 144 existing tests pass (yarn workspace @launchdarkly/server-sdk-ai test)
  • Lint passes (yarn workspace @launchdarkly/server-sdk-ai lint)
  • Test mocks updated to use new flat response shape
  • _parseEvaluationResponse unit tests updated for simplified signature and data shape

🤖 Generated with Claude Code


Note

Medium Risk
Changes the structured response contract and parsing for judge evaluations; any callers/providers still emitting the old nested evaluations shape will now fail evaluation parsing.

Overview
Simplifies judge structured-output handling by switching the expected/provider schema from nested evaluations[metricKey]{score,reasoning} to a flat top-level {score, reasoning} object, and removes the dynamic EvaluationSchemaBuilder entirely.

Judge.evaluate now always invokes the provider with the static schema and parses score/reasoning directly; failures log a more specific "Could not parse evaluation response" warning. Tests are updated to use the new response shape and to assert the new warning behavior for missing/malformed responses.

Reviewed by Cursor Bugbot for commit 013a80d. Bugbot is set up for automated code reviews on this repo.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

@launchdarkly/js-sdk-common size report
This is the brotli compressed size of the ESM build.
Compressed size: 25623 bytes
Compressed size limit: 29000
Uncompressed size: 125843 bytes

@github-actions
Contributor

@launchdarkly/js-client-sdk size report
This is the brotli compressed size of the ESM build.
Compressed size: 31655 bytes
Compressed size limit: 34000
Uncompressed size: 112792 bytes

@github-actions
Contributor

@launchdarkly/browser size report
This is the brotli compressed size of the ESM build.
Compressed size: 179375 bytes
Compressed size limit: 200000
Uncompressed size: 829982 bytes

@github-actions
Contributor

@launchdarkly/js-client-sdk-common size report
This is the brotli compressed size of the ESM build.
Compressed size: 37169 bytes
Compressed size limit: 38000
Uncompressed size: 204305 bytes

jsonbailey and others added 2 commits April 16, 2026 16:07
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete EvaluationSchemaBuilder.ts and define EVALUATION_SCHEMA as a
module-level const in Judge.ts. Remove per-field warnings from
_parseEvaluationResponse (keep it pure) and emit a single warning in
evaluate() that includes the judge key and raw response data.
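
A minimal sketch of the structure this commit message describes: a module-level schema constant, a pure parser that does no logging, and a single warning emitted by the caller. The schema body, the `Logger` and `InvokeStructured` types, and the function signatures are assumptions for illustration, since the actual Judge.ts code is not shown on this page. The provider call is stubbed as a synchronous callback to keep the example short; the real call is presumably asynchronous.

```typescript
// Assumed JSON-Schema-style definition; the real EVALUATION_SCHEMA may differ.
const EVALUATION_SCHEMA: Record<string, unknown> = {
  type: 'object',
  properties: {
    score: { type: 'number' },
    reasoning: { type: 'string' },
  },
  required: ['score', 'reasoning'],
};

interface ParsedEvaluation {
  score: number;
  reasoning: string;
}

// Pure parser: returns undefined on any shape mismatch and never logs.
function parseEvaluationResponse(data: unknown): ParsedEvaluation | undefined {
  if (typeof data !== 'object' || data === null) {
    return undefined;
  }
  const { score, reasoning } = data as Record<string, unknown>;
  if (typeof score !== 'number' || typeof reasoning !== 'string') {
    return undefined;
  }
  return { score, reasoning };
}

// Hypothetical stand-ins for the SDK's logger and provider invocation.
interface Logger {
  warn(message: string): void;
}
type InvokeStructured = (schema: Record<string, unknown>) => unknown;

// The caller owns the single warning, including the judge key and raw data.
function evaluate(
  judgeKey: string,
  evaluationMetricKey: string,
  invoke: InvokeStructured,
  logger: Logger,
): Record<string, ParsedEvaluation> | undefined {
  const data = invoke(EVALUATION_SCHEMA);
  const parsed = parseEvaluationResponse(data);
  if (parsed === undefined) {
    logger.warn(
      `Could not parse evaluation response for judge ${judgeKey}: ${JSON.stringify(data)}`,
    );
    return undefined;
  }
  return { [evaluationMetricKey]: parsed };
}
```

Keeping the parser free of side effects makes it straightforward to unit test, and a response in the old nested `evaluations` shape simply yields `undefined` plus one warning from `evaluate`.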

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jsonbailey jsonbailey marked this pull request as ready for review April 16, 2026 21:55
@jsonbailey jsonbailey requested a review from a team as a code owner April 16, 2026 21:55

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit d81b202.

Comment thread packages/sdk/server-ai/src/api/judge/Judge.ts Outdated
jsonbailey and others added 2 commits April 17, 2026 09:47
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
configKey is already present in tracker.getTrackData().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@joker23 joker23 left a comment

only nits

Comment thread packages/sdk/server-ai/src/api/judge/Judge.ts Outdated
Comment thread packages/sdk/server-ai/src/api/judge/Judge.ts Outdated
Address review nits: narrow EVALUATION_SCHEMA type with as const
instead of Record<string, unknown>, and add Array.isArray check
in _parseEvaluationResponse.
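
The two nits can be illustrated with a short sketch. The schema body and the `isPlainRecord` helper are assumptions, not the actual Judge.ts code: `as const` keeps the literal types of the schema instead of widening them to `Record<string, unknown>`, and the `Array.isArray` check is needed because `typeof []` is `'object'`, so an array would otherwise pass an object guard.

```typescript
// Assumed schema body; `as const` preserves literal types such as 'object'
// and the readonly tuple ['score', 'reasoning'].
const EVALUATION_SCHEMA = {
  type: 'object',
  properties: {
    score: { type: 'number' },
    reasoning: { type: 'string' },
  },
  required: ['score', 'reasoning'],
} as const;

// Derived from the narrowed type: 'score' | 'reasoning'.
type RequiredField = (typeof EVALUATION_SCHEMA)['required'][number];

const requiredFields: readonly RequiredField[] = EVALUATION_SCHEMA.required;

// Hypothetical guard: arrays report typeof 'object', so exclude them explicitly.
function isPlainRecord(data: unknown): data is Record<string, unknown> {
  return typeof data === 'object' && data !== null && !Array.isArray(data);
}

console.log(requiredFields.join(','));
console.log(isPlainRecord([]), isPlainRecord({ score: 1, reasoning: 'ok' }));
```

With the narrowed type, a typo such as `'scroe'` assigned to a `RequiredField` becomes a compile-time error rather than a silent mismatch.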

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jsonbailey jsonbailey merged commit 524c99e into feat/ai-sdk-next-release Apr 17, 2026
44 checks passed
@jsonbailey jsonbailey deleted the jb/aic-2253/simplify-eval-schema branch April 17, 2026 16:43
2 participants