feat: simplify evaluation schema to flat score/reasoning shape #1286
jsonbailey merged 6 commits into feat/ai-sdk-next-release
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@launchdarkly/js-sdk-common size report
@launchdarkly/js-client-sdk size report
@launchdarkly/browser size report
@launchdarkly/js-client-sdk-common size report
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete EvaluationSchemaBuilder.ts and define EVALUATION_SCHEMA as a module-level const in Judge.ts. Remove per-field warnings from _parseEvaluationResponse (keep it pure) and emit a single warning in evaluate() that includes the judge key and raw response data. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit d81b202.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
configKey is already present in tracker.getTrackData(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address review nits: narrow EVALUATION_SCHEMA type with as const instead of Record<string, unknown>, and add Array.isArray check in _parseEvaluationResponse. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Summary
- `EvaluationSchemaBuilder.build()` no longer takes an `evaluationMetricKey` parameter. Since there is only ever a single evaluation metric key per judge config, it does not need to be embedded in the schema sent to the LLM.
- The schema is now a flat `{score, reasoning}` shape. The old nested structure (`{evaluations: {metricKey: {score, reasoning}}}`) is replaced with a simple `{score: number, reasoning: string}` object. This is easier for LLMs to produce correctly and matches the Python SDK (fix: Remove evaluation metric key from schema which failed on some LLMs, python-server-sdk-ai#105).
- In `Judge.ts`, `_parseEvaluationResponse` now reads `score` and `reasoning` directly from the top-level response data. The metric key is still sourced from the judge config's `evaluationMetricKey` and used to key the result; it just no longer appears in the schema or LLM response.

Test plan
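To make the change concrete, here is a sketch of what the flat, module-level schema could look like. The `{score, reasoning}` shape and the `as const` narrowing follow the PR description; the exact JSON Schema keywords and descriptions are assumptions for illustration, not the SDK's actual source.

```typescript
// Hypothetical sketch of the flat evaluation schema (shape from the PR
// description; exact JSON Schema keywords are assumed). Declared as a
// module-level const and narrowed with `as const` instead of being
// typed as Record<string, unknown>.
const EVALUATION_SCHEMA = {
  type: 'object',
  properties: {
    score: { type: 'number', description: 'Evaluation score' },
    reasoning: { type: 'string', description: 'Explanation of the score' },
  },
  required: ['score', 'reasoning'],
  additionalProperties: false,
} as const;

export default EVALUATION_SCHEMA;
```

With `as const`, `EVALUATION_SCHEMA.required` is typed as the literal tuple `['score', 'reasoning']` rather than `string[]`, which gives callers stricter compile-time guarantees for free.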
- Tests pass (`yarn workspace @launchdarkly/server-sdk-ai test`)
- Lint passes (`yarn workspace @launchdarkly/server-sdk-ai lint`)
- `_parseEvaluationResponse` unit tests updated for the simplified signature and data shape

🤖 Generated with Claude Code
Note
Medium Risk
Changes the structured response contract and parsing for judge evaluations; any callers/providers still emitting the old nested `evaluations` shape will now fail evaluation parsing.

Overview

Simplifies judge structured-output handling by switching the expected provider schema from the nested `evaluations[metricKey].{score, reasoning}` shape to a flat top-level `{score, reasoning}` object, and removes the dynamic `EvaluationSchemaBuilder` entirely. `Judge.evaluate` now always invokes the provider with the static schema and parses `score`/`reasoning` directly; failures log a more specific "Could not parse evaluation response" warning. Tests are updated to use the new response shape and to assert the new warning behavior for missing/malformed responses.

Reviewed by Cursor Bugbot for commit 013a80d. Bugbot is set up for automated code reviews on this repo.
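The parsing behavior described above can be sketched as a pure function, keeping `_parseEvaluationResponse` free of logging as the commits describe. This is an illustrative reconstruction: the function name mirrors the PR, but the result type and field checks are assumptions, including the `Array.isArray` guard mentioned in the review nits (arrays report `typeof` as `'object'`, so they must be rejected explicitly).

```typescript
// Hypothetical sketch of a pure parser for the flat response shape.
// Returns undefined on malformed input; the caller (evaluate()) is
// responsible for emitting the single warning with judge key and raw data.
interface EvaluationResult {
  score: number;
  reasoning: string;
}

function parseEvaluationResponse(data: unknown): EvaluationResult | undefined {
  // Reject non-objects, null, and arrays (typeof [] === 'object',
  // hence the explicit Array.isArray check).
  if (typeof data !== 'object' || data === null || Array.isArray(data)) {
    return undefined;
  }
  const { score, reasoning } = data as { score?: unknown; reasoning?: unknown };
  if (typeof score !== 'number' || typeof reasoning !== 'string') {
    return undefined;
  }
  return { score, reasoning };
}
```

Keeping the parser pure and pushing the "Could not parse evaluation response" warning up into `evaluate()` means the warning can include the judge key and raw response in one place, rather than scattering per-field log lines through the parser.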