test(evals): add claude-code agent runner by denolfe · Pull Request #16609 · payloadcms/payload

denolfe · 2026-05-13T19:25:22Z

Overview

Adds a Claude Code CLI agent runner to test/evals/ alongside the existing direct-LLM runner. Real agent invocations exercise the Payload skill the way users actually consume it: progressive disclosure of SKILL.md + reference/*.md from a sandboxed workdir, rather than the entire skill being injected into a system prompt.

Four eval lanes are now selectable via EVAL_VARIANT:

Variant	Runner	Skill
`skill` (default)	direct LLM	system prompt
`baseline`	direct LLM	none
`agent-claude-code`	Claude Code CLI	`.claude/skills/payload/` in workdir
`agent-claude-code-baseline`	Claude Code CLI	none
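As a rough illustration of how the table above could map onto runner options, here is a minimal sketch. The `SkillInstallMode` values (`'none'`, `'system-prompt'`, `'workdir'`) and the exact shape of `VariantOptions` are assumptions for illustration. The PR only names `resolveVariantOptions`, `kind`, and `skillInstall`.

```typescript
// Hypothetical shapes: the real types live under test/evals/ and may differ.
type RunnerKind = 'llm' | 'claude-code'
type SkillInstallMode = 'none' | 'system-prompt' | 'workdir' // assumed values

interface VariantOptions {
  kind: RunnerKind
  skillInstall: SkillInstallMode
}

// One entry per row of the EVAL_VARIANT table
const VARIANTS = new Map<string, VariantOptions>([
  ['skill', { kind: 'llm', skillInstall: 'system-prompt' }],
  ['baseline', { kind: 'llm', skillInstall: 'none' }],
  ['agent-claude-code', { kind: 'claude-code', skillInstall: 'workdir' }],
  ['agent-claude-code-baseline', { kind: 'claude-code', skillInstall: 'none' }],
])

// Resolve EVAL_VARIANT; an unset or empty value falls back to `skill`,
// and an unknown value fails loudly instead of silently picking a lane.
function resolveVariantOptions(env: Record<string, string | undefined>): VariantOptions {
  const name = env.EVAL_VARIANT || 'skill'
  const options = VARIANTS.get(name)
  if (!options) {
    const known = Array.from(VARIANTS.keys()).join(', ')
    throw new Error(`Unknown EVAL_VARIANT "${name}". Expected one of: ${known}`)
  }
  return options
}
```

Passing the environment in as an argument (rather than reading `process.env` inside the function) keeps the resolver easy to unit-test.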

Key Changes

Dispatcher-based runner indirection (test/evals/runner/)
- runCodegenEval becomes a RunnerKind-keyed dispatcher over Record<RunnerKind, CodegenRunner>.
- Existing LLM body extracted to runner/llm.ts behind the new CodegenRunner type.
- New runner/claudeCode.ts wraps the claude CLI with lazy init, p-limit concurrency, a process-group-killable timeout, and 'error'/'exit' resolution guards so spawn failures (missing binary, auth, hang) surface as actionable errors instead of test-worker timeouts.
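The dispatcher pattern described above might look roughly like the following. This is a sketch under assumptions: the `run` signature and the stub runner bodies are hypothetical, and only the names `RunnerKind`, `CodegenRunner`, `CodegenRunnerResult`, and `runCodegenEval` come from the PR.

```typescript
type RunnerKind = 'llm' | 'claude-code'

interface CodegenRunnerResult {
  agentExitCode?: number
  agentLog?: string
  modifiedConfig: string
}

// Assumed signature: the real CodegenRunner type may take more options
interface CodegenRunner {
  run(instruction: string, starterConfig: string): Promise<CodegenRunnerResult>
}

// Stubs standing in for runner/llm.ts and runner/claudeCode.ts
const llmRunner: CodegenRunner = {
  run: async (_instruction, starterConfig) => ({ modifiedConfig: starterConfig }),
}

const claudeCodeRunner: CodegenRunner = {
  run: async (_instruction, starterConfig) => ({
    agentExitCode: 0,
    agentLog: '(stub)',
    modifiedConfig: starterConfig,
  }),
}

// Record<RunnerKind, ...> makes the compiler reject this table
// if a new RunnerKind is added without a matching runner.
const runners: Record<RunnerKind, CodegenRunner> = {
  'claude-code': claudeCodeRunner,
  llm: llmRunner,
}

async function runCodegenEval(
  instruction: string,
  starterConfig: string,
  opts: { kind: RunnerKind },
): Promise<CodegenRunnerResult> {
  return runners[opts.kind].run(instruction, starterConfig)
}
```

Keying the table on `RunnerKind` moves the "which runner?" decision into data, so adding a runner is a one-line change plus a new module.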
Sandboxed workdir per case (test/evals/runner/workdir.ts)
- Each case gets a fresh os.tmpdir()/payload-eval-*/ with git init (fixed local identity) and an embedded skill tree copied verbatim from tools/claude-plugin/skills/payload/.
- Defensive asserts refuse to run if the workdir escapes os.tmpdir() or lands under $HOME.
- getSkillTreeHash walks the source tree (sorted) so skill content changes invalidate cached results.
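A minimal sketch of the two workdir safeguards described above, assuming SHA-256 for the tree hash and plain path comparison for the containment checks (the PR does not state either detail). `assertSafeWorkdir` and `isInside` are hypothetical names; `getSkillTreeHash` is the name used in the PR, but this body is illustrative only. Symlink resolution (for example via `fs.realpathSync`) is deliberately left out.

```typescript
import { createHash } from 'node:crypto'
import * as fs from 'node:fs'
import * as os from 'node:os'
import * as path from 'node:path'

// True when `child` is strictly inside `parent` (purely lexical, no symlink resolution)
function isInside(parent: string, child: string): boolean {
  const rel = path.relative(path.resolve(parent), path.resolve(child))
  return rel !== '' && rel !== '..' && !rel.startsWith('..' + path.sep) && !path.isAbsolute(rel)
}

// Hypothetical guard: refuse any workdir outside the temp root or under the home directory
function assertSafeWorkdir(workdir: string, tmpRoot = os.tmpdir(), home = os.homedir()): void {
  if (!isInside(tmpRoot, workdir)) {
    throw new Error(`Refusing to use ${workdir}: not under ${tmpRoot}`)
  }
  if (isInside(home, workdir)) {
    throw new Error(`Refusing to use ${workdir}: under home directory ${home}`)
  }
}

// Hash relative paths and contents in sorted order so the result is stable across machines
function getSkillTreeHash(rootDir: string): string {
  const hash = createHash('sha256')
  const walk = (dir: string): void => {
    const entries = fs
      .readdirSync(dir, { withFileTypes: true })
      .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    for (const entry of entries) {
      const full = path.join(dir, entry.name)
      if (entry.isDirectory()) {
        walk(full)
      } else if (entry.isFile()) {
        // Normalize separators so Windows and POSIX produce the same hash
        hash.update(path.relative(rootDir, full).split(path.sep).join('/'))
        hash.update('\0')
        hash.update(fs.readFileSync(full))
        hash.update('\0')
      }
    }
  }
  walk(rootDir)
  return hash.digest('hex')
}
```

Sorting before hashing matters because directory listing order is not guaranteed to be the same on every filesystem.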
Sandboxed claude invocation
- Agent spawns with CLAUDE_CONFIG_DIR overridden to a per-process empty sandbox dir, blocking the developer's global CLAUDE.md, installed skills, settings, and hooks from contaminating the eval.
- Auth probe at first agent-kind invocation, with a credentials-file fallback for ~/.claude/.credentials.json setups. Authentication failures surface the CLI's actual stderr/stdout instead of a generic message.
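The sandboxed spawn and its resolution guards could be sketched as below. `buildSandboxEnv` and `spawnGuarded` are hypothetical helper names, and this sketch settles on `'close'` rather than `'exit'` so buffered output is flushed first. The real runner may differ on both points.

```typescript
import { spawn } from 'node:child_process'

type Env = Record<string, string | undefined>

interface SpawnOutcome {
  exitCode: null | number
  stderr: string
  stdout: string
}

// Point the CLI at an empty config dir so no global CLAUDE.md, skills, settings, or hooks load
function buildSandboxEnv(base: Env, sandboxConfigDir: string): Env {
  return { ...base, CLAUDE_CONFIG_DIR: sandboxConfigDir }
}

function spawnGuarded(
  command: string,
  args: string[],
  opts: { cwd: string; env: Env; timeoutMs: number },
): Promise<SpawnOutcome> {
  return new Promise((resolve, reject) => {
    let settled = false
    let stdout = ''
    let stderr = ''

    // detached: true makes the child its own process-group leader on POSIX
    const child = spawn(command, args, {
      cwd: opts.cwd,
      detached: true,
      env: opts.env,
      stdio: ['ignore', 'pipe', 'pipe'],
    })

    // Settle at most once, whichever of 'error', 'close', or the timeout fires first
    const settle = (fn: () => void): void => {
      if (settled) return
      settled = true
      clearTimeout(timer)
      fn()
    }

    const timer = setTimeout(() => {
      try {
        // A negative pid targets the whole process group (POSIX only)
        if (child.pid !== undefined) process.kill(-child.pid, 'SIGKILL')
      } catch {
        child.kill('SIGKILL')
      }
      settle(() => reject(new Error(`${command} timed out after ${opts.timeoutMs}ms`)))
    }, opts.timeoutMs)

    child.stdout?.on('data', (chunk: Buffer) => {
      stdout += chunk.toString()
    })
    child.stderr?.on('data', (chunk: Buffer) => {
      stderr += chunk.toString()
    })
    // A missing binary surfaces here as ENOENT instead of hanging until the test times out
    child.on('error', (err) => settle(() => reject(new Error(`Failed to start ${command}: ${err.message}`))))
    child.on('close', (exitCode) => settle(() => resolve({ exitCode, stderr, stdout })))
  })
}
```

An agent call would then pass `'claude'` with the flags shown in the flow diagram (`--print`, `--model`, `--dangerously-skip-permissions`) and `buildSandboxEnv(process.env, sandboxDir)` as the environment. How the prompt itself is delivered to the CLI is not stated in the PR.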
Cache + result type extensions (test/evals/cache.ts, test/evals/types.ts)
- codegenKey keyed on runnerKind, modelId (which encodes agentModel/version for agent runs), skillInstall, and a conditional skill-tree hash for runs that depend on skill content.
- EvalResult gains required runnerKind plus optional skillInstall, agentLog (truncated), agentExitCode.
- loadSkillContext stays LLM-only; agents see the live filesystem tree.
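To make the cache-key composition concrete, here is a hedged sketch. The `instruction` and `starterConfig` fields, the `SkillInstallMode` values, and the use of SHA-256 over a JSON payload are assumptions. The PR only states that the key covers `runnerKind`, `modelId`, `skillInstall`, and a conditional skill-tree hash.

```typescript
import { createHash } from 'node:crypto'

type RunnerKind = 'llm' | 'claude-code'
type SkillInstallMode = 'none' | 'system-prompt' | 'workdir' // assumed values

interface CodegenKeyInput {
  instruction: string // assumed field
  modelId: string
  runnerKind: RunnerKind
  skillInstall: SkillInstallMode
  skillTreeHash?: string
  starterConfig: string // assumed field
}

function codegenKey(input: CodegenKeyInput): string {
  const parts: Record<string, string> = {
    instruction: input.instruction,
    modelId: input.modelId,
    runnerKind: input.runnerKind,
    skillInstall: input.skillInstall,
    starterConfig: input.starterConfig,
  }
  // Only runs that depend on skill content mix the skill hash into the key,
  // so baseline entries survive skill edits.
  if (input.skillInstall !== 'none') {
    if (!input.skillTreeHash) {
      throw new Error('skillTreeHash is required when a skill is installed')
    }
    parts.skillTreeHash = input.skillTreeHash
  }
  // Keys are inserted in a fixed order, so JSON.stringify output is deterministic
  return createHash('sha256').update(JSON.stringify(parts)).digest('hex')
}
```

With this shape, editing any skill file changes the key for skill lanes only, while switching runner or model changes it for every lane.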
Variant taxonomy + dashboard surfacing (test/evals/variant.ts, dashboard components)
- Shared getVariant(result) classifies cache entries into one of four lanes, with explicit fallback for unknown RunnerKind values.
- Variant widened to four values: agent-baseline, agent-skill, baseline, skill.
- CompareTable buckets agent rows into the existing skill/baseline columns (badge distinguishes lane in list view).
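The four-lane classification might be sketched as follows. Two details are assumptions rather than facts from the PR: a missing `skillInstall` is treated as "no skill", and unknown runner kinds fall back to `baseline`. The PR only says the fallback is explicit and that unknown kinds must not silently land in `skill`.

```typescript
type Variant = 'agent-baseline' | 'agent-skill' | 'baseline' | 'skill'

// Loosened on purpose: cache entries may predate the required runnerKind field
interface CachedResult {
  runnerKind?: string
  skillInstall?: string
}

function getVariant(result: CachedResult): Variant {
  // Assumption: an absent skillInstall means no skill was installed
  const hasSkill = result.skillInstall !== undefined && result.skillInstall !== 'none'
  // Legacy entries without runnerKind are treated as direct-LLM runs
  switch (result.runnerKind ?? 'llm') {
    case 'claude-code':
      return hasSkill ? 'agent-skill' : 'agent-baseline'
    case 'llm':
      return hasSkill ? 'skill' : 'baseline'
    default:
      // Explicit fallback (assumed lane): never bucket an unknown runner as `skill`
      return 'baseline'
  }
}
```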
Scripts + docs (package.json, test/evals/README.md)
- 14 new scripts mirroring the existing :baseline pattern: test:eval:agent, test:eval:agent:baseline, and per-suite variants.
- README documents the four variants, required env vars (OPENAI_API_KEY for scorer, ANTHROPIC_API_KEY for agent), and optional knobs (EVAL_AGENT_MODEL, EVAL_AGENT_CONCURRENCY, EVAL_KEEP_WORKDIR, EVAL_NO_CACHE).

Design Decisions

Sandbox via CLAUDE_CONFIG_DIR, not --bare. The --bare flag would force ANTHROPIC_API_KEY-only auth and skip keychain. CLAUDE_CONFIG_DIR redirection is more invasive (it breaks macOS keychain auth, forcing API-key use for agent lanes) but produces a cleaner sandbox: no user skills, no global CLAUDE.md, no plugin marketplace, no hooks. The trade-off is documented; agent runs require ANTHROPIC_API_KEY set in the shell.

Single-file readback, multi-file deferred. The MVP runner enforces "modify only payload.config.ts" via a prompt suffix and reads back only that file. Multi-file agent edits (e.g. extracting a Collection into its own file) would require validators and the scorer to operate on a tree rather than a string, which is a separate phase.

LLM scorer kept for agent runs. Agent invocations produce a real config diff that still needs grading. Implementing build-success scoring would require a Payload-specific oracle the project doesn't have, so the existing scoreConfigChange is reused. Consequence: both OPENAI_API_KEY and ANTHROPIC_API_KEY are needed for agent variants.

Per-process concurrency cap. Agent runs are heavy (~30–120s, external process). pLimit(EVAL_AGENT_CONCURRENCY ?? 2) at module scope prevents the suite from forking dozens of claude processes. Vitest's eval project has fileParallelism: false, so the module-level limiter is process-wide.
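To illustrate the module-scope cap, the sketch below uses a small hand-rolled limiter in place of the p-limit package so that it has no dependencies. `parseConcurrency` and `createLimiter` are hypothetical names, and treating invalid values as "use the default of 2" is an assumption.

```typescript
// Parse EVAL_AGENT_CONCURRENCY, falling back to 2 for unset or invalid values
function parseConcurrency(raw: string | undefined, fallback = 2): number {
  const n = Number(raw ?? '')
  return Number.isInteger(n) && n > 0 ? n : fallback
}

// Dependency-free stand-in for p-limit: at most `max` tasks run at once
function createLimiter(max: number): <T>(task: () => Promise<T>) => Promise<T> {
  let active = 0
  const queue: Array<() => void> = []

  const next = (): void => {
    if (active >= max) return
    const start = queue.shift()
    if (start) {
      active++
      start()
    }
  }

  return function limit<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const done = (): void => {
        active--
        next()
      }
      queue.push(() => {
        // Promise.resolve().then(task) also captures synchronous throws from `task`
        Promise.resolve()
          .then(task)
          .then(
            (value) => {
              done()
              resolve(value)
            },
            (error) => {
              done()
              reject(error)
            },
          )
      })
      next()
    })
  }
}

// Created once at module scope, so every agent run in the process shares the cap
const limitAgentRun = createLimiter(parseConcurrency(process.env.EVAL_AGENT_CONCURRENCY))
```

Each agent invocation would then be wrapped as `limitAgentRun(() => runAgent(...))`, where `runAgent` stands for whatever function spawns the CLI.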

Verbatim skill install over concatenation. The LLM runner injects a concatenated SKILL.md + reference/*.md blob via system prompt because the model has no tool access. The agent runner instead copies the skill directory tree verbatim into workdir/.claude/skills/payload/, letting the agent discover and read reference files through its own Read tool. The cache key uses a separate getSkillTreeHash (not the LLM concatenation) so both runners invalidate on any skill change.

runnerKind required on EvalResult. Optional discriminants made downstream code coerce via ?? 'llm' at every read. Making it required tightens the new-cache contract; read sites that consume legacy entries keep the default-coercion for backward compatibility.

agentModel/agentVersion not separate cache-key fields. modelId for agent runs is claude-code/<agentModel>/<version>, so version and model changes invalidate via modelId alone. Adding them separately would create silent divergence risk.

Overall Flow

sequenceDiagram
    participant Spec as eval.*.spec.ts
    participant Variant as variantOptions.ts
    participant Case as runCodegenCase
    participant Dispatch as runCodegenEval
    participant Agent as claudeCodeRunner
    participant Workdir as workdir.ts
    participant CLI as claude (CLI)

    Spec->>Variant: resolveVariantOptions()
    Variant-->>Spec: { kind, skillInstall, agentModel, ... }
    Spec->>Case: runCodegenCase(testCase, label, opts)
    Case->>Case: codegenKey({ runnerKind, modelId, skillInstall, ... })
    Case->>Dispatch: runCodegenEval(instruction, starter, opts)

    alt kind === 'llm'
        Dispatch->>Dispatch: llmRunner.run (unchanged)
    else kind === 'claude-code'
        Dispatch->>Agent: claudeCodeRunner.run
        Agent->>Agent: ensureInit (lazy, memoized)
        Note over Agent: First call: create sandbox CLAUDE_CONFIG_DIR,<br/>capture version, auth-probe (creds-copy fallback)
        Agent->>Workdir: materialize → gitInit → installSkill
        Agent->>CLI: spawn(claude --print --model <m> --dangerously-skip-permissions)
        Note over CLI: env: { CLAUDE_CONFIG_DIR=<sandbox> }<br/>cwd: <workdir>
        CLI-->>Agent: stdout/stderr + exit code
        Agent->>Workdir: readEntry(workdir)
        Agent->>Workdir: cleanup(workdir)
        Agent-->>Dispatch: { modifiedConfig, agentLog, agentExitCode }
    end

    Dispatch-->>Case: CodegenRunnerResult
    Case->>Case: validateConfigTypes (tsc) → evaluateAssertions → scoreConfigChange (OpenAI)
    Case-->>Spec: EvalResult (cached for next run)

To see the specific tasks where the Asana app for GitHub is being used, see below:
- https://app.asana.com/0/0/1214639058034049

- claudeCode: surface SIGKILL errors in agent log instead of silently swallowing
- variant: explicit fallback prevents future RunnerKind values from silently bucketing as skill
- CompareTable: drop truthy-modelId-always-replaces bug; first-write-wins per lane
- cache: drop redundant agentModel/agentVersion params; modelId already encodes them
- cache.spec: real skillHash invalidation tests via vi.spyOn mock
- globalSetup: import RunnerKind/SkillInstallMode instead of inline literal unions
- claudeCode: drop fossil path comment

github-actions · 2026-05-13T19:35:12Z

📦 esbuild Bundle Analysis for payload

This analysis was generated by esbuild-bundle-analyzer. 🤖

Meta File	Out File	Size (raw)	Note
packages/next/meta_index.json	esbuild/index.js	990.17 KB	✅ No change
packages/payload/meta_index.json	esbuild/index.js	1.41 MB	✅ No change
packages/payload/meta_shared.json	esbuild/exports/shared.js	192.60 KB	✅ No change
packages/richtext-lexical/meta_client.json	esbuild/exports/client_optimized/index.js	304.18 KB	✅ No change
packages/ui/meta_client.json	esbuild/exports/client_optimized/index.js	1.24 MB	✅ -54 B (-0.0%)
packages/ui/meta_shared.json	esbuild/exports/shared_optimized/index.js	16.11 KB	✅ No change

Largest paths

This visualization shows the top 20 largest paths in the bundle.

Meta file: packages/next/meta_index.json, Out file: esbuild/index.js

Path	Size
../../node_modules	${{\color{Goldenrod}{ ████████████████████▍ }}}$ 81.9%, 807.52 KB
dist/views/Version	${{\color{Goldenrod}{ █▎ }}}$ 5.2%, 51.49 KB
dist/views/Dashboard	${{\color{Goldenrod}{ ▌ }}}$ 2.2%, 21.38 KB
dist/views/Document	${{\color{Goldenrod}{ ▍ }}}$ 1.7%, 16.66 KB
dist/views/List	${{\color{Goldenrod}{ ▍ }}}$ 1.6%, 15.35 KB
dist/elements/Nav	${{\color{Goldenrod}{ ▎ }}}$ 1.1%, 10.86 KB
dist/views/Root	${{\color{Goldenrod}{ ▎ }}}$ 1.0%, 9.90 KB
dist/views/Versions	${{\color{Goldenrod}{ ▏ }}}$ 0.6%, 6.17 KB
dist/views/API	${{\color{Goldenrod}{ ▏ }}}$ 0.6%, 6.13 KB
dist/views/Account	${{\color{Goldenrod}{ ▏ }}}$ 0.6%, 6.06 KB
dist/elements/DocumentHeader	${{\color{Goldenrod}{ ▏ }}}$ 0.5%, 4.71 KB
dist/views/Login	${{\color{Goldenrod}{ }}}$ 0.4%, 4.40 KB
dist/layouts/Root	${{\color{Goldenrod}{ }}}$ 0.4%, 3.53 KB
dist/views/ForgotPassword	${{\color{Goldenrod}{ }}}$ 0.3%, 3.13 KB
dist/views/CreateFirstUser	${{\color{Goldenrod}{ }}}$ 0.3%, 2.81 KB
dist/templates/Default	${{\color{Goldenrod}{ }}}$ 0.3%, 2.64 KB
dist/views/ResetPassword	${{\color{Goldenrod}{ }}}$ 0.2%, 2.40 KB
dist/views/Logout	${{\color{Goldenrod}{ }}}$ 0.2%, 1.94 KB
dist/views/Verify	${{\color{Goldenrod}{ }}}$ 0.1%, 1.29 KB
dist/views/NotFound	${{\color{Goldenrod}{ }}}$ 0.1%, 1.21 KB
(other)	${{\color{Goldenrod}{ ████▌ }}}$ 18.1%, 177.97 KB

Meta file: packages/payload/meta_index.json, Out file: esbuild/index.js

Path	Size
../../node_modules	${{\color{Goldenrod}{ █████████████████ }}}$ 68.4%, 959.38 KB
dist/fields/hooks	${{\color{Goldenrod}{ ▊ }}}$ 3.1%, 44.07 KB
dist/collections/operations	${{\color{Goldenrod}{ ▋ }}}$ 2.9%, 40.23 KB
dist/versions/migrations	${{\color{Goldenrod}{ ▎ }}}$ 1.3%, 18.50 KB
dist/auth/operations	${{\color{Goldenrod}{ ▎ }}}$ 1.1%, 15.63 KB
dist/fields/config	${{\color{Goldenrod}{ ▎ }}}$ 1.0%, 13.85 KB
dist/globals/operations	${{\color{Goldenrod}{ ▎ }}}$ 1.0%, 13.40 KB
dist/utilities/configToJSONSchema.js	${{\color{Goldenrod}{ ▏ }}}$ 0.9%, 13.13 KB
dist/queues/operations	${{\color{Goldenrod}{ ▏ }}}$ 0.9%, 12.63 KB
dist/fields/validations.js	${{\color{Goldenrod}{ ▏ }}}$ 0.8%, 10.57 KB
dist/collections/config	${{\color{Goldenrod}{ ▏ }}}$ 0.7%, 9.53 KB
dist/bin/generateImportMap	${{\color{Goldenrod}{ ▏ }}}$ 0.7%, 9.44 KB
dist/config/orderable	${{\color{Goldenrod}{ ▏ }}}$ 0.6%, 7.92 KB
dist/uploads/fetchAPI-multipart	${{\color{Goldenrod}{ ▏ }}}$ 0.6%, 7.80 KB
dist/index.js	${{\color{Goldenrod}{ ▏ }}}$ 0.6%, 7.77 KB
dist/hierarchy/utils	${{\color{Goldenrod}{ ▏ }}}$ 0.5%, 7.65 KB
dist/database/migrations	${{\color{Goldenrod}{ ▏ }}}$ 0.5%, 7.54 KB
dist/collections/endpoints	${{\color{Goldenrod}{ }}}$ 0.4%, 6.23 KB
dist/auth/strategies	${{\color{Goldenrod}{ }}}$ 0.4%, 5.50 KB
dist/config/sanitize.js	${{\color{Goldenrod}{ }}}$ 0.4%, 5.39 KB
(other)	${{\color{Goldenrod}{ ███████▉ }}}$ 31.6%, 444.02 KB

Meta file: packages/payload/meta_shared.json, Out file: esbuild/exports/shared.js

Path	Size
../../node_modules	${{\color{Goldenrod}{ ███████████████████▉ }}}$ 79.5%, 150.12 KB
dist/fields/validations.js	${{\color{Goldenrod}{ █▍ }}}$ 5.6%, 10.57 KB
dist/config/orderable	${{\color{Goldenrod}{ ▍ }}}$ 1.7%, 3.13 KB
dist/fields/baseFields	${{\color{Goldenrod}{ ▍ }}}$ 1.5%, 2.79 KB
dist/utilities/deepCopyObject.js	${{\color{Goldenrod}{ ▎ }}}$ 1.3%, 2.54 KB
dist/auth/cookies.js	${{\color{Goldenrod}{ ▏ }}}$ 0.8%, 1.55 KB
dist/utilities/flattenTopLevelFields.js	${{\color{Goldenrod}{ ▏ }}}$ 0.7%, 1.42 KB
dist/fields/config	${{\color{Goldenrod}{ ▏ }}}$ 0.7%, 1.37 KB
dist/utilities/getVersionsConfig.js	${{\color{Goldenrod}{ ▏ }}}$ 0.6%, 1.04 KB
dist/utilities/flattenAllFields.js	${{\color{Goldenrod}{ ▏ }}}$ 0.5%, 943 B
dist/utilities/unflatten.js	${{\color{Goldenrod}{ }}}$ 0.4%, 779 B
dist/utilities/sanitizeUserDataForEmail.js	${{\color{Goldenrod}{ }}}$ 0.4%, 713 B
dist/utilities/getFieldPermissions.js	${{\color{Goldenrod}{ }}}$ 0.3%, 651 B
dist/collections/config	${{\color{Goldenrod}{ }}}$ 0.3%, 570 B
dist/bin/generateImportMap	${{\color{Goldenrod}{ }}}$ 0.3%, 561 B
dist/auth/sessions.js	${{\color{Goldenrod}{ }}}$ 0.3%, 525 B
dist/fields/getFieldPaths.js	${{\color{Goldenrod}{ }}}$ 0.3%, 485 B
dist/utilities/appendDateTimezoneSelectFields.js	${{\color{Goldenrod}{ }}}$ 0.2%, 451 B
dist/utilities/getSafeRedirect.js	${{\color{Goldenrod}{ }}}$ 0.2%, 423 B
dist/utilities/deepMerge.js	${{\color{Goldenrod}{ }}}$ 0.2%, 413 B
(other)	${{\color{Goldenrod}{ █████▏ }}}$ 20.5%, 38.70 KB

Meta file: packages/richtext-lexical/meta_client.json, Out file: esbuild/exports/client_optimized/index.js

Path	Size
dist/features/blocks	${{\color{Goldenrod}{ ███ }}}$ 12.4%, 37.38 KB
dist/lexical/ui	${{\color{Goldenrod}{ ██▊ }}}$ 11.3%, 34.16 KB
dist/lexical/plugins	${{\color{Goldenrod}{ ██▋ }}}$ 10.9%, 32.88 KB
dist/features/experimental_table	${{\color{Goldenrod}{ ██▎ }}}$ 9.0%, 27.16 KB
dist/packages/@lexical	${{\color{Goldenrod}{ █▌ }}}$ 6.3%, 18.99 KB
dist/features/link	${{\color{Goldenrod}{ █▌ }}}$ 6.2%, 18.81 KB
dist/features/toolbars	${{\color{Goldenrod}{ █▍ }}}$ 5.5%, 16.59 KB
dist/features/upload	${{\color{Goldenrod}{ █▏ }}}$ 4.7%, 14.11 KB
dist/features/textState	${{\color{Goldenrod}{ ▉ }}}$ 3.7%, 11.08 KB
dist/features/relationship	${{\color{Goldenrod}{ ▊ }}}$ 3.1%, 9.40 KB
dist/lexical/utils	${{\color{Goldenrod}{ ▋ }}}$ 2.9%, 8.79 KB
dist/features/converters	${{\color{Goldenrod}{ ▋ }}}$ 2.8%, 8.36 KB
dist/features/debug	${{\color{Goldenrod}{ ▋ }}}$ 2.5%, 7.40 KB
dist/utilities/fieldsDrawer	${{\color{Goldenrod}{ ▌ }}}$ 2.4%, 7.29 KB
dist/lexical/config	${{\color{Goldenrod}{ ▍ }}}$ 1.7%, 5.08 KB
dist/features/lists	${{\color{Goldenrod}{ ▍ }}}$ 1.7%, 5.00 KB
dist/features/format	${{\color{Goldenrod}{ ▎ }}}$ 1.2%, 3.46 KB
dist/lexical/LexicalEditor.js	${{\color{Goldenrod}{ ▎ }}}$ 1.1%, 3.23 KB
dist/features/horizontalRule	${{\color{Goldenrod}{ ▎ }}}$ 1.1%, 3.18 KB
dist/field/Field.js	${{\color{Goldenrod}{ ▏ }}}$ 0.9%, 2.84 KB
(other)	${{\color{Goldenrod}{ █████████████████████▉ }}}$ 87.6%, 263.57 KB

Meta file: packages/ui/meta_client.json, Out file: esbuild/exports/client_optimized/index.js

Path	Size
../../node_modules	${{\color{Goldenrod}{ ███████████▊ }}}$ 47.1%, 579.26 KB
dist/elements/Hierarchy	${{\color{Goldenrod}{ ▉ }}}$ 3.6%, 44.30 KB
dist/elements/BulkUpload	${{\color{Goldenrod}{ ▌ }}}$ 2.3%, 28.33 KB
dist/views/HierarchyList	${{\color{Goldenrod}{ ▍ }}}$ 1.5%, 18.79 KB
dist/elements/Table	${{\color{Goldenrod}{ ▍ }}}$ 1.5%, 18.25 KB
dist/views/Edit	${{\color{Goldenrod}{ ▎ }}}$ 1.4%, 17.38 KB
dist/elements/WhereBuilder	${{\color{Goldenrod}{ ▎ }}}$ 1.4%, 17.36 KB
dist/forms/Form	${{\color{Goldenrod}{ ▎ }}}$ 1.3%, 15.92 KB
dist/fields/Relationship	${{\color{Goldenrod}{ ▎ }}}$ 1.3%, 15.83 KB
dist/fields/Blocks	${{\color{Goldenrod}{ ▎ }}}$ 1.2%, 15.13 KB
dist/fields/Upload	${{\color{Goldenrod}{ ▎ }}}$ 1.2%, 14.44 KB
dist/elements/QueryPresets	${{\color{Goldenrod}{ ▏ }}}$ 0.8%, 10.36 KB
dist/elements/PublishButton	${{\color{Goldenrod}{ ▏ }}}$ 0.7%, 9.07 KB
dist/elements/HTMLDiff	${{\color{Goldenrod}{ ▏ }}}$ 0.7%, 8.38 KB
dist/views/List	${{\color{Goldenrod}{ ▏ }}}$ 0.7%, 8.03 KB
dist/fields/Array	${{\color{Goldenrod}{ ▏ }}}$ 0.6%, 7.77 KB
dist/elements/ReactSelect	${{\color{Goldenrod}{ ▏ }}}$ 0.6%, 7.74 KB
dist/elements/LivePreview	${{\color{Goldenrod}{ ▏ }}}$ 0.6%, 7.04 KB
dist/elements/Upload	${{\color{Goldenrod}{ ▏ }}}$ 0.5%, 6.67 KB
dist/elements/RelationshipTable	${{\color{Goldenrod}{ ▏ }}}$ 0.5%, 6.22 KB
(other)	${{\color{Goldenrod}{ █████████████▏ }}}$ 52.9%, 649.37 KB

Meta file: packages/ui/meta_shared.json, Out file: esbuild/exports/shared_optimized/index.js

Path	Size
dist/graphics/Logo	${{\color{Goldenrod}{ █████ }}}$ 20.2%, 3.12 KB
../../node_modules	${{\color{Goldenrod}{ ████▎ }}}$ 17.1%, 2.65 KB
dist/graphics/Icon	${{\color{Goldenrod}{ ██▍ }}}$ 9.8%, 1.52 KB
dist/utilities/formatDocTitle	${{\color{Goldenrod}{ ██▏ }}}$ 8.6%, 1.32 KB
dist/providers/TableColumns	${{\color{Goldenrod}{ █▍ }}}$ 5.6%, 866 B
dist/utilities/getGlobalData.js	${{\color{Goldenrod}{ █▏ }}}$ 4.9%, 762 B
dist/utilities/api.js	${{\color{Goldenrod}{ █▏ }}}$ 4.9%, 756 B
dist/utilities/groupNavItems.js	${{\color{Goldenrod}{ █▏ }}}$ 4.7%, 734 B
dist/elements/Translation	${{\color{Goldenrod}{ ▊ }}}$ 3.2%, 493 B
dist/utilities/handleTakeOver.js	${{\color{Goldenrod}{ ▋ }}}$ 2.8%, 440 B
dist/utilities/traverseForLocalizedFields.js	${{\color{Goldenrod}{ ▋ }}}$ 2.6%, 399 B
dist/elements/withMergedProps	${{\color{Goldenrod}{ ▌ }}}$ 2.2%, 339 B
dist/utilities/getNavGroups.js	${{\color{Goldenrod}{ ▌ }}}$ 2.2%, 338 B
dist/utilities/getVisibleEntities.js	${{\color{Goldenrod}{ ▌ }}}$ 2.1%, 329 B
dist/elements/WithServerSideProps	${{\color{Goldenrod}{ ▍ }}}$ 1.5%, 232 B
dist/utilities/handleGoBack.js	${{\color{Goldenrod}{ ▎ }}}$ 1.2%, 180 B
dist/fields/mergeFieldStyles.js	${{\color{Goldenrod}{ ▎ }}}$ 1.0%, 159 B
dist/utilities/handleBackToDashboard.js	${{\color{Goldenrod}{ ▎ }}}$ 1.0%, 152 B
dist/forms/Form	${{\color{Goldenrod}{ ▏ }}}$ 0.9%, 147 B
dist/utilities/abortAndIgnore.js	${{\color{Goldenrod}{ ▏ }}}$ 0.9%, 146 B
(other)	${{\color{Goldenrod}{ ███████████████████▉ }}}$ 79.8%, 12.36 KB

Details

Next to the size is how much the size has increased or decreased compared with the base branch of this PR.

‼️: Size increased by 20% or more. Special attention should be given to this.
⚠️: Size increased in acceptable range (lower than 20%).
✅: No change or even downsized.
🗑️: The out file is deleted: not found in base branch.
🆕: The out file is newly found: will be added to base branch.

denolfe added 17 commits May 6, 2026 15:13

test(evals): refactor runCodegenEval into a dispatcher

a9c7fec

test(evals): add workdir helpers for agent runner

6cf7fd1

test(evals): extend types for agent runner

78dc101

test(evals): extend codegen cache key for agent runs

a85b16b

feat(evals): add claude-code agent runner

9de9b56

feat(evals): wire claude-code variant through runCodegenCase

262babf

chore: add test:eval:agent scripts

48fab14

feat(evals): show runner kind + skill install in dashboard

4eb956c

docs(evals): document agent variants and env knobs

4dab218

chore(evals): remove dead cache backfill block

353d4bd

fix(evals): teardown snapshot supports agent variants

52a4e18

fix(evals): surface claude stderr/stdout when auth probe fails

29567a4

fix(evals): handle spawn errors, retry init, resolve workdir paths vi…

0de5c18

…a import.meta

docs(evals): clarify required env vars, runner options JSDoc, require…

b6f27e4

… runnerKind on EvalResult

test(evals): extract getVariant + add cache key tests

f5b8a45

chore: cleanup

47fb152

github-actions Bot added the created-by: Payload team label May 13, 2026

denolfe changed the title ~~feat(evals): add claude-code agent runner~~ test(evals): add claude-code agent runner May 13, 2026

denolfe marked this pull request as ready for review May 14, 2026 17:44

denolfe merged commit 834e80c into main May 14, 2026
168 of 171 checks passed

denolfe deleted the ai/evals-agent-runners branch May 14, 2026 18:38
