Trims test/evals/datasets/ so the eval suites measure knowledge a developer applies while building an application with Payload, not knowledge a Payload-monorepo contributor needs. Also adds shorthand npm scripts for running individual eval suites.
Key Changes
Trimmed conventions/qa.ts from 10 cases to 1
Dropped 9 cases lifted from CLAUDE.md (types vs interfaces, boolean naming, function vs class, translation paths, afterEach cleanup, conventional-commits, dev-server flags, auto-login creds, single-object-parameter convention).
Kept the payload.logger.error shape case, the only one that describes a call shape a Payload consumer writes in their own code.
Removed plugins/official/qa.ts
11 reference-doc QA cases ("what does plugin X do") testing recall, not application. The borderline MCP-config case is already covered by plugins/official/codegen.ts via real code generation.
eval.official-plugins.spec.ts updated to drop the QA registration; codegen registration unchanged.
Corrected the audience map in EvalDashboard/audience.ts
negative retagged from maintainers to users. Six of seven retained negative cases are dev-facing (debugging your own broken config); the map can't split sub-arrays, so users is the better representative tag.
Removed three category keys (commits, structure, testing) that no longer appear in any dataset after the conventions trim.
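As a rough illustration of why the map can only carry one tag per category, and of how stale keys can be spotted, here is a minimal sketch. The map shape, the `findDeadKeys` helper, and the sample category list are assumptions made for illustration; they are not taken from EvalDashboard/audience.ts.

```typescript
type Audience = 'maintainers' | 'users'

// HYPOTHETICAL shape: one audience tag per category key, with no per-case override.
// This is why a category with mixed cases has to pick a single representative tag.
const audienceByCategory: Record<string, Audience> = {
  negative: 'users', // retagged from 'maintainers' in this PR
  commits: 'maintainers', // illustrative stale key, removed in this PR
}

// Returns the map keys that no dataset case uses any more.
function findDeadKeys(
  map: Record<string, Audience>,
  usedCategories: Iterable<string>,
): string[] {
  const used = new Set(usedCategories)
  return Object.keys(map).filter((key) => !used.has(key))
}

// HYPOTHETICAL list of categories still present in the datasets.
const categoriesInDatasets = ['negative', 'coding']

const deadKeys = findDeadKeys(audienceByCategory, categoriesInDatasets)
console.log(deadKeys)
```

Every case in the negative category resolves to the same tag through `audienceByCategory.negative`, so the one maintainer-facing case is reported as users along with the six dev-facing ones.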
Added test:eval:<suite> shorthand scripts
One per suite (building-plugins, collections, config, conventions, fields, graphql, local-api, negative, official-plugins, rest-api). Each delegates to the :skill variant, matching the project-wide default.
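The delegation pattern can be sketched as a small generator that maps each suite name to its shorthand script. The suite names come from the list above; the `test:eval:<suite>:skill` target name and the use of `pnpm` as the runner are assumptions, and the real package.json entries may be written differently.

```typescript
// Suite names as listed in the PR description.
const suites = [
  'building-plugins',
  'collections',
  'config',
  'conventions',
  'fields',
  'graphql',
  'local-api',
  'negative',
  'official-plugins',
  'rest-api',
] as const

type Suite = (typeof suites)[number]

// ASSUMPTION: the long-form script is named `test:eval:<suite>:skill`
// and is invoked through pnpm. Both are illustrative, not confirmed.
function buildShorthandScripts(names: readonly Suite[]): Record<string, string> {
  const scripts: Record<string, string> = {}
  for (const name of names) {
    scripts[`test:eval:${name}`] = `pnpm test:eval:${name}:skill`
  }
  return scripts
}

const shorthandScripts = buildShorthandScripts(suites)
console.log(JSON.stringify(shorthandScripts, null, 2))
```

Under these assumptions, running the `test:eval:graphql` script would behave the same as running `test:eval:graphql:skill` directly.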
Design Decisions
The dividing line is "would a developer consuming payload from npm encounter this?" If not, the case is contributor-only and was removed.
Three pre-existing categories were intentionally kept in scope but untouched:
negative/codegen.ts: negativeInvalidInstructionDataset is an eval-pipeline self-test (it verifies tsc rejects bad types) and is preserved as-is.
plugins/qa.ts and plugins/codegen.ts stay because developers may colocate plugins inside their own project structure.
Other dead audience-map keys ('access-control', admin, 'building-plugins', conventions, hooks, 'official-plugins', translations) were dead before this audit and were left to keep the diff focused.
conventions/qa.ts and eval.conventions.spec.ts are kept rather than deleted so the surviving coding-category case still runs as a registered suite.