Spun out of #852.
Insight
A "PR complexity reviewer" doesn't need a new task type. `assess_brief` already exists in `BUILT_IN_TASK_TYPES` with `output_kind: 'judgment'` and a rubric-driven scoring shape. The PR-review use case is just `assess_brief` instantiated with:

- `input.brief` — the PR diff / description / acceptance criteria
- `input.references[]` — pointers to the PR (URL, commit SHA, optional snapshot CID)
- `input.criteriaCid` — a published "PR-complexity rubric" CID
- an executor that wraps `pi-review` (or `gh pr diff` + LLM) against that rubric

This validates the runtime's genericity: same task type, new rubric, new executor — no schema changes. It's also the proof point #852 wanted before declaring the runtime ready for second-domain use.
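Concretely, the instantiation is pure data. A minimal sketch of what such a task payload could look like (only `taskType`, `brief`, `references`, and `criteriaCid` come from this issue; the reference entry shapes and CID values are illustrative placeholders, not the actual schema):

```typescript
// Hypothetical assess_brief task instantiated for PR review.
// The surrounding object shape and the reference "kind" values are
// assumptions for illustration, not the runtime's real schema.
const prReviewTask = {
  taskType: "assess_brief",
  input: {
    brief: "PR: refactor auth middleware. Diff, description, and acceptance criteria attached.",
    references: [
      { kind: "pr_url", value: "https://github.com/example/repo/pull/1" },
      { kind: "commit_sha", value: "abc1234" },
      // optional content-addressed snapshot of the diff
      { kind: "snapshot_cid", value: "bafy-placeholder-snapshot" },
    ],
    criteriaCid: "bafy-placeholder-pr-complexity-rubric",
  },
};
```

No new type registration is involved: everything domain-specific lives in the brief, the references, and the rubric CID.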
Deliverables
- Rubric authoring (`packs` or a fresh `rubrics` directory):
  - Criteria: `cognitive_load`, `blast_radius`, `test_coverage_delta`, `security_surface_touched`, `style_compliance`, and optionally `migration_safety`.
  - Each criterion: `{ id, weight, prompt, rangeMin, rangeMax }`, matching the existing rubric schema used by `pack-fidelity-v2`.
  - Pin the rubric to a CID so reproducibility is content-addressed.
- Reference executor in `libs/pi-extension` or a new `libs/pr-review-extension`:
  - Polls `/tasks?taskType=assess_brief&criteriaCid=<pr-complexity-rubric-cid>`.
  - Resolves the PR via `references[]` (GitHub API + agent token via the legreffier rule).
  - Runs `pi-review` (or equivalent) inside Gondolin against the diff.
  - Returns `output: { scores: [...], composite, rationale }`, matching the existing judgment shape.
- e2e test: against a known GitHub PR (or a fixture diff), assert that the executor produces a judgment matching expected scores within tolerance. Lives in `apps/rest-api/e2e/` or a new `apps/agent-daemon/e2e/`.
- Docs: short section in `docs/agent-runtime.md` ("Second concrete task: PR review") explaining the `assess_brief` + custom-rubric + custom-executor pattern. Use it as the canonical "how to add a new domain" example.
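To make the rubric deliverable concrete, here is a rough sketch in the `{ id, weight, prompt, rangeMin, rangeMax }` criterion shape described above. The criterion ids come from this issue; the weights, prompts, ranges, and the composite helper are placeholders, not a vetted rubric or the runtime's actual scoring code:

```typescript
// Hypothetical PR-complexity rubric. Only the criterion ids and the
// field names come from the issue; all values are placeholders.
interface Criterion {
  id: string;
  weight: number;
  prompt: string;
  rangeMin: number;
  rangeMax: number;
}

const prComplexityRubric: Criterion[] = [
  { id: "cognitive_load", weight: 0.3, prompt: "How hard is this diff to hold in your head?", rangeMin: 0, rangeMax: 10 },
  { id: "blast_radius", weight: 0.25, prompt: "How many modules or consumers does this change touch?", rangeMin: 0, rangeMax: 10 },
  { id: "test_coverage_delta", weight: 0.2, prompt: "Does test coverage keep pace with the change?", rangeMin: 0, rangeMax: 10 },
  { id: "security_surface_touched", weight: 0.15, prompt: "Does the diff touch auth, input parsing, or secrets?", rangeMin: 0, rangeMax: 10 },
  { id: "style_compliance", weight: 0.1, prompt: "Does the diff follow repo conventions?", rangeMin: 0, rangeMax: 10 },
];

// Illustrative weighted composite over per-criterion scores; the real
// scoring shape is whatever the judgment output_kind already defines.
function composite(scores: Record<string, number>): number {
  return prComplexityRubric.reduce((sum, c) => sum + c.weight * (scores[c.id] ?? 0), 0);
}
```

Publishing this structure and pinning the resulting CID is what makes a given review run reproducible: the same rubric CID always resolves to the same criteria and weights.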
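The executor deliverable reduces to a small poll-resolve-review-submit loop. A skeleton under stated assumptions: the `/tasks` query string and the judgment shape come from this issue, while the `Deps` interface, the `Task` fields, and all function names are stand-ins for whatever the runtime client actually exposes:

```typescript
// Hypothetical poll loop for the PR-review executor. Everything in Deps
// is an assumed stand-in: listTasks for the runtime API, resolveDiff for
// the GitHub API + agent token step, review for pi-review in Gondolin.
type Judgment = {
  scores: { criterionId: string; score: number }[];
  composite: number;
  rationale: string;
};

interface Task {
  id: string;
  input: { references: unknown[] };
}

interface Deps {
  listTasks(url: string): Promise<Task[]>;
  resolveDiff(refs: unknown[]): Promise<string>;
  review(diff: string): Promise<Judgment>;
  submit(taskId: string, out: Judgment): Promise<void>;
}

async function pollOnce(baseUrl: string, rubricCid: string, deps: Deps): Promise<number> {
  // Filter to assess_brief tasks pinned to the PR-complexity rubric CID.
  const url = `${baseUrl}/tasks?taskType=assess_brief&criteriaCid=${rubricCid}`;
  const tasks = await deps.listTasks(url);
  for (const task of tasks) {
    const diff = await deps.resolveDiff(task.input.references);
    const judgment = await deps.review(diff);
    await deps.submit(task.id, judgment);
  }
  return tasks.length;
}
```

Injecting the dependencies keeps the loop trivially testable with a fixture diff, which is exactly what the e2e deliverable needs.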
Why not a new task type
Considered and rejected: a `review_pr` task type. It would duplicate `assess_brief`'s entire schema (`output_kind: 'judgment'`, rubric reference, scoring shape) just to attach a domain label. That fragments the runtime — every new domain (security review, design review, accessibility review, …) would want its own type. The right axis of variation is rubric + executor, not task type.
Acceptance criteria
Related
Spun out of #852.