Spun out of #852.
Insight
A "PR complexity reviewer" doesn't need a new task type. `assess_brief` already exists in `BUILT_IN_TASK_TYPES` with `output_kind: 'judgment'` and a rubric-driven scoring shape. The PR-review use case is just `assess_brief` instantiated with:

- `input.brief` — the PR diff / description / acceptance criteria
- `input.references[]` — pointers to the PR (URL, commit SHA, optional snapshot CID)
- `input.criteriaCid` — a published "PR-complexity rubric" CID
- an executor that wraps `pi-review` (or `gh pr diff` + LLM) against that rubric

This validates the runtime's genericity: same task type, new rubric, new executor — no schema changes. It's also the proof point #852 wanted before declaring the runtime ready for second-domain use.
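Concretely, the instantiation is pure data. A minimal sketch of what such a task payload could look like (only `taskType`, `brief`, `references`, and `criteriaCid` come from this issue; the reference entry shapes and CID values are illustrative placeholders, not the actual schema):

```typescript
// Hypothetical assess_brief task instantiated for PR review.
// The surrounding object shape and the reference "kind" values are
// assumptions for illustration, not the runtime's real schema.
const prReviewTask = {
  taskType: "assess_brief",
  input: {
    brief: "PR: refactor auth middleware. Diff, description, and acceptance criteria attached.",
    references: [
      { kind: "pr_url", value: "https://github.com/example/repo/pull/1" },
      { kind: "commit_sha", value: "abc1234" },
      // optional content-addressed snapshot of the diff
      { kind: "snapshot_cid", value: "bafy-placeholder-snapshot" },
    ],
    criteriaCid: "bafy-placeholder-pr-complexity-rubric",
  },
};
```

No new type registration is involved: everything domain-specific lives in the brief, the references, and the rubric CID.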
Deliverables
- Rubric authoring (`packs` or a fresh `rubrics` directory):
  - Criteria: `cognitive_load`, `blast_radius`, `test_coverage_delta`, `security_surface_touched`, `style_compliance`, and optionally `migration_safety`.
  - Each criterion: `{ id, weight, prompt, rangeMin, rangeMax }`, matching the existing rubric schema used by `pack-fidelity-v2`.
  - Pin the rubric to a CID so reproducibility is content-addressed.
- Reference executor in `libs/pi-extension` or a new `libs/pr-review-extension`:
  - Polls `/tasks?taskType=assess_brief&criteriaCid=<pr-complexity-rubric-cid>`.
  - Resolves the PR via `references[]` (GitHub API + agent token via the legreffier rule).
  - Runs `pi-review` (or equivalent) inside Gondolin against the diff.
  - Returns `output: { scores: [...], composite, rationale }`, matching the existing judgment shape.
- e2e test: against a known GitHub PR (or a fixture diff), assert that the executor produces a judgment matching expected scores within tolerance. Lives in `apps/rest-api/e2e/` or a new `apps/agent-daemon/e2e/`.
- Docs: short section in `docs/agent-runtime.md` ("Second concrete task: PR review") explaining the `assess_brief` + custom-rubric + custom-executor pattern. Use it as the canonical "how to add a new domain" example.
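To make the rubric deliverable concrete, here is a rough sketch in the `{ id, weight, prompt, rangeMin, rangeMax }` criterion shape described above. The criterion ids come from this issue; the weights, prompts, ranges, and the composite helper are placeholders, not a vetted rubric or the runtime's actual scoring code:

```typescript
// Hypothetical PR-complexity rubric. Only the criterion ids and the
// field names come from the issue; all values are placeholders.
interface Criterion {
  id: string;
  weight: number;
  prompt: string;
  rangeMin: number;
  rangeMax: number;
}

const prComplexityRubric: Criterion[] = [
  { id: "cognitive_load", weight: 0.3, prompt: "How hard is this diff to hold in your head?", rangeMin: 0, rangeMax: 10 },
  { id: "blast_radius", weight: 0.25, prompt: "How many modules or consumers does this change touch?", rangeMin: 0, rangeMax: 10 },
  { id: "test_coverage_delta", weight: 0.2, prompt: "Does test coverage keep pace with the change?", rangeMin: 0, rangeMax: 10 },
  { id: "security_surface_touched", weight: 0.15, prompt: "Does the diff touch auth, input parsing, or secrets?", rangeMin: 0, rangeMax: 10 },
  { id: "style_compliance", weight: 0.1, prompt: "Does the diff follow repo conventions?", rangeMin: 0, rangeMax: 10 },
];

// Illustrative weighted composite over per-criterion scores; the real
// scoring shape is whatever the judgment output_kind already defines.
function composite(scores: Record<string, number>): number {
  return prComplexityRubric.reduce((sum, c) => sum + c.weight * (scores[c.id] ?? 0), 0);
}
```

Publishing this structure and pinning the resulting CID is what makes a given review run reproducible: the same rubric CID always resolves to the same criteria and weights.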
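The executor deliverable reduces to a small poll-resolve-review-submit loop. A skeleton under stated assumptions: the `/tasks` query string and the judgment shape come from this issue, while the `Deps` interface, the `Task` fields, and all function names are stand-ins for whatever the runtime client actually exposes:

```typescript
// Hypothetical poll loop for the PR-review executor. Everything in Deps
// is an assumed stand-in: listTasks for the runtime API, resolveDiff for
// the GitHub API + agent token step, review for pi-review in Gondolin.
type Judgment = {
  scores: { criterionId: string; score: number }[];
  composite: number;
  rationale: string;
};

interface Task {
  id: string;
  input: { references: unknown[] };
}

interface Deps {
  listTasks(url: string): Promise<Task[]>;
  resolveDiff(refs: unknown[]): Promise<string>;
  review(diff: string): Promise<Judgment>;
  submit(taskId: string, out: Judgment): Promise<void>;
}

async function pollOnce(baseUrl: string, rubricCid: string, deps: Deps): Promise<number> {
  // Filter to assess_brief tasks pinned to the PR-complexity rubric CID.
  const url = `${baseUrl}/tasks?taskType=assess_brief&criteriaCid=${rubricCid}`;
  const tasks = await deps.listTasks(url);
  for (const task of tasks) {
    const diff = await deps.resolveDiff(task.input.references);
    const judgment = await deps.review(diff);
    await deps.submit(task.id, judgment);
  }
  return tasks.length;
}
```

Injecting the dependencies keeps the loop trivially testable with a fixture diff, which is exactly what the e2e deliverable needs.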
Why not a new task type
Considered and rejected: a `review_pr` task type. It would duplicate `assess_brief`'s entire schema (`output_kind: 'judgment'`, rubric reference, scoring shape) just to attach a domain label. That fragments the runtime — every new domain (security review, design review, accessibility review, …) would want its own type. The right axis of variation is rubric + executor, not task type.
Acceptance criteria
Related
Spun out of #852.