
feat(tasks): PR complexity reviewer as an assess_brief specialization (rubric + reference executor) #951

@legreffier

Description


Spun out of #852.

Insight

A "PR complexity reviewer" doesn't need a new task type. assess_brief already exists in BUILT_IN_TASK_TYPES with output_kind: 'judgment' and a rubric-driven scoring shape. The PR-review use case is just assess_brief instantiated with:

  • input.brief — the PR diff / description / acceptance criteria
  • input.references[] — pointers to the PR (URL, commit SHA, optional snapshot CID)
  • input.criteriaCid — a published "PR-complexity rubric" CID
  • An executor that wraps pi-review (or gh pr diff + LLM) against that rubric

This validates the runtime's genericity: same task type, new rubric, new executor — no schema changes. It's also the proof point #852 wanted before declaring the runtime ready for second-domain use.
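Concretely, a PR-review task could be created as an ordinary assess_brief instance. This is a minimal sketch: the field names follow the bullets above, while the CID, URL, and SHA values are placeholders, not real identifiers.

```typescript
// Hypothetical sketch of an assess_brief task instantiated for PR review.
// Field names follow the issue text; the CID, URL, and SHA are placeholders.
const task = {
  taskType: "assess_brief",
  input: {
    brief: "PR: migrate auth middleware to the new session store",
    references: [
      { kind: "github_pr", url: "https://github.com/org/repo/pull/1234" },
      { kind: "commit", sha: "0000000000000000000000000000000000000000" },
    ],
    criteriaCid: "bafy-pr-complexity-rubric-placeholder", // pinned rubric CID
  },
};
```

No schema change is involved: the runtime sees a normal assess_brief task, and only the rubric CID and the executor that claims it are PR-specific.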

Deliverables

  1. Rubric authoring (packs or a fresh rubrics directory):

    • Criteria: cognitive_load, blast_radius, test_coverage_delta, security_surface_touched, style_compliance, optionally migration_safety.
    • Each criterion: { id, weight, prompt, rangeMin, rangeMax } matching the existing rubric schema used by pack-fidelity-v2.
    • Pin the rubric to a CID so scoring is reproducible and content-addressed.
  2. Reference executor in libs/pi-extension or a new libs/pr-review-extension:

    • Polls /tasks?taskType=assess_brief&criteriaCid=<pr-complexity-rubric-cid>.
    • Resolves the PR via references[] (GitHub API + agent token via the legreffier rule).
    • Runs pi-review (or equivalent) inside Gondolin against the diff.
    • Returns output: { scores: [...], composite, rationale } matching the existing judgment shape.
  3. e2e test: against a known GitHub PR (or a fixture diff), assert the executor produces a judgment that matches expected scores within tolerance. Lives in apps/rest-api/e2e/ or a new apps/agent-daemon/e2e/.

  4. Docs: short section in docs/agent-runtime.md ("Second concrete task: PR review") explaining the assess_brief + custom-rubric + custom-executor pattern. Use it as the canonical "how to add a new domain" example.
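As a sketch of deliverable 1, rubric entries matching the { id, weight, prompt, rangeMin, rangeMax } shape might look like the following. The weights and prompt wording here are illustrative placeholders, not the final rubric:

```typescript
// Illustrative PR-complexity rubric entries in the { id, weight, prompt,
// rangeMin, rangeMax } shape described above. Weights/prompts are placeholders.
const rubric = [
  { id: "cognitive_load", weight: 0.3, prompt: "How hard is this diff to hold in your head?", rangeMin: 1, rangeMax: 5 },
  { id: "blast_radius", weight: 0.25, prompt: "How many modules or consumers does this change touch?", rangeMin: 1, rangeMax: 5 },
  { id: "test_coverage_delta", weight: 0.2, prompt: "Does test coverage keep pace with the change?", rangeMin: 1, rangeMax: 5 },
  { id: "security_surface_touched", weight: 0.15, prompt: "Does the diff touch auth, input parsing, or secrets?", rangeMin: 1, rangeMax: 5 },
  { id: "style_compliance", weight: 0.1, prompt: "Does the diff follow repo conventions?", rangeMin: 1, rangeMax: 5 },
];

// A sanity check worth enforcing at authoring time: weights sum to 1.
const totalWeight = rubric.reduce((sum, c) => sum + c.weight, 0);
```

Pinning this file to a CID then makes a judgment reproducible: the same rubric CID plus the same diff should yield scores within the e2e tolerance.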

Why not a new task type

Considered and rejected: a review_pr task type. It would duplicate assess_brief's entire schema (output_kind: judgment, rubric reference, scoring shape) just to attach a domain label. That fragments the runtime — every new domain (security review, design review, accessibility review, …) would want its own type. The right axis of variation is rubric + executor, not task type.

Acceptance criteria

  • PR-complexity rubric authored, CID-pinned, committed alongside existing rubrics.
  • Reference executor wraps a real PR-review path (pi-review or equivalent) against assess_brief.
  • e2e test validates one round trip end-to-end (claim → execute → complete with scores).
  • Doc explains the pattern as a template for future domain-specific assessments.
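The scoring step of the claim → execute → complete round trip could be sketched as below. The output shape { scores, composite, rationale } follows the issue text; the weighted-average composite, the normalization step, and all type/helper names are assumptions for illustration, not settled API:

```typescript
// Hypothetical sketch of the executor's scoring step. Only the output shape
// { scores, composite, rationale } comes from the issue; the rest is assumed.
type Criterion = { id: string; weight: number; rangeMin: number; rangeMax: number };
type Score = { criterionId: string; value: number; rationale: string };

function buildJudgment(rubric: Criterion[], scores: Score[]) {
  const byId = new Map(rubric.map((c) => [c.id, c]));
  let weighted = 0;
  let totalWeight = 0;
  for (const s of scores) {
    const c = byId.get(s.criterionId);
    if (!c) throw new Error(`score for unknown criterion: ${s.criterionId}`);
    // Normalize each score into [0, 1] before weighting so criteria with
    // different ranges stay comparable.
    const norm = (s.value - c.rangeMin) / (c.rangeMax - c.rangeMin);
    weighted += c.weight * norm;
    totalWeight += c.weight;
  }
  return {
    scores,
    composite: weighted / totalWeight,
    rationale: scores.map((s) => `${s.criterionId}: ${s.rationale}`).join("\n"),
  };
}
```

The executor would attach this object as the task's output when completing it, and the e2e test would then assert composite against an expected value within tolerance.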



Labels: enhancement (New feature or request)
