
chore(pipelines): raise judgment-step model from cheapest to balanced#1609

Merged
nextlevelshit merged 1 commit into main from chore/pipeline-model-floor
Apr 30, 2026

@nextlevelshit
Collaborator

Summary

Real-world signal from the Epic #1565 Phase 1 dispatch: the cheapest tier (Haiku) produced shallow outputs on judgment-shaped work. The most visible cases:

- the test deletion in #1582
- the forbidigo-panic patterns in #1585

Both are rooted in the cheapest-tier model lacking the nuance to refactor, scope, or reason. Pipeline YAMLs had `model: cheapest` baked into many judgment steps as a cost optimization, and that optimization no longer holds once output quality breaks downstream steps.

Change

Promoted to balanced on judgment steps; kept cheapest on summary/distill/format steps.
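The PR makes this change by editing each pipeline YAML directly. The rule it applies can be read as a per-step floor: a judgment-shaped step is raised to at least `balanced`, while a summary-shaped step keeps its configured tier. A minimal Go sketch of that rule; the `Tier` and `StepKind` types and `effectiveTier` are hypothetical, and only the two tiers named in this PR are modeled.

```go
package main

import "fmt"

// Tier is a hypothetical ordered model tier. Only the two tiers named in
// this PR are modeled; a higher value means a stronger model.
type Tier int

const (
	Cheapest Tier = iota
	Balanced
)

func (t Tier) String() string {
	switch t {
	case Cheapest:
		return "cheapest"
	case Balanced:
		return "balanced"
	default:
		return fmt.Sprintf("Tier(%d)", int(t))
	}
}

// StepKind is a hypothetical classification of what a step does.
type StepKind int

const (
	Judgment StepKind = iota // scan, assess, review, research, implement
	Summary                  // summarize, distill, format, fetch
)

// effectiveTier raises judgment steps to at least Balanced and leaves
// summary-shaped steps on whatever tier they were configured with.
func effectiveTier(configured Tier, kind StepKind) Tier {
	if kind == Judgment && configured < Balanced {
		return Balanced
	}
	return configured
}

func main() {
	fmt.Println(effectiveTier(Cheapest, Judgment)) // balanced
	fmt.Println(effectiveTier(Cheapest, Summary))  // cheapest
}
```

Modeling the tiers as an ordered type makes "at least balanced" a plain comparison, so a stronger tier configured on a judgment step is never lowered.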

| Pipeline | Step (was cheapest) | New model | Why |
|---|---|---|---|
| impl-issue | fetch-assess | balanced | issue assessment is judgment |
| impl-issue | agent_review | balanced | review is judgment |
| impl-issue | create-pr (commenter) | cheapest (kept) | PR body formatting = summary |
| impl-issue-core | fetch-assess | balanced | same as impl-issue |
| audit-tests | scan | balanced | finding extraction = judgment |
| audit-tests | report (summarizer) | cheapest (kept) | distill scan into report |
| audit-tests | agent_review | balanced | review is judgment |
| audit-architecture | scan + agent_review | balanced | same |
| audit-security | agent_review | balanced | review is judgment |
| audit-security | report (summarizer) | cheapest (kept) | distill |
| ops-bootstrap | commit (craftsman) | balanced | scaffolding code = judgment |
| ops-pr-review | diff-analysis | balanced | analysis = judgment |
| ops-pr-review | security llm_judge | balanced | judgment |
| ops-pr-review | quality agent_review | balanced | judgment |
| plan-research | analyze-topics + research-topics | balanced | research = judgment |
| plan-research | fetch-issue + post-comment | cheapest (kept) | data fetch + format |
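Because the promotions live in YAML, a regression (a judgment step quietly reverted to `cheapest`) would only surface as bad output on a later run. A minimal Go sketch of a guard that could catch it earlier, assuming each step has already been loaded into a hypothetical `stepModel` record with a hand-assigned `Judgment` flag. The rows are a subset of the table above; `stepModel` and `floorViolations` are illustrative names, not taken from the pipeline code.

```go
package main

import "fmt"

// stepModel mirrors one row of the table; the struct is hypothetical and
// not the pipeline loader's real type.
type stepModel struct {
	Pipeline string
	Step     string
	Model    string // value of the step's `model:` key
	Judgment bool   // true when the step scans, assesses, reviews, or researches
}

// floorViolations returns "pipeline/step" for every judgment step that is
// still configured on the cheapest tier.
func floorViolations(steps []stepModel) []string {
	var out []string
	for _, s := range steps {
		if s.Judgment && s.Model == "cheapest" {
			out = append(out, s.Pipeline+"/"+s.Step)
		}
	}
	return out
}

func main() {
	steps := []stepModel{
		{"impl-issue", "fetch-assess", "balanced", true},
		{"impl-issue", "create-pr", "cheapest", false},
		{"audit-tests", "scan", "balanced", true},
		{"audit-tests", "report", "cheapest", false},
		{"plan-research", "fetch-issue", "cheapest", false},
	}
	fmt.Println(len(floorViolations(steps))) // 0

	// Hypothetical regression: a judgment step reverted to cheapest.
	steps = append(steps, stepModel{"ops-pr-review", "diff-analysis", "cheapest", true})
	fmt.Println(floorViolations(steps)) // [ops-pr-review/diff-analysis]
}
```

Summary-shaped steps on `cheapest` pass the check, so the guard encodes the same split as the table rather than banning the cheap tier outright.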

Other change

Added golangci-lint to the flake.nix devShell so the same forbidigo gate that broke CI on PR #1585 runs locally without a manual install. CI uses v2.10 while nixpkgs-unstable ships v2.8.0; the minor version skew is acceptable.

Test plan

  • `go build ./...` clean
  • `go test ./internal/defaults/...` passes (embedded-fs parity tests)
  • `nix develop -c which golangci-lint` resolves
  • Next dispatch (Phase 3 issues) uses these defaults; will validate via a real run

Related

The cheapest (Haiku) tier produced shallow, lazy outputs on judgment-shaped
work, most visibly the test deletion in #1582 and the forbidigo-panic
patterns in #1585. Real-world signal from the Epic #1565 Phase 1 dispatch:
Haiku is fine for summary/distill/format steps but not for any persona
that scans, judges, plans, implements, or reviews.

Promoted to balanced on these steps:
- impl-issue: fetch-assess + agent_review (create-pr commenter stays cheap)
- impl-issue-core: fetch-assess
- audit-tests: scan + agent_review (summarizer report stays cheap)
- audit-architecture: scan + agent_review (summarizer report stays cheap)
- audit-security: agent_review (summarizer report stays cheap)
- ops-bootstrap: commit (craftsman writes scaffolding)
- ops-pr-review: diff-analysis + security llm_judge + quality agent_review
- plan-research: analyze-topics + research-topics (fetch/post-comment stay cheap)

Kept cheapest where the persona is summary-shaped:
- summarizer
- forge.type-commenter (PR/comment formatting)
- forge.type-analyst on fetch-only steps
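The persona rule above reduces to a small lookup. A minimal Go sketch: the persona names are the ones listed in this PR, while `tierFor` and its `fetchOnly` flag are hypothetical. Any persona not on the cheap list falls through to `balanced`, matching the rule that personas which scan, judge, plan, implement, or review need the stronger tier.

```go
package main

import "fmt"

// tierFor picks a model tier from a step's persona. Personas the PR calls
// summary-shaped stay on "cheapest"; every other persona gets "balanced".
// fetchOnly is a hypothetical flag for steps that only fetch data.
func tierFor(persona string, fetchOnly bool) string {
	switch {
	case persona == "summarizer", persona == "forge.type-commenter":
		return "cheapest"
	case persona == "forge.type-analyst" && fetchOnly:
		return "cheapest"
	default:
		return "balanced"
	}
}

func main() {
	fmt.Println(tierFor("forge.type-analyst", true))  // cheapest
	fmt.Println(tierFor("forge.type-analyst", false)) // balanced
	fmt.Println(tierFor("craftsman", false))          // balanced
}
```

Defaulting unrecognized personas to `balanced` errs toward output quality; the PR itself promotes specific steps in each YAML rather than changing a global default.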

Also adds golangci-lint to the flake.nix devShell so the same lint that
gates CI runs locally without a manual install (CI uses v2.10, nixpkgs
ships v2.8; the minor skew is acceptable for now).
@nextlevelshit merged commit fef4e96 into main on Apr 30, 2026
10 checks passed
@nextlevelshit deleted the chore/pipeline-model-floor branch on April 30, 2026 07:34