Progressive onboarding, telemetry backend, sprint contracts, scored grading, bug fixes #360

Merged
justrach merged 8 commits into main from feat/onboarding
Mar 24, 2026

@justrach justrach commented Mar 24, 2026

Summary

This PR ships several interconnected features:

Progressive onboarding (#358)

  • tools/list returns only onboard + get_project_state until setup is done
  • onboard tool discovers project, scans skills, writes .devswarm/onboarded marker
  • Tool self-deregisters after setup and stays out of the tool list until .devswarm/onboarded is deleted

Telemetry upload to backend

  • After every run_swarm, telemetry JSON is POSTed to devswarm.codegraff.com/v1/telemetry
  • Fire-and-forget via background curl spawn (non-blocking)
  • DEVSWARM_TELEMETRY=false to disable, DEVSWARM_TELEMETRY_URL to override endpoint

Sprint contracts + scored grading (#364)

  • finder_fixer pipeline: finder → contract → fixer → verify
  • Contract: reviewer writes numbered testable acceptance criteria before fixer starts
  • Verify: reviewer scores fix on 4 axes (correctness≥8, safety≥9, completeness≥7, quality≥6)
  • reviewer_fixer preset: reviewer now uses 4-axis scoring with hard thresholds
  • review_fix_loop: scored PASS/FAIL convergence with few-shot calibrated examples

Bug fixes (#361, #362, #363)

Test plan

  • zig build test passes
  • zig build succeeds
  • CI checks

🤖 Generated with Claude Code

justrach and others added 2 commits March 24, 2026 14:41
Before onboarding: MCP tools/list returns only 2 tools (onboard + get_project_state)
After onboarding: MCP tools/list returns the full 30+ tool suite, onboard disappears

The onboard tool:
- Discovers project info (git remote)
- Scans for custom skills in .devswarm/skills/
- Counts built-in roles
- Creates .devswarm/onboarded marker file
- Returns project summary with next steps

The marker file (.devswarm/onboarded) controls tool visibility:
- Present: full tools, no onboard
- Absent: only onboard + get_project_state
- Delete to re-trigger onboarding: rm .devswarm/onboarded
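
The marker-file rule above can be sketched as follows. This is a minimal illustration in Python, not the project's actual implementation; the function name `visible_tools` and the `full_tools` argument are hypothetical, and only the marker path and the two bootstrap tool names come from the description above.

```python
from pathlib import Path

# Tools exposed before onboarding, as listed in the commit message.
BOOTSTRAP_TOOLS = ["onboard", "get_project_state"]


def visible_tools(project_root, full_tools):
    """Return the tool names that tools/list would expose (hypothetical helper)."""
    marker = Path(project_root) / ".devswarm" / "onboarded"
    if marker.exists():
        # Marker present: full suite, with the onboard tool hidden.
        return [name for name in full_tools if name != "onboard"]
    # Marker absent: only the two bootstrap tools.
    return list(BOOTSTRAP_TOOLS)
```

Deleting the marker file makes the same call return the two bootstrap tools again, which matches the re-trigger behaviour described above.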

Refs: #358

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After every run_swarm, telemetry JSON is POSTed to the backend via
fire-and-forget curl spawn (non-blocking, never delays the response).

Data sent: swarm_id, repo, grids (role, model, tokens_in, tokens_out,
tool_calls, wall_ms, errors per worker), cost, parallelism metrics.

Controls:
- DEVSWARM_TELEMETRY=false|0|off → disables upload (default: on)
- DEVSWARM_TELEMETRY_URL → override endpoint
- DEVSWARM_TELEMETRY_KEY → API key header
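
A rough sketch of how these controls might be read, written in Python for illustration only. The helper names are hypothetical, the `https://` scheme on the default endpoint is an assumption, and case-insensitive matching of the disable values is also an assumption; the variable names and the values false, 0, and off come from the list above.

```python
# Default endpoint from the PR description; the https scheme is assumed.
DEFAULT_TELEMETRY_URL = "https://devswarm.codegraff.com/v1/telemetry"

# Values that disable upload (matched case-insensitively here, an assumption).
DISABLE_VALUES = {"false", "0", "off"}


def telemetry_enabled(env):
    """Upload is on by default and off only for an explicit disable value."""
    value = env.get("DEVSWARM_TELEMETRY", "").strip().lower()
    return value not in DISABLE_VALUES


def telemetry_url(env):
    """Use the override endpoint when set, else the default."""
    return env.get("DEVSWARM_TELEMETRY_URL") or DEFAULT_TELEMETRY_URL
```

Passing `os.environ` as `env` would apply this to the current process; a plain dict works for testing.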

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
justrach and others added 6 commits March 24, 2026 17:07
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adopts Anthropic's harness design patterns (harness-design blog post):

review_fix_loop:
- 4-axis scoring: CORRECTNESS(8), SAFETY(9), COMPLETENESS(7), QUALITY(6)
- Hard threshold pass/fail — FAIL on ANY axis below threshold triggers iteration
- Few-shot calibrated: 2 example reviews (passing + failing) in prompt
- Convergence on PASS or NO_ISSUES_FOUND
- JSON output includes "verdict": "PASS"/"FAIL" per iteration
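
The hard-threshold rule can be illustrated with a short Python sketch. The function name `verdict` and the shape of the `scores` dict are assumptions made for this example; the four axes and their thresholds are taken from the list above.

```python
# Minimum passing score per axis, from the commit message.
THRESHOLDS = {
    "correctness": 8,
    "safety": 9,
    "completeness": 7,
    "quality": 6,
}


def verdict(scores):
    """Return ("PASS" or "FAIL", axes below threshold) for one review."""
    failing = [
        axis for axis, minimum in THRESHOLDS.items()
        if scores[axis] < minimum
    ]
    # Any single axis below its threshold fails the whole review.
    return ("PASS" if not failing else "FAIL", failing)
```

For example, a review scoring safety at 8 fails even when every other axis is at 10, because the safety threshold is 9.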

finder_fixer preset:
- New contract phase between finder and fixer
- Reviewer generates numbered TESTABLE acceptance criteria
- Fixer must satisfy ALL criteria (not just "fix the findings")
- Pipeline: finder → contract → fixer (was: finder → fixer)

reviewer_fixer preset:
- Reviewer now uses 4-axis scoring with thresholds
- PASS skips fixer entirely (no wasted work)
- FAIL includes specific findings the fixer must address
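
The PASS short-circuit in the reviewer_fixer preset might look like the sketch below. It is illustrative Python only: `review` and `fix` are hypothetical callables standing in for the reviewer and fixer agents, and the result keys `verdict` and `findings` are assumed names.

```python
def reviewer_fixer(review, fix, change):
    """Run the reviewer, and call the fixer only when the verdict is FAIL."""
    result = review(change)
    if result["verdict"] == "PASS":
        # PASS skips the fixer entirely, so the change is returned untouched.
        return change
    # FAIL carries the specific findings the fixer must address.
    return fix(change, result["findings"])
```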

Refs: #364

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ontract

The finder_fixer pipeline is now:
  finder → contract → fixer → VERIFY

The verify step runs a reviewer that scores the fix on 4 axes
(correctness≥8, safety≥9, completeness≥7, quality≥6) against
the sprint contract's acceptance criteria. Returns PASS/FAIL
with explanation.
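
The four-stage flow can be sketched as a chain of calls. This Python outline is an assumption-laden illustration: each stage is passed in as a hypothetical callable, and the argument shapes are invented for the example. Only the stage order and the fact that verify checks the fix against the contract's criteria come from the description above.

```python
def run_finder_fixer(finder, contract, fixer, verify, task):
    """Chain finder -> contract -> fixer -> verify and return the verify result."""
    findings = finder(task)            # locate the problems
    criteria = contract(findings)      # numbered, testable acceptance criteria
    fix = fixer(findings, criteria)    # fixer must satisfy all criteria
    return verify(fix, criteria)       # scored PASS/FAIL against the contract
```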

This closes the Anthropic harness pattern:
  Planner → Generator → Evaluator (with scored grading)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tor mismatch

#361: resolveWithProbe dupes model string before cfg.deinit — no more
dangling pointer when config uses full model IDs.

#362: addGrid/addWorker return !void, propagate errors. On OOM,
addGrid calls grid.deinit(self.alloc) before returning — no more
leaked GridMetrics.workers buffer.

#363: addGrid uses self.alloc (not caller-supplied alloc) — consistent
with deinit which frees with self.alloc. Dropped alloc parameter.

All fixes applied by run_task finder_fixer pipeline with sprint
contracts and scored verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@justrach justrach changed the title from "Progressive onboarding: tool self-deregisters after setup" to "Progressive onboarding, telemetry backend, sprint contracts, scored grading, bug fixes" Mar 24, 2026
@justrach justrach merged commit 1687e74 into main Mar 24, 2026
2 checks passed
@justrach justrach deleted the feat/onboarding branch March 24, 2026 18:11