Progressive onboarding, telemetry backend, sprint contracts, scored grading, bug fixes #360
Merged
Before onboarding: MCP tools/list returns only 2 tools (onboard + get_project_state).
After onboarding: MCP tools/list returns the full 30+ tool suite, and onboard disappears.

The onboard tool:
- Discovers project info (git remote)
- Scans for custom skills in .devswarm/skills/
- Counts built-in roles
- Creates the .devswarm/onboarded marker file
- Returns a project summary with next steps

The marker file (.devswarm/onboarded) controls tool visibility:
- Present: full tool suite, no onboard
- Absent: only onboard + get_project_state
- Delete it to re-trigger onboarding: rm .devswarm/onboarded

Refs: #358

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
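A minimal sketch of the marker-file gate, assuming the server checks for `.devswarm/onboarded` relative to the project root on every tools/list call. The names `FULL_TOOLS`, `is_onboarded`, `list_tool_names` and this `onboard` helper are hypothetical, and the 30+ tool suite is abbreviated to tool names that appear elsewhere in this PR.

```python
from pathlib import Path

# Hypothetical, abbreviated stand-in for the full 30+ tool suite.
FULL_TOOLS = ["get_project_state", "run_swarm", "run_task"]
ONBOARDING_TOOLS = ["onboard", "get_project_state"]


def is_onboarded(project_root: Path) -> bool:
    """The marker file alone decides whether onboarding has happened."""
    return (project_root / ".devswarm" / "onboarded").is_file()


def list_tool_names(project_root: Path) -> list:
    """Names a tools/list response would advertise for this project."""
    if is_onboarded(project_root):
        return list(FULL_TOOLS)  # full suite; `onboard` is hidden
    return list(ONBOARDING_TOOLS)


def onboard(project_root: Path) -> None:
    """Write the marker so the next tools/list returns the full suite."""
    marker = project_root / ".devswarm" / "onboarded"
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
```

Because visibility is derived from the filesystem on each call rather than cached, deleting the marker (`rm .devswarm/onboarded`) is enough to fall back to the two onboarding tools.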
After every run_swarm, telemetry JSON is POSTed to the backend via a fire-and-forget curl spawn (non-blocking; it never delays the response).

Data sent: swarm_id, repo, grids (role, model, tokens_in, tokens_out, tool_calls, wall_ms, errors per worker), cost, and parallelism metrics.

Controls:
- DEVSWARM_TELEMETRY=false|0|off → disables upload (default: on)
- DEVSWARM_TELEMETRY_URL → overrides the endpoint
- DEVSWARM_TELEMETRY_KEY → API key header

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
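A minimal sketch of how the three environment variables could drive a fire-and-forget upload. Several details are assumptions: the `https://` scheme on the default endpoint (the PR names only host and path), the `X-API-Key` header name (the PR says only "API key header"), exact-match parsing of `false`/`0`/`off`, and the helper names `telemetry_enabled`, `build_curl_argv` and `upload`.

```python
import json
import subprocess

# Assumed default endpoint; the PR gives host and path but not the scheme.
DEFAULT_URL = "https://devswarm.codegraff.com/v1/telemetry"
# Hypothetical header name: the PR only says "API key header".
API_KEY_HEADER = "X-API-Key"


def telemetry_enabled(env: dict) -> bool:
    """Upload is on by default; only false, 0 or off disables it."""
    return env.get("DEVSWARM_TELEMETRY", "") not in ("false", "0", "off")


def build_curl_argv(payload: dict, env: dict) -> list:
    """Build the curl command line for one telemetry POST."""
    url = env.get("DEVSWARM_TELEMETRY_URL") or DEFAULT_URL
    argv = ["curl", "-s", "-X", "POST", url,
            "-H", "Content-Type: application/json",
            "-d", json.dumps(payload)]
    key = env.get("DEVSWARM_TELEMETRY_KEY")
    if key:
        argv += ["-H", f"{API_KEY_HEADER}: {key}"]
    return argv


def upload(payload: dict, env: dict) -> None:
    """Fire and forget: spawn curl and return without calling wait()."""
    if not telemetry_enabled(env):
        return
    subprocess.Popen(build_curl_argv(payload, env),
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL,
                     start_new_session=True)  # detach from our process group
```

Spawning an external process and never waiting on it is what keeps the upload off the response path: a slow or unreachable endpoint costs the caller only the time needed to fork curl.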
This was referenced Mar 24, 2026
Adopts Anthropic's harness design patterns (harness-design blog post):

review_fix_loop:
- 4-axis scoring, minimum passing score in parentheses: CORRECTNESS (8), SAFETY (9), COMPLETENESS (7), QUALITY (6)
- Hard pass/fail thresholds: a score below threshold on ANY axis is a FAIL and triggers another iteration
- Few-shot calibration: 2 example reviews (one passing, one failing) in the prompt
- Converges on PASS or NO_ISSUES_FOUND
- JSON output includes "verdict": "PASS"/"FAIL" per iteration

finder_fixer preset:
- New contract phase between finder and fixer
- Reviewer generates numbered, TESTABLE acceptance criteria
- Fixer must satisfy ALL criteria (not just "fix the findings")
- Pipeline: finder → contract → fixer (was: finder → fixer)

reviewer_fixer preset:
- Reviewer now uses 4-axis scoring with thresholds
- PASS skips the fixer entirely (no wasted work)
- FAIL includes the specific findings the fixer must address

Refs: #364

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
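A minimal sketch of the hard-threshold verdict and the convergence rule described above. The thresholds come from the commit message; everything else is assumed for illustration: `review` and `fix` are hypothetical callables, a missing axis is scored as 0, an empty findings list stands in for NO_ISSUES_FOUND, and `max_iters` is an invented cap.

```python
# Minimum passing score per axis, taken from the review_fix_loop description.
THRESHOLDS = {"CORRECTNESS": 8, "SAFETY": 9, "COMPLETENESS": 7, "QUALITY": 6}


def verdict(scores: dict) -> str:
    """PASS only if every axis meets its threshold; one low axis fails."""
    # Assumption: an axis the reviewer did not score counts as 0.
    failing = [a for a, t in THRESHOLDS.items() if scores.get(a, 0) < t]
    return "FAIL" if failing else "PASS"


def review_fix_loop(review, fix, max_iters: int = 3) -> list:
    """Alternate review and fix until the review converges.

    `review()` returns (scores, findings); `fix(findings)` applies fixes.
    Both callables and max_iters are hypothetical stand-ins.
    """
    history = []
    for i in range(max_iters):
        scores, findings = review()
        v = verdict(scores)
        history.append({"iteration": i + 1, "verdict": v})
        # Converge on PASS, or on no findings (stand-in for NO_ISSUES_FOUND).
        if v == "PASS" or not findings:
            break
        fix(findings)
    return history
```

The all-axes rule is what makes the threshold "hard": a high average cannot hide a single weak axis, so a fix scoring 10/8/10/10 still fails on SAFETY.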
…ontract

The finder_fixer pipeline is now: finder → contract → fixer → VERIFY

The verify step runs a reviewer that scores the fix on 4 axes (correctness ≥ 8, safety ≥ 9, completeness ≥ 7, quality ≥ 6) against the sprint contract's acceptance criteria, and returns PASS/FAIL with an explanation.

This completes the Anthropic harness pattern: Planner → Generator → Evaluator (with scored grading).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
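The four-stage flow can be sketched as a plain orchestration function. This is an illustration under stated assumptions, not the project's run_task code: each stage is a hypothetical callable, the numbering of criteria happens here rather than inside the reviewer, and `verify` is assumed to return the "PASS"/"FAIL" string produced by the scored review.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineResult:
    findings: list
    criteria: list
    verdict: str
    log: list = field(default_factory=list)


def finder_fixer(find, write_contract, fix, verify) -> PipelineResult:
    """finder -> contract -> fixer -> verify; every stage is hypothetical."""
    log = []
    findings = find()
    log.append("finder")
    # The contract turns findings into numbered, testable acceptance criteria.
    criteria = [f"{n}. {c}"
                for n, c in enumerate(write_contract(findings), start=1)]
    log.append("contract")
    fix(findings, criteria)  # the fixer is judged against ALL criteria
    log.append("fixer")
    result = verify(criteria)  # scored review against the same contract
    log.append("verify")
    return PipelineResult(findings, criteria, result, log)
```

Passing the same `criteria` list to both the fixer and the verifier is the point of the contract phase: the evaluator grades against exactly what the generator was asked to satisfy.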
…tor mismatch

#361: resolveWithProbe dupes the model string before cfg.deinit, so there is no longer a dangling pointer when the config uses full model IDs.

#362: addGrid/addWorker return !void and propagate errors. On OOM, addGrid calls grid.deinit(self.alloc) before returning, so the GridMetrics.workers buffer is no longer leaked.

#363: addGrid uses self.alloc (not a caller-supplied alloc), consistent with deinit, which frees with self.alloc. The alloc parameter was dropped.

All fixes were applied by the run_task finder_fixer pipeline with sprint contracts and scored verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closed
Summary
This is a big PR that ships multiple interconnected features:
Progressive onboarding (#358)
- `tools/list` returns only `onboard` + `get_project_state` until setup is done
- `onboard` tool discovers the project, scans skills, and writes the `.devswarm/onboarded` marker
Telemetry upload to backend
- After every `run_swarm`, telemetry JSON is POSTed to `devswarm.codegraff.com/v1/telemetry`
- `DEVSWARM_TELEMETRY=false` to disable, `DEVSWARM_TELEMETRY_URL` to override the endpoint
Sprint contracts + scored grading (#364)
- `finder_fixer` pipeline: finder → contract → fixer → verify
- `reviewer_fixer` preset: reviewer now uses 4-axis scoring with hard thresholds
- `review_fix_loop`: scored PASS/FAIL convergence with few-shot calibrated examples
Bug fixes (#361, #362, #363)
- `resolveWithProbe`: dupe the model string before `cfg.deinit`
- `addGrid`/`addWorker` return `!void`, with cleanup on OOM
- `addGrid` uses `self.alloc` consistently (not a caller-supplied allocator)
Test plan
- `zig build test` passes
- `zig build` succeeds

🤖 Generated with Claude Code