0.3.7
What's Changed
- feat: propagate workflow output language by @ddddddddwp in #53
- fix: prevent skip-all from skipping uninstalled components in comet init by @qiansanyu in #73
- fix(skills): enforce executing-plans review gate (#41) by @ddddddddwp in #76
- feat: add auto transition config by @Ninzero in #74
- feat: token optimization, context compression beta, and anti-drift guards by @benym in #78
Added
- Auto-transition config: Added auto_transition (true|false) to .comet.yaml and the .comet/config.yaml project default so teams can choose whether Comet automatically advances to the next phase skill or pauses for a manual transition. When auto_transition: false, build/design/open/verify skills stop after meeting exit conditions and print the next manual step instead of invoking the next skill. Includes state-machine whitelist, enum validation, and schema (comet-yaml-validate.sh) coverage (#74).
- Deterministic next-step resolver: Added comet-state next <change-name> to resolve post-guard routing from .comet.yaml (phase, workflow, auto_transition) with structured output: NEXT: auto|manual|done, SKILL: <skill-name>, and HINT (manual mode). This centralizes next-skill routing logic in scripts instead of duplicating it across skill prose.
- Workflow output language: Comet workflows now propagate the triggering user request language into OpenSpec and Superpowers steps via an explicit Output Language Rule, keeping generated proposals, designs, plans, verification reports, and archive notes readable in the user's language. Resuming an existing change preserves the dominant artifact language unless the user explicitly asks to switch (#53, #37).
- Execution benchmark (Claude Code): Added benchmark:execution, a benchmark harness with three test phases: L1 (design doc generation from handoff context), L2 (build a note-board module from handoff context and run tests), and L3 (full workflow: implement a dictionary module from spec, run 10 vitest tests). Invokes Claude Code (claude -p) and measures actual test pass rate, token usage, retry count, duration, and cost. Compares off vs beta context compression modes across small/medium/large tiers. Supports --phase l1|l2|l3|both|all and --dry-run for deterministic verification. Extracted shared utilities (spawnCapture, parseClaudeJson, buildClaudeArgs, etc.) to scripts/benchmark-utils.mjs.
- Token optimization: TDD skill single load: Build skill now loads the test-driven-development skill once before the first task (instead of per task), saving ~44K tokens per 10-task workflow. Includes compaction recovery guidance to reload once on resume.
- Token optimization: brainstorming checkpoint: Design skill now writes brainstorm-summary.md after the user confirms the design approach, providing a compaction recovery point that preserves confirmed decisions across context window compression.
- Token optimization: incremental brainstorming checkpoint: Design skill now incrementally updates brainstorm-summary.md during brainstorming, preserving confirmed facts, candidate decisions, risks, testing notes, and pending questions before platform-driven context compaction can occur.
- Token optimization: active compaction gate: Design skill now requires an active context compaction gate after brainstorm-summary.md is finalized and before creating the Design Doc, using the host platform's native compaction mechanism when available and falling back to a manual user prompt when it is not.
- Token optimization: plan creation subagent offload: Build skill offloads writing-plans execution to a subagent, freeing main session context. The subagent reads the Design Doc and tasks.md from files and returns the plan file path. Falls back to inline execution on subagent failure.
- Token optimization: verification skill dedup: Verify skill loads verification-before-completion once before the light/full branch point instead of in each branch, eliminating redundant skill content.
- Token optimization: tasks.md incremental scan: Build skill uses grep to find unchecked tasks instead of re-reading the entire tasks.md file after each task completion.
- Token optimization: hash on-demand read in verify: Verify skill checks handoff_hash before re-reading OpenSpec artifacts. When the hash matches, only tasks.md is skipped (proposal.md and design.md are still read for comparison checks). Uses the new comet-handoff.sh --hash-only flag.
- --hash-only flag for comet-handoff.sh: New backward-compatible flag outputs the context hash without generating handoff files, used by the verify phase for hash comparison. Validates that required files exist before computing the hash.
- CodeGraph integration in comet init: comet init now offers an optional step to install and configure CodeGraph (@colbymchenry/codegraph) for semantic code intelligence. It auto-detects supported platforms (Claude Code, Cursor, Codex, OpenCode, Gemini, Kiro, Antigravity), installs the CLI if missing, runs codegraph install for agent wiring, and initializes the project index. Skips gracefully under --json mode.
- Stale PR automation: Added a scheduled and manually runnable GitHub Actions workflow that marks inactive pull requests stale after 90 days and closes them after another 30 days, helping keep long-idle review queues manageable.
- TDD mode field: Added tdd_mode (tdd|direct) to the .comet.yaml state machine so users can choose whether to enforce TDD during build. When tdd_mode: tdd, subagent dispatches inject an explicit TDD hard constraint, bypassing implementer-prompt.md's conditional trigger. Addresses #67.
- subagent_dispatch field: Added subagent_dispatch (null|confirmed) to the .comet.yaml state machine, ensuring build_mode: subagent-driven-development can only leave the build phase after the platform's real background dispatch capability is confirmed.
- Verify retry limit: Verify skill now enforces a mandatory user decision after 3 consecutive verify-fail cycles, preventing indefinite automated retry loops.
- Manual verify_mode override: Users can override automatic verification scale assessment via comet-state set <name> verify_mode <light|full> when the auto-detected mode doesn't fit.
- Local context compression benchmark: Added benchmark:context, a local Codex benchmark harness that creates matched context_compression: off and beta Comet fixtures, runs codex exec against each mode, and reports token savings, spec drift rate, task completion rate, parse success, and timing. Use --dry-run for deterministic non-Codex verification.
- Beta-gated context compression switch: Project installs now create .comet/config.yaml with context_compression: off, allowing teams to opt new changes into beta spec projection by setting context_compression: beta. This switch controls only the OpenSpec handoff projection path (spec-context.*); the workflow token optimizations above are default-on and do not require beta mode.
- Beta spec projection handoff: /comet-design can now use beta context compression to generate spec-context.json and spec-context.md, preserving OpenSpec requirement and scenario headings with source hashes so compact design handoffs reduce token load without weakening acceptance coverage.
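
The structured output of comet-state next (NEXT, SKILL, HINT) is meant to be consumed by scripts rather than re-derived in skill prose. The following is a minimal sketch of such a consumer, not Comet's actual implementation: the parse_next helper is hypothetical, and the sample output is illustrative text written for this example, not captured from a real run.

```shell
# Hypothetical helper: print the value of a "KEY: value" line read from stdin.
parse_next() {
  local key="$1" line
  while IFS= read -r line; do
    case "$line" in
      "$key: "*) printf '%s\n' "${line#"$key: "}"; return 0 ;;
    esac
  done
  return 1
}

# Illustrative output only; a real script would capture `comet-state next <change-name>`.
sample_output='NEXT: manual
SKILL: comet-verify
HINT: run the verify skill when ready'

next_mode=$(printf '%s\n' "$sample_output" | parse_next NEXT)
skill=$(printf '%s\n' "$sample_output" | parse_next SKILL)

# Branch on the three documented NEXT values.
case "$next_mode" in
  auto)   echo "invoke $skill now" ;;
  manual) echo "pause; next manual step is $skill" ;;
  done)   echo "workflow complete" ;;
esac
```

Keeping the branch on NEXT in one place mirrors the stated goal of centralizing routing: skills only need to act on the resolved value.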
Changed
- executing-plans review gate: When build_mode is executing-plans, the build phase now requires loading the Superpowers requesting-code-review skill and requesting code review at least once before the build→verify phase guard. CRITICAL findings must be fixed before verify; accepted non-CRITICAL findings must record acceptance rationale in a durable artifact. The build-exit checklist enforces this gate (#76, #41).
- Phase advancement vs handoff wording: Chinese and English Comet skills now consistently distinguish guard-driven phase advancement (--apply, always updates phase) from next-skill invocation control (auto_transition). Open/design/build/verify/hotfix/tweak guidance now routes through comet-state next for auto/manual handoff.
- Preset continuity wording: Hotfix and tweak guidance now explicitly documents the auto_transition: false exception in continuous execution mode, removing contradictory wording around "always continue" behavior.
- Verify hash-skip scoped to tasks.md only: Full verification always reads proposal.md and design.md even when the hash matches, ensuring goal-satisfaction and design-consistency checks have complete context.
- Design Doc creation stays in main session: Design Doc is created inline (not offloaded to subagent) to preserve full brainstorming conversation context and prevent information loss for complex requirements.
- Subagent failure fallback: Plan creation subagent offload includes an explicit degraded fallback: if the subagent fails, the main session loads writing-plans inline.
- Beta spec verbatim projection: Beta context compression now projects entire spec files verbatim (cat) instead of filtering by English keywords (GIVEN/WHEN/THEN/AND/BUT). This eliminates language-dependent matching, ensures zero acceptance-criteria drift for Chinese or non-English specs, and removes the fragile AWK filter entirely.
- JSON structural validation: comet-guard.sh now validates spec-context.json structure (required fields: change, phase, mode, files, context_hash) and source file reference coverage, replacing the previous English-heading-based markdown check. Guard catches corrupted or incomplete JSON before phase transition.
- JSON file roles: The spec-context.json files array now includes a role field (spec for spec files, supporting for proposal/design/tasks), removing the language-dependent projection array entirely.
- --full warning in beta mode: Running comet-handoff.sh with --full in beta mode now emits an explicit warning instead of silently ignoring the flag.
- CodeGraph step in comet update: comet update now prompts to install/update CodeGraph alongside skill file updates, using the same platform detection and CLI installation flow.
- Rules and hooks distribution in comet update: comet update now distributes anti-drift phase guard rules and hooks to all installed platforms alongside skill files, keeping rules and hooks in sync after a Comet upgrade.
- Archive confirmation gate: Chinese /comet-archive now pauses for explicit user confirmation before running the archive script, giving users a final chance to adjust or re-run verification before main spec merge and change archival.
- English archive confirmation parity: English Comet skills now match the confirmed Chinese archive-confirmation workflow, including /comet-archive, /comet-verify, /comet, hotfix, and tweak guidance.
- Archive reopen transition: Added comet-state transition <change-name> archive-reopen so users who decline final archive confirmation can return from phase: archive to phase: verify for adjustment or re-verification without manually editing .comet.yaml.
- OpenSpec clarification gate: Chinese and English /comet-open now require a confirmed requirements clarification summary before proposal, design, or tasks artifacts are created, preventing one Q&A turn from immediately generating a full OpenSpec change.
- PRD split preflight: Chinese and English /comet-open now triage large PRDs before creating OpenSpec artifacts, allowing users to split independent capabilities into multiple Comet changes while keeping each accepted split on the /comet-open state-machine path. Addresses #62.
- Skill invocation wording guidance: Added repository guidance in CLAUDE.md requiring new skill-trigger descriptions to use the existing "use the Skill tool to load..." wording and place context details after the skill loads.
- Anti-drift phase guard rule: Added .claude/rules/comet-phase-guard.md that re-injects Comet phase awareness, skill invocation requirements, script execution requirements, user confirmation gates, and context compaction recovery instructions every conversation turn, preventing long-context attention drift from breaking the 5-phase workflow. Works on all platforms as a soft reminder.
- Anti-drift phase guard hook: Added comet-hook-guard.sh PreToolUse hook (configured in .claude/settings.local.json) that hard-blocks file writes when the active Comet change is in open, design, or archive phase, providing a platform-specific hard enforcement layer that the model cannot bypass. Whitelists openspec/*, docs/superpowers/*, .claude/*, and .comet/* paths.
- Platform rules/hooks distribution in comet init: comet init now distributes the anti-drift phase guard rule and hook-guard script to all supported platforms during initialization. Platform definitions were corrected: Cline uses .clinerules/ at project root (not .cline/rules/), GitHub Copilot uses .github/instructions/*.instructions.md with applyTo frontmatter, Kiro uses .kiro/steering/, and Gemini CLI has no rules directory (uses GEMINI.md files). Added rulesDir/rulesFormat to 8 platforms that were missing it, and supportsHooks/hookFormat to 7 platforms. Hook installation supports 7 format variants: Claude Code, Gemini, Windsurf, Copilot, Qwen, Kiro, and Qoder.
- Systematic debugging gate: Chinese and English build and hotfix skills now require loading Superpowers systematic-debugging when implementation-time crashes, unexpected behavior, test failures, or build failures appear, ensuring root-cause investigation and in-change regression tests happen before source fixes.
- Verification-before-completion gate: Chinese and English /comet-verify now require loading Superpowers verification-before-completion before executing lightweight or full verification checks, enforcing evidence-based confirmation before any completion claims.
- Platform-neutral confirmation gates: Chinese and English Comet skills and recovery messages now refer to the current platform's user input/confirmation mechanism instead of hard-coding AskUserQuestion, preventing Codex users from being directed to a tool that may not exist while preserving blocking user decisions.
- Preset upgrade path: Hotfix and tweak skills now include a set <name> phase design step when upgrading to full workflow, preventing comet-design entry check failure after workflow switch.
- Build-complete conditional field reset: The build-complete transition preserves verification_report and branch_status when the previous verify_result was fail, enabling verify-fail→build→build-complete re-verify cycles without data loss.
- Open phase recovery granularity: Open phase recovery now distinguishes three states (all artifacts done / none done / partial) with specific recovery actions per state.
- 50% scope threshold option: Build skill now offers "continue in current change" as a third option when changes exceed 50% scope, avoiding forced change splitting.
- Worktree plan commit: Build skill now explicitly instructs committing plan files before creating a worktree when using worktree isolation.
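
The anti-drift hook's decision rule described above (block writes in open, design, or archive; always allow the whitelisted paths) can be sketched as a small function. This is an illustration under assumptions, not the contents of comet-hook-guard.sh: the write_allowed name and its phase/path arguments are hypothetical, and the sketch assumes repository-relative paths without any normalization.

```shell
# Hypothetical sketch: return 0 if a write to path $2 is allowed in phase $1.
write_allowed() {
  local phase="$1" path="$2"
  # Whitelisted locations stay writable in every phase.
  case "$path" in
    openspec/*|docs/superpowers/*|.claude/*|.comet/*) return 0 ;;
  esac
  # All other paths are blocked while the change is in open, design, or archive.
  case "$phase" in
    open|design|archive) return 1 ;;
    *) return 0 ;;
  esac
}

write_allowed design src/app.ts || echo "blocked: src/app.ts (phase: design)"
write_allowed design openspec/changes/demo/proposal.md && echo "allowed: OpenSpec artifact"
write_allowed build src/app.ts && echo "allowed: src/app.ts (phase: build)"
```

A real hook would also need to normalize absolute or ./-prefixed paths before matching; that step is omitted here.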
Removed
- openspec/config.yaml: Removed unused example OpenSpec config file containing only placeholder comments.
Fixed
- Subagent task persistence: /comet-build now requires every subagent dispatch prompt to persist completed task checks in the Superpowers plan and, when mapped, the corresponding OpenSpec tasks.md item before committing. Build guard blocks unchecked Superpowers plan tasks, and build recovery reports both OpenSpec and plan progress before inspecting recent git history/diff or dispatching more work, preventing resume after interruption or context compression from re-running already completed subagent work (#79).
- skip-all skipping uninstalled components: comet init no longer treats a previously skipped component as already installed. Choosing skip-all now only skips components that are actually present, so uninstalled OpenSpec, Superpowers, Comet, or CodeGraph components are still offered for installation instead of being silently bypassed (#73).
- Update JSON output for rules/hooks: comet update --json now includes rules and hooks distribution results alongside skill update results, with per-target error isolation so a single platform failure doesn't break the entire update output.
- Duplicate YAML fields: replace_yaml_field in comet-state.sh now deduplicates all fields after replacement, keeping only the last occurrence of each key. Previously, multiple cmd_set calls for the same field (e.g., during verify-fail → re-verify cycles) could leave duplicate lines in .comet.yaml, confusing downstream parsers. Fixes #77.
- Hook config format: installClaudeCodeHooks and .claude/settings.local.json now use the correct matcher + hooks: [{ type, command }] array format instead of the flat { matcher, command, description } format, fixing the /doctor schema validation error.
- Archive delta merge: comet-archive.sh now delegates archive spec updates to OpenSpec's delta merge semantics instead of copying change specs over main specs, preventing ADDED/MODIFIED/REMOVED/RENAMED section headings from leaking into stable specs. Addresses #69.
- Brainstorming depth: Chinese and English /comet-design no longer tell Superpowers brainstorming to skip context exploration, so unclear goals, scope, non-goals, acceptance scenarios, or constraints must be clarified before a Design Doc is created.
- Command injection prevention: run_command_string() in comet-guard.sh now rejects build/verify commands containing shell metacharacters (;, |, &, $, backtick), preventing command injection through .comet.yaml command fields.
- Path traversal prevention: comet-state.sh cmd_set now validates path fields (design_doc, plan, verification_report, handoff_context, handoff_hash) for .. traversal sequences before writing.
- Design guard enforcement: Design guard now requires design_doc for full workflow (FAIL instead of WARN), preventing phase advance without a design document.
- branch_status preservation on verify-fail: The verify-fail transition no longer resets branch_status, keeping branch handling state across re-verify cycles.
- UTC date consistency: All scripts now use date -u +%Y-%m-%d for created_at, verified_at, and archive naming, eliminating local/UTC date mismatches.
- macOS SCRIPT_DIR resolution: All scripts use portable $(cd "$(dirname "$0")" && pwd -P) instead of readlink -f for cross-platform compatibility.
- Archive directory resolution fallback: comet-archive.sh resolve_archive_dir() now searches by the *-$CHANGE pattern when the exact UTC-based path doesn't match, fixing test reliability across timezone differences.
- Temp file permissions: All mktemp calls now set chmod 600 on temporary files before writing sensitive data.
- Pipe hash error propagation: Hash computation in comet-handoff.sh and comet-guard.sh captures pipe output in variables before piping to hash stream, preventing silent failures under pipefail.
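
The two input-validation fixes above (metacharacter rejection for command fields and .. rejection for path fields) might look roughly like the sketch below. The function names are hypothetical and the checks are a guess at the general shape, not the code in comet-guard.sh or comet-state.sh; in particular, treating .. as a whole path segment is an assumption of this sketch.

```shell
# Hypothetical sketch: succeed (return 0) if $1 contains one of ; | & $ or a backtick.
has_shell_metachar() {
  local meta=';|&$`'
  case "$1" in
    *[$meta]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical sketch: succeed (return 0) if $1 contains a ".." path segment.
# Wrapping the value in slashes lets one pattern cover leading, middle, and trailing segments.
has_traversal() {
  case "/$1/" in
    */../*) return 0 ;;
    *) return 1 ;;
  esac
}

has_shell_metachar 'npm test; rm -rf build' && echo "rejected: command contains a metacharacter"
has_shell_metachar 'npm test' || echo "accepted: npm test"
has_traversal 'docs/../../etc/passwd' && echo "rejected: path contains .."
has_traversal 'docs/design.md' || echo "accepted: docs/design.md"
```

Rejecting the characters outright, rather than trying to escape them, keeps the check simple at the cost of disallowing legitimate commands that rely on pipes or variable expansion.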
Tests
- Auto-transition regression: Added state-machine and skill coverage for auto_transition init defaults, enum validation, .comet/config.yaml project default propagation, schema validation, and the manual-transition vs auto-advance branching in build/design/open/verify skills (#74).
- comet-state next regression: Added shell-script coverage for next-step resolution across full/hotfix/tweak workflows, manual-handoff mode, archived completion (NEXT: done), and missing .comet.yaml failure behavior.
- Skill handoff wording regression update: Updated skill-content assertions to validate next-driven handoff wording (NEXT: auto|manual|done) and synchronized Chinese/English expectation checks.
- Output language regression: Added skill coverage that Comet propagates the triggering user request language into OpenSpec and Superpowers steps across the open, design, build, verify, hotfix, tweak, and archive skills (#53).
- Review gate regression: Added skill coverage that executing-plans build mode requires the requesting-code-review gate before the build→verify transition, plus updated init-e2e expectations (#76).
- skip-all regression: Added comet init coverage that skip-all only skips installed components and still offers uninstalled OpenSpec/Superpowers/Comet/CodeGraph components (#73).
- --hash-only flag coverage: New tests verify correct hash output, change-directory validation, required-file validation, and no handoff file regeneration.
- Context benchmark runner coverage: New tests verify benchmark token-savings math, Codex JSONL usage/verdict parsing, and dry-run report generation without invoking Codex.
- Flaky test timeout fix: Design guard test without design_doc now has explicit 20s timeout to prevent Windows bash startup flakiness.
- Chinese spec coverage: Beta handoff test uses Chinese spec content to verify verbatim projection of all content (headings, descriptions, non-keyword steps) regardless of language.
- JSON corruption detection: New test verifies guard blocks design exit when spec-context.json is structurally invalid.
- --full beta warning: New test verifies the warning message and confirms beta files are still generated when --full is passed.
- Doctor CodeGraph check: comet doctor now reports CodeGraph CLI availability and project initialization status (.codegraph/ presence).
- Archive confirmation regression: Added Chinese skill coverage that /comet-archive requires a final confirmation gate before executing the archive script.
- English archive confirmation regression: Added English skill coverage for final archive confirmation, archive reopen guidance, and hotfix/tweak preset blocking points.
- Phase write guard hook coverage: 10 new tests for comet-hook-guard.sh covering phase-based write blocking (open/design/archive block, build/verify allow), whitelist paths (openspec, docs/superpowers, .claude), archived change bypass, and no-active-change passthrough.
- Archive reopen regression: Added state-machine coverage for returning an unarchived change from archive confirmation back to verification and rejecting reopen attempts after archived: true.
- Archive spec merge regression: Added shell-script coverage for archiving a delta spec without copying delta-only requirement section headings into the stable main spec.
- OpenSpec proposal regression: Added Chinese and English skill coverage for the pre-artifact clarification gate, the default ban on one-shot openspec-propose, and preservation of the Superpowers brainstorming clarification flow.
- Skill authoring regression: Added coverage that CLAUDE.md documents the required skill invocation wording pattern.
- Debug gate regression: Added Chinese skill safeguard coverage for systematic-debugging invocation, minimal failing-test requirements, and keeping crash verification inside the current change.
- Confirmation mechanism regression: Added coverage that Chinese workflow decision gates no longer hard-code AskUserQuestion and that recovery output points agents to a platform-neutral confirmation mechanism.
- PRD split workflow regression: Added Chinese and English skill coverage for open-phase PRD split choices, /comet-open state initialization, repeated-triage prevention, split completion selection, and minimal resume guidance.
- tdd_mode state machine regression: Added coverage for tdd_mode init defaults (null for full, direct for hotfix), enum validation, build-exit guard, hotfix bypass, and schema validation rejection of invalid values.
- Review fix regression: Added coverage for conditional verification_report preservation on re-verify, branch_status preservation across verify-fail, path traversal rejection on design_doc, command injection rejection on build_command, and design guard enforcement for full workflow without design_doc.
- Context compression regression: Added coverage for project config defaults, change-level context_compression snapshots, environment override during change initialization, beta spec projection generation, and guard rejection when beta projection misses requirement or scenario headings.
New Contributors
- @ddddddddwp made their first contribution in #53
- @qiansanyu made their first contribution in #73
- @Ninzero made their first contribution in #74
Full Changelog: 0.3.6...0.3.7