Skip to content

0.3.7

Choose a tag to compare

@benym benym released this 09 Jun 10:00
· 4 commits to master since this release

What's Changed

  • feat: propagate workflow output language by @ddddddddwp in #53
  • fix: prevent skip-all from skipping uninstalled components in comet init by @qiansanyu in #73
  • fix(skills): enforce executing-plans review gate (#41) by @ddddddddwp in #76
  • feat: add auto transition config by @Ninzero in #74
  • feat: token optimization, context compression beta, and anti-drift guards by @benym in #78

Added

  • Auto-transition config: Added auto_transition (true|false) to .comet.yaml and the .comet/config.yaml project default so teams can choose whether Comet automatically advances to the next phase skill or pauses for a manual transition. When auto_transition: false, build/design/open/verify skills stop after meeting exit conditions and print the next manual step instead of invoking the next skill. Includes state-machine whitelist, enum validation, and schema (comet-yaml-validate.sh) coverage (#74).

  • Deterministic next-step resolver: Added comet-state next <change-name> to resolve post-guard routing from .comet.yaml (phase, workflow, auto_transition) with structured output: NEXT: auto|manual|done, SKILL: <skill-name>, and HINT (manual mode). This centralizes next-skill routing logic in scripts instead of duplicating it across skill prose.

  • Workflow output language: Comet workflows now propagate the triggering user request language into OpenSpec and Superpowers steps via an explicit Output Language Rule, keeping generated proposals, designs, plans, verification reports, and archive notes readable in the user's language. Resuming an existing change preserves the dominant artifact language unless the user explicitly asks to switch (#53, #37).

  • Execution benchmark (Claude Code): Added benchmark:execution, a benchmark harness with three test phases: L1 (design doc generation from handoff context), L2 (build a note-board module from handoff context + run tests), and L3 (full workflow — implement a dictionary module from spec, run 10 vitest tests). Invokes Claude Code (claude -p) and measures actual test pass rate, token usage, retry count, duration, and cost. Compares off vs beta context compression modes across small/medium/large tiers. Supports --phase l1|l2|l3|both|all and --dry-run for deterministic verification. Extracted shared utilities (spawnCapture, parseClaudeJson, buildClaudeArgs, etc.) to scripts/benchmark-utils.mjs.

  • Token optimization: TDD skill single load: Build skill now loads test-driven-development skill once before the first task (instead of per-task), reducing ~44K tokens per 10-task workflow. Includes compaction recovery guidance to reload once on resume.

  • Token optimization: brainstorming checkpoint: Design skill now writes brainstorm-summary.md after user confirms design approach, providing a compaction recovery point that preserves confirmed decisions across context window compression.

  • Token optimization: incremental brainstorming checkpoint: Design skill now incrementally updates brainstorm-summary.md during brainstorming, preserving confirmed facts, candidate decisions, risks, testing notes, and pending questions before platform-driven context compaction can occur.

  • Token optimization: active compaction gate: Design skill now requires an active context compaction gate after brainstorm-summary.md is finalized and before creating the Design Doc, using the host platform's native compaction mechanism when available and falling back to a manual user prompt when it is not.

  • Token optimization: plan creation subagent offload: Build skill offloads writing-plans execution to a subagent, freeing main session context. Subagent reads Design Doc + tasks.md from files and returns the plan file path. Falls back to inline execution on subagent failure.

  • Token optimization: verification skill dedup: Verify skill loads verification-before-completion once before the light/full branch point instead of in each branch, eliminating redundant skill content.

  • Token optimization: tasks.md incremental scan: Build skill uses grep to find unchecked tasks instead of re-reading the entire tasks.md file after each task completion.

  • Token optimization: hash on-demand read in verify: Verify skill checks handoff_hash before re-reading OpenSpec artifacts. When hash matches, only tasks.md is skipped (proposal.md and design.md are still read for comparison checks). Uses new comet-handoff.sh --hash-only flag.

  • --hash-only flag for comet-handoff.sh: New backward-compatible flag outputs the context hash without generating handoff files, used by verify phase for hash comparison. Validates required files exist before computing hash.

  • CodeGraph integration in comet init: comet init now offers an optional step to install and configure CodeGraph (@colbymchenry/codegraph) for semantic code intelligence. It auto-detects supported platforms (Claude Code, Cursor, Codex, OpenCode, Gemini, Kiro, Antigravity), installs the CLI if missing, runs codegraph install for agent wiring, and initializes the project index. Skips gracefully under --json mode.

  • Stale PR automation: Added a scheduled and manually runnable GitHub Actions workflow that marks inactive pull requests stale after 90 days and closes them after another 30 days, helping keep long-idle review queues manageable.

  • TDD mode field: Added tdd_mode (tdd|direct) to .comet.yaml state machine so users choose whether to enforce TDD during build. When tdd_mode: tdd, subagent dispatches inject an explicit TDD hard constraint, bypassing implementer-prompt.md's conditional trigger. Addresses #67.

  • subagent_dispatch field: Added subagent_dispatch (null|confirmed) to .comet.yaml state machine, ensuring build_mode: subagent-driven-development can only leave the build phase after the platform's real background dispatch capability is confirmed.

  • Verify retry limit: Verify skill now enforces a mandatory user decision after 3 consecutive verify-fail cycles, preventing indefinite automated retry loops.

  • Manual verify_mode override: Users can override automatic verification scale assessment via comet-state set <name> verify_mode <light|full> when the auto-detected mode doesn't fit.

  • Local context compression benchmark: Added benchmark:context, a local Codex benchmark harness that creates matched context_compression: off and beta Comet fixtures, runs codex exec against each mode, and reports token savings, spec drift rate, task completion rate, parse success, and timing. Use --dry-run for deterministic non-Codex verification.

  • Beta-gated context compression switch: Project installs now create .comet/config.yaml with context_compression: off, allowing teams to opt new changes into beta spec projection by setting context_compression: beta. This switch controls only the OpenSpec handoff projection path (spec-context.*); the workflow token optimizations above are default-on and do not require beta mode.

  • Beta spec projection handoff: /comet-design can now use beta context compression to generate spec-context.json and spec-context.md, preserving OpenSpec requirement and scenario headings with source hashes so compact design handoffs reduce token load without weakening acceptance coverage.

Changed

  • executing-plans review gate: When build_mode is executing-plans, the build phase now requires loading the Superpowers requesting-code-review skill and requesting code review at least once before the build→verify phase guard. CRITICAL findings must be fixed before verify; accepted non-CRITICAL findings must record acceptance rationale in a durable artifact. The build-exit checklist enforces this gate (#76, #41).
  • Phase advancement vs handoff wording: Chinese and English Comet skills now consistently distinguish guard-driven phase advancement (--apply, always updates phase) from next-skill invocation control (auto_transition). Open/design/build/verify/hotfix/tweak guidance now routes through comet-state next for auto/manual handoff.
  • Preset continuity wording: Hotfix and tweak guidance now explicitly documents the auto_transition: false exception in continuous execution mode, removing contradictory wording around "always continue" behavior.
  • Verify hash-skip scoped to tasks.md only: Full verification always reads proposal.md and design.md even when hash matches, ensuring goal-satisfaction and design-consistency checks have complete context.
  • Design Doc creation stays in main session: Design Doc is created inline (not offloaded to subagent) to preserve full brainstorming conversation context and prevent information loss for complex requirements.
  • Subagent failure fallback: Plan creation subagent offload includes explicit degraded fallback — if the subagent fails, the main session loads writing-plans inline.
  • Beta spec verbatim projection: Beta context compression now projects entire spec files verbatim (cat) instead of filtering by English keywords (GIVEN/WHEN/THEN/AND/BUT). This eliminates language-dependent matching, ensures zero acceptance-criteria drift for Chinese or non-English specs, and removes the fragile AWK filter entirely.
  • JSON structural validation: comet-guard.sh now validates spec-context.json structure (required fields: change, phase, mode, files, context_hash) and source file reference coverage, replacing the previous English-heading-based markdown check. Guard catches corrupted or incomplete JSON before phase transition.
  • JSON file roles: spec-context.json files array now includes a role field (spec for spec files, supporting for proposal/design/tasks), removing the language-dependent projection array entirely.
  • --full warning in beta mode: Running comet-handoff.sh with --full in beta mode now emits an explicit warning instead of silently ignoring the flag.
  • CodeGraph step in comet update: comet update now prompts to install/update CodeGraph alongside skill file updates, using the same platform detection and CLI installation flow.
  • Rules and hooks distribution in comet update: comet update now distributes anti-drift phase guard rules and hooks to all installed platforms alongside skill files, keeping rules and hooks in sync after a Comet upgrade.
  • Archive confirmation gate: Chinese /comet-archive now pauses for explicit user confirmation before running the archive script, giving users a final chance to adjust or re-run verification before main spec merge and change archival.
  • English archive confirmation parity: English Comet skills now match the confirmed Chinese archive-confirmation workflow, including /comet-archive, /comet-verify, /comet, hotfix, and tweak guidance.
  • Archive reopen transition: Added comet-state transition <change-name> archive-reopen so users who decline final archive confirmation can return from phase: archive to phase: verify for adjustment or re-verification without manually editing .comet.yaml.
  • OpenSpec clarification gate: Chinese and English /comet-open now require a confirmed requirements clarification summary before proposal, design, or tasks artifacts are created, preventing one Q&A turn from immediately generating a full OpenSpec change.
  • PRD split preflight: Chinese and English /comet-open now triage large PRDs before creating OpenSpec artifacts, allowing users to split independent capabilities into multiple Comet changes while keeping each accepted split on the /comet-open state-machine path. Addresses #62.
  • Skill invocation wording guidance: Added repository guidance in CLAUDE.md requiring new skill-trigger descriptions to use the existing "use the Skill tool to load..." wording and place context details after the skill loads.
  • Anti-drift phase guard rule: Added .claude/rules/comet-phase-guard.md that re-injects Comet phase awareness, skill invocation requirements, script execution requirements, user confirmation gates, and context compaction recovery instructions every conversation turn, preventing long-context attention drift from breaking the 5-phase workflow. Works on all platforms as a soft reminder.
  • Anti-drift phase guard hook: Added comet-hook-guard.sh PreToolUse hook (configured in .claude/settings.local.json) that hard-blocks file writes when the active Comet change is in open, design, or archive phase, providing a platform-specific hard enforcement layer that the model cannot bypass. Whitelists openspec/*, docs/superpowers/*, .claude/*, and .comet/* paths.
  • Platform rules/hooks distribution in comet init: comet init now distributes the anti-drift phase guard rule and hook-guard script to all supported platforms during initialization. Platform definitions were corrected: Cline uses .clinerules/ at project root (not .cline/rules/), GitHub Copilot uses .github/instructions/*.instructions.md with applyTo frontmatter, Kiro uses .kiro/steering/, and Gemini CLI has no rules directory (uses GEMINI.md files). Added rulesDir/rulesFormat to 8 platforms that were missing it, and supportsHooks/hookFormat to 7 platforms. Hook installation supports 7 format variants: Claude Code, Gemini, Windsurf, Copilot, Qwen, Kiro, and Qoder.
  • Systematic debugging gate: Chinese and English build and hotfix skills now require loading Superpowers systematic-debugging when implementation-time crashes, unexpected behavior, test failures, or build failures appear, ensuring root-cause investigation and in-change regression tests happen before source fixes.
  • Verification-before-completion gate: Chinese and English /comet-verify now require loading Superpowers verification-before-completion before executing lightweight or full verification checks, enforcing evidence-based confirmation before any completion claims.
  • Platform-neutral confirmation gates: Chinese and English Comet skills and recovery messages now refer to the current platform's user input/confirmation mechanism instead of hard-coding AskUserQuestion, preventing Codex users from being directed to a tool that may not exist while preserving blocking user decisions.
  • Preset upgrade path: Hotfix and tweak skills now include set <name> phase design step when upgrading to full workflow, preventing comet-design entry check failure after workflow switch.
  • Build-complete conditional field reset: build-complete transition preserves verification_report and branch_status when the previous verify_result was fail, enabling verify-fail→build→build-complete re-verify cycles without data loss.
  • Open phase recovery granularity: Open phase recovery now distinguishes three states (all artifacts done / none done / partial) with specific recovery actions per state.
  • 50% scope threshold option: Build skill now offers "continue in current change" as a third option when changes exceed 50% scope, avoiding forced change splitting.
  • Worktree plan commit: Build skill now explicitly instructs committing plan files before creating a worktree when using worktree isolation.

Removed

  • openspec/config.yaml: Removed unused example OpenSpec config file containing only placeholder comments.

Fixed

  • Subagent task persistence: /comet-build now requires every subagent dispatch prompt to persist completed task checks in the Superpowers plan and, when mapped, the corresponding OpenSpec tasks.md item before committing. Build guard blocks unchecked Superpowers plan tasks, and build recovery reports both OpenSpec and plan progress before inspecting recent git history/diff or dispatching more work, preventing resume after interruption or context compression from re-running already completed subagent work (#79).

  • skip-all skipping uninstalled components: comet init no longer treats a previously skipped component as already installed. Choosing skip-all now only skips components that are actually present, so uninstalled OpenSpec, Superpowers, Comet, or CodeGraph components are still offered for installation instead of being silently bypassed (#73).

  • Update JSON output for rules/hooks: comet update --json now includes rules and hooks distribution results alongside skill update results, with per-target error isolation so a single platform failure doesn't break the entire update output.

  • Duplicate YAML fields: replace_yaml_field in comet-state.sh now deduplicates all fields after replacement, keeping only the last occurrence of each key. Previously, multiple cmd_set calls for the same field (e.g., during verify-fail → re-verify cycles) could leave duplicate lines in .comet.yaml, confusing downstream parsers. Fixes #77.

  • Hook config format: installClaudeCodeHooks and .claude/settings.local.json now use the correct matcher + hooks: [{ type, command }] array format instead of the flat { matcher, command, description } format, fixing the /doctor schema validation error.

  • Archive delta merge: comet-archive.sh now delegates archive spec updates to OpenSpec's delta merge semantics instead of copying change specs over main specs, preventing ADDED/MODIFIED/REMOVED/RENAMED section headings from leaking into stable specs. Addresses #69.

  • Brainstorming depth: Chinese and English /comet-design no longer tell Superpowers brainstorming to skip context exploration, so unclear goals, scope, non-goals, acceptance scenarios, or constraints must be clarified before a Design Doc is created.

  • Command injection prevention: run_command_string() in comet-guard.sh now rejects build/verify commands containing shell metacharacters (;, |, &, $, backtick), preventing command injection through .comet.yaml command fields.

  • Path traversal prevention: comet-state.sh cmd_set now validates path fields (design_doc, plan, verification_report, handoff_context, handoff_hash) for .. traversal sequences before writing.

  • Design guard enforcement: Design guard now requires design_doc for full workflow (FAIL instead of WARN), preventing phase advance without a design document.

  • branch_status preservation on verify-fail: verify-fail transition no longer resets branch_status, keeping branch handling state across re-verify cycles.

  • UTC date consistency: All scripts now use date -u +%Y-%m-%d for created_at, verified_at, and archive naming, eliminating local/UTC date mismatches.

  • macOS SCRIPT_DIR resolution: All scripts use portable $(cd "$(dirname "$0")" && pwd -P) instead of readlink -f for cross-platform compatibility.

  • Archive directory resolution fallback: comet-archive.sh resolve_archive_dir() now searches by *-$CHANGE pattern when the exact UTC-based path doesn't match, fixing test reliability across timezone differences.

  • Temp file permissions: All mktemp calls now set chmod 600 on temporary files before writing sensitive data.

  • Pipe hash error propagation: Hash computation in comet-handoff.sh and comet-guard.sh captures pipe output in variables before piping to hash stream, preventing silent failures under pipefail.

Tests

  • Auto-transition regression: Added state-machine and skill coverage for auto_transition init defaults, enum validation, .comet/config.yaml project default propagation, schema validation, and the manual-transition vs auto-advance branching in build/design/open/verify skills (#74).
  • comet-state next regression: Added shell-script coverage for next-step resolution across full/hotfix/tweak workflows, manual-handoff mode, archived completion (NEXT: done), and missing .comet.yaml failure behavior.
  • Skill handoff wording regression update: Updated skill-content assertions to validate next-driven handoff wording (NEXT: auto|manual|done) and synchronized Chinese/English expectation checks.
  • Output language regression: Added skill coverage that Comet propagates the triggering user request language into OpenSpec and Superpowers steps across the open, design, build, verify, hotfix, tweak, and archive skills (#53).
  • Review gate regression: Added skill coverage that executing-plans build mode requires the requesting-code-review gate before the build→verify transition, plus updated init-e2e expectations (#76).
  • skip-all regression: Added comet init coverage that skip-all only skips installed components and still offers uninstalled OpenSpec/Superpowers/Comet/CodeGraph components (#73).
  • --hash-only flag coverage: New tests verify correct hash output, change-directory validation, required-file validation, and no handoff file regeneration.
  • Context benchmark runner coverage: New tests verify benchmark token-savings math, Codex JSONL usage/verdict parsing, and dry-run report generation without invoking Codex.
  • Flaky test timeout fix: Design guard test without design_doc now has explicit 20s timeout to prevent Windows bash startup flakiness.
  • Chinese spec coverage: Beta handoff test uses Chinese spec content to verify verbatim projection of all content (headings, descriptions, non-keyword steps) regardless of language.
  • JSON corruption detection: New test verifies guard blocks design exit when spec-context.json is structurally invalid.
  • --full beta warning: New test verifies the warning message and confirms beta files are still generated when --full is passed.
  • Doctor CodeGraph check: comet doctor now reports CodeGraph CLI availability and project initialization status (.codegraph/ presence).
  • Archive confirmation regression: Added Chinese skill coverage that /comet-archive requires a final confirmation gate before executing the archive script.
  • English archive confirmation regression: Added English skill coverage for final archive confirmation, archive reopen guidance, and hotfix/tweak preset blocking points.
  • Phase write guard hook coverage: 10 new tests for comet-hook-guard.sh covering phase-based write blocking (open/design/archive block, build/verify allow), whitelist paths (openspec, docs/superpowers, .claude), archived change bypass, and no-active-change passthrough.
  • Archive reopen regression: Added state-machine coverage for returning an unarchived change from archive confirmation back to verification and rejecting reopen attempts after archived: true.
  • Archive spec merge regression: Added shell-script coverage for archiving a delta spec without copying delta-only requirement section headings into the stable main spec.
  • OpenSpec proposal regression: Added Chinese and English skill coverage for the pre-artifact clarification gate, the default ban on one-shot openspec-propose, and preservation of the Superpowers brainstorming clarification flow.
  • Skill authoring regression: Added coverage that CLAUDE.md documents the required skill invocation wording pattern.
  • Debug gate regression: Added Chinese skill safeguard coverage for systematic-debugging invocation, minimal failing-test requirements, and keeping crash verification inside the current change.
  • Confirmation mechanism regression: Added coverage that Chinese workflow decision gates no longer hard-code AskUserQuestion and that recovery output points agents to a platform-neutral confirmation mechanism.
  • PRD split workflow regression: Added Chinese and English skill coverage for open-phase PRD split choices, /comet-open state initialization, repeated-triage prevention, split completion selection, and minimal resume guidance.
  • tdd_mode state machine regression: Added coverage for tdd_mode init defaults (null for full, direct for hotfix), enum validation, build-exit guard, hotfix bypass, and schema validation rejection of invalid values.
  • Review fix regression: Added coverage for conditional verification_report preservation on re-verify, branch_status preservation across verify-fail, path traversal rejection on design_doc, command injection rejection on build_command, and design guard enforcement for full workflow without design_doc.
  • Context compression regression: Added coverage for project config defaults, change-level context_compression snapshots, environment override during change initialization, beta spec projection generation, and guard rejection when beta projection misses requirement or scenario headings.

New Contributors

Full Changelog: 0.3.6...0.3.7