[DSPX-3302] (6/6) feature-orchestrate skill + cells-of-effort spec #455
dmihalcik-virtru wants to merge 2 commits into
…-3302)
Implements the second half of the multi-repo feature workflow. feature-design
(landed in PR 5) authors the spec and tests-side artifacts; feature-orchestrate
reads that spec, creates git worktrees, and fans claude -p subagents out in
topological waves so each cell of work proceeds in parallel where possible.
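Fanning cells out "in topological waves" can be sketched as Kahn's algorithm applied level by level: each wave holds the cells whose dependencies all sit in earlier waves, and a round with no ready cells means the `depends_on:` graph has a cycle. This is a minimal sketch, not the code in `cli_orchestrate.py`; the plain `dict` of cell key to `depends_on` keys, the cell keys, and the names `topo_waves`/`blocked_cells` are hypothetical. It assumes unknown dependency keys were already rejected during spec validation (otherwise they would surface as a reported cycle).

```python
from __future__ import annotations


def topo_waves(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group cell keys into waves; raise ValueError if depends_on has a cycle."""
    remaining = {key: set(ds) for key, ds in deps.items()}
    waves: list[list[str]] = []
    while remaining:
        # Ready = every dependency was already placed in an earlier wave.
        ready = sorted(key for key, ds in remaining.items() if not ds)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        for key in ready:
            del remaining[key]
        for ds in remaining.values():
            ds.difference_update(ready)
    return waves


def blocked_cells(deps: dict[str, list[str]], failed: set[str]) -> set[str]:
    """Cells to skip because a dependency failed or was itself skipped."""
    skipped: set[str] = set()
    for wave in topo_waves(deps):
        for key in wave:
            if key not in failed and any(d in failed or d in skipped for d in deps[key]):
                skipped.add(key)
    return skipped


# Diamond: two SDK cells wait on the proto cell; a test cell waits on both SDKs.
diamond = {
    "platform-proto": [],
    "go-sdk": ["platform-proto"],
    "java-sdk": ["platform-proto"],
    "xtest": ["go-sdk", "java-sdk"],
}
print(topo_waves(diamond))  # [['platform-proto'], ['go-sdk', 'java-sdk'], ['xtest']]
print(blocked_cells(diamond, failed={"go-sdk"}))  # {'xtest'}
```

Cells inside one wave have no edges between them, so they can be dispatched concurrently; the next wave only starts once the previous one has finished.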
Spec schema change (informal — no Pydantic model yet):
The platform monorepo holds proto definitions, the Go SDK, KAS service code,
and shared libs, so a single feature often touches multiple "cells of effort"
inside one repo. The spec now expresses work as cells, not repos. Each cell
has a `path:` (which sibling repo to worktree from), a `branch:`, a `todo:`
list, and an optional `depends_on:` edge. Canonical example: every SDK cell
declares `depends_on: [platform-proto]` whenever the feature changes wire
format — the proto cell regenerates Go/Java/JS bindings before the SDKs can
adopt them.
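A minimal sketch of loading cells in this shape, assuming the YAML has already been parsed into a mapping of cell key to cell body (the real spec layout and the `Cell`/`parse_cells` names are hypothetical). It requires `path:` and `branch:`, defaults `todo:` and `depends_on:` to empty lists, and rejects any `depends_on:` edge that names a cell the spec does not declare.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cell:
    key: str
    path: str  # sibling repo to create the worktree from
    branch: str
    todo: list[str] = field(default_factory=list)
    depends_on: list[str] = field(default_factory=list)


def parse_cells(raw: dict[str, dict]) -> dict[str, Cell]:
    """Build Cell objects from {cell_key: body} and validate depends_on edges."""
    cells: dict[str, Cell] = {}
    for key, body in raw.items():
        missing = [name for name in ("path", "branch") if name not in body]
        if missing:
            raise ValueError(f"cell {key!r} is missing: {', '.join(missing)}")
        cells[key] = Cell(
            key=key,
            path=body["path"],
            branch=body["branch"],
            todo=list(body.get("todo", [])),
            depends_on=list(body.get("depends_on", [])),
        )
    # Every edge must point at a cell declared in the same spec.
    for cell in cells.values():
        for dep in cell.depends_on:
            if dep not in cells:
                raise ValueError(f"cell {cell.key!r} depends on unknown cell {dep!r}")
    return cells


# The proto-blocks-SDK pattern, with per-cell <JIRA-KEY>-<cell-key> branches.
cells = parse_cells({
    "platform-proto": {
        "path": "platform",
        "branch": "DSPX-3302-platform-proto",
        "todo": ["change the wire format", "regenerate Go/Java/JS bindings"],
    },
    "java-sdk": {
        "path": "java-sdk",
        "branch": "DSPX-3302-java-sdk",
        "todo": ["adopt the regenerated bindings"],
        "depends_on": ["platform-proto"],
    },
})
```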
- `otdf-sdk-mgr orchestrate run <spec>` (new CLI verb in `cli_orchestrate.py`):
parses the spec, topologically sorts cells by `depends_on`, creates worktrees
at `~/Documents/GitHub/worktrees/<JIRA-KEY>-<cell-key>/`, and dispatches one
`claude -p` subagent per cell. Cells in the same wave run in parallel via
`ThreadPoolExecutor`. Per-cell prompts embed the full spec body for
cross-cell context. Flags: `--dry-run`, `--only <cell>`, `--timeout`,
`--model`. Per-cell stdout is captured to `.claude/tmp/runs/<KEY>-<cell>.jsonl`.
- Per-worktree `.claude/settings.json` auto-written if absent, with a minimal
allowlist tailored to the repo (`go`/`make`/`buf` for platform, `mvn` for
java-sdk, `npm` for web-sdk) plus universal `git`/`gh pr create`. Skipped
if the user has committed their own settings.
- `feature-orchestrate` SKILL.md: thin wrapper that surfaces the dry-run plan,
asks the user to confirm, then dispatches.
- `feature-design` SKILL.md Step 2/3: teaches the cell shape, `path:`,
`depends_on:`, and the proto-blocks-SDK pattern as the canonical example.
Branches are now per-cell (`<JIRA-KEY>-<cell-key>`) so concurrent worktrees
of the same repo don't collide.
- 10 new unit tests (`test_orchestrate.py`): load + validation, topological
waves with skip semantics, cycle detection (incl. diamond), worktree path
resolution. All passing alongside the existing 65.
- `xtest/features/{README,CLAUDE}.md`: cell-aware terminology.
- `settings.json` + `plugin.json`: `Bash(claude -p *)`, `Bash(git worktree *)`,
`Bash(gh pr create *)` allowlists; `Skill(feature-orchestrate)` registered.
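The "write `.claude/settings.json` only if absent" behavior can be sketched as follows. The allowlist entries, keying them by the cell's `path:`, and the `write_default_settings` name are hypothetical; the `permissions.allow` layout follows Claude Code's `settings.json` format, and the `Bash(<command> *)` patterns mirror the ones this PR registers.

```python
from __future__ import annotations

import json
from pathlib import Path

# Hypothetical per-repo allowlists keyed by a cell's path:.
REPO_ALLOW: dict[str, list[str]] = {
    "platform": ["Bash(go *)", "Bash(make *)", "Bash(buf *)"],
    "java-sdk": ["Bash(mvn *)"],
    "web-sdk": ["Bash(npm *)"],
}
UNIVERSAL_ALLOW = ["Bash(git *)", "Bash(gh pr create *)"]


def write_default_settings(worktree: Path, repo: str) -> bool:
    """Write <worktree>/.claude/settings.json if absent; return True when written."""
    settings = worktree / ".claude" / "settings.json"
    if settings.exists():
        return False  # leave user-committed settings untouched
    settings.parent.mkdir(parents=True, exist_ok=True)
    allow = REPO_ALLOW.get(repo, []) + UNIVERSAL_ALLOW
    payload = {"permissions": {"allow": allow}}
    settings.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return True
```

Checking `exists()` before writing is what lets a committed `.claude/settings.json` win; an unrecognized repo falls back to the universal `git`/`gh pr create` entries only.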
Out of scope for this PR: status command, --retry, Pydantic model for Feature,
cross-PR linking automation, non-Sonnet subagent models. See plan file for
the full follow-up list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Important: Review skipped. Draft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration. Configuration used: Organization UI. Review profile: CHILL. Plan: Pro.
Code Review
This pull request introduces the feature-orchestrate skill and a CLI command to automate multi-repo feature implementation using a 'cells of effort' model. This approach allows for granular task management within repositories via git worktrees and dependency-based parallel execution. Feedback from the reviewer focuses on enhancing the orchestrator's reliability and portability, specifically suggesting improvements for git branch handling during worktree setup, consolidating exception handling in subagents, providing configurability for root directories, broadening error catching for YAML parsing, and implementing a cap on parallel worker threads to prevent resource exhaustion.
subprocess.check_call(
    ["git", "-C", str(repo), "worktree", "add", str(wt), "-b", cell.branch],
)
The git worktree add -b <branch> command will fail if the branch already exists in the repository. This is a common scenario when retrying a failed orchestration run where the worktree was removed but the branch remains. It's safer to check if the branch exists first and only use -b if it doesn't.
# Check whether a local branch with this name already exists in the source repo
has_branch = subprocess.run(
    ["git", "-C", str(repo), "rev-parse", "--verify", "--quiet", f"refs/heads/{cell.branch}"],
    capture_output=True,
).returncode == 0
cmd = ["git", "-C", str(repo), "worktree", "add", str(wt)]
if not has_branch:
    cmd.extend(["-b", cell.branch])
else:
    cmd.append(cell.branch)
subprocess.check_call(cmd)

try:
    wt = ensure_worktree(spec, cell)
except Exception as e:
    return CellResult(cell, Path(), Path(), False, None, f"worktree: {e}")

ensure_subagent_settings(wt, cell.path)

transcripts_dir.mkdir(parents=True, exist_ok=True)
transcript = transcripts_dir / f"{spec.jira or spec.name}-{cell.key}.jsonl"

cmd = [
    "claude", "-p",
    "--model", model,
    "--permission-mode", "acceptEdits",
    "--output-format", "stream-json",
    "--verbose",
    build_prompt(spec, cell),
]
try:
    with transcript.open("w", encoding="utf-8") as out:
        completed = subprocess.run(
            cmd,
            cwd=wt,
            stdout=out,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
except subprocess.TimeoutExpired:
    return CellResult(cell, wt, transcript, False, None, f"timed out after {timeout_s}s")

if completed.returncode != 0:
    return CellResult(cell, wt, transcript, False, None, f"exit {completed.returncode}")

pr_url: str | None = None
for line in transcript.read_text(encoding="utf-8").splitlines():
    m = PR_URL_RE.search(line)
    if m:
        pr_url = m.group(0)
return CellResult(cell, wt, transcript, True, pr_url, None)
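`PR_URL_RE` is not shown in this excerpt. A minimal sketch of the same last-match-wins scan, assuming GitHub pull-request URLs of the form `https://github.com/<owner>/<repo>/pull/<number>` (the pattern, the `last_pr_url` name, and the sample transcript are hypothetical):

```python
from __future__ import annotations

import re

# Hypothetical pattern; the real PR_URL_RE may differ.
PR_URL_RE = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/pull/\d+")


def last_pr_url(transcript_text: str) -> str | None:
    """Return the last GitHub PR URL found in a transcript, or None."""
    pr_url = None
    for line in transcript_text.splitlines():
        m = PR_URL_RE.search(line)
        if m:
            pr_url = m.group(0)  # later lines overwrite earlier matches
    return pr_url


sample = "\n".join([
    '{"type":"assistant","text":"see https://github.com/example-org/example-repo/pull/1"}',
    "https://github.com/example-org/example-repo/pull/2",
])
print(last_pr_url(sample))  # https://github.com/example-org/example-repo/pull/2
```

Because the loop keeps overwriting `pr_url`, the URL the subagent prints last wins, which lines up with asking the subagent to print its PR URL as the final line of stdout. The character class stops at the closing quote, so URLs embedded in `stream-json` lines are still extracted cleanly.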
The current exception handling in run_cell is fragmented and misses several potential failure points (e.g., ensure_subagent_settings, transcript.open, or OSError from subprocess.run if the claude binary is missing). Wrapping the entire logic in a single try...except block ensures that any failure is captured and reported as a CellResult error, preventing thread crashes in the orchestrator.
wt = Path()
transcript = Path()
try:
    wt = ensure_worktree(spec, cell)
    ensure_subagent_settings(wt, cell.path)
    transcripts_dir.mkdir(parents=True, exist_ok=True)
    transcript = transcripts_dir / f"{spec.jira or spec.name}-{cell.key}.jsonl"
    cmd = [
        "claude", "-p",
        "--model", model,
        "--permission-mode", "acceptEdits",
        "--output-format", "stream-json",
        "--verbose",
        build_prompt(spec, cell),
    ]
    with transcript.open("w", encoding="utf-8") as out:
        completed = subprocess.run(
            cmd,
            cwd=wt,
            stdout=out,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
    if completed.returncode != 0:
        return CellResult(cell, wt, transcript, False, None, f"exit {completed.returncode}")
    pr_url: str | None = None
    for line in transcript.read_text(encoding="utf-8").splitlines():
        m = PR_URL_RE.search(line)
        if m:
            pr_url = m.group(0)
    return CellResult(cell, wt, transcript, True, pr_url, None)
except subprocess.TimeoutExpired:
    return CellResult(cell, wt, transcript, False, None, f"timed out after {timeout_s}s")
except Exception as e:
    return CellResult(cell, wt, transcript, False, None, str(e))

OPENTDF_ROOT = Path.home() / "Documents/GitHub/opentdf"
WORKTREES_ROOT = Path.home() / "Documents/GitHub/worktrees"
try:
    spec = load_spec(spec_path)
except ValueError as e:
    typer.echo(f"Error: {e}", err=True)
    raise typer.Exit(1) from e
The load_spec function uses ruamel.yaml.load, which can raise various YAMLError exceptions if the input file is malformed. The current try...except only catches ValueError. Catching a broader exception would make the CLI more robust against invalid YAML files.
Suggested change:
try:
    spec = load_spec(spec_path)
except Exception as e:
    typer.echo(f"Error: {e}", err=True)
    raise typer.Exit(1) from e
)
failed.add(skipped.key)

with concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(runnable))) as ex:
Spawning one thread (and one claude -p process) per cell in a wave without a cap could lead to resource exhaustion (CPU/Memory) or API rate limiting if a feature has many independent cells. It's recommended to cap the max_workers to a reasonable value (e.g., 8).
max_workers = min(8, len(runnable)) if runnable else 1
with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
…X-3302) Change Write(xtest/bug_*_test.py) to Write(xtest/**): wildcards must sit at path boundaries, not in filename patterns. Consolidates and simplifies xtest write permissions across both settings.json and plugin.json.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Summary
Adds `feature-orchestrate`, the second half of the multi-repo feature workflow that `feature-design` (PR 5) started. It reads a feature spec at `xtest/features/<name>.yaml`, creates git worktrees, and fans `claude -p` subagents out in topological waves to implement each cell of work in parallel where possible.
What it does
- `feature-orchestrate` skill (`tests/.claude/skills/feature-orchestrate/SKILL.md`): thin wrapper that surfaces a dry-run plan, asks the user to confirm, then dispatches.
- `otdf-sdk-mgr orchestrate run <spec>` (`tests/otdf-sdk-mgr/src/otdf_sdk_mgr/cli_orchestrate.py`): the engine. Parses the spec, topologically sorts cells by `depends_on`, creates worktrees at `~/Documents/GitHub/worktrees/<JIRA-KEY>-<cell-key>/`, and dispatches one `claude -p` subagent per cell. Cells in the same wave run in parallel via `ThreadPoolExecutor`. Each per-cell prompt embeds the full spec body for cross-cell context. Flags: `--dry-run`, `--only <cell>` (repeatable), `--timeout`, `--model`.
Spec schema change (cells, not repos)
The Feature spec format evolves from "one entry per repo" to "one entry per cell of effort." The platform monorepo holds proto definitions, the Go SDK, KAS service code, and shared libs, so a single feature often touches multiple cells inside one repo. Each cell now has a `path:` (which sibling repo to worktree from), a `branch:`, a `todo:` list, and an optional `depends_on:` edge.
Canonical example: every SDK cell declares `depends_on: [platform-proto]` whenever the feature changes wire format; the proto cell regenerates Go/Java/JS bindings before the SDKs can adopt them. `feature-design`'s SKILL.md Step 2/3 is updated in this PR to teach the new cell shape. Branches are now per-cell (`<JIRA-KEY>-<cell-key>`) so concurrent worktrees of the same repo don't collide.
Subagent dispatch
Each per-cell subagent:
- `claude -p --model sonnet --permission-mode acceptEdits` inside its worktree.
- `path`/`branch`/`todo` + house-style commit guidance (subject embeds `(DSPX-XXXX)`, no `Jira:` footer, `Co-Authored-By: Claude`).
- `gh pr create --draft`, prints the PR URL as the last line of stdout.
- `--timeout` to override).
The orchestrator pre-writes a minimal `.claude/settings.json` into each worktree if one isn't already there, allowlisting the repo-type-appropriate test commands (`go`/`make`/`buf` for platform, `mvn` for java-sdk, `npm` for web-sdk) plus universal `git`/`gh pr create`. User-committed `.claude/settings.json` files are left alone.
Stack
This PR is stacked on #454; merge that one first.
Test plan
- `cd tests/otdf-sdk-mgr && uv run pytest tests/test_orchestrate.py` (10 new tests covering topological sort, cycle detection, diamond dependencies, schema validation, worktree path resolution)
- `uv run otdf-sdk-mgr orchestrate run <spec> --dry-run`: verify topo order and per-cell worktree paths print correctly
- `uv run otdf-sdk-mgr orchestrate run <spec> --only platform-proto`: verify worktree creation, subagent dispatch, and PR URL extraction on one cell
- `depends_on`): verify wave ordering and parallel dispatch within a wave
- `.claude/tmp/runs/<KEY>-<cell>.jsonl`: confirm the subagent invoked `gh pr create --draft` and printed a PR URL
Out of scope (deferred follow-ups)
- `feature-orchestrate status <spec>`: polls `gh pr list` to report PR progress per cell.
- `feature-orchestrate --retry <cell-key>`: re-launch a failed subagent (v1 just reports the failure and tells the user to re-invoke with `--only <cell>`).
- `Feature` model + schema-dump entry (informal YAML for v1).
- `--model sonnet`).
- `--permission-mode acceptEdits` + auto-written `.claude/settings.json`; a tighter per-repo allowlist is a follow-up.
Jira: https://virtru.atlassian.net/browse/DSPX-3302
🤖 Generated with Claude Code