An adaptive, multi-agent development workflow system that orchestrates specialized AI agents through a complexity-aware, phase-gated software development lifecycle. Works with any skills-capable AI harness — Claude Code, zClaw, Gemini CLI, OpenCode, and others.
Instead of a single AI agent trying to do everything, ZFlow dynamically constructs a pipeline using 35 purpose-built agents — each with a focused mission — tailored to the specific complexity of your task.
Building software with AI agents works best when the workflow matches the task and has a clear, narrow focus. A brainstorming agent should think differently than an implementation agent. A security reviewer needs an adversarial mindset that a code-quality checker doesn't. But a trivial bug shouldn't require the same overhead as a major architectural change.
ZFlow applies this principle systematically:
- Adaptive Pipelines, not fixed sequences — chooses from 4 distinct profiles (Quick Fix to Extended) based on a 1-15 complexity score
- Specialized agents, not one monolithic prompt — each phase deploys agents with focused roles and boundaries
- Document-driven handoffs — every phase produces a structured artifact that becomes the input for the next, creating an auditable trail
- Parallel where possible, sequential where necessary — research, review, and QA agents fan out in parallel; phase transitions are gated checkpoints
- No fix without understanding — no implementation without design, no design without research, no debugging fix without root cause
- Intelligent QA Loop-Back — classifies failures (Implementation, Design, Scope) to loop back to the correct layer, not just the previous phase
- Human-in-the-loop — the workflow pauses at critical checkpoints for your review and approval
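The profile-selection step can be sketched in a few lines of Python. The score thresholds below are illustrative assumptions, not ZFlow's actual cut-offs:

```python
def select_profile(complexity: int) -> str:
    """Map a 1-15 complexity score to a pipeline profile.

    The thresholds here are illustrative guesses; ZFlow's real
    rubric defines the exact cut-offs.
    """
    if not 1 <= complexity <= 15:
        raise ValueError("complexity score must be 1-15")
    if complexity <= 3:
        return "Quick Fix"   # trivial: 3-4 agents, skips Research/Review
    if complexity <= 8:
        return "Standard"    # the balanced default workflow
    if complexity <= 12:
        return "Full"        # comprehensive 8-phase pipeline
    return "Extended"        # maximum rigor for critical changes
```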
ZFlow provides two distinct workflows depending on what you're doing:
For building new features, planning functionality, or doing structured end-to-end development.
```mermaid
graph LR
    B[Brainstorm] --> R[Research]
    R --> D[Design]
    D --> Rev[Review]
    Rev --> UI{UI work?}
    UI -- Yes --> UD[UI Design]
    UD --> I[Implement]
    UI -- No --> I
    I --> QA[QA Audit]
    QA --> Doc[Document]
    style B fill:#f9f,stroke:#333,stroke-width:2px
    style I fill:#bbf,stroke:#333,stroke-width:2px
    style QA fill:#bfb,stroke:#333,stroke-width:2px
```
```
Brainstorm  → scope.md
Research    → research-report.md
Design      → solution.md
Review      → reviewed-solution.md
[UI Design] → ui-design-report.md
Implement   → code + impl-report.md
QA          → qa-report.md + security audit
Document    → commit + updated docs
```
For fixing bugs, investigating issues, or resolving regressions.
```mermaid
graph LR
    Rep[Reproduce] --> Inv[Investigate]
    Inv --> RC[Root Cause]
    RC --> DF[Design Fix]
    DF --> IF[Implement]
    IF --> Ver[Verify]
    style Rep fill:#f96,stroke:#333,stroke-width:2px
    style RC fill:#f66,stroke:#333,stroke-width:2px
    style Ver fill:#6f6,stroke:#333,stroke-width:2px
```
```
Reproduce   → repro-report.md
Investigate → investigation.md
Root Cause  → root-cause.md
Design Fix  → fix-design.md
Implement   → code + fix-impl-report.md
Verify      → verification.md
```
The only requirement is an AI harness that supports skills and sub-agents. ZFlow works with:
- Claude Code
- zClaw
- Gemini CLI
- OpenCode
- Any other skills-compatible harness
No additional frameworks or dependencies required.
Copy the zflow/ directory into your harness's skills directory:

```bash
# Claude Code
cp -r zflow/ ~/.claude/skills/zflow/

# zClaw
cp -r zflow/ ~/.zclaw/skills/zflow/

# Gemini CLI
cp -r zflow/ ~/.gemini/skills/zflow/
```

Then invoke it with a request:

```
/zflow I want to add a notification system to my app
```

ZFlow will:
- Engage you in a guided brainstorming conversation with multiple-choice questions grounded in your actual codebase
- Deploy research agents to analyze your architecture, dependencies, patterns, and tests
- Present 2-3 design approaches for you to choose from, then refine the design section-by-section
- Run fresh review agents to catch gaps, over-engineering, and security concerns
- Implement in parallel, organized by dependency tiers
- Run a full QA sweep including security audit
- Update docs and prepare a commit
Action: A Socratic interviewer agent reads your codebase for context, then asks you guided questions — one at a time, in multiple-choice format with recommendations grounded in your actual project. It assesses scope, surfaces ambiguities, and helps decompose complex requests.
Feedback: Questions like "Based on your current auth setup, how should we handle permissions?" with options that reference your actual codebase patterns.
Artifact: `scope.md` — what needs to be built and why, but not how.
Action: 5-6 parallel agents fan out across your codebase — one maps architecture, another traces dependencies, another finds existing patterns, another surveys test infrastructure, and another finds related code. If UI work is detected, a design system scout joins the swarm.
Feedback: "Deploying 6 parallel research agents..." then "All agents complete. Merging findings..."
Artifact: `research-report.md` — real codebase context organized by dimension.
Action: A senior architect agent maps your scope against the research findings, then proposes 2-3 solution approaches with trade-offs. You pick one. Then the design is presented section-by-section — architecture, components, data flow, errors, testing, tasks — each approved before the next.
Feedback: Approach comparisons like "Extend Existing Service (Recommended)" vs "New Microservice" with effort, risk, and codebase fit ratings.
Artifact: `solution.md` — the full technical design with task breakdown and dependency graph.
Action: 5 fresh agents — with no prior context bias — examine your scope and solution from different angles: missing requirements, over-engineering, security holes, performance concerns, and architecture alignment. The coordinator then runs a structural self-review for completeness and consistency.
Feedback: The `overengineering-critic` specifically enforces simplicity — would a senior engineer say this is overcomplicated?
Artifact: `reviewed-solution.md` — your solution with adjustments and a full appendix of reviewer findings.
Action: Only triggered when your scope involves UI work. If Pencil.dev MCP tools are available, a design agent creates the interface on a visual canvas — building design tokens, components, and screen layouts — before any implementation code is written. You approve designs via screenshots.
Feedback: If Pencil.dev is not available: You're asked whether to install it or proceed with standard code-first UI development.
Artifact: `ui-design-report.md` — design tokens, component specs, layout descriptions, and exported reference images.
Action: Implementation agents are deployed in parallel, organized by dependency tiers. Tier 0 tasks (no dependencies) run first, then Tier 1, and so on. Each agent gets a focused task slice with success criteria and operates under surgical change constraints.
Feedback: "Tier 0: 3 agents running..." then "Tier 1: 2 agents running..."
Artifact: Working code + `impl-report.md` — every file changed and any deviations from the design.
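The tier ordering amounts to a layered topological sort of the task graph. A minimal sketch, assuming tasks and their dependencies come from the `solution.md` task breakdown (the task names below are hypothetical):

```python
def dependency_tiers(tasks: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into tiers: Tier 0 has no dependencies, Tier 1
    depends only on earlier tiers, and so on."""
    tiers: list[list[str]] = []
    placed: set[str] = set()
    remaining = dict(tasks)
    while remaining:
        # A task is ready once all of its dependencies are placed.
        ready = [t for t, deps in remaining.items()
                 if all(d in placed for d in deps)]
        if not ready:
            raise ValueError("dependency cycle detected")
        for t in ready:
            placed.add(t)
            del remaining[t]
        tiers.append(sorted(ready))
    return tiers

# Hypothetical task slices from a solution.md breakdown:
tiers = dependency_tiers({
    "db-schema": [],
    "api-endpoint": ["db-schema"],
    "ui-widget": ["api-endpoint"],
    "event-bus": [],
})
# Tier 0 runs first in parallel, then Tier 1, then Tier 2.
```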
Action: 6-7 parallel QA agents check different dimensions: completeness, UX, code quality, test coverage, design alignment, and a deep OWASP Top 10 security audit. If UI work was done, a visual QA agent compares implementation against designs.
Feedback: Issues are categorized: Critical (security), Blocker, Major, Minor, or Note. Critical and blocker issues loop back to Phase 4 for targeted fixes.
Artifact: `qa-report.md` — all findings by severity.
Action: A documentation agent updates relevant docs, CHANGELOG, and README based on everything produced. Generates a conventional commit message and stages changes.
Artifact: Updated documentation + commit (requires your approval).
Action: Agent confirms the bug is reproducible, documents exact steps, captures error output, and identifies the minimal reproduction case.
Artifact: `repro-report.md`
Action: 5 parallel agents trace the issue: backward from the symptom (call chain), backward from invalid data (data flow), similar patterns in the codebase, recent git history, and security impact assessment (can this be exploited?).
Artifact: `investigation.md`
Action: A deliberation agent synthesizes all findings to identify the true root cause — distinguishing symptom from cause with supporting evidence.
Artifact: `root-cause.md`
Action: 3 parallel reviewers check the proposed fix: does it address the root cause (not just the symptom)? Does it introduce regressions? Is it the minimal effective change?
Artifact: `fix-design.md`
Action: Implementation agent applies the fix. If 3 attempts fail, the issue escalates to architectural review.
Artifact: Working code + `fix-impl-report.md`
Action: 4 parallel verifiers confirm: the bug is fixed, no regressions, similar patterns are checked, and no security vulnerabilities were introduced.
Artifact: `verification.md`
35 specialized agents, each with a focused mission:
| Agent | Focus |
|---|---|
| Socratic Interviewer | Guided discovery with multiple-choice questions grounded in your codebase |
| Agent | Focus |
|---|---|
| Architecture Scout | Project structure and architectural patterns |
| Dependency Mapper | Import chains and module coupling |
| Pattern Analyzer | Coding conventions and existing implementations |
| Test Surveyor | Test infrastructure, frameworks, and coverage |
| Related Code Finder | Code affected by the proposed changes |
| UI System Scout | Conditional — design system, tokens, component library |
| Agent | Focus |
|---|---|
| Solution Architect | Approach selection and section-by-section design |
| Agent | Focus |
|---|---|
| Gap Detector | Missing requirements and edge cases |
| Overengineering Critic | Simplicity enforcement |
| Security Reviewer | Security implications of the design |
| Performance Reviewer | Performance and scaling concerns |
| Alignment Checker | Architecture fit and consistency |
| Agent | Focus |
|---|---|
| Pencil Designer | Visual canvas design via Pencil.dev |
| Design System Builder | Token and component system |
| UI Review Agent | Accessibility, responsiveness, consistency |
| Agent | Focus |
|---|---|
| Focused Implementer | Single-task implementation with surgical changes |
| UI Implementer | Conditional — implements from Pencil.dev designs |
| Agent | Focus |
|---|---|
| Completeness Checker | Every solution task is implemented |
| UX Reviewer | API ergonomics, error messages, edge cases |
| Code Quality Auditor | Linting, naming, complexity — enforces simplicity |
| Test Coverage Agent | Test quality and edge case coverage |
| Design Alignment QA | Implementation matches the reviewed solution |
| Security Auditor | Full OWASP Top 10 2025 deep audit |
| UI Visual QA | Conditional — design fidelity and accessibility |
| Agent | Focus |
|---|---|
| Reproducer | Minimal bug reproduction |
| Call Chain Tracer | Execution path backward from symptom |
| Data Flow Tracer | Invalid data traced to source |
| Pattern Scanner | Similar patterns that share the bug |
| History Investigator | Git blame/log analysis |
| Security Impact Assessor | Can this bug be exploited? |
| Root Cause Analyst | Synthesize true root cause |
| Fix Designer | Minimal effective fix design |
| Fix Verifier | Fix confirmation + regression check |
| Agent | Focus |
|---|---|
| Documentation Writer | Docs, CHANGELOG, commit message |
Security isn't a checkbox in ZFlow — it's a dedicated workflow dimension:
During Development: The QA phase includes a deep security audit covering the full OWASP Top 10 2025 — broken access control, injection, cryptographic failures, misconfiguration, and more. Every finding includes an attack scenario, not just a code smell.
During Debugging: A security impact assessor evaluates whether bugs can be exploited, what the blast radius would be, and whether fixes introduce new attack surface.
Configurable depth: Set `audit_depth` to `"full"` (all OWASP categories), `"targeted"` (only relevant categories), or `"minimal"` (top 5). Control the severity threshold for reporting.
ZFlow creates a .zflow/ workspace in your project root on first run. Edit .zflow/config.json to customize:
Control which phases pause for your approval ("human") and which proceed automatically ("auto"):
```json
{
  "workflow": {
    "gates": {
      "brainstorm": "human",
      "research": "auto",
      "design": "human",
      "review": "human",
      "implement": "auto",
      "qa": "human",
      "document": "auto"
    }
  }
}
```

For smaller tasks, skip phases you don't need:

```json
{
  "workflow": {
    "skip_phases": ["research"]
  }
}
```

Control how many agents run simultaneously:

```json
{
  "workflow": {
    "max_parallel_agents": 5
  }
}
```

Configure security scanning:

```json
{
  "security": {
    "audit_depth": "full",
    "dependency_scan": true,
    "secrets_scan": true,
    "security_severity_threshold": "medium"
  }
}
```

See the default configuration in the design plan.
When ZFlow runs, it creates a .zflow/ directory to track progress:
```
.zflow/
├── current-phase.json        # Active phase tracking
├── config.json               # Your preferences
└── phases/
    ├── 00-brainstorm/
    │   └── scope.md
    ├── 01-research/
    │   ├── research-report.md
    │   └── agent-reports/    # Individual agent findings
    ├── 02-design/
    │   └── solution.md
    ├── 03-review/
    │   ├── reviewed-solution.md
    │   └── reviewer-reports/
    ├── 03.5-ui-design/       # Only if UI work
    ├── 04-implement/
    │   ├── implementation-plan.md
    │   └── impl-report.md
    ├── 05-qa/
    │   └── qa-report.md
    └── 06-document/
```
Resume anytime — if you interrupt ZFlow and run /zflow again, it picks up where you left off.
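Resumption only needs to read the active-phase record. A minimal sketch, assuming `current-phase.json` stores the active phase under a `phase` key (the actual schema may differ):

```python
import json
from pathlib import Path

def resume_point(workspace: str = ".zflow") -> str:
    """Return the phase to resume from, or 'brainstorm' for a fresh run."""
    state_file = Path(workspace) / "current-phase.json"
    if not state_file.exists():
        return "brainstorm"  # no prior run: start from the beginning
    state = json.loads(state_file.read_text())
    # "phase" is an assumed field name; check your own workspace file.
    return state.get("phase", "brainstorm")
```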
```
zflow/                        # Copy this folder to your harness's skills directory
├── SKILL.md                  # Main orchestrator entry point
├── LICENSE.txt               # MIT License
│
├── skills/                   # Phase sub-skills
│   ├── zflow-brainstorm/SKILL.md
│   ├── zflow-research/SKILL.md
│   ├── zflow-design/SKILL.md
│   ├── zflow-review/SKILL.md
│   ├── zflow-ui-design/SKILL.md
│   ├── zflow-implement/SKILL.md
│   ├── zflow-qa/SKILL.md
│   ├── zflow-document/SKILL.md
│   └── zflow-debug/SKILL.md
│
├── agents/                   # 35 agent prompt templates
│   ├── _shared/karpathy-preamble.md
│   ├── brainstorm/
│   ├── research/
│   ├── design/
│   ├── review/
│   ├── ui-design/
│   ├── implement/
│   ├── qa/
│   ├── debug/
│   └── document/
│
├── templates/                # Output document templates
├── references/               # Internal reference documentation
└── scripts/                  # Workspace and validation scripts
```
100 files, ~14,600 lines across skills, agents, templates, references, and evals.
These principles shape every agent's behavior in ZFlow:
Plain Language Communication — Every question, option, explanation, and status update is written in plain, accessible English. Technical jargon is explained in context. Options describe what they do, not what they're called. ZFlow conversations should feel natural whether you're a junior developer or a senior architect.
Think Before Coding — State assumptions explicitly. Present alternatives rather than picking silently. Stop and ask when something is unclear.
Simplicity First — Minimum code that solves the problem. No speculative features, no abstractions for single-use code, no "flexibility" that wasn't requested.
Surgical Changes — Touch only what's necessary. Don't refactor adjacent code. Match existing style. Every changed line must trace directly to the scope.
Goal-Driven Execution — Define success criteria before starting. Each step has a verification check. Strong criteria let agents loop independently.
These rules are enforced at three levels: embedded in every agent's prompt preamble, audited by the overengineering-critic during review, and verified by the code-quality-auditor during QA.
| Command | Workflow |
|---|---|
| `/zflow` | Development or Debug workflow — auto-detected based on your request |
ZFlow builds on two foundational ideas:
Superpowers — ZFlow draws inspiration from the Superpowers skill framework's structured methodology: brainstorming before implementation, writing plans before executing, verification before completion, and phase-gated workflows with human checkpoints. The skill architecture, agent orchestration patterns, and escalation protocols follow Superpowers conventions.
Andrej Karpathy's LLM Coding Guidelines — The behavioral rules that govern every ZFlow agent — think before coding, simplicity first, surgical changes, goal-driven execution — are adapted from Karpathy's widely shared principles for effective AI-assisted development. These aren't just documented; they're baked into every agent's prompt as enforceable constraints, with dedicated review and QA agents that specifically audit against them.
Optimizes how the coordinator manages context and delegates work to subagents, reducing token consumption and adding resilience against API rate limits.
- Coordinator is now a pure dispatcher — no longer reads artifacts for analysis, merges reports, or writes outputs itself; delegates all of that to subagents
- Pass paths, not contents — subagents receive file paths and read them themselves, keeping coordinator context lean
- Synthesis agent pattern — a dedicated agent merges worker reports and writes the final phase output
- Rate-limit retry with sequential fallback — if parallel agent spawning hits rate limits or server errors, automatically falls back to sequential deployment
- Token efficiency across all files — trimmed verbose prose, shortened labels, condensed explanations in references, agents, templates, and phase docs
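The retry-with-fallback behavior can be sketched as below; `spawn_parallel`, `spawn_one`, and the rate-limit check are hypothetical hooks standing in for whatever the harness actually exposes:

```python
def deploy_agents(agents, spawn_parallel, spawn_one, is_rate_limit):
    """Run all agents in parallel; if the spawn is rejected with a
    rate-limit or server error, degrade to sequential deployment.

    spawn_parallel, spawn_one, and is_rate_limit are hypothetical
    stand-ins, not real harness APIs.
    """
    try:
        return spawn_parallel(agents)
    except Exception as exc:
        if not is_rate_limit(exc):
            raise  # genuine failures still surface
        # Rate limited: run the same agents one at a time instead.
        return [spawn_one(agent) for agent in agents]
```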
No breaking changes — only coordinator behavior and internal optimization changed.
All user-facing prompts, questions, and templates across ZFlow now use plain, accessible language — understandable by developers of any experience level.
- Added "Communication Style" directive to the orchestrator and all interactive phase docs
- Simplified pipeline proposals, QA gate summaries, and human gate prompts
- Rewrote all 10 brainstorm question examples in everyday language
- Renamed design phase sections from technical jargon to plain English
No breaking changes — only the wording changed, not the underlying logic.
First stable release — marks the transition from experimental workflow system to a production-ready, platform-agnostic skill.
- Migrated from sub-skill invocation to phase document reading — phases are now simple `.md` files in `phases/` that the orchestrator reads and follows directly
- Platform-agnostic execution — works on any AI coding platform that can read files (Claude Code, Gemini CLI, Copilot, OpenCode, etc.)
- Portable path resolution — all internal references use the `${CLAUDE_SKILL_DIR}` runtime variable, fixing brittle relative paths
- Simplified structure — 9 skill files consolidated into 9 leaner phase docs (30-50% smaller each), eliminating duplicate frontmatter and invocation metadata
- Clearer orchestrator role — "read phases/X.md, follow instructions" instead of "invoke sub-skill X via harness mechanism Y"
- Faster phase transitions — no skill invocation overhead
- Easier customization — edit `.md` files directly
- Better debugging — single source of truth per phase
- Uniform paths — all phase docs at `phases/<phase>.md`
- `zflow/CHANGELOG.md` — version history tracking
- `zflow/RELEASE.md` — release notes and migration guide
- `zflow/.claude-plugin/` — plugin configuration directory
External workflow unchanged — same phases, same artifacts, same human gates. Internal restructuring only.
This major update transforms ZFlow from a static 8-phase pipeline into an adaptive, complexity-aware orchestration system and modularizes the core engine for better scalability.
- Dynamic Profile Selection: ZFlow now dynamically selects from 4 task-optimized profiles based on a 1-15 complexity score:
- Quick Fix (Trivial): 3-4 agents, abbreviated brainstorm, skips Research/Review phases, and uses "Design Sketches" for speed.
- Standard (Default): The balanced, structured workflow for typical features.
- Full (Complex): Comprehensive 8-phase pipeline with exhaustive Research and Review.
- Extended (Critical): Maximum rigor with multiple QA/Review swarms and structural validation for high-risk changes.
- Complexity Assessment Rubric: Implemented a multi-signal scoring rubric across five dimensions:
- Affected Systems: Counts distinct modules or architectural layers.
- Technical Domains: Varieties of tech stacks (Backend, UI, Database, etc.).
- Existing Patterns: Follows established code vs. requiring new abstractions.
- User Language: Quality and detail level of the initial prompt.
- Ambiguity: Level of technical uncertainty or requirement gaps.
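One plausible reading of the rubric is a clamped sum over the five dimensions; the 0-3 range per dimension is an assumption, since the document only fixes the overall 1-15 range:

```python
DIMENSIONS = ("affected_systems", "technical_domains",
              "existing_patterns", "user_language", "ambiguity")

def complexity_score(ratings: dict[str, int]) -> int:
    """Combine per-dimension ratings into a single 1-15 score.

    The 0-3 range per dimension and the simple clamped sum are
    assumptions; the actual weighting lives inside ZFlow's rubric.
    """
    for dim in DIMENSIONS:
        if not 0 <= ratings.get(dim, 0) <= 3:
            raise ValueError(f"{dim} rating must be 0-3")
    total = sum(ratings.get(dim, 0) for dim in DIMENSIONS)
    return max(1, min(15, total))
```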
- Pipeline Invariants: Core guarantees (Design-before-Implementation, QA-after-Implementation, Human-in-the-Loop gates) are now enforced regardless of the selected profile.
- Root Cause Layer Classification: Critical/Blocker findings are now categorized into Implementation, Design, Scope, or Unknown.
- Smart Re-entry Protocol:
- Implementation errors trigger targeted re-Implementation.
- Design flaws loop back to the Design phase while attempting to preserve valid implementation work.
- Scope mismatches (e.g., user rejection) loop back to Brainstorm for clarification.
- Artifact Preservation: Logic added to prevent full re-writes by tracking which sections of a solution or implementation are invalidated.
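The re-entry routing reduces to a small dispatch table. A sketch using the phase names from this document (routing `Unknown` findings to human review is an assumption):

```python
# Root-cause layer -> phase to re-enter after a Critical/Blocker finding.
RE_ENTRY = {
    "Implementation": "implement",  # targeted re-implementation
    "Design": "design",             # redo design, preserve valid impl work
    "Scope": "brainstorm",          # clarify requirements with the user
}

def loop_back_phase(root_cause_layer: str) -> str:
    """Pick the phase to re-enter for a failed QA finding.
    Sending 'Unknown' layers to human review is an assumption."""
    return RE_ENTRY.get(root_cause_layer, "human-review")
```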
- "Read, Don't Inline" Architecture: Reduced the main
SKILL.mdsize by 50% by extracting content into a new/referencesdirectory:default-config.md: Full JSON schema for ZFlow configuration.pencil-integration.md: Pencil.dev detection flow and decision logic.phase-resumption.md: Logic for detecting interrupts and state checking.error-handling.md: Unified procedures for phase failures and missing artifacts.quick-reference.md: Naming conventions, checklists, and human gate prompt templates.
- Harness-Agnostic Invocation: Sub-skill calling conventions are now independent of specific AI harnesses (Claude, zClaw, Gemini).
- Standardized Karpathy Preamble: All 34 agent prompts now use a unified inclusion note for `agents/_shared/karpathy-preamble.md`, ensuring behavioral consistency.
- Template Section Classification: 16 templates updated with a three-tier (Required/Expected/Optional) system to reduce boilerplate for simple tasks.
- Abbreviated Brainstorm Mode: Guided path for Trivial tasks reduced to 3-4 targeted questions.
- Design Alignment Logic: Design agents can now operate without Research Reports for "Quick Fix" profiles.
- QA Severity Grading: Improved categorization (Critical, Blocker, Major, Minor, Note) with explicit enforcement rules for loop-backs.
- Security Audit Depth: Standardized `audit_depth` settings across all QA agents.
See the full history in CHANGELOG.md.
MIT — use it, modify it, ship it.