A reliability-focused multi-agent LLM orchestration system designed for long-running software workflows with explicit state management, failure handling, and human-in-the-loop escalation.
Status: Active development. Task lifecycle, retry protocol, and HITL escalation are functional. Supports 7 task states, configurable retries (default 4), and persistence via SQLite/PostgreSQL/MySQL.
Most agent frameworks optimize for impressive demos. This one optimizes for systems that run unattended and fail gracefully.
| Principle | Implementation |
|---|---|
| Explicit state | Blackboard architecture: state lives outside the model, in observable task objects |
| Bounded authority | LLMs propose actions; tools execute through validation layer with logging |
| Failure as expected | Phoenix retry protocol with accumulated context; escalates to human after max retries |
| Human-in-the-loop | HITL is a designed control path, not an exception handler; runs pause and wait for guidance |
| Replayable runs | Full task history persisted; conversation logs saved per task for debugging |
| Isolation | Git worktrees per task: parallel execution without merge conflicts |
For the philosophy behind these decisions, see WHY.md.
- Overview
- Key Features
- Architecture
- Quick Start
- Configuration
- Dashboard
- API Reference
- How It Works
- Project Structure
- Development
- Contributing
The Agent Orchestrator Framework coordinates multi-step software workflows (planning, building, testing, and integration) using specialized agents with explicit state management and controlled execution.
- Explicit task lifecycle: Tasks move through defined states (planned → ready → active → awaiting_qa → complete/failed)
- Separation of reasoning and execution: LLMs decide what to do; tools execute through a validation layer
- Failure handling built-in: Phoenix retry protocol accumulates context across attempts; escalates to human after configurable retries
- Observable state: All task state, agent logs, and decisions are persisted and queryable
Implemented with: LangGraph (state machines), FastAPI (REST/WebSocket), React (dashboard), Git worktrees (isolation)
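The explicit lifecycle can be sketched as a tiny state machine. The enum values mirror the states listed above, but the transition table and `advance` helper are illustrative, not the framework's actual code (the real system also has `waiting_human` and `abandoned` states for HITL flows):

```python
from enum import Enum

class TaskState(str, Enum):
    PLANNED = "planned"
    READY = "ready"
    ACTIVE = "active"
    AWAITING_QA = "awaiting_qa"
    COMPLETE = "complete"
    FAILED = "failed"

# Legal transitions (illustrative sketch of the lifecycle above)
TRANSITIONS = {
    TaskState.PLANNED: {TaskState.READY},
    TaskState.READY: {TaskState.ACTIVE},
    TaskState.ACTIVE: {TaskState.AWAITING_QA, TaskState.FAILED},
    TaskState.AWAITING_QA: {TaskState.COMPLETE, TaskState.FAILED},
    TaskState.FAILED: {TaskState.READY},  # Phoenix retry re-queues the task
}

def advance(current: TaskState, target: TaskState) -> TaskState:
    """Validate a state transition before committing it to the blackboard."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Keeping state in a structure like this (rather than in the model's context) is what makes runs observable and replayable.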
| Layer | Components | Purpose |
|---|---|---|
| Control Plane | Director, Dispatch Loop, Phoenix Protocol, HITL | Orchestration, retry logic, human escalation |
| Execution Plane | Code Worker, Test Worker, Test Architect, Planner, Research Worker, Writer | Domain-specific task execution |
| Integration Plane | Merge Worker, Git Manager | Code integration, conflict resolution |
| Safety Layer | Guardian (optional, WIP), Strategist (QA) | Drift detection, quality evaluation |
All agents operate on a shared blackboard: state lives outside the model in structured task objects.
Each task executes in its own git worktree, enabling:
- Parallel development without merge conflicts
- Clean rollback on task failure
- Automatic merge to main on QA approval
- LLM-assisted conflict resolution for complex merges
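Under the hood, this isolation maps onto plain `git worktree` commands. The helper below is an illustrative sketch (the `.worktrees/` path and `task/` branch naming are assumptions), not the framework's actual `git_manager.py`:

```python
import subprocess
from pathlib import Path

def create_task_worktree(repo: Path, task_id: str) -> Path:
    """Create an isolated worktree and branch for one task."""
    path = repo / ".worktrees" / f"task_{task_id}"
    path.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add", "-b", f"task/{task_id}", str(path)],
        check=True, capture_output=True,
    )
    return path

def merge_task(repo: Path, task_id: str) -> None:
    """On QA approval: merge the task branch into the current branch, drop the worktree."""
    subprocess.run(
        ["git", "-C", str(repo), "merge", "--no-ff", "-m",
         f"merge task/{task_id}", f"task/{task_id}"],
        check=True, capture_output=True,
    )
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "remove",
         str(repo / ".worktrees" / f"task_{task_id}")],
        check=True, capture_output=True,
    )
```

Because each worktree is a full checkout on its own branch, a failed task is rolled back by simply removing the worktree and deleting the branch.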
Failed tasks don't just error out:
- Automatic retry with accumulated context
- Previous attempt history provided to next attempt
- Configurable max retries per task (default: 4)
- Human escalation for persistent failures
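The Phoenix flow can be sketched as below. `TaskRecord`, `run_attempt`, and the status strings are illustrative stand-ins for the framework's internals, not its real API:

```python
from dataclasses import dataclass, field

@dataclass
class TaskRecord:
    description: str
    max_retries: int = 4                            # matches the framework default
    attempts: list = field(default_factory=list)    # accumulated failure context
    status: str = "ready"

def phoenix_step(task: TaskRecord, run_attempt) -> str:
    """One Phoenix cycle: run, record failure context, escalate past the limit."""
    # Prior failures are fed to the next attempt so it can avoid repeating them
    context = {"description": task.description, "prior_failures": list(task.attempts)}
    ok, detail = run_attempt(context)
    if ok:
        task.status = "complete"
    else:
        task.attempts.append(detail)
        # Escalate to a human once the configured retry budget is spent
        task.status = "waiting_human" if len(task.attempts) >= task.max_retries else "ready"
    return task.status
```

The key idea is that each retry sees the full failure history, rather than starting from a blank slate.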
When tasks exceed retry limits or are interrupted by human intervention:
- Task pauses and waits for a human decision; other tasks continue if dependencies allow
- Options: Retry (with modifications), Abandon, Spawn New Task
- Real-time dashboard shows task status and agent logs
- Seamless resume after human intervention
React-based monitoring interface with:
- Live task graph visualization (DAG with dependency arrows)
- Agent conversation logs (full LLM chat history)
- WebSocket-based real-time updates
- Run management (start, stop, restart, cancel)
- Visual task dependency creation (click-to-connect/disconnect)
- Full agent conversation history saved per task
- Survives server restarts
- Multiple database backends: SQLite (default), PostgreSQL, MySQL
- Useful for debugging and post-mortem analysis
- Configurable number of parallel workers (default: 5)
- Non-blocking dispatch loop for maximum parallelism
- Rate-limited API to prevent LLM quota exhaustion
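A bounded dispatch loop of this shape is commonly built on `asyncio.Semaphore`. The sketch below is illustrative (the `min_interval` throttle is a crude stand-in for real rate limiting), not the framework's actual `api/dispatch.py`:

```python
import asyncio

async def dispatch(tasks, worker, max_workers: int = 5, min_interval: float = 0.0):
    """Run tasks concurrently, capped at max_workers, spacing out task starts."""
    sem = asyncio.Semaphore(max_workers)
    lock = asyncio.Lock()

    async def run(task):
        async with sem:                            # cap concurrent workers
            async with lock:
                await asyncio.sleep(min_interval)  # serialize starts to throttle LLM calls
            return await worker(task)

    # Non-blocking: everything is scheduled at once; the semaphore gates execution
    return await asyncio.gather(*(run(t) for t in tasks))
```

`gather` preserves input order, so results line up with the submitted tasks even though execution interleaves.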
```
┌──────────────────────────────────────────────────────────────────────┐
│                      FastAPI Server (server.py)                      │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────────┐   │
│  │  REST API    │  │  WebSocket   │  │ Run Persistence (SQLite)  │   │
│  │  /api/v1/*   │  │  Manager     │  │ Checkpointing             │   │
│  └──────────────┘  └──────────────┘  └───────────────────────────┘   │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│              Continuous Dispatch Loop (api/dispatch.py)              │
│  ┌────────────┐   ┌───────────────┐   ┌───────────────────────┐      │
│  │  Director  │ → │  Task Queue   │ → │  Workers (Parallel)   │      │
│  │  Node      │   │  (Concurrent) │   │  code/test/plan/...   │      │
│  └────────────┘   └───────────────┘   └───────────────────────┘      │
│        │                                        │                    │
│        ▼                                        ▼                    │
│  ┌────────────────────────┐        ┌──────────────────────────┐      │
│  │  Phoenix Protocol      │ ←───── │  Strategist (QA Node)    │      │
│  │  (Retry & Escalation)  │        │  LLM-based evaluation    │      │
│  └────────────────────────┘        └──────────────────────────┘      │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                 Git Worktree Manager (git_manager.py)                │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐                 │
│  │ task_abc123 │   │ task_def456 │   │ task_ghi789 │  ...            │
│  │ (worktree)  │   │ (worktree)  │   │ (worktree)  │                 │
│  └─────────────┘   └─────────────┘   └─────────────┘                 │
│                           │                                          │
│                           ▼                                          │
│                    ┌───────────┐                                     │
│                    │   main    │  ← Merged on QA approval            │
│                    │ (branch)  │                                     │
│                    └───────────┘                                     │
└──────────────────────────────────────────────────────────────────────┘
```
- Director decomposes objective β creates planner tasks
- Planner creates worker tasks with dependencies
- Dispatch Loop spawns workers for READY tasks (dependencies satisfied)
- Workers execute in isolated worktrees using ReAct agent pattern
- Strategist evaluates completed work against acceptance criteria
- Director promotes successful tasks, retries failures (Phoenix protocol)
- Git Manager merges approved work to main branch
- Python 3.11+
- Node.js 18+ (for dashboard)
- Git
```bash
# Clone the repository
git clone https://github.com/maveric/agent-framework.git
cd agent-framework

# Create virtual environment
python -m venv .venv

# Activate (Windows)
.venv\Scripts\activate

# Activate (Unix/macOS)
source .venv/bin/activate

# Install Python dependencies
pip install -r requirements.txt

# Build the dashboard
cd orchestrator-dashboard
npm install
npm run build
cd ..
```

Create a `.env` file in the project root:
```
# LLM Providers (at least one required)
OPENROUTER_API_KEY=sk-or-v1-xxxxxxxxxxxxx
ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxxx
OPENAI_API_KEY=sk-xxxxxxxxxxxxx

# For local models (optional)
OLLAMA_BASE_URL=http://localhost:11434/v1

# Web search (for research worker)
TAVILY_API_KEY=tvly-xxxxxxxxxxxxx

# Optional: LangSmith tracing
LANGSMITH_API_KEY=your_key_here
LANGSMITH_PROJECT=agent-orchestrator
```

```bash
# Start the server (serves both API and dashboard)
python src/server.py

# Open dashboard at http://localhost:8085
```

- Open the dashboard at http://localhost:8085
- Click "New Run"
- Enter an objective, e.g., "Create a TODO list web app with FastAPI backend and vanilla JS frontend"
- Specify a workspace path (where code will be generated)
- Click "Start Run"
- Watch the agents work in real-time!
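Runs can also be created programmatically against the REST API. The `POST /api/v1/runs` endpoint is listed in the API Reference, but the payload field names below are assumptions; check the server's request schema before relying on them:

```python
import json
import urllib.request

BASE = "http://localhost:8085/api/v1"

def build_run_payload(objective: str, workspace: str) -> dict:
    # Field names are illustrative; consult the /runs endpoint schema
    return {"objective": objective, "workspace_path": workspace}

def create_run(objective: str, workspace: str) -> dict:
    """POST a new run to the orchestrator (requires the server to be running)."""
    req = urllib.request.Request(
        f"{BASE}/runs",
        data=json.dumps(build_run_payload(objective, workspace)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```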
Edit `src/config.py` to customize which LLM models are used:

```python
@dataclass
class OrchestratorConfig:
    # Director uses a smart model for planning
    director_model: ModelConfig = field(default_factory=lambda: ModelConfig(
        provider="openai",
        model_name="gpt-4.1",
        temperature=0.7,
    ))

    # Default worker model (can be overridden per worker type)
    worker_model: ModelConfig = field(default_factory=lambda: ModelConfig(
        provider="glm",
        model_name="glm-4.6",
        temperature=0.5,
    ))

    # Per-worker-type models (optional; falls back to worker_model)
    planner_model: Optional[ModelConfig] = None
    researcher_model: Optional[ModelConfig] = None
    coder_model: Optional[ModelConfig] = None
    tester_model: Optional[ModelConfig] = None

    # QA strategist
    strategist_model: ModelConfig = field(default_factory=lambda: ModelConfig(
        provider="openrouter",
        model_name="minimax/minimax-m2",
        temperature=0.3,
    ))

    # Execution limits
    max_concurrent_workers: int = 5
    max_iterations_per_task: int = 10
    worker_timeout: int = 300  # seconds
```
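The per-worker fallback rule ("falls back to worker_model") amounts to an override-or-default lookup. The stand-in types below are minimal, illustrative versions of the real `ModelConfig`/`OrchestratorConfig`, just to show the pattern:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:
    provider: str
    model_name: str
    temperature: float = 0.5

@dataclass
class Config:
    worker_model: ModelConfig
    coder_model: Optional[ModelConfig] = None

def model_for(config: Config, worker_type: str) -> ModelConfig:
    # Per-worker override wins; otherwise fall back to the default worker model
    override = getattr(config, f"{worker_type}_model", None)
    return override or config.worker_model
```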
```python
# In config.py
checkpoint_mode: str = "sqlite"  # "sqlite", "postgres", "mysql", or "memory"

# PostgreSQL
postgres_uri: Optional[str] = None  # Falls back to POSTGRES_URI env var

# MySQL
mysql_uri: Optional[str] = None  # Falls back to MYSQL_URI env var
```

```python
enable_guardian: bool = False  # Drift detection during execution
enable_webhooks: bool = False  # External notifications (WIP)
```

| Provider | Config Value | Notes |
|---|---|---|
| OpenAI | `openai` | GPT-4, GPT-3.5 |
| Anthropic | `anthropic` | Claude 3.5, Claude 3 |
| OpenRouter | `openrouter` | Access to 100+ models |
| Google | `google` | Gemini models |
| GLM | `glm` | ZhipuAI models |
| Ollama | `local` | Local models via Ollama |
The dashboard provides real-time visibility into agent operations:
| View | Description |
|---|---|
| Dashboard | List of all runs with status and progress |
| Run Details | Live task graph, agent logs, model config |
| New Run | Create new runs with objective and workspace config |
| Human Queue | Tasks waiting for HITL intervention |
- Nodes = Tasks (color-coded by status)
- Edges = Dependencies (arrows show "depends on")
- Click = View task details and agent conversation
- Link Mode = Click-to-connect/disconnect for adding/removing dependencies
| Status | Color | Description |
|---|---|---|
| `planned` | Gray | Waiting for dependencies |
| `ready` | Slate | Dependencies satisfied, ready to run |
| `active` | Blue | Currently being executed |
| `awaiting_qa` | Orange | Waiting for QA evaluation |
| `complete` | Green | Successfully completed |
| `failed` | Red | Failed (will retry via Phoenix) |
| `waiting_human` | Yellow | Needs HITL intervention |
| `abandoned` | Dim | Manually abandoned |
All endpoints are prefixed with `/api/v1/`.
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/runs` | List all runs (paginated) |
| `POST` | `/runs` | Create new run |
| `GET` | `/runs/{id}` | Get run details + tasks |
| `POST` | `/runs/{id}/cancel` | Cancel a run |
| `POST` | `/runs/{id}/restart` | Restart from last checkpoint |
| `POST` | `/runs/{id}/replan` | Trigger dependency rebuild |
| Method | Endpoint | Description |
|---|---|---|
| `PATCH` | `/runs/{id}/tasks/{task_id}` | Update task dependencies |
| `DELETE` | `/runs/{id}/tasks/{task_id}` | Abandon task + replan |
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/runs/{id}/interrupts` | Check for pending interrupts |
| `POST` | `/runs/{id}/resolve` | Submit human resolution |
| `POST` | `/runs/{id}/tasks/{task_id}/interrupt` | Force interrupt a task |
| Endpoint | Description |
|---|---|
| `/ws` | Real-time updates (connect with `run_id` param) |
```
PLANNED → READY → ACTIVE → AWAITING_QA → COMPLETE
   ↑                            │
(blocked)                     FAILED
                                │
                        (Phoenix retry)
                                │
                         WAITING_HUMAN
```
Each worker uses a ReAct (Reasoning + Acting) agent loop:
- Reason: LLM analyzes the task and decides next action
- Act: Execute a tool (write_file, run_shell, search, etc.)
- Observe: See tool output
- Repeat until task is complete
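A minimal version of that loop, with a scripted stand-in for the LLM and a plain dict as the tool registry (both illustrative, not the framework's `execution.py`):

```python
def react_loop(llm, tools: dict, task: str, max_iterations: int = 10):
    """Reason -> Act -> Observe until the model signals completion."""
    history = [f"Task: {task}"]
    for _ in range(max_iterations):
        # Reason: the model picks the next action from the transcript so far
        action = llm("\n".join(history))
        if action["tool"] == "finish":
            return action["args"]
        # Act: execute the chosen tool through the registry
        observation = tools[action["tool"]](**action["args"])
        # Observe: feed the result back for the next reasoning step
        history.append(f"{action['tool']} -> {observation}")
    raise TimeoutError("max_iterations exceeded")
```

The iteration cap corresponds to the `max_iterations_per_task` limit in the configuration.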
Tools use a progressive disclosure system: workers request only what they need.
| Category | Tools |
|---|---|
| Filesystem | `read_file`, `write_file`, `append_file`, `list_directory`, `delete_file` |
| Code Execution | `run_shell`, `run_python` |
| Git | `git_status`, `git_commit`, `git_diff`, `git_log` |
| Search | `search_codebase`, `web_search` (Tavily) |
| Framework | `create_subtasks`, `post_insight`, `log_design_decision` |
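One way to model progressive disclosure is a category-keyed registry from which each worker requests only the groups it needs. The registry contents mirror the table above; the function itself is an illustrative sketch, not the framework's `tools/base.py`:

```python
REGISTRY = {
    "filesystem": ["read_file", "write_file", "append_file", "list_directory", "delete_file"],
    "code_execution": ["run_shell", "run_python"],
    "git": ["git_status", "git_commit", "git_diff", "git_log"],
    "search": ["search_codebase", "web_search"],
    "framework": ["create_subtasks", "post_insight", "log_design_decision"],
}

def tools_for(categories: list[str]) -> list[str]:
    """Return only the tool names a worker has asked for."""
    unknown = set(categories) - REGISTRY.keys()
    if unknown:
        raise KeyError(f"unknown tool categories: {sorted(unknown)}")
    return [name for cat in categories for name in REGISTRY[cat]]
```

Keeping the exposed tool surface small reduces prompt size and narrows what each worker can do.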
```
agent-framework/
├── src/
│   ├── server.py               # FastAPI app, lifespan, mounts (MAIN ENTRY)
│   ├── config.py               # Configuration dataclasses
│   ├── api/
│   │   ├── dispatch.py         # Continuous dispatch loop (core engine)
│   │   ├── state.py            # Shared API state
│   │   ├── types.py            # API type definitions
│   │   ├── websocket.py        # WebSocket connection manager
│   │   └── routes/
│   │       ├── runs.py         # Run CRUD endpoints
│   │       ├── tasks.py        # Task endpoints
│   │       ├── interrupts.py   # HITL endpoints
│   │       ├── metrics.py      # Prometheus metrics endpoint
│   │       └── ws.py           # WebSocket endpoint
│   ├── nodes/
│   │   ├── director_main.py    # Director orchestration logic
│   │   ├── director/           # Director helper modules
│   │   │   ├── decomposition.py
│   │   │   ├── integration.py
│   │   │   ├── readiness.py
│   │   │   ├── hitl.py
│   │   │   └── graph_utils.py
│   │   ├── worker.py           # Worker node entry point
│   │   ├── execution.py        # ReAct loop execution
│   │   ├── guardian.py         # Drift detection (optional)
│   │   ├── strategist.py       # QA evaluation node
│   │   ├── handlers/           # Worker profile handlers
│   │   │   ├── code_handler.py
│   │   │   ├── plan_handler.py
│   │   │   ├── test_handler.py
│   │   │   ├── test_architect_handler.py
│   │   │   ├── research_handler.py
│   │   │   ├── write_handler.py
│   │   │   └── merge_handler.py
│   │   └── tools_binding.py    # Tool wrappers for agents
│   ├── tools/                  # Tool implementations
│   │   ├── base.py             # Tool registry & definitions
│   │   ├── filesystem_async.py
│   │   ├── code_execution_async.py
│   │   ├── git_async.py
│   │   └── search_tools.py
│   ├── git_manager.py          # Worktree management
│   ├── llm_client.py           # Multi-provider LLM client
│   ├── state.py                # State definition and reducers
│   ├── orchestrator_types.py   # Core type definitions
│   ├── run_persistence.py      # Database state persistence
│   ├── task_queue.py           # Async task queue
│   ├── metrics.py              # Prometheus metrics
│   └── async_utils.py          # Async helper utilities
│
├── orchestrator-dashboard/     # React frontend
│   └── src/
│       ├── pages/
│       │   ├── Dashboard.tsx
│       │   ├── RunDetails.tsx
│       │   ├── NewRun.tsx
│       │   └── HumanQueue.tsx
│       ├── components/
│       │   ├── TaskGraph.tsx   # DAG visualization
│       │   ├── InterruptModal.tsx
│       │   └── run-details/    # RunDetails subcomponents
│       └── api/
│           ├── client.ts
│           └── websocket.ts
│
├── Spec/                       # Design documentation
│   ├── agent_orchestrator_spec_v2.3.md
│   ├── dashboard_frontend_spec.md
│   └── future/                 # Planned features
│
├── tests/
│   ├── unit/                   # Unit tests
│   └── test_task_memories.py   # Integration tests
│
├── docs/                       # Additional documentation
├── requirements.txt
└── README.md
```
```bash
# Run all tests
pytest tests/

# Run with coverage
pytest tests/ --cov=src --cov-report=html

# Run specific test file
pytest tests/unit/test_state_reducers.py -v
```

```bash
# Run backend with auto-reload
uvicorn src.server:app --reload --port 8085

# Run frontend dev server (separate terminal)
cd orchestrator-dashboard
npm run dev  # Runs on port 2999
```

```bash
# Type checking (future)
mypy src/

# Linting (future)
ruff check src/
```

This is an active research project. Key areas for contribution:
- Deep task cancellation (subprocess tracking)
- Improved conflict resolution strategies
- Multi-project workspace support
- Streaming LLM responses to UI
- Agent memory compression for long contexts
- Task cost estimation and budgeting
- Plugin system for custom tools
- Kubernetes deployment manifests
- Alternative state backends (Redis, Postgres)
- Multi-user authentication
- Project templates/scaffolding
- Not a general AGI sandbox
- Not optimizing for clever prompting
- Not agent-to-agent roleplay chat
- Not a turnkey SaaS / multi-user system (yet)
- **Database Size**: `orchestrator.db` can grow large with many runs. Periodic cleanup recommended.
- **LLM Costs**: Each agent iteration makes LLM calls. Monitor your API usage.
- **Blocking Commands**: Agents occasionally run blocking commands (servers, watchers). A force-interrupt may be needed.
- **Worktree Cleanup**: Old worktrees in `.worktrees/` can accumulate. Manual cleanup may be needed.
MIT License - See LICENSE for details.
Built with LangGraph, FastAPI, and React.
For questions or support, please open an issue on GitHub.