🤖 Agent Orchestrator Framework

A reliability-focused multi-agent LLM orchestration system designed for long-running software workflows with explicit state management, failure handling, and human-in-the-loop escalation.


Status: Active development. Task lifecycle, retry protocol, and HITL escalation are functional. Supports 8 task states, configurable retries (default 4), and persistence via SQLite/PostgreSQL/MySQL.


🎯 What Makes This Different

Most agent frameworks optimize for impressive demos. This one optimizes for systems that run unattended and fail gracefully.

Principle | Implementation
--- | ---
Explicit state | Blackboard architecture: state lives outside the model, in observable task objects
Bounded authority | LLMs propose actions; tools execute through a validation layer with logging
Failure as expected | Phoenix retry protocol with accumulated context; escalates to a human after max retries
Human-in-the-loop | HITL is a designed control path, not an exception handler: runs pause and wait for guidance
Replayable runs | Full task history persisted; conversation logs saved per task for debugging
Isolation | Git worktrees per task: parallel execution without merge conflicts

For the philosophy behind these decisions, see WHY.md.




🌟 Overview

The Agent Orchestrator Framework coordinates multi-step software workflows (planning, building, testing, and integration) using specialized agents with explicit state management and controlled execution.

Core Design

  • Explicit task lifecycle: Tasks move through defined states (planned → ready → active → awaiting_qa → complete/failed)
  • Separation of reasoning and execution: LLMs decide what to do; tools execute through a validation layer
  • Failure handling built-in: Phoenix retry protocol accumulates context across attempts; escalates to human after configurable retries
  • Observable state: All task state, agent logs, and decisions are persisted and queryable
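As a rough illustration of the "LLMs decide, tools execute through a validation layer" split, the sketch below shows one way such a gate could look. `ProposedAction`, `ValidationLayer`, and the `echo` tool are hypothetical names invented for this example; they are not taken from the framework's source.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ProposedAction:
    """What the LLM asks for: a tool name plus keyword arguments."""
    tool: str
    args: dict[str, Any]


@dataclass
class ValidationLayer:
    """Hypothetical gatekeeper: only registered tools run, and every call is logged."""
    tools: dict[str, Callable[..., Any]]
    audit_log: list[str] = field(default_factory=list)

    def execute(self, action: ProposedAction) -> Any:
        # Reject anything that is not explicitly registered.
        if action.tool not in self.tools:
            self.audit_log.append(f"REJECTED {action.tool}")
            raise PermissionError(f"tool not allowed: {action.tool}")
        self.audit_log.append(f"EXECUTED {action.tool}")
        return self.tools[action.tool](**action.args)


layer = ValidationLayer(tools={"echo": lambda text: text.upper()})
result = layer.execute(ProposedAction("echo", {"text": "hi"}))
```

The model never calls a tool directly in this sketch; it only produces a `ProposedAction`, and the layer decides whether to run it.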

Implemented with: LangGraph (state machines), FastAPI (REST/WebSocket), React (dashboard), Git worktrees (isolation)

System Architecture

Layer | Components | Purpose
--- | --- | ---
Control Plane | Director, Dispatch Loop, Phoenix Protocol, HITL | Orchestration, retry logic, human escalation
Execution Plane | Code Worker, Test Worker, Test Architect, Planner, Research Worker, Writer | Domain-specific task execution
Integration Plane | Merge Worker, Git Manager | Code integration, conflict resolution
Safety Layer | Guardian (optional, WIP), Strategist (QA) | Drift detection, quality evaluation

All agents operate on a shared blackboard: state lives outside the model in structured task objects.


✨ Key Features

🔀 Git Worktree Isolation

Each task executes in its own git worktree, enabling:

  • Parallel development without merge conflicts
  • Clean rollback on task failure
  • Automatic merge to main on QA approval
  • LLM-assisted conflict resolution for complex merges
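To make the per-task isolation concrete, here is a minimal sketch that only builds the `git worktree` command lines for creating and removing a task worktree. The `.worktrees/<task_id>` location follows the directory mentioned under Known Limitations; the `task/<task_id>` branch naming and both helper functions are assumptions for illustration, not the framework's actual `git_manager.py` logic.

```python
from pathlib import PurePosixPath


def worktree_add_command(repo_root: str, task_id: str, base: str = "main") -> list[str]:
    """Build `git worktree add` for a new branch checked out in its own directory."""
    path = PurePosixPath(repo_root) / ".worktrees" / task_id
    branch = f"task/{task_id}"  # assumed naming scheme
    return ["git", "-C", repo_root, "worktree", "add", "-b", branch, str(path), base]


def worktree_remove_command(repo_root: str, task_id: str) -> list[str]:
    """Build `git worktree remove --force`, e.g. for rollback after a failed task."""
    path = PurePosixPath(repo_root) / ".worktrees" / task_id
    return ["git", "-C", repo_root, "worktree", "remove", "--force", str(path)]


add_cmd = worktree_add_command("/repo", "task_abc123")
remove_cmd = worktree_remove_command("/repo", "task_abc123")
```

Either list could be passed to `subprocess.run(cmd, check=True)`; the sketch stops short of executing anything so it has no side effects.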

🔄 Phoenix Protocol (Retry System)

Failed tasks don't just error out:

  • Automatic retry with accumulated context
  • Previous attempt history provided to next attempt
  • Configurable max retries per task (default: 4)
  • Human escalation for persistent failures
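The retry behavior listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the framework's implementation: `TaskAttempts` and `run_with_retries` are hypothetical names, and the sketch assumes "max retries" means retries after one initial attempt.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskAttempts:
    max_retries: int = 4                               # documented default
    history: list[str] = field(default_factory=list)   # accumulated failure context


def run_with_retries(attempt: Callable[[list[str]], str], state: TaskAttempts) -> str:
    """Return 'complete' on success, or 'waiting_human' once retries are exhausted."""
    # Assumption: one initial attempt plus up to max_retries retries.
    for _ in range(state.max_retries + 1):
        try:
            attempt(list(state.history))   # each attempt sees all earlier failures
            return "complete"
        except Exception as exc:
            state.history.append(str(exc))
    return "waiting_human"


seen: list[int] = []


def flaky(history: list[str]) -> str:
    """Stand-in worker that fails twice, then succeeds."""
    seen.append(len(history))
    if len(history) < 2:
        raise RuntimeError(f"failure {len(history) + 1}")
    return "ok"


state = TaskAttempts()
outcome = run_with_retries(flaky, state)
```

The point of passing `history` into every attempt is that a retry is not a blind re-run: the next attempt can read what went wrong before.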

👤 Human-in-the-Loop (HITL)

When a task exceeds its retry limit or a human interrupts it:

  • Task pauses and waits for human decision - other tasks continue if dependencies allow
  • Options: Retry (with modifications), Abandon, Spawn New Task
  • Real-time dashboard shows task status and agent logs
  • Seamless resume after human intervention

📊 Real-Time Dashboard

React-based monitoring interface with:

  • Live task graph visualization (DAG with dependency arrows)
  • Agent conversation logs (full LLM chat history)
  • WebSocket-based real-time updates
  • Run management (start, stop, restart, cancel)
  • Visual task dependency creation (click-to-connect/disconnect)

🧠 Task Memory Persistence

  • Full agent conversation history saved per task
  • Survives server restarts
  • Multiple database backends: SQLite (default), PostgreSQL, MySQL
  • Useful for debugging and post-mortem analysis

🚀 Concurrent Execution

  • Configurable number of parallel workers (default: 5)
  • Non-blocking dispatch loop for maximum parallelism
  • Rate-limited API to prevent LLM quota exhaustion
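One common way to cap the number of parallel workers is an `asyncio.Semaphore`. The sketch below assumes that approach purely for illustration; `run_bounded` is a hypothetical function, and the real dispatch loop may bound concurrency differently.

```python
import asyncio


async def run_bounded(task_ids: list[str], max_concurrent_workers: int = 5) -> int:
    """Run all tasks, never more than max_concurrent_workers at once; return the peak."""
    semaphore = asyncio.Semaphore(max_concurrent_workers)
    running = 0
    peak = 0

    async def worker(task_id: str) -> None:
        nonlocal running, peak
        async with semaphore:          # waits here when all slots are taken
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for real agent work
            running -= 1

    await asyncio.gather(*(worker(task_id) for task_id in task_ids))
    return peak


peak = asyncio.run(run_bounded([f"task_{i}" for i in range(12)]))
```

With twelve tasks and the default limit, at most five workers are ever inside the semaphore at the same time.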

πŸ— Architecture

┌──────────────────────────────────────────────────────────────────────────┐
│                        FastAPI Server (server.py)                        │
│  ┌──────────────┐   ┌──────────────┐   ┌───────────────────────────┐     │
│  │  REST API    │   │  WebSocket   │   │  Run Persistence (SQLite) │     │
│  │  /api/v1/*   │   │  Manager     │   │  Checkpointing            │     │
│  └──────────────┘   └──────────────┘   └───────────────────────────┘     │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                Continuous Dispatch Loop (api/dispatch.py)                │
│  ┌────────────┐    ┌───────────────┐    ┌───────────────────────┐        │
│  │  Director  │ →  │  Task Queue   │ →  │  Workers (Parallel)   │        │
│  │  Node      │    │  (Concurrent) │    │  code/test/plan/...   │        │
│  └────────────┘    └───────────────┘    └───────────────────────┘        │
│        │                                            │                    │
│        ▼                                            ▼                    │
│  ┌────────────────────────┐          ┌──────────────────────────┐        │
│  │  Phoenix Protocol      │ ←────────│  Strategist (QA Node)    │        │
│  │  (Retry & Escalation)  │          │  LLM-based evaluation    │        │
│  └────────────────────────┘          └──────────────────────────┘        │
└──────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                  Git Worktree Manager (git_manager.py)                   │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐                 │
│  │  task_abc123  │  │  task_def456  │  │  task_ghi789  │  ...            │
│  │  (worktree)   │  │  (worktree)   │  │  (worktree)   │                 │
│  └───────────────┘  └───────────────┘  └───────────────┘                 │
│                             │                                            │
│                             ▼                                            │
│                      ┌─────────────┐                                     │
│                      │    main     │  ← Merged on QA approval            │
│                      │  (branch)   │                                     │
│                      └─────────────┘                                     │
└──────────────────────────────────────────────────────────────────────────┘

State Flow

  1. Director decomposes objective → creates planner tasks
  2. Planner creates worker tasks with dependencies
  3. Dispatch Loop spawns workers for READY tasks (dependencies satisfied)
  4. Workers execute in isolated worktrees using ReAct agent pattern
  5. Strategist evaluates completed work against acceptance criteria
  6. Director promotes successful tasks, retries failures (Phoenix protocol)
  7. Git Manager merges approved work to main branch
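Step 3 of the flow above hinges on deciding which tasks are READY. A minimal sketch of that check, assuming a task becomes ready once every dependency is `complete`, might look like this. `ready_tasks` and the sample task IDs are hypothetical.

```python
def ready_tasks(statuses: dict[str, str], depends_on: dict[str, set[str]]) -> list[str]:
    """Return planned tasks whose dependencies are all complete, sorted for stable output."""
    return sorted(
        task_id
        for task_id, status in statuses.items()
        if status == "planned"
        and all(statuses.get(dep) == "complete" for dep in depends_on.get(task_id, set()))
    )


statuses = {"plan": "complete", "api": "planned", "ui": "planned", "tests": "planned"}
depends_on = {"api": {"plan"}, "ui": {"plan"}, "tests": {"api", "ui"}}

ready = ready_tasks(statuses, depends_on)
```

Here `api` and `ui` can start in parallel, while `tests` stays planned until both of them complete.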

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+ (for dashboard)
  • Git

Installation

# Clone the repository
git clone https://github.com/maveric/agent-framework.git
cd agent-framework

# Create virtual environment
python -m venv .venv

# Activate (Windows)
.venv\Scripts\activate

# Activate (Unix/macOS)
source .venv/bin/activate

# Install Python dependencies
pip install -r requirements.txt

# Build the dashboard
cd orchestrator-dashboard
npm install
npm run build
cd ..

Configuration

Create a .env file in the project root:

# LLM Providers (at least one required)
OPENROUTER_API_KEY=sk-or-v1-xxxxxxxxxxxxx
ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxxx
OPENAI_API_KEY=sk-xxxxxxxxxxxxx

# For local models (optional)
OLLAMA_BASE_URL=http://localhost:11434/v1

# Web search (for research worker)
TAVILY_API_KEY=tvly-xxxxxxxxxxxxx

# Optional: LangSmith tracing
LANGSMITH_API_KEY=your_key_here
LANGSMITH_PROJECT=agent-orchestrator
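Since at least one LLM provider key is required, a small pre-flight check can catch a missing `.env` entry before the server starts. The helper below is a hypothetical sketch and not part of the framework; it only inspects the three provider variables listed above.

```python
import os
from typing import Mapping, Optional

PROVIDER_KEYS = ("OPENROUTER_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def configured_providers(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the provider key names that are set to a non-empty value."""
    env = os.environ if env is None else env
    return [key for key in PROVIDER_KEYS if env.get(key, "").strip()]


def require_provider(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Raise early if no provider key is configured."""
    found = configured_providers(env)
    if not found:
        raise RuntimeError("Set at least one of: " + ", ".join(PROVIDER_KEYS))
    return found


found = require_provider({"OPENAI_API_KEY": "sk-xxxxxxxxxxxxx", "ANTHROPIC_API_KEY": ""})
```

Passing a plain dict instead of `os.environ` keeps the check easy to test.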

Running

# Start the server (serves both API and dashboard)
python src/server.py

# Open dashboard at http://localhost:8085

Creating Your First Run

  1. Open the dashboard at http://localhost:8085
  2. Click "New Run"
  3. Enter an objective, e.g., "Create a TODO list web app with FastAPI backend and vanilla JS frontend"
  4. Specify a workspace path (where code will be generated)
  5. Click "Start Run"
  6. Watch the agents work in real time!

⚙️ Configuration

Model Configuration

Edit src/config.py to customize which LLM models are used:

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class OrchestratorConfig:
    # Director uses smart model for planning
    director_model: ModelConfig = field(default_factory=lambda: ModelConfig(
        provider="openai",
        model_name="gpt-4.1",
        temperature=0.7
    ))
    
    # Default worker model (can be overridden per worker type)
    worker_model: ModelConfig = field(default_factory=lambda: ModelConfig(
        provider="glm",
        model_name="glm-4.6",
        temperature=0.5
    ))
    
    # Per-worker-type models (optional - falls back to worker_model)
    planner_model: Optional[ModelConfig] = None
    researcher_model: Optional[ModelConfig] = None
    coder_model: Optional[ModelConfig] = None
    tester_model: Optional[ModelConfig] = None
    
    # QA strategist
    strategist_model: ModelConfig = field(default_factory=lambda: ModelConfig(
        provider="openrouter",
        model_name="minimax/minimax-m2",
        temperature=0.3
    ))
    
    # Execution limits
    max_concurrent_workers: int = 5
    max_iterations_per_task: int = 10
    worker_timeout: int = 300  # seconds
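The per-worker-type fallback described in the comments above can be illustrated with a stripped-down version of the two dataclasses. The `model_for` helper is hypothetical, the field set is reduced to what the example needs, and the `coder_model` override values are made up for the demonstration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    provider: str
    model_name: str
    temperature: float = 0.5


@dataclass
class OrchestratorConfig:
    worker_model: ModelConfig
    planner_model: Optional[ModelConfig] = None
    researcher_model: Optional[ModelConfig] = None
    coder_model: Optional[ModelConfig] = None
    tester_model: Optional[ModelConfig] = None


def model_for(config: OrchestratorConfig, worker_type: str) -> ModelConfig:
    """Use '<worker_type>_model' when set, otherwise fall back to worker_model."""
    override = getattr(config, f"{worker_type}_model", None)
    return override or config.worker_model


config = OrchestratorConfig(
    worker_model=ModelConfig("glm", "glm-4.6", 0.5),
    coder_model=ModelConfig("openai", "gpt-4.1", 0.2),  # illustrative override
)
coder = model_for(config, "coder")
tester = model_for(config, "tester")
```

Only the coder gets the override here; the tester silently inherits the default worker model.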

Database Backend

# In config.py
checkpoint_mode: str = "sqlite"  # "sqlite", "postgres", "mysql", or "memory"

# PostgreSQL
postgres_uri: Optional[str] = None  # Falls back to POSTGRES_URI env var

# MySQL
mysql_uri: Optional[str] = None  # Falls back to MYSQL_URI env var
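The "falls back to env var" behavior could be resolved along the following lines. `resolve_database_uri` is a hypothetical helper written for this example; the assumption that `sqlite` and `memory` need no URI is mine, not something the configuration above states.

```python
import os
from typing import Mapping, Optional

ENV_VARS = {"postgres": "POSTGRES_URI", "mysql": "MYSQL_URI"}


def resolve_database_uri(
    checkpoint_mode: str,
    configured_uri: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Prefer the explicit config value, then the matching environment variable."""
    if checkpoint_mode in ("sqlite", "memory"):
        return None  # assumption: these modes need no connection URI
    if checkpoint_mode not in ENV_VARS:
        raise ValueError(f"unknown checkpoint_mode: {checkpoint_mode}")
    env = os.environ if env is None else env
    uri = configured_uri or env.get(ENV_VARS[checkpoint_mode])
    if not uri:
        raise ValueError(f"{checkpoint_mode} selected; set {ENV_VARS[checkpoint_mode]}")
    return uri


uri = resolve_database_uri("postgres", env={"POSTGRES_URI": "postgresql://localhost/orchestrator"})
```

Failing loudly when a backend is selected without a URI avoids a silent fall-through to the wrong database.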

Feature Flags

enable_guardian: bool = False      # Drift detection during execution
enable_webhooks: bool = False      # External notifications - WIP

Supported LLM Providers

Provider | Config Value | Notes
--- | --- | ---
OpenAI | openai | GPT-4, GPT-3.5
Anthropic | anthropic | Claude 3.5, Claude 3
OpenRouter | openrouter | Access to 100+ models
Google | google | Gemini models
GLM | glm | ZhipuAI models
Ollama | local | Local models via Ollama

📊 Dashboard

The dashboard provides real-time visibility into agent operations:

Views

View | Description
--- | ---
Dashboard | List of all runs with status and progress
Run Details | Live task graph, agent logs, model config
New Run | Create new runs with objective and workspace config
Human Queue | Tasks waiting for HITL intervention

Task Graph

  • Nodes = Tasks (color-coded by status)
  • Edges = Dependencies (arrows show "depends on")
  • Click = View task details and agent conversation
  • Link Mode = Click-to-connect/disconnect for adding/removing dependencies

Task Statuses

Status | Color | Description
--- | --- | ---
planned | Gray | Waiting for dependencies
ready | Slate | Dependencies satisfied, ready to run
active | Blue | Currently being executed
awaiting_qa | Orange | Waiting for QA evaluation
complete | Green | Successfully completed
failed | Red | Failed (will retry via Phoenix)
waiting_human | Yellow | Needs HITL intervention
abandoned | Dim | Manually abandoned

🔌 API Reference

All endpoints are prefixed with /api/v1/.

Runs

Method | Endpoint | Description
--- | --- | ---
GET | /runs | List all runs (paginated)
POST | /runs | Create new run
GET | /runs/{id} | Get run details + tasks
POST | /runs/{id}/cancel | Cancel a run in progress
POST | /runs/{id}/restart | Restart from last checkpoint
POST | /runs/{id}/replan | Trigger dependency rebuild
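As a hedged example of calling these endpoints, the sketch below builds (but does not send) requests with the standard library. The base URL uses the port from the Quick Start. The JSON field names `objective` and `workspace` are assumptions based on the "New Run" form, not a documented request schema.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8085/api/v1"


def create_run_request(objective: str, workspace: str) -> Request:
    """Build POST /runs; the body field names are assumed, not confirmed."""
    body = json.dumps({"objective": objective, "workspace": workspace}).encode("utf-8")
    return Request(
        f"{BASE_URL}/runs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def cancel_run_request(run_id: str) -> Request:
    """Build POST /runs/{id}/cancel with an empty body."""
    return Request(f"{BASE_URL}/runs/{run_id}/cancel", data=b"", method="POST")


create_req = create_run_request("Create a TODO list web app", "/tmp/todo-app")
cancel_req = cancel_run_request("abc123")
```

Sending either request is a matter of passing it to `urllib.request.urlopen` while the server is running.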

Tasks

Method | Endpoint | Description
--- | --- | ---
PATCH | /runs/{id}/tasks/{task_id} | Update task dependencies
DELETE | /runs/{id}/tasks/{task_id} | Abandon task + replan

HITL (Human-in-the-Loop)

Method | Endpoint | Description
--- | --- | ---
GET | /runs/{id}/interrupts | Check for pending interrupts
POST | /runs/{id}/resolve | Submit human resolution
POST | /runs/{id}/tasks/{task_id}/interrupt | Force interrupt a task

WebSocket

Endpoint | Description
--- | ---
/ws | Real-time updates (connect with run_id param)
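A client needs a connection URL carrying the run ID. The sketch below assumes `run_id` is passed as a query parameter and that the path is `/ws` exactly as listed (not nested under `/api/v1`); both are assumptions, and `websocket_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def websocket_url(run_id: str, host: str = "localhost", port: int = 8085) -> str:
    """Build the WebSocket URL, URL-encoding the run ID as a query parameter."""
    return f"ws://{host}:{port}/ws?{urlencode({'run_id': run_id})}"


url = websocket_url("abc123")
```

The resulting string can be handed to any WebSocket client library.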

🔧 How It Works

Task Lifecycle

PLANNED → READY → ACTIVE → AWAITING_QA → COMPLETE
             ↓                    ↓
          (blocked)            FAILED
                                 ↓
                          (Phoenix retry)
                                 ↓
                          WAITING_HUMAN
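The lifecycle can be modeled as an explicit transition table. The status names below come from the Task Statuses table; the exact set of allowed edges is inferred from the diagram and the HITL options, so treat `ALLOWED` and `transition` as an illustrative assumption and not the framework's actual rules.

```python
from enum import Enum


class TaskStatus(str, Enum):
    PLANNED = "planned"
    READY = "ready"
    ACTIVE = "active"
    AWAITING_QA = "awaiting_qa"
    COMPLETE = "complete"
    FAILED = "failed"
    WAITING_HUMAN = "waiting_human"
    ABANDONED = "abandoned"


S = TaskStatus

# Inferred edges: failures go back to READY (Phoenix retry) or escalate to a human.
ALLOWED: dict[TaskStatus, set[TaskStatus]] = {
    S.PLANNED: {S.READY, S.ABANDONED},
    S.READY: {S.ACTIVE, S.ABANDONED},
    S.ACTIVE: {S.AWAITING_QA, S.FAILED, S.WAITING_HUMAN},
    S.AWAITING_QA: {S.COMPLETE, S.FAILED},
    S.FAILED: {S.READY, S.WAITING_HUMAN},
    S.WAITING_HUMAN: {S.READY, S.ABANDONED},
    S.COMPLETE: set(),
    S.ABANDONED: set(),
}


def transition(current: TaskStatus, target: TaskStatus) -> TaskStatus:
    """Return the new status, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


status = transition(S.PLANNED, S.READY)
```

Keeping the table in one place makes illegal moves, such as reactivating a completed task, fail loudly instead of corrupting state.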

Worker Execution (ReAct Pattern)

Each worker uses a ReAct (Reasoning + Acting) agent loop:

  1. Reason: LLM analyzes the task and decides next action
  2. Act: Execute a tool (write_file, run_shell, search, etc.)
  3. Observe: See tool output
  4. Repeat until task is complete
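The four steps above can be sketched as a small loop. This is a toy version with a scripted decision function standing in for the LLM. `react_loop`, the `finish` pseudo-tool, and the in-memory `write_file` are hypothetical, and the iteration cap mirrors `max_iterations_per_task` from the configuration section.

```python
from typing import Any, Callable

Action = dict[str, Any]


def react_loop(
    decide: Callable[[str, list[str]], Action],
    tools: dict[str, Callable[..., Any]],
    task: str,
    max_iterations: int = 10,
) -> list[str]:
    """Reason, act, observe, and repeat until 'finish' or the iteration cap."""
    observations: list[str] = []
    for _ in range(max_iterations):
        action = decide(task, observations)                        # 1. Reason
        if action["tool"] == "finish":
            return observations
        output = tools[action["tool"]](**action.get("args", {}))   # 2. Act
        observations.append(str(output))                           # 3. Observe
    raise RuntimeError("iteration limit reached before the task finished")


files: dict[str, str] = {}


def write_file(path: str, content: str) -> str:
    """In-memory stand-in for a real filesystem tool."""
    files[path] = content
    return f"wrote {len(content)} chars to {path}"


def scripted_decide(task: str, observations: list[str]) -> Action:
    """Stand-in for the LLM: write one file, then finish."""
    if not observations:
        return {"tool": "write_file", "args": {"path": "hello.txt", "content": task}}
    return {"tool": "finish"}


trace = react_loop(scripted_decide, {"write_file": write_file}, "hello")
```

In a real worker, `decide` would be an LLM call that receives the task and the observation history.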

Available Tools

Tools use a progressive disclosure system: workers request only what they need.

Category | Tools
--- | ---
Filesystem | read_file, write_file, append_file, list_directory, delete_file
Code Execution | run_shell, run_python
Git | git_status, git_commit, git_diff, git_log
Search | search_codebase, web_search (Tavily)
Framework | create_subtasks, post_insight, log_design_decision
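One plausible reading of progressive disclosure is that a worker asks for tool categories and only receives the tools in those categories. The sketch below assumes that reading; the catalog mirrors the tool table, while `disclose` and the lowercase category keys are hypothetical.

```python
TOOL_CATALOG: dict[str, list[str]] = {
    "filesystem": ["read_file", "write_file", "append_file", "list_directory", "delete_file"],
    "code_execution": ["run_shell", "run_python"],
    "git": ["git_status", "git_commit", "git_diff", "git_log"],
    "search": ["search_codebase", "web_search"],
    "framework": ["create_subtasks", "post_insight", "log_design_decision"],
}


def disclose(requested_categories: list[str]) -> list[str]:
    """Return only the tools in the requested categories, in request order."""
    unknown = [c for c in requested_categories if c not in TOOL_CATALOG]
    if unknown:
        raise KeyError(f"unknown tool categories: {unknown}")
    return [tool for c in requested_categories for tool in TOOL_CATALOG[c]]


git_tools = disclose(["git"])
```

A research-style worker might request only `["search"]`, keeping shell and filesystem tools out of its prompt entirely.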

πŸ“ Project Structure

agent-framework/
├── src/
│   ├── server.py              # FastAPI app, lifespan, mounts (MAIN ENTRY)
│   ├── config.py              # Configuration dataclasses
│   ├── api/
│   │   ├── dispatch.py        # Continuous dispatch loop (core engine)
│   │   ├── state.py           # Shared API state
│   │   ├── types.py           # API type definitions
│   │   ├── websocket.py       # WebSocket connection manager
│   │   └── routes/
│   │       ├── runs.py        # Run CRUD endpoints
│   │       ├── tasks.py       # Task endpoints
│   │       ├── interrupts.py  # HITL endpoints
│   │       ├── metrics.py     # Prometheus metrics endpoint
│   │       └── ws.py          # WebSocket endpoint
│   ├── nodes/
│   │   ├── director_main.py   # Director orchestration logic
│   │   ├── director/          # Director helper modules
│   │   │   ├── decomposition.py
│   │   │   ├── integration.py
│   │   │   ├── readiness.py
│   │   │   ├── hitl.py
│   │   │   └── graph_utils.py
│   │   ├── worker.py          # Worker node entry point
│   │   ├── execution.py       # ReAct loop execution
│   │   ├── guardian.py        # Drift detection (optional)
│   │   ├── strategist.py      # QA evaluation node
│   │   ├── handlers/          # Worker profile handlers
│   │   │   ├── code_handler.py
│   │   │   ├── plan_handler.py
│   │   │   ├── test_handler.py
│   │   │   ├── test_architect_handler.py
│   │   │   ├── research_handler.py
│   │   │   ├── write_handler.py
│   │   │   └── merge_handler.py
│   │   └── tools_binding.py   # Tool wrappers for agents
│   ├── tools/                 # Tool implementations
│   │   ├── base.py            # Tool registry & definitions
│   │   ├── filesystem_async.py
│   │   ├── code_execution_async.py
│   │   ├── git_async.py
│   │   └── search_tools.py
│   ├── git_manager.py         # Worktree management
│   ├── llm_client.py          # Multi-provider LLM client
│   ├── state.py               # State definition and reducers
│   ├── orchestrator_types.py  # Core type definitions
│   ├── run_persistence.py     # Database state persistence
│   ├── task_queue.py          # Async task queue
│   ├── metrics.py             # Prometheus metrics
│   └── async_utils.py         # Async helper utilities
│
├── orchestrator-dashboard/    # React frontend
│   ├── src/
│   │   ├── pages/
│   │   │   ├── Dashboard.tsx
│   │   │   ├── RunDetails.tsx
│   │   │   ├── NewRun.tsx
│   │   │   └── HumanQueue.tsx
│   │   ├── components/
│   │   │   ├── TaskGraph.tsx  # DAG visualization
│   │   │   ├── InterruptModal.tsx
│   │   │   └── run-details/   # RunDetails subcomponents
│   │   └── api/
│   │       ├── client.ts
│   │       └── websocket.ts
│
├── Spec/                      # Design documentation
│   ├── agent_orchestrator_spec_v2.3.md
│   ├── dashboard_frontend_spec.md
│   └── future/                # Planned features
│
├── tests/
│   ├── unit/                  # Unit tests
│   └── test_task_memories.py  # Integration tests
│
├── docs/                      # Additional documentation
├── requirements.txt
└── README.md

🛠 Development

Running Tests

# Run all tests
pytest tests/

# Run with coverage
pytest tests/ --cov=src --cov-report=html

# Run specific test file
pytest tests/unit/test_state_reducers.py -v

Development Server

# Run backend with auto-reload
uvicorn src.server:app --reload --port 8085

# Run frontend dev server (separate terminal)
cd orchestrator-dashboard
npm run dev  # Runs on port 2999

Code Quality

# Type checking (future)
mypy src/

# Linting (future)
ruff check src/

🤝 Contributing

This is an active research project. Key areas for contribution:

High Priority

  • Deep task cancellation (subprocess tracking)
  • Improved conflict resolution strategies
  • Multi-project workspace support
  • Streaming LLM responses to UI

Medium Priority

  • Agent memory compression for long contexts
  • Task cost estimation and budgeting
  • Plugin system for custom tools
  • Kubernetes deployment manifests

Low Priority

  • Alternative state backends (e.g., Redis)
  • Multi-user authentication
  • Project templates/scaffolding

Non-goals

  • Not a general AGI sandbox
  • Not optimizing for clever prompting
  • Not agent-to-agent roleplay chat
  • Not a turnkey SaaS / multi-user system (yet)

⚠️ Known Limitations

  1. Database Size: orchestrator.db can grow large with many runs. Periodic cleanup recommended.

  2. LLM Costs: Each agent iteration makes LLM calls. Monitor your API usage.

  3. Blocking Commands: Agents occasionally run blocking commands (servers, watchers). Force-interrupt may be needed.

  4. Worktree Cleanup: Old worktrees in .worktrees/ can accumulate. Manual cleanup may be needed.


📄 License

MIT License - See LICENSE for details.


πŸ™ Acknowledgments

For questions or support, please open an issue on GitHub.
