Your AI development team.
European Portuguese for "team" — a self-improving AI agent orchestrator that builds, reviews, tests, and secures your code.
Software development takes dedication, perseverance, and knowledge. No tool changes that. What EQUIPA does is multiply your productivity — it handles the repetitive, parallelizable parts of the workflow so you can focus on the hard problems that actually need a human brain.
You talk to Claude. Describe what you want in plain English — "fix the login bug", "add search to the dashboard", "run a security review" — and Claude handles the rest. It creates tasks, dispatches specialized AI agents, monitors their work, retries on failure, and reports results back to you.
Then it gets better at its job. A three-layer self-improvement system benchmarks agent performance, evolves prompts using genetic optimization, and auto-rolls back changes that hurt results. Your agents tomorrow are measurably better than your agents today.
This is a productivity tool, not a magic wand. You still need to review the output, understand your codebase, and make the real decisions. EQUIPA just means you're not doing the grunt work alone.
You: "Build user authentication with Google OAuth"
EQUIPA:
Planning: broke feature into 5 tasks with dependency graph
Tasks 1-3 dispatched in parallel (no dependencies)
Task 4 waiting on task 2 (needs routes)
Task 5 waiting on all (integration tests)
Developer agent -> wrote OAuth config, routes, middleware, UI
Tester agent -> 8 integration tests, all passing
Security reviewer -> flagged session token rotation issue
Done. 3 tasks passed first try, 2 needed one retry.
You direct. EQUIPA executes. You review and decide what ships.
Every coding task runs through a developer -> tester cycle. If tests fail, the developer gets the failure context and tries again — up to 5 cycles. No human babysitting required.
Three systems work together:
- ForgeSmith extracts lessons from failures and tunes configuration
- GEPA evolves agent prompts through CMA-ES evolutionary optimization
- SIMBA synthesizes behavioral rules from recurring failure patterns
Bad changes get auto-rolled back when effectiveness scores drop below threshold.
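The rollback rule amounts to a simple regression check. A minimal sketch, assuming a ratio threshold (the 0.9 value and the scoring shape are illustrative, not ForgeSmith's actual criteria):

```python
# Illustrative auto-rollback: revert a prompt/config change when measured
# effectiveness drops below a fraction of the previous score.
# ROLLBACK_THRESHOLD is an assumption, not EQUIPA's real value.
ROLLBACK_THRESHOLD = 0.9

def maybe_rollback(old_score, new_score, apply_rollback):
    if old_score > 0 and new_score / old_score < ROLLBACK_THRESHOLD:
        apply_rollback()   # restore the previous prompt/config version
        return True
    return False

print(maybe_rollback(0.80, 0.60, lambda: None))  # -> True (25% regression)
```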
Lessons and past experiences are embedded as vectors (via Ollama) and retrieved by semantic similarity — not just keyword matching. When a task resembles something EQUIPA has seen before, relevant lessons get injected into the agent's context automatically. A knowledge graph tracks which lessons are most connected and influential, prioritizing the most useful ones via PageRank.
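As a rough illustration of PageRank-style prioritization, here is a toy power iteration over a small lesson graph. EQUIPA's `graph.py` has its own implementation; the node names and graph here are invented for the sketch.

```python
# Toy PageRank by power iteration over a directed lesson graph.
def pagerank(edges, nodes, damping=0.85, iters=50):
    rank = {n: 1.0 / len(nodes) for n in nodes}
    out = {n: [b for a, b in edges if a == n] for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or nodes         # dangling node: spread evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

nodes = ["lesson_oauth", "lesson_tokens", "lesson_deps"]
edges = [("lesson_oauth", "lesson_tokens"),
         ("lesson_deps", "lesson_tokens"),
         ("lesson_tokens", "lesson_oauth")]
ranks = pagerank(edges, nodes)
print(max(ranks, key=ranks.get))  # -> lesson_tokens (most linked-to)
```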
Falls back to keyword matching if Ollama is not available. Works fine either way.
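The retrieval path and its fallback might look roughly like this: cosine similarity over embeddings when an embedder is reachable, word overlap otherwise. Toy scoring only; the real system embeds via Ollama and stores vectors in the database.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, lessons, embed=None, top_k=2):
    """Semantic retrieval when an embedder is available, keyword overlap otherwise."""
    if embed is not None:                     # Ollama reachable
        qv = embed(query)
        scored = [(cosine(qv, embed(l)), l) for l in lessons]
    else:                                     # keyword fallback
        qwords = set(query.lower().split())
        scored = [(len(qwords & set(l.lower().split())), l) for l in lessons]
    return [l for _, l in sorted(scored, reverse=True)[:top_k]]

lessons = ["retry flaky oauth tests", "pin dependency versions",
           "rotate session tokens after login"]
print(retrieve("fix oauth login tests", lessons))  # keyword fallback path
```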
EQUIPA analyzes task descriptions and automatically routes simple tasks to cheaper models and complex tasks to more capable ones. A circuit breaker degrades gracefully when a model has consecutive failures. Manual model overrides still take priority — auto-routing only kicks in as a fallback.
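A sketch of the routing idea, with invented thresholds, a crude complexity heuristic, and a failure-count circuit breaker; `routing.py`'s actual scoring will differ.

```python
# Illustrative cost-based routing with a circuit breaker.
# FAILURE_LIMIT, the keyword list, and the score cutoffs are assumptions.
FAILURE_LIMIT = 3
failures = {"opus": 0, "sonnet": 0, "haiku": 0}

def complexity(task_description):
    # crude proxy: longer descriptions and certain keywords score higher
    score = len(task_description.split()) / 50
    if any(w in task_description.lower() for w in ("refactor", "architecture", "migrate")):
        score += 0.5
    return min(score, 1.0)

def route(task_description, override=None):
    if override:                              # manual override always wins
        return override
    c = complexity(task_description)
    choice = "opus" if c > 0.6 else "sonnet" if c > 0.2 else "haiku"
    if failures[choice] >= FAILURE_LIMIT:     # circuit open: degrade
        choice = "sonnet" if choice != "sonnet" else "haiku"
    return choice

print(route("fix typo in readme"))            # cheap model for a simple task
```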
EQUIPA exposes itself as an MCP (Model Context Protocol) server. Any IDE that supports MCP — Claude Code, VS Code, Cursor, JetBrains — can dispatch tasks, check status, query lessons, and read project context without touching the CLI. Pure Python, JSON-RPC over stdio, zero dependencies.
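At the protocol level, a client exchange is one JSON-RPC 2.0 object per line over the server's stdin/stdout. A minimal sketch (the `tools/list` method comes from the MCP spec; the spawn-and-exchange part is shown as comments and is illustrative):

```python
import json

# Build a JSON-RPC 2.0 request as an MCP client would send it over stdio.
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
line = json.dumps(request)

# A real client spawns the server and exchanges newline-delimited JSON:
#   proc = subprocess.Popen(["python3", "equipa/mcp_server.py"],
#                           stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
#   proc.stdin.write(line + "\n"); proc.stdin.flush()
#   response = json.loads(proc.stdout.readline())
print(line)
```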
```shell
# Register in any Claude Code session
claude mcp add equipa python3 /path/to/equipa/equipa/mcp_server.py
```

| Role | What It Does |
|---|---|
| Developer | Writes code, navigates codebases, plans implementations |
| Tester | Writes and runs tests, validates developer output |
| Security Reviewer | Deep audit with 7 security skills, static analysis, variant analysis |
| Code Reviewer | Quality, patterns, best practices, architecture feedback |
| Debugger | Hypothesis-driven bug investigation, traces root causes |
| Planner | Breaks features into task lists with dependency graphs |
| Frontend Designer | UI/UX focused development |
| Evaluator | Assesses implementations against requirements |
| Integration Tester | Tests cross-boundary component interactions |
| QA Tester | End-to-end quality assurance |
| Researcher | Deep-dives into technologies and approaches |
| Economy Tester | Game economy balance testing |
| Multiplayer Tester | Multiplayer game flow testing |
| Story Tester | Narrative and story flow validation |
| World Builder | Game world and lore construction |
Parallel tasks each get their own git branch. Changes are isolated — one task cannot break another. Successful work merges back automatically.
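The branch-per-task flow can be sketched with plain git commands driven from Python. The `task/<id>` naming is an assumption; conflict handling is what `git_ops.py` actually has to deal with.

```python
import subprocess

# Sketch of per-task branch isolation (branch naming is illustrative).
def run(*args):
    return subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True)

def start_task(task_id, base="main"):
    branch = f"task/{task_id}"
    run("checkout", "-b", branch, base)   # isolated branch per task
    return branch

def finish_task(branch, base="main"):
    run("checkout", base)
    run("merge", "--no-ff", branch)       # merge successful work back
    run("branch", "-d", branch)           # clean up the task branch
```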
Per-task budgets scale by complexity (simple/medium/complex/epic). Agents that waste turns reading without writing get warned and then killed. You set the limits, EQUIPA enforces them.
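A minimal sketch of budget enforcement, assuming invented turn limits and warn/kill thresholds (EQUIPA's real numbers are configurable):

```python
# Illustrative per-task turn budgets and idle enforcement.
# All thresholds here are assumptions, not EQUIPA's defaults.
BUDGETS = {"simple": 15, "medium": 30, "complex": 60, "epic": 120}

def enforce(turns_used, complexity, turns_without_write, warn_at=5, kill_at=8):
    if turns_used >= BUDGETS[complexity]:
        return "kill"                      # budget exhausted
    if turns_without_write >= kill_at:
        return "kill"                      # reading without ever writing
    if turns_without_write >= warn_at:
        return "warn"
    return "ok"

print(enforce(10, "simple", 6))  # -> warn: idle too long, budget still OK
```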
Detects your project language (Python, TypeScript, Go, C#, Java, Rust, JavaScript) and injects language-specific best practices. Agents write idiomatic code for your stack.
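Detection can be as simple as counting file extensions; a sketch with an assumed extension map (EQUIPA's `git_ops.py` owns the real logic):

```python
from pathlib import Path
from collections import Counter

# Toy project-language detection by file extension (mapping assumed).
EXT_LANG = {".py": "Python", ".ts": "TypeScript", ".go": "Go", ".cs": "C#",
            ".java": "Java", ".rs": "Rust", ".js": "JavaScript"}

def detect_language(root):
    counts = Counter(EXT_LANG[p.suffix]
                     for p in Path(root).rglob("*")
                     if p.suffix in EXT_LANG)
    return counts.most_common(1)[0][0] if counts else None
```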
Pure Python standard library. No pip install, no virtualenv, no supply chain risk. Copy the folder, run the script. Works on any machine with Python 3.10+.
```
equipa/              # 21 modules, ~11,500 lines
|-- cli.py           # Entry point and argument parsing
|-- dispatch.py      # Task scanning, scoring, parallel dispatch
|-- loops.py         # Dev-test iteration loop
|-- agent_runner.py  # Agent subprocess management and streaming
|-- prompts.py       # System prompt construction with token budgeting
|-- monitoring.py    # Stuck detection, loop detection, cost tracking
|-- embeddings.py    # Ollama vector embeddings + cosine similarity
|-- routing.py       # Complexity scoring + cost-based model routing
|-- graph.py         # Knowledge graph, PageRank, community detection
|-- mcp_server.py    # MCP server (JSON-RPC over stdio)
|-- db.py            # Database connection and schema management
|-- tasks.py         # Task fetching and project context
|-- lessons.py       # Episodic memory, Q-values, vector retrieval
|-- parsing.py       # Agent output parsing and compaction
|-- security.py      # Skill integrity verification
|-- preflight.py     # Build checking, dependency installation
|-- checkpoints.py   # Task checkpointing for crash recovery
|-- messages.py      # Inter-agent messaging
|-- reflexion.py     # Post-task self-reflection
|-- roles.py         # Role configuration and model selection
|-- constants.py     # Configuration constants
+-- git_ops.py       # Git operations and language detection
```
Self-improvement lives outside the package:
- `forgesmith.py` — Lesson extraction and configuration tuning
- `forgesmith_gepa.py` — CMA-ES prompt evolution with A/B testing
- `scripts/forgesmith_simba.py` — Behavioral rule synthesis from failure patterns
- `scripts/autoresearch_loop.py` — Nightly benchmarking and optimization
```shell
# Clone
git clone https://github.com/sbknana/equipa.git
cd equipa

# Setup
python equipa_setup.py

# Run a task
python forge_orchestrator.py --task 42 --dev-test -y

# Run tasks in parallel
python forge_orchestrator.py --tasks 42-50 --dev-test -y

# Auto-dispatch pending work
python forge_orchestrator.py --dispatch -y

# Start MCP server (for IDE integration)
python -m equipa.mcp_server

# Run self-improvement
python forgesmith.py --auto
```

- Python 3.10+ (no pip install needed)
- Claude Code CLI (`claude`) or Ollama for local LLM
- Git (for worktree isolation)
```shell
cp dispatch_config.example.json dispatch_config.json
cp forge_config.example.json forge_config.json
```

Key settings in `dispatch_config.json`:
- `model` — default model (sonnet/opus/haiku)
- `features.vector_memory` — semantic lesson retrieval via Ollama
- `features.auto_model_routing` — cost-based model selection
- `features.knowledge_graph` — PageRank lesson prioritization
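Putting those keys together, a `dispatch_config.json` might look like this (shape inferred from the key names above, not a verified schema):

```json
{
  "model": "sonnet",
  "features": {
    "vector_memory": true,
    "auto_model_routing": true,
    "knowledge_graph": true
  }
}
```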
- Agents still get stuck. Complex tasks can trigger analysis paralysis. The early termination system catches this, but some tasks need multiple attempts.
- Git merges are not perfect. Parallel task merges occasionally need manual intervention.
- Self-improvement needs data. ForgeSmith needs 20-30 task completions before patterns emerge.
- Tests required. The dev-test loop only works if your project has a working test suite.
- Context limits are real. Very long tasks can exhaust the LLM context window. Checkpointing helps but does not eliminate the problem.
- Vector memory needs Ollama. Without it, retrieval falls back to keyword matching — still works, just less smart.
EQUIPA has been running in production since January 2026, building real software across multiple projects. It is not a demo or proof of concept — it is a tool we use every day.
- Quick Start — Get running in 5 minutes
- User Guide — Day-to-day usage
- Architecture — How the pieces fit together
- API Reference — Module and function reference
- Custom Agents — Adding your own agent roles
- Local LLM Support — Using Ollama instead of Claude
- Deployment — Server and CI/CD setup
- Contributing — How to contribute
Built by Forgeborn. Vibe coded with Claude.