Your AI development team.
European Portuguese for "team" — a self-improving AI agent orchestrator that builds, reviews, tests, and secures your code.
Software development takes dedication, perseverance, and knowledge. No tool changes that. What EQUIPA does is multiply your productivity — it handles the repetitive, parallelizable parts of the workflow so you can focus on the hard problems that actually need a human brain.
You talk to Claude. Describe what you want in plain English — "fix the login bug", "add search to the dashboard", "run a security review" — and Claude handles the rest. It creates tasks, dispatches specialized AI agents, monitors their work, retries on failure, and reports results back to you.
Then it gets better at its job. A three-layer self-improvement system benchmarks agent performance, evolves prompts using genetic optimization, and auto-rolls back changes that hurt results. Your agents tomorrow are measurably better than your agents today.
This is a productivity tool, not a magic wand. You still need to review the output, understand your codebase, and make the real decisions. EQUIPA just means you're not doing the grunt work alone.
You: "Build user authentication with Google OAuth"
EQUIPA:
Planning: broke feature into 5 tasks with dependency graph
Tasks 1-3 dispatched in parallel (no dependencies)
Task 4 waiting on task 2 (needs routes)
Task 5 waiting on all (integration tests)
Developer agent -> wrote OAuth config, routes, middleware, UI
Tester agent -> 8 integration tests, all passing
Security reviewer -> flagged session token rotation issue
Done. 3 tasks passed first try, 2 needed one retry.
You direct. EQUIPA executes. You review and decide what ships.
Every coding task runs through a developer -> tester cycle. If tests fail, the developer gets the failure context and tries again — up to 5 cycles. No human babysitting required.
Three systems work together:
- ForgeSmith extracts lessons from failures and tunes configuration
- GEPA evolves agent prompts through CMA-ES evolutionary optimization
- SIMBA synthesizes behavioral rules from recurring failure patterns
Bad changes get auto-rolled back when effectiveness scores drop below threshold.
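The rollback rule amounts to a simple regression check. A minimal sketch, assuming a ratio threshold (the 0.9 value and the scoring shape are illustrative, not ForgeSmith's actual criteria):

```python
# Illustrative auto-rollback: revert a prompt/config change when measured
# effectiveness drops below a fraction of the previous score.
# ROLLBACK_THRESHOLD is an assumption, not EQUIPA's real value.
ROLLBACK_THRESHOLD = 0.9

def maybe_rollback(old_score, new_score, apply_rollback):
    if old_score > 0 and new_score / old_score < ROLLBACK_THRESHOLD:
        apply_rollback()   # restore the previous prompt/config version
        return True
    return False

print(maybe_rollback(0.80, 0.60, lambda: None))  # -> True (25% regression)
```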
Lessons and past experiences are embedded as vectors (via Ollama) and retrieved by semantic similarity — not just keyword matching. When a task resembles something EQUIPA has seen before, relevant lessons get injected into the agent's context automatically. A knowledge graph tracks which lessons are most connected and influential, prioritizing the most useful ones via PageRank.
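As a rough illustration of PageRank-style prioritization, here is a toy power iteration over a small lesson graph. EQUIPA's `graph.py` has its own implementation; the node names and graph here are invented for the sketch.

```python
# Toy PageRank by power iteration over a directed lesson graph.
def pagerank(edges, nodes, damping=0.85, iters=50):
    rank = {n: 1.0 / len(nodes) for n in nodes}
    out = {n: [b for a, b in edges if a == n] for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or nodes         # dangling node: spread evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

nodes = ["lesson_oauth", "lesson_tokens", "lesson_deps"]
edges = [("lesson_oauth", "lesson_tokens"),
         ("lesson_deps", "lesson_tokens"),
         ("lesson_tokens", "lesson_oauth")]
ranks = pagerank(edges, nodes)
print(max(ranks, key=ranks.get))  # -> lesson_tokens (most linked-to)
```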
Falls back to keyword matching if Ollama is not available. Works fine either way.
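The retrieval path and its fallback might look roughly like this: cosine similarity over embeddings when an embedder is reachable, word overlap otherwise. Toy scoring only; the real system embeds via Ollama and stores vectors in the database.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, lessons, embed=None, top_k=2):
    """Semantic retrieval when an embedder is available, keyword overlap otherwise."""
    if embed is not None:                     # Ollama reachable
        qv = embed(query)
        scored = [(cosine(qv, embed(l)), l) for l in lessons]
    else:                                     # keyword fallback
        qwords = set(query.lower().split())
        scored = [(len(qwords & set(l.lower().split())), l) for l in lessons]
    return [l for _, l in sorted(scored, reverse=True)[:top_k]]

lessons = ["retry flaky oauth tests", "pin dependency versions",
           "rotate session tokens after login"]
print(retrieve("fix oauth login tests", lessons))  # keyword fallback path
```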
EQUIPA analyzes task descriptions and automatically routes simple tasks to cheaper models and complex tasks to more capable ones. A circuit breaker degrades gracefully when a model has consecutive failures. Manual model overrides still take priority — auto-routing only kicks in as a fallback.
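A sketch of the routing idea, with invented thresholds, a crude complexity heuristic, and a failure-count circuit breaker; `routing.py`'s actual scoring will differ.

```python
# Illustrative cost-based routing with a circuit breaker.
# FAILURE_LIMIT, the keyword list, and the score cutoffs are assumptions.
FAILURE_LIMIT = 3
failures = {"opus": 0, "sonnet": 0, "haiku": 0}

def complexity(task_description):
    # crude proxy: longer descriptions and certain keywords score higher
    score = len(task_description.split()) / 50
    if any(w in task_description.lower() for w in ("refactor", "architecture", "migrate")):
        score += 0.5
    return min(score, 1.0)

def route(task_description, override=None):
    if override:                              # manual override always wins
        return override
    c = complexity(task_description)
    choice = "opus" if c > 0.6 else "sonnet" if c > 0.2 else "haiku"
    if failures[choice] >= FAILURE_LIMIT:     # circuit open: degrade
        choice = "sonnet" if choice != "sonnet" else "haiku"
    return choice

print(route("fix typo in readme"))            # cheap model for a simple task
```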
EQUIPA exposes itself as an MCP (Model Context Protocol) server. Any IDE that supports MCP — Claude Code, VS Code, Cursor, JetBrains — can dispatch tasks, check status, query lessons, and read project context without touching the CLI. Pure Python, JSON-RPC over stdio, zero dependencies.
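At the protocol level, a client exchange is one JSON-RPC 2.0 object per line over the server's stdin/stdout. A minimal sketch (the `tools/list` method comes from the MCP spec; the spawn-and-exchange part is shown as comments and is illustrative):

```python
import json

# Build a JSON-RPC 2.0 request as an MCP client would send it over stdio.
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
line = json.dumps(request)

# A real client spawns the server and exchanges newline-delimited JSON:
#   proc = subprocess.Popen(["python3", "equipa/mcp_server.py"],
#                           stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
#   proc.stdin.write(line + "\n"); proc.stdin.flush()
#   response = json.loads(proc.stdout.readline())
print(line)
```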
```shell
# Register in any Claude Code session
claude mcp add equipa python3 /path/to/equipa/equipa/mcp_server.py
```

| Role | What It Does |
|---|---|
| Developer | Writes code, navigates codebases, plans implementations |
| Tester | Writes and runs tests, validates developer output |
| Security Reviewer | Deep audit with 7 security skills, static analysis, variant analysis |
| Code Reviewer | Quality, patterns, best practices, architecture feedback |
| Debugger | Hypothesis-driven bug investigation, traces root causes |
| Planner | Breaks features into task lists with dependency graphs |
| Frontend Designer | UI/UX focused development |
| Evaluator | Assesses implementations against requirements |
| Integration Tester | Tests cross-boundary component interactions |
| QA Tester | End-to-end quality assurance |
| Researcher | Deep-dives into technologies and approaches |
| Economy Tester | Game economy balance testing |
| Multiplayer Tester | Multiplayer game flow testing |
| Story Tester | Narrative and story flow validation |
| World Builder | Game world and lore construction |
Parallel tasks each get their own git branch. Changes are isolated — one task cannot break another. Successful work merges back automatically.
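The branch-per-task flow can be sketched with plain git commands driven from Python. The `task/<id>` naming is an assumption; conflict handling is what `git_ops.py` actually has to deal with.

```python
import subprocess

# Sketch of per-task branch isolation (branch naming is illustrative).
def run(*args):
    return subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True)

def start_task(task_id, base="main"):
    branch = f"task/{task_id}"
    run("checkout", "-b", branch, base)   # isolated branch per task
    return branch

def finish_task(branch, base="main"):
    run("checkout", base)
    run("merge", "--no-ff", branch)       # merge successful work back
    run("branch", "-d", branch)           # clean up the task branch
```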
Per-task budgets scale by complexity (simple/medium/complex/epic). Agents that waste turns reading without writing get warned and then killed. You set the limits, EQUIPA enforces them.
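A minimal sketch of budget enforcement, assuming invented turn limits and warn/kill thresholds (EQUIPA's real numbers are configurable):

```python
# Illustrative per-task turn budgets and idle enforcement.
# All thresholds here are assumptions, not EQUIPA's defaults.
BUDGETS = {"simple": 15, "medium": 30, "complex": 60, "epic": 120}

def enforce(turns_used, complexity, turns_without_write, warn_at=5, kill_at=8):
    if turns_used >= BUDGETS[complexity]:
        return "kill"                      # budget exhausted
    if turns_without_write >= kill_at:
        return "kill"                      # reading without ever writing
    if turns_without_write >= warn_at:
        return "warn"
    return "ok"

print(enforce(10, "simple", 6))  # -> warn: idle too long, budget still OK
```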
Detects your project language (Python, TypeScript, Go, C#, Java, Rust, JavaScript) and injects language-specific best practices. Agents write idiomatic code for your stack.
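Detection can be as simple as counting file extensions; a sketch with an assumed extension map (EQUIPA's `git_ops.py` owns the real logic):

```python
from pathlib import Path
from collections import Counter

# Toy project-language detection by file extension (mapping assumed).
EXT_LANG = {".py": "Python", ".ts": "TypeScript", ".go": "Go", ".cs": "C#",
            ".java": "Java", ".rs": "Rust", ".js": "JavaScript"}

def detect_language(root):
    counts = Counter(EXT_LANG[p.suffix]
                     for p in Path(root).rglob("*")
                     if p.suffix in EXT_LANG)
    return counts.most_common(1)[0][0] if counts else None
```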
Pure Python standard library. No pip install, no virtualenv, no supply chain risk. Copy the folder, run the script. Works on any machine with Python 3.10+.
```
equipa/              # 21 modules, ~11,500 lines
|-- cli.py           # Entry point and argument parsing
|-- dispatch.py      # Task scanning, scoring, parallel dispatch
|-- loops.py         # Dev-test iteration loop
|-- agent_runner.py  # Agent subprocess management and streaming
|-- prompts.py       # System prompt construction with token budgeting
|-- monitoring.py    # Stuck detection, loop detection, cost tracking
|-- embeddings.py    # Ollama vector embeddings + cosine similarity
|-- routing.py       # Complexity scoring + cost-based model routing
|-- graph.py         # Knowledge graph, PageRank, community detection
|-- mcp_server.py    # MCP server (JSON-RPC over stdio)
|-- db.py            # Database connection and schema management
|-- tasks.py         # Task fetching and project context
|-- lessons.py       # Episodic memory, Q-values, vector retrieval
|-- parsing.py       # Agent output parsing and compaction
|-- security.py      # Skill integrity verification
|-- preflight.py     # Build checking, dependency installation
|-- checkpoints.py   # Task checkpointing for crash recovery
|-- messages.py      # Inter-agent messaging
|-- reflexion.py     # Post-task self-reflection
|-- roles.py         # Role configuration and model selection
|-- constants.py     # Configuration constants
+-- git_ops.py       # Git operations and language detection
```
Self-improvement lives outside the package:
- `forgesmith.py` — Lesson extraction and configuration tuning
- `forgesmith_gepa.py` — CMA-ES prompt evolution with A/B testing
- `scripts/forgesmith_simba.py` — Behavioral rule synthesis from failure patterns
- `scripts/autoresearch_loop.py` — Nightly benchmarking and optimization
```shell
# Clone
git clone https://github.com/sbknana/equipa.git
cd equipa

# Setup
python equipa_setup.py

# Run a task
python forge_orchestrator.py --task 42 --dev-test -y

# Run tasks in parallel
python forge_orchestrator.py --tasks 42-50 --dev-test -y

# Auto-dispatch pending work
python forge_orchestrator.py --dispatch -y

# Start MCP server (for IDE integration)
python -m equipa.mcp_server

# Run self-improvement
python forgesmith.py --auto
```

- Python 3.10+ (no pip install needed)
- Claude Code CLI (`claude`) or Ollama for local LLM
- Git (for worktree isolation)
```shell
cp dispatch_config.example.json dispatch_config.json
cp forge_config.example.json forge_config.json
```

Key settings in `dispatch_config.json`:
- `model` — default model (sonnet/opus/haiku)
- `features.vector_memory` — semantic lesson retrieval via Ollama
- `features.auto_model_routing` — cost-based model selection
- `features.knowledge_graph` — PageRank lesson prioritization
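Putting those keys together, a `dispatch_config.json` might look like this (shape inferred from the key names above, not a verified schema):

```json
{
  "model": "sonnet",
  "features": {
    "vector_memory": true,
    "auto_model_routing": true,
    "knowledge_graph": true
  }
}
```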
- Agents still get stuck. Complex tasks can trigger analysis paralysis. The early termination system catches this, but some tasks need multiple attempts.
- Git merges are not perfect. Parallel task merges occasionally need manual intervention.
- Self-improvement needs data. ForgeSmith needs 20-30 task completions before patterns emerge.
- Tests required. The dev-test loop only works if your project has a working test suite.
- Context limits are real. Very long tasks can exhaust the LLM context window. Checkpointing helps but does not eliminate the problem.
- Vector memory needs Ollama. Without it, retrieval falls back to keyword matching — still works, just less smart.
EQUIPA has been running in production since January 2026, building real software across multiple projects. It is not a demo or proof of concept — it is a tool we use every day.
- Quick Start — Get running in 5 minutes
- User Guide — Day-to-day usage
- Architecture — How the pieces fit together
- API Reference — Module and function reference
- Custom Agents — Adding your own agent roles
- Local LLM Support — Using Ollama instead of Claude
- Deployment — Server and CI/CD setup
- Contributing — How to contribute
Built by Forgeborn. Vibe coded with Claude.