A no-RAG, cache-aware multi-agent framework for extremely long, dense contexts (1B+ tokens).
Colony is a framework for building tightly-coupled, self-improving, self-aware multi-agent systems (agent colonies) that reason over extremely long context without retrieval-augmented generation (RAG). Instead of fragmenting context into chunks and retrieving snippets, Colony keeps the entire context live across a cluster of LLMs through a virtual memory system that manages GPU KV caches the same way an operating system manages (almost unlimited) virtual memory over finite physical memory.
| 💡 Colony's Vision |
| Colony's goal is to be the most efficient country of geniuses in a datacenter — the ideal substrate for civilization-building AI. |
| Colony is still in pre-alpha early access. The API is not stable and the framework is under active development. We welcome feedback and contributions, but be aware that breaking changes may occur. |
| ℹ️ Who should use Colony? |
| Colony is designed for engineers building complex multi-agent systems that require reasoning over extremely long contexts. It is not a general-purpose agent framework or a consumer product. If you are looking for a simple agent orchestration tool or a way to add tool use to an LLM, Colony may not be the right fit. It runs over a Ray cluster (local or in the cloud) and it can be resource-intensive and expensive. |
Most agent frameworks treat context as something to retrieve or manage. Colony treats it as something to be brought to life. Certain domains require reasoning deep and wide. Examples include:
- Scientific research: synthesizing novel insights from a vast literature requires complex integration
- Cyber-physical systems: understanding the full context of a complex system (code, physical environment, requirements, regulations) is essential for architecting solutions and identifying edge cases and failure modes
- Systemic vulnerability analysis: identifying security risks in a complex system by reasoning over a large attack surface and many potential interactions.
- Business intelligence: making strategic decisions based on a wide range of internal and external data, where relevant information may be siloed and require cross-domain reasoning
- Economic modeling: simulating and understanding complex economic systems with many interacting agents and factors and long supply chains
- Long-form content creation: writing a book or comprehensive report that requires maintaining a coherent narrative across a large amount of information
Colony's core innovations are:
-
NoRAG -- Colony keeps the full context live and accessible, not filtered through retrieval. Colony manages all kinds of context (code, text, data) through distributed KV cache paging, not vector search.
-
Cache-Aware Agents -- Agents are aware of what's in GPU memory (at the cluster level) and consciously plan their work to maximize cache reuse.
-
Agents All the Way Down -- General intelligence emerges from the right composition of agent capabilities and multi-agent patterns. Every cognitive process -- attention, memory, planning, confidence tracking -- is a pluggable policy with a default implementation.
-
Distributed Reasoning Patterns -- Multi-agent game protocols (hypothesis games, contract nets, negotiation) combat specific LLM failure modes: hallucination, laziness, and goal drift.
Read the full Philosophy for the ideas behind the framework.
P.S. Colony does not preclude agents from using retrieval or vector search -- those can be implemented as capabilities that agents use when appropriate. Colony's point is that retrieval is not the only way to manage long context, and for certain domains, it's not the best way.
┌───────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Agent Colony │
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ Agent 1 │ │ Agent 2 │ │ Agent N │ │
│ │ Capabilities │ │ Capabilities │ │ Capabilities │ ... │
│ │ Action Policy │ │ Action Policy │ │ Action Policy │ │
│ │ Planner (LLM) │ │ Planner (LLM) │ │ Planner (LLM) │ │
│ └──────┬─────────┘ └──────┬─────────┘ └──────┬─────────┘ │
│ read/write/query/mmap │ │ │ infer_with_suffix │
│ ┌────────────────────┴────────────────────┴────────────────────┴──────┐ page_graph_ops │
│ │ ▼ │
│ │ ┌────────────────────────────┐ ┌────────────────────────────────────────┐ │
│ │ │ Blackboard (Redis) │ │ Virtual Context Memory (VCM) │ │
│ │ │ │ │ │ │
│ ┠───►│ Shared state & events │ │ Page Table · Page Graph │ │
│ │ │ OCC · Memory scopes │ │ Cache Scheduling · Page Faults │ │
│ │ │ Agent coordination │ │ │ │
│ │ └─────────────┬──────────────┘ │ ┌──────────┐ ┌──────────┐ │ │
│ │ │ │ │ LLM N1 │ │ LLM N2 │ ... │ │
│ │ │ mmap/munmap/invalidate │ │ KV Cache │ │ KV Cache │ │ │
│ │ └───────────────────────────────►│ └──────────┘ └──────────┘ │ │
│ │ mmap/munmap/invalidate │ │ │
│ │ ┌───────────────────────────────►│ │ │
│ │ │ │ Context Sources (mapped as pages): │ │
│ │ ┌─────────────┴──────────────┐ │ ┌────────┐ ┌──────────┐ ┌─────────┐ │ │
│ │ │ External Sources │ │ │ Repos │ │Knowledge │ │Blackbrd │ │ │
│ └───►│ Git repos, documents, │ │ │ │ │ Bases │ │ Data │ │ │
│ │ knowledge bases, data │ │ └────────┘ └──────────┘ └─────────┘ │ │
│ └────────────────────────────┘ └────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────────────────────────────────────┘
Each Agent composes pluggable capabilities (memory, attention, games, confidence tracking, grounding, reflection, cache awareness, etc.) coordinated by an ActionPolicy that consults an LLM Planner. Agents share state through a Redis-backed Blackboard with optimistic concurrency control (OCC) and causal ordering. The Virtual Context Memory (VCM) manages distributed GPU KV caches as pages, enabling agents to reason over contexts far larger than any single model's window.
See the full Architecture docs.
pip install polymathera-colonyWith optional extras:
pip install polymathera-colony[code_analysis] # Code analysis tools
pip install polymathera-colony[gpu] # GPU inference (vLLM, PyTorch)
pip install polymathera-colony[cpu] # CPU-only inference (Anthropic API)
pip install polymathera-colony --all-extras # EverythingColony ships with colony-env, a CLI tool that spins up a local Ray cluster + Redis using Docker Compose. The only prerequisite is Docker.
# Start the cluster (builds image on first run)
colony-env up
# Generate a sample analysis config
polymath init-config --output my_analysis.yaml
# Run a code analysis over a local codebase
colony-env run /path/to/codebase --config my_analysis.yaml
# Check service status
colony-env status
# Open the web dashboard
colony-env dashboard
# Scale workers
colony-env up --workers 3
# Tear down
colony-env down
# Verify prerequisites
colony-env doctorAll Colony dependencies run inside Docker -- no local GPU drivers, Ray, or Redis installation required. The colony-env run command copies your codebase to be analyzed into the cluster and executes inside the Ray head container with full access to the framework.
Services started by colony-env up:
| Service | Port | Description |
|---|---|---|
| Colony dashboard | localhost:8080 |
Web UI for agents, sessions, VCM |
| Ray dashboard | localhost:8265 |
Cluster monitoring UI |
| Ray client | localhost:10001 |
Ray client connection |
| Redis | localhost:6379 |
State management backend |
The Colony dashboard starts automatically with colony-env up at localhost:8080. It provides:
- Overview — cluster health, application deployments, quick stats
- Agents — list registered agents, view state, capabilities, and details
- Sessions — browse sessions and their agent runs with token usage
- VCM — page table, working set, and virtual context statistics
- Traces — detailed tracing of agent actions, VCM operations, and system events for debugging and performance analysis
# Run the agent colony
colony-env down && colony-env up --workers 3 && colony-env run --local-repo /path/to/codebase --config my_analysis.yaml --verbose
# Open the dashboard in your browser
colony-env dashboard
# Use a custom port (must match COLONY_DASHBOARD_UI_PORT)
colony-env dashboard --port 9090For frontend development, run the Vite dev server on the host with hot-reload:
cd src/polymathera/colony/web_ui/frontend
npm install
npm run dev # Starts on localhost:5173, proxies /api to localhost:8080| Feature | Description | Docs |
|---|---|---|
| Virtual Context Memory | OS-style virtual memory for LLM KV caches with page tables and cache-aware scheduling | VCM |
| Agent Capabilities | Composable cognitive modules (memory, attention, games, confidence) attached to agents via AOP-inspired patterns | Agent System |
| Action Policies | LLM-centric planning with Model Predictive Control -- the LLM is the planner, not the framework | Action Policies |
| Blackboard | Redis-backed shared state with optimistic concurrency, causal timelines, and event-driven coordination | Blackboard |
| Memory Hierarchies | Unified memory system with sensory, working, short-term, and long-term memory -- all backed by blackboards | Memory |
| Game Engine | Hypothesis games, contract nets, negotiation, and consensus protocols for multi-agent coordination | Games |
| Hook System | AOP-inspired hooks for cross-cutting concerns (logging, tracing, metrics, memory triggers) | Hooks |
git clone https://github.com/polymathera/colony.git
cd colony
poetry install --all-extraspytest src/ --timeout=120 -x -qpoetry run mkdocs serve --livereload # Local docs server at http://127.0.0.1:8000/
poetry run mkdocs build # Build static siteWe welcome contributions. See CONTRIBUTING.md for development setup, code conventions, and the PR process.
Apache 2.0 -- see LICENSE.