A self-verifying code generation system that combines C++17 orchestration with Python-based multi-agent pipelines and a real-time Web UI. Automates the validation loop — no human-in-the-middle required.
- Motivation
- Architecture Overview
- Execution Modes
- Benchmarks
- Quick Start
- LLM Provider Configuration
- Project Structure
- Web UI Reference
- Python API Reference
- Development Guide
- FAQ
- License
Large language models excel at generating plausible code, but generated code frequently fails at execution time. The traditional workflow — the user copies error messages back to the model, the model produces a revised version, and the cycle repeats — is fundamentally a manual feedback loop.
CLMA embeds this verification loop into the framework itself. Given a natural language requirement, the system:
- Refines ambiguous requirements into structured specifications
- Reasons about algorithmic choices, edge cases, and constraints
- Generates executable code
- Verifies correctness through execution and rule-based checks
- Evaluates output quality across three dimensions
- Iterates automatically when quality falls below threshold — no manual intervention required
The result: LLM-generated code that converges to a verifiably correct solution without human oversight.
Figure 1: CLMA Web UI — Single Loop mode with real-time flow graph, score gauge, and execution timeline.
┌─────────────────────────────────────────────────────────────────┐
│ Web UI (Flask + SSE) │
│ Dark theme · Real-time SVG flow graph · Score dashboard │
│ Pan/zoom · Mode selector · Session history · LLM catalog │
└──────────────────────────┬──────────────────────────────────────┘
│ HTTP / Server-Sent Events
┌──────────────────────────▼──────────────────────────────────────┐
│ Python Interface Layer (pybind11) │
│ Config management · API adapters · Tool executors │
│ Scoring engine · Iteration controller · Experience store │
└──────────────────────────┬──────────────────────────────────────┘
│ pybind11 bindings
┌──────────────────────────▼──────────────────────────────────────┐
│ C++17 Core Engine │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Orchestrator │ │ Rule Engine │ │Token Monitor │ LoopCtrl │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │PluginManager │ │DAG Processor │ │ Sandbox │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
| Module | Language | Responsibility |
|---|---|---|
| Orchestrator | C++17 | Central scheduler; coordinates agent execution and iteration flow |
| DAG Processor | C++17 | Directed acyclic graph task decomposition with parallel dispatch |
| RuleEngine | C++17 + YAML | Regex/keyword rule matching; configurable validation methods |
| TokenMonitor | C++17 | Token consumption tracking with budget-aware preemption |
| LoopController | C++17 | Iteration limit enforcement and convergence detection |
| PluginManager | C++17 | Dynamic .so hot-loading for extensible agent plugins |
| Sandbox | Python + subprocess | Isolated code execution environment with timeout enforcement |
| Framework | Python | Agent prompt management, scoring pipeline, context maintenance |
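For orientation, here is a minimal sketch of the subprocess-plus-timeout pattern the Sandbox row describes; the function name, return shape, and use of a temporary file are illustrative assumptions, not the actual tool_executor.py API.
# Minimal sketch of timeout-enforced snippet execution (illustrative only;
# the real Sandbox lives in Sandbox.cpp and python_interface/tool_executor.py).
import subprocess
import tempfile

def run_python_snippet(code: str, timeout_s: int = 120) -> dict:
    """Write the snippet to a temp file and execute it in an isolated subprocess."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(["python3", path], capture_output=True,
                              text=True, timeout=timeout_s)
        return {"exit_code": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        # Timeout enforcement: report the failure instead of hanging the pipeline.
        return {"exit_code": -1, "stdout": "", "stderr": f"timed out after {timeout_s}s"}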
| Agent | Input | Output | Pluggable |
|---|---|---|---|
| Refiner | Raw user query | Structured task specification with extracted constraints | ✅ |
| Reasoner | Refined query | Solution steps, algorithm selection, edge case analysis | ✅ |
| Solver | Reasoning + execution feedback | Executable code (Python/Bash/C++/JS/Go...) | ✅ |
| Verifier | Code + execution results | JSON verdict: hard checks, soft checks, pass/fail | ✅ |
| Evaluator | Verification results + execution output | JSON scores: reasonableness, executability, satisfaction | ✅ |
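To show how these inputs and outputs chain together, the sketch below wires the five stages in sequence; the call_agent callback and the dictionary shapes are assumptions for illustration and do not mirror core.py.
# Illustrative single pass through the five-agent pipeline (call_agent is a stand-in).
def run_pipeline(user_query: str, call_agent) -> dict:
    spec      = call_agent("refiner", user_query)                  # structured task spec
    reasoning = call_agent("reasoner", spec)                       # algorithm + edge cases
    code      = call_agent("solver", {"spec": spec, "reasoning": reasoning})
    verdict   = call_agent("verifier", {"code": code})             # JSON hard/soft checks
    scores    = call_agent("evaluator", {"code": code, "verdict": verdict})
    return {"code": code, "verdict": verdict, "scores": scores}    # scores drive iteration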
CLMA automatically classifies each query by complexity and selects the execution strategy accordingly:
query entry
│
├── Simple ("print Hello World" / "compute 1+1")
│ └── 🚀 Fast Path ~2s
│ Direct solver → auto-execute → score from results
│
├── Moderate ("implement binary search" / "write fibonacci")
│ └── 🔄 Single Closed-Loop ~5s
│ Refiner → Reasoner → Solver → Verifier → Evaluator ← score feedback
│
└── Complex ("design microservice architecture with API gateway, service discovery, circuit breaker")
└── 🔁 Nested Multi-Loop ~40s
┌─ Outer Loop: Strategy Refiner → Strategy Reasoner
│ ↓
│ Inner Loop: [Solver → Verifier → Evaluator] (convergence)
│ ↓
│ Outer Verifier → Outer Evaluator (strategy alignment)
└── Outer score below threshold → strategy refinement → re-execute inner loop
AAN is CLMA's newly introduced self-organizing execution topology. Instead of a fixed pipeline, AAN dynamically selects, composes, and parallelizes agent modules based on query complexity at runtime.
Four topologies, auto-selected:
query entry
│
├── Simple direct task
│ └── 🔹 Direct Topology: Solver → Evaluator (single agent, minimal path)
│
├── Structured single-context task
│ └── 🔹 Chain Topology: Refiner → Reasoner → Solver → Verifier → Evaluator
│
├── Multi-module decomposable task
│ └── 🔸 Parallel Topology: Parser → [Module 1∥Module 2∥...∥Module N] → Integrator → Verifier → Evaluator
│
└── Hierarchically complex task
└── 🌲 Tree Topology: Recursive binary decomposition → parallel leaf execution → hierarchical integration
AAN Router: A lightweight heuristic classifier runs at entry — evaluating query length, keyword presence, and complexity indicators — and selects the appropriate topology with less than 2 ms of overhead.
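The concrete routing rules are internal to the AAN Router, but a heuristic in the spirit of the description above might look like the following sketch; the keyword lists and thresholds are assumptions.
# Illustrative topology routing heuristic (keywords and thresholds are assumptions).
COMPLEX_HINTS = ("architecture", "microservice", "gateway", "frontend", "backend")
ALGO_HINTS = ("sort", "search", "fibonacci", "recursion")

def select_topology(query: str) -> str:
    q = query.lower()
    if len(q) <= 60 and not any(k in q for k in ALGO_HINTS + COMPLEX_HINTS):
        return "direct"      # Solver → Evaluator
    if any(k in q for k in COMPLEX_HINTS):
        # Several distinct components suggest hierarchical decomposition.
        return "tree" if q.count(",") >= 2 else "parallel"
    return "chain"           # Refiner → Reasoner → Solver → Verifier → Evaluator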
Key advantages over static mode selection:
- 🎯 Automatic: No manual mode switch required — submit any query and AAN picks the right topology
- 📐 Adaptive: Modular agent topology responds to actual task structure, not a one-size-fits-all pipeline
- ⚡ Efficient: Simple queries skip unnecessary agents; complex queries receive deeper decomposition and more parallel execution
- 🌲 True Tree mode: Complex multi-module tasks are recursively decomposed into binary subtrees, executed in parallel, and hierarchically merged — a flat parallel pipeline cannot capture structural depth
| Benchmark | Query | Direct | Chain | Parallel | Tree |
|---|---|---|---|---|---|
| Print numbers | "print 1 to 100" | 2.3s / 0.97 | — | — | — |
| Single function | "fibonacci in Python" | — | 4.7s / 0.99 | — | — |
| Algorithm | "quicksort from scratch" | — | 5.1s / 0.98 | — | — |
| Multi-file | "batch file rename tool" | — | 8.1s / 0.98 | — | — |
| Distributed system | "design microservice with auth, API gateway, database" | — | — | 22s / 0.97 | 28s / 0.99 |
| Hierarchical | "build web app with backend, frontend, and database" | — | — | 25s / 0.97 | 30s / 0.99 ✅ |
AAN Tree mode excels where task substructures have natural decomposition — architectural design, multi-service applications, and systems engineering tasks.
Extremely simple tasks bypass the full pipeline, avoiding planner overhead:
# Trigger: query length ≤ 60 characters + code keyword match
# Suppressed by: algorithm keywords (sort, search, recursion, etc.)
"print 1 to 100" → Fast Path ✓
"implement fibonacci in Python" → Fast Path ✗ (contains "fibonacci")Complex multi-component tasks are decomposed by the C++ DAG Processor into independent sub-tasks. Each sub-task executes through its own closed-loop verification pipeline; results are aggregated upon completion.
Figure 2: CLMA Web UI — Nested Multi-Loop mode with outer strategy loop and inner execution loop visualized in the flow graph.
- Outer Loop (Strategy Loop): Defines the architectural strategy and validates strategy alignment
- Inner Loop (Execution Loop): Generates code, verifies correctness, evaluates code quality
- Convergence Criteria: Iteration terminates when score ≥ threshold, so the iteration budget is not exhausted unnecessarily (see the sketch below)
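Reduced to control flow, the nested mode behaves roughly as in the sketch below; plan_strategy, run_inner_loop, and score_strategy are placeholders for the real outer and inner agents, not framework APIs.
# Illustrative nested-loop control flow (all helper callables are placeholders).
def nested_loop(query, plan_strategy, run_inner_loop, score_strategy,
                threshold=0.7, max_outer=3):
    strategy = plan_strategy(query, feedback=None)        # outer: Strategy Refiner/Reasoner
    result = None
    for _ in range(max_outer):
        result = run_inner_loop(query, strategy)          # inner: Solver → Verifier → Evaluator
        if score_strategy(strategy, result) >= threshold: # outer Verifier/Evaluator
            break                                         # converged: stop early
        strategy = plan_strategy(query, feedback=result)  # refine strategy and retry
    return result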
Measured against DeepSeek API (single LLM call latency ~5-8s):
| Task | Fast Path | Single Loop | DAG | Nested Loop |
|---|---|---|---|---|
| Hello World | 2.3s / 0.97 | 4.7s / 0.99 | 5.3s / 0.99 | — |
| Fibonacci | — | 4.7s / 0.99 | 5.3s / 0.99 | — |
| Quicksort | — | 5.1s / 0.98 | — | — |
| Batch file rename | — | 8.1s / 0.98 | 12.0s / 0.97 | — |
| Microservice architecture | — | — | — | 39s / 1.0 ✅ |
| Multi-component project | — | — | ~25s / 0.97 | — |
| Scenario | Recommended Mode | Rationale |
|---|---|---|
| One-liner, simple command, basic calculation | Fast Path | Minimal overhead, ~2s response |
| Single function, single file, well-defined requirements | Single Loop | Best latency/quality trade-off, ~5s |
| Multiple independent modules, parallelizable work | DAG | Modules verified independently, aggregated results |
| Complex architecture, strategic constraints | Nested Loop | Only viable approach when top-level planning is required |
Note on Nested Loop Performance: At 39s total (7 real LLM calls), the nested loop is the slowest mode by wall-clock time, but it is the only mode that reliably produces correct output for strategically complex tasks. Before the nested loop was introduced, benchmarks showed scores of 0.49 with non-convergent behavior; the current implementation converges to 1.0 in a single outer pass.
# Required
gcc/g++ 9+ or clang 12+
cmake 3.15+
python 3.10+
yaml-cpp # Ubuntu: apt install libyaml-cpp-dev
sqlite3 # Usually pre-installed
pybind11 # pip install pybind11
# Optional
docker # Containerized code execution
git clone https://github.com/kriely/CLMA.git
cd clma
# 1. Create virtual environment
python3 -m venv venv
source venv/bin/activate
pip install flask flask-cors pybind11
# 2. Build C++ core
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
cd ..
# 3. Configure API (edit config/api_config.json with your key)
# See "LLM Provider Configuration" below
# 4. Launch Web UI
./run_webui.sh
# → Open http://localhost:5000 in your browser
Build C++ Core Engine:
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
# Run C++ unit tests (62 test cases)
ctest -j$(nproc)
Build Python Bindings + Web UI:
pip install flask flask-cors pybind11
cd build
cmake .. -DBUILD_PYTHON_BINDINGS=ON
make -j$(nproc) clma_core
# The generated .so file is automatically copied to python_interface/
cd ..
./run_webui.sh
Run Python Tests (verify all logic):
source venv/bin/activate
cd tests && python3 -m pytest test_core_python.py -v
# Expected: 46 passed
Users in mainland China may experience API connectivity issues. We recommend:
- DeepSeek API (recommended for China users): Set up config/api_config.json with provider: "deepseek" and your DeepSeek API key
- Local models: Configure provider: "local" with an Ollama/vLLM endpoint running on the same machine
See LLM Provider Configuration for details.
CLMA supports five LLM providers through a unified adapter pattern. All provider-specific logic is encapsulated behind the BaseProvider interface; the framework is provider-agnostic.
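As a sketch of what the adapter pattern implies, the snippet below shows a registry keyed by the config's provider value; the registry internals and the stub adapter are assumptions for illustration, and only the @register_provider / BaseProvider names come from python_interface/api_providers.py.
# Sketch of a provider registry behind a unified adapter interface
# (registry internals and the stub class are assumptions).
PROVIDERS = {}

def register_provider(name):
    def wrap(cls):
        PROVIDERS[name] = cls
        return cls
    return wrap

class BaseProvider:
    def create_completion(self, messages, **kwargs):
        raise NotImplementedError

@register_provider("deepseek")
class DeepSeekStub(BaseProvider):
    def __init__(self, **cfg):
        self.cfg = cfg
    def create_completion(self, messages, **kwargs):
        return {"content": "(stubbed completion)"}

# The framework only ever talks to BaseProvider, so switching providers is a config change:
adapter = PROVIDERS["deepseek"](api_key="sk-...", model="deepseek-chat")
print(adapter.create_completion([{"role": "user", "content": "hello"}]))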
Edit config/api_config.json:
{
"provider": "deepseek",
"api_key": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"base_url": "https://api.deepseek.com/v1",
"model": "deepseek-chat",
"max_tokens": 8192,
"temperature": 0.7,
"enabled": true
}
Navigate to http://localhost:5000 → API configuration button (top-right) → Select provider → Enter API key → Test connection.
| Provider | Config Value | Default Model | Notes |
|---|---|---|---|
| OpenAI | openai | gpt-4o | General-purpose, best overall quality |
| Anthropic | anthropic | claude-sonnet-4 | Strong code comprehension |
| DeepSeek | deepseek | deepseek-chat | Cost-effective, China-friendly |
| Gemini | gemini | gemini-2.0-flash | Free tier available |
| Local | local | Custom | Ollama/vLLM compatible, fully offline |
Configuration file: config/api_config.json
LLM catalog (auto-updated): config/llm_catalog.json
CLMA includes an automated LLM catalog updater (scripts/auto_update_providers.py) that syncs the latest model lists from each provider every 72 hours.
- API keys are stored locally in config/api_config.json
- The framework makes no outbound telemetry calls — all LLM traffic goes directly to the configured provider
- config/api_config.json is excluded from version control via .gitignore
clma/
├── src/ # C++ core engine
│ ├── core/ # Core module implementations
│ │ ├── Orchestrator.cpp # Central scheduler
│ │ ├── RuleEngine.cpp # Rule matching engine
│ │ ├── TokenMonitor.cpp # Token consumption tracking
│ │ ├── LoopController.cpp # Iteration control
│ │ ├── PluginManager.cpp # Plugin lifecycle management
│ │ ├── PluginWatcher.cpp # File-system hot-reload watcher
│ │ ├── Sandbox.cpp # Sandbox execution
│ │ └── Types.cpp # Type definitions
│ ├── agents/ # Agent plugin interface (C++ plugins)
│ └── main.cpp # CLI entry point (optional)
├── include/core/ # C++ headers
├── plugins/ # Agent plugins (.so)
│ ├── agent_refiner/
│ ├── agent_reasoner/
│ ├── agent_solver/
│ ├── agent_verifier/
│ └── agent_evaluator/
├── python_interface/ # Python interface layer
│ ├── core.py # Framework logic + agent prompts (~2400 LOC)
│ ├── web_app.py # Flask web application
│ ├── api_providers.py # 5 LLM provider adapters
│ ├── tool_executor.py # Sandbox code execution
│ ├── experience_store.py # Experience storage/retrieval
│ ├── session_store.py # Session persistence
│ └── templates/ # HTML templates
├── tests/ # Test suites (Google Test + pytest)
│ ├── test_core_python.py # 46 Python unit tests
│ ├── test_*.cpp # 62 C++ unit tests
│ └── CMakeLists.txt
├── config/ # Configuration files
│ ├── api_config.json # LLM provider configuration
│ ├── llm_catalog.json # LLM model catalog
│ ├── rules/default.yaml # Rule definitions
│ └── sessions/ # Historical sessions (JSON)
├── docs/ # Design documents
├── run_webui.sh # One-click launch script
└── CMakeLists.txt # Top-level build configuration
| Component | Location | Description |
|---|---|---|
| Query Input | Central text area | Enter requirements; submit via Enter |
| Architecture Selector | Top button group | Single / DAG / Multi-Loop mode toggle |
| Mode Toggle | Settings panel | Closed (iterative) / Open (single-pass) |
| Settings Panel | Gear icon (top-right) | Iteration count, threshold, timeout, token budget |
| Real-time Flow Graph | Left panel | SVG flow diagram; nodes transition green/red dynamically |
| Score Dashboard | Right panel | Three scoring dimensions updated in real-time + gauge visualization |
| Execution Timeline | Bottom panel | Waterfall chart showing per-agent latency |
| Session History | Left menu | Historical queries with search and replay |
| API Configuration | Top-right API button | Provider switch and connection test |
| Plugin Management | Plugin page | View, load, and unload agent plugins |
| Parameter | Default | Description |
|---|---|---|
| max_iterations | 10 | Maximum iterations before forced termination |
| threshold | 0.3 | Convergence threshold (iteration stops when score ≥ this value) |
| execution_timeout | 120 | Code execution timeout in seconds |
| mode | closed | closed (iterative) / open (single-pass) |
| arch_mode | single | single / dag / multi |
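To make the threshold semantics concrete, the toy example below combines the three dimension scores using the weights from config/rules/default.yaml and applies the convergence test; the weighted-sum combination is an assumption for illustration, not necessarily the Evaluator's exact formula.
# Toy convergence check (the weighted-sum combination is an assumption).
WEIGHTS = {"reasonableness": 0.3, "executability": 0.5, "satisfaction": 0.2}

def overall_score(scores: dict) -> float:
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def has_converged(scores: dict, threshold: float = 0.3) -> bool:
    # The loop stops as soon as the overall score reaches the threshold.
    return overall_score(scores) >= threshold

print(has_converged({"reasonableness": 0.9, "executability": 1.0, "satisfaction": 0.8}))  # True
In the Web UI these parameters are tuned from the settings panel; the Python API below exposes the same knobs.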
from core import CLMAFramework
fw = CLMAFramework(
mode="closed", # "closed" | "open"
max_iterations=5,
threshold=0.7,
token_budget=10000,
)
# Synchronous query (blocking)
result = fw.process_query("implement quicksort in Python")
print(f"Score: {result['score']['overall']}")
print(f"Code:\n{result['content']}")
# Streaming query (SSE events)
for event in fw.process_query_stream("design a REST API route"):
if event["event"] == "agent_start":
print(f"[{event['agent']}] processing...")
elif event["event"] == "agent_complete":
print(f"[{event['agent']}] completed ({event['duration_ms']:.0f}ms)")
elif event["event"] == "iteration":
print(f"[Iteration {event['iteration']}] score: {event['scores']['overall']:.3f}")
elif event["event"] == "done":
print(f"Final result (score: {event['result']['score']['overall']:.3f})")Edit config/rules/default.yaml:
rules:
- pattern: "deploy|build|run"
validation_method: "execution"
recommended_tools: ["docker", "shell"]
weights:
reasonableness: 0.3
executability: 0.5
satisfaction: 0.2
threshold: 0.3
Register in python_interface/api_providers.py:
@register_provider("my_provider")
class MyProvider(BaseProvider):
"""Custom LLM provider implementation."""
def create_completion(self, messages, **kwargs):
# Implement your API call logic
...
// plugins/my_agent/plugin.cpp
#include <core/PluginInterface.hpp>
class MyAgent : public AgentPlugin {
public:
AgentResult execute(const Context& ctx) override {
// Your agent logic
return result;
}
};
extern "C" AgentPlugin* create_plugin() {
return new MyAgent();
}
# C++ tests
cd build && ctest -j$(nproc)
# Python tests
source venv/bin/activate
python3 -m pytest tests/test_core_python.py -v
python3 -m pytest tests/test_dag_planner.py -v
python3 -m pytest tests/test_sandbox_tiering.py -v
# Total: 108+ test cases
Q: How does CLMA differ from LangChain or CrewAI?
A: LangChain provides chainable LLM calls; CrewAI focuses on role-based agent orchestration. CLMA's key differentiator is its closed-loop feedback mechanism — rather than executing agents in a fixed sequence, each iteration passes through Verifier + Evaluator stages that measure output quality. Results below threshold automatically trigger another iteration until convergence. This transforms LLM code generation from a "generate and hope" model to a "generate, verify, and iterate" one.
Q: Why use nested loops when they are slower than single loops?
A: Nested loops are not intended for simple tasks — the mode selector routes those to Fast Path or Single Loop automatically. Nested loops exist for strategically complex tasks where the outer strategy loop and inner execution loop must align. A single-pass pipeline cannot validate architectural decisions against generated code. This is analogous to why distributed systems use consensus protocols for critical state transitions: the overhead is justified by correctness guarantees.
Q: Which programming languages are supported?
A: The execution layer (Sandbox) supports Python, Bash, and C++ natively. The LLM Solver can generate code in any language. Additional runtime language support can be added through ToolExecutor extensions.
Q: Is the framework usable offline?
A: Yes. Configure provider: "local" and point base_url to your Ollama or vLLM endpoint. All components function without internet access when using a local model.
Q: Can I try CLMA without an API key?
A: Yes. The framework includes a simulated fallback mode — when LLM calls fail or API configuration is absent, calls are automatically degraded to rule-based simulated agents. Scoring and pipeline mechanics remain fully functional, though code quality is naturally lower than with a real LLM.
Q: Does CLMA support Chinese-language queries?
A: Yes. All agent prompts are language-agnostic — Chinese, English, and other natural language queries are processed through the same pipeline. The classification router correctly handles Chinese keywords (e.g., "用 Python" → code query).
MIT License
If you find this project useful:
- ⭐ Star the repository
- 🐛 Submit an issue for bugs
- 💡 Pull requests and feature suggestions are welcome