A multi-agent system for the CERN ROOT framework, covering PyROOT code generation, documentation QA, debugging, and C++ → Python translation. It combines a ReAct reasoning loop, MCP RAG retrieval, and Docker sandbox verification.
ROOTwise Agent is an intelligent assistant for CERN ROOT — the data analysis framework used by high-energy physics experiments worldwide. It understands ROOT APIs, generates correct PyROOT code, verifies it in a Docker sandbox, and iteratively fixes errors.
- 4 Specialized Agents — CodeGen, DocQA, Debug, Translate — each with domain-specific prompts and verification
- ReAct Reasoning Loop — Think → Act → Observe → Evaluate → Respond, with configurable stop policy
- MCP RAG Integration — ROOT documentation retrieval via Model Context Protocol, graceful fallback when unavailable
- Docker Sandbox — Code execution in isolated `rootproject/root` containers, `--network none` for security
- 3-Layer Memory — Working (in-context) + Episodic (SQLite) + Semantic (ChromaDB)
- Iterative Verification — Generate → Execute → Reflect → Fix loop with CRITIC + Reflexion
- C++ → PyROOT Translation — Automated translation with sandbox output comparison
- Quantitative Evaluation — Retrieval (Recall/MRR/Precision) + CodeGen (Pass@k) + RAG ablation benchmarks
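
The Think → Act → Observe → Evaluate → Respond cycle from the list above can be sketched in a few lines of Python. This is an illustration only, not the project's `BaseReActAgent`: `StopPolicy`, `run_react`, `scripted_think`, and the `echo` tool are hypothetical names, and a real agent would call an LLM where `scripted_think` is used here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StopPolicy:
    """Hypothetical stop policy: give up after `max_steps` iterations."""
    max_steps: int = 5


def run_react(think: Callable[[list], dict], tools: dict, policy: StopPolicy) -> dict:
    """Run Think -> Act -> Observe -> Evaluate until a finish action or the step cap."""
    history = []  # (action, observation) pairs fed back into the next Think step
    for step in range(1, policy.max_steps + 1):
        action = think(history)                                # Think
        if action["tool"] == "finish":                         # Evaluate: explicit stop
            return {"answer": action["args"]["answer"], "steps": step, "stopped_by": "finish"}
        observation = tools[action["tool"]](**action["args"])  # Act
        history.append((action, observation))                  # Observe
    # Step cap reached: respond without an answer
    return {"answer": None, "steps": policy.max_steps, "stopped_by": "max_steps"}


def scripted_think(history: list) -> dict:
    """Stand-in for the LLM: call one tool, then finish with its output."""
    if not history:
        return {"tool": "echo", "args": {"text": "hello"}}
    return {"tool": "finish", "args": {"answer": history[-1][1].upper()}}


result = run_react(scripted_think, {"echo": lambda text: text}, StopPolicy(max_steps=3))
print(result)
```

Keeping the step cap in a separate policy object is one way to make the stop condition configurable without touching the loop itself.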
```mermaid
graph TB
    User([User]) --> UI[Streamlit Chat UI]
    UI --> API[FastAPI /chat SSE]
    API --> Router[Intent Router<br/>gpt-4o-mini]
    Router -->|code_gen| CG[CodeGen Agent]
    Router -->|doc_qa| DQ[DocQA Agent]
    Router -->|debug| DB[Debug Agent]
    Router -->|translate| TR[Translate Agent]

    subgraph ReAct["ReAct Loop (BaseReActAgent)"]
        Think[Think] --> Act[Act]
        Act --> Observe[Observe]
        Observe --> Evaluate{Evaluate}
        Evaluate -->|continue| Think
        Evaluate -->|stop| Respond[Respond]
    end

    CG & DQ & DB & TR --> ReAct

    subgraph Tools["Tool Registry"]
        MCP[MCP Retrieve<br/>ROOT Docs]
        Docker[Docker Exec<br/>Sandbox]
        Mem[Search Memory]
        Reflect[Reflect<br/>CRITIC + Reflexion]
        Finish[Finish]
    end

    subgraph Infra["Infrastructure"]
        Sandbox[Docker Sandbox<br/>rootproject/root:6.32.02]
        RAG[MCP RAG Server<br/>Hybrid Search]
        Memory[(Memory<br/>Episodic + Semantic)]
        Chroma[(ChromaDB)]
    end

    ReAct --> Tools
    MCP --> RAG --> Chroma
    Docker --> Sandbox
    Mem --> Memory --> Chroma
```
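
The routing step at the top of the diagram can be illustrated with a small dispatcher. The four intent labels come from the diagram's edge labels. Everything else is an assumption made for this sketch: `route`, `keyword_classifier`, and the `doc_qa` fallback for unexpected labels are hypothetical, and the keyword matcher merely stands in for the gpt-4o-mini router so the example runs offline.

```python
from typing import Callable

INTENTS = ("code_gen", "doc_qa", "debug", "translate")  # edge labels from the diagram


def route(message: str, classify: Callable[[str], str], default: str = "doc_qa") -> str:
    """Return a valid intent; fall back to `default` (an assumption) on unexpected labels."""
    label = classify(message).strip().lower()
    return label if label in INTENTS else default


def keyword_classifier(message: str) -> str:
    """Toy stand-in for the LLM router."""
    text = message.lower()
    if "traceback" in text or "error" in text:
        return "debug"
    if "translate" in text or "c++" in text:
        return "translate"
    if "how do i" in text or "what is" in text:
        return "doc_qa"
    return "code_gen"


print(route("Translate this C++ macro to PyROOT", keyword_classifier))
print(route("Plot a TH1F filled with Gaussian noise", keyword_classifier))
```

Validating the classifier's label against a fixed tuple keeps a malformed LLM reply from selecting a non-existent agent.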
| Decision | Choice | Rationale |
|---|---|---|
| LLM Interaction | JSON mode + FinishTool | Cross-provider compatibility (no native tool calling) |
| Agent Termination | Explicit FinishTool | Reliable stop signal vs parsing heuristics |
| All Agents | GPT-4o unified | DeepSeek JSON format issues forced single-provider |
| Sandbox Image | rootproject/root:6.32.02-ubuntu24.04 | Version-locked for reproducibility |
| RAG Protocol | MCP (stdio / streamable HTTP) | Decoupled from RAG implementation |
| Memory | SQLite (episodic) + Chroma (semantic) | Lightweight, no external DB server needed |
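
To make the "JSON mode + FinishTool" rows concrete, the sketch below parses one JSON reply into a tool call and treats a `finish` tool as the explicit stop signal. The reply schema (`tool`, `args`) and the names `parse_action` and `is_finished` are assumptions for illustration; the project's actual message format is not shown in this README.

```python
import json


def parse_action(raw: str) -> dict:
    """Parse one JSON-mode reply into {"tool": str, "args": dict}; raise ValueError if malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("tool"), str):
        raise ValueError("reply must be an object with a string 'tool' field")
    args = data.get("args", {})
    if not isinstance(args, dict):
        raise ValueError("'args' must be an object")
    return {"tool": data["tool"], "args": args}


def is_finished(action: dict) -> bool:
    """Explicit stop signal: the model asked for the (hypothetical) 'finish' tool."""
    return action["tool"] == "finish"


step = parse_action('{"tool": "docker_exec", "args": {"code": "print(1)"}}')
done = parse_action('{"tool": "finish", "args": {"answer": "Histogram saved."}}')
print(is_finished(step), is_finished(done))
```

Because the stop condition is a plain field comparison, it behaves the same for any provider that can return JSON, which is the compatibility argument made in the table.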
- Python 3.10+
- Docker (for code sandbox)
- OpenAI API key
```bash
git clone https://github.com/nobitalqs/ROOTwiseAgent.git
cd ROOTwiseAgent
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env  # fill in OPENAI_API_KEY

# Pull sandbox image (~2.5GB, first time only)
docker pull rootproject/root:6.32.02-ubuntu24.04
```

```bash
# Start UI
streamlit run src/rootwise/ui/chat.py

# Or start API server
uvicorn rootwise.api.app:app --port 8000
```

```bash
# Clone and start the RAG MCP Server
git clone https://github.com/nobitalqs/MODULAR-RAG-MCP-SERVER.git ../MODULAR-RAG-MCP-SERVER
cd ../MODULAR-RAG-MCP-SERVER && pip install -e . && cd -

# Ingest ROOT documentation
python scripts/ingest.py

# The agent automatically connects via MCP when RAG server is configured
```

Evaluated against a curated golden test set with domain-level matching:
| Metric | Score | Description |
|---|---|---|
| Recall@5 | 0.796 | Relevant doc found in top-5 results |
| MRR | 0.674 | Reciprocal rank of first relevant hit |
| Precision@3 | 0.559 | Fraction of top-3 results that are relevant |
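
The three retrieval metrics in the table can be computed per query and then averaged. The sketch below follows the table's descriptions, with two assumptions: documents are matched by exact ID rather than by the domain-level matching used in the benchmark, and the function names and toy queries are hypothetical.

```python
def hit_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1.0 if any relevant document appears in the top-k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant hit, or 0.0 if there is none."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


# Toy golden set: (ranked retrieval results, relevant document IDs)
queries = [
    (["a", "b", "c", "d", "e"], {"b"}),  # first relevant hit at rank 2
    (["x", "y", "z", "w", "v"], {"q"}),  # no relevant hit
]

recall_5 = mean([hit_at_k(r, rel, 5) for r, rel in queries])
mrr = mean([reciprocal_rank(r, rel) for r, rel in queries])
precision_3 = mean([precision_at_k(r, rel, 3) for r, rel in queries])
print(recall_5, mrr, precision_3)
```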
Mixed difficulty (easy/medium/hard), common ROOT operations:
| Condition | Pass@1 | Class Coverage |
|---|---|---|
| Agent-only | 0.860 | 0.885 |
| Agent + RAG | 0.860 | 0.885 |
GPT-4o's training knowledge covers standard ROOT APIs well — RAG provides no uplift on common tasks.
Deliberately designed tasks requiring niche ROOT APIs (RooIntegralMorph, VariationsFor, RooSimWSTool, etc.) that LLMs don't know from training data:
| Condition | Pass@1 | Class Coverage |
|---|---|---|
| Agent-only | 0.200 | 0.439 |
| Agent + RAG | 0.400 | 0.650 |
RAG doubles Pass@1 on tasks requiring specific API knowledge not in LLM training data.
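
For reference, Pass@k is commonly computed with the unbiased estimator `1 - C(n-c, k) / C(n, k)`, where `n` samples are drawn per task and `c` of them pass. The README does not state whether the benchmark uses this estimator or a single sample per task, so the sketch below is a generic illustration; `pass_at_k` is a hypothetical name. For `k = 1` the estimator reduces to `c / n`.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n samples drawn, c of them correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed, giving pass@k = 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Relative uplift between the two Pass@1 scores reported in the table above
agent_only, agent_rag = 0.200, 0.400
uplift = agent_rag / agent_only

print(pass_at_k(n=10, c=2, k=1), uplift)
```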
| Type | Count |
|---|---|
| Unit | 314 |
| Integration | 33 |
| E2E | 2 |
| Total | 349 |

```bash
pytest tests/ -q  # runs in ~17s
```

| Layer | Technology |
|---|---|
| Agents | LangGraph StateGraph, custom BaseReActAgent |
| LLM | GPT-4o (JSON mode), gpt-4o-mini (router) |
| RAG | MCP protocol → MODULAR-RAG-MCP-SERVER |
| Sandbox | Docker (rootproject/root:6.32.02-ubuntu24.04) |
| Memory | SQLite (episodic) + ChromaDB (semantic) |
| API | FastAPI with SSE streaming |
| UI | Streamlit (multi-turn chat, code highlighting, plot rendering) |
| Evaluation | Custom benchmark framework (Recall@k, MRR, Pass@k, ablation) |
| Testing | pytest (349 tests) |
| Linting | ruff (format + check) |
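
As an illustration of the sandbox row, the sketch below builds a `docker run` command line for the pinned image with networking disabled. The image tag and `--network none` come from this README. The read-only mount at `/work`, the `python3` entry point, and the helper name `build_sandbox_command` are assumptions, not the project's executor.

```python
from pathlib import Path

SANDBOX_IMAGE = "rootproject/root:6.32.02-ubuntu24.04"  # image named in this README


def build_sandbox_command(script: Path, image: str = SANDBOX_IMAGE) -> list[str]:
    """Build an argv list that runs `script` inside an isolated container."""
    script = script.resolve()
    return [
        "docker", "run", "--rm",
        "--network", "none",                # no network access inside the sandbox
        "-v", f"{script.parent}:/work:ro",  # assumption: mount the script directory read-only
        "-w", "/work",
        image,
        "python3", script.name,             # assumption: python3 is on PATH in the image
    ]


cmd = build_sandbox_command(Path("demo/hist.py"))
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, capture_output=True, text=True, timeout=60)` to capture stdout and stderr. A script that writes plots or ROOT files would need a writable mount instead of `:ro`.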
```
src/rootwise/
├── agents/               # ReAct agent layer
│   ├── base.py           # BaseReActAgent (think/act/observe/evaluate/respond)
│   ├── router.py         # Intent classification (4 intents)
│   ├── code_gen.py       # PyROOT code generation + sandbox verification
│   ├── doc_qa.py         # Documentation Q&A + source grounding
│   ├── debug.py          # Error diagnosis + auto-fix
│   ├── translate.py      # C++ → PyROOT translation + verification
│   ├── context/          # Priority-based context assembly + token counting
│   ├── memory/           # 3-layer memory (episodic + semantic + working)
│   ├── tools/            # ToolRegistry + specialized tools
│   └── verification/     # Code execution, translation, grounding verifiers
├── mcp_client/           # MCP client (stdio + streamable HTTP)
├── sandbox/              # Docker sandbox executor + renderer
├── evaluation/           # Benchmark framework (retrieval + codegen + ablation)
├── api/                  # FastAPI service (SSE streaming)
├── ui/                   # Streamlit chat interface
└── config/               # Pydantic settings from YAML
```
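
The episodic layer listed under `memory/` is backed by SQLite. A minimal sketch of such a store, using only the standard library, might look like the following. The table schema and the function names (`open_episodic_store`, `record_episode`, `recent_episodes`) are hypothetical and are not taken from the project's code.

```python
import sqlite3


def open_episodic_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite file holding one row per completed task."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " session_id TEXT NOT NULL,"
        " task TEXT NOT NULL,"
        " outcome TEXT NOT NULL)"
    )
    return conn


def record_episode(conn: sqlite3.Connection, session_id: str, task: str, outcome: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO episodes (session_id, task, outcome) VALUES (?, ?, ?)",
            (session_id, task, outcome),
        )


def recent_episodes(conn: sqlite3.Connection, session_id: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return (task, outcome) pairs for a session, most recent first."""
    rows = conn.execute(
        "SELECT task, outcome FROM episodes WHERE session_id = ? ORDER BY id DESC LIMIT ?",
        (session_id, limit),
    )
    return rows.fetchall()


store = open_episodic_store()
record_episode(store, "s1", "fit a Gaussian to a TH1F", "passed")
record_episode(store, "s1", "read a TTree branch", "fixed after 1 retry")
print(recent_episodes(store, "s1"))
```

A file-backed SQLite database needs no server process, which matches the "no external DB server needed" rationale given for the memory design.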
| Priority | Item | Status |
|---|---|---|
| 1 | Docker Compose deployment | Planned |
| 2 | Multi-Agent Orchestrator | Deferred |
| 3 | API Reference ingestion (Doxygen) | Optional |
| 4 | Multi-user file upload | Future |
Built for the high-energy physics community.