ROOTwise Agent

A Multi-Agent system for ROOT/PyROOT code generation, documentation QA, debugging, and C++ → Python translation. ReAct reasoning loop + MCP RAG retrieval + Docker sandbox verification.

面向 CERN ROOT 框架的 Multi-Agent 系统，支持 PyROOT 代码生成、文档问答、调试和 C++ → Python 翻译。 ReAct 推理循环 + MCP RAG 检索 + Docker 沙箱验证。

Quick Start · Architecture · Evaluation · Tech Stack

📖 Overview

ROOTwise Agent is an intelligent assistant for CERN ROOT — the data analysis framework used by high-energy physics experiments worldwide. It understands ROOT APIs, generates correct PyROOT code, verifies it in a Docker sandbox, and iteratively fixes errors.

Key Features

4 Specialized Agents — CodeGen, DocQA, Debug, Translate — each with domain-specific prompts and verification
ReAct Reasoning Loop — Think → Act → Observe → Evaluate → Respond, with configurable stop policy
MCP RAG Integration — ROOT documentation retrieval via Model Context Protocol, graceful fallback when unavailable
Docker Sandbox — Code execution in isolated rootproject/root containers, --network none for security
3-Layer Memory — Working (in-context) + Episodic (SQLite) + Semantic (ChromaDB)
Iterative Verification — Generate → Execute → Reflect → Fix loop with CRITIC + Reflexion
C++ → PyROOT Translation — Automated translation with sandbox output comparison
Quantitative Evaluation — Retrieval (Recall/MRR/Precision) + CodeGen (Pass@k) + RAG ablation benchmarks

🏗 Architecture

graph TB
    User([User]) --> UI[Streamlit Chat UI]
    UI --> API[FastAPI /chat SSE]
    API --> Router[Intent Router<br/>gpt-4o-mini]

    Router -->|code_gen| CG[CodeGen Agent]
    Router -->|doc_qa| DQ[DocQA Agent]
    Router -->|debug| DB[Debug Agent]
    Router -->|translate| TR[Translate Agent]

    subgraph ReAct["ReAct Loop (BaseReActAgent)"]
        Think[Think] --> Act[Act]
        Act --> Observe[Observe]
        Observe --> Evaluate{Evaluate}
        Evaluate -->|continue| Think
        Evaluate -->|stop| Respond[Respond]
    end

    CG & DQ & DB & TR --> ReAct

    subgraph Tools["Tool Registry"]
        MCP[MCP Retrieve<br/>ROOT Docs]
        Docker[Docker Exec<br/>Sandbox]
        Mem[Search Memory]
        Reflect[Reflect<br/>CRITIC + Reflexion]
        Finish[Finish]
    end

    subgraph Infra["Infrastructure"]
        Sandbox[Docker Sandbox<br/>rootproject/root:6.32.02]
        RAG[MCP RAG Server<br/>Hybrid Search]
        Memory[(Memory<br/>Episodic + Semantic)]
        Chroma[(ChromaDB)]
    end

    ReAct --> Tools
    MCP --> RAG --> Chroma
    Docker --> Sandbox
    Mem --> Memory --> Chroma

Design Decisions

Decision	Choice	Rationale
LLM Interaction	JSON mode + FinishTool	Cross-provider compatibility (no native tool calling)
Agent Termination	Explicit FinishTool	Reliable stop signal vs parsing heuristics
All Agents	GPT-4o unified	DeepSeek JSON format issues forced single-provider
Sandbox Image	`rootproject/root:6.32.02-ubuntu24.04`	Version-locked for reproducibility
RAG Protocol	MCP (stdio / streamable HTTP)	Decoupled from RAG implementation
Memory	SQLite (episodic) + Chroma (semantic)	Lightweight, no external DB server needed

🚀 Quick Start

Prerequisites

Python 3.10+
Docker (for code sandbox)
OpenAI API key

Setup

git clone https://github.com/nobitalqs/ROOTwiseAgent.git
cd ROOTwiseAgent
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env   # fill in OPENAI_API_KEY

# Pull sandbox image (~2.5GB, first time only)
docker pull rootproject/root:6.32.02-ubuntu24.04

Run

# Start UI
streamlit run src/rootwise/ui/chat.py

# Or start API server
uvicorn rootwise.api.app:app --port 8000

With RAG (Optional)

# Clone and start the RAG MCP Server
git clone https://github.com/nobitalqs/MODULAR-RAG-MCP-SERVER.git ../MODULAR-RAG-MCP-SERVER
cd ../MODULAR-RAG-MCP-SERVER && pip install -e . && cd -

# Ingest ROOT documentation
python scripts/ingest.py

# The agent automatically connects via MCP when RAG server is configured

📊 Evaluation

Retrieval Quality (93 queries)

Evaluated against a curated golden test set with domain-level matching:

Metric	Score	Description
Recall@5	0.796	Relevant doc found in top-5 results
MRR	0.674	Reciprocal rank of first relevant hit
Precision@3	0.559	Fraction of top-3 results that are relevant

Code Generation — Standard Benchmark (50 tasks)

Mixed difficulty (easy/medium/hard), common ROOT operations:

Condition	Pass@1	Class Coverage
Agent-only	0.860	0.885
Agent + RAG	0.860	0.885

GPT-4o's training knowledge covers standard ROOT APIs well — RAG provides no uplift on common tasks.

Code Generation — RAG Benchmark (15 hard tasks)

Deliberately designed tasks requiring niche ROOT APIs (RooIntegralMorph, VariationsFor, RooSimWSTool, etc.) that LLMs don't know from training data:

Condition	Pass@1	Class Coverage
Agent-only	0.200	0.439
Agent + RAG	0.400	0.650

RAG doubles Pass@1 on tasks requiring specific API knowledge not in LLM training data.

Test Suite

Type	Count
Unit	314
Integration	33
E2E	2
Total	349

pytest tests/ -q   # runs in ~17s

🔧 Tech Stack

Layer	Technology
Agents	LangGraph StateGraph, custom BaseReActAgent
LLM	GPT-4o (JSON mode), gpt-4o-mini (router)
RAG	MCP protocol → MODULAR-RAG-MCP-SERVER
Sandbox	Docker (`rootproject/root:6.32.02-ubuntu24.04`)
Memory	SQLite (episodic) + ChromaDB (semantic)
API	FastAPI with SSE streaming
UI	Streamlit (multi-turn chat, code highlighting, plot rendering)
Evaluation	Custom benchmark framework (Recall@k, MRR, Pass@k, ablation)
Testing	pytest (349 tests)
Linting	ruff (format + check)

📂 Project Structure

src/rootwise/
├── agents/              # ReAct agent layer
│   ├── base.py          # BaseReActAgent (think/act/observe/evaluate/respond)
│   ├── router.py        # Intent classification (4 intents)
│   ├── code_gen.py      # PyROOT code generation + sandbox verification
│   ├── doc_qa.py        # Documentation Q&A + source grounding
│   ├── debug.py         # Error diagnosis + auto-fix
│   ├── translate.py     # C++ → PyROOT translation + verification
│   ├── context/         # Priority-based context assembly + token counting
│   ├── memory/          # 3-layer memory (episodic + semantic + working)
│   ├── tools/           # ToolRegistry + specialized tools
│   └── verification/    # Code execution, translation, grounding verifiers
├── mcp_client/          # MCP client (stdio + streamable HTTP)
├── sandbox/             # Docker sandbox executor + renderer
├── evaluation/          # Benchmark framework (retrieval + codegen + ablation)
├── api/                 # FastAPI service (SSE streaming)
├── ui/                  # Streamlit chat interface
└── config/              # Pydantic settings from YAML

🗺 Roadmap

Priority	Item	Status
1	Docker Compose deployment	Planned
2	Multi-Agent Orchestrator	Deferred
3	API Reference ingestion (Doxygen)	Optional
4	Multi-user file upload	Future

Built for the high-energy physics community.

Name		Name	Last commit message	Last commit date
Latest commit History 71 Commits
.claude/skills		.claude/skills
config		config
data/benchmark		data/benchmark
docs/superpowers		docs/superpowers
scripts		scripts
src/rootwise		src/rootwise
tests		tests
.env.example		.env.example
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
DEV_SPEC.md		DEV_SPEC.md
README.md		README.md
ROOT_Agent_Architecture.md		ROOT_Agent_Architecture.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ROOTwise Agent

📖 Overview

Key Features

🏗 Architecture

Design Decisions

🚀 Quick Start

Prerequisites

Setup

Run

With RAG (Optional)

📊 Evaluation

Retrieval Quality (93 queries)

Code Generation — Standard Benchmark (50 tasks)

Code Generation — RAG Benchmark (15 hard tasks)

Test Suite

🔧 Tech Stack

📂 Project Structure

🗺 Roadmap

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ROOTwise Agent

📖 Overview

Key Features

🏗 Architecture

Design Decisions

🚀 Quick Start

Prerequisites

Setup

Run

With RAG (Optional)

📊 Evaluation

Retrieval Quality (93 queries)

Code Generation — Standard Benchmark (50 tasks)

Code Generation — RAG Benchmark (15 hard tasks)

Test Suite

🔧 Tech Stack

📂 Project Structure

🗺 Roadmap

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages