ROOTwise Agent

A multi-agent system for ROOT/PyROOT code generation, documentation QA, debugging, and C++ → PyROOT translation. It combines a ReAct reasoning loop, MCP-based RAG retrieval, and Docker sandbox verification.

Quick Start · Architecture · Evaluation · Tech Stack


📖 Overview

ROOTwise Agent is an intelligent assistant for CERN ROOT — the data analysis framework used by high-energy physics experiments worldwide. It understands ROOT APIs, generates correct PyROOT code, verifies it in a Docker sandbox, and iteratively fixes errors.

Key Features

  • 4 Specialized Agents — CodeGen, DocQA, Debug, Translate — each with domain-specific prompts and verification
  • ReAct Reasoning Loop — Think → Act → Observe → Evaluate → Respond, with configurable stop policy
  • MCP RAG Integration — ROOT documentation retrieval via Model Context Protocol, graceful fallback when unavailable
  • Docker Sandbox — Code execution in isolated rootproject/root containers, --network none for security
  • 3-Layer Memory — Working (in-context) + Episodic (SQLite) + Semantic (ChromaDB)
  • Iterative Verification — Generate → Execute → Reflect → Fix loop with CRITIC + Reflexion
  • C++ → PyROOT Translation — Automated translation with sandbox output comparison
  • Quantitative Evaluation — Retrieval (Recall/MRR/Precision) + CodeGen (Pass@k) + RAG ablation benchmarks
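The Generate → Execute → Reflect → Fix loop listed above can be pictured with a small sketch. This is not the project's implementation: `verify_loop`, `ExecResult`, and the three stand-in callables are hypothetical names, and a syntax check via `compile()` stands in for the Docker sandbox run.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ExecResult:
    ok: bool
    stderr: str = ""


def verify_loop(
    generate: Callable[[str], str],
    execute: Callable[[str], ExecResult],
    reflect: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Run Generate -> Execute -> Reflect -> Fix until the code passes or rounds run out."""
    feedback = ""
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = generate(feedback)                 # Generate (or Fix, once feedback exists)
        result = execute(code)                    # Execute
        if result.ok:
            return code, True, round_no
        feedback = reflect(code, result.stderr)   # Reflect on the failure
    return code, False, max_rounds


# Stand-ins (assumptions): a scripted "LLM" and a syntax check instead of a sandbox.
def fake_generate(feedback: str) -> str:
    return "x = (1 + 2" if not feedback else "x = (1 + 2)"


def syntax_check(code: str) -> ExecResult:
    try:
        compile(code, "<generated>", "exec")
        return ExecResult(ok=True)
    except SyntaxError as exc:
        return ExecResult(ok=False, stderr=str(exc))


def echo_reflect(code: str, stderr: str) -> str:
    return f"Previous attempt failed: {stderr}"


code, passed, rounds = verify_loop(fake_generate, syntax_check, echo_reflect)
print(passed, rounds)
```

Bounding the number of rounds keeps a failing task from looping forever; the real stop policy is configurable according to the feature list.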

🏗 Architecture

graph TB
    User([User]) --> UI[Streamlit Chat UI]
    UI --> API[FastAPI /chat SSE]
    API --> Router[Intent Router<br/>gpt-4o-mini]

    Router -->|code_gen| CG[CodeGen Agent]
    Router -->|doc_qa| DQ[DocQA Agent]
    Router -->|debug| DB[Debug Agent]
    Router -->|translate| TR[Translate Agent]

    subgraph ReAct["ReAct Loop (BaseReActAgent)"]
        Think[Think] --> Act[Act]
        Act --> Observe[Observe]
        Observe --> Evaluate{Evaluate}
        Evaluate -->|continue| Think
        Evaluate -->|stop| Respond[Respond]
    end

    CG & DQ & DB & TR --> ReAct

    subgraph Tools["Tool Registry"]
        MCP[MCP Retrieve<br/>ROOT Docs]
        Docker[Docker Exec<br/>Sandbox]
        Mem[Search Memory]
        Reflect[Reflect<br/>CRITIC + Reflexion]
        Finish[Finish]
    end

    subgraph Infra["Infrastructure"]
        Sandbox[Docker Sandbox<br/>rootproject/root:6.32.02]
        RAG[MCP RAG Server<br/>Hybrid Search]
        Memory[(Memory<br/>Episodic + Semantic)]
        Chroma[(ChromaDB)]
    end

    ReAct --> Tools
    MCP --> RAG --> Chroma
    Docker --> Sandbox
    Mem --> Memory --> Chroma
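As a rough illustration of the ReAct subgraph in the diagram, the sketch below runs Think → Act → Observe and stops either on an explicit finish action or when a step budget is exhausted. `run_react`, the action dictionary shape, and the `retrieve` tool are assumptions for illustration, not the actual `BaseReActAgent` interface.

```python
from typing import Any, Callable, Dict, List, Tuple

Action = Dict[str, Any]
History = List[Tuple[Action, str]]


def run_react(
    think: Callable[[History], Action],
    tools: Dict[str, Callable[..., str]],
    max_steps: int = 5,
) -> Dict[str, Any]:
    """Minimal ReAct-style loop with an explicit finish action and a step budget."""
    history: History = []
    for step in range(1, max_steps + 1):
        action = think(history)                       # Think: choose the next action
        name = action["tool"]
        args = action.get("args", {})
        if name == "finish":                          # Evaluate: explicit stop signal
            return {"answer": args.get("answer", ""), "steps": step, "stopped_by": "finish"}
        tool = tools.get(name)
        observation = tool(**args) if tool else f"unknown tool: {name}"  # Act
        history.append((action, observation))         # Observe
    return {"answer": "", "steps": max_steps, "stopped_by": "max_steps"}


# A scripted "LLM" (assumption): retrieve once, then finish with the observation.
def scripted_think(history: History) -> Action:
    if not history:
        return {"tool": "retrieve", "args": {"query": "TH1F constructor"}}
    return {"tool": "finish", "args": {"answer": f"Used context: {history[-1][1]}"}}


tools = {"retrieve": lambda query: f"docs for {query!r}"}
result = run_react(scripted_think, tools)
print(result)
```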

Design Decisions

| Decision | Choice | Rationale |
|---|---|---|
| LLM Interaction | JSON mode + FinishTool | Cross-provider compatibility (no native tool calling) |
| Agent Termination | Explicit FinishTool | Reliable stop signal vs parsing heuristics |
| All Agents | GPT-4o unified | DeepSeek JSON format issues forced single-provider |
| Sandbox Image | rootproject/root:6.32.02-ubuntu24.04 | Version-locked for reproducibility |
| RAG Protocol | MCP (stdio / streamable HTTP) | Decoupled from RAG implementation |
| Memory | SQLite (episodic) + Chroma (semantic) | Lightweight, no external DB server needed |
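To make the "JSON mode + FinishTool" decision concrete, here is a minimal sketch of validating a JSON action emitted by the model and detecting the explicit stop signal. The JSON schema (`tool`, `args`) and the tool names are assumptions loosely modeled on the Tool Registry labels in the diagram; the project's real schema may differ.

```python
import json
from typing import Any, Dict

# Hypothetical tool names, chosen to mirror the Tool Registry in the diagram.
ALLOWED_TOOLS = {"mcp_retrieve", "docker_exec", "search_memory", "reflect", "finish"}


def parse_action(raw: str) -> Dict[str, Any]:
    """Validate one JSON-mode reply and flag whether it is the finish action."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    tool = data.get("tool")
    if tool not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    args = data.get("args", {})
    if not isinstance(args, dict):
        raise ValueError("'args' must be a JSON object")
    # Termination is an explicit tool call, not a guess based on the reply text.
    return {"tool": tool, "args": args, "is_final": tool == "finish"}


step = parse_action('{"tool": "docker_exec", "args": {"code": "import ROOT"}}')
done = parse_action('{"tool": "finish", "args": {"answer": "Histogram saved."}}')
print(step["is_final"], done["is_final"])
```

Because every provider that supports JSON output can produce this shape, the loop does not depend on any vendor-specific tool-calling API.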

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Docker (for code sandbox)
  • OpenAI API key

Setup

git clone https://github.com/nobitalqs/ROOTwiseAgent.git
cd ROOTwiseAgent
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env   # fill in OPENAI_API_KEY

# Pull sandbox image (~2.5GB, first time only)
docker pull rootproject/root:6.32.02-ubuntu24.04
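For orientation, a sandboxed run against this image might be launched with a command like the one assembled below. Only the image tag and `--network none` come from this README; the memory limit, read-only mount, `/work` path, function name, and the assumption that `python3` is on the image's PATH are illustrative.

```python
import shlex
from typing import List

SANDBOX_IMAGE = "rootproject/root:6.32.02-ubuntu24.04"


def build_sandbox_command(host_dir: str, script_name: str, memory: str = "512m") -> List[str]:
    """Assemble a `docker run` argv that executes one script without network access."""
    workdir = "/work"  # assumed mount point inside the container
    return [
        "docker", "run", "--rm",
        "--network", "none",                 # no network inside the sandbox
        "--memory", memory,                  # assumed resource cap
        "-v", f"{host_dir}:{workdir}:ro",    # mount the script directory read-only
        "-w", workdir,
        SANDBOX_IMAGE,
        "python3", script_name,
    ]


cmd = build_sandbox_command("/tmp/rootwise-job", "plot.py")
print(shlex.join(cmd))
# The list can be passed to subprocess.run(cmd, capture_output=True, text=True, timeout=60).
```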

Run

# Start UI
streamlit run src/rootwise/ui/chat.py

# Or start API server
uvicorn rootwise.api.app:app --port 8000
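The architecture diagram shows the API streaming `/chat` responses over SSE. A client needs to split that stream into events; the sketch below parses the standard `event:` / `data:` wire format from an iterable of lines. The event names and payloads in the sample are made up, since the README does not document the actual message schema.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from Server-Sent Events lines."""
    event = "message"          # default event type when no `event:` field is sent
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                         # blank line dispatches the event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):             # comment / keep-alive line
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):          # strip the single optional space
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream contents for illustration only.
sample = ["event: token", "data: hello", "", ": keep-alive", "data: line1", "data: line2", ""]
events = list(parse_sse(sample))
print(events)
```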

With RAG (Optional)

# Clone and start the RAG MCP Server
git clone https://github.com/nobitalqs/MODULAR-RAG-MCP-SERVER.git ../MODULAR-RAG-MCP-SERVER
cd ../MODULAR-RAG-MCP-SERVER && pip install -e . && cd -

# Ingest ROOT documentation
python scripts/ingest.py

# The agent automatically connects via MCP when RAG server is configured
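Since RAG is optional and the feature list promises a graceful fallback, the retrieval call presumably has to tolerate an unreachable server. A minimal sketch of that pattern follows; `retrieve_with_fallback` and the `retrieve(query, top_k)` callable are hypothetical stand-ins for the MCP client.

```python
from typing import Callable, List, Optional, Tuple


def retrieve_with_fallback(
    retrieve: Callable[[str, int], List[str]],
    query: str,
    top_k: int = 5,
) -> Tuple[List[str], Optional[str]]:
    """Return (docs, warning). On connection problems, return no docs plus a warning."""
    try:
        docs = retrieve(query, top_k)
    except OSError as exc:  # covers ConnectionError and TimeoutError subclasses
        return [], f"RAG unavailable ({type(exc).__name__}); continuing without retrieved context"
    return list(docs), None


# Stand-in retrievers (assumptions): one working, one simulating a stopped server.
def working(query: str, top_k: int) -> List[str]:
    return [f"doc about {query}"][:top_k]


def server_down(query: str, top_k: int) -> List[str]:
    raise ConnectionRefusedError("MCP server not running")


ok_docs, ok_warning = retrieve_with_fallback(working, "RooFit workspace")
no_docs, warning = retrieve_with_fallback(server_down, "RooFit workspace")
print(ok_docs, warning)
```

The agent can then proceed on the model's own knowledge, which matches the Agent-only condition in the evaluation below.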

📊 Evaluation

Retrieval Quality (93 queries)

Evaluated against a curated golden test set with domain-level matching:

| Metric | Score | Description |
|---|---|---|
| Recall@5 | 0.796 | Relevant doc found in top-5 results |
| MRR | 0.674 | Reciprocal rank of first relevant hit |
| Precision@3 | 0.559 | Fraction of top-3 results that are relevant |
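The three metrics follow the descriptions in the table. The sketch below shows one way they could be computed per query and averaged; it treats Recall@k as "any relevant document in the top k", as the table describes, and uses toy document IDs rather than the project's golden set or its domain-level matching.

```python
from typing import Dict, List, Sequence, Set, Tuple


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if any relevant document appears in the top k results, else 0.0."""
    return float(any(doc in relevant for doc in ranked[:k]))


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k results that are relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k


def evaluate(runs: List[Tuple[Sequence[str], Set[str]]]) -> Dict[str, float]:
    """Average each metric over all (ranked_results, relevant_set) pairs."""
    n = len(runs)
    return {
        "recall@5": sum(hit_at_k(r, rel, 5) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
        "precision@3": sum(precision_at_k(r, rel, 3) for r, rel in runs) / n,
    }


# Toy data: the first query hits at rank 2, the second query misses entirely.
runs = [(["a", "b", "c", "d"], {"b"}), (["x", "y", "z"], {"q"})]
scores = evaluate(runs)
print(scores)
```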

Code Generation — Standard Benchmark (50 tasks)

Mixed difficulty (easy/medium/hard), common ROOT operations:

| Condition | Pass@1 | Class Coverage |
|---|---|---|
| Agent-only | 0.860 | 0.885 |
| Agent + RAG | 0.860 | 0.885 |

GPT-4o's training knowledge covers standard ROOT APIs well, so RAG provides no uplift on common tasks: both conditions score identically.

Code Generation — RAG Benchmark (15 hard tasks)

Tasks deliberately designed to require niche ROOT APIs (RooIntegralMorph, VariationsFor, RooSimWSTool, etc.) that LLMs are unlikely to know from training data:

| Condition | Pass@1 | Class Coverage |
|---|---|---|
| Agent-only | 0.200 | 0.439 |
| Agent + RAG | 0.400 | 0.650 |

RAG doubles Pass@1 (0.200 → 0.400) and raises class coverage from 0.439 to 0.650 on tasks requiring specific API knowledge not in LLM training data.
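For readers unfamiliar with Pass@k, the sketch below uses the commonly cited unbiased estimator, 1 - C(n-c, k) / C(n, k), for n samples per task of which c pass. The README does not state how many samples per task the benchmark draws; with one sample per task, Pass@1 is simply the fraction of tasks that pass, which is the assumption in the second example.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples (drawn from n, c correct) passes."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:                 # too few failures to fill k draws: a pass is certain
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks given (n_samples, n_correct) per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# 10 samples with 3 correct: pass@1 is 3/10, pass@5 is 1 - C(7,5)/C(10,5).
p1 = pass_at_k(10, 3, 1)
p5 = pass_at_k(10, 3, 5)

# Illustrative only: 15 single-sample tasks with 6 passing give a mean Pass@1 of 0.4.
single_sample = [(1, 1)] * 6 + [(1, 0)] * 9
print(round(p1, 3), round(p5, 3), round(mean_pass_at_k(single_sample, 1), 3))
```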

Test Suite

| Type | Count |
|---|---|
| Unit | 314 |
| Integration | 33 |
| E2E | 2 |
| Total | 349 |

pytest tests/ -q   # runs in ~17s

🔧 Tech Stack

| Layer | Technology |
|---|---|
| Agents | LangGraph StateGraph, custom BaseReActAgent |
| LLM | GPT-4o (JSON mode), gpt-4o-mini (router) |
| RAG | MCP protocol → MODULAR-RAG-MCP-SERVER |
| Sandbox | Docker (rootproject/root:6.32.02-ubuntu24.04) |
| Memory | SQLite (episodic) + ChromaDB (semantic) |
| API | FastAPI with SSE streaming |
| UI | Streamlit (multi-turn chat, code highlighting, plot rendering) |
| Evaluation | Custom benchmark framework (Recall@k, MRR, Pass@k, ablation) |
| Testing | pytest (349 tests) |
| Linting | ruff (format + check) |

📂 Project Structure

src/rootwise/
├── agents/              # ReAct agent layer
│   ├── base.py          # BaseReActAgent (think/act/observe/evaluate/respond)
│   ├── router.py        # Intent classification (4 intents)
│   ├── code_gen.py      # PyROOT code generation + sandbox verification
│   ├── doc_qa.py        # Documentation Q&A + source grounding
│   ├── debug.py         # Error diagnosis + auto-fix
│   ├── translate.py     # C++ → PyROOT translation + verification
│   ├── context/         # Priority-based context assembly + token counting
│   ├── memory/          # 3-layer memory (episodic + semantic + working)
│   ├── tools/           # ToolRegistry + specialized tools
│   └── verification/    # Code execution, translation, grounding verifiers
├── mcp_client/          # MCP client (stdio + streamable HTTP)
├── sandbox/             # Docker sandbox executor + renderer
├── evaluation/          # Benchmark framework (retrieval + codegen + ablation)
├── api/                 # FastAPI service (SSE streaming)
├── ui/                  # Streamlit chat interface
└── config/              # Pydantic settings from YAML
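The `context/` package is described as priority-based context assembly with token counting. One plausible reading is a greedy fill under a token budget, sketched below. The function name, the `(priority, text)` tuple shape, and the whitespace-based token counter are assumptions; a real implementation would use the model's tokenizer.

```python
from typing import Callable, List, Tuple


def assemble_context(
    chunks: List[Tuple[int, str]],
    budget: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> Tuple[List[str], int]:
    """Add chunks in priority order (lower number first), skipping any that overflow the budget."""
    chosen: List[str] = []
    used = 0
    for _priority, text in sorted(chunks, key=lambda item: item[0]):
        cost = count_tokens(text)
        if used + cost > budget:
            continue              # does not fit; try the next, possibly smaller, chunk
        chosen.append(text)
        used += cost
    return chosen, used


chunks = [
    (2, "retrieved doc chunk about TH1F binning"),
    (0, "system prompt"),
    (1, "user question here"),
    (3, "old memory " * 10),
]
selected, used = assemble_context(chunks, budget=12)
print(selected, used)
```

With a budget of 12 whitespace tokens, the three highest-priority chunks fit and the long memory chunk is dropped.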

🗺 Roadmap

| Priority | Item | Status |
|---|---|---|
| 1 | Docker Compose deployment | Planned |
| 2 | Multi-Agent Orchestrator | Deferred |
| 3 | API Reference ingestion (Doxygen) | Optional |
| 4 | Multi-user file upload | Future |

Built for the high-energy physics community.
