A 5-Layer AI Agent Harness Built from Scratch
Native ReAct · ContextManager/SandboxManager · Sandbox Isolation · HTTP SSE API
Architecture is what you build. Engineering is how you build it.
English | 中文
NanoDeer is a compact agent harness with a native async ReAct loop, explicit runtime managers, sandbox-aware tool routing, file-based memory/plan storage, SQLite checkpoint resume, structured trace events, and a Next.js assistant-ui frontend. It intentionally avoids LangGraph and middleware chains: the product path is HTTP/UI -> NanoEngine -> ReActExecutor -> tools/sandbox -> memory/plan/checkpoint.
Current product surface:
- Streaming chat over HTTP SSE with conversation list, rename/archive/delete, and resume.
- Docker-first sandbox execution with Local fallback and virtual `/mnt/user-data` path translation.
- Host-side memory, wiki, and plan tools backed by inspectable files.
- Image upload bridge from frontend to API to `read_image`.
- Deterministic smoke benchmarks plus trace contracts for regression checks.
- Project Structure
- Quick Start
- Background
- Key Differentiators
- Architecture
- Design Principles
- Tools
- Project Status & Roadmap
- Design Inspirations
- Acknowledgments
- License
nanodeer/
├── pyproject.toml  # Build config, entry points, dependencies
├── config.yaml  # Runtime config (LLM, sandbox, memory, thread…)
├── config.yaml.example  # Template: copy to config.yaml and edit
├── .env.example  # Template for API keys: copy to .env and fill in
├── .gitignore  # Git ignore rules
├── LICENSE  # MIT License
├── AGENTS.md  # Agent workflow documentation
├── README.md  # This file (English)
├── README_zh.md  # Chinese-language README
│
├── scripts/
│   ├── dev.sh  # One-command launch: backend + frontend
│   └── check.sh  # Run tests + lint
│
├── src/nanodeer/  # Backend source (Python)
│   ├── cli/
│   │   ├── api.py  # Layer 5: FastAPI + SSE HTTP server
│   │   └── repl.py  # Layer 5: Debug REPL
│   ├── engine.py  # Layer 4: NanoEngine, the application scheduler
│   ├── agent/
│   │   ├── factory.py  # Layer 3-4 bridge: NanoDeerFactory assembler
│   │   ├── react.py  # Layer 3: ReActExecutor main loop (core)
│   │   ├── state.py  # ThreadState / TurnSignals data models
│   │   ├── context.py  # Layer 3: ContextManager (context assembly)
│   │   ├── prompt.py  # Layer 2: Static+dynamic dual-layer prompt builder
│   │   ├── sandbox_manager.py  # Layer 3: Sandbox lifecycle manager
│   │   ├── compression.py  # Layer 4½: Conversation compression
│   │   ├── trace.py  # Runtime observability (structured events)
│   │   ├── checkpoint/  # Layer 1: SQLite session persistence
│   │   └── memory/  # Layer 1: File-based layered memory (L1-L4)
│   ├── sandbox/
│   │   ├── __init__.py  # SandboxProvider ABC + module-level context
│   │   ├── docker.py  # Docker sandbox provider
│   │   ├── local.py  # Local subprocess fallback
│   │   ├── path.py  # Virtual↔physical path translation + security
│   │   └── tools.py  # SandboxExecTool: routes tools into container
│   ├── tools/  # Built-in tool definitions (20 tools)
│   ├── subagent/  # Semaphore-based subagent coordinator
│   ├── plan/  # File-based JSON plan storage
│   ├── skills/  # .md skill loading system
│   └── config.py  # Pydantic config model + global singleton
│
├── frontend/  # Web UI (Next.js + assistant-ui)
│   ├── app/  # Next.js App Router pages
│   ├── components/  # React components (chat, sidebar, settings)
│   ├── lib/  # Frontend utilities and API client
│   ├── hooks/  # Custom React hooks
│   ├── package.json  # Node dependencies
│   ├── next.config.ts  # Next.js configuration
│   ├── tsconfig.json  # TypeScript configuration
│   ├── biome.json  # Linter/formatter config
│   ├── postcss.config.mjs  # PostCSS configuration
│   ├── components.json  # shadcn/ui component registry
│   └── .env.example  # Frontend environment template
│
├── sandbox/  # Docker sandbox image build
│   ├── Dockerfile  # Minimal Python 3.11 sandbox image
│   ├── build.sh  # Image build script
│   └── README.md  # Sandbox setup guide (Chinese)
│
├── tests/  # Python test suite
│   ├── conftest.py  # Shared pytest fixtures
│   ├── test_agent/  # ReAct executor & state tests
│   ├── test_agent_memory/  # Memory system tests
│   ├── test_cli/  # API endpoint & REPL tests
│   ├── test_integration/  # End-to-end integration tests
│   ├── test_plan/  # Plan storage tests
│   ├── test_sandbox/  # Sandbox provider tests
│   ├── test_skills/  # Skill loader tests
│   ├── test_subagents/  # Subagent coordinator tests
│   ├── test_benchmarks/  # Benchmark task tests
│   └── test_tools_integration/  # Tool execution integration tests
│
├── benchmarks/  # Performance benchmarks
│   ├── runner.py  # Benchmark runner
│   ├── tasks/smoke.yaml  # Smoke test task definitions
│   ├── judges.py  # LLM-as-judge evaluation
│   ├── reporters/  # Output reporters (JSON, etc.)
│   └── fixtures/  # Benchmark data fixtures
│
├── docs/  # Design documentation (Chinese)
│   ├── nanodeer_blueprint_20260401.md  # Project blueprint
│   ├── runtime_architecture.md  # Runtime architecture
│   ├── harness_architecture.md  # Harness architecture
│   ├── memory_design.md  # Memory system design
│   ├── sandbox_design.md  # Sandbox design
│   ├── subagent_design.md  # Subagent design
│   ├── plan_design.md  # Plan system design
│   ├── tools_design.md  # Tools design
│   ├── skills_design.md  # Skills design
│   ├── prompt_design.md  # Prompt engineering design
│   ├── observability_design.md  # Observability & tracing
│   ├── evaluation_plan.md  # Evaluation plan
│   ├── long_horizon_design.md  # Long-horizon task design
│   ├── refactoring_journey.md  # Refactoring journey notes
│   └── ref/  # Reference architecture reports
│
├── examples/  # Usage examples (coming soon)
│
├── .agents/  # Agent orchestration configs (internal)
├── .codex/  # Codex metadata (internal)
└── .claude/  # Claude Code project settings (internal)
| Dependency | Version | Required | Notes |
|---|---|---|---|
| OS | Linux / macOS | Yes | WSL2 recommended on Windows |
| Python | ≥ 3.10 | Yes | 3.11+ preferred; sandbox Docker image uses 3.11 |
| Node.js | ≥ 18 | Optional | Only needed for frontend development |
| npm | (comes w/ Node) | Optional | Frontend dependency management |
| Docker | ≥ 24.0 | Optional | Required for sandbox isolation; Local fallback works without |
| curl | any | Optional | Required by dev/check scripts |
| LLM API Key | n/a | Yes | At least one provider (Anthropic, OpenAI, MiniMax, DeepSeek…) |
| RAM | ≥ 4 GB | Yes | 8 GB+ recommended when running frontend + backend |
| Disk | ≥ 1 GB free | Yes | For .venv, node_modules, and runtime data |
Supported LLM Providers: Anthropic, OpenAI, DeepSeek, MiniMax, SiliconFlow, Zhipu (GLM), DashScope (Qwen), Moonshot (Kimi), Google Gemini, Groq, OpenRouter, Ollama (local).
git clone https://github.com/gzhzk/nanodeer
cd nanodeer
cp .env.example .env
# Edit .env with your API key
pip install -e .

# Start backend API + frontend dev server
./scripts/dev.sh
# Frontend: http://127.0.0.1:20265
# Backend: http://127.0.0.1:20266

# Run Python tests and frontend lint when dependencies are installed
python -m pip install -e '.[dev]'
./scripts/check.sh
# Run a focused Python test file
./scripts/check.sh tests/test_agent/test_react.py

For manual debugging:
# Terminal 1: HTTP API server
.venv/bin/python -m nanodeer.cli.api
# Terminal 2: frontend
cd frontend
npm run dev
# Optional CLI REPL
nanodeer-repl

cd frontend
npm install
# Pre-build CSS (required once, re-run when changing src/app/globals.css)
npm run build:css
# Start dev server
npm run dev
# Opens at http://127.0.0.1:20265

The frontend proxies /api/* to the backend at http://127.0.0.1:20266.
Edit config.yaml to configure:
- LLM provider (MiniMax, Anthropic, OpenAI, SiliconFlow, etc.)
- Sandbox settings (Docker image, network mode)
- Thread storage paths
At the end of last year I started working on agent-related projects. My understanding was rough: just AI doing things for you. In early March my mentor mentioned, "Harness engineering is getting popular lately, maybe look into it." So I started searching for materials and picked up Claude Code along the way.
By late March, DeerFlow came onto my radar. ByteDance's open-source project showed me for the first time what a proper enterprise-grade Agent harness framework should look like: state machine, middleware chain, sandbox isolation, tiered memory, every piece in its right place.
The story might have ended there. But on the last evening of March, I attended ByteDance's campus recruiting talk. One thing that stuck with me was their motto: "Work with great people on challenging things." During the talk, a message flashed across my phone screen: Claude Code went open source. Something clicked in that moment. DeerFlow showed me what a framework should look like. Claude Code showed me what a product could feel like. With OpenClaw trending in China, everything suddenly connected. That night, back in my dorm, I wrote down the first draft.
The core idea: distill the patterns that work (native ReAct loop, Docker sandbox isolation, tiered memory, inline orchestration) into a focused, auditable foundation where every module has one job and concerns are handled inline.
NanoDeer is a lightweight Agent harness built from scratch. What makes it different from LangGraph, CrewAI, and AutoGen:
No graph compilation, no nodes, no edges. Just a pure while True async loop with inline orchestration:
ContextManager.load() → SandboxManager.acquire() → LLM.ainvoke()
→ Clarification check → [Tool loop + bash audit] → Checkpoint → loop or end
This is not a simplification for its own sake. It means you can read the entire execution path in one file (react.py), debug with standard Python tooling, and understand control flow without learning a graph DSL. No hidden state, no opaque serialization, no framework lock-in.
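The loop shape described above can be sketched in a few lines. This is a minimal illustration, not the code in react.py: `run_loop`, `Reply`, and `MAX_TURNS` are hypothetical names, the LLM is any coroutine that returns a `Reply`, and the turn limit of 8 is an assumed value.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Reply:
    """Hypothetical LLM reply: text plus zero or more (tool_name, args) calls."""
    text: str
    tool_calls: list = field(default_factory=list)


MAX_TURNS = 8  # assumed max-turn guard; the real limit is not stated in this README


async def run_loop(llm, tools, messages):
    """Plain `while True` ReAct loop: call the LLM, run tools, repeat until done."""
    turns = 0
    while True:
        turns += 1
        reply = await llm(messages)
        messages.append(("ai", reply.text))

        # Clarification check: hand control back to the caller (WAIT).
        if "[CLARIFICATION]" in reply.text:
            return "WAIT"

        # No tool calls means the model produced its final answer (END).
        if not reply.tool_calls:
            return "END"

        # Tool calls are executed one at a time, not batched.
        for name, args in reply.tool_calls:
            try:
                result = await tools[name](**args)
            except Exception as exc:  # surface tool errors to the model
                result = f"error: {exc}"
            messages.append(("tool", str(result)))

        # Max-turn guard: stop instead of spinning forever.
        if turns >= MAX_TURNS:
            messages.append(("ai", "Stopping: turn budget exhausted."))
            return "END"
```

Because the whole control flow is one function, a breakpoint inside `run_loop` shows every branch the agent can take.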
Most Agent frameworks route middleware as pre/post hooks around the LLM call. NanoDeer has no middleware chain β all cross-cutting concerns are inline functions or standalone Managers:
| Mechanism | Implementation |
|---|---|
| WAIT | _check_clarification() inline checks [CLARIFICATION] tag, sets next_action = WAIT |
| Context loading | ContextManager.load() parallel-executes: mkdir, memory load, plan load, upload processing |
| Sandbox lifecycle | SandboxManager.acquire()/release() idempotent container lifecycle management |
| Bash audit | _bash_safe() inline regex, blocks dangerous patterns |
| LLM retry | _call_with_retry() exponential backoff for 429/5xx/timeout |
| Loop convergence | repeated identical tool calls and max-turn guard synthesize a final answer instead of spinning forever |
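As an illustration of the LLM retry row, here is a minimal sketch of exponential backoff around an async call. It is not the body of `_call_with_retry()`: `RetryableError`, the attempt count, and the delay values are assumptions made for the example.

```python
import asyncio
import random


class RetryableError(Exception):
    """Hypothetical stand-in for a 429, 5xx, or timeout from the LLM provider."""


async def call_with_retry(fn, *, max_attempts=4, base_delay=0.5, max_delay=8.0):
    """Await fn(); on RetryableError, back off exponentially (with jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return await fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            # Delay doubles each attempt and is capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
```

The small random jitter is a common addition that keeps concurrent clients from retrying in lockstep; whether NanoDeer uses jitter is not stated here.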
NanoDeer exposes a FastAPI server with Server-Sent Events for real-time streaming. The frontend (assistant-ui) connects via standard HTTP SSE β no custom protocols, no process management.
Browser (assistant-ui) ←→ HTTP SSE ←→ api.py ←→ NanoEngine ←→ ReActExecutor
This means:
- Frontend can be any HTTP client β browser, curl, Postman
- Standard SSE protocol, no custom transport
- Independent deployment: API server can run as a service
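To make "standard SSE" concrete, the sketch below serializes and parses Server-Sent Event frames with only the standard library. The event names (`token`, `done`) and payload fields are hypothetical; the actual event schema of /api/chat is not described in this README.

```python
import json


def sse_event(event: str, data: dict) -> str:
    """Serialize one Server-Sent Event frame: event line, data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def parse_sse(stream: str):
    """Parse SSE text back into (event, data) pairs; handles single-line data only."""
    events = []
    for frame in stream.split("\n\n"):
        event, data = "message", None  # "message" is the SSE default event type
        for line in frame.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        if data is not None:
            events.append((event, data))
    return events
```

Since each frame is plain text terminated by a blank line, any HTTP client that can read a response body incrementally can consume the stream.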
Three design layers, not one:
| Layer | File | Role |
|---|---|---|
| Tool Routing | sandbox/tools.py | SandboxExecTool wraps 9 tools at factory assembly, routes to Docker or Local transparently |
| Path Translation | sandbox/path.py | Virtual /mnt/user-data/... → physical {base_path}/{exec_id}/user-data/..., traversal-protected |
| Security Audit | react.py | _bash_safe() inline regex audits commands, blocks dangerous patterns |
For glob and grep, paths are validated/transformed as paths while patterns are base64-encoded. This keeps Docker and Local fallback behavior aligned for /mnt/user-data/....
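A minimal sketch of the path translation and pattern encoding described above, assuming the mapping shown in the table. `to_physical` and `encode_pattern` are hypothetical names, not the functions in sandbox/path.py, and the real traversal checks may differ.

```python
import base64
from pathlib import Path, PurePosixPath

VIRTUAL_ROOT = PurePosixPath("/mnt/user-data")


def to_physical(virtual: str, base_path: str, exec_id: str) -> Path:
    """Map /mnt/user-data/... to {base_path}/{exec_id}/user-data/..., rejecting escapes."""
    try:
        rel = PurePosixPath(virtual).relative_to(VIRTUAL_ROOT)
    except ValueError:
        raise PermissionError(f"outside virtual root: {virtual}") from None
    root = (Path(base_path) / exec_id / "user-data").resolve()
    target = root.joinpath(*rel.parts).resolve()
    # After resolving "..", the target must still live under the sandbox root.
    if target != root and root not in target.parents:
        raise PermissionError(f"path traversal blocked: {virtual}")
    return target


def encode_pattern(pattern: str) -> str:
    """Base64-encode a glob/grep pattern so it survives shell quoting unchanged."""
    return base64.b64encode(pattern.encode("utf-8")).decode("ascii")
```

Treating the path and the pattern differently matters: the path must be rewritten for the host or container, while the pattern must arrive byte-for-byte identical in both.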
┌──────────────────────────────────────────────────────────────────────────┐
│ Layer 5: HTTP API - FastAPI + SSE                                        │
│   api.py  → /api/chat (SSE), /api/chat/cancel, /api/conversations        │
│   repl.py → Async CLI REPL for debugging                                 │
└──────────────────────────────────────────────────────────────────────────┘
        │ calls engine.run_streaming()
        ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ Layer 4: NanoEngine - Application Entry                                  │
│   engine.py → creates ThreadState, calls executor                        │
│   App-layer compression lives here, not in middleware                    │
└──────────────────────────────────────────────────────────────────────────┘
        │ calls executor.run_streaming()
        ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ Layer 3: Execution Core                                                  │
│   react.py           → Native async ReAct loop                           │
│   context.py         → ContextManager                                    │
│   sandbox_manager.py → Sandbox lifecycle                                 │
└──────────────────────────────────────────────────────────────────────────┘
        │ invokes tools through the execution loop
        ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ Layer 2: Capabilities                                                    │
│   tools/    → Built-in tools and execution surfaces                      │
│   prompt.py → Prompt construction                                        │
│   subagent/ → SubagentCoordinator                                        │
└──────────────────────────────────────────────────────────────────────────┘
        │ tools.invoke()
        ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ Layer 1: Persistence / Isolation / Data                                  │
│   sandbox/    → DockerSandboxProvider, Local fallback, path translation  │
│   memory/     → File-based MemoryStore (3 tiers)                         │
│   checkpoint/ → SqliteCheckpointer for session resume                    │
└──────────────────────────────────────────────────────────────────────────┘
User Input (HTTP / CLI REPL / Web UI)
    ↓
api.py receives HTTP POST /api/chat, calls NanoEngine
    ↓
NanoEngine.run_streaming() → ReActExecutor.run()
    ↓
┌─ ContextManager.load() (parallel I/O) ──────────────────────────────────────┐
│ _ensure_dirs()    Creates {thread_id}/user-data/{workspace,uploads,outputs} │
│ _load_memory()    MemoryLayers.inject() → L1-L4 layered memory              │
│ _load_plan()      Loads plans and step progress into context                │
│ _process_uploads  Writes uploaded files to uploads/                         │
└─────────────────────────────────────────────────────────────────────────────┘
    ↓
┌─ SandboxManager.acquire() (idempotent) ─────────────────────────────────────┐
│ Checks state.sandbox → reuses or acquires fresh container                   │
└─────────────────────────────────────────────────────────────────────────────┘
    ↓
LLM.ainvoke(prompt + messages), with _call_with_retry() on 429/5xx/timeout
    ↓
┌─ _check_clarification() (inline) ───────────────────────────────────────────┐
│ Detects [CLARIFICATION] tag → sets WAIT → return to user                    │
└─────────────────────────────────────────────────────────────────────────────┘
    ↓
[no tool_calls? → END → checkpoint + absorb → break]
    ↓
for each tool_call (individually, not batched):
  ┌─ _bash_safe() (inline audit) ─────────────────────────────────────────────┐
  │ Hard blocks: shell metachar, rm -rf /, curl|bash                          │
  │ Warns on: pip install, chmod 777                                          │
  └───────────────────────────────────────────────────────────────────────────┘
    ↓
  tool.ainvoke(args) → SandboxExecTool routes to Docker or Local
    ↓ (try/except catches ValidationError + generic errors)
┌─ Persistence ───────────────────────────────────────────────────────────────┐
│ Checkpointer.save()     → SQLite (messages + thread metadata)               │
│ ContextManager.absorb() → episodic log (auto-appended)                      │
└─────────────────────────────────────────────────────────────────────────────┘
    ↓
PROCESS → next turn        END → SandboxManager.release() + break
Key design decisions visible in this flow:
- No middleware chain: all cross-cutting concerns are inline functions or standalone Managers
- Sandbox release is END-only: `PROCESS` keeps the container alive across turns
- SandboxManager.acquire() is idempotent: checks `state.sandbox` before acquiring
- `save_memory` is not in SANDBOX_TOOL_CONFIGS: runs on host naturally, no interception needed
- Checkpoint stores only messages + thread metadata: system_prompt/sandbox/next_action reconstructed at runtime
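The checkpoint decision can be illustrated with the standard-library sqlite3 module. This is a sketch under assumptions: the table layout and the function names are hypothetical, not the schema used by SqliteCheckpointer.

```python
import json
import sqlite3


def open_checkpoints(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the checkpoint database with a single snapshot table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "thread_id TEXT PRIMARY KEY, title TEXT, messages TEXT NOT NULL)"
    )
    return conn


def save_checkpoint(conn, thread_id, title, messages):
    """Upsert the snapshot; prompt, sandbox, and next_action are deliberately not stored."""
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints (thread_id, title, messages) VALUES (?, ?, ?)",
        (thread_id, title, json.dumps(messages)),
    )
    conn.commit()


def load_checkpoint(conn, thread_id):
    """Return the stored snapshot, or None if the thread was never checkpointed."""
    row = conn.execute(
        "SELECT title, messages FROM checkpoints WHERE thread_id = ?", (thread_id,)
    ).fetchone()
    if row is None:
        return None
    # Runtime-only fields are rebuilt fresh on resume rather than read from disk.
    return {"title": row[0], "messages": json.loads(row[1]),
            "sandbox": None, "next_action": "PROCESS"}
```

Keeping runtime-only fields out of the row means a resumed thread can never pick up a stale container handle.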
All runtime data lives under ~/.nanodeer/. The Harness and App layers maintain separate subtrees.
~/.nanodeer/
├── memory/  # Agent-maintained knowledge
│   ├── USER.md  # User preferences and context (LLM writes)
│   ├── MEMORY.md  # Legacy flat-file memory (LLM writes)
│   ├── wiki/entries/  # Structured wiki entries (JSON, tagged)
│   └── episodic/  # Session logs (auto-appended, daily files)
│
├── plans/
│   ├── {plan_id}.json  # Full Plan document (goal, steps, status)
│   └── index.json  # Plan index for fast listing
│
├── threads/
│   ├── threads.db  # SQLite: ThreadState snapshots (resumable)
│   └── {thread_id}/  # Per-thread sandbox (ephemeral)
│       └── user-data/  # Volume-mounted to container /mnt/user-data/
│           ├── workspace/
│           ├── uploads/
│           └── outputs/
│
└── conversations/
    └── {thread_id}.json  # Metadata index (thread_id + title, no messages)
| Path | Persists | Purpose |
|---|---|---|
| ~/.nanodeer/memory/ | Yes | Agent knowledge (USER/MEMORY/wiki/episodic) |
| ~/.nanodeer/plans/ | Yes | Plans with embedded steps |
| ~/.nanodeer/threads/{id}/ | No (ephemeral) | Sandbox working directory |
| ~/.nanodeer/threads/threads.db | Yes | SQLite session snapshots (resumable) |
| ~/.nanodeer/conversations/ | Yes | Web UI session index (thread_id + metadata) |
NanoDeer uses two data carriers with distinct lifetimes:
TurnSignals (ephemeral, fresh each turn):
| Signal | Written by | Read by | Effect |
|---|---|---|---|
| clarification_question | react.py _check_clarification() | App layer | Display question to user, WAIT |
| memory_context | MemoryLayers.inject() via ContextManager | Prompt builder | Inject memory into LLM context |
| plan_context | ContextManager._load_plan() | Prompt builder | Inject plan + step progress into LLM context |
| uploaded_files_list | ContextManager._scan_uploads() | Prompt builder | Inject uploaded file info |

ThreadState (persistent across turns):
| Field | Role |
|---|---|
| messages | Full conversation history (Human/AI/Tool) |
| next_action | PROCESS → continue loop; WAIT → return to caller; END → terminate |
| title | Conversation title (for UI listing) |
| sandbox | Container state (container_id, status; runtime only, not persisted) |
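The two carriers could be modeled as plain dataclasses. The sketch below uses the field names from the tables above, but the types and defaults are assumptions; the actual definitions live in state.py and may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class NextAction(str, Enum):
    PROCESS = "PROCESS"  # continue the loop
    WAIT = "WAIT"        # return to the caller (e.g. clarification needed)
    END = "END"          # terminate


@dataclass
class TurnSignals:
    """Ephemeral: a fresh instance is created at the start of every turn."""
    clarification_question: Optional[str] = None
    memory_context: str = ""
    plan_context: str = ""
    uploaded_files_list: list = field(default_factory=list)


@dataclass
class ThreadState:
    """Persistent across turns; sandbox is runtime only and never checkpointed."""
    messages: list = field(default_factory=list)
    next_action: NextAction = NextAction.PROCESS
    title: str = ""
    sandbox: Optional[dict] = None  # e.g. {"container_id": ..., "status": ...}
```

Creating a new `TurnSignals` per turn means nothing from a previous turn can leak into the next prompt by accident.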
- One-way dependency: Agent → Harness. Harness has no knowledge of Agent's business logic.
- No middleware chain: All cross-cutting concerns are inline functions or standalone Managers. Zero indirection.
- Inline error handling: `_call_with_retry()` for LLM calls, try/except for tool execution.
- Compression is app-layer: Timing decided by NanoEngine, not auto-triggered in the ReAct loop.
- Prompt auto-detection: Sections render only when data is present AND feature flag is True.
- Sandbox + Host dual paths: Sensitive ops through containers, `save_memory`/plan tools directly on host.
- Native ReAct loop: No LangGraph dependency. A direct `while True` loop with retry, clarification, tool execution, and convergence guards instead of a graph compiler.
- Hybrid persistence: Memory/plan uses files (inspectable, auditable). Checkpoint uses SQLite (efficient queries).
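The prompt auto-detection rule (render a section only when data is present and its flag is on) can be sketched as follows. `build_prompt` and the section names are hypothetical and do not reflect the interface of prompt.py.

```python
def build_prompt(static: str, sections: dict, flags: dict) -> str:
    """Append a dynamic section only if its flag is on and its content is non-empty."""
    parts = [static]
    for name, content in sections.items():
        if flags.get(name, False) and content:
            parts.append(f"## {name}\n{content}")
    return "\n\n".join(parts)


prompt = build_prompt(
    "You are an agent.",
    {"memory": "likes tea", "plan": "", "uploads": "a.png"},
    {"memory": True, "plan": True, "uploads": False},
)
```

Here only the memory section is rendered: the plan section is enabled but empty, and the uploads section has data but its flag is off.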
| Tool | Category | Sandbox |
|---|---|---|
| read_file, write_file, ls, glob, grep, edit_file | File | ✅ Docker/Local |
| bash, git, exec_python | Shell | ✅ Docker/Local |
| web_search, web_fetch, read_image | External / uploads | ❌ Host |
| save_memory, search_memory | Memory | ❌ Host |
| create_plan, add_step, update_step, list_plans | Plan | ❌ Host (direct write) |
| spawn_subagent, get_subagent_results | Subagent | ✅ Own sandbox per worker |
| invoke_skill | Skills | ❌ Host |
Current (v0.1.0), core framework stable:
- ✅ Native ReAct loop with inline orchestration
- ✅ Docker + Local sandbox with path isolation
- ✅ 20 built-in tools
- ✅ File-based memory/wiki and plan storage
- ✅ SQLite checkpoint persistence for conversation resume
- ✅ HTTP SSE API (FastAPI) + conversation management endpoints
- ✅ Image upload bridge through the frontend/API into `read_image`
- ✅ CLI REPL
- ✅ SubagentCoordinator with constrained read-only workers
- ✅ Skill workflow loader
- ✅ assistant-ui frontend (Next.js + assistant-ui), including Projects/Plans/Memory/Wiki sidebar summary
- ✅ Structured trace events and deterministic smoke benchmark suite
In progress / planned:
| Area | Status |
|---|---|
| Frontend polish and richer workspace views | In progress |
| Plan/Memory/Wiki detail pages wired to backend APIs | In progress |
| Inline: guardrail, timeout, fallback | Planned |
| Inline: dangling tool call injection | Planned |
| Broader benchmark task sets beyond smoke | Planned |
| Long-horizon task loop | Planned |
| └─ Focus (focus-driven context injection) | Planned |
| └─ TurnBudget (turn/duration budget) | Planned |
| └─ Learning (error analysis + lesson extraction) | Planned |
| └─ Reflection (session-end reflection) | Planned |
| └─ Plan-Memory bridge (step self-judgment → wiki) | Planned |
| IM bot integration (Feishu/WeCom) | Planned |
| Evaluation framework | Planned |
| Multi-model comparison benchmarks | Planned |
| Source | What it taught me |
|---|---|
| DeerFlow | Middleware chain + state machine; next_action signal routing |
| Claude Code | Tool-first design, clarification-driven pauses via <clarification> tags |
| OpenClaw | Layered memory (L1-L4); wiki-structured knowledge curated by the LLM |
| NanoClaw | Docker sandbox isolation; per-thread containers, volume mounts, path translation |
To my family, for their silent support and endless patience, which made this possible.
To my mentor, for opening the door to Agent and Harness Engineering, and encouraging me to explore.
Claude Code: my best coding companion, supercharging my AI workflow, and showing me that a product can be both powerful and elegant.
DeerFlow: for showing me what an enterprise-grade Agent framework truly looks like.
OpenClaw: for the layered memory and IM channel inspiration.
NanoClaw: for the Docker sandbox isolation pattern.
assistant-ui: for the beautiful and extensible React chat UI that powers the frontend.
DeepSeek: for providing the deepseek-v4-flash model with exceptional inference efficiency.
MiniMax: for providing the MiniMax-M2.7 model service that powers this project.
Andrej Karpathy: for the LLM wiki concept that inspired the wiki memory system, which lets the LLM curate its own structured knowledge base.
This project is open source and available under the MIT License.