gzhzk/nanodeer


NanoDeer

🚀 A 5-Layer AI Agent Harness Built from Scratch

MIT License Python 3.13 FastAPI Docker Version 0.1.0

Native ReAct · ContextManager/SandboxManager · Sandbox Isolation · HTTP SSE API

Architecture is what you build. Engineering is how you build it.

English | 中文


NanoDeer is a compact agent harness with a native async ReAct loop, explicit runtime managers, sandbox-aware tool routing, file-based memory/plan storage, SQLite checkpoint resume, structured trace events, and a Next.js assistant-ui frontend. It intentionally avoids LangGraph and middleware chains: the product path is HTTP/UI -> NanoEngine -> ReActExecutor -> tools/sandbox -> memory/plan/checkpoint.

Current product surface:

  • Streaming chat over HTTP SSE with conversation list, rename/archive/delete, and resume.
  • Docker-first sandbox execution with Local fallback and virtual /mnt/user-data path translation.
  • Host-side memory, wiki, and plan tools backed by inspectable files.
  • Image upload bridge from frontend to API to read_image.
  • Deterministic smoke benchmarks plus trace contracts for regression checks.


Project Structure

nanodeer/
├── pyproject.toml           # Build config, entry points, dependencies
├── config.yaml              # Runtime config (LLM, sandbox, memory, thread…)
├── config.yaml.example      # Template: copy to config.yaml and edit
├── .env.example             # Template for API keys: copy to .env and fill in
├── .gitignore               # Git ignore rules
├── LICENSE                  # MIT License
├── AGENTS.md                # Agent workflow documentation
├── README.md                # This file (English)
├── README_zh.md             # Chinese documentation
│
├── scripts/
│   ├── dev.sh               # One-command launch: backend + frontend
│   └── check.sh             # Run tests + lint
│
├── src/nanodeer/            # Backend source (Python)
│   ├── cli/
│   │   ├── api.py           # Layer 5: FastAPI + SSE HTTP server
│   │   └── repl.py          # Layer 5: Debug REPL
│   ├── engine.py            # Layer 4: NanoEngine, the application scheduler
│   ├── agent/
│   │   ├── factory.py       # Layer 3-4 bridge: NanoDeerFactory assembler
│   │   ├── react.py         # Layer 3: ReActExecutor, the main loop (core)
│   │   ├── state.py         # ThreadState / TurnSignals data models
│   │   ├── context.py       # Layer 3: ContextManager, context assembly
│   │   ├── prompt.py        # Layer 2: Static+dynamic dual-layer prompt builder
│   │   ├── sandbox_manager.py # Layer 3: Sandbox lifecycle manager
│   │   ├── compression.py   # Layer 4½: Conversation compression
│   │   ├── trace.py         # Runtime observability (structured events)
│   │   ├── checkpoint/      # Layer 1: SQLite session persistence
│   │   └── memory/          # Layer 1: File-based layered memory (L1-L4)
│   ├── sandbox/
│   │   ├── __init__.py      # SandboxProvider ABC + module-level context
│   │   ├── docker.py        # Docker sandbox provider
│   │   ├── local.py         # Local subprocess fallback
│   │   ├── path.py          # Virtual→physical path translation + security
│   │   └── tools.py         # SandboxExecTool: routes tools into container
│   ├── tools/               # Built-in tool definitions (20 tools)
│   ├── subagent/            # Semaphore-based subagent coordinator
│   ├── plan/                # File-based JSON plan storage
│   ├── skills/              # .md skill loading system
│   └── config.py            # Pydantic config model + global singleton
│
├── frontend/                # Web UI (Next.js + assistant-ui)
│   ├── app/                 # Next.js App Router pages
│   ├── components/          # React components (chat, sidebar, settings)
│   ├── lib/                 # Frontend utilities and API client
│   ├── hooks/               # Custom React hooks
│   ├── package.json         # Node dependencies
│   ├── next.config.ts       # Next.js configuration
│   ├── tsconfig.json        # TypeScript configuration
│   ├── biome.json           # Linter/formatter config
│   ├── postcss.config.mjs   # PostCSS configuration
│   ├── components.json      # shadcn/ui component registry
│   └── .env.example         # Frontend environment template
│
├── sandbox/                 # Docker sandbox image build
│   ├── Dockerfile           # Minimal Python 3.11 sandbox image
│   ├── build.sh             # Image build script
│   └── README.md            # Sandbox setup guide (Chinese)
│
├── tests/                   # Python test suite
│   ├── conftest.py          # Shared pytest fixtures
│   ├── test_agent/          # ReAct executor & state tests
│   ├── test_agent_memory/   # Memory system tests
│   ├── test_cli/            # API endpoint & REPL tests
│   ├── test_integration/    # End-to-end integration tests
│   ├── test_plan/           # Plan storage tests
│   ├── test_sandbox/        # Sandbox provider tests
│   ├── test_skills/         # Skill loader tests
│   ├── test_subagents/      # Subagent coordinator tests
│   ├── test_benchmarks/     # Benchmark task tests
│   └── test_tools_integration/ # Tool execution integration tests
│
├── benchmarks/              # Performance benchmarks
│   ├── runner.py            # Benchmark runner
│   ├── tasks/smoke.yaml     # Smoke test task definitions
│   ├── judges.py            # LLM-as-judge evaluation
│   ├── reporters/           # Output reporters (JSON, etc.)
│   └── fixtures/            # Benchmark data fixtures
│
├── docs/                    # Design documentation (Chinese)
│   ├── nanodeer_blueprint_20260401.md  # Project blueprint
│   ├── runtime_architecture.md        # Runtime architecture
│   ├── harness_architecture.md        # Harness architecture
│   ├── memory_design.md               # Memory system design
│   ├── sandbox_design.md              # Sandbox design
│   ├── subagent_design.md             # Subagent design
│   ├── plan_design.md                 # Plan system design
│   ├── tools_design.md                # Tools design
│   ├── skills_design.md               # Skills design
│   ├── prompt_design.md               # Prompt engineering design
│   ├── observability_design.md        # Observability & tracing
│   ├── evaluation_plan.md             # Evaluation plan
│   ├── long_horizon_design.md         # Long-horizon task design
│   ├── refactoring_journey.md         # Refactoring journey notes
│   └── ref/                           # Reference architecture reports
│
├── examples/                # Usage examples (coming soon)
│
├── .agents/                 # Agent orchestration configs (internal)
├── .codex/                  # Codex metadata (internal)
└── .claude/                 # Claude Code project settings (internal)

Quick Start

Environment Requirements

| Dependency | Version | Required | Notes |
|---|---|---|---|
| OS | Linux / macOS | ✅ | WSL2 recommended on Windows |
| Python | ≥ 3.10 | ✅ | 3.11+ preferred; sandbox Docker image uses 3.11 |
| Node.js | ≥ 18 | ⚠️ | Only needed for frontend development |
| npm | (comes w/ Node) | ⚠️ | Frontend dependency management |
| Docker | ≥ 24.0 | ⚠️ | Required for sandbox isolation; Local fallback works without |
| curl | any | ⚠️ | Required by dev/check scripts |
| LLM API Key | - | ✅ | At least one provider (Anthropic, OpenAI, MiniMax, DeepSeek…) |
| RAM | ≥ 4 GB | - | 8 GB+ recommended when running frontend + backend |
| Disk | ≥ 1 GB free | - | For .venv, node_modules, and runtime data |

✅ Required   ⚠️ Optional (missing features degrade gracefully)   - Informational

Supported LLM Providers: Anthropic, OpenAI, DeepSeek, MiniMax, SiliconFlow, Zhipu (GLM), DashScope (Qwen), Moonshot (Kimi), Google Gemini, Groq, OpenRouter, Ollama (local).

Install

git clone https://github.com/gzhzk/nanodeer
cd nanodeer

cp .env.example .env
# Edit .env with your API key

pip install -e .

Run

# Start backend API + frontend dev server
./scripts/dev.sh
# Frontend: http://127.0.0.1:20265
# Backend:  http://127.0.0.1:20266

Check

# Run Python tests and frontend lint when dependencies are installed
python -m pip install -e '.[dev]'
./scripts/check.sh

# Run a focused Python test file
./scripts/check.sh tests/test_agent/test_react.py

For manual debugging:

# Terminal 1: HTTP API server
.venv/bin/python -m nanodeer.cli.api

# Terminal 2: frontend
cd frontend
npm run dev

# Optional CLI REPL
nanodeer-repl

Frontend

cd frontend
npm install

# Pre-build CSS (required once, re-run when changing src/app/globals.css)
npm run build:css

# Start dev server
npm run dev
# Opens at http://127.0.0.1:20265

The frontend proxies /api/* to the backend at http://127.0.0.1:20266.

Configuration

Edit config.yaml to configure:

  • LLM provider (MiniMax, Anthropic, OpenAI, SiliconFlow, etc.)
  • Sandbox settings (Docker image, network mode)
  • Thread storage paths

Background

At the end of last year I started working on agent-related projects. My understanding was rough: just AI doing things for you. In early March my mentor mentioned that "harness engineering is getting popular lately, maybe look into it." So I started searching for materials and picked up Claude Code along the way.

By late March, DeerFlow came onto my radar. ByteDance's open-source project showed me for the first time what a proper enterprise-grade Agent harness framework should look like: state machine, middleware chain, sandbox isolation, tiered memory, every piece in its right place.

The story might have ended there. But on the last evening of March, I attended ByteDance's campus recruiting talk. One thing that stuck with me was their motto: "Work with great people on challenging things." During the talk, a message flashed across my phone screen: Claude Code went open source. Something clicked in that moment. DeerFlow showed me what a framework should look like. Claude Code showed me what a product could feel like. With OpenClaw trending in China, everything suddenly connected. That night, back in my dorm, I wrote down the first draft.

The core idea: distill the patterns that work (native ReAct loop, Docker sandbox isolation, tiered memory, inline orchestration) into a focused, auditable foundation where every module has one job and concerns are handled inline.


Key Differentiators

NanoDeer is a lightweight Agent harness built from scratch. What makes it different from LangGraph, CrewAI, and AutoGen:

1. No LangGraph: Native ReAct Loop

No graph compilation, no nodes, no edges. Just a pure while True async loop with inline orchestration:

ContextManager.load() → SandboxManager.acquire() → LLM.ainvoke()
→ Clarification check → [Tool loop + bash audit] → Checkpoint → loop or end

This is not a simplification for its own sake. It means you can read the entire execution path in one file (react.py), debug with standard Python tooling, and understand control flow without learning a graph DSL. No hidden state, no opaque serialization, no framework lock-in.
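As a rough illustration of the loop shape described above, here is a minimal sketch. It is not the code in react.py: `State`, `fake_llm`, `echo`, and `TOOLS` are hypothetical stand-ins, and the real executor also handles context loading, sandbox acquisition, clarification, and checkpointing.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class State:
    # Hypothetical, trimmed-down stand-in for ThreadState.
    messages: list = field(default_factory=list)
    next_action: str = "PROCESS"


async def fake_llm(messages):
    # Hypothetical LLM stub: ask for one tool call, then give a final answer.
    if not any(m["role"] == "tool" for m in messages):
        call = {"name": "echo", "args": {"text": "hi"}}
        return {"role": "ai", "content": "", "tool_calls": [call]}
    return {"role": "ai", "content": "done", "tool_calls": []}


async def echo(text):
    return text


TOOLS = {"echo": echo}


async def run(state, max_turns=8):
    turns = 0
    while True:
        turns += 1
        reply = await fake_llm(state.messages)
        state.messages.append(reply)
        # No tool calls (or the max-turn guard fired): end the loop.
        if not reply["tool_calls"] or turns >= max_turns:
            state.next_action = "END"
            break
        # Tool calls are executed one at a time, not batched.
        for call in reply["tool_calls"]:
            result = await TOOLS[call["name"]](**call["args"])
            state.messages.append({"role": "tool", "content": result})
    return state
```

Because the whole control flow is one function, a breakpoint inside `run` shows every decision the loop makes.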

2. Inline Orchestration + WAIT Interception

Most agent frameworks implement cross-cutting concerns as middleware: pre/post hooks wrapped around the LLM call. NanoDeer has no middleware chain. All cross-cutting concerns are inline functions or standalone Managers:

| Mechanism | Implementation |
|---|---|
| WAIT | _check_clarification() inline checks [CLARIFICATION] tag, sets next_action = WAIT |
| Context loading | ContextManager.load() parallel-executes: mkdir, memory load, plan load, upload processing |
| Sandbox lifecycle | SandboxManager.acquire()/release() idempotent container lifecycle management |
| Bash audit | _bash_safe() inline regex, blocks dangerous patterns |
| LLM retry | _call_with_retry() exponential backoff for 429/5xx/timeout |
| Loop convergence | repeated identical tool calls and max-turn guard synthesize a final answer instead of spinning forever |
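The LLM retry row can be pictured with a minimal sketch of exponential backoff. The attempt count, base delay, jitter, and the `TransientLLMError` class are assumptions for illustration, not the actual behavior of _call_with_retry().

```python
import asyncio
import random


class TransientLLMError(Exception):
    """Hypothetical error standing in for 429 / 5xx / timeout failures."""


async def call_with_retry(fn, *, max_attempts=4, base_delay=0.5,
                          sleep=asyncio.sleep):
    """Await fn(), retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return await fn()
        except TransientLLMError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, ...
            # Up to 10% jitter so concurrent callers do not retry in lockstep.
            await sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter exists only so the backoff schedule can be checked without real waiting.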

3. HTTP SSE API

NanoDeer exposes a FastAPI server with Server-Sent Events for real-time streaming. The frontend (assistant-ui) connects via standard HTTP SSE: no custom protocols, no process management.

Browser (assistant-ui)  ── HTTP SSE ──  api.py  ──  NanoEngine  ──  ReActExecutor

This means:

  • Frontend can be any HTTP client: browser, curl, Postman
  • Standard SSE protocol, no custom transport
  • Independent deployment: API server can run as a service
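Since the transport is plain SSE, a client only needs to split the response into `data:` lines and blank-line event boundaries. The sketch below parses such a stream. The JSON payload shape (`type`, `text`) is an assumption, because the event schema of /api/chat is not documented here.

```python
import json


def iter_sse_events(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive ping.
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # The SSE format strips one optional leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
```

Any HTTP client that can iterate over response lines can feed this generator.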

4. Dual-Layer Sandbox Architecture

Three design layers, not one:

| Layer | File | Role |
|---|---|---|
| Tool Routing | sandbox/tools.py | SandboxExecTool wraps 9 tools at factory assembly, routes to Docker or Local transparently |
| Path Translation | sandbox/path.py | Virtual /mnt/user-data/... ↔ physical {base_path}/{exec_id}/user-data/..., traversal-protected |
| Security Audit | react.py | _bash_safe() inline regex audits commands, blocks dangerous patterns |

For glob and grep, paths are validated/transformed as paths while patterns are base64-encoded. This keeps Docker and Local fallback behavior aligned for /mnt/user-data/....
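The path translation and pattern encoding described above can be sketched as follows. This is an illustration, not the code in sandbox/path.py: `to_physical` and `encode_pattern` are hypothetical names, and the sketch checks lexical traversal only (it does not resolve symlinks).

```python
import base64
import posixpath

VIRTUAL_ROOT = "/mnt/user-data"


def to_physical(virtual_path: str, base_path: str, exec_id: str) -> str:
    """Map /mnt/user-data/... to {base_path}/{exec_id}/user-data/..."""
    normalized = posixpath.normpath(virtual_path)  # collapses "..", "." and "//"
    inside = normalized == VIRTUAL_ROOT or normalized.startswith(VIRTUAL_ROOT + "/")
    if not inside:
        raise ValueError(f"path outside {VIRTUAL_ROOT}: {virtual_path!r}")
    root = posixpath.join(base_path, exec_id, "user-data")
    relative = normalized[len(VIRTUAL_ROOT):].lstrip("/")
    return posixpath.join(root, relative) if relative else root


def encode_pattern(pattern: str) -> str:
    """Base64-encode a glob/grep pattern so it survives shell quoting intact."""
    return base64.b64encode(pattern.encode("utf-8")).decode("ascii")
```

Normalizing before the prefix check is what rejects inputs such as `/mnt/user-data/../../etc/passwd`.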


Architecture

5-Layer Overview

    ┌──────────────────────────────────────────────────────────────────────────┐
    │ Layer 5: HTTP API - FastAPI + SSE                                        │
    │   api.py - /api/chat (SSE), /api/chat/cancel, /api/conversations         │
    │   repl.py - Async CLI REPL for debugging                                 │
    └──────────────────────────────────────────────────────────────────────────┘
                             │  calls engine.run_streaming()
                             ▼
    ┌──────────────────────────────────────────────────────────────────────────┐
    │ Layer 4: NanoEngine - Application Entry                                  │
    │   engine.py - creates ThreadState, calls executor                        │
    │   App-layer compression lives here, not in middleware                    │
    └──────────────────────────────────────────────────────────────────────────┘
                             │  calls executor.run_streaming()
                             ▼
    ┌──────────────────────────────────────────────────────────────────────────┐
    │ Layer 3: Execution Core                                                  │
    │   react.py   - Native async ReAct loop                                   │
    │   context.py - ContextManager                                            │
    │   sandbox_manager.py - Sandbox lifecycle                                 │
    └──────────────────────────────────────────────────────────────────────────┘
                             │  invokes tools through the execution loop
                             ▼
    ┌──────────────────────────────────────────────────────────────────────────┐
    │ Layer 2: Capabilities                                                    │
    │   tools/     - Built-in tools and execution surfaces                     │
    │   prompt.py  - Prompt construction                                       │
    │   subagent/  - SubagentCoordinator                                       │
    └──────────────────────────────────────────────────────────────────────────┘
                             │  tools.invoke()
                             ▼
    ┌──────────────────────────────────────────────────────────────────────────┐
    │ Layer 1: Persistence / Isolation / Data                                  │
    │   sandbox/    - DockerSandboxProvider, Local fallback, path translation  │
    │   memory/     - File-based MemoryStore (3 tiers)                         │
    │   checkpoint/ - SqliteCheckpointer for session resume                    │
    └──────────────────────────────────────────────────────────────────────────┘

Execution Flow

User Input (HTTP / CLI REPL / Web UI)
  ↓
api.py receives HTTP POST /api/chat, calls NanoEngine
  ↓
NanoEngine.run_streaming() → ReActExecutor.run()
  ↓
┌─ ContextManager.load() (parallel I/O) ─────────────────────────────────────────┐
│  _ensure_dirs()    Creates {thread_id}/user-data/{workspace,uploads,outputs}   │
│  _load_memory()    MemoryLayers.inject() - L1-L4 layered memory                │
│  _load_plan()      Loads plans and step progress into context                  │
│  _process_uploads  Writes uploaded files to uploads/                           │
└────────────────────────────────────────────────────────────────────────────────┘
  ↓
┌─ SandboxManager.acquire() (idempotent) ────────────────────────────────────────┐
│  Checks state.sandbox → reuses or acquires fresh container                     │
└────────────────────────────────────────────────────────────────────────────────┘
  ↓
LLM.ainvoke(prompt + messages)  ← with _call_with_retry() on 429/5xx/timeout
  ↓
┌─ _check_clarification() (inline) ──────────────────────────────────────────────┐
│  Detects [CLARIFICATION] tag → sets WAIT → return to user                      │
└────────────────────────────────────────────────────────────────────────────────┘
  ↓
[no tool_calls? → END → checkpoint + absorb → break]
  ↓
for each tool_call (individually, not batched):
  ┌─ _bash_safe() (inline audit) ────────────────────────────────────────────────┐
  │  Hard blocks: shell metachar, rm -rf /, curl|bash                            │
  │  Warns on: pip install, chmod 777                                            │
  └──────────────────────────────────────────────────────────────────────────────┘
  ↓
  tool.ainvoke(args)  ← SandboxExecTool routes to Docker or Local
  ↓  (try/except catches ValidationError + generic errors)
┌─ Persistence ──────────────────────────────────────────────────────────────────┐
│  Checkpointer.save()  → SQLite (messages + thread metadata)                    │
│  ContextManager.absorb() → episodic log (auto-appended)                        │
└────────────────────────────────────────────────────────────────────────────────┘
  ↓
PROCESS → next turn    END → SandboxManager.release() + break
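The _bash_safe() step in the flow above separates hard blocks from warnings. A minimal sketch of that two-tier regex audit follows. The patterns are illustrative assumptions covering only the examples named in the diagram (the shell metacharacter rule is omitted because its exact scope is not stated), and `audit` is a hypothetical name.

```python
import re

# Assumed patterns, for illustration only. The real rule set lives in react.py.
HARD_BLOCK = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),               # rm -rf / (root only)
    re.compile(r"\b(curl|wget)\b[^|]*\|\s*(ba)?sh\b"),  # curl ... | bash
]
WARN = [
    re.compile(r"\bpip3?\s+install\b"),
    re.compile(r"\bchmod\s+777\b"),
]


def audit(command: str) -> str:
    """Classify a shell command as 'block', 'warn', or 'allow'."""
    if any(p.search(command) for p in HARD_BLOCK):
        return "block"
    if any(p.search(command) for p in WARN):
        return "warn"
    return "allow"
```

Regex auditing like this is a first line of defense. The sandbox container remains the actual isolation boundary.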

Key design decisions visible in this flow:

  • No middleware chain: all cross-cutting concerns are inline functions or standalone Managers
  • Sandbox release is END-only: PROCESS keeps the container alive across turns
  • SandboxManager.acquire() is idempotent: it checks state.sandbox before acquiring
  • save_memory is not in SANDBOX_TOOL_CONFIGS: it runs on the host naturally, no interception needed
  • Checkpoint stores only messages + thread metadata: system_prompt/sandbox/next_action are reconstructed at runtime
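The idempotent acquire and END-only release can be sketched as below. The classes here are simplified, synchronous stand-ins written for illustration. The container id scheme and the `started` counter are assumptions, and no real Docker calls are made.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class SandboxInfo:
    container_id: str
    status: str = "running"


@dataclass
class ThreadState:
    sandbox: Optional[SandboxInfo] = None  # runtime only, never checkpointed


class SandboxManager:
    def __init__(self):
        self._ids = itertools.count(1)
        self.started = 0  # how many containers were actually created

    def acquire(self, state: ThreadState) -> SandboxInfo:
        # Idempotent: reuse the live container recorded on the state.
        if state.sandbox is not None and state.sandbox.status == "running":
            return state.sandbox
        self.started += 1
        state.sandbox = SandboxInfo(container_id=f"sandbox-{next(self._ids)}")
        return state.sandbox

    def release(self, state: ThreadState) -> None:
        # Called only when the loop reaches END; safe to call twice.
        if state.sandbox is None:
            return
        state.sandbox.status = "released"
        state.sandbox = None
```

Calling `acquire` at the top of every turn is then cheap: only the first call after a release creates a container.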

Storage Paths

All runtime data under ~/.nanodeer/. Harness and App layers maintain separate subtrees.

~/.nanodeer/
├── memory/                  # Agent-maintained knowledge
│   ├── USER.md              # User preferences and context (LLM writes)
│   ├── MEMORY.md            # Legacy flat-file memory (LLM writes)
│   ├── wiki/entries/        # Structured wiki entries (JSON, tagged)
│   └── episodic/            # Session logs (auto-appended, daily files)
│
├── plans/
│   ├── {plan_id}.json      # Full Plan document (goal, steps, status)
│   └── index.json          # Plan index for fast listing
│
├── threads/
│   ├── threads.db           # SQLite: ThreadState snapshots (resumable)
│   └── {thread_id}/         # Per-thread sandbox (ephemeral)
│       └── user-data/       # Volume-mounted to container /mnt/user-data/
│           ├── workspace/
│           ├── uploads/
│           └── outputs/
│
└── conversations/
    └── {thread_id}.json     # Metadata index (thread_id + title, no messages)

| Path | Persists | Purpose |
|---|---|---|
| ~/.nanodeer/memory/ | Yes | Agent knowledge (USER/MEMORY/wiki/episodic) |
| ~/.nanodeer/plans/ | Yes | Plans with embedded steps |
| ~/.nanodeer/threads/{id}/ | No (ephemeral) | Sandbox working directory |
| ~/.nanodeer/threads/threads.db | Yes | SQLite session snapshots (resumable) |
| ~/.nanodeer/conversations/ | Yes | Web UI session index (thread_id + metadata) |

Signal & State Design

NanoDeer uses two data carriers with distinct lifetimes:

TurnSignals (ephemeral, fresh each turn):

| Signal | Written by | Read by | Effect |
|---|---|---|---|
| clarification_question | react.py _check_clarification() | App layer | Display question to user, WAIT |
| memory_context | MemoryLayers.inject() via ContextManager | Prompt builder | Inject memory into LLM context |
| plan_context | ContextManager._load_plan() | Prompt builder | Inject plan + step progress into LLM context |
| uploaded_files_list | ContextManager._scan_uploads() | Prompt builder | Inject uploaded file info |

ThreadState (persistent across turns):

| Field | Role |
|---|---|
| messages | Full conversation history (Human/AI/Tool) |
| next_action | PROCESS → continue loop; WAIT → return to caller; END → terminate |
| title | Conversation title (for UI listing) |
| sandbox | Container state (container_id, status; runtime only, not persisted) |
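To make the two lifetimes concrete, the sketch below models both carriers as dataclasses and checkpoints only the persistent part of ThreadState to SQLite. Field names follow the tables above. The `thread_id` field, the table schema, and the `save_checkpoint`/`load_checkpoint` helpers are assumptions and do not reflect the actual SqliteCheckpointer.

```python
import json
import sqlite3
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class NextAction(str, Enum):
    PROCESS = "PROCESS"
    WAIT = "WAIT"
    END = "END"


@dataclass
class TurnSignals:
    # Rebuilt from scratch on every turn, never persisted.
    clarification_question: Optional[str] = None
    memory_context: str = ""
    plan_context: str = ""
    uploaded_files_list: list = field(default_factory=list)


@dataclass
class ThreadState:
    thread_id: str  # assumed key, not listed in the field table
    messages: list = field(default_factory=list)
    title: str = ""
    next_action: NextAction = NextAction.PROCESS
    sandbox: Optional[dict] = None  # runtime only


def _ensure_schema(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS threads "
                 "(thread_id TEXT PRIMARY KEY, title TEXT, messages TEXT)")


def save_checkpoint(conn, state):
    # Only messages and thread metadata are written.
    _ensure_schema(conn)
    conn.execute("INSERT OR REPLACE INTO threads VALUES (?, ?, ?)",
                 (state.thread_id, state.title, json.dumps(state.messages)))
    conn.commit()


def load_checkpoint(conn, thread_id):
    _ensure_schema(conn)
    row = conn.execute("SELECT title, messages FROM threads WHERE thread_id = ?",
                       (thread_id,)).fetchone()
    if row is None:
        return None
    # sandbox and next_action fall back to their runtime defaults.
    return ThreadState(thread_id=thread_id, title=row[0],
                       messages=json.loads(row[1]))
```

A resumed thread therefore starts with a fresh sandbox slot and `next_action = PROCESS`, regardless of what the state held when it was saved.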

Design Principles

  1. One-way dependency: Agent → Harness. Harness has no knowledge of Agent's business logic.
  2. No middleware chain: All cross-cutting concerns are inline functions or standalone Managers. Zero indirection.
  3. Inline error handling: _call_with_retry() for LLM calls, try/except for tool execution.
  4. Compression is app-layer: Timing decided by NanoEngine, not auto-triggered in the ReAct loop.
  5. Prompt auto-detection: Sections render only when data is present AND feature flag is True.
  6. Sandbox + Host dual paths: Sensitive ops through containers, save_memory/plan tools directly on host.
  7. Native ReAct loop: No LangGraph dependency. A direct while True loop with retry, clarification, tool execution, and convergence guards instead of a graph compiler.
  8. Hybrid persistence: Memory/plan uses files (inspectable, auditable). Checkpoint uses SQLite (efficient queries).
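Principle 5 (prompt auto-detection) amounts to a two-condition filter over the dynamic prompt sections. A minimal sketch follows, assuming hypothetical flag names (`memory`, `plan`, `uploads`) and section titles. The real prompt.py builder may structure this differently.

```python
def render_dynamic_sections(signals: dict, flags: dict) -> str:
    """Render a section only if its data is present AND its flag is True."""
    candidates = [
        ("memory", "Memory", signals.get("memory_context")),
        ("plan", "Plan", signals.get("plan_context")),
        ("uploads", "Uploaded files", signals.get("uploaded_files_list")),
    ]
    blocks = []
    for flag, title, data in candidates:
        if not data or not flags.get(flag, False):
            continue  # Empty data or disabled feature: emit nothing at all.
        body = "\n".join(data) if isinstance(data, list) else data
        blocks.append(f"## {title}\n{body}")
    return "\n\n".join(blocks)
```

Skipping empty sections entirely keeps placeholder headings out of the LLM context.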

Tools

| Tool | Category | Sandbox |
|---|---|---|
| read_file, write_file, ls, glob, grep, edit_file | File | ✅ Docker/Local |
| bash, git, exec_python | Shell | ✅ Docker/Local |
| web_search, web_fetch, read_image | External / uploads | ❌ Host |
| save_memory, search_memory | Memory | ❌ Host |
| create_plan, add_step, update_step, list_plans | Plan | ❌ Host (direct write) |
| spawn_subagent, get_subagent_results | Subagent | ✅ Own sandbox per worker |
| invoke_skill | Skills | ❌ Host |

Project Status & Roadmap

Current (v0.1.0), core framework stable:

  • ✅ Native ReAct loop with inline orchestration
  • ✅ Docker + Local sandbox with path isolation
  • ✅ 20 built-in tools
  • ✅ File-based memory/wiki and plan storage
  • ✅ SQLite checkpoint persistence for conversation resume
  • ✅ HTTP SSE API (FastAPI) + conversation management endpoints
  • ✅ Image upload bridge through the frontend/API into read_image
  • ✅ CLI REPL
  • ✅ SubagentCoordinator with constrained read-only workers
  • ✅ Skill workflow loader
  • ✅ assistant-ui frontend (Next.js + assistant-ui), including Projects/Plans/Memory/Wiki sidebar summary
  • ✅ Structured trace events and deterministic smoke benchmark suite

In progress / planned:

| Area | Status |
|---|---|
| Frontend polish and richer workspace views | 🔄 In progress |
| Plan/Memory/Wiki detail pages wired to backend APIs | 🔄 In progress |
| Inline: guardrail, timeout, fallback | 📝 Planned |
| Inline: dangling tool call injection | 📝 Planned |
| Broader benchmark task sets beyond smoke | 📝 Planned |
| Long-horizon task loop | 📝 Planned |
| ├─ Focus (focus-driven context injection) | 📝 Planned |
| ├─ TurnBudget (turn/duration budget) | 📝 Planned |
| ├─ Learning (error analysis + lesson extraction) | 📝 Planned |
| ├─ Reflection (session-end reflection) | 📝 Planned |
| └─ Plan-Memory bridge (step self-judgment → wiki) | 📝 Planned |
| IM bot integration (Feishu/WeCom) | 📝 Planned |
| Evaluation framework | 📝 Planned |
| Multi-model comparison benchmarks | 📝 Planned |

Design Inspirations

| Source | What it taught me |
|---|---|
| DeerFlow | Middleware chain + state machine; next_action signal routing |
| Claude Code | Tool-first design, clarification-driven pauses via <clarification> tags |
| OpenClaw | Layered memory (L1-L4); wiki-structured knowledge curated by the LLM |
| NanoClaw | Docker sandbox isolation; per-thread containers, volume mounts, path translation |

Acknowledgments

To my family: for their silent support and endless patience, which made this possible.

To my mentor: for opening the door to Agent and Harness Engineering, and encouraging me to explore.

Claude Code: my best coding companion, supercharging my AI workflow, and showing me that a product can be both powerful and elegant.

DeerFlow: for showing me what an enterprise-grade Agent framework truly looks like.

OpenClaw: for the layered memory and IM channel inspiration.

NanoClaw: for the Docker sandbox isolation pattern.

assistant-ui: for the beautiful and extensible React chat UI that powers the frontend.

DeepSeek: for providing the deepseek-v4-flash model with exceptional inference efficiency.

MiniMax: for providing the MiniMax-M2.7 model service that powers this project.

Andrej Karpathy: for the LLM wiki concept that inspired the wiki memory system: letting the LLM curate its own structured knowledge base.

License

This project is open source and available under the MIT License.
