A full-stack, multi-agent LLM system that searches literature, reads PDFs and web pages, reasons over a hybrid-retrieval knowledge base, writes academic prose, and plans multi-step research on its own, with human-in-the-loop control over what it does.
Chinese version: README.zh-CN.md
- Intent-driven multi-agent orchestration: a dedicated Intent Agent routes every request, then an orchestrator dispatches one of 9 specialized agents across 8 LangGraph workflows (or a deterministic action handler).
- A planning agent that composes its own tools: plan → approve → execute → synthesize, choosing and chaining tools (paper search, web fetch, notes…) at runtime instead of following a hard-coded script.
- Human-in-the-loop with durable checkpointing: the planning graph can pause at a plan-approval checkpoint and resume later, with state persisted at every node boundary by a Postgres-backed LangGraph checkpointer.
- Hybrid-retrieval RAG: dense vectors (Chroma) + BM25 keyword search + neural reranking (FlashRank), over both a long-term knowledge base and a per-session temporary store.
- Layered memory: short-term (with automatic conversation compression), working memory, and long-term user memory, threaded through every turn.
- A pluggable tool & LLM layer: a central tool registry (paper search, PDF parsing, web scraping, image OCR+VLM, note CRUD) and per-agent LLM providers swappable between OpenAI and Anthropic.
- Full-stack & multi-channel: an async FastAPI backend with streaming over WebSocket, a Vite web UI, a `pywebview` desktop app, and a QQ bot, all behind one channel abstraction, with JWT auth.
Real screenshots of the core features are shown below. For extra impact, drop a short GIF at the top of this section (add it to `docs/screenshots/` and reference it the same way).
Under the hood: how one research request flows
```
┌──────────────────────────────────────────────────────────────┐
│ user ▸ Research the latest advances in RAG for healthcare    │
│                                                              │
│ ▸ intent ........ research_task                              │
│ ▸ plan .......... [paper_search] + [web_search] (2 steps)    │
│ ▸ approve ....... ✓ (auto / user-confirmed)                  │
│ ▸ execute ....... 12 papers · 6 pages fetched                │
│ ▸ synthesize .... structured landscape + citations           │
└──────────────────────────────────────────────────────────────┘
```
```mermaid
flowchart TD
    subgraph Channels
        W[Web UI · Vite]
        D[Desktop · pywebview]
        Q[QQ Bot]
    end
    W & D & Q --> API[FastAPI + WebSocket<br/>streaming gateway]
    API --> ORC[Orchestrator]
    ORC --> IA[Intent Agent<br/>LLM routing + deterministic fallback]
    IA --> R{Router}
    R -->|graph workflows| WF[LangGraph Workflow Engine]
    R -->|deterministic verbs| ACT[Action Handlers<br/>note · library ingest]
    R -->|multi-step research| PA[Planning Agent<br/>plan ▸ approve ▸ execute ▸ synthesize]
    WF --> AG[9 Specialized Agents]
    PA --> AG
    AG -.-> TR[Tool Registry]
    AG -.-> RAG[(Hybrid RAG<br/>Chroma · BM25 · Rerank)]
    AG -.-> MEM[(Layered Memory<br/>short · working · long-term)]
    AG -.-> LLM[Per-agent LLM Providers<br/>OpenAI · Anthropic]
    PA <-->|pause / resume| CKPT[(Postgres Checkpointer)]
```
Every message flows through the same disciplined pipeline:
- Intent recognition: the Intent Agent classifies the request into a route using session context, with a pure-keyword fallback if the LLM call fails.
- Routing: the orchestrator resolves the route to one of three execution modes:
  - a workflow (a compiled LangGraph state machine),
  - a deterministic action (plain verbs like note CRUD or library ingest, with no graph overhead),
  - or the planning agent, for open-ended, multi-step research.
- Execution: agents call tools, retrieve context, and stream progress events back over WebSocket.
- Memory & continuity: outputs update short-term, working, and long-term memory and the session context, so follow-ups ("save that as a note", "expand this") resolve against the right task.
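To make the streaming side of this pipeline concrete, here is a minimal sketch that emits one JSON progress event per stage. The event schema (`intent`, `route`, `progress`, `done`), the `run_pipeline` and `collect` helpers, and the keyword stub standing in for the Intent Agent are all hypothetical, not the project's actual WebSocket protocol. In a FastAPI handler, each event would be forwarded with the WebSocket's `send_text` method instead of being collected into a list.

```python
import asyncio
import json
from typing import AsyncIterator


async def run_pipeline(message: str) -> AsyncIterator[str]:
    """Yield one JSON-encoded progress event per pipeline stage (hypothetical schema)."""
    # 1. Intent recognition, stubbed with a keyword check; the real system asks the Intent Agent.
    route = "research_task" if "research" in message.lower() else "general_agent"
    yield json.dumps({"type": "intent", "route": route})

    # 2. Routing to an execution mode.
    mode = "planning" if route == "research_task" else "workflow"
    yield json.dumps({"type": "route", "mode": mode})

    # 3. Execution: report each step as it runs.
    steps = ("paper_search", "web_search") if mode == "planning" else ("general_agent",)
    for step in steps:
        await asyncio.sleep(0)  # stand-in for real async tool or LLM I/O
        yield json.dumps({"type": "progress", "step": step})

    # 4. Completion; memory and session context would be updated after this point.
    yield json.dumps({"type": "done"})


async def collect(message: str) -> list[dict]:
    """Gather all events; a WebSocket handler would forward each one with send_text instead."""
    return [json.loads(event) async for event in run_pipeline(message)]


if __name__ == "__main__":
    for event in asyncio.run(collect("Please research RAG in healthcare")):
        print(event)
```

Using an async generator keeps producers (agents) decoupled from the transport: the same stream could feed the web UI, the desktop shell, or the QQ bot adapter.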
The parts that were genuinely hard, and the most interesting to talk through.
Natural-language routing is delegated to the Intent Agent (it has session context, active entities, and recent output in its prompt), but safety-critical and trivially classifiable cases are handled deterministically: explicit UI markers, task continuation, and a full keyword fallback. This avoids the classic failure mode where an LLM router silently mis-routes a user mid-task. Deterministic verbs (note CRUD, library ingest) bypass the graph engine entirely as action handlers, keeping hot paths fast and predictable.
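A minimal sketch of this "LLM first, deterministic guardrails around it" ordering follows. The route names come from the workflow and action lists in this README, but the keyword rules, the `classify_with_llm` callable, and the `ui_marker` / `continuing_route` parameters are hypothetical illustrations, not the project's code.

```python
from typing import Callable, Optional

# Illustrative keyword rules; route names come from the README, the keywords do not.
KEYWORD_ROUTES: list[tuple[tuple[str, ...], str]] = [
    (("save as note", "delete note", "note"), "note_action"),
    (("add to library", "ingest"), "library_ingest_action"),
    (("find papers", "arxiv", "paper"), "paper_search"),
    (("search the web", "http"), "web_search"),
    (("write", "draft", "abstract"), "academic_writing"),
]
DEFAULT_ROUTE = "general_agent"
KNOWN_ROUTES = {route for _, route in KEYWORD_ROUTES} | {
    DEFAULT_ROUTE, "question_answer", "image_understanding",
    "conversation_summary", "research_agent",
}


def keyword_route(text: str) -> str:
    """Pure-keyword fallback: the first matching rule wins, else the general agent."""
    lowered = text.lower()
    for keywords, route in KEYWORD_ROUTES:
        if any(k in lowered for k in keywords):
            return route
    return DEFAULT_ROUTE


def route_request(
    text: str,
    classify_with_llm: Callable[[str], str],
    ui_marker: Optional[str] = None,
    continuing_route: Optional[str] = None,
) -> str:
    # 1. An explicit UI marker (e.g. a button the user clicked) is trusted as-is.
    if ui_marker is not None and ui_marker in KNOWN_ROUTES:
        return ui_marker
    # 2. Continuing an in-progress task keeps its route; no re-classification mid-task.
    if continuing_route is not None and continuing_route in KNOWN_ROUTES:
        return continuing_route
    # 3. Natural language goes to the LLM router...
    try:
        route = classify_with_llm(text)
    except Exception:
        route = None
    # 4. ...and a failed call or an unknown route falls back to keywords.
    return route if route in KNOWN_ROUTES else keyword_route(text)
```

Validating the LLM's answer against `KNOWN_ROUTES` is what turns a hallucinated route name into a safe fallback rather than a silent mis-route.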
The research path is a 4-node LangGraph (plan → approve → execute → synthesize). The approve node is split from plan on purpose: LangGraph re-runs a node from its start on resume, and re-running the (expensive) planning LLM call would be wasteful, so the cheap approval gate is isolated. When enabled, the approve node calls `interrupt()` to pause the graph, surfaces a plan card to the UI, and waits for the user to approve, modify, or cancel, falling back to a default if the request times out unattended. State persists through a Postgres checkpointer, so a paused plan survives across requests.
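The cost argument can be checked with a toy model in plain Python (this is not LangGraph's API). Here a node that needs human input raises a hypothetical `Pause` exception, and on resume the runner re-executes that node from its first line, mirroring the re-run behaviour described above. `run`, `expensive_plan`, and the node functions are illustrative names; the counter stands in for planning LLM calls.

```python
from typing import Callable, Optional

State = dict
Node = Callable[[State, Optional[str]], State]


class Pause(Exception):
    """Raised by a node that needs human input (toy stand-in for LangGraph's interrupt())."""


def run(nodes: list[Node], state: State, start: int = 0,
        resume_value: Optional[str] = None) -> tuple[State, Optional[int]]:
    """Run nodes in order from `start`; return (state, index of paused node or None)."""
    for i in range(start, len(nodes)):
        try:
            # Only the node being resumed receives the human's answer.
            state = nodes[i](state, resume_value if i == start else None)
        except Pause:
            return state, i  # checkpoint: this node will be re-run from its start
    return state, None


calls = {"planner": 0}


def expensive_plan(state: State) -> State:
    calls["planner"] += 1  # stands in for the costly planning LLM call
    return {**state, "plan": ["paper_search", "web_search"]}


def require_approval(state: State, answer: Optional[str]) -> State:
    if answer is None:
        raise Pause()
    return {**state, "approved": answer == "approve"}


def plan_and_approve(state: State, answer: Optional[str]) -> State:
    # Variant A: fused node, so the planner re-runs on every resume.
    return require_approval(expensive_plan(state), answer)


def plan(state: State, answer: Optional[str]) -> State:
    return expensive_plan(state)  # Variant B, step 1: plan once


def approve(state: State, answer: Optional[str]) -> State:
    return require_approval(state, answer)  # Variant B, step 2: cheap gate


def planner_calls(nodes: list[Node]) -> int:
    calls["planner"] = 0
    state, paused_at = run(nodes, {})
    assert paused_at is not None
    run(nodes, state, start=paused_at, resume_value="approve")
    return calls["planner"]


if __name__ == "__main__":
    print("fused:", planner_calls([plan_and_approve]))  # → 2
    print("split:", planner_calls([plan, approve]))     # → 1
```

Note that in the fused variant the planner's output is also lost at the pause, because the node never returned; splitting the nodes means the plan is already in the checkpointed state when approval is requested.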
Retrieval fuses dense (Chroma vector search), sparse (BM25 keyword), and neural reranking (FlashRank, with a CrossEncoder option). The system keeps a long-term knowledge base and a per-session temporary store, and decides between cached library context and fresh retrieval based on the query β so "this paper" follow-ups stay grounded in the right document.
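A dependency-free sketch of the sparse + dense fusion step is shown below. Several pieces are assumptions: a textbook Okapi BM25 (k1 = 1.5, b = 0.75) replaces `rank_bm25`, tiny hand-made vectors replace Chroma embeddings, and reciprocal rank fusion (k = 60) is used as the combination rule, since this README does not say how the two rankings are merged. The reranking stage, where FlashRank would rescore candidates, is left as a placeholder comment.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of `query` against each document (whitespace tokenization)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(tokens) / avgdl))
        scores.append(s)
    return scores


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def rank(scores: list[float]) -> list[int]:
    """Document indices sorted best-first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) to a document's score."""
    fused: Counter = Counter()
    for ranking in rankings:
        for r, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1 / (k + r)
    return [doc_id for doc_id, _ in fused.most_common()]


def hybrid_search(query: str, query_vec: list[float], docs: list[str],
                  doc_vecs: list[list[float]], top_k: int = 2) -> list[int]:
    sparse = rank(bm25_scores(query, docs))
    dense = rank([cosine(query_vec, v) for v in doc_vecs])
    candidates = rrf([sparse, dense])
    # A neural reranker (e.g. FlashRank) would rescore `candidates` here; omitted in this sketch.
    return candidates[:top_k]


if __name__ == "__main__":
    docs = ["rag for clinical question answering",
            "bm25 keyword ranking basics",
            "dense retrieval with vector embeddings for medical rag"]
    doc_vecs = [[0.9, 0.1], [0.1, 0.9], [0.95, 0.2]]
    print(hybrid_search("medical rag", [1.0, 0.2], docs, doc_vecs))
```

Rank-based fusion sidesteps the fact that BM25 scores and cosine similarities live on different scales, which is one common reason to prefer it over a weighted sum of raw scores.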
Short-term memory holds recent turns and compresses itself once it grows past a threshold (older turns fold into a running summary); working memory carries per-task state; long-term memory captures durable user preferences. The Intent Agent and downstream agents all read from this so the system behaves coherently across a long session.
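The compression rule for short-term memory can be sketched as a small buffer plus a running summary. The threshold values and the injected `summarize` callable (which in the real system would presumably be an LLM call) are assumptions for illustration; `ShortTermMemory` is a hypothetical class, not the project's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ShortTermMemory:
    """Recent turns plus a running summary of everything older (illustrative sketch)."""
    summarize: Callable[[str, list[str]], str]  # hypothetical: (old_summary, turns) -> new summary
    max_turns: int = 6    # compress once the buffer grows past this many turns
    keep_recent: int = 2  # turns kept verbatim after compression (must be >= 1)
    summary: str = ""
    turns: list[str] = field(default_factory=list)

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        if len(self.turns) > self.max_turns:
            self._compress()

    def _compress(self) -> None:
        # Fold everything except the most recent turns into the running summary.
        older, self.turns = self.turns[:-self.keep_recent], self.turns[-self.keep_recent:]
        self.summary = self.summarize(self.summary, older)

    def context(self) -> str:
        """Text for the next prompt: the summary first, then recent turns verbatim."""
        parts = [f"Summary: {self.summary}"] if self.summary else []
        return "\n".join(parts + self.turns)
```

Passing the previous summary into `summarize` is what makes it a running summary: each compression extends the old summary instead of discarding it.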
Tools are registered in a central Tool Registry with alias support, so a tool-calling agent can address tools such as `paper_search`, `web_fetch`, and `note_create` by canonical name, and alternative names resolve to the same tool. Each agent can be wired to a different LLM provider/model (OpenAI or Anthropic), letting you put a cheap model on routing and a strong model on synthesis.
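A minimal registry with alias resolution might look like the sketch below. `Tool`, `ToolRegistry`, and the alias names used with them are hypothetical; the point is that aliases map to a single canonical entry, so two names can never silently point at different implementations.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str                 # canonical name, e.g. "paper_search"
    func: Callable[..., Any]
    description: str = ""


class ToolRegistry:
    """Central registry: tools are looked up by canonical name or by any registered alias."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}
        self._aliases: dict[str, str] = {}  # alias -> canonical name

    def _taken(self, name: str) -> bool:
        return name in self._tools or name in self._aliases

    def register(self, tool: Tool, aliases: tuple[str, ...] = ()) -> None:
        if self._taken(tool.name):
            raise ValueError(f"duplicate tool name: {tool.name}")
        for alias in aliases:
            if self._taken(alias):
                raise ValueError(f"alias already taken: {alias}")
        self._tools[tool.name] = tool
        for alias in aliases:
            self._aliases[alias] = tool.name

    def resolve(self, name: str) -> Tool:
        canonical = self._aliases.get(name, name)
        if canonical not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[canonical]

    def call(self, name: str, **kwargs: Any) -> Any:
        return self.resolve(name).func(**kwargs)
```

All names are validated before anything is written, so a rejected registration leaves the registry unchanged.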
| Agent | Responsibility |
|---|---|
| `intent_agent` | Classifies each request into a workflow / action / planning route |
| `research_agent` | Multi-step planning agent; composes tools autonomously (plan → execute → synthesize) |
| `literature_agent` | Searches, filters, and downloads papers (arXiv + Semantic Scholar) |
| `rag_agent` | Hybrid retrieval + grounded reading/QA over the knowledge base or uploads |
| `web_agent` | Web search → page fetch → synthesized answer |
| `writing_agent` | Academic writing from user input, uploads, the library, or any mix |
| `note_agent` | Create / update / delete / search / embed research notes |
| `summary_agent` | Conversation & session summarization |
| `general_agent` | Open-ended reasoning, planning, and chat fallback |
LangGraph workflows (compiled state machines): paper_search, question_answer, web_search, academic_writing, image_understanding, conversation_summary, research_agent, general_agent.
Deterministic actions (direct handlers, no graph): note_action, library_ingest_action.
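Resolving a route name to one of the three execution modes can be sketched as a set lookup. The names come from the two lists above; treating `research_agent` as the planning mode (rather than an ordinary workflow) is an assumption based on the architecture diagram, and `Mode`, `resolve_mode`, and `dispatch` are hypothetical names.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    WORKFLOW = "workflow"  # compiled LangGraph state machine
    ACTION = "action"      # direct handler, no graph
    PLANNING = "planning"  # multi-step research agent


# Names taken from the README; the planning/workflow split is an assumption of this sketch.
WORKFLOWS = {"paper_search", "question_answer", "web_search", "academic_writing",
             "image_understanding", "conversation_summary", "general_agent"}
ACTIONS = {"note_action", "library_ingest_action"}
PLANNING = {"research_agent"}


def resolve_mode(route: str) -> Mode:
    if route in ACTIONS:
        return Mode.ACTION
    if route in PLANNING:
        return Mode.PLANNING
    if route in WORKFLOWS:
        return Mode.WORKFLOW
    raise ValueError(f"unknown route: {route}")


def dispatch(route: str, payload: dict,
             handlers: dict[Mode, Callable[[str, dict], str]]) -> str:
    """Hand the request to the executor for its mode; executors are injected (hypothetical)."""
    return handlers[resolve_mode(route)](route, payload)
```

Keeping actions in their own set is what lets note CRUD and library ingest skip graph compilation and checkpointing entirely.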
| Domain | Tools |
|---|---|
| Literature | paper search (arXiv, Semantic Scholar), semantic filter, PDF download |
| Documents | PDF/PPTX parsing (PyMuPDF + LlamaParse), chunking & indexing |
| Web | web search, page scrape, lightweight URL fetch |
| Vision | image understanding (OCR + VLM) |
| Knowledge | library add/search, RAG index & retrieval |
| Notes | full note CRUD + embedding |
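The "chunking & indexing" step can be illustrated with a fixed-size sliding window. This is a simplified stand-in: LangChain's `RecursiveCharacterTextSplitter` (listed in the stack) also takes `chunk_size` and `chunk_overlap`, but it splits on separators such as paragraph and line breaks first, which this sketch does not do. `chunk_text`, `to_records`, and the record fields are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into character windows of `chunk_size` that overlap by `chunk_overlap`."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def to_records(doc_id: str, text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[dict]:
    """Attach a stable id and position to each chunk before it is embedded and indexed."""
    return [
        {"id": f"{doc_id}:{i}", "doc_id": doc_id, "position": i, "text": chunk}
        for i, chunk in enumerate(chunk_text(text, chunk_size, chunk_overlap))
    ]
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk, at the cost of some duplicated storage.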
| Layer | Technologies |
|---|---|
| Agents / Orchestration | LangGraph, custom orchestrator & router, Pydantic schemas |
| LLMs | OpenAI + Anthropic (pluggable per agent) |
| Retrieval | Chroma (dense), rank_bm25 (sparse), FlashRank (rerank), LangChain text splitters |
| Documents | PyMuPDF, python-pptx, LlamaIndex / LlamaParse |
| Backend | FastAPI, Uvicorn, async Python, WebSocket streaming |
| Storage | PostgreSQL (notes + LangGraph checkpointer), Chroma |
| Frontend | Vite SPA (ESM), pywebview desktop shell |
| Channels | Web, QQ bot (unified channel abstraction) |
| Auth | JWT, bcrypt, email verification (aiosmtplib) |
```bash
# 1. Install backend deps
pip install -r requirements.txt
# 2. Configure (copy and fill in API keys / DB URL)
cp .env.example .env
# 3. Build the web frontend
cd web && npm install && npm run build && cd ..
# 4. Run
python web_server.py    # web app → http://localhost:8000
# or
python desktop_app.py   # desktop app (pywebview)
```

Requires Python 3.10+, Node 18+, and a PostgreSQL instance. See `.env.example` for the full configuration surface (LLM keys, per-agent models, DB, email, channels).
```
app/
├── agents/        # 9 specialized agents (intent, research, rag, writing, …)
├── orchestrator/  # routing, action handlers, HITL checkpoint logic
├── workflows/     # LangGraph graph builders + registry
├── rag/           # long-term & temporary retrieval, reranker
├── memory/        # short-term / working / long-term memory
├── tools/         # tool registry: search, pdf, web, image, notes, library
├── channels/      # web + QQ channel adapters
├── services/      # LLM providers, note service, …
└── api/           # FastAPI server + WebSocket gateway
```
- Todo list & task board: Add a persistent frontend workspace for tasks, with filtering, priority, due dates, status transitions, and links to sessions, notes, and papers.
- MCP service: Expose paper search, knowledge base, notes, files, calendar, and other capabilities as an MCP server so external clients and in-app agents can share one tool protocol.
- Autonomous frontend workflow orchestration: Add a visual workflow canvas/node editor where users can compose agents, tools, inputs, outputs, and approval checkpoints into reusable workflows.
- Docker deployment: Provide `Dockerfile`, `docker-compose.yml`, and dev/prod environment templates for FastAPI, Postgres, Chroma/vector storage, and frontend static assets.
- Online web trial: Deploy a public demo/trial site with guest mode, sample data, quota limits, auth, and data isolation.
- Architecture rebuild: Rework the boundaries between agents, workflows, tools, memory, channels, and storage; separate core packages from app wiring and define cleaner plugin extension points.
- Streaming token-level output from the planning agent
- Pluggable retrieval backends (Qdrant / pgvector)
- Evaluation harness for RAG faithfulness & answer relevancy
- React frontend migration
Built as a deep exploration of agentic LLM system design: orchestration, planning, retrieval, and memory.





