A maintainable LangGraph prototype for AI-managed data sources. External agents query people records via MCP or CLI; a supervisor coordinates core lookups (via the core data agent specialist) and specialist handoff for non-core attributes. Public interfaces are query-only; data addition will return via internal agent coordination.
uv sync --all-extras
cp .env.example .env
# Query existing CRM seed record
uv run mycelium query --person-key "Nichanan Kesonpat"
# Same query with a stable conversation thread (echoed in JSON as thread_id)
uv run mycelium query --person-key "Nichanan Kesonpat" --thread-id "session-abc"
# Request non-core attributes (core record in results; message describes ongoing research)
uv run mycelium query --person-key "Nichanan Kesonpat" --attributes age x_handle
# MCP server (stdio) — query_person only
uv run mycelium-mcp

Note: The `mycelium` CLI now exits promptly after printing the JSON response (previously it could hang on async checkpointer cleanup). This makes ad-hoc CLI verification and smoke checks fast.
Research latency: When OPENAI_API_KEY and TAVILY_API_KEY are set, the first query for a non-core attribute (e.g. email) may run synchronous LLM + web search on cache miss and can take tens of seconds. Future work will move that to async dispatch so the CLI returns faster while research continues in the background.
See docs/database-notes.md if you have an older data/mycelium.db from before the schema simplification.
CLI and MCP return PersonResponse JSON: results, message, debug, plus optional correlation fields:
{
"results": [{ "id": "…", "name": "…", "employer": "…" }],
"message": "Found core record for …",
"debug": "…",
"trace_id": null,
"thread_id": "session-abc"
}

- `thread_id`: Passed via `--thread-id` (CLI) or a top-level `thread_id` in the MCP request JSON; used for session continuity and LangGraph checkpointing.
- `trace_id`: Populated when LangSmith tracing is enabled (`LANGCHAIN_TRACING_V2`); links the response to the run in LangSmith.
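A consumer of the CLI or MCP output might read the response roughly as follows. This is a minimal sketch using only the standard library; `PersonResponseView` and `parse_person_response` are hypothetical names for illustration and are not the project's own `PersonResponse` model in `src/models/state.py`.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PersonResponseView:
    """Hypothetical read-only view of the response JSON (not the project's model)."""

    results: list[dict[str, Any]] = field(default_factory=list)
    message: str = ""
    debug: str = ""
    trace_id: Optional[str] = None   # null unless LangSmith tracing is enabled
    thread_id: Optional[str] = None  # echoed back when a thread id was supplied


def parse_person_response(raw: str) -> PersonResponseView:
    """Parse the JSON text, treating the correlation fields as optional."""
    data = json.loads(raw)
    return PersonResponseView(
        results=list(data.get("results", [])),
        message=data.get("message", ""),
        debug=data.get("debug", ""),
        trace_id=data.get("trace_id"),
        thread_id=data.get("thread_id"),
    )


# Made-up sample payload in the documented shape.
sample = json.dumps(
    {
        "results": [{"id": "1", "name": "Example Person", "employer": "Example Co"}],
        "message": "Found core record for Example Person",
        "debug": "",
        "trace_id": None,
        "thread_id": "session-abc",
    }
)
resp = parse_person_response(sample)
print(resp.thread_id, resp.trace_id is None)
```

Reading the correlation fields with `.get` means the same parser works whether or not tracing or a thread id was in play.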
LangSmith provides observability for graph executions (supervisor routing, core lookups, etc.).
- Sign up for a free account at smith.langchain.com.
- Go to Settings → API Keys and create a new key. Choose Key Type: Personal Access Token (PAT) (not Service Key). This produces a key starting with `lsv2_pt_`.
- Copy `.env.example` to `.env` and fill in:
  - `LANGCHAIN_TRACING_V2=true`
  - `LANGCHAIN_API_KEY=lsv2_pt_...` (paste your PAT)
  - `LANGCHAIN_PROJECT=mycelium` (or your project name)

  There is no need to pre-create this project in the LangSmith UI. The first trace sent with this project name automatically creates a project called "mycelium" (or whatever you set) under your workspace. You can later rename, organize, or add tags in the LangSmith dashboard if desired. This variable controls which "folder"/project your traces appear under in the LangSmith UI.
- (Optional) For full trace URLs in output, set `LANGSMITH_ORG_ID` and `LANGSMITH_PROJECT_ID`.
- Run commands as usual. Responses will include `trace_id`, and the CLI will print a direct LangSmith trace URL when tracing is active.
To disable (no key needed, no data sent): set LANGCHAIN_TRACING_V2=false or unset it. trace_id will be null.
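For a quick local sanity check of the variables above, something like the following could be used. It is a sketch only: `tracing_enabled` and `missing_tracing_vars` are hypothetical helpers, not part of this repository, and treating only the literal value `true` as "on" is a simplifying assumption rather than a statement of how LangSmith parses the flag.

```python
import os
from typing import Mapping, Optional


def tracing_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when LANGCHAIN_TRACING_V2 is set to 'true' (assumed convention)."""
    env = os.environ if env is None else env
    return env.get("LANGCHAIN_TRACING_V2", "").strip().lower() == "true"


def missing_tracing_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """List variables that are still empty while tracing is switched on."""
    env = os.environ if env is None else env
    if not tracing_enabled(env):
        return []  # tracing off: no key needed
    required = ("LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT")
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"LANGCHAIN_TRACING_V2": "true", "LANGCHAIN_PROJECT": "mycelium"}
print(missing_tracing_vars(example_env))
```

Passing the environment as a mapping keeps the check easy to exercise without touching the real process environment.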
See docs/architecture.md and .env.example for more.
The Studio setup gives you a rich visual debugger for the graph (supervisor routing, core lookup path, state inspection, etc.).
The langgraph dev command runs your graph execution locally on your machine. The Studio UI (the visual part) is a web app at smith.langchain.com that connects to your local server via a tunnel (currently ngrok). This is the supported way to get the nice interactive graph view.
Recommended way to start:
./bin/run-studio

Then in another terminal:

ngrok http 2024

(This forces tracing off. The script starts the dev server on localhost; you expose it with ngrok.)
The terminal running the dev server will print the local address. ngrok will print the public https://...ngrok.io (or ngrok-free.app) URL.
Important: Tunnels are ephemeral. Every new ngrok session (or ./bin/run-studio restart) gives a new URL. Always use the URL from the current terminals. Do not reuse old ones from previous runs or old browser tabs.
Once connected you can send query-only PersonQuery inputs (inside MyceliumGraphState) and step through the supervisor and related nodes. Legacy enrich/validator nodes may still appear in the diagram until graph simplification (task 1070).
See .env.example and the troubleshooting notes below.
The langgraph.json has the graph entrypoint and expanded CORS settings for Studio (smith.langchain.com origins + methods/headers/credentials). If you change it, you must restart the dev server.
Troubleshooting "Failed to initialize Studio TypeError: Failed to fetch":
- The cloud Studio page cannot directly fetch from your localhost. You must use a tunnel (ngrok in the current setup).
- Make sure `LANGCHAIN_TRACING_V2=false` (or unset) so the dev server doesn't try to phone home to LangSmith during startup.
- After starting `./bin/run-studio` + `ngrok http 2024`, copy the https ngrok URL.
- In a separate browser tab, visit that plain ngrok URL first and complete the ngrok "Visit Site" / warning page until you see clean JSON `{"ok":true}`.
- Then go to https://smith.langchain.com/studio/ in a fresh tab.
- Click "Connect to a local server" and paste the current ngrok URL.
- Hard-refresh the Studio page (Cmd/Ctrl-Shift-R) if needed.
- If issues persist, try a different browser (Chrome/Firefox are usually more lenient).
- Check terminal output for any server startup errors (e.g. port in use; use `--port 8001`).
- The CORS config in `langgraph.json` (full `allow_methods`, `allow_headers`, `allow_credentials`) allows the smith.langchain.com origins. If you edit it, restart the dev server.
For the specific error "Failed to connect to Agent Server because the domain 'xxx.ngrok.io' is not allowed" (or similar for any tunnel):
- This is the Studio UI's security check for the tunnel domain (tunnels change every run).
- On the error page you are on (the one with the URL you pasted), look for "Advanced Settings" (usually at the bottom or in the connection panel).
- In Advanced Settings, add the exact domain from the error (both the bare domain and the `https://` version).
- Save/apply, then click Connect or refresh.
- Next time you get a new ngrok URL, you'll need to add the new domain in Advanced Settings again (or use the "Connect to local server" flow each time).
This is normal for tunnel-based local dev. The terminal output from ngrok and langgraph dev will show the exact current URL.
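Since both the bare domain and the `https://` form have to be re-entered for every new tunnel, a small helper can derive them from the URL ngrok prints. This is an illustrative sketch; `allowed_domain_entries` is a hypothetical name, and the example hostname is made up.

```python
from urllib.parse import urlparse


def allowed_domain_entries(tunnel_url: str) -> tuple[str, str]:
    """Return (bare domain, https:// form) for a tunnel URL.

    Raises ValueError when the URL is not an https URL with a hostname.
    """
    parsed = urlparse(tunnel_url.strip())
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"expected an https tunnel URL, got: {tunnel_url!r}")
    host = parsed.hostname  # lower-cased, without port or path
    return host, f"https://{host}"


# Made-up tunnel URL for illustration.
bare, with_scheme = allowed_domain_entries("https://abc123.ngrok-free.app/")
print(bare)
print(with_scheme)
```

Stripping the path and trailing slash avoids pasting a value that does not match the domain shown in the error message.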
For "Failed to initialize Studio" / "TypeError: Failed to fetch" / "ConnectionError: Unable to connect..." (the most common tunnel gotcha):
- You must have `./bin/run-studio` and `ngrok http 2024` (or your tunnel) both actively running in terminals right now.
- Tunnels are ephemeral: old URLs are dead once the ngrok session or dev server stops. You must use the URL from the current running sessions.
- Steps (official + proven flow):
  - Start `./bin/run-studio`, then in another terminal run `ngrok http 2024`.
  - Note the current 🚀 API URL from ngrok (https://...ngrok.io).
  - In a separate browser tab, visit that plain new API URL first. Complete any ngrok warning/visit page until it shows clean `{"ok":true}` JSON.
  - Open a completely fresh tab to `https://smith.langchain.com/studio/` (hard refresh or new tab; old tabs may have stale connections).
  - Click "Connect to a local server" (the manual button; do not just open an old pre-filled link or rely on auto-connect).
  - Paste the current live ngrok URL.
  - In Advanced Settings (if it complains about domain), add the new bare domain + `https://` version.
  - Click Connect.
- The `langgraph.json` has expanded CORS; restart the dev server after editing it.
- Important for schema / form changes (e.g. in Studio's visual Input editor): If you edit Pydantic models like `src/models/state.py` (Person fields, etc.), simply reloading the Studio page is not enough. The running `langgraph dev` process has the old module/schema in memory. You must Ctrl-C the server, restart `./bin/run-studio` (new ngrok URL), re-warmup the URL, and reconnect in Studio. Only then will the Input editor reflect updated required/optional fields and defaults.
- Try Incognito or Firefox.
- Verify locally the server responds (`curl http://127.0.0.1:2024/ok`), then test the current ngrok URL directly in a browser tab.
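The local `/ok` check can also be scripted. The sketch below uses only the standard library; `is_ok_body` and `server_is_up` are hypothetical helper names. The port and the `{"ok":true}` body come from the steps above.

```python
import json
import urllib.request


def is_ok_body(body: str) -> bool:
    """True only when the body is exactly the JSON object {"ok": true}."""
    try:
        return json.loads(body) == {"ok": True}
    except json.JSONDecodeError:
        # For example, an HTML warning page served instead of the API response.
        return False


def server_is_up(base_url: str = "http://127.0.0.1:2024", timeout: float = 5.0) -> bool:
    """GET <base_url>/ok and report whether the clean JSON came back."""
    url = f"{base_url.rstrip('/')}/ok"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_ok_body(resp.read().decode("utf-8"))
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors (URLError is an OSError).
        return False


print(is_ok_body('{"ok":true}'))
```

Calling `server_is_up()` against the local address first, and then against the current tunnel URL, separates "dev server not running" from "tunnel or warning page in the way".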
flowchart TD
MCP[MCP Client] -->|JSON| MCPServer[mycelium_mcp/server.py]
CLI[main.py CLI] -->|JSON| Graph
MCPServer --> Graph[graphs/core.py]
Graph --> S[Supervisor]
S -->|target wiring| CDA[core_data_agent]
CDA --> CI[CoreIdentity facade]
CI --> DB[(mycelium.db)]
S -->|found| RES[results + message]
S -->|missing| MISS[empty results + message]
S -->|non-core attrs| NC[results + researching message]
Graph --> CP[(checkpoints.sqlite)]
| Layer | Path | Role |
|---|---|---|
| Models | `src/models/state.py` | `Person`, `PersonQuery`, `PersonResponse`, graph state |
| Storage | `src/storage/core.py` | SQLite core `people` table (`id`, `name`, `employer`) |
| Agents | `supervisor.py`, `core_data.py`, `routing.py`, `responses.py` | Coordinator + core data specialist (+ legacy enrich/validator until 1070) |
| Graph | `src/graphs/core.py` | LangGraph + async `AsyncSqliteSaver` checkpointer |
| MCP | `src/mycelium_mcp/server.py` | `query_person`, `list_specialist_routing` |
| Seed | `data/seed_crm.json` | 457 contacts from `raw_data.json` (dedup: Andrea Kalmans → Lontra Ventures, Pete Townsend → Techstars) loaded on startup |
Core CRM fields are id, name, and employer only. When a query asks for anything else (e.g. age, x_handle):
- The supervisor returns the core person in `results` and explains in `message` that those attributes are still being researched.
- No shared derivative-dataset tables or registry exist in Phase 1: specialist agents are coordinated by the supervisor, not stored as formal datasets in core storage.
- Future phases will spawn real specialist agents per attribute domain.
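The core/non-core split described above can be pictured with a short sketch. It is not the supervisor's actual code: `split_attributes` and `research_message` are hypothetical names, and the message wording is invented for illustration. Only the three core field names come from this README.

```python
from typing import Iterable

# Core CRM fields as stated in this README.
CORE_FIELDS = frozenset({"id", "name", "employer"})


def split_attributes(requested: Iterable[str]) -> tuple[list[str], list[str]]:
    """Partition requested attribute names into (core, non_core), keeping order."""
    core: list[str] = []
    non_core: list[str] = []
    for attr in requested:
        (core if attr in CORE_FIELDS else non_core).append(attr)
    return core, non_core


def research_message(person_name: str, non_core: list[str]) -> str:
    """Build an illustrative message; the real wording may differ."""
    if not non_core:
        return f"Found core record for {person_name}"
    pending = ", ".join(non_core)
    return f"Found core record for {person_name}; still researching: {pending}"


core, pending = split_attributes(["employer", "age", "x_handle"])
print(research_message("Example Person", pending))
```

The core record is returned either way; the non-core names only change what the message reports.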
See docs/architecture.md for current architecture and direction.
mycelium/
├── data/seed_crm.json
├── src/
│ ├── agents/
│ ├── graphs/core.py
│ ├── models/state.py
│ ├── storage/core.py
│ ├── mycelium_mcp/server.py
│ └── main.py
├── prompts/system/CORE_PROMPT.md
└── docs/architecture.md
Frequent/quick smoke tests (during dev, Cursor work, quick checks):
uv run pytest -m smoke -q
uv run ruff check src tests

Full test suite (end of major changesets, before reviews, full verification):
uv run pytest -q
uv run ruff check src tests

See `tests/` for `@pytest.mark.smoke` (fast unit) vs `@pytest.mark.full` (integration with real storage/graph).
For Cursor agents: See the "Test Execution Policy" in prompts/cursor/WORKFLOW.md (and the embedded rule in .cursor/rules/04-cursor-workflow.mdc). Default to smoke tests only. If adding a test, Grok determines the category for any new test; run full tests immediately for any full-suite test. See the policy for details.
## Status
MVP core flow: query-only MCP + CLI + SQLite persistence + supervisor graph. Next: wire `core_data_agent` in graph (1070/1100), real specialist spawning, vector search.