A FastAPI + React dashboard that turns NotebookLM into persistent, structured memory for AI coding agents. Track multiple projects, review agent checkpoints, query across notebooks, and browse all state as Obsidian-compatible markdown.
Blog post: The Missing Layer Between AI Coding Agents and Institutional Knowledge
AI coding agents like Claude Code, Codex, Kilo Code, and OpenCode start every session from scratch. They re-read files, re-discover architecture, and re-learn conventions. Hard-won debugging insights, design rationale, dependency gotchas, and domain context evaporate when a session ends.
NotebookLM as persistent memory solves this. Instead of starting cold, the agent queries a curated notebook for project context, team conventions, and past discoveries — then writes back what it learns during the session so the next one starts smarter.
  Session 1                      Session 2                      Session 3
┌───────────┐                  ┌───────────┐                  ┌───────────┐
│   Agent   │──checkpoint──┐   │   Agent   │──checkpoint──┐   │   Agent   │
│ discovers │              │   │  starts   │              │   │  starts   │
│ gotcha X  │              ▼   │ knowing X │              ▼   │  knowing  │
│           │   ┌──────────┐   │ discovers │   ┌──────────┐   │ X, Y, and │
└───────────┘   │NotebookLM│   │ gotcha Y  │   │NotebookLM│   │ external  │
                │ Project  │   └───────────┘   │ Project  │   │ research  │
                │ Memory   │                   │ Memory   │   └───────────┘
                └──────────┘                   └──────────┘
                     ▲ bootstrap                    ▲ bootstrap
                     │                              │
           Session 2 starts here          Session 3 starts here
The hidden cost of stateless agents
AI coding agents are powerful but amnesiac. Every session starts from zero — the agent re-reads files, re-discovers architecture, re-learns conventions. This isn't just inefficient; it's compounding waste. Each session that discovers something non-obvious and then discards it forces the next session to rediscover it, or worse, to make assumptions and get it wrong.
The real cost isn't the wasted tokens or the extra minutes. It's the decisions the agent makes without context it should have had:
- It proposes an approach you rejected last week for reasons that aren't in the code
- It upgrades a dependency that was pinned for a reason no comment explains
- It re-investigates a bug that was already root-caused in a previous session
- It doesn't know that the staging environment has a 30-second query timeout that will break the migration it's writing
These aren't edge cases. They're the normal experience of using coding agents on real projects over time. The agent is smart enough to do the work — it just doesn't remember what it learned yesterday.
Flat files work for static facts — coding conventions, build commands, directory layout. But they break down for accumulated knowledge:
| Approach | Works for | Breaks when |
|---|---|---|
| `CLAUDE.md` | Static conventions, build commands | Knowledge grows beyond what fits in a prompt; no retrieval, agent reads everything every time |
| `.context/` folders | Per-directory notes | No search across files; agent must know which file to read; no citation or sourcing |
| Vector DB / RAG | Semantic search over documents | Requires infrastructure; no built-in grounding or source verification; cold-start problem |
| NotebookLM | Grounded retrieval with citations over curated sources | Needs a management layer (that's what Engram provides) |
NotebookLM gives you retrieval-augmented generation with source grounding and citations — out of the box, no infrastructure. Every answer traces back to specific sources the agent (or you) added. When the agent says "the payment service returns 202 for async operations," the citation tells you where that claim comes from, so you can judge whether to trust it.
There's a less obvious but significant advantage: NotebookLM offloads both storage and analysis to Google's infrastructure. The coding agent doesn't process, index, or store your project knowledge — Google does. This changes the economics of agent memory:
Token savings. Without external memory, the agent's only option is to stuff context into its own prompt — pasting CLAUDE.md files, reading documentation, re-analyzing code on every session. That context costs tokens. With NotebookLM, the agent sends a short natural-language query and gets back a focused, grounded answer. A bootstrap query that returns a 200-word answer replaces what might otherwise be 5,000+ tokens of raw context files loaded into every session.
Analysis happens outside the agent's context window. When you add a 50-page design doc or a long Slack thread as a NotebookLM source, Google indexes and chunks it. The agent never sees the raw document — it sees NotebookLM's synthesized answer with citations. The heavy lifting of reading, understanding, and retrieving from large documents happens on Google's side, not inside your Claude or GPT session.
Research doesn't burn agent tokens. When the agent triggers external research (web search, Drive search), NotebookLM's research agent does the crawling, evaluating, and importing. The coding agent fires a single API call and gets the results. Compare this to an agent that tries to research by reading web pages directly — each page consumes context window, and the agent must decide what's relevant in real-time.
Storage is free and unlimited. NotebookLM notebooks have no meaningful source limit for this use case. You're not paying per-vector or per-document like you would with Pinecone, Weaviate, or any hosted vector DB. The knowledge base grows over time at zero marginal cost.
The net effect: the coding agent stays focused on what it's good at — reading your code, reasoning about changes, writing implementations — while Google handles the memory infrastructure. Each agent session is shorter, uses fewer tokens, and starts with better context.
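To make the token framing concrete, here is a minimal sketch of what a bootstrap call looks like from the agent's side, assuming the `/api/bootstrap` endpoint documented in the API reference below; the exact response shape is illustrative.

```python
import httpx  # already a backend dependency

ENGRAM_URL = "http://localhost:8000"

# One short query replaces pasting CLAUDE.md files and docs into the agent prompt.
resp = httpx.post(
    f"{ENGRAM_URL}/api/bootstrap",
    json={
        "project_id": "<PROJECT_ID>",  # from the dashboard
        "question": (
            "I am about to refactor the retry logic. "
            "What gotchas, decisions, and conventions should I know?"
        ),
        "auto_cross_qa": True,  # also pull shared-infra knowledge when relevant
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # a focused, grounded answer instead of thousands of tokens of raw files
```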
Consider a team building a payments service. Here's what happens across sessions with Engram:
Session 1 — Initial implementation
The agent is asked to implement Stripe webhook handling. It bootstraps from the notebook but finds nothing — the notebook is empty. It writes the webhook handler, discovers that Stripe's signature verification has a 300-second clock tolerance, and that the staging environment's NTP sync drifts by up to 10 seconds. Before the session ends, it submits two checkpoints to the webhook queue:
"Stripe webhook signature verification tolerates 300s clock skew by default. Staging NTP drifts ~10s. This hasn't caused issues yet but will if we tighten the tolerance window. See Stripe docs section on webhook-signatures."
"The webhook handler parses event types from
event.typeand routes to per-type handler functions."
You approve the first — it's a genuine operational insight that isn't in any config file or code comment. You reject the second — that's just how the code works, and any future agent can see it by reading the handler. The queue keeps the notebook curated.
Session 2 — Adding idempotency (two weeks later)
A different agent session is asked to add idempotency keys to payment retries. It bootstraps:
"I'm about to add idempotency to payment retries. What should I know?"
NotebookLM returns the architecture source you seeded (which describes the event-driven payment flow and its at-least-once delivery guarantee). The answer covers the general retry design but lacks specifics. The agent asks a follow-up in the same conversation:
"What failure modes should I expect when Stripe's internal retry overlaps with our retry logic? Are there known race conditions?"
NotebookLM has a partial answer from the architecture doc — the system is designed to tolerate duplicate deliveries — but doesn't know about Stripe's specific idempotency key behavior. The agent triggers external research, importing Stripe's idempotency key documentation. Back in the same conversation with the enriched context:
"The imported docs mention idempotency keys expire after 24 hours. Does our retry window ever exceed that?"
NotebookLM connects this to the architecture source: the retry policy allows up to 72 hours for failed charge attempts. The agent now knows there's a real gap — retries after 24 hours would generate new charges instead of being deduplicated. It implements with key-rotation logic and checkpoints:
"Stripe idempotency keys expire after 24h but our retry window extends to 72h. Retries after key expiration silently create duplicate charges. Added key-rotation with generation tracking. See Stripe docs idempotency-keys#expiration."
Session 3 — Debugging a production issue (a month later)
A webhook starts failing intermittently in staging. A new agent session bootstraps and immediately gets the clock skew insight from Session 1, the idempotency-expiration gap from Session 2, and the Stripe docs imported during research. The agent could work backwards from git blame on the tolerance config, but it doesn't have to — the Session 1 checkpoint points directly at NTP drift as the likely cause. It checks the tolerance window first and finds that someone tightened it to 30 seconds without knowing about the staging drift.
Three sessions. Each one started smarter than the last. The third session solved in minutes what would have been a longer investigation — not because the answer was unfindable without memory, but because memory told it exactly where to look first.
Engram defines four skills that turn each coding session into a learning loop:
┌─────────────────────────────────────────────────────────────────────┐
│                            Agent Session                            │
│                                                                     │
│  1. Bootstrap          2. Iterative Research       3. Work          │
│ ┌──────────────┐      ┌──────────────────┐      ┌──────────────┐    │
│ │ Query the    │─────▶│ Follow-up Qs in  │─────▶│ Write code   │    │
│ │ notebook:    │      │ same conversation│      │ with full    │    │
│ │ "What should │      │ until you have   │      │ context      │    │
│ │ I know?"     │      │ implementation   │      │              │    │
│ └──────────────┘      │ detail           │      └──────┬───────┘    │
│                       │                  │             │            │
│                       │ If notebook      │       4. Checkpoint      │
│                       │ lacks answer:    │      ┌──────▼───────┐    │
│                       │ → External       │      │ Write back   │    │
│                       │   research       │      │ discoveries  │    │
│                       └──────────────────┘      │ for next     │    │
│                                                 │ session      │    │
│                                                 └──────────────┘    │
└─────────────────────────────────────────────────────────────────────┘
Bootstrap recovers context so the agent doesn't start cold. Iterative research digs deeper when the first answer isn't enough — follow-up questions in the same conversation, with gap detection for undefined references, missing implementation details, and uncovered edge cases. External research imports knowledge the codebase doesn't contain — docs, articles, internal specs. Checkpoints capture what the agent learned (debugging dead-ends, implicit coupling, performance constraints) so the next session starts smarter.
The test is simple: if a fresh git clone would give the agent the information, it doesn't belong in memory. The agent can read your source code, your config files, your tests. Duplicating that into NotebookLM just adds noise and makes bootstrap answers longer without making them smarter.
Memory is for the knowledge that lives between the lines of code — the things you'd tell a new teammate over coffee that they'd never figure out from reading the repo:
| The agent can get this from the code | The agent needs this from memory |
|---|---|
| You use event sourcing | You chose event sourcing because legal required full audit trails — CRUD was explicitly rejected |
| `pg` is pinned at v8.11 | v8.12 broke connection pooling with your PgBouncer config — three engineers spent a day on it |
| There's a 30s timeout in the retry logic | That timeout exists because the payment provider's API sometimes hangs for 20s under load, and 30s was chosen after a production incident |
| The auth test is flaky | The flakiness is timezone-dependent, not a race condition — two people already wasted time investigating concurrency |
| The `/api/search` endpoint exists | It's on the homepage critical path with a 200ms p99 budget — adding a database query requires a feature flag |
| There's a notification service | It imports billing.models.InvoiceLineItem directly — there's no test that catches schema changes breaking notifications |
The left column is what grep and cat give you. The right column is what only humans know — the why, the what went wrong, the don't touch this because. That's what makes the difference between an agent that writes correct code and an agent that writes code that survives contact with production.
Getting this wrong in either direction hurts. Put too much in memory (code structure, file listings, API signatures) and bootstrap answers become bloated — the agent spends tokens reading things it could have read from the repo. Put too little in memory and you're back to the amnesia problem. The sweet spot is knowledge that's high-value, non-obvious, and impossible to derive from the codebase alone.
Agents make mistakes. They may checkpoint something inaccurate, irrelevant, or obvious. The webhook queue lets you review every checkpoint before it enters the knowledge base — approve insights that are genuinely useful, reject noise. This keeps the notebook curated rather than filled with session dumps, and means you can trust the bootstrap answers that future sessions receive.
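Review normally happens in the dashboard's webhook queue view, but the same endpoints can be scripted. A rough sketch, assuming each queue item exposes an `id` and the submitted `payload` (field names illustrative):

```python
import httpx

ENGRAM_URL = "http://localhost:8000"

pending = httpx.get(
    f"{ENGRAM_URL}/api/webhook/queue", params={"status": "pending"}, timeout=30
).json()

for item in pending:
    title = item.get("payload", {}).get("title", "<untitled>")
    print(f"[{item['id']}] {title}")
    verb = "approve" if input("approve/reject? ").strip() == "approve" else "reject"
    httpx.put(f"{ENGRAM_URL}/api/webhook/queue/{item['id']}/{verb}", timeout=30)
```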
The rationale above focuses on agent memory — the core use case. But NotebookLM as a coding knowledge layer enables several other workflows:
General LLMs hallucinate function names and invent API parameters — especially for new or rapidly-changing libraries with limited training data. Upload official docs for a new SDK or internal API and the coding agent gets citation-backed answers instead of plausible fiction. When adopting unfamiliar tools, the agent queries NotebookLM for correct function signatures and parameter names rather than guessing from outdated training data.
Wired into the coding agent as a skill or tool, this becomes seamless: the agent queries NotebookLM mid-session — "What parameters does this endpoint accept?" — and gets a source-grounded answer without the developer leaving their editor.
One high-value memory pattern: storing ADRs as notebook sources. ADRs capture what was rejected and why — the knowledge most likely to be invisible in code. Without them, the agent re-proposes approaches your team already discarded for reasons no grep will reveal (legal constraints, failed performance tests, vendor limitations). With them, every bootstrap query includes the decision history before the agent writes a single line.
"Vibe-coding" — generating large amounts of code quickly with AI — creates a specific problem: the codebase grows faster than the developer's mental model. You end up with a working app but can't explain how half the files connect.
NotebookLM closes the gap: have the agent generate an architecture doc describing the project's logic, upload it as a source, then query it to trace file interactions and data flow. Tools like uithub can convert an entire GitHub repository into a single markdown document for upload, enabling architecture summaries and code review prep on unfamiliar codebases.
NotebookLM's Studio generation turns project knowledge into deliverables without manual writing:
- Technical manuals — Ingest a repository's sources and generate structured documentation in a single pass. A medium-sized repo can produce a comprehensive technical manual from one generation.
- Technical editing — Feed a draft user guide back into NotebookLM and instruct it to cross-reference against original sources, catching inaccuracies, outdated references, and gaps.
- Stakeholder reporting — Generate slide decks, audio overviews, or infographics that explain technical decisions and project status to non-technical audiences. The Audio Overview feature turns notebook contents into a conversational podcast-style summary.
These deliverables are accessible through the Workspace's Notebook panel, which handles Studio content generation and artifact download.
Notebooks can encode active project rules — deprecated modules, naming conventions, where specific logic must live. Unlike a flat CLAUDE.md with 200 rules that the agent reads in full every session, a notebook lets the agent query "what rules apply to payment handler modules?" and retrieve only the relevant subset with source citations. For large projects, this targeted retrieval scales better than static context files.
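As a sketch of what that targeted retrieval looks like, assuming the `/api/notebooklm/ask` endpoint from the API reference below:

```python
import httpx

answer = httpx.post(
    "http://localhost:8000/api/notebooklm/ask",
    json={
        "notebookId": "<NOTEBOOK_ID>",
        "question": "What conventions and active rules apply to payment handler modules?",
    },
    timeout=120,
).json()
print(answer)  # only the relevant subset of rules, with citations — not all 200
```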
Per-project memory — the bootstrap/checkpoint loop described above — solves the amnesia problem for individual projects. But real engineering organizations don't have isolated projects. They have systems that interact, infrastructure that's shared, and conventions that span teams. A debugging insight from the payments service is relevant when the orders service hits the same symptom. The auth team's session-token gotchas matter when the mobile team implements token refresh.
Cross-notebook Q&A addresses this by querying multiple project notebooks in parallel. But a manual cross-notebook query is a point-in-time action — the insight it surfaces doesn't flow back into the individual project notebooks. The next session on the orders service still starts cold unless someone remembers to run the same cross-QA query again.
Memory evolution closes this loop. The term comes from A-MEM (Xu et al., 2025), an agentic memory system where integrating a new memory triggers updates to existing memories' context and links. The same principle applies here: a cross-QA result that synthesizes knowledge from three projects should become a source that enriches each of those three project notebooks, so future single-notebook bootstrap queries surface the cross-project insight without requiring a fresh cross-QA run.
Research supports this. Agent KB (2025) demonstrated that cross-domain knowledge reuse improved SWE-bench code repair resolution rates by 12 percentage points through a Reason-Retrieve-Refine pipeline. The Anthropic 2026 Agentic Coding Trends Report identifies memory as a top concern for production agentic workflows. And the growing demand for multi-agent coordination across codebases — where the developer becomes "a slow, error-prone message broker between two AI agents" — makes organizational memory infrastructure a prerequisite for scaling agentic coding beyond single repos.
There are seven trigger scenarios, mapped to where they occur in the agent session lifecycle:
At bootstrap (session start)
- Shared infrastructure & environment knowledge. Every project deployed to the same staging/prod environment shares gotchas that no single repo contains — NTP drift, database timeouts, load balancer quirks. Trigger: agent starts a session on any project that touches shared infra.
- Organizational conventions that span projects. PR conventions, testing philosophy ("never mock the database"), review expectations. These aren't per-project — they're team-level. Trigger: every session start.
Pre-implementation
- Cross-service interface awareness. When an agent on Service A changes an API contract, the agent on Service B doesn't know. Trigger: agent is about to modify or consume an API endpoint, event schema, or shared data model.
- Pattern transfer — reusing proven solutions. When an agent encounters a retry pattern, caching strategy, or auth flow, it retrieves how other projects already solved the same problem — including what didn't work. Trigger: agent is implementing a common pattern that another project has already solved.
During debugging
- Cross-cutting root cause knowledge. A root cause discovered in one project (clock skew in staging, a library bug, a vendor API quirk) applies to every project in the same environment. Trigger: agent is debugging a symptom that could have an infrastructure-level or dependency-level root cause.
Pre-migration / pre-upgrade
- Learning from other projects' migrations. When one team already migrated from Express to Fastify, their pitfalls are invaluable when the next team attempts the same migration. Trigger: agent is about to do a migration or major upgrade that another project has already attempted.
At checkpoint time (proactive push)
- Detecting cross-project relevance at write time. When an agent discovers something about shared infrastructure, a shared library, or a vendor API, the system should push that insight to every affected project's notebook — not just the one that discovered it. Trigger: agent checkpoints a discovery that involves shared dependencies, infrastructure, or organizational processes.
These scenarios cluster into two retrieval patterns:
| Pattern | Scenarios | Implementation |
|---|---|---|
| On-demand — relevant only when the task overlaps | 3 (interfaces), 4 (patterns), 6 (migrations) | Auto-triggered cross-QA at bootstrap when keyword triggers match the question |
| Broadcast — every project should see it | 1 (infra), 2 (conventions), 5 (cross-cutting bugs) | Cross-QA sync pushes synthesized answers back to each notebook as a source; cross-project checkpoints are pushed to all tracked notebooks on approval |
Three mechanisms work together:
- Bootstrap with auto-trigger (`POST /api/bootstrap`). When an agent starts a session, the bootstrap endpoint queries the project's own notebook and detects whether the question touches cross-project concerns (infrastructure, shared dependencies, patterns, debugging, conventions). If triggers match, it automatically queries other tracked project notebooks and synthesizes the combined context. The agent gets both project-specific and cross-project knowledge in a single call.
- Cross-QA sync (`POST /api/cross-qa/{id}/sync`). After a cross-notebook query produces a synthesized answer, syncing pushes that answer back to each participating notebook as a NotebookLM source. Future single-notebook bootstrap queries on any of those projects will surface the cross-project insight without requiring a fresh cross-QA run. The cross-QA result evolves from a point-in-time answer into persistent organizational knowledge.
- Cross-project checkpoints (`cross_project: true` on webhook ingest). When an agent discovers something with cross-project implications — a shared library gotcha, an infrastructure quirk, a vendor API behavior — it submits the checkpoint with `cross_project: true`. On approval, the checkpoint is pushed to all tracked project notebooks (or a specified subset), not just the originating project. Every project benefits from the discovery.
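A minimal sketch of how an agent (or a script) might chain these mechanisms, using the endpoints from the API reference below; the notebook IDs, the checkpoint content, and the `id` field on the cross-QA result are illustrative.

```python
import httpx

ENGRAM_URL = "http://localhost:8000"

# 1. Ask one question across several tracked notebooks and synthesize the answers.
result = httpx.post(
    f"{ENGRAM_URL}/api/cross-qa",
    json={
        "question": "What staging-environment gotchas affect webhook handling?",
        "notebook_ids": ["<PAYMENTS_NB>", "<ORDERS_NB>", "<AUTH_NB>"],
        "synthesize": True,
    },
    timeout=300,
).json()

# 2. Push the synthesized answer back into each participating notebook as a source,
#    so future single-notebook bootstraps surface it without a fresh cross-QA run.
httpx.post(f"{ENGRAM_URL}/api/cross-qa/{result['id']}/sync", timeout=300)

# 3. Checkpoint a discovery that affects every project on the shared infrastructure.
httpx.post(
    f"{ENGRAM_URL}/api/webhook/ingest",
    headers={"X-API-Key": "<ENGRAM_WEBHOOK_KEY>"},
    json={
        "project_id": "<PROJECT_ID>",
        "agent_id": "claude-code",
        "action": "checkpoint",
        "payload": {
            "title": "2026-04-12 — Infrastructure — Staging NTP drift",
            "category": "debugging",
            "content": "Staging NTP drifts ~10s; affects any signature or expiry check.",
        },
        "cross_project": True,  # pushed to all tracked notebooks on approval
    },
    timeout=30,
)
```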
Add a CLAUDE.md to your project root so Claude Code automatically uses NotebookLM memory. The four skills define a session lifecycle: bootstrap → iterative research → work → checkpoint.
# CLAUDE.md
## NotebookLM Memory
- **Project ID**: <ENGRAM_PROJECT_ID> (from the dashboard)
- **Notebook ID**: <PASTE_ID_HERE>
- **Engram URL**: http://localhost:8000
### 1. Bootstrap — before starting work
Query the project notebook and automatically pull cross-project knowledge when relevant:
curl -s -X POST http://localhost:8000/api/bootstrap \
-H "Content-Type: application/json" \
-d '{
"project_id": "<PROJECT_ID>",
"question": "I am about to [TASK]. What architectural decisions, gotchas, conventions, and recent changes should I know?",
"auto_cross_qa": true
}'
The bootstrap endpoint queries your project's notebook and, when the question touches
shared concerns (infrastructure, dependencies, debugging patterns, conventions), automatically
queries other tracked project notebooks and synthesizes the combined context.
Save the response's `conversation_id` for follow-ups.
### 2. Iterative research — when one answer isn't enough
Use the same conversation to ask follow-ups that build on previous answers:
notebooklm ask "You mentioned [concept]. What are the specific implementation details?" --notebook <ID> -c <CONVERSATION_ID> --json
After each answer, check:
- Are there concepts mentioned but not explained?
- Do you have enough detail to implement, or are you guessing?
- Were edge cases or failure modes covered?
### 3. External research — when the notebook lacks the answer
If a follow-up reveals a knowledge gap the notebook can't fill:
notebooklm source add-research "<specific query>" --notebook <ID> --from web --mode fast
notebooklm research wait --notebook <ID> --import-all
Then ask again in the same conversation to get enriched answers.
### 4. Checkpoint — when you discover something non-obvious
Submit discoveries for human review via the webhook queue:
curl -X POST http://localhost:8000/api/webhook/ingest \
-H "Content-Type: application/json" \
-H "X-API-Key: ${ENGRAM_WEBHOOK_KEY}" \
-d '{
"project_id": "<PROJECT_ID>",
"agent_id": "claude-code",
"action": "checkpoint",
"payload": {
"title": "YYYY-MM-DD — Category — Topic",
"category": "debugging",
"content": "The non-obvious thing future sessions need to know..."
},
"cross_project": false
}'
Set `"cross_project": true` when the discovery involves shared infrastructure, a shared
library, or a vendor API behavior that affects multiple projects. On approval, the checkpoint
will be pushed to all tracked project notebooks — not just the originating project.
Optionally set `"target_notebook_ids": ["id1", "id2"]` to limit which notebooks receive it.
Or write directly to the notebook (bypasses review):
Write a short summary to /tmp/checkpoint.md, then:
notebooklm source add /tmp/checkpoint.md --notebook <ID> --title "YYYY-MM-DD — Category — Topic"

Four skills for the agent session lifecycle:
| Skill | Phase | What it does |
|---|---|---|
| Session Bootstrap | Session start | Query NotebookLM to recover project context, gotchas, decisions |
| Iterative Research | Before implementation | Multi-round follow-up questions with gap detection to get implementation-level detail |
| Research & Enrich | Knowledge gap | Use NotebookLM's research agent to import external sources |
| Session Checkpoint | During/after work | Write non-obvious discoveries back to NotebookLM |
Agents ask follow-up questions in the same conversation, building on each previous answer. The /api/notebooklm/ask endpoint accepts a conversationId parameter so each round has full context of what was already discussed. The skill includes gap detection patterns — checking after each answer for undefined references, missing implementation detail, uncovered edge cases, and cross-domain gaps — and defines when to branch into external research.
Round 1: "What auth patterns does this project use?"
→ JWT + refresh tokens + Redis blacklist
Round 2 (same conversation): "What's the Redis TTL strategy and failure mode?"
→ Specific TTL, fallback behavior, known edge cases
Round 3 (research branch): notebook doesn't know about clock skew
→ Import external sources, then ask again with enriched context
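A rough Python sketch of this loop against the `/api/notebooklm/ask` endpoint; the `conversationId` and `answer` response fields are assumptions for illustration.

```python
import httpx

ENGRAM_URL = "http://localhost:8000"
NOTEBOOK_ID = "<NOTEBOOK_ID>"

def ask(question: str, conversation_id: str | None = None) -> dict:
    """One round against the notebook; reusing the conversation keeps prior answers in scope."""
    payload = {"notebookId": NOTEBOOK_ID, "question": question}
    if conversation_id:
        payload["conversationId"] = conversation_id
    return httpx.post(f"{ENGRAM_URL}/api/notebooklm/ask", json=payload, timeout=120).json()

first = ask("What auth patterns does this project use?")
conv = first.get("conversationId")  # field name assumed for illustration

second = ask("What is the Redis TTL strategy, and what happens if Redis is unavailable?", conv)

# Gap check (simplified): if the answer still lacks implementation-level detail,
# branch into external research (/api/notebooklm/research), then re-ask in the same conversation.
if "TTL" not in second.get("answer", ""):
    print("Knowledge gap — trigger external research before implementing.")
```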
See all skill details in skills/notebooklm-session-bootstrap/SKILL.md, skills/notebooklm-iterative-research/SKILL.md, skills/notebooklm-research-enrich/SKILL.md, and skills/notebooklm-session-checkpoint/SKILL.md.
- `skills/youtube-search-transcript/SKILL.md` — Find videos, group by topic, capture transcripts
- `skills/youtube-topic-search/SKILL.md` — Lightweight search-and-rank
The home page shows all tracked projects as cards with at-a-glance metrics:
- Source count, session count, checkpoint count per project
- Freshness indicators for last checkpoint and last bootstrap (green/amber/red)
- Alert badges for stale sources or health issues
- Pending webhook count for checkpoints awaiting review
- Create new notebook directly from the dashboard, or track an existing NotebookLM notebook
- Live activity feed at the bottom showing real-time agent events via SSE
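The live activity feed is backed by `GET /api/activity/stream`; a minimal sketch of tailing it from a terminal (the event payload format is illustrative):

```python
import httpx

# Server-sent events arrive as "data: {...}" lines; print each event as it happens.
with httpx.stream("GET", "http://localhost:8000/api/activity/stream", timeout=None) as stream:
    for line in stream.iter_lines():
        if line.startswith("data:"):
            print(line.removeprefix("data:").strip())
```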
Drill into a single project to see:
- Session timeline — visual history of bootstrap queries and checkpoints
- Health alerts — stale sources, outdated checkpoints, missing bootstraps
- Activity log — timestamped events for this project
- Quick actions — run a health scan, dismiss alerts
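The quick actions map to the health endpoints in the API reference below; a small sketch of running them from a script (alert fields are illustrative):

```python
import httpx

ENGRAM_URL = "http://localhost:8000"
project_id = "<PROJECT_ID>"

# Run a scan, then list whatever alerts it raised.
httpx.post(f"{ENGRAM_URL}/api/health/alerts/scan", params={"project_id": project_id}, timeout=60)
alerts = httpx.get(f"{ENGRAM_URL}/api/health/alerts", params={"project_id": project_id}, timeout=30).json()
for alert in alerts:
    print(alert)
```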
The original NotebookLM RAG console with four panels:
| Panel | Purpose |
|---|---|
| YouTube Discovery | Search, fetch transcripts, auto-add to notebooks |
| Research Agent | Web/Drive research with fast/deep modes, auto-import |
| Q&A | Ask NotebookLM or OpenAI with inline citation tooltips |
| Notebook | Source management, Studio content generation, artifact download |
Citation tooltips show source title and excerpt on hover for [N] references. The enrichment pipeline recovers missing excerpts via keyword-based claim matching against source fulltext.
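The idea behind the enrichment step can be sketched in a few lines — this is an illustration of keyword-based claim matching, not the actual `references.py` implementation:

```python
import re

def recover_excerpt(claim: str, fulltext: str, context: int = 2) -> str:
    """Illustrative keyword-based claim matching: score each sentence of the source
    by keyword overlap with the cited claim and return the best match plus a little
    surrounding context."""
    keywords = {w for w in re.findall(r"[a-z0-9]+", claim.lower()) if len(w) > 3}
    sentences = re.split(r"(?<=[.!?])\s+", fulltext)
    if not sentences:
        return ""
    scores = [len(keywords & set(re.findall(r"[a-z0-9]+", s.lower()))) for s in sentences]
    best = max(range(len(sentences)), key=scores.__getitem__)
    return " ".join(sentences[max(0, best - 1): best + context])
```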
Ask a question across multiple notebooks simultaneously:
- Notebook selector split into Tracked Projects (green) and Other Notebooks (blue)
- Select all / clear for quick batch selection
- Concurrent queries — all selected notebooks are queried in parallel
- Per-notebook answer cards with citation tooltips and error handling
- Synthesized answer — OpenAI merges responses into a single coherent answer using human-readable notebook titles (not UUIDs)
- Sync to Notebooks — push a synthesized answer back to each participating notebook as a NotebookLM source, so future single-notebook bootstrap queries surface the cross-project insight
- Past queries saved to the vault and reloadable from history (synced status shown on history items)
Human-in-the-loop approval for agent-submitted checkpoints:
- Pending / Approved / Rejected filter tabs with live counts
- Approve or reject each checkpoint before it lands in NotebookLM
- Cross-project badge — items submitted with `cross_project: true` show a purple badge indicating they'll be pushed to all tracked project notebooks (or a specified subset) on approval
- Expandable payload view with raw JSON and content preview
- Real-time updates via SSE when new webhooks arrive
- Agent integration instructions with a ready-to-paste `curl` example for `CLAUDE.md`
All dashboard state is stored as markdown files with YAML frontmatter in vault/, organized by collection:
vault/
├── projects/ # One .md per tracked project
├── sessions/ # Session lifecycle records
├── checkpoints/ # Agent checkpoint snapshots
├── activity/ # Daily rolling activity logs (2026-04-12.md)
├── health/ # Health alert records
├── webhooks/ # Webhook queue items
├── cross-qa/ # Cross-notebook query results
└── bootstrap/ # Bootstrap query results (with cross-QA context)
Open vault/ as an Obsidian vault to browse, search, and edit all state with graph view, backlinks, and full-text search. Every file is human-readable and git-friendly.
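Because every record is plain markdown with YAML frontmatter, other tools can read the vault too. A small sketch using `pyyaml` (the file name and frontmatter fields are illustrative, not a schema guarantee):

```python
import yaml  # pyyaml, already in the backend dependencies

def read_vault_file(path: str) -> tuple[dict, str]:
    """Split an Obsidian-style note into (frontmatter dict, markdown body)."""
    raw = open(path, encoding="utf-8").read()
    _, frontmatter, body = raw.split("---", 2)  # "---\n<yaml>\n---\n<body>"
    return yaml.safe_load(frontmatter), body.strip()

meta, body = read_vault_file("vault/checkpoints/2026-04-12-stripe-idempotency.md")
print(meta.get("project_id"), meta.get("category"))  # field names illustrative
```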
                          ┌─────────────────────────────────────────────┐
                          │               React Frontend                │
                          │  ┌──────────┬─────────┬────────┬────────┐   │
User ── Browser ──────────│  │Dashboard │Workspace│Cross-QA│Webhooks│   │
                          │  └──────────┴─────────┴────────┴────────┘   │
                          │           React Router + Sidebar            │
                          └──────────────────┬──────────────────────────┘
                                             │ /api/*
                          ┌──────────────────▼──────────────────────────┐
                          │               FastAPI Backend               │
                          │ ┌────────────────────────────────────────┐  │
                          │ │ Routers: dashboard, sessions, activity,│  │
                          │ │ health, webhook, cross_qa, bootstrap   │  │
                          │ │ + main API                             │  │
                          │ └───────────────┬────────────────────────┘  │
                          │                 │                           │
                          │ ┌───────────────▼────────────────────────┐  │
                          │ │ models.py (Pydantic request/response)  │  │
                          │ └───────────────┬────────────────────────┘  │
                          │                 │                           │
                          │ ┌───────────────▼────────────────────────┐  │
                          │ │ vault.py (Obsidian-compatible storage) │  │
                          │ │ db.py (SQLite persistence layer)       │  │
                          │ └───────────────┬────────────────────────┘  │
                          │                 │                           │
                          │ ┌───────────────▼────────────────────────┐  │
                          │ │ references.py (citation pipeline)      │  │
                          │ └────────────────────────────────────────┘  │
                          └──────────┬──────────────┬───────────────────┘
                                     │              │
                              notebooklm CLI  OpenAI / YouTube APIs
                                     │
                           Google NotebookLM API
python -m venv env
source env/bin/activate
pip install -r backend/requirements.txt
uvicorn backend.main:app --reload --port 8000

cd frontend
npm install
npm run dev # Dev server on http://localhost:5173 with API proxy
npm run build   # Production bundle served by FastAPI at http://localhost:8000

Create .env at the project root:
OPENAI_API_KEY=sk-... # Required for synthesis in Cross-Q&A and OpenAI Q&A mode
OPENAI_MODEL=gpt-4o-mini # Optional, defaults to gpt-4o-mini
YOUTUBE_API_KEY=AIza... # Required for YouTube Discovery panel
The notebooklm CLI must be installed separately and on PATH (it is not included in requirements.txt):
pip install notebooklm-py
notebooklm auth login

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/api/dashboard/projects` | List all tracked projects |
| `POST` | `/api/dashboard/projects` | Track a project `{title, notebook_id}` |
| `GET` | `/api/dashboard/projects/:id` | Get project detail with computed metrics |
| `PUT` | `/api/dashboard/projects/:id` | Update project metadata |
| `DELETE` | `/api/dashboard/projects/:id` | Untrack a project |
| Method | Path | Purpose |
|---|---|---|
| `POST` | `/api/sessions` | Start a session `{project_id, bootstrap_query?}` |
| `PUT` | `/api/sessions/:id` | End a session `{bootstrap_answer?}` |
| `GET` | `/api/sessions?project_id=` | List sessions for a project |
| `GET` | `/api/sessions/:id/timeline` | Get session detail with checkpoints |
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/api/activity` | List recent activity events |
| `POST` | `/api/activity` | Create an activity event |
| `GET` | `/api/activity/stream` | SSE stream of live events |
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/api/health/alerts?project_id=` | List health alerts |
| `POST` | `/api/health/alerts/scan?project_id=` | Run a health scan on a project |
| `PUT` | `/api/health/alerts/:id/dismiss` | Dismiss an alert |
| `DELETE` | `/api/health/alerts/:id` | Delete an alert |
| Method | Path | Purpose |
|---|---|---|
| `POST` | `/api/bootstrap` | Query project notebook + auto-trigger cross-QA `{project_id, question, auto_cross_qa?}` |
| Method | Path | Purpose |
|---|---|---|
| `POST` | `/api/cross-qa` | Query multiple notebooks `{question, notebook_ids, notebook_titles?, synthesize?, sync_to_notebooks?}` |
| `GET` | `/api/cross-qa` | List past cross-notebook queries |
| `GET` | `/api/cross-qa/:id` | Get a single query result |
| `POST` | `/api/cross-qa/:id/sync` | Push synthesized answer back to each participating notebook as a source |
| Method | Path | Purpose |
|---|---|---|
| `POST` | `/api/webhook/ingest` | Agent submits a checkpoint for review `{project_id, agent_id?, action, payload, cross_project?, target_notebook_ids?}` |
| `GET` | `/api/webhook/queue?status=` | List webhook items by status |
| `PUT` | `/api/webhook/queue/:id/approve` | Approve a pending item (cross-project items push to all tracked notebooks) |
| `PUT` | `/api/webhook/queue/:id/reject` | Reject a pending item |
| `GET` | `/api/webhook/stream` | SSE stream of new webhook items |
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/api/notebooklm/notebooks` | List all NotebookLM notebooks |
| `POST` | `/api/notebooklm/notebooks` | Create a notebook `{title}` |
| `GET` | `/api/notebooklm/notebooks/:id/sources` | List sources for a notebook |
| `POST` | `/api/notebooklm/sources` | Add a source `{notebookId, url, title?}` |
| `POST` | `/api/notebooklm/ask` | Ask a question `{notebookId, question, sourceIds?, conversationId?}` |
| `POST` | `/api/openai/ask` | OpenAI agent with NotebookLM grounding and conversation continuity |
| `POST` | `/api/notebooklm/research` | Start a research task |
| `GET` | `/api/notebooklm/research/status` | Poll research progress `?notebookId=` |
| `POST` | `/api/notebooklm/research/wait` | Wait for research completion and optionally import results |
| `POST` | `/api/notebooklm/generate` | Generate a Studio artifact |
| `GET` | `/api/notebooklm/artifacts` | List artifacts for a notebook |
| `GET` | `/api/notebooklm/artifacts/:id` | Get a single artifact |
| `POST` | `/api/notebooklm/artifacts/:id/wait` | Poll until artifact generation completes |
| `POST` | `/api/notebooklm/download` | Download an artifact |
| `GET` | `/api/notebooklm/download/file` | Serve a downloaded artifact file |
| Method | Path | Purpose |
|---|---|---|
| `POST` | `/api/youtube/search` | Search YouTube `{query, limit, skill}` |
| `POST` | `/api/youtube/transcript` | Fetch transcript `{videoId, notebookId?, skill}` |
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/healthz` | Health check |
- CLI: `notebooklm-py` (v0.3.4+) — must be installed separately and on PATH
- Backend: FastAPI, uvicorn, python-dotenv, pyyaml, openai, httpx, aiosqlite, google-api-python-client, google-auth-oauthlib, youtube-transcript-api
- Frontend: React 18, React Router, Vite, Tailwind CSS, classnames, react-markdown, recharts, remark-gfm
- External APIs: Google NotebookLM (via CLI), OpenAI, YouTube Data API v3