ngstcf/engram

Engram — Agent Memory Dashboard for NotebookLM

A FastAPI + React dashboard that turns NotebookLM into persistent, structured memory for AI coding agents. Track multiple projects, review agent checkpoints, query across notebooks, and browse all state as Obsidian-compatible markdown.

Blog post: The Missing Layer Between AI Coding Agents and Institutional Knowledge

The Problem: AI Coding Agents Are Amnesiac

AI coding agents like Claude Code, Codex, Kilo Code, and OpenCode start every session from scratch. They re-read files, re-discover architecture, and re-learn conventions. Hard-won debugging insights, design rationale, dependency gotchas, and domain context evaporate when a session ends.

NotebookLM as persistent memory solves this. Instead of starting cold, the agent queries a curated notebook for project context, team conventions, and past discoveries — then writes back what it learns during the session so the next one starts smarter.

Session 1                          Session 2                          Session 3
┌───────────┐                      ┌───────────┐                       ┌───────────┐
│ Agent     │──checkpoint──┐       │ Agent     │──checkpoint──┐        │ Agent     │
│ discovers │              │       │ starts    │              │        │ starts    │
│ gotcha X  │              ▼       │ knowing X │              ▼        │ knowing   │
│           │        ┌──────────┐  │ discovers │         ┌──────────┐  │ X, Y, and │
└───────────┘        │NotebookLM│  │ gotcha Y  │         │NotebookLM│  │ external  │
                     │ Project  │  └───────────┘         │ Project  │  │ research  │
                     │ Memory   │                        │ Memory   │  └───────────┘
                     └──────────┘                        └──────────┘
                          ▲ bootstrap                        ▲ bootstrap
                          │                                  │
                     Session 2 starts here              Session 3 starts here

Why This Matters for Agentic Coding

The hidden cost of stateless agents

AI coding agents are powerful but amnesiac. Every session starts from zero — the agent re-reads files, re-discovers architecture, re-learns conventions. This isn't just inefficient; it's compounding waste. Each session that discovers something non-obvious and then discards it forces the next session to rediscover it, or worse, to make assumptions and get it wrong.

The real cost isn't the wasted tokens or the extra minutes. It's the decisions the agent makes without context it should have had:

  • It proposes an approach you rejected last week for reasons that aren't in the code
  • It upgrades a dependency that was pinned for a reason no comment explains
  • It re-investigates a bug that was already root-caused in a previous session
  • It doesn't know that the staging environment has a 30-second query timeout that will break the migration it's writing

These aren't edge cases. They're the normal experience of using coding agents on real projects over time. The agent is smart enough to do the work — it just doesn't remember what it learned yesterday.

Why not just use CLAUDE.md or context files?

Flat files work for static facts — coding conventions, build commands, directory layout. But they break down for accumulated knowledge:

| Approach | Works for | Breaks when |
| --- | --- | --- |
| CLAUDE.md | Static conventions, build commands | Knowledge grows beyond what fits in a prompt; no retrieval, so the agent reads everything every time |
| .context/ folders | Per-directory notes | No search across files; the agent must know which file to read; no citation or sourcing |
| Vector DB / RAG | Semantic search over documents | Requires infrastructure; no built-in grounding or source verification; cold-start problem |
| NotebookLM | Grounded retrieval with citations over curated sources | Needs a management layer (that's what Engram provides) |

NotebookLM gives you retrieval-augmented generation with source grounding and citations — out of the box, no infrastructure. Every answer traces back to specific sources the agent (or you) added. When the agent says "the payment service returns 202 for async operations," the citation tells you where that claim comes from, so you can judge whether to trust it.

Offloading work to Google

There's a less obvious but significant advantage: NotebookLM offloads both storage and analysis to Google's infrastructure. The coding agent doesn't process, index, or store your project knowledge — Google does. This changes the economics of agent memory:

Token savings. Without external memory, the agent's only option is to stuff context into its own prompt — pasting CLAUDE.md files, reading documentation, re-analyzing code on every session. That context costs tokens. With NotebookLM, the agent sends a short natural-language query and gets back a focused, grounded answer. A bootstrap query that returns a 200-word answer replaces what might otherwise be 5,000+ tokens of raw context files loaded into every session.
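A back-of-envelope sketch makes the economics concrete. The token counts below are illustrative assumptions, not measurements; real numbers depend on your context files and model.

```python
# Rough per-session context cost, with and without external memory.
# All three constants are illustrative assumptions.
CONTEXT_FILE_TOKENS = 5_000    # CLAUDE.md + docs pasted into every session
BOOTSTRAP_QUERY_TOKENS = 60    # short natural-language question
BOOTSTRAP_ANSWER_TOKENS = 300  # focused, grounded answer (~200 words)

def tokens_per_session(use_notebook: bool) -> int:
    """Tokens spent just on loading project context, per session."""
    if use_notebook:
        return BOOTSTRAP_QUERY_TOKENS + BOOTSTRAP_ANSWER_TOKENS
    return CONTEXT_FILE_TOKENS

sessions = 50
saved = sessions * (tokens_per_session(False) - tokens_per_session(True))
print(saved)  # 232000 tokens saved over 50 sessions under these assumptions
```

The absolute numbers matter less than the shape: context-file cost is paid in full every session, while the query-and-answer cost stays flat as the knowledge base grows.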

Analysis happens outside the agent's context window. When you add a 50-page design doc or a long Slack thread as a NotebookLM source, Google indexes and chunks it. The agent never sees the raw document — it sees NotebookLM's synthesized answer with citations. The heavy lifting of reading, understanding, and retrieving from large documents happens on Google's side, not inside your Claude or GPT session.

Research doesn't burn agent tokens. When the agent triggers external research (web search, Drive search), NotebookLM's research agent does the crawling, evaluating, and importing. The coding agent fires a single API call and gets the results. Compare this to an agent that tries to research by reading web pages directly — each page consumes context window, and the agent must decide what's relevant in real-time.

Storage is free and unlimited. NotebookLM notebooks have no meaningful source limit for this use case. You're not paying per-vector or per-document like you would with Pinecone, Weaviate, or any hosted vector DB. The knowledge base grows over time at zero marginal cost.

The net effect: the coding agent stays focused on what it's good at — reading your code, reasoning about changes, writing implementations — while Google handles the memory infrastructure. Each agent session is shorter, uses fewer tokens, and starts with better context.

Example: a real project over three sessions

Consider a team building a payments service. Here's what happens across sessions with Engram:

Session 1 — Initial implementation

The agent is asked to implement Stripe webhook handling. It bootstraps from the notebook but finds nothing — the notebook is empty. It writes the webhook handler, discovers that Stripe's signature verification has a 300-second clock tolerance, and that the staging environment's NTP sync drifts by up to 10 seconds. Before the session ends, it submits two checkpoints to the webhook queue:

"Stripe webhook signature verification tolerates 300s clock skew by default. Staging NTP drifts ~10s. This hasn't caused issues yet but will if we tighten the tolerance window. See Stripe docs section on webhook-signatures."

"The webhook handler parses event types from event.type and routes to per-type handler functions."

You approve the first — it's a genuine operational insight that isn't in any config file or code comment. You reject the second — that's just how the code works, and any future agent can see it by reading the handler. The queue keeps the notebook curated.

Session 2 — Adding idempotency (two weeks later)

A different agent session is asked to add idempotency keys to payment retries. It bootstraps:

"I'm about to add idempotency to payment retries. What should I know?"

NotebookLM returns the architecture source you seeded (which describes the event-driven payment flow and its at-least-once delivery guarantee). The answer covers the general retry design but lacks specifics. The agent asks a follow-up in the same conversation:

"What failure modes should I expect when Stripe's internal retry overlaps with our retry logic? Are there known race conditions?"

NotebookLM has a partial answer from the architecture doc — the system is designed to tolerate duplicate deliveries — but doesn't know about Stripe's specific idempotency key behavior. The agent triggers external research, importing Stripe's idempotency key documentation. Back in the same conversation with the enriched context:

"The imported docs mention idempotency keys expire after 24 hours. Does our retry window ever exceed that?"

NotebookLM connects this to the architecture source: the retry policy allows up to 72 hours for failed charge attempts. The agent now knows there's a real gap — retries after 24 hours would generate new charges instead of being deduplicated. It implements with key-rotation logic and checkpoints:

"Stripe idempotency keys expire after 24h but our retry window extends to 72h. Retries after key expiration silently create duplicate charges. Added key-rotation with generation tracking. See Stripe docs idempotency-keys#expiration."

Session 3 — Debugging a production issue (a month later)

A webhook starts failing intermittently in staging. A new agent session bootstraps and immediately gets the clock skew insight from Session 1, the idempotency-expiration gap from Session 2, and the Stripe docs imported during research. The agent could work backwards from git blame on the tolerance config, but it doesn't have to — the Session 1 checkpoint points directly at NTP drift as the likely cause. It checks the tolerance window first and finds that someone tightened it to 30 seconds without knowing about the staging drift.

Three sessions. Each one started smarter than the last. The third session solved in minutes what would have been a longer investigation — not because the answer was unfindable without memory, but because memory told it exactly where to look first.

The session lifecycle

Engram defines four skills that turn each coding session into a learning loop:

┌─────────────────────────────────────────────────────────────────────┐
│                        Agent Session                                │
│                                                                     │
│  1. Bootstrap          2. Iterative Research     3. Work            │
│  ┌──────────────┐      ┌──────────────────┐      ┌──────────────┐   │
│  │ Query the    │─────▶│ Follow-up Qs in  │─────▶│ Write code   │   │
│  │ notebook:    │      │ same conversation│      │ with full    │   │
│  │ "What should │      │ until you have   │      │ context      │   │
│  │  I know?"    │      │ implementation   │      │              │   │
│  └──────────────┘      │ detail           │      └──────┬───────┘   │
│                        │                  │             │           │
│                        │ If notebook      │      4. Checkpoint      │
│                        │ lacks answer:    │      ┌──────▼───────┐   │
│                        │ → External       │      │ Write back   │   │
│                        │   research       │      │ discoveries  │   │
│                        └──────────────────┘      │ for next     │   │
│                                                  │ session      │   │
│                                                  └──────────────┘   │
└─────────────────────────────────────────────────────────────────────┘

Bootstrap recovers context so the agent doesn't start cold. Iterative research digs deeper when the first answer isn't enough — follow-up questions in the same conversation, with gap detection for undefined references, missing implementation details, and uncovered edge cases. External research imports knowledge the codebase doesn't contain — docs, articles, internal specs. Checkpoints capture what the agent learned (debugging dead-ends, implicit coupling, performance constraints) so the next session starts smarter.

What belongs in memory vs. what doesn't

The test is simple: if a fresh git clone would give the agent the information, it doesn't belong in memory. The agent can read your source code, your config files, your tests. Duplicating that into NotebookLM just adds noise and makes bootstrap answers longer without making them smarter.

Memory is for the knowledge that lives between the lines of code — the things you'd tell a new teammate over coffee that they'd never figure out from reading the repo:

| The agent can get this from the code | The agent needs this from memory |
| --- | --- |
| You use event sourcing | You chose event sourcing because legal required full audit trails — CRUD was explicitly rejected |
| pg is pinned at v8.11 | v8.12 broke connection pooling with your PgBouncer config — three engineers spent a day on it |
| There's a 30s timeout in the retry logic | That timeout exists because the payment provider's API sometimes hangs for 20s under load, and 30s was chosen after a production incident |
| The auth test is flaky | The flakiness is timezone-dependent, not a race condition — two people already wasted time investigating concurrency |
| The /api/search endpoint exists | It's on the homepage critical path with a 200ms p99 budget — adding a database query requires a feature flag |
| There's a notification service | It imports billing.models.InvoiceLineItem directly — there's no test that catches schema changes breaking notifications |

The left column is what grep and cat give you. The right column is what only humans know — the why, the what went wrong, the don't touch this because. That's what makes the difference between an agent that writes correct code and an agent that writes code that survives contact with production.

Getting this wrong in either direction hurts. Put too much in memory (code structure, file listings, API signatures) and bootstrap answers become bloated — the agent spends tokens reading things it could have read from the repo. Put too little in memory and you're back to the amnesia problem. The sweet spot is knowledge that's high-value, non-obvious, and impossible to derive from the codebase alone.
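The "fresh git clone" test can even be approximated mechanically. This is a toy heuristic, not part of Engram; the keyword lists are invented for illustration, and a human reviewer in the webhook queue remains the real filter.

```python
# Toy sketch of the "fresh git clone" test as a checkpoint filter.
# Keyword lists are illustrative assumptions, not Engram behavior.
DERIVABLE_HINTS = ("file listing", "function signature", "directory layout",
                   "the code does", "routes to", "imports")
MEMORY_HINTS = ("because", "rejected", "incident", "gotcha", "don't",
                "wasted", "pinned", "broke")

def belongs_in_memory(checkpoint: str) -> bool:
    """True if the note likely carries 'why' knowledge a clone can't reveal."""
    text = checkpoint.lower()
    derivable = any(h in text for h in DERIVABLE_HINTS)
    rationale = any(h in text for h in MEMORY_HINTS)
    return rationale and not derivable

belongs_in_memory("v8.12 broke pooling; pinned after a day-long incident")  # True
belongs_in_memory("The handler routes to per-type functions")               # False
```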

Human-in-the-loop

Agents make mistakes. They may checkpoint something inaccurate, irrelevant, or obvious. The webhook queue lets you review every checkpoint before it enters the knowledge base — approve insights that are genuinely useful, reject noise. This keeps the notebook curated rather than filled with session dumps, and means you can trust the bootstrap answers that future sessions receive.


Use Cases

The rationale above focuses on agent memory — the core use case. But NotebookLM as a coding knowledge layer enables several other workflows:

Zero-hallucination technical research

General LLMs hallucinate function names and invent API parameters — especially for new or rapidly changing libraries with limited training data. Upload official docs for a new SDK or internal API and the coding agent gets citation-backed answers instead of plausible fiction. When adopting unfamiliar tools, the agent queries NotebookLM for correct function signatures and parameter names rather than guessing from outdated training data.

Wired into the coding agent as a skill or tool, this becomes seamless: the agent queries NotebookLM mid-session — "What parameters does this endpoint accept?" — and gets a source-grounded answer without the developer leaving their editor.

Architecture decision records

One high-value memory pattern: storing ADRs as notebook sources. ADRs capture what was rejected and why — the knowledge most likely to be invisible in code. Without them, the agent re-proposes approaches your team already discarded for reasons no grep will reveal (legal constraints, failed performance tests, vendor limitations). With them, every bootstrap query includes the decision history before the agent writes a single line.

Codebase comprehension for vibe-coding

"Vibe-coding" — generating large amounts of code quickly with AI — creates a specific problem: the codebase grows faster than the developer's mental model. You end up with a working app but can't explain how half the files connect.

NotebookLM closes the gap: have the agent generate an architecture doc describing the project's logic, upload it as a source, then query it to trace file interactions and data flow. Tools like uithub can convert an entire GitHub repository into a single markdown document for upload, enabling architecture summaries and code review prep on unfamiliar codebases.

Automated technical documentation

NotebookLM's Studio generation turns project knowledge into deliverables without manual writing:

  • Technical manuals — Ingest a repository's sources and generate structured documentation in a single pass. A medium-sized repo can produce a comprehensive technical manual from one generation.
  • Technical editing — Feed a draft user guide back into NotebookLM and instruct it to cross-reference against original sources, catching inaccuracies, outdated references, and gaps.
  • Stakeholder reporting — Generate slide decks, audio overviews, or infographics that explain technical decisions and project status to non-technical audiences. The Audio Overview feature turns notebook contents into a conversational podcast-style summary.

These deliverables are accessible through the Workspace's Notebook panel (Studio content generation and artifact download).

Enforcing project standards

Notebooks can encode active project rules — deprecated modules, naming conventions, where specific logic must live. Unlike a flat CLAUDE.md with 200 rules that the agent reads in full every session, a notebook lets the agent query "what rules apply to payment handler modules?" and retrieve only the relevant subset with source citations. For large projects, this targeted retrieval scales better than static context files.
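The difference between the two approaches can be sketched in a few lines. The rule set and tagging scheme below are invented for the example; the point is that a query returns the relevant subset instead of the whole file.

```python
# Illustrative contrast: targeted rule retrieval vs. a flat 200-rule file.
# The rules and tags here are made up for the example.
RULES = [
    {"tags": {"payments"}, "rule": "Payment handlers must be idempotent."},
    {"tags": {"payments"}, "rule": "Never log raw card data."},
    {"tags": {"frontend"}, "rule": "New components live under src/ui/."},
    {"tags": {"testing"},  "rule": "Never mock the database."},
]

def rules_for(topic: str) -> list[str]:
    """Return only the rules tagged for the module being worked on."""
    return [r["rule"] for r in RULES if topic in r["tags"]]

print(rules_for("payments"))  # only the two payment rules, not all four
```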


Memory Evolution: From Project Memory to Organizational Knowledge

Per-project memory — the bootstrap/checkpoint loop described above — solves the amnesia problem for individual projects. But real engineering organizations don't have isolated projects. They have systems that interact, infrastructure that's shared, and conventions that span teams. A debugging insight from the payments service is relevant when the orders service hits the same symptom. The auth team's session-token gotchas matter when the mobile team implements token refresh.

Cross-notebook Q&A addresses this by querying multiple project notebooks in parallel. But a manual cross-notebook query is a point-in-time action — the insight it surfaces doesn't flow back into the individual project notebooks. The next session on the orders service still starts cold unless someone remembers to run the same cross-QA query again.

Memory evolution closes this loop. The term comes from A-MEM (Xu et al., 2025), an agentic memory system where integrating a new memory triggers updates to existing memories' context and links. The same principle applies here: a cross-QA result that synthesizes knowledge from three projects should become a source that enriches each of those three project notebooks, so future single-notebook bootstrap queries surface the cross-project insight without requiring a fresh cross-QA run.

Research supports this. Agent KB (2025) demonstrated that cross-domain knowledge reuse improved SWE-bench code repair resolution rates by 12 percentage points through a Reason-Retrieve-Refine pipeline. The Anthropic 2026 Agentic Coding Trends Report identifies memory as a top concern for production agentic workflows. And the growing demand for multi-agent coordination across codebases — where the developer becomes "a slow, error-prone message broker between two AI agents" — makes organizational memory infrastructure a prerequisite for scaling agentic coding beyond single repos.

When cross-project knowledge is needed

There are seven trigger scenarios, mapped to where they occur in the agent session lifecycle:

At bootstrap (session start)

  1. Shared infrastructure & environment knowledge. Every project deployed to the same staging/prod environment shares gotchas that no single repo contains — NTP drift, database timeouts, load balancer quirks. Trigger: agent starts a session on any project that touches shared infra.

  2. Organizational conventions that span projects. PR conventions, testing philosophy ("never mock the database"), review expectations. These aren't per-project — they're team-level. Trigger: every session start.

Pre-implementation

  3. Cross-service interface awareness. When an agent on Service A changes an API contract, the agent on Service B doesn't know. Trigger: agent is about to modify or consume an API endpoint, event schema, or shared data model.

  4. Pattern transfer — reusing proven solutions. When an agent encounters a retry pattern, caching strategy, or auth flow, it retrieves how other projects already solved the same problem — including what didn't work. Trigger: agent is implementing a common pattern that another project has already solved.

During debugging

  5. Cross-cutting root cause knowledge. A root cause discovered in one project (clock skew in staging, a library bug, a vendor API quirk) applies to every project in the same environment. Trigger: agent is debugging a symptom that could have an infrastructure-level or dependency-level root cause.

Pre-migration / pre-upgrade

  6. Learning from other projects' migrations. When one team already migrated from Express to Fastify, their pitfalls are invaluable when the next team attempts the same migration. Trigger: agent is about to do a migration or major upgrade that another project has already attempted.

At checkpoint time (proactive push)

  7. Detecting cross-project relevance at write time. When an agent discovers something about shared infrastructure, a shared library, or a vendor API, the system should push that insight to every affected project's notebook — not just the one that discovered it. Trigger: agent checkpoints a discovery that involves shared dependencies, infrastructure, or organizational processes.

How Engram implements memory evolution

These scenarios cluster into two retrieval patterns:

| Pattern | Scenarios | Implementation |
| --- | --- | --- |
| On-demand — relevant only when the task overlaps | 3 (interfaces), 4 (patterns), 6 (migrations) | Auto-triggered cross-QA at bootstrap when keyword triggers match the question |
| Broadcast — every project should see it | 1 (infra), 2 (conventions), 5 (cross-cutting bugs) | Cross-QA sync pushes synthesized answers back to each notebook as a source; cross-project checkpoints are pushed to all tracked notebooks on approval |

Three mechanisms work together:

  1. Bootstrap with auto-trigger (POST /api/bootstrap). When an agent starts a session, the bootstrap endpoint queries the project's own notebook and detects whether the question touches cross-project concerns (infrastructure, shared dependencies, patterns, debugging, conventions). If triggers match, it automatically queries other tracked project notebooks and synthesizes the combined context. The agent gets both project-specific and cross-project knowledge in a single call.

  2. Cross-QA sync (POST /api/cross-qa/{id}/sync). After a cross-notebook query produces a synthesized answer, syncing pushes that answer back to each participating notebook as a NotebookLM source. Future single-notebook bootstrap queries on any of those projects will surface the cross-project insight without requiring a fresh cross-QA run. The cross-QA result evolves from a point-in-time answer into persistent organizational knowledge.

  3. Cross-project checkpoints (cross_project: true on webhook ingest). When an agent discovers something with cross-project implications — a shared library gotcha, an infrastructure quirk, a vendor API behavior — it submits the checkpoint with cross_project: true. On approval, the checkpoint is pushed to all tracked project notebooks (or a specified subset), not just the originating project. Every project benefits from the discovery.


Setting Up CLAUDE.md for Any Project

Add a CLAUDE.md to your project root so Claude Code automatically uses NotebookLM memory. The four skills define a session lifecycle: bootstrap → iterative research → work → checkpoint.

# CLAUDE.md

## NotebookLM Memory

- **Project ID**: <ENGRAM_PROJECT_ID>  (from the dashboard)
- **Notebook ID**: <PASTE_ID_HERE>
- **Engram URL**: http://localhost:8000

### 1. Bootstrap — before starting work
Query the project notebook and automatically pull cross-project knowledge when relevant:

curl -s -X POST http://localhost:8000/api/bootstrap \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "<PROJECT_ID>",
    "question": "I am about to [TASK]. What architectural decisions, gotchas, conventions, and recent changes should I know?",
    "auto_cross_qa": true
  }'

The bootstrap endpoint queries your project's notebook and, when the question touches
shared concerns (infrastructure, dependencies, debugging patterns, conventions), automatically
queries other tracked project notebooks and synthesizes the combined context.

Save the response's `conversation_id` for follow-ups.

### 2. Iterative research — when one answer isn't enough
Use the same conversation to ask follow-ups that build on previous answers:

notebooklm ask "You mentioned [concept]. What are the specific implementation details?" --notebook <ID> -c <CONVERSATION_ID> --json

After each answer, check:
- Are there concepts mentioned but not explained?
- Do you have enough detail to implement, or are you guessing?
- Were edge cases or failure modes covered?

### 3. External research — when the notebook lacks the answer
If a follow-up reveals a knowledge gap the notebook can't fill:

notebooklm source add-research "<specific query>" --notebook <ID> --from web --mode fast
notebooklm research wait --notebook <ID> --import-all

Then ask again in the same conversation to get enriched answers.

### 4. Checkpoint — when you discover something non-obvious
Submit discoveries for human review via the webhook queue:

curl -X POST http://localhost:8000/api/webhook/ingest \
  -H "Content-Type: application/json" \
  -H "X-API-Key: ${ENGRAM_WEBHOOK_KEY}" \
  -d '{
    "project_id": "<PROJECT_ID>",
    "agent_id": "claude-code",
    "action": "checkpoint",
    "payload": {
      "title": "YYYY-MM-DD — Category — Topic",
      "category": "debugging",
      "content": "The non-obvious thing future sessions need to know..."
    },
    "cross_project": false
  }'

Set `"cross_project": true` when the discovery involves shared infrastructure, a shared
library, or a vendor API behavior that affects multiple projects. On approval, the checkpoint
will be pushed to all tracked project notebooks — not just the originating project.
Optionally set `"target_notebook_ids": ["id1", "id2"]` to limit which notebooks receive it.

Or write directly to the notebook (bypasses review):

Write a short summary to /tmp/checkpoint.md, then:
notebooklm source add /tmp/checkpoint.md --notebook <ID> --title "YYYY-MM-DD — Category — Topic"
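The bootstrap call in the template above can also be scripted rather than pasted as curl. A minimal Python sketch using only the standard library: the field names mirror the curl example, but treat the response shape (answer, conversation_id) as an assumption to verify against your deployment.

```python
import json
from urllib import request

def build_bootstrap_payload(project_id: str, task: str) -> dict:
    """Mirror the curl example: ask what the agent should know before TASK."""
    return {
        "project_id": project_id,
        "question": (f"I am about to {task}. What architectural decisions, "
                     "gotchas, conventions, and recent changes should I know?"),
        "auto_cross_qa": True,
    }

def bootstrap(project_id: str, task: str,
              base_url: str = "http://localhost:8000") -> dict:
    payload = build_bootstrap_payload(project_id, task)
    req = request.Request(f"{base_url}/api/bootstrap",
                          data=json.dumps(payload).encode("utf-8"),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        body = json.load(resp)
    # Keep conversation_id so follow-up questions share the same context.
    return {"answer": body.get("answer"),
            "conversation_id": body.get("conversation_id")}
```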

Skill Catalog

NotebookLM Memory Skills

Four skills for the agent session lifecycle:

| Skill | Phase | What it does |
| --- | --- | --- |
| Session Bootstrap | Session start | Query NotebookLM to recover project context, gotchas, decisions |
| Iterative Research | Before implementation | Multi-round follow-up questions with gap detection to get implementation-level detail |
| Research & Enrich | Knowledge gap | Use NotebookLM's research agent to import external sources |
| Session Checkpoint | During/after work | Write non-obvious discoveries back to NotebookLM |

Iterative Research (skills/notebooklm-iterative-research/SKILL.md)

Agents ask follow-up questions in the same conversation, building on each previous answer. The /api/notebooklm/ask endpoint accepts a conversationId parameter so each round has full context of what was already discussed. The skill includes gap detection patterns — checking after each answer for undefined references, missing implementation detail, uncovered edge cases, and cross-domain gaps — and defines when to branch into external research.

Round 1: "What auth patterns does this project use?"
  → JWT + refresh tokens + Redis blacklist

Round 2 (same conversation): "What's the Redis TTL strategy and failure mode?"
  → Specific TTL, fallback behavior, known edge cases

Round 3 (research branch): notebook doesn't know about clock skew
  → Import external sources, then ask again with enriched context
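The "check after each answer" step lends itself to a simple sketch. The heuristics below are invented for illustration and are far cruder than what the skill prompt describes; the point is that each answer is tested for reasons to ask a follow-up.

```python
# Toy sketch of a gap-detection pass over an answer. Heuristics are
# illustrative assumptions, not the skill's actual logic.
def detect_gaps(answer: str) -> list[str]:
    gaps = []
    text = answer.lower()
    if "see " in text or "mentioned" in text:
        gaps.append("undefined reference — ask what it refers to")
    if not any(w in text for w in ("ttl", "timeout", "config", "parameter")):
        gaps.append("no implementation detail — ask for specifics")
    if not any(w in text for w in ("fail", "edge case", "fallback")):
        gaps.append("failure modes not covered — ask what breaks")
    return gaps

detect_gaps("Auth uses JWT plus refresh tokens with a Redis blacklist.")
# flags missing implementation detail and missing failure modes
```

An answer that triggers no flags ends the loop; each flag suggests the next follow-up question in the same conversation.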

See all skill details in skills/notebooklm-session-bootstrap/SKILL.md, skills/notebooklm-iterative-research/SKILL.md, skills/notebooklm-research-enrich/SKILL.md, and skills/notebooklm-session-checkpoint/SKILL.md.

YouTube Discovery Skills

  • skills/youtube-search-transcript/SKILL.md — Find videos, group by topic, capture transcripts
  • skills/youtube-topic-search/SKILL.md — Lightweight search-and-rank

Dashboard Features

Multi-Project Dashboard (/dashboard)

The home page shows all tracked projects as cards with at-a-glance metrics:

  • Source count, session count, checkpoint count per project
  • Freshness indicators for last checkpoint and last bootstrap (green/amber/red)
  • Alert badges for stale sources or health issues
  • Pending webhook count for checkpoints awaiting review
  • Create new notebook directly from the dashboard, or track an existing NotebookLM notebook
  • Live activity feed at the bottom showing real-time agent events via SSE

Project Detail (/project/:id)

Drill into a single project to see:

  • Session timeline — visual history of bootstrap queries and checkpoints
  • Health alerts — stale sources, outdated checkpoints, missing bootstraps
  • Activity log — timestamped events for this project
  • Quick actions — run a health scan, dismiss alerts

Workspace (/workspace)

The original NotebookLM RAG console with four panels:

| Panel | Purpose |
| --- | --- |
| YouTube Discovery | Search, fetch transcripts, auto-add to notebooks |
| Research Agent | Web/Drive research with fast/deep modes, auto-import |
| Q&A | Ask NotebookLM or OpenAI with inline citation tooltips |
| Notebook | Source management, Studio content generation, artifact download |

Citation tooltips show source title and excerpt on hover for [N] references. The enrichment pipeline recovers missing excerpts via keyword-based claim matching against source fulltext.
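Keyword-based claim matching can be sketched as scoring each sentence of the source fulltext by word overlap with the cited claim. This is an illustrative simplification; the actual enrichment pipeline may normalize, stem, or weight terms differently.

```python
# Illustrative sketch: recover a missing excerpt by picking the fulltext
# sentence with the largest keyword overlap with the cited claim.
def recover_excerpt(claim: str, fulltext: str) -> str:
    keywords = {w for w in claim.lower().split() if len(w) > 3}
    sentences = [s.strip() for s in fulltext.split(".") if s.strip()]
    def score(sentence: str) -> int:
        return len(keywords & set(sentence.lower().split()))
    return max(sentences, key=score)

doc = ("The service is stateless. Stripe idempotency keys expire after 24 "
       "hours. Retries are scheduled hourly.")
recover_excerpt("idempotency keys expire", doc)
# picks the sentence about key expiration
```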

Cross-Notebook Q&A (/cross-qa)

Ask a question across multiple notebooks simultaneously:

  • Notebook selector split into Tracked Projects (green) and Other Notebooks (blue)
  • Select all / clear for quick batch selection
  • Concurrent queries — all selected notebooks are queried in parallel
  • Per-notebook answer cards with citation tooltips and error handling
  • Synthesized answer — OpenAI merges responses into a single coherent answer using human-readable notebook titles (not UUIDs)
  • Sync to Notebooks — push a synthesized answer back to each participating notebook as a NotebookLM source, so future single-notebook bootstrap queries surface the cross-project insight
  • Past queries saved to the vault and reloadable from history (synced status shown on history items)

Webhook Queue (/webhooks)

Human-in-the-loop approval for agent-submitted checkpoints:

  • Pending / Approved / Rejected filter tabs with live counts
  • Approve or reject each checkpoint before it lands in NotebookLM
  • Cross-project badge — items submitted with cross_project: true show a purple badge indicating they'll be pushed to all tracked project notebooks (or a specified subset) on approval
  • Expandable payload view with raw JSON and content preview
  • Real-time updates via SSE when new webhooks arrive
  • Agent integration instructions with a ready-to-paste curl example for CLAUDE.md

Storage: Obsidian Vault

All dashboard state is stored as markdown files with YAML frontmatter in vault/, organized by collection:

vault/
├── projects/        # One .md per tracked project
├── sessions/        # Session lifecycle records
├── checkpoints/     # Agent checkpoint snapshots
├── activity/        # Daily rolling activity logs (2026-04-12.md)
├── health/          # Health alert records
├── webhooks/        # Webhook queue items
├── cross-qa/        # Cross-notebook query results
└── bootstrap/       # Bootstrap query results (with cross-QA context)

Open vault/ as an Obsidian vault to browse, search, and edit all state with graph view, backlinks, and full-text search. Every file is human-readable and git-friendly.
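
Each vault file pairs YAML frontmatter with a markdown body. A stdlib-only sketch of splitting one apart, with an invented checkpoint file as input (the real backend has pyyaml available and actual field names may differ):

```python
# A file as it might appear under vault/checkpoints/ -- fields are illustrative.
SAMPLE = """---
id: chk-001
project: proj-123
created: 2026-04-12
---
## Discovery

aiosqlite connections need WAL mode for concurrent readers.
"""

def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a '---'-delimited frontmatter block from the markdown body.
    Handles only flat `key: value` pairs; real code would use a YAML parser."""
    _, frontmatter, body = text.split("---\n", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.lstrip()

meta, body = split_frontmatter(SAMPLE)
```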


Architecture

                          ┌─────────────────────────────────────────────┐
                          │              React Frontend                 │
                          │  ┌──────────┬─────────┬────────┬────────┐   │
User ── Browser ──────────│  │Dashboard │Workspace│Cross-QA│Webhooks│   │
                          │  └──────────┴─────────┴────────┴────────┘   │
                          │         React Router + Sidebar              │
                          └──────────────────┬──────────────────────────┘
                                             │ /api/*
                          ┌──────────────────▼──────────────────────────┐
                          │              FastAPI Backend                │
                          │  ┌────────────────────────────────────────┐ │
                          │  │ Routers: dashboard, sessions, activity,│ │
                          │  │ health, webhook, cross_qa, bootstrap   │ │
                          │  │ + main API                             │ │
                          │  └───────────────┬────────────────────────┘ │
                          │                  │                          │
                          │  ┌───────────────▼────────────────────────┐ │
                          │  │ models.py (Pydantic request/response)  │ │
                          │  └───────────────┬────────────────────────┘ │
                          │                  │                          │
                          │  ┌───────────────▼────────────────────────┐ │
                          │  │ vault.py (Obsidian-compatible storage) │ │
                          │  │ db.py (SQLite persistence layer)       │ │
                          │  └───────────────┬────────────────────────┘ │
                          │                  │                          │
                          │  ┌───────────────▼────────────────────────┐ │
                          │  │ references.py (citation pipeline)      │ │
                          │  └────────────────────────────────────────┘ │
                          └──────────┬──────────────┬───────────────────┘
                                     │              │
                          notebooklm CLI    OpenAI / YouTube APIs
                                     │
                          Google NotebookLM API

Quick Start

Backend

python -m venv env
source env/bin/activate
pip install -r backend/requirements.txt
uvicorn backend.main:app --reload --port 8000

Frontend

cd frontend
npm install
npm run dev        # Dev server on http://localhost:5173 with API proxy
npm run build      # Production bundle served by FastAPI at http://localhost:8000

Environment

Create .env at the project root:

OPENAI_API_KEY=sk-...        # Required for synthesis in Cross-Q&A and OpenAI Q&A mode
OPENAI_MODEL=gpt-4o-mini     # Optional, defaults to gpt-4o-mini
YOUTUBE_API_KEY=AIza...       # Required for YouTube Discovery panel

The notebooklm CLI must be installed separately and on PATH (it is not included in requirements.txt):

pip install notebooklm-py
notebooklm auth login

API Endpoints

Dashboard & Project Management

| Method | Path | Purpose |
| --- | --- | --- |
| GET | `/api/dashboard/projects` | List all tracked projects |
| POST | `/api/dashboard/projects` | Track a project `{title, notebook_id}` |
| GET | `/api/dashboard/projects/:id` | Get project detail with computed metrics |
| PUT | `/api/dashboard/projects/:id` | Update project metadata |
| DELETE | `/api/dashboard/projects/:id` | Untrack a project |

Sessions

| Method | Path | Purpose |
| --- | --- | --- |
| POST | `/api/sessions` | Start a session `{project_id, bootstrap_query?}` |
| PUT | `/api/sessions/:id` | End a session `{bootstrap_answer?}` |
| GET | `/api/sessions?project_id=` | List sessions for a project |
| GET | `/api/sessions/:id/timeline` | Get session detail with checkpoints |

Activity

| Method | Path | Purpose |
| --- | --- | --- |
| GET | `/api/activity` | List recent activity events |
| POST | `/api/activity` | Create an activity event |
| GET | `/api/activity/stream` | SSE stream of live events |
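
Consuming the activity stream from a script takes only the standard library. A minimal sketch that extracts `data:` payloads from an SSE stream (it ignores `event:`/`id:` fields and multi-line data frames, which a full client would handle):

```python
from urllib import request  # used in the commented live example below

def sse_data(lines):
    """Yield the payload of each `data:` line in a Server-Sent Events stream."""
    for raw in lines:
        line = raw.decode() if isinstance(raw, bytes) else raw
        if line.startswith("data:"):
            yield line[len("data:"):].strip()

# Live usage (requires the backend running locally):
# with request.urlopen("http://localhost:8000/api/activity/stream") as resp:
#     for payload in sse_data(resp):
#         print(payload)
```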

Health

| Method | Path | Purpose |
| --- | --- | --- |
| GET | `/api/health/alerts?project_id=` | List health alerts |
| POST | `/api/health/alerts/scan?project_id=` | Run a health scan on a project |
| PUT | `/api/health/alerts/:id/dismiss` | Dismiss an alert |
| DELETE | `/api/health/alerts/:id` | Delete an alert |

Bootstrap

| Method | Path | Purpose |
| --- | --- | --- |
| POST | `/api/bootstrap` | Query project notebook + auto-trigger cross-QA `{project_id, question, auto_cross_qa?}` |

Cross-Notebook Q&A

| Method | Path | Purpose |
| --- | --- | --- |
| POST | `/api/cross-qa` | Query multiple notebooks `{question, notebook_ids, notebook_titles?, synthesize?, sync_to_notebooks?}` |
| GET | `/api/cross-qa` | List past cross-notebook queries |
| GET | `/api/cross-qa/:id` | Get a single query result |
| POST | `/api/cross-qa/:id/sync` | Push synthesized answer back to each participating notebook as a source |

Webhook Queue

| Method | Path | Purpose |
| --- | --- | --- |
| POST | `/api/webhook/ingest` | Agent submits a checkpoint for review `{project_id, agent_id?, action, payload, cross_project?, target_notebook_ids?}` |
| GET | `/api/webhook/queue?status=` | List webhook items by status |
| PUT | `/api/webhook/queue/:id/approve` | Approve a pending item (cross-project items push to all tracked notebooks) |
| PUT | `/api/webhook/queue/:id/reject` | Reject a pending item |
| GET | `/api/webhook/stream` | SSE stream of new webhook items |

NotebookLM (proxied CLI)

| Method | Path | Purpose |
| --- | --- | --- |
| GET | `/api/notebooklm/notebooks` | List all NotebookLM notebooks |
| POST | `/api/notebooklm/notebooks` | Create a notebook `{title}` |
| GET | `/api/notebooklm/notebooks/:id/sources` | List sources for a notebook |
| POST | `/api/notebooklm/sources` | Add a source `{notebookId, url, title?}` |
| POST | `/api/notebooklm/ask` | Ask a question `{notebookId, question, sourceIds?, conversationId?}` |
| POST | `/api/openai/ask` | OpenAI agent with NotebookLM grounding and conversation continuity |
| POST | `/api/notebooklm/research` | Start a research task |
| GET | `/api/notebooklm/research/status?notebookId=` | Poll research progress |
| POST | `/api/notebooklm/research/wait` | Wait for research completion and optionally import results |
| POST | `/api/notebooklm/generate` | Generate a Studio artifact |
| GET | `/api/notebooklm/artifacts` | List artifacts for a notebook |
| GET | `/api/notebooklm/artifacts/:id` | Get a single artifact |
| POST | `/api/notebooklm/artifacts/:id/wait` | Poll until artifact generation completes |
| POST | `/api/notebooklm/download` | Download an artifact |
| GET | `/api/notebooklm/download/file` | Serve a downloaded artifact file |

YouTube

| Method | Path | Purpose |
| --- | --- | --- |
| POST | `/api/youtube/search` | Search YouTube `{query, limit, skill}` |
| POST | `/api/youtube/transcript` | Fetch transcript `{videoId, notebookId?, skill}` |

System

| Method | Path | Purpose |
| --- | --- | --- |
| GET | `/healthz` | Health check |

Dependencies

  • CLI: notebooklm-py (v0.3.4+) — must be installed separately and on PATH
  • Backend: FastAPI, uvicorn, python-dotenv, pyyaml, openai, httpx, aiosqlite, google-api-python-client, google-auth-oauthlib, youtube-transcript-api
  • Frontend: React 18, React Router, Vite, Tailwind CSS, classnames, react-markdown, recharts, remark-gfm
  • External APIs: Google NotebookLM (via CLI), OpenAI, YouTube Data API v3

About

Agent memory dashboard for AI coding agents. Turns NotebookLM into persistent, structured memory across sessions—track projects, bootstrap context, checkpoint discoveries, review submissions, and query across notebooks. React + FastAPI with Obsidian‑compatible storage, cross‑notebook Q&A, webhooks, and a content studio.
