🎨 Flipbook Canvas

English · 中文

✨ Click anywhere on a generated image. The backend infers what you clicked, searches the web when useful, generates a child diagram, and links it back. A flipbook of explorable knowledge — one click at a time.

💡 Inspired by and a re-implementation of the product idea behind flipbook.page — credit to the original team for the click-to-explore canvas concept.

A long-running web product: Express + SSE backend, Vite + React + TS frontend, a pluggable multi-model image pipeline, web-search augmented planning, per-node concurrency, read-only share links, fullscreen casting and a fully responsive mobile layout.

✨ Why this is fun

Most "AI画图" demos stop at one image. This one turns each image into a playable knowledge surface:

🖱️ Long-press anywhere on a picture → the model reads what's under your finger, decides whether the topic needs fresh sources, optionally hits the web, then paints a brand new annotated diagram zoomed into that concept.
📚 Encyclopedia-style output — every node ships with a 150–220-char caption and 20–40 in-image labels (place names, dates, numbers…), all OCR'd back into a transparent text layer so you can drag-select and copy any fragment straight off the picture.
🌳 Infinite tree of canvases — every click spawns a child node; the whole exploration tree is persisted, shareable, and replayable.

📸 Screenshots

_{Click-to-explore — long-press any region to drill in}	_{End-to-end pipeline — search → planner → ImageGen → drill-down}
_{Gallery + canvas — every canvas is persisted, shareable, replayable}

🚀 Highlights

🖱️ Click-to-explore: long-press (2 s) anywhere on a node's image. The backend infers the label, decides whether to web-search, then generates a child node. Spatial + semantic dedup means clicking the same region again jumps straight in.
⚡ Per-node parallelism: up to 4 different spots in parallel per parent (configurable). Each in-flight click streams a phase chip (Inferring label… → Searching the web… → Generating image…) on the hotspot. Hit the cap and the cursor turns into ⌛.
📖 Encyclopedia register: planner produces 150–220 char captions with 20–40 in-image text fragments — like reading a richly annotated diagram in a children's encyclopedia.
🌐 Web-search augmented: a "decide-then-search" gate asks the LLM whether a topic benefits from up-to-date sources. When yes, results are fetched and fed into the planner; sources are persisted to disk + DB and rendered as a 📚 hover badge over the canvas.
🎬 Scene transitions: drill-in / drill-out / fade animations make navigation feel like a zooming flipbook rather than a page swap.
🔗 Share as preview: any canvas → read-only ?s=<token> URL. Viewers can navigate and watch live SSE updates from in-flight generations, but cannot trigger new ones.
📺 Fullscreen casting: ⛶ requests browser fullscreen; toggle the chrome (breadcrumb + caption + hint) on/off for a clean projection view.
🔤 Selectable in-image text: every label baked into the diagram is OCR'd with Apple Vision (zh-Hans + en-US) and overlaid as invisible HTML, so users can drag-select and Cmd-C copy any text directly off the picture while the painted pixels remain the visual ground truth.
📱 Mobile responsive: top bar collapses to icons, single-column gallery, smaller hotspots and pending bubbles.

🤖 Multimodal × Mainstream LLMs

Flipbook Canvas is built around a pluggable multimodal pipeline. Three modalities are wired end-to-end:

Modality	What it does	Pluggable into
📝 Text / JSON LLM	planner, click-label inference, decide-then-search verdict	any chat-completion-style model
🖼️ Image generation	turns a structured prompt into a 2752×1536 annotated diagram with bake-in text labels	OpenAI, Nano Banana (Gemini), Seedream/Seeddance, or your own provider
🌐 Web search	rephrased query → top-N normalized results → planner context + 📚 sources panel	any search backend
👁️ OCR (Apple Vision)	`zh-Hans` + `en-US` recognition over every generated PNG, projected as a selectable HTML overlay	local, no API keys needed

The image layer is a provider chain (IMAGE_PROVIDER=...,svg) — first enabled provider wins, svg is always appended last as a placeholder so the UI never breaks. Adding a new model is a single file:

// server/src/generation/providers/<name>.js
export default {
  name: 'my-model',
  enabled(config) { return Boolean(config.MY_API_KEY); },
  async generate({ imagePrompt, outputDir, size, title, hash, onEvent }) {
    // call your model, write <hash>.png into outputDir, push phase events
  },
};

Out of the box:

Provider	Trigger to enable	Status
`openai`	`OPENAI_API_KEY` set	🔌 stub — implement in `providers/openai.js`
`nanobanana`	`NANOBANANA_API_KEY` or `GEMINI_API_KEY`	🔌 stub
`seeddance`	`SEEDDANCE_API_KEY` or `ARK_API_KEY`	🔌 stub
`codebuddy`	`ENABLE_CODEBUDDY=1`	✅ reference impl (used in the demo gif)
`svg`	always	✅ fallback placeholder

🎯 The reference implementation wires the codebuddy CLI as a subprocess driver for planner / ImageGen / WebSearch. Subprocess lifecycle (concurrency cap, per-call timeouts, single retry, file-size sanity check on generated PNGs, graceful degradation) lives in server/src/codebuddyClient.js and is a useful template if you ever shell out to any CLI-based model.

🐦 Walkthrough — generating a woodpecker flipbook from zero

Type 啄木鸟 (woodpecker) into the top bar and watch the entire pipeline run: decide-then-search → planner → ImageGen → click to drill into the tongue anatomy / nest cavity / ant-foraging zones, each spawning its own annotated diagram with its own sources.

🗂️ Layout

.
├── prompts/                        # system / planner / click-label / image-prompt / decide-search
├── scripts/sync-prompts.mjs
├── server/
│   └── src/
│       ├── routes/                 # canvas, click, events (SSE), assets, share
│       ├── generation/
│       │   ├── pipeline.js         # generateRoot + expandFromClick + per-node concurrency
│       │   ├── decideSearch.js     # decide-then-search gate
│       │   ├── webSearch.js        # WebSearch subprocess + result normaliser
│       │   ├── queue.js            # PerCanvasQueue / Semaphore / PerKeySemaphore
│       │   ├── planner.js / clickLabel.js
│       │   ├── image.js            # provider-chain orchestrator
│       │   └── providers/          # codebuddy, openai, nanobanana, seeddance, svg
│       ├── db/                     # Sequelize models + hydrateFromDisk
│       ├── store/                  # filesystem layer
│       ├── sse/                    # event hub
│       └── codebuddyClient.js      # reference CLI-subprocess wrapper
└── web/                            # Vite + React + TS

💾 Storage

📁 Filesystem (source of truth for big artifacts): server/data/canvases/<id>/{data/tree.json, data/nodes/<hash>.json, images/<hash>.{png,svg}, manifest.json}.
🗃️ SQLite (server/data/flipbook.sqlite, via Sequelize): metadata index — Canvases / Nodes / Hotspots / ShareLinks / Sources tables. Drives the gallery, spatial dedup, share lookup, and sources hover panel. On boot the server runs hydrateFromDisk() to rebuild this index if it's missing.

🛠️ Develop

npm install
npm run dev           # server on :8787 + Vite on :5173 in parallel

Open http://127.0.0.1:5173.

By default ENABLE_CODEBUDDY=0 (stub mode — fast, SVG placeholders, no LLM). Set ENABLE_CODEBUDDY=1 to use the reference CLI provider for planner + ImageGen + WebSearch:

ENABLE_CODEBUDDY=1 npm run dev:server

⏱️ With the reference provider, each node takes ~70–95 s end-to-end (planner ~25 s + ImageGen ~50–60 s including cold start; +5–15 s if web search runs). ImageGen produces 2752×1536 PNG (~6 MB).

Per-node parallelism

Up to 4 click expansions per parent node run in parallel; excess clicks queue. Different parents and different canvases run independently. A per-parent write lock serializes only the short read-modify-write of the parent node JSON. Tunable via MAX_PARALLEL_CLICKS_PER_NODE (default 4).

🔍 Web search

A pre-planner gate (decideSearch.js + prompts/decide-search.md) calls the LLM with the proposed subject and asks: do recent / authoritative sources materially improve this node? The default leans yes — only clearly abstract / timeless subjects skip search. When yes:

The web-search backend runs with the rephrased query.
Results are normalised into {title, url, snippet, source}.
Top results are passed into the planner prompt.
Sources are persisted both into nodes/<hash>.json and into the SQLite Sources table.
The frontend renders a 📚 badge near the breadcrumb. Hover to see a popover with the source list (220 ms grace period so the popover is reachable with the mouse).

🔗 Share / preview links

POST /api/canvas/:id/share → {token, url}. Reuses an existing token for the same canvas.
GET /api/share/:token → {canvasId, topic, readOnly:true}.
Frontend: opening …?s=<token> puts the UI in read-only preview mode — no topic input, no clicks on the image, "👁 Preview" badge in the corner. SSE stays connected, so a viewer watching mid-generation sees images stream in real-time.

📺 Fullscreen / casting

⛶ button in TopBar requests browser fullscreen; uses CSS-only fullscreen on iOS Safari where the API isn't supported.
👁 / 🚫 button (visible while in fullscreen) toggles the breadcrumb + caption + hint. Useful for clean projection.
Long-press hint is suppressed in fullscreen by default; the press still works.

🧹 Cleaning local state

npm run clean:data    # reset server/data (all canvases)
npm run clean:dist    # reset web/dist
npm run clean         # both

📦 Build for production

npm run build         # builds web/dist
npm start             # serves web/dist + API from :8787

⚙️ Configuration (env)

Var	Default	Purpose
`PORT`	8787	server port
`HOST`	127.0.0.1	server bind
`DATA_DIR`	`server/data`	canvas state on disk
`PROMPTS_DIR`	`prompts`	prompt files
`DB_PATH`	`<DATA_DIR>/flipbook.sqlite`	SQLite file
`MAX_PARALLEL_CLICKS_PER_NODE`	4	concurrent click expansions per parent
`PLANNER_TIMEOUT_MS`	90000	per-call planner timeout
`IMAGE_TIMEOUT_MS`	180000	per-call ImageGen timeout
`WEB_SEARCH_TIMEOUT_MS`	60000	per-call WebSearch timeout
`IMAGE_PROVIDER`	`codebuddy`	provider chain (e.g. `openai,nanobanana,svg`)
`IMAGE_SIZE`	`1920x1080`	requested size (provider may pick its own)
`ENABLE_CODEBUDDY`	0	flip to 1 to enable the reference CLI provider
`ENABLE_WEB_SEARCH`	follows `ENABLE_CODEBUDDY`	force-disable with `0`
`ENABLE_OCR`	1	run Apple Vision OCR on each generated PNG to produce a selectable text overlay; set to `0` to skip
`OCR_TIMEOUT_MS`	25000	per-call OCR timeout
`OCR_MIN_CONFIDENCE`	0.4	drop OCR spans below this confidence

English · 中文

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
docs/assets		docs/assets
prompts		prompts
scripts		scripts
server		server
web		web
.gitignore		.gitignore
README.md		README.md
README.zh.md		README.zh.md
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🎨 Flipbook Canvas

✨ Why this is fun

📸 Screenshots

🚀 Highlights

🤖 Multimodal × Mainstream LLMs

🐦 Walkthrough — generating a woodpecker flipbook from zero

🗂️ Layout

💾 Storage

🛠️ Develop

Per-node parallelism

🔍 Web search

🔗 Share / preview links

📺 Fullscreen / casting

🧹 Cleaning local state

📦 Build for production

⚙️ Configuration (env)

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🎨 Flipbook Canvas

✨ Why this is fun

📸 Screenshots

🚀 Highlights

🤖 Multimodal × Mainstream LLMs

🐦 Walkthrough — generating a woodpecker flipbook from zero

🗂️ Layout

💾 Storage

🛠️ Develop

Per-node parallelism

🔍 Web search

🔗 Share / preview links

📺 Fullscreen / casting

🧹 Cleaning local state

📦 Build for production

⚙️ Configuration (env)

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages