English · 中文
Click anywhere on a generated image. The backend infers what you clicked, searches the web when useful, generates a child diagram, and links it back. A flipbook of explorable knowledge, one click at a time.
Inspired by, and a re-implementation of, the product idea behind flipbook.page. Credit to the original team for the click-to-explore canvas concept.
A long-running web product: Express + SSE backend, Vite + React + TS frontend, a pluggable multi-model image pipeline, web-search augmented planning, per-node concurrency, read-only share links, fullscreen casting and a fully responsive mobile layout.
Most AI image-generation demos stop at a single image. This one turns each image into a playable knowledge surface:
- Long-press anywhere on a picture: the model reads what's under your finger, decides whether the topic needs fresh sources, optionally searches the web, then paints a brand-new annotated diagram zoomed into that concept.
- Encyclopedia-style output: every node ships with a 150–220-character caption and 20–40 in-image labels (place names, dates, numbers…), all OCR'd back into a transparent text layer so you can drag-select and copy any fragment straight off the picture.
- Infinite tree of canvases: every click spawns a child node; the whole exploration tree is persisted, shareable, and replayable.
| Click-to-explore | End-to-end pipeline | Gallery + canvas |
|---|---|---|
| Long-press any region to drill in | Search → planner → ImageGen → drill-down | Every canvas is persisted, shareable, replayable |
- Click-to-explore: long-press (2 s) anywhere on a node's image. The backend infers the label, decides whether to web-search, then generates a child node. Spatial + semantic dedup means that clicking the same region again jumps straight into the existing child.
- Per-node parallelism: up to 4 spots on the same parent can expand in parallel (configurable). Each in-flight click streams a phase chip (`Inferring label…` → `Searching the web…` → `Generating image…`) on its hotspot. Hit the cap and the cursor changes to signal that the limit is reached.
- Encyclopedia register: the planner produces 150–220-character captions with 20–40 in-image text fragments, like reading a richly annotated diagram in a children's encyclopedia.
- Web-search augmented: a "decide-then-search" gate asks the LLM whether a topic benefits from up-to-date sources. If it does, results are fetched and fed into the planner; sources are persisted to disk and the DB and shown in a sources hover badge over the canvas.
- Scene transitions: drill-in / drill-out / fade animations make navigation feel like a zooming flipbook rather than a page swap.
- Share as preview: any canvas → read-only `?s=<token>` URL. Viewers can navigate and watch live SSE updates from in-flight generations, but cannot trigger new ones.
- Fullscreen casting: the fullscreen button requests browser fullscreen; toggle the chrome (breadcrumb + caption + hint) on or off for a clean projection view.
- Selectable in-image text: every label baked into the diagram is OCR'd with Apple Vision (`zh-Hans` + `en-US`) and overlaid as invisible HTML, so users can drag-select and Cmd-C copy any text directly off the picture while the painted pixels remain the visual ground truth.
- Mobile responsive: the top bar collapses to icons, the gallery becomes single-column, and hotspots and pending bubbles shrink.
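The OCR overlay comes down to a coordinate conversion. Apple Vision reports each recognized text observation with a normalized bounding box whose origin is the image's bottom-left corner, while CSS positions from the top-left. A minimal sketch, assuming the OCR step hands the server spans shaped like `{text, confidence, box: {x, y, w, h}}` (a hypothetical format, not the project's actual one) and applying the `OCR_MIN_CONFIDENCE` cutoff of 0.4:

```javascript
// Hypothetical span shape from the OCR step (not the project's actual format):
// { text, confidence, box: { x, y, w, h } } using Vision-style normalized
// coordinates (0..1, origin at the bottom-left of the image).

// Convert OCR spans into CSS boxes positioned from the top-left corner.
function toOverlaySpans(spans, minConfidence = 0.4) {
  return spans
    .filter((s) => s.confidence >= minConfidence) // mirrors OCR_MIN_CONFIDENCE
    .map((s) => ({
      text: s.text,
      style: {
        left: `${(s.box.x * 100).toFixed(2)}%`,
        // Flip the y axis: Vision measures from the bottom, CSS from the top.
        top: `${((1 - s.box.y - s.box.h) * 100).toFixed(2)}%`,
        width: `${(s.box.w * 100).toFixed(2)}%`,
        height: `${(s.box.h * 100).toFixed(2)}%`,
      },
    }));
}

const overlay = toOverlaySpans([
  { text: "Woodpecker", confidence: 0.93, box: { x: 0.1, y: 0.8, w: 0.2, h: 0.05 } },
  { text: "??", confidence: 0.2, box: { x: 0.5, y: 0.5, w: 0.1, h: 0.1 } },
]);
console.log(overlay);
```

Each resulting span would then be rendered as an absolutely positioned element with a transparent text color inside a `position: relative` wrapper around the image, so text selection works while the pixels stay visible.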
Flipbook Canvas is built around a pluggable multimodal pipeline. The following capabilities are wired end-to-end:
| Modality | What it does | Pluggable into |
|---|---|---|
| Text / JSON LLM | planner, click-label inference, decide-then-search verdict | any chat-completion-style model |
| Image generation | turns a structured prompt into a 2752×1536 annotated diagram with baked-in text labels | OpenAI, Nano Banana (Gemini), Seedream/Seeddance, or your own provider |
| Web search | rephrased query → top-N normalized results → planner context + sources panel | any search backend |
| OCR (Apple Vision) | `zh-Hans` + `en-US` recognition over every generated PNG, projected as a selectable HTML overlay | local, no API keys needed |
The image layer is a provider chain (`IMAGE_PROVIDER=...,svg`): the first enabled provider wins, and `svg` is always appended last as a placeholder so the UI never breaks. Adding a new model is a single file:
```js
// server/src/generation/providers/<name>.js
export default {
  name: 'my-model',
  enabled(config) { return Boolean(config.MY_API_KEY); },
  async generate({ imagePrompt, outputDir, size, title, hash, onEvent }) {
    // call your model, write <hash>.png into outputDir, push phase events
  },
};
```

Out of the box:
| Provider | Trigger to enable | Status |
|---|---|---|
| `openai` | `OPENAI_API_KEY` set | stub: implement in `providers/openai.js` |
| `nanobanana` | `NANOBANANA_API_KEY` or `GEMINI_API_KEY` | stub |
| `seeddance` | `SEEDDANCE_API_KEY` or `ARK_API_KEY` | stub |
| `codebuddy` | `ENABLE_CODEBUDDY=1` | reference implementation (used in the demo GIF) |
| `svg` | always | fallback placeholder |
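One way to read "first enabled provider wins" is a selection-time walk over the comma-separated list. The sketch below assumes providers shaped like the template above, re-appends `svg` so it always comes last, and skips unknown names. The inline provider objects and the `pickProvider` name are illustrative stand-ins, and the project's `image.js` may additionally fall through to the next provider when a call fails at runtime.

```javascript
// Illustrative providers in the { name, enabled(config) } shape from the
// template above; generate() is omitted because only selection is shown.
const registry = {
  openai: { name: "openai", enabled: (c) => Boolean(c.OPENAI_API_KEY) },
  nanobanana: { name: "nanobanana", enabled: (c) => Boolean(c.NANOBANANA_API_KEY || c.GEMINI_API_KEY) },
  seeddance: { name: "seeddance", enabled: (c) => Boolean(c.SEEDDANCE_API_KEY || c.ARK_API_KEY) },
  codebuddy: { name: "codebuddy", enabled: (c) => c.ENABLE_CODEBUDDY === "1" },
  svg: { name: "svg", enabled: () => true },
};

// Parse IMAGE_PROVIDER, force `svg` to the end, return the first enabled provider.
function pickProvider(config, providers = registry) {
  const names = String(config.IMAGE_PROVIDER || "codebuddy")
    .split(",")
    .map((n) => n.trim())
    .filter((n) => n && n !== "svg"); // svg is re-appended below
  names.push("svg"); // placeholder fallback so the UI never breaks
  for (const name of names) {
    // Own-property lookup so names like "constructor" are treated as unknown.
    const p = Object.prototype.hasOwnProperty.call(providers, name) ? providers[name] : undefined;
    if (p && p.enabled(config)) return p;
  }
  throw new Error("no image provider available"); // unreachable while svg is registered
}

console.log(pickProvider({ IMAGE_PROVIDER: "openai,nanobanana", GEMINI_API_KEY: "k" }).name); // nanobanana
console.log(pickProvider({}).name); // svg
```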
The reference implementation wires the `codebuddy` CLI as a subprocess driver for planner / ImageGen / WebSearch. The subprocess lifecycle (concurrency cap, per-call timeouts, single retry, file-size sanity check on generated PNGs, graceful degradation) lives in `server/src/codebuddyClient.js` and is a useful template if you ever shell out to any CLI-based model.
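The per-call timeout and single retry are the most reusable part of that lifecycle. A minimal, provider-agnostic sketch, assuming each attempt is an async function; `withTimeout` and `callWithRetry` are hypothetical names, not exports of `codebuddyClient.js`. A timeout here only abandons the promise: a real subprocess wrapper would also kill the child process.

```javascript
// Reject if `promise` has not settled within `ms`. The timer is always cleared
// so it cannot keep the process alive after the race is decided.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run `task(attempt)` with a per-call timeout, retrying `retries` more times.
async function callWithRetry(task, { timeoutMs, retries = 1, label = "call" }) {
  let lastErr;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await withTimeout(task(attempt), timeoutMs, label);
    } catch (err) {
      lastErr = err; // remember the failure; loop again if attempts remain
    }
  }
  throw lastErr;
}

// Usage: the first attempt hangs past the timeout, the retry succeeds.
callWithRetry(
  (attempt) =>
    new Promise((resolve) => setTimeout(() => resolve(`ok on attempt ${attempt}`), attempt === 0 ? 200 : 10)),
  { timeoutMs: 50, label: "planner" },
).then((result) => console.log(result)); // ok on attempt 1
```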
Type 啄木鸟 (woodpecker) into the top bar and watch the entire pipeline run: decide-then-search → planner → ImageGen → click to drill into the tongue anatomy / nest cavity / ant-foraging zones, each spawning its own annotated diagram with its own sources.
```
.
├── prompts/                    # system / planner / click-label / image-prompt / decide-search
├── scripts/sync-prompts.mjs
├── server/
│   └── src/
│       ├── routes/             # canvas, click, events (SSE), assets, share
│       ├── generation/
│       │   ├── pipeline.js     # generateRoot + expandFromClick + per-node concurrency
│       │   ├── decideSearch.js # decide-then-search gate
│       │   ├── webSearch.js    # WebSearch subprocess + result normaliser
│       │   ├── queue.js        # PerCanvasQueue / Semaphore / PerKeySemaphore
│       │   ├── planner.js / clickLabel.js
│       │   ├── image.js        # provider-chain orchestrator
│       │   └── providers/      # codebuddy, openai, nanobanana, seeddance, svg
│       ├── db/                 # Sequelize models + hydrateFromDisk
│       ├── store/              # filesystem layer
│       ├── sse/                # event hub
│       └── codebuddyClient.js  # reference CLI-subprocess wrapper
└── web/                        # Vite + React + TS
```
- Filesystem (source of truth for large artifacts): `server/data/canvases/<id>/{data/tree.json, data/nodes/<hash>.json, images/<hash>.{png,svg}, manifest.json}`.
- SQLite (`server/data/flipbook.sqlite`, via Sequelize): a metadata index with `Canvases` / `Nodes` / `Hotspots` / `ShareLinks` / `Sources` tables. It drives the gallery, spatial dedup, share lookup, and the sources hover panel. On boot the server runs `hydrateFromDisk()` to rebuild this index if it is missing.
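Spatial dedup amounts to asking whether a new click lands close enough to an existing hotspot on the same parent. A sketch of that check, assuming hotspots are stored as normalized click points (`x`, `y` in 0..1) alongside the child's hash; the `findNearbyHotspot` name, the 0.05 radius, and the record shape are assumptions, and the semantic half of the dedup (comparing inferred labels) is left out:

```javascript
// Hypothetical hotspot records: a normalized click point on the parent image
// plus the hash of the child node it produced. Returns the closest hotspot
// within `radius`, or null if the click is in a new region.
function findNearbyHotspot(hotspots, click, radius = 0.05) {
  let best = null;
  let bestDist = Infinity;
  for (const h of hotspots) {
    // Plain Euclidean distance in normalized space; this ignores the image's
    // aspect ratio, which a real check might correct for.
    const dist = Math.hypot(h.x - click.x, h.y - click.y);
    if (dist <= radius && dist < bestDist) {
      best = h;
      bestDist = dist;
    }
  }
  return best; // null means: no existing child here, generate a new one
}

const hotspots = [
  { x: 0.3, y: 0.4, childHash: "a1" },
  { x: 0.7, y: 0.2, childHash: "b2" },
];
console.log(findNearbyHotspot(hotspots, { x: 0.31, y: 0.42 })?.childHash); // a1
console.log(findNearbyHotspot(hotspots, { x: 0.5, y: 0.9 })); // null
```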
```bash
npm install
npm run dev   # server on :8787 + Vite on :5173 in parallel
```

Open http://127.0.0.1:5173.
By default `ENABLE_CODEBUDDY=0` (stub mode: fast, SVG placeholders, no LLM). Set `ENABLE_CODEBUDDY=1` to use the reference CLI provider for planner + ImageGen + WebSearch:

```bash
ENABLE_CODEBUDDY=1 npm run dev:server
```

With the reference provider, each node takes ~70–95 s end-to-end (planner ~25 s + ImageGen ~50–60 s including cold start; +5–15 s if web search runs). ImageGen produces a 2752×1536 PNG (~6 MB).
Up to 4 click expansions per parent node run in parallel; excess clicks
queue. Different parents and different canvases run independently. A
per-parent write lock serializes only the short read-modify-write of the
parent node JSON. Tunable via MAX_PARALLEL_CLICKS_PER_NODE (default 4).
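The per-parent cap and the per-parent write lock can both be expressed with one primitive: a semaphore keyed by parent node. The sketch below borrows the `PerKeySemaphore` name from `queue.js`, but its body is an assumption rather than that file's source; a limit of 4 models `MAX_PARALLEL_CLICKS_PER_NODE`, and a limit of 1 would model the write lock.

```javascript
class PerKeySemaphore {
  constructor(limit) {
    this.limit = limit;
    this.state = new Map(); // key -> { active, waiters }
  }

  // Resolves with a release() function once a slot for `key` is free.
  acquire(key) {
    let s = this.state.get(key);
    if (!s) {
      s = { active: 0, waiters: [] };
      this.state.set(key, s);
    }
    if (s.active < this.limit) {
      s.active++;
      return Promise.resolve(this._releaser(key));
    }
    // Over the cap: queue the caller (FIFO) until a slot is handed over.
    return new Promise((resolve) => s.waiters.push(() => resolve(this._releaser(key))));
  }

  // Wraps release so calling it twice cannot free two slots.
  _releaser(key) {
    let released = false;
    return () => {
      if (released) return;
      released = true;
      this._release(key);
    };
  }

  _release(key) {
    const s = this.state.get(key);
    if (!s) return;
    const next = s.waiters.shift();
    if (next) {
      next(); // hand the slot directly to the next waiter; `active` is unchanged
    } else if (--s.active === 0) {
      this.state.delete(key); // drop idle keys so the map does not grow forever
    }
  }

  stats(key) {
    const s = this.state.get(key);
    return s ? { active: s.active, queued: s.waiters.length } : { active: 0, queued: 0 };
  }
}

// Five clicks on the same parent: four run, one waits; other parents are unaffected.
const clicks = new PerKeySemaphore(4);
for (let i = 0; i < 5; i++) clicks.acquire("parent-A");
console.log(clicks.stats("parent-A")); // { active: 4, queued: 1 }
console.log(clicks.stats("parent-B")); // { active: 0, queued: 0 }
```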
A pre-planner gate (`decideSearch.js` + `prompts/decide-search.md`) calls the LLM with the proposed subject and asks: do recent or authoritative sources materially improve this node? The default leans toward yes: only clearly abstract or timeless subjects skip search. When the answer is yes:
- The web-search backend runs with the rephrased query.
- Results are normalised into `{title, url, snippet, source}`.
- Top results are passed into the planner prompt.
- Sources are persisted both into `nodes/<hash>.json` and into the SQLite `Sources` table.
- The frontend renders a sources badge near the breadcrumb. Hover to see a popover with the source list (a 220 ms grace period keeps the popover reachable with the mouse).
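The normalisation step could look roughly like this. The `{title, url, snippet, source}` output shape comes from the list above; the `normaliseResults` name, the raw field names (`link`, `name`, `description`), the URL-based dedup, the 300-character snippet cap, and deriving `source` from the hostname are all assumptions for illustration:

```javascript
// Map loosely shaped search hits to { title, url, snippet, source }, keeping at
// most `topN` unique http(s) results.
function normaliseResults(raw, topN = 5) {
  const seen = new Set();
  const out = [];
  for (const r of raw || []) {
    const url = String(r.url || r.link || "").trim();
    if (!/^https?:\/\//i.test(url) || seen.has(url)) continue; // skip non-http and duplicates
    let source;
    try {
      source = new URL(url).hostname.replace(/^www\./, "");
    } catch {
      continue; // malformed URL
    }
    seen.add(url);
    out.push({
      title: String(r.title || r.name || source).trim(),
      url,
      // Collapse whitespace and cap the length so the planner prompt stays small.
      snippet: String(r.snippet || r.description || "").replace(/\s+/g, " ").trim().slice(0, 300),
      source,
    });
    if (out.length >= topN) break;
  }
  return out;
}

const results = normaliseResults([
  { title: "Woodpecker", link: "https://www.example.org/woodpecker", description: "A bird  that\n drums." },
  { title: "dup", url: "https://www.example.org/woodpecker" },
  { title: "no url" },
]);
console.log(results);
```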
- `POST /api/canvas/:id/share` → `{token, url}`. Reuses an existing token for the same canvas.
- `GET /api/share/:token` → `{canvasId, topic, readOnly: true}`.
- Frontend: opening `…?s=<token>` puts the UI in read-only preview mode: no topic input, no clicks on the image, and a "Preview" badge in the corner. SSE stays connected, so a viewer watching mid-generation sees images arrive in real time.
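Token reuse boils down to a lookup keyed by canvas before minting a new token. The in-memory sketch below stands in for the `ShareLinks` table; the `ShareLinks` class, the injected token generator, and the `/?s=` URL shape are hypothetical (the README elides the actual path).

```javascript
// In-memory stand-in for the ShareLinks table (hypothetical, for illustration).
class ShareLinks {
  constructor(makeToken) {
    this.makeToken = makeToken;
    this.byCanvas = new Map(); // canvasId -> token
    this.byToken = new Map(); // token -> canvasId
  }

  // Roughly what POST /api/canvas/:id/share does: reuse, else mint a token.
  share(canvasId, origin) {
    let token = this.byCanvas.get(canvasId);
    if (!token) {
      token = this.makeToken();
      this.byCanvas.set(canvasId, token);
      this.byToken.set(token, canvasId);
    }
    return { token, url: `${origin}/?s=${encodeURIComponent(token)}` };
  }

  // Roughly what GET /api/share/:token does (topic lookup omitted).
  resolve(token) {
    const canvasId = this.byToken.get(token);
    return canvasId ? { canvasId, readOnly: true } : null;
  }
}

let n = 0;
const links = new ShareLinks(() => `tok${++n}`); // deterministic tokens for the demo only
const a = links.share("canvas-1", "http://127.0.0.1:5173");
const b = links.share("canvas-1", "http://127.0.0.1:5173");
console.log(a.token === b.token); // true
console.log(links.resolve(a.token)); // { canvasId: 'canvas-1', readOnly: true }
```

In practice the token generator should be cryptographically random (for example `crypto.randomBytes` from Node's `crypto` module). On the frontend, the preview token can be read with `new URLSearchParams(window.location.search).get("s")`.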
- The fullscreen button in `TopBar` requests browser fullscreen; on iOS Safari, where the Fullscreen API isn't supported, it falls back to CSS-only fullscreen.
- A chrome toggle button (visible while in fullscreen) shows or hides the breadcrumb + caption + hint. Useful for clean projection.
- The long-press hint is suppressed in fullscreen by default; the long-press itself still works.
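The fullscreen fallback is a feature check followed by a CSS class. A minimal sketch, with the element and document passed in so the logic runs without a browser; the `enterFullscreen` name and the `css-fullscreen` class are hypothetical, and older Safari versions expose only the prefixed `webkitRequestFullscreen`, which a production version would likely also check:

```javascript
// Use the standard Fullscreen API when available, otherwise pin the element
// with a CSS class. Resolves with the mode that was used.
function enterFullscreen(el, doc) {
  if (doc.fullscreenEnabled && typeof el.requestFullscreen === "function") {
    // requestFullscreen() returns a promise in current browsers.
    return Promise.resolve(el.requestFullscreen()).then(() => "api");
  }
  // e.g. iPhone Safari: no element fullscreen, so fake it with CSS.
  el.classList.add("css-fullscreen");
  return Promise.resolve("css");
}

// Usage with a stand-in element on a document without fullscreen support.
const classes = new Set();
const fakeEl = { classList: { add: (c) => classes.add(c) } };
enterFullscreen(fakeEl, { fullscreenEnabled: false }).then((mode) => console.log(mode)); // css
```

The class would carry something like `position: fixed; inset: 0; z-index: 1000;` so the canvas covers the viewport without the Fullscreen API.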
```bash
npm run clean:data   # reset server/data (all canvases)
npm run clean:dist   # reset web/dist
npm run clean        # both
```

```bash
npm run build   # builds web/dist
npm start       # serves web/dist + API from :8787
```

| Var | Default | Purpose |
|---|---|---|
| `PORT` | `8787` | server port |
| `HOST` | `127.0.0.1` | server bind address |
| `DATA_DIR` | `server/data` | canvas state on disk |
| `PROMPTS_DIR` | `prompts` | prompt files |
| `DB_PATH` | `<DATA_DIR>/flipbook.sqlite` | SQLite file |
| `MAX_PARALLEL_CLICKS_PER_NODE` | `4` | concurrent click expansions per parent |
| `PLANNER_TIMEOUT_MS` | `90000` | per-call planner timeout |
| `IMAGE_TIMEOUT_MS` | `180000` | per-call ImageGen timeout |
| `WEB_SEARCH_TIMEOUT_MS` | `60000` | per-call WebSearch timeout |
| `IMAGE_PROVIDER` | `codebuddy` | provider chain (e.g. `openai,nanobanana,svg`) |
| `IMAGE_SIZE` | `1920x1080` | requested size (the provider may pick its own) |
| `ENABLE_CODEBUDDY` | `0` | set to `1` to enable the reference CLI provider |
| `ENABLE_WEB_SEARCH` | follows `ENABLE_CODEBUDDY` | set to `0` to force-disable |
| `ENABLE_OCR` | `1` | run Apple Vision OCR on each generated PNG to produce a selectable text overlay; set to `0` to skip |
| `OCR_TIMEOUT_MS` | `25000` | per-call OCR timeout |
| `OCR_MIN_CONFIDENCE` | `0.4` | drop OCR spans below this confidence |
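The table maps naturally onto a small loader. A sketch, assuming values arrive as strings from `process.env`; the `loadConfig` name and the camelCase keys are hypothetical, and `ENABLE_WEB_SEARCH` is read literally from the table: it follows `ENABLE_CODEBUDDY`, and only `0` changes anything.

```javascript
// Hypothetical loader mirroring the defaults in the table. Env values are
// strings, so numbers and on/off flags are parsed explicitly.
function loadConfig(env) {
  const num = (key, def) => {
    const raw = env[key];
    const n = Number(raw);
    return raw !== undefined && raw !== "" && Number.isFinite(n) ? n : def;
  };
  const flag = (key, def) => (env[key] === undefined || env[key] === "" ? def : env[key] === "1");

  const dataDir = env.DATA_DIR || "server/data";
  const enableCodebuddy = flag("ENABLE_CODEBUDDY", false);
  return {
    port: num("PORT", 8787),
    host: env.HOST || "127.0.0.1",
    dataDir,
    promptsDir: env.PROMPTS_DIR || "prompts",
    dbPath: env.DB_PATH || `${dataDir}/flipbook.sqlite`,
    maxParallelClicksPerNode: num("MAX_PARALLEL_CLICKS_PER_NODE", 4),
    plannerTimeoutMs: num("PLANNER_TIMEOUT_MS", 90000),
    imageTimeoutMs: num("IMAGE_TIMEOUT_MS", 180000),
    webSearchTimeoutMs: num("WEB_SEARCH_TIMEOUT_MS", 60000),
    imageProviders: (env.IMAGE_PROVIDER || "codebuddy").split(",").map((s) => s.trim()).filter(Boolean),
    imageSize: env.IMAGE_SIZE || "1920x1080",
    enableCodebuddy,
    // Follows ENABLE_CODEBUDDY; an explicit "0" force-disables search.
    enableWebSearch: env.ENABLE_WEB_SEARCH === "0" ? false : enableCodebuddy,
    enableOcr: flag("ENABLE_OCR", true),
    ocrTimeoutMs: num("OCR_TIMEOUT_MS", 25000),
    ocrMinConfidence: num("OCR_MIN_CONFIDENCE", 0.4),
  };
}

const cfg = loadConfig({ ENABLE_CODEBUDDY: "1", MAX_PARALLEL_CLICKS_PER_NODE: "2" });
console.log(cfg.enableWebSearch, cfg.maxParallelClicksPerNode, cfg.dbPath);
// true 2 server/data/flipbook.sqlite
```

In the server this would be called once as `loadConfig(process.env)`.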


