imcuttle/flipbook-app

🎨 Flipbook Canvas

English · 中文


✨ Click anywhere on a generated image. The backend infers what you clicked, searches the web when useful, generates a child diagram, and links it back. A flipbook of explorable knowledge, one click at a time.

💡 Inspired by, and a re-implementation of, the product idea behind flipbook.page. Credit to the original team for the click-to-explore canvas concept.

A long-running web product: Express + SSE backend, Vite + React + TS frontend, a pluggable multi-model image pipeline, web-search augmented planning, per-node concurrency, read-only share links, fullscreen casting and a fully responsive mobile layout.


✨ Why this is fun

Most "AI drawing" demos stop at one image. This one turns each image into a playable knowledge surface:

  • 🖱️ Long-press anywhere on a picture → the model reads what's under your finger, decides whether the topic needs fresh sources, optionally hits the web, then paints a brand-new annotated diagram zoomed into that concept.
  • 📚 Encyclopedia-style output: every node ships with a 150–220-char caption and 20–40 in-image labels (place names, dates, numbers…), all OCR'd back into a transparent text layer so you can drag-select and copy any fragment straight off the picture.
  • 🌳 Infinite tree of canvases: every click spawns a child node; the whole exploration tree is persisted, shareable, and replayable.

📸 Screenshots

  • Click-to-explore demo: long-press any region to drill in.
  • Woodpecker walkthrough: the end-to-end pipeline (search → planner → ImageGen → drill-down).
  • Gallery and canvas: every canvas is persisted, shareable, replayable.

🚀 Highlights

  • 🖱️ Click-to-explore: long-press (2 s) anywhere on a node's image. The backend infers the label, decides whether to web-search, then generates a child node. Spatial + semantic dedup means clicking the same region again jumps straight in.
  • ⚡ Per-node parallelism: up to 4 different spots in parallel per parent (configurable). Each in-flight click streams a phase chip (Inferring label… → Searching the web… → Generating image…) on the hotspot. Hit the cap and the cursor turns into ⌛.
  • 📖 Encyclopedia register: the planner produces 150–220-char captions with 20–40 in-image text fragments, like reading a richly annotated diagram in a children's encyclopedia.
  • 🌐 Web-search augmented: a "decide-then-search" gate asks the LLM whether a topic benefits from up-to-date sources. When yes, results are fetched and fed into the planner; sources are persisted to disk + DB and rendered as a 📚 hover badge over the canvas.
  • 🎬 Scene transitions: drill-in / drill-out / fade animations make navigation feel like a zooming flipbook rather than a page swap.
  • 🔗 Share as preview: any canvas → read-only ?s=<token> URL. Viewers can navigate and watch live SSE updates from in-flight generations, but cannot trigger new ones.
  • 📺 Fullscreen casting: ⛶ requests browser fullscreen; toggle the chrome (breadcrumb + caption + hint) on/off for a clean projection view.
  • 🔤 Selectable in-image text: every label baked into the diagram is OCR'd with Apple Vision (zh-Hans + en-US) and overlaid as invisible HTML, so users can drag-select and Cmd-C copy any text directly off the picture while the painted pixels remain the visual ground truth.
  • 📱 Mobile responsive: top bar collapses to icons, single-column gallery, smaller hotspots and pending bubbles.
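The spatial half of the dedup mentioned in the first highlight could look roughly like the sketch below. It assumes hotspots and clicks are stored as coordinates normalised to the image (0 to 1) and that a fixed radius decides a match. `findNearbyHotspot`, the field names, and the 0.04 radius are illustrative assumptions, not the repository's actual code. The semantic half (comparing inferred labels) is left out.

```javascript
// Hypothetical sketch: return the existing hotspot closest to a click if it
// lies within `radius` (in normalised image units), otherwise null.
function findNearbyHotspot(hotspots, click, radius = 0.04) {
  let best = null;
  let bestDist = Infinity;
  for (const hotspot of hotspots) {
    const dist = Math.hypot(hotspot.x - click.x, hotspot.y - click.y);
    if (dist <= radius && dist < bestDist) {
      best = hotspot;
      bestDist = dist;
    }
  }
  return best;
}

// Example data (made up): two hotspots already attached to a parent node.
const knownHotspots = [
  { x: 0.25, y: 0.4, childHash: 'abc' },
  { x: 0.7, y: 0.55, childHash: 'def' },
];
const hit = findNearbyHotspot(knownHotspots, { x: 0.26, y: 0.41 });
console.log(hit ? `reuse child ${hit.childHash}` : 'generate a new child');
```

Because the coordinates are normalised per axis, the same radius covers more pixels horizontally than vertically on a wide image; a real implementation might scale by the aspect ratio first.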

🤖 Multimodal × Mainstream LLMs

Flipbook Canvas is built around a pluggable multimodal pipeline. Four modalities are wired end-to-end:

| Modality | What it does | Pluggable into |
| --- | --- | --- |
| 📝 Text / JSON | LLM planner, click-label inference, decide-then-search verdict | any chat-completion-style model |
| 🖼️ Image generation | turns a structured prompt into a 2752×1536 annotated diagram with baked-in text labels | OpenAI, Nano Banana (Gemini), Seedream/Seeddance, or your own provider |
| 🌐 Web search | rephrased query → top-N normalized results → planner context + 📚 sources panel | any search backend |
| 👁️ OCR (Apple Vision) | zh-Hans + en-US recognition over every generated PNG, projected as a selectable HTML overlay | local, no API keys needed |
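To make the OCR row concrete, here is a minimal sketch of how recognised spans might be projected onto an HTML overlay. Apple Vision reports bounding boxes normalised to the image with the origin at the bottom-left, so the vertical axis has to be flipped for CSS. The observation shape (`text`, `confidence`, `box`) and the function name `toOverlaySpans` are assumptions for illustration; the 0.4 threshold mirrors the OCR_MIN_CONFIDENCE default.

```javascript
// Format a 0..1 fraction as a CSS percentage string.
const pct = (fraction) => `${(fraction * 100).toFixed(2)}%`;

// Hypothetical sketch: drop low-confidence spans and convert bottom-left
// normalised boxes into top-left CSS percentages.
function toOverlaySpans(observations, minConfidence = 0.4) {
  return observations
    .filter((obs) => obs.confidence >= minConfidence)
    .map((obs) => ({
      text: obs.text,
      style: {
        left: pct(obs.box.x),
        top: pct(1 - obs.box.y - obs.box.height), // flip the y axis
        width: pct(obs.box.width),
        height: pct(obs.box.height),
      },
    }));
}

// Example observations (made up).
const spans = toOverlaySpans([
  { text: 'Tongue', confidence: 0.9, box: { x: 0.1, y: 0.7, width: 0.2, height: 0.05 } },
  { text: '??', confidence: 0.2, box: { x: 0.5, y: 0.5, width: 0.1, height: 0.05 } },
]);
console.log(spans);
```

Percentages keep the overlay aligned when the image is scaled, which matters for the fullscreen and mobile layouts.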

The image layer is a provider chain (IMAGE_PROVIDER=...,svg): the first enabled provider wins, and svg is always appended last as a placeholder so the UI never breaks. Adding a new model is a single file:

// server/src/generation/providers/<name>.js
export default {
  name: 'my-model',
  enabled(config) { return Boolean(config.MY_API_KEY); },
  async generate({ imagePrompt, outputDir, size, title, hash, onEvent }) {
    // call your model, write <hash>.png into outputDir, push phase events
  },
};

Out of the box:

| Provider | Trigger to enable | Status |
| --- | --- | --- |
| openai | OPENAI_API_KEY set | 🔌 stub: implement in providers/openai.js |
| nanobanana | NANOBANANA_API_KEY or GEMINI_API_KEY | 🔌 stub |
| seeddance | SEEDDANCE_API_KEY or ARK_API_KEY | 🔌 stub |
| codebuddy | ENABLE_CODEBUDDY=1 | ✅ reference impl (used in the demo gif) |
| svg | always | ✅ fallback placeholder |
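The "first enabled provider wins" rule could be sketched as follows. The enable conditions are taken from the table above; the in-memory registry, the `resolveProvider` name, and the decision to silently skip unknown names are assumptions for illustration, not how server/src/generation/image.js necessarily works.

```javascript
// Hypothetical in-memory registry. In the repository each provider lives in
// its own file and also implements generate(); only enabled() matters here.
const providers = new Map([
  ['openai', { name: 'openai', enabled: (c) => Boolean(c.OPENAI_API_KEY) }],
  ['nanobanana', { name: 'nanobanana', enabled: (c) => Boolean(c.NANOBANANA_API_KEY || c.GEMINI_API_KEY) }],
  ['seeddance', { name: 'seeddance', enabled: (c) => Boolean(c.SEEDDANCE_API_KEY || c.ARK_API_KEY) }],
  ['codebuddy', { name: 'codebuddy', enabled: (c) => c.ENABLE_CODEBUDDY === '1' }],
  ['svg', { name: 'svg', enabled: () => true }],
]);

// Parse IMAGE_PROVIDER, force svg to the end of the chain, and return the
// first provider that is both known and enabled.
function resolveProvider(config) {
  const requested = String(config.IMAGE_PROVIDER || '')
    .split(',')
    .map((name) => name.trim())
    .filter(Boolean);
  const chain = [...requested.filter((name) => name !== 'svg'), 'svg'];
  return chain
    .map((name) => providers.get(name))
    .find((provider) => provider && provider.enabled(config));
}

console.log(resolveProvider({ IMAGE_PROVIDER: 'openai,nanobanana', GEMINI_API_KEY: 'k' }).name);
console.log(resolveProvider({ IMAGE_PROVIDER: 'codebuddy' }).name);
```

Since svg is always enabled and always last, the lookup cannot come back empty, which is the property that keeps the UI from breaking.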

🎯 The reference implementation wires the codebuddy CLI as a subprocess driver for planner / ImageGen / WebSearch. Subprocess lifecycle (concurrency cap, per-call timeouts, single retry, file-size sanity check on generated PNGs, graceful degradation) lives in server/src/codebuddyClient.js and is a useful template if you ever shell out to any CLI-based model.


🐦 Walkthrough β€” generating a woodpecker flipbook from zero

Type ε•„ζœ¨ιΈŸ (woodpecker) into the top bar and watch the entire pipeline run: decide-then-search β†’ planner β†’ ImageGen β†’ click to drill into the tongue anatomy / nest cavity / ant-foraging zones, each spawning its own annotated diagram with its own sources.


🗂️ Layout

.
├── prompts/                        # system / planner / click-label / image-prompt / decide-search
├── scripts/sync-prompts.mjs
├── server/
│   └── src/
│       ├── routes/                 # canvas, click, events (SSE), assets, share
│       ├── generation/
│       │   ├── pipeline.js         # generateRoot + expandFromClick + per-node concurrency
│       │   ├── decideSearch.js     # decide-then-search gate
│       │   ├── webSearch.js        # WebSearch subprocess + result normaliser
│       │   ├── queue.js            # PerCanvasQueue / Semaphore / PerKeySemaphore
│       │   ├── planner.js / clickLabel.js
│       │   ├── image.js            # provider-chain orchestrator
│       │   └── providers/          # codebuddy, openai, nanobanana, seeddance, svg
│       ├── db/                     # Sequelize models + hydrateFromDisk
│       ├── store/                  # filesystem layer
│       ├── sse/                    # event hub
│       └── codebuddyClient.js      # reference CLI-subprocess wrapper
└── web/                            # Vite + React + TS

💾 Storage

  • 📁 Filesystem (source of truth for big artifacts): server/data/canvases/<id>/{data/tree.json, data/nodes/<hash>.json, images/<hash>.{png,svg}, manifest.json}.
  • 🗃️ SQLite (server/data/flipbook.sqlite, via Sequelize): metadata index with Canvases / Nodes / Hotspots / ShareLinks / Sources tables. Drives the gallery, spatial dedup, share lookup, and sources hover panel. On boot the server runs hydrateFromDisk() to rebuild this index if it's missing.
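The on-disk layout in the first bullet can be captured in a small path helper. This is a sketch under the assumption that every path is derived from the data directory, a canvas id, and a node hash exactly as listed above; `canvasPaths` is a hypothetical name, and production code would more likely build paths with `node:path` than with string templates.

```javascript
// Hypothetical helper mirroring the documented layout:
// <dataDir>/canvases/<id>/{data/tree.json, data/nodes/<hash>.json,
//                           images/<hash>.{png,svg}, manifest.json}
function canvasPaths(dataDir, canvasId) {
  const root = `${dataDir}/canvases/${canvasId}`;
  return {
    root,
    manifest: `${root}/manifest.json`,
    tree: `${root}/data/tree.json`,
    node: (hash) => `${root}/data/nodes/${hash}.json`,
    image: (hash, ext = 'png') => `${root}/images/${hash}.${ext}`,
  };
}

// Example with made-up identifiers.
const paths = canvasPaths('server/data', 'c1');
console.log(paths.tree);
console.log(paths.image('9f2a', 'svg'));
```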

🛠️ Develop

npm install
npm run dev           # server on :8787 + Vite on :5173 in parallel

Open http://127.0.0.1:5173.

By default ENABLE_CODEBUDDY=0 (stub mode: fast, SVG placeholders, no LLM). Set ENABLE_CODEBUDDY=1 to use the reference CLI provider for planner + ImageGen + WebSearch:

ENABLE_CODEBUDDY=1 npm run dev:server

⏱️ With the reference provider, each node takes ~70–95 s end-to-end (planner ~25 s + ImageGen ~50–60 s including cold start; +5–15 s if web search runs). ImageGen produces 2752Γ—1536 PNG (~6 MB).

Per-node parallelism

Up to 4 click expansions per parent node run in parallel; excess clicks queue. Different parents and different canvases run independently. A per-parent write lock serializes only the short read-modify-write of the parent node JSON. Tunable via MAX_PARALLEL_CLICKS_PER_NODE (default 4).
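A keyed semaphore is one way to get "up to N per parent, excess clicks queue, different parents independent". The sketch below is a minimal illustration of that idea, not the `PerKeySemaphore` in queue.js; the class name `KeyedSemaphore` and its methods are assumptions.

```javascript
// Hypothetical sketch: at most `limit` tasks run concurrently per key;
// further tasks for the same key wait in FIFO order.
class KeyedSemaphore {
  constructor(limit) {
    this.limit = limit;
    this.state = new Map(); // key -> { active, waiters }
  }

  acquire(key) {
    let slot = this.state.get(key);
    if (!slot) {
      slot = { active: 0, waiters: [] };
      this.state.set(key, slot);
    }
    if (slot.active < this.limit) {
      slot.active += 1;
      return Promise.resolve();
    }
    return new Promise((resolve) => slot.waiters.push(resolve));
  }

  release(key) {
    const slot = this.state.get(key);
    const next = slot.waiters.shift();
    if (next) {
      next(); // hand the slot straight to the next waiter; active is unchanged
      return;
    }
    slot.active -= 1;
    if (slot.active === 0) this.state.delete(key);
  }

  async run(key, task) {
    await this.acquire(key);
    try {
      return await task();
    } finally {
      this.release(key);
    }
  }
}

// Example: three expansions under the same parent, capped at 4 in flight.
const limiter = new KeyedSemaphore(4);
const expand = (parentHash, spot) =>
  limiter.run(parentHash, async () => `expanded ${spot} under ${parentHash}`);
Promise.all(['beak', 'tongue', 'nest'].map((spot) => expand('root', spot)))
  .then((results) => console.log(results));
```

Keying by parent hash means a busy node never blocks clicks on its siblings or on other canvases.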

🔍 Web search

A pre-planner gate (decideSearch.js + prompts/decide-search.md) calls the LLM with the proposed subject and asks: do recent / authoritative sources materially improve this node? The default leans yes: only clearly abstract / timeless subjects skip search. When yes:

  1. The web-search backend runs with the rephrased query.
  2. Results are normalised into {title, url, snippet, source}.
  3. Top results are passed into the planner prompt.
  4. Sources are persisted both into nodes/<hash>.json and into the SQLite Sources table.
  5. The frontend renders a 📚 badge near the breadcrumb. Hover to see a popover with the source list (220 ms grace period so the popover is reachable with the mouse).
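Step 2 above, normalising results into {title, url, snippet, source}, might be sketched like this. The raw field aliases (`link`, `description`), the choice to derive `source` from the hostname, the URL dedup, and the `topN` default are all assumptions for illustration; the actual normaliser in webSearch.js may differ.

```javascript
// Hypothetical sketch: coerce heterogeneous raw results into
// {title, url, snippet, source}, dropping invalid or duplicate URLs.
function normaliseResults(rawResults, topN = 5) {
  const seen = new Set();
  const out = [];
  for (const raw of rawResults) {
    let parsed;
    try {
      parsed = new URL(raw.url || raw.link || '');
    } catch {
      continue; // skip entries without a parseable absolute URL
    }
    if (seen.has(parsed.href)) continue;
    seen.add(parsed.href);
    out.push({
      title: String(raw.title || parsed.hostname).trim(),
      url: parsed.href,
      snippet: String(raw.snippet || raw.description || '').trim(),
      source: parsed.hostname.replace(/^www\./, ''),
    });
    if (out.length === topN) break;
  }
  return out;
}

// Example input (made up).
console.log(normaliseResults([
  { title: ' Woodpecker ', link: 'https://www.example.org/woodpecker', description: 'Birds' },
  { title: 'duplicate', url: 'https://www.example.org/woodpecker' },
  { title: 'broken', url: 'not a url' },
]));
```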

🔗 Share / preview links

  • POST /api/canvas/:id/share → {token, url}. Reuses an existing token for the same canvas.
  • GET /api/share/:token → {canvasId, topic, readOnly:true}.
  • Frontend: opening …?s=<token> puts the UI in read-only preview mode: no topic input, no clicks on the image, and a "👁 Preview" badge in the corner. SSE stays connected, so a viewer watching mid-generation sees images stream in in real time.
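On the frontend, detecting preview mode comes down to reading the `s` query parameter and, if present, looking up the share via the GET endpoint above. A minimal sketch, assuming hypothetical helper names `readShareToken` and `viewerMode`; the real web/ code is TypeScript and may be structured differently.

```javascript
// Extract the share token from a page URL, or null when absent or blank.
function readShareToken(href) {
  const token = new URL(href).searchParams.get('s');
  return token && token.trim() ? token.trim() : null;
}

// Hypothetical helper: decide the UI mode and which endpoint to query.
function viewerMode(href) {
  const token = readShareToken(href);
  if (!token) return { readOnly: false };
  return {
    readOnly: true,
    token,
    metaUrl: `/api/share/${encodeURIComponent(token)}`,
  };
}

console.log(viewerMode('http://127.0.0.1:5173/?s=abc123'));
console.log(viewerMode('http://127.0.0.1:5173/'));
```

Treating the client flag as a display hint only is the safer design: the server still has to reject generation requests that arrive through a share token.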

📺 Fullscreen / casting

  • The ⛶ button in the TopBar requests browser fullscreen and falls back to CSS-only fullscreen on iOS Safari, where the API isn't supported.
  • The 👁 / 🚫 button (visible while in fullscreen) toggles the breadcrumb + caption + hint. Useful for clean projection.
  • The long-press hint is suppressed in fullscreen by default; the press itself still works.

🧹 Cleaning local state

npm run clean:data    # reset server/data (all canvases)
npm run clean:dist    # reset web/dist
npm run clean         # both

📦 Build for production

npm run build         # builds web/dist
npm start             # serves web/dist + API from :8787

⚙️ Configuration (env)

| Var | Default | Purpose |
| --- | --- | --- |
| PORT | 8787 | server port |
| HOST | 127.0.0.1 | server bind |
| DATA_DIR | server/data | canvas state on disk |
| PROMPTS_DIR | prompts | prompt files |
| DB_PATH | <DATA_DIR>/flipbook.sqlite | SQLite file |
| MAX_PARALLEL_CLICKS_PER_NODE | 4 | concurrent click expansions per parent |
| PLANNER_TIMEOUT_MS | 90000 | per-call planner timeout |
| IMAGE_TIMEOUT_MS | 180000 | per-call ImageGen timeout |
| WEB_SEARCH_TIMEOUT_MS | 60000 | per-call WebSearch timeout |
| IMAGE_PROVIDER | codebuddy | provider chain (e.g. openai,nanobanana,svg) |
| IMAGE_SIZE | 1920x1080 | requested size (provider may pick its own) |
| ENABLE_CODEBUDDY | 0 | flip to 1 to enable the reference CLI provider |
| ENABLE_WEB_SEARCH | follows ENABLE_CODEBUDDY | force-disable with 0 |
| ENABLE_OCR | 1 | run Apple Vision OCR on each generated PNG to produce a selectable text overlay; set to 0 to skip |
| OCR_TIMEOUT_MS | 25000 | per-call OCR timeout |
| OCR_MIN_CONFIDENCE | 0.4 | drop OCR spans below this confidence |
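The defaults in the table, including the two derived ones (DB_PATH from DATA_DIR, ENABLE_WEB_SEARCH from ENABLE_CODEBUDDY), can be expressed as a small loader. This is a sketch of the documented defaults only; the function name `loadConfig`, the camelCase keys, and the rule that only the string '1' counts as enabled are assumptions, and a few variables are omitted for brevity.

```javascript
// Hypothetical loader: read a plain env object and apply documented defaults.
function loadConfig(env) {
  const int = (name, fallback) => {
    const parsed = Number.parseInt(env[name] ?? '', 10);
    return Number.isFinite(parsed) ? parsed : fallback;
  };
  const flag = (name, fallback) =>
    (env[name] === undefined ? fallback : env[name] === '1');

  const dataDir = env.DATA_DIR || 'server/data';
  const enableCodebuddy = flag('ENABLE_CODEBUDDY', false);
  return {
    port: int('PORT', 8787),
    host: env.HOST || '127.0.0.1',
    dataDir,
    dbPath: env.DB_PATH || `${dataDir}/flipbook.sqlite`,
    maxParallelClicksPerNode: int('MAX_PARALLEL_CLICKS_PER_NODE', 4),
    plannerTimeoutMs: int('PLANNER_TIMEOUT_MS', 90000),
    imageTimeoutMs: int('IMAGE_TIMEOUT_MS', 180000),
    webSearchTimeoutMs: int('WEB_SEARCH_TIMEOUT_MS', 60000),
    imageProvider: env.IMAGE_PROVIDER || 'codebuddy',
    enableCodebuddy,
    // Web search follows ENABLE_CODEBUDDY unless explicitly overridden.
    enableWebSearch: flag('ENABLE_WEB_SEARCH', enableCodebuddy),
    enableOcr: flag('ENABLE_OCR', true),
  };
}

console.log(loadConfig({ ENABLE_CODEBUDDY: '1', ENABLE_WEB_SEARCH: '0', PORT: '9000' }));
```

In the server this would typically be called once with `process.env` at startup.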
