From dbf9eea479c87fb807aa1a21184c41b22e9695c2 Mon Sep 17 00:00:00 2001
From: Universe
Date: Sun, 22 Mar 2026 17:57:21 +0900
Subject: [PATCH] docs: document viewport culling investigation and findings

Linear O(n) viewport culling was benchmarked against real-world SVG
fixtures (up to 300K nodes). It regresses 8-13% on dense scenes where
most nodes are visible. Updated optimization.md item 12 to note the
spatial index requirement.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .../feat-2d/investigation-viewport-culling.md | 62 +++++++++++++++++++
 docs/wg/feat-2d/optimization.md               | 19 +-----
 2 files changed, 63 insertions(+), 18 deletions(-)
 create mode 100644 docs/wg/feat-2d/investigation-viewport-culling.md

diff --git a/docs/wg/feat-2d/investigation-viewport-culling.md b/docs/wg/feat-2d/investigation-viewport-culling.md
new file mode 100644
index 0000000000..6b125d95fa
--- /dev/null
+++ b/docs/wg/feat-2d/investigation-viewport-culling.md
@@ -0,0 +1,62 @@
+---
+title: "Investigation: Viewport Culling & Camera Caching"
+status: rejected
+date: 2026-03-22
+---
+
+# Investigation: Viewport Culling & Camera Caching
+
+## Hypothesis
+
+During view-only camera transforms (pan/zoom), skip drawing layers whose bounds fall outside the visible viewport. Cache Camera2D derived values (view matrix, inverse, zoom, world rect) to avoid redundant per-frame math.
+
+## What Was Tried
+
+1. **Camera2D caching** — `warm_cache()` precomputes the view matrix, its inverse, zoom, and world rect once per mutation. Read accessors (`view_matrix()`, `rect()`, `get_zoom()`, `screen_to_canvas_point()`) return cached fields in O(1).
+
+2. **Viewport culling** — Before each `draw_layer` call, check if the layer's render bounds (from `GeometryCache`) intersect the camera's world rect. Skip layers that are entirely off-screen.
+
+## Results
+
+### Synthetic scenes (Criterion, CPU raster, 1920x1080)
+
+Sparse grids where most nodes are off-screen:
+
+| Metric | 100 nodes | 1K nodes | 10K nodes |
+| ------ | --------- | -------- | --------- |
+| Pan    | ~same     | **−46%** | **−85%**  |
+| Zoom   | ~same     | **−32%** | **−81%**  |
+
+### Real-world SVGs (headless, CPU raster, 1920x1080)
+
+Dense content where most nodes overlap the viewport:
+
+| Scene                            | Nodes | Pan Δ      | Zoom Δ     |
+| -------------------------------- | ----- | ---------- | ---------- |
+| Koppen-Geiger climate map (96MB) | 235K  | **+8.7%**  | **+13.3%** |
+| San Francisco Bay map (40MB)     | 85K   | **+11.0%** | −7.3%      |
+| Lorenz 3D attractor (20MB)       | 300K  | +3.5%      | ~same      |
+| Lyon fortification map (30MB)    | 34    | −2.0%      | −3.0%      |
+| Propane flame contours (30MB)    | 1.8K  | −6.5%      | −3.3%      |
+
+## Why It Failed on Real Content
+
+Linear viewport culling is **O(n) per frame** — every node's bounds are checked against the viewport. For dense scenes (maps, scientific visualizations), nearly all nodes pass the intersection test, so the check is pure overhead.
+
+The synthetic benchmarks were misleading: a sparse grid at 10K nodes has ~90% of its nodes off-screen at any given viewport, so culling skips most work. Real documents are the opposite — content is concentrated in the viewport.
+
+## Conclusion
+
+- **Camera caching**: safe but negligible (~30ns/frame savings vs 200ms+ frame times)
+- **Linear viewport culling**: net negative on real content. Do not adopt without a spatial index.
+- **Actual bottleneck**: Skia path rasterization dominates frame time on large scenes (235K paths = 800ms). CPU-side culling cannot fix this.
+
+## What Would Actually Help
+
+Per items 5, 6, 12, and 36 in `optimization.md`:
+
+- **Spatial index** (R-tree/quadtree, item 36) would make culling O(log n) instead of O(n)
+- **Tile-based raster cache** (item 6) would avoid re-rasterizing static content on camera change
+- **SkPicture caching** (item 5) with dirty-region invalidation would let Skia replay recorded ops instead of re-drawing paths
+
+The draw stage (Skia path rasterization) is where 95%+ of frame time goes on large scenes. Optimizations must target that.
diff --git a/docs/wg/feat-2d/optimization.md b/docs/wg/feat-2d/optimization.md
index 388668e21a..ab09a50f34 100644
--- a/docs/wg/feat-2d/optimization.md
+++ b/docs/wg/feat-2d/optimization.md
@@ -11,17 +11,14 @@ A summary of all discussed optimization techniques for achieving high-performanc
 ## Transform & Geometry
 
 1. **Transform Cache**
-
    - Store `local_transform` and derived `world_transform`.
    - Use dirty flags and top-down updates.
 
 2. **Geometry Cache**
-
    - Cache `local_bounds`, `world_bounds`.
    - Used for culling, layout, and hit-testing.
 
 3. **Flat Scene Graph + Parent Pointers**
-
    - Flat arena with parent/children relationships.
    - Enables O(1) access and traversal.
 
@@ -30,17 +27,14 @@ A summary of all discussed optimization techniques for achieving high-performanc
 ## Rendering Pipeline
 
 4. **GPU Acceleration (Skia Backend::GL/Vulkan)**
-
    - Use hardware compositing, filters, transforms.
 
 5. **Scene-Level Picture Caching**
-
    - Use `SkPicture` to record full-scene vector draw ops.
    - Serves as the always-up-to-date canonical snapshot.
    - Resolution-independent; ideal for rerendering or tile regeneration.
 
 6. **Tile-Based Raster Cache (Hybrid Rendering)**
-
    - Render the full viewport, take snapshot. debounced (after no more changes. e.g. 150ms)
    - Divide the snapshot into fixed-size tiles (e.g., 512×512).
    - When new area discovered, render the cached, non-overlapping parts with tile cache. only render newly discovered area.
@@ -48,24 +42,19 @@ A summary of all discussed optimization techniques for achieving high-performanc
    - Optional padding per tile to account for effects (blur, shadows).
 
 7. **Dynamic Mode Switching (Picture vs Tile)**
-
    - Render from `SkPicture` directly during normal zoom or active edits.
    - Fallback to raster tiles for zoomed-out or complex views.
    - Tile invalidation/redraw is driven by zoom level, camera transform, or frame budget.
 
 8. **Dirty & Re-Cache Strategy**
-
    - Nodes marked dirty will trigger re-recording of affected picture regions or tiles.
    - Use change tracking to only re-record minimum needed areas.
    - Recording large subtrees is expensive—optimize granularity based on tree structure.
 
 9. **Scene Cache Config / Strategy**
-
    - Defines how scene caching is organized.
    - Properties include:
-
      - `depth`:
-
        - `0` → Entire scene is one cache.
        - `1` → Cache per top-level container.
        - `n` → Cache at depth `n`, chunking deeper layers.
@@ -83,10 +72,8 @@ A summary of all discussed optimization techniques for achieving high-performanc
    - Cache accessors like `get_picture_cache_by_id()` support scoped re-rendering.
 
 10. **Will-Change Optimization**
-
     - Nodes marked with "will-change" are expected to become dirty soon.
     - Examples:
-
       - Image node waiting on async src resolution
       - Text node waiting on font availability
 
@@ -94,9 +81,7 @@ A summary of all discussed optimization techniques for achieving high-performanc
     - Prevents re-recording full subtrees—minimizes recording cost.
 
 11. **Flattened Render Command List**
-
     - Scene is compiled into a flat list of `RenderCommand` structs with resolved:
-
       - Transform
       - Clip bounds
       - Opacity
@@ -128,9 +113,8 @@ A summary of all discussed optimization techniques for achieving high-performanc
     - This model is essential for dynamic caching, parallel planning, and GPU-aware scheduling.
 
 12. **Dirty-Region Culling**
-
     - Use camera’s `visible_rect` to cull `world_bounds`.
-    - Optional: accelerate with quadtree or BVH.
+    - **Requires spatial index** (quadtree or BVH, see item 36). Linear O(n) culling was benchmarked and caused an 8-13% regression on dense real-world scenes (up to 235K nodes) because the per-node bounds check adds overhead when most nodes are visible. See `investigation-viewport-culling.md` for full data.
 
 13. **Minimize Canvas State Changes**
 
@@ -309,7 +293,6 @@ Even if content is temporarily low-res, the tool still feels precise.
 ## Text & Glyph Optimization
 
 29. **Glyph Cache (Atlas or Paragraph Caching)**
-
     - Cache rasterized or vector glyphs used across the document.
     - Prevents redundant layout or rendering of text.
     - Essential for high-DPI or frequently zoomed views.
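
The trade-off measured in `investigation-viewport-culling.md` can be illustrated with a minimal Rust sketch. It assumes a Rust codebase, which the Criterion benchmarks suggest. The sketch contrasts the rejected linear pass (one bounds test per layer per frame) with a query through a spatial index. The names `Rect`, `cull_linear`, `UniformGrid`, and `cell_range` are hypothetical and are not the engine's `GeometryCache` or `Camera2D` API. A uniform grid also stands in for the R-tree/quadtree that item 36 recommends, chosen only because it fits in a few lines.

```rust
use std::collections::{HashMap, HashSet};

/// Axis-aligned bounds in world space (hypothetical stand-in for the
/// render bounds a geometry cache would provide).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x: f32,
    pub y: f32,
    pub w: f32,
    pub h: f32,
}

impl Rect {
    /// True when the two rectangles overlap with positive area.
    pub fn intersects(&self, other: &Rect) -> bool {
        self.x < other.x + other.w
            && other.x < self.x + self.w
            && self.y < other.y + other.h
            && other.y < self.y + self.h
    }
}

/// Linear culling (the rejected approach): one bounds test per layer per
/// frame, so the cost is O(n) even when every layer ends up visible.
pub fn cull_linear(bounds: &[Rect], viewport: &Rect) -> Vec<usize> {
    bounds
        .iter()
        .enumerate()
        .filter(|(_, b)| b.intersects(viewport))
        .map(|(i, _)| i)
        .collect()
}

/// Uniform-grid spatial index (hypothetical; a simpler stand-in for an
/// R-tree or quadtree). Each layer index is stored in every cell its
/// bounds touch. Rebuild it when layer bounds change, not on camera moves.
pub struct UniformGrid {
    cell: f32,
    cells: HashMap<(i32, i32), Vec<usize>>,
}

impl UniformGrid {
    pub fn build(bounds: &[Rect], cell: f32) -> Self {
        let mut cells: HashMap<(i32, i32), Vec<usize>> = HashMap::new();
        for (i, b) in bounds.iter().enumerate() {
            let (x0, y0, x1, y1) = cell_range(b, cell);
            for cy in y0..=y1 {
                for cx in x0..=x1 {
                    cells.entry((cx, cy)).or_default().push(i);
                }
            }
        }
        UniformGrid { cell, cells }
    }

    /// Visits only the cells overlapping the viewport, then applies the exact
    /// test to their candidates. `bounds` must be the slice used in `build`.
    pub fn query(&self, bounds: &[Rect], viewport: &Rect) -> Vec<usize> {
        let (x0, y0, x1, y1) = cell_range(viewport, self.cell);
        let mut seen = HashSet::new();
        let mut visible = Vec::new();
        for cy in y0..=y1 {
            for cx in x0..=x1 {
                for &i in self.cells.get(&(cx, cy)).into_iter().flatten() {
                    // A layer spanning several cells is tested only once.
                    if seen.insert(i) && bounds[i].intersects(viewport) {
                        visible.push(i);
                    }
                }
            }
        }
        // Return ascending layer indices, matching `cull_linear`.
        visible.sort_unstable();
        visible
    }
}

/// Inclusive range of grid cells covered by `r`.
fn cell_range(r: &Rect, cell: f32) -> (i32, i32, i32, i32) {
    (
        (r.x / cell).floor() as i32,
        (r.y / cell).floor() as i32,
        ((r.x + r.w) / cell).floor() as i32,
        ((r.y + r.h) / cell).floor() as i32,
    )
}
```

Both paths return the same layer indices in ascending order. The grid visits only the cells overlapping the viewport, so its cost follows the cells and candidates it touches rather than the total layer count. When nearly every layer is visible, as in the dense maps above, the query still returns nearly all of them. An index therefore only removes the wasted checks on off-screen layers and does nothing for the Skia rasterization cost the note identifies. A uniform grid also degrades when zoomed far out, because the viewport then spans many cells; a quadtree or R-tree avoids that.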