perf(cg): render cost prediction module, prefill skip optimization, and Skia benchmarks #619

softmarshmallow merged 8 commits into `main`
Conversation
Add a monotonically increasing generation counter to PictureCache that increments on any mutation (insert, invalidate). The prefill loop now compares the current generation, variant key, and layer count against the last successful prefill — when all match, the entire O(N) iteration is skipped in O(1). On 135K-node scenes, this eliminates ~800µs of HashMap lookups per cache-warm frame.

Measured improvements on 01-135k.perf.grida:

- rt_pan_fast_fit p50: 111µs → 76µs (-32%)
- rt_pan_fast_fit p95: 263µs → 153µs (-42%)
- pan_settle_slow_fit settle: 1034µs → 709µs (-31%)
- Criterion large_baseline/pan: -14.0% (p < 0.01)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
perf(cg): skip prefill loop on cache-warm frames via generation tracking
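A minimal sketch of the mechanism described in this commit (the crate's real `PictureCache` and prefill types differ; the struct and field names below are illustrative stand-ins):

```rust
use std::collections::HashMap;

// Toy model of a picture cache whose generation counter is bumped on
// every mutation, so "nothing changed" can be proven in O(1).
struct PictureCache {
    store: HashMap<u64, Vec<u8>>, // node id -> recorded picture (stand-in)
    generation: u64,
}

impl PictureCache {
    fn new() -> Self {
        Self { store: HashMap::new(), generation: 0 }
    }
    fn insert(&mut self, id: u64, picture: Vec<u8>) {
        self.store.insert(id, picture);
        self.generation = self.generation.wrapping_add(1);
    }
}

// State captured at the last successful prefill.
struct PrefillState {
    last_generation: u64,
    last_variant_key: u64,
    last_layer_count: usize,
}

impl PrefillState {
    // O(1) warm-frame check that replaces the O(N) per-node lookup loop.
    fn can_skip(&self, cache: &PictureCache, variant_key: u64, layer_count: usize) -> bool {
        cache.generation == self.last_generation
            && variant_key == self.last_variant_key
            && layer_count == self.last_layer_count
    }
}

fn main() {
    let mut cache = PictureCache::new();
    cache.insert(1, Vec::new());
    let state = PrefillState {
        last_generation: cache.generation,
        last_variant_key: 7,
        last_layer_count: 1,
    };
    // Warm frame: everything matches, so the O(N) loop is skipped.
    assert!(state.can_skip(&cache, 7, 1));
    // Any mutation bumps the generation and forces a real prefill.
    cache.insert(2, Vec::new());
    assert!(!state.can_skip(&cache, 7, 1));
}
```

Note the deliberate trade-off: the counter over-invalidates (any mutation forces one full prefill pass) in exchange for a branch-cheap skip on every unchanged frame.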
Add reference document (render-cost-prediction.md) with measured GPU operation costs, Skia blur algorithm analysis, blend mode tiers, and cache hit/miss ratios. All claims labeled as FACT/BENCHMARK/INFERENCE/HEURISTIC. Key finding: fixed per-operation overhead (FBO allocation, ~11-110µs) dominates over pixel-proportional cost at typical node sizes.

Add three validation benchmarks:

- skia_bench_cost_model: per-effect cost at 50²–4000², linearity, blur radius dependence, two-component formula extraction
- skia_bench_cache_blit: cache hit vs miss ratio (~0.05×), blit constancy
- skia_bench_scene_scale: full Renderer pipeline at 1K–136K nodes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Walkthrough

The PR introduces a GPU rendering cost prediction framework with per-node fixed overhead estimation, picture cache generation tracking to skip redundant prefill iterations, three GPU benchmark examples measuring Skia performance under varying conditions, and supporting documentation describing the cost model and cache optimization.
Move render cost estimation into isolated `runtime::cost_prediction` module. Compute `predicted_cost_us` in FramePlan and display in devtools overlay as `pred: Xµs (Y×)` for correlation against actual frame times. This is instrumentation only — no behavioral changes to pan/zoom cache blits, stable/unstable promotion, downscale rendering, or effect quality. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
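The shape of such a predictor can be sketched as a two-component sum, following the document's finding that fixed per-operation overhead dominates at typical node sizes. The constants below are illustrative placeholders, not the crate's tuned values:

```rust
// Hypothetical two-component cost model: fixed save_layer/FBO overhead
// plus a small pixel-proportional term. All values are illustrative.
const SAVE_LAYER_FIXED_US: f32 = 11.0; // low end of the measured ~11-110µs range
const PER_PIXEL_US: f32 = 0.000_05;    // placeholder fill-rate term

fn node_cost_us(width: f32, height: f32, has_effect: bool) -> f32 {
    // Nodes with effects pay the fixed offscreen-layer overhead.
    let fixed = if has_effect { SAVE_LAYER_FIXED_US } else { 0.0 };
    fixed + width * height * PER_PIXEL_US
}

// Frame prediction is just the sum over visible nodes.
fn predicted_cost_us(nodes: &[(f32, f32, bool)]) -> f32 {
    nodes.iter().map(|&(w, h, e)| node_cost_us(w, h, e)).sum()
}

fn main() {
    // For a 200×200 node with an effect, the fixed term (11µs) dwarfs
    // the pixel term (200 × 200 × 0.00005 = 2µs).
    let one = node_cost_us(200.0, 200.0, true);
    assert!((one - 13.0).abs() < 1e-3);
    let frame = predicted_cost_us(&[(200.0, 200.0, true), (100.0, 50.0, false)]);
    assert!(frame > one);
}
```

Because this is instrumentation only, a model of this shape can be validated purely by correlating `pred: Xµs` against actual frame times in the overlay before any behavior depends on it.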
This new HTML fixture illustrates various CSS margin behaviors, including margin collapse, negative margins, and auto margins. It provides visual examples and descriptions to aid understanding of how margins interact in different scenarios, enhancing the testing framework for CSS-related features.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 32e43feba3
```rust
if current_gen == self.last_prefill_generation
    && effective_key_for_tracking == self.last_prefill_variant_key
    && layer_count == self.last_prefill_layer_count
{
```
Track visible-node identity before skipping prefill
The new fast path returns when the cache generation, variant key, and layer_count match, but layer_count does not guarantee the same set of visible nodes. During a pan over a uniform-density scene, the viewport can switch to a different set of nodes with the same count, so this guard skips prefill even though those nodes have no recorded pictures. Since the draw fallback renders misses directly (without inserting into scene_cache.picture), the misses can persist across frames until an unrelated cache mutation occurs, causing sustained cache-hit collapse and interaction slowdowns.
Actionable comments posted: 6
🧹 Nitpick comments (2)
crates/grida-canvas/src/cache/picture.rs (1)
121-124: Avoid bumping generation on no-op node invalidation.

`invalidate_node` currently increments `generation` even when the node is absent from both stores, which can unnecessarily disable the prefill-skip fast path on subsequent frames.

♻️ Suggested refinement

```diff
 pub fn invalidate_node(&mut self, id: NodeId) {
-    self.default_store.remove(&id);
-    self.variant_store.retain(|&(nid, _), _| nid != id);
-    self.generation = self.generation.wrapping_add(1);
+    let mut changed = self.default_store.remove(&id).is_some();
+    let before = self.variant_store.len();
+    self.variant_store.retain(|&(nid, _), _| nid != id);
+    changed |= self.variant_store.len() != before;
+    if changed {
+        self.generation = self.generation.wrapping_add(1);
+    }
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/grida-canvas/src/cache/picture.rs` around lines 121 - 124, invalidate_node is incrementing self.generation even when no entry is removed; change it so generation is only bumped when the call actually removes something. Use the return value of self.default_store.remove(&id) (Option) and check whether self.variant_store changed (e.g., capture its length before and after retain or use a drain_filter/retain pattern that records removals) and only call self.generation = self.generation.wrapping_add(1) when either store had entries removed; reference the invalidate_node method and the fields default_store, variant_store, and generation.

crates/grida-canvas/src/runtime/cost_prediction.rs (1)
116-137: Noise effects are currently unaccounted for in node-cost estimation.

The estimator sums several effect classes but omits `LayerEffects.noises`, which can materially underpredict mixed-effect scenes and reduce overlay/model trust.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/grida-canvas/src/runtime/cost_prediction.rs` around lines 116 - 137, The cost estimator omits noise effects from LayerEffects; update the runtime cost calculation to include effects.noises similar to the shadows block: iterate over &effects.noises, match the noise variant (e.g., FilterNoiseEffect::Grain / other variants) and when a noise entry is active add the appropriate noise cost constant (e.g., COST_NOISE_US) possibly combined with any parameter-based cost (max with a computed noise_cost function or blur_cost_us if applicable) to the cost accumulator; apply the same inclusion at the other occurrence noted (around the later block at 152-153) so noises are considered in both places.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cccc6f1c-9096-4695-8808-a8951c515a17
📒 Files selected for processing (12)
- crates/grida-canvas/Cargo.toml
- crates/grida-canvas/examples/skia_bench/skia_bench_cache_blit.rs
- crates/grida-canvas/examples/skia_bench/skia_bench_cost_model.rs
- crates/grida-canvas/examples/skia_bench/skia_bench_scene_scale.rs
- crates/grida-canvas/src/cache/picture.rs
- crates/grida-canvas/src/runtime/cost_prediction.rs
- crates/grida-canvas/src/runtime/mod.rs
- crates/grida-canvas/src/runtime/scene.rs
- crates/grida-canvas/src/window/application.rs
- docs/wg/feat-2d/optimization.md
- docs/wg/feat-2d/render-cost-prediction.md
- fixtures/test-html/L0/box-margin.html
```rust
// Cache miss: full rasterize
let miss_us = bench_draw(surface, &|canvas| {
    (effect.draw)(canvas, dst_rect);
});

// Capture to GPU texture
let cached_image = capture_to_image(surface, size, &*effect.draw);
```
Miss(µs) is timing a direct draw, not a cache miss.
miss_us is recorded before the offscreen render + snapshot needed to populate the cache. That means the printed ratio is comparing blit vs draw, not blit vs miss + populate, so the "cache hit vs. miss" numbers are systematically optimistic. Either time the populate step inside the miss path or rename the benchmark/columns to reflect what is actually being measured.
Also applies to: 210-215
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/grida-canvas/examples/skia_bench/skia_bench_cache_blit.rs` around
lines 197 - 203, The benchmark currently records miss_us by timing only
bench_draw (a direct draw) then separately calls capture_to_image to populate
the cache, so the reported "miss" time omits the populate cost; update the miss
measurement to include the offscreen render + snapshot by timing the combined
operations (call bench_draw and capture_to_image inside the same timed block
that sets miss_us), or alternatively rename the output/columns to reflect that
miss_us is actually "draw_only"; locate the timing variables around bench_draw,
capture_to_image, and cached_image (and the equivalent block later around the
second benchmark) and adjust them consistently so the printed "hit vs miss"
ratio measures draw+populate vs blit or rename both labels to avoid misleading
results.
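A sketch of the first suggested fix, timing draw and populate inside one block; `render_once` and `populate_cache` are hypothetical stand-ins for the benchmark's `bench_draw` and `capture_to_image`:

```rust
use std::time::{Duration, Instant};

// Stand-ins for the real draw and cache-populate steps.
fn render_once() { std::thread::sleep(Duration::from_micros(50)); }
fn populate_cache() { std::thread::sleep(Duration::from_micros(30)); }

fn main() {
    // Time both steps together so "miss" covers the full miss path:
    // the direct draw plus the offscreen render/snapshot that actually
    // fills the cache.
    let start = Instant::now();
    render_once();
    populate_cache();
    let miss_us = start.elapsed().as_micros();
    // sleep guarantees at least the requested duration, so the combined
    // measurement can never fall below the two costs together.
    assert!(miss_us >= 80);
}
```

The alternative (renaming the column to `draw_only_us`) changes no code but makes the printed ratio honest about what it measures.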
```rust
const W: i32 = 1000;
const H: i32 = 1000;
const WARMUP: u32 = 10;
const ITERS: u32 = 50;
```
The largest samples overflow the benchmark surface, so the area model is wrong.
With W/H = 1000, the 2000 and 4000 cases are clipped by the render target while the later analysis still treats them as 4M and 16M visible pixels. That invalidates the upper end of the linearity table and any fill-rate / C_fixed + area × C_per_pixel numbers derived from those samples. Build the surface per test size, or make the benchmark surface at least as large as the maximum size before using size² as the ground-truth area.
Also applies to: 49-57, 366-367
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/grida-canvas/examples/skia_bench/skia_bench_cost_model.rs` around
lines 26 - 29, The benchmark currently uses fixed constants W and H = 1000 so
large test sizes (e.g., 2000, 4000) are clipped and the area model is invalid;
change the benchmark to either (a) create the Skia render surface for each test
using the current test size (use the loop variable that represents sample size —
e.g., the size or sample_size used in the benchmark) so the surface dimensions
match the test, or (b) set W and H to the maximum test dimension (the largest
sample size) before running all tests so size*size equals the true visible
pixels; update surface construction code (where the Skia surface/context is
created) rather than relying on the fixed W/H constants.
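The geometry behind this finding can be checked directly (a standalone illustration of the clipping arithmetic, not the benchmark's code):

```rust
// Pixels actually rendered when a square sample is drawn into a
// surface of the given dimensions: clipping caps each axis.
fn visible_pixels(sample: i64, surface_w: i64, surface_h: i64) -> i64 {
    sample.min(surface_w) * sample.min(surface_h)
}

fn main() {
    // With a fixed 1000×1000 surface, the 2000² and 4000² samples both
    // clip to the same 1M pixels, while the analysis assumes 4M and 16M.
    assert_eq!(visible_pixels(2000, 1000, 1000), 1_000_000);
    assert_eq!(visible_pixels(4000, 1000, 1000), 1_000_000);
    // Sizing the surface to the sample restores the ground-truth area.
    assert_eq!(visible_pixels(4000, 4000, 4000), 16_000_000);
}
```

Either fix works because both make `size * size` equal the pixels the GPU actually fills, which is the quantity the C_fixed + area × C_per_pixel regression assumes.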
```rust
println!();
println!("═══════════════════════════════════════════════════════════════════════════════════════════════════");
println!(" SECTION 1: Frame Time vs. Node Count (unstable frames, full Renderer pipeline)");
```
Section label says “unstable” but the benchmark runs stable frames.
Line 267 should match the actual run mode (queue_stable + stable-frame comments) to avoid misleading report consumers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/grida-canvas/examples/skia_bench/skia_bench_scene_scale.rs` at line
267, The printed section label is incorrect: change the string in the println
call that currently reads "SECTION 1: Frame Time vs. Node Count (unstable
frames, full Renderer pipeline)" to indicate stable frames instead (e.g.,
"SECTION 1: Frame Time vs. Node Count (stable frames, full Renderer pipeline)")
so it matches the actual run mode using queue_stable and the surrounding
stable-frame comments in skia_bench_scene_scale.rs; update any nearby comments
if needed to keep labels consistent with the queue_stable execution path.
```rust
/// Picture cache generation + variant key at the time of the last
/// successful prefill. When the cache generation and variant key
/// match, the prefill loop can be skipped entirely — all pictures
/// are already cached from a previous frame.
last_prefill_generation: u64,
last_prefill_variant_key: u64,
last_prefill_layer_count: usize,
```
Track a fingerprint of the visible nodes, not just layer_count.
generation + variant_key + layer_count is not enough to prove this plan is warm. A pan from one 5k-layer window to another 5k-layer window with no picture-cache mutations will hit this fast path even if none of the newly visible nodes have been recorded yet, so prefill gets skipped on a cold frame. Including a stable fingerprint of the visible ids/indices in the tracking key would make the skip condition sound again.
Also applies to: 401-429, 480-489
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/grida-canvas/src/runtime/scene.rs` around lines 366 - 372, The current
warm-path check for skipping prefill uses last_prefill_generation,
last_prefill_variant_key and last_prefill_layer_count but needs a stable
fingerprint of the actually visible node ids/indices to avoid skipping when the
visible window changes; add a new field (e.g. last_prefill_visible_fingerprint:
u64) to the Scene state and compute a stable hash/fingerprint of the
set/sequence of visible node ids or indices at the time of prefill (use a
deterministic order), then include that fingerprint in every warm-path
comparison where last_prefill_generation/variant_key/layer_count are compared
(including the other affected blocks around the indicated ranges) so the prefill
is only skipped when generation, variant key, layer count AND
visible-fingerprint all match. Ensure the fingerprint is updated whenever
prefill runs and used in the skip decision.
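The suggested fingerprint can be sketched with a deterministic hash over the visible ids; the hasher choice and names are illustrative (the crate might prefer a seeded or explicitly stable hasher):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Deterministic within a process: DefaultHasher::new() is documented
// to use fixed keys, so the same id sequence in the same order always
// yields the same fingerprint.
fn visible_fingerprint(visible_ids: &[u64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for id in visible_ids {
        id.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    let window_a = visible_fingerprint(&[1, 2, 3, 4, 5]);
    let window_b = visible_fingerprint(&[6, 7, 8, 9, 10]); // same count, different nodes
    // layer_count alone cannot tell these windows apart; the
    // fingerprint distinguishes them, so the warm-path skip stays sound.
    assert_eq!(window_a, visible_fingerprint(&[1, 2, 3, 4, 5]));
    assert_ne!(window_a, window_b);
}
```

Folding this value into the `(generation, variant_key, layer_count)` key costs one hash pass over the visible list per prefill, which is still far cheaper than the HashMap lookups the skip avoids.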
```rust
compositor_indices: Vec::new(),
display_list_duration: Duration::ZERO,
display_list_size_estimated: 0,
predicted_cost_us: 0.0,
```
Zero the metric on render_frame() cache hits too.
These assignments only cover the deferred-plan path. flush_with_plan() goes straight through render_frame(), and its pan/zoom cache-hit returns still hand back the original plan unchanged, so the overlay can show a full-draw prediction on an O(1) blit frame.
🔧 Minimal follow-up
```diff
-    return FrameFlushStats {
-        frame: plan,
+    let mut frame = plan.clone();
+    frame.predicted_cost_us = 0.0;
+    return FrameFlushStats {
+        frame,
```

Apply the same adjustment in both cache-hit return branches inside `Renderer::render_frame`.
Also applies to: 1186-1186, 1204-1204
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/grida-canvas/src/runtime/scene.rs` at line 1128,
Renderer::render_frame currently zeroes predicted_cost_us only on the
deferred-plan path; update both cache-hit return branches inside
Renderer::render_frame so they also set predicted_cost_us to 0.0 (the same
metric reset you applied in flush_with_plan), ensuring any cache-hit early
returns return a plan/overlay with predicted_cost_us = 0.0; locate the two
cache-hit return points in render_frame and add the same assignment to the
returned Plan/Overlay data structure.
| ``` | ||
| save_layer_cost = layer_bounds_area × zoom² × 2 (write to offscreen + read back) | ||
| σ ≤ 4.0 and small kernel → single 2D convolution pass (≤28 samples) | ||
| σ ≤ 4.0 → two separable 1D passes | ||
| σ > 4.0 → downsample until σ ≤ 4.0, blur, upsample (recursive) | ||
| ``` |
Add language identifiers to fenced code blocks (MD040).
Three fenced blocks are missing language tags, which is currently flagged by markdownlint.
🛠️ Minimal fix
````diff
-```
+```text
 σ ≤ 4.0 and small kernel → single 2D convolution pass (≤28 samples)
 σ ≤ 4.0 → two separable 1D passes
 σ > 4.0 → downsample until σ ≤ 4.0, blur, upsample (recursive)
 ```

-```
+```text
 frame_cost ≈ Σ visible_nodes(
   if cache_hit: ~5 µs
   if cache_miss: C_fixed(effect_type)
 )
 ```

-```
+```text
 save_layer_overhead_us = measured via single save_layer + draw + restore
 pixels_per_ms = measured via full-screen solid rect
 ```
````
Also applies to: 246-251, 263-266
🧰 Tools

🪛 markdownlint-cli2 (0.22.0)

[warning] 103-103: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@docs/wg/feat-2d/render-cost-prediction.md` around lines 103 - 107, the
markdown has three fenced code blocks missing language identifiers (MD040); add
a language tag such as "text" to each opening triple-backtick for the blocks
containing the lines starting with "σ ≤ 4.0 and small kernel → single 2D
convolution pass (≤28 samples)", the block starting with "frame_cost ≈ Σ
visible_nodes(", and the block starting with "save_layer_overhead_us = measured
via single save_layer + draw + restore" (also apply the same fix to the other
occurrences noted around lines 246-251 and 263-266): change ``` to ```text for
each of those fenced code blocks so markdownlint no longer flags MD040.
Summary

- Prefill skip via generation tracking on `PictureCache`. On 135K-node scenes, eliminates ~800µs of HashMap lookups per warm frame (p50 pan: 111µs → 76µs, -32%).
- New `runtime::cost_prediction` module. Computes `predicted_cost_us` in `FramePlan` and displays it in the devtools overlay (`pred: Xµs (Y×)`) — instrumentation only, no behavioral changes.
- `render-cost-prediction.md` reference with measured GPU operation costs, Skia blur analysis, blend mode tiers, and cache hit/miss ratios. All claims labeled as FACT/BENCHMARK/INFERENCE/HEURISTIC.
- Three Skia benchmarks: `skia_bench_cost_model` (per-effect cost), `skia_bench_cache_blit` (cache hit vs miss ratio), `skia_bench_scene_scale` (full pipeline at 1K–136K nodes).
- `box-margin.html` fixture demonstrating CSS margin collapse, negative margins, and auto margins.

Measured improvements (01-135k.perf.grida)

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| `rt_pan_fast_fit` p50 | 111µs | 76µs | -32% |
| `rt_pan_fast_fit` p95 | 263µs | 153µs | -42% |
| `pan_settle_slow_fit` settle | 1034µs | 709µs | -31% |
| Criterion `large_baseline/pan` | — | — | -14.0% (p < 0.01) |

Changed files

- `crates/grida-canvas/src/cache/picture.rs` — generation counter on PictureCache
- `crates/grida-canvas/src/runtime/cost_prediction.rs` — new cost prediction module
- `crates/grida-canvas/src/runtime/scene.rs` — prefill skip logic, cost prediction integration
- `crates/grida-canvas/src/window/application.rs` — devtools overlay for predicted cost
- `crates/grida-canvas/examples/skia_bench/` — three new validation benchmarks
- `docs/wg/feat-2d/render-cost-prediction.md` — reference document
- `docs/wg/feat-2d/optimization.md` — updated optimization notes
- `fixtures/test-html/L0/box-margin.html` — CSS margin test fixture
Release Notes
New Features
Performance Improvements
Documentation