
perf(cg): render cost prediction module, prefill skip optimization, and Skia benchmarks #619

Merged
softmarshmallow merged 8 commits into main from canary on Apr 1, 2026
Conversation

@softmarshmallow softmarshmallow commented Apr 1, 2026

Summary

  • perf(cg): Skip the O(N) prefill loop on cache-warm frames via a generation counter on PictureCache. On 135K-node scenes, eliminates ~800µs of HashMap lookups per warm frame (p50 pan: 111µs → 76µs, -32%).
  • refactor(cg): Extract render cost prediction into an isolated runtime::cost_prediction module. Computes predicted_cost_us in FramePlan and displays it in the devtools overlay (pred: Xµs (Y×)) — instrumentation only, no behavioral changes.
  • docs(cg): Add render-cost-prediction.md reference with measured GPU operation costs, Skia blur analysis, blend mode tiers, and cache hit/miss ratios. All claims labeled as FACT/BENCHMARK/INFERENCE/HEURISTIC.
  • bench(cg): Add three Skia validation benchmarks — skia_bench_cost_model (per-effect cost), skia_bench_cache_blit (cache hit vs miss ratio), skia_bench_scene_scale (full pipeline at 1K–136K nodes).
  • feat(tests): Add box-margin.html fixture demonstrating CSS margin collapse, negative margins, and auto margins.

Measured improvements (01-135k.perf.grida)

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| rt_pan_fast_fit p50 | 111µs | 76µs | -32% |
| rt_pan_fast_fit p95 | 263µs | 153µs | -42% |
| pan_settle_slow_fit settle | 1034µs | 709µs | -31% |
| Criterion large_baseline/pan | | | -14.0% (p < 0.01) |

Changed files

  • crates/grida-canvas/src/cache/picture.rs — generation counter on PictureCache
  • crates/grida-canvas/src/runtime/cost_prediction.rs — new cost prediction module
  • crates/grida-canvas/src/runtime/scene.rs — prefill skip logic, cost prediction integration
  • crates/grida-canvas/src/window/application.rs — devtools overlay for predicted cost
  • crates/grida-canvas/examples/skia_bench/ — three new validation benchmarks
  • docs/wg/feat-2d/render-cost-prediction.md — reference document
  • docs/wg/feat-2d/optimization.md — updated optimization notes
  • fixtures/test-html/L0/box-margin.html — CSS margin test fixture

Summary by CodeRabbit

Release Notes

  • New Features

    • Added rendering cost prediction model to estimate GPU operation overhead.
    • Added GPU benchmarking tools for performance profiling.
  • Performance Improvements

    • Optimized frame rendering by caching predictions and skipping redundant prefill operations.
    • Updated developer stats overlay to display predicted rendering costs alongside frame metrics.
  • Documentation

    • Added optimization and cost prediction methodology documentation.

softmarshmallow and others added 4 commits March 31, 2026 02:15
perf(cg): skip prefill loop on cache-warm frames via generation tracking

Add a monotonically increasing generation counter to PictureCache that
increments on any mutation (insert, invalidate). The prefill loop now
compares the current generation, variant key, and layer count against
the last successful prefill — when all match, the entire O(N) iteration
is skipped in O(1).

On 135K-node scenes, this eliminates ~800µs of HashMap lookups per
cache-warm frame. Measured improvements on 01-135k.perf.grida:

- rt_pan_fast_fit p50: 111µs → 76µs (-32%)
- rt_pan_fast_fit p95: 263µs → 153µs (-42%)
- pan_settle_slow_fit settle: 1034µs → 709µs (-31%)
- Criterion large_baseline/pan: -14.0% (p < 0.01)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
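The generation-tracking idea in this commit can be sketched as follows. This is a minimal illustration, not the PR's actual code: the store type, `PrefillTracker` struct, and method names other than `generation()` are hypothetical simplifications.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the real PictureCache.
struct PictureCache {
    store: HashMap<u64, Vec<u8>>, // node id -> recorded picture (placeholder type)
    generation: u64,              // bumped on every mutation
}

impl PictureCache {
    fn new() -> Self {
        Self { store: HashMap::new(), generation: 0 }
    }

    fn insert(&mut self, id: u64, picture: Vec<u8>) {
        self.store.insert(id, picture);
        // Any mutation invalidates the prefill fast path.
        self.generation = self.generation.wrapping_add(1);
    }

    fn generation(&self) -> u64 {
        self.generation
    }
}

/// Prefill state remembered by the renderer between frames (hypothetical).
struct PrefillTracker {
    last_generation: u64,
    last_variant_key: u64,
    last_layer_count: usize,
}

impl PrefillTracker {
    /// O(1) warm-frame check: when generation, variant key, and layer count
    /// all match the last successful prefill, the O(N) lookup loop is skipped.
    fn can_skip(&self, cache: &PictureCache, variant_key: u64, layer_count: usize) -> bool {
        cache.generation() == self.last_generation
            && variant_key == self.last_variant_key
            && layer_count == self.last_layer_count
    }
}

fn main() {
    let mut cache = PictureCache::new();
    cache.insert(1, vec![0u8]);
    let tracker = PrefillTracker {
        last_generation: cache.generation(),
        last_variant_key: 7,
        last_layer_count: 1,
    };
    // Warm frame: nothing changed since the last prefill.
    assert!(tracker.can_skip(&cache, 7, 1));
    // A mutation bumps the generation and disables the fast path.
    cache.insert(2, vec![0u8]);
    assert!(!tracker.can_skip(&cache, 7, 1));
    println!("prefill skip check ok");
}
```

Note this is exactly the check the Codex review below questions: matching layer count does not by itself prove the same nodes are visible.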
Add reference document (render-cost-prediction.md) with measured GPU
operation costs, Skia blur algorithm analysis, blend mode tiers, and
cache hit/miss ratios. All claims labeled as FACT/BENCHMARK/INFERENCE/
HEURISTIC. Key finding: fixed per-operation overhead (FBO allocation,
~11-110µs) dominates over pixel-proportional cost at typical node sizes.

Add three validation benchmarks:
- skia_bench_cost_model: per-effect cost at 50²–4000², linearity, blur
  radius dependence, two-component formula extraction
- skia_bench_cache_blit: cache hit vs miss ratio (~0.05×), blit constancy
- skia_bench_scene_scale: full Renderer pipeline at 1K–136K nodes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
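Benchmarks like the three above typically share a warmup-then-sample harness. Below is a minimal CPU-side sketch of that pattern; the `WARMUP`/`ITERS` values match the constants shown in the review of `skia_bench_cost_model.rs`, but `bench_median_us` and the placeholder workload are hypothetical — the real benchmarks time Skia GPU draws with a flush/sync per iteration.

```rust
use std::time::Instant;

const WARMUP: u32 = 10;
const ITERS: u32 = 50;

/// Run `op` WARMUP times untimed, then ITERS times timed, and return the
/// median sample in microseconds. A real GPU benchmark must also flush and
/// sync the GPU inside the timed region; that part is elided here.
fn bench_median_us<F: FnMut()>(mut op: F) -> f64 {
    for _ in 0..WARMUP {
        op();
    }
    let mut samples: Vec<f64> = Vec::with_capacity(ITERS as usize);
    for _ in 0..ITERS {
        let t = Instant::now();
        op();
        samples.push(t.elapsed().as_secs_f64() * 1e6);
    }
    // Median is robust against scheduler hiccups, unlike the mean.
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    samples[samples.len() / 2]
}

fn main() {
    // Placeholder workload standing in for a Skia draw + GPU sync.
    let us = bench_median_us(|| {
        let v: Vec<u64> = (0..10_000).collect();
        std::hint::black_box(v.iter().sum::<u64>());
    });
    println!("median: {us:.1}µs");
}
```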
@vercel

vercel bot commented Apr 1, 2026

The latest updates on your projects.

| Project | Status | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | Apr 1, 2026 0:25am |

6 Skipped Deployments

| Project | Status | Updated (UTC) |
| --- | --- | --- |
| code | Ignored | Apr 1, 2026 0:25am |
| legacy | Ignored | Apr 1, 2026 0:25am |
| backgrounds | Skipped | Apr 1, 2026 0:25am |
| blog | Skipped | Apr 1, 2026 0:25am |
| grida | Skipped | Apr 1, 2026 0:25am |
| viewer | Skipped | Apr 1, 2026 0:25am |


@coderabbitai

coderabbitai bot commented Apr 1, 2026

Walkthrough

The PR introduces a GPU rendering cost prediction framework with per-node fixed overhead estimation, picture cache generation tracking to skip redundant prefill iterations, three GPU benchmark examples measuring Skia performance under varying conditions, and supporting documentation describing the cost model and cache optimization.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Build Configuration**<br>`crates/grida-canvas/Cargo.toml` | Added three new example targets (`skia_bench_cost_model`, `skia_bench_cache_blit`, `skia_bench_scene_scale`) gated behind the `native-gl-context` feature. |
| **GPU Benchmark Examples**<br>`crates/grida-canvas/examples/skia_bench/skia_bench_cost_model.rs`, `...skia_bench_cache_blit.rs`, `...skia_bench_scene_scale.rs` | Added three comprehensive Skia GPU benchmarks measuring cost modeling (variant/effect overhead, linearity, blur scaling), cache blit performance (hit/miss ratios), and full-scene rendering cost under increasing node counts. |
| **Cache Generation Tracking**<br>`crates/grida-canvas/src/cache/picture.rs` | Added a `generation: u64` counter field incremented on cache mutations/invalidations, exposed via a `generation()` getter, and added a `variant_store_is_empty()` query method. |
| **Cost Prediction Module**<br>`crates/grida-canvas/src/runtime/cost_prediction.rs`, `crates/grida-canvas/src/runtime/mod.rs` | Introduced a new `cost_prediction` module with `blur_cost_us()` and `estimate_node_cost()` computing per-node fixed overhead based on effects (blur, shadows, backdrop blur, glass), blend/opacity isolation, and cache state. |
| **Frame Planning and Prefill Optimization**<br>`crates/grida-canvas/src/runtime/scene.rs` | Added `predicted_cost_us: f64` to `FramePlan`; implemented a prefill-skip fast path using generation/variant-key/layer-count tracking to avoid O(N) iteration when cache state is unchanged; prefill now populates the predicted cost by summing `estimate_node_cost()` across live and promoted nodes. |
| **Devtools Overlay**<br>`crates/grida-canvas/src/window/application.rs` | Updated the frame-stats overlay to display predicted rendering cost (`pred: <cost>µs (<ratio>×)`) alongside the measured frame duration ratio. |
| **Documentation**<br>`docs/wg/feat-2d/optimization.md`, `docs/wg/feat-2d/render-cost-prediction.md` | Added a new doc on the "Picture Cache Prefill Skip (Generation Tracking)" optimization; rewrote the cost-prediction doc from a pixel fill-rate model to an overhead-centric model with per-operation fixed costs, revised blur scaling, and a per-visible-node frame cost formula. |
| **Test Fixture**<br>`fixtures/test-html/L0/box-margin.html` | Added a standalone HTML page demonstrating CSS margin behaviors (collapsing, auto-centering, inline handling, parent–child collapse). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

cg, canvas

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped — CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title covers the three main changes — render cost prediction module, prefill skip optimization, and Skia benchmarks — matching the PR objectives and changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, above the required threshold of 80.00%. |


Move render cost estimation into isolated `runtime::cost_prediction`
module. Compute `predicted_cost_us` in FramePlan and display in devtools
overlay as `pred: Xµs (Y×)` for correlation against actual frame times.

This is instrumentation only — no behavioral changes to pan/zoom cache
blits, stable/unstable promotion, downscale rendering, or effect quality.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
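A rough illustration of the overhead-centric model this commit moves into `runtime::cost_prediction` — not the actual module code. The function names echo `blur_cost_us()` and `estimate_node_cost()` from the PR, but the signatures are hypothetical simplifications and all constants (5µs cache hit, 11µs fixed overhead, the blur tier costs) are illustrative stand-ins for the measured values documented in render-cost-prediction.md.

```rust
/// Illustrative blur cost, following the sigma-tier idea from the doc:
/// small sigmas use cheap convolution passes, large sigmas trigger a
/// downsample/blur/upsample chain with roughly log2(sigma/4) extra passes.
fn blur_cost_us(sigma: f64) -> f64 {
    if sigma <= 4.0 {
        30.0 // one or two cheap passes (placeholder cost)
    } else {
        30.0 + 20.0 * (sigma / 4.0).log2().ceil()
    }
}

/// Per-node fixed-overhead estimate (simplified): a cache hit is a
/// near-free blit; a miss pays a fixed per-operation overhead (FBO
/// allocation) plus per-effect costs. Pixel-proportional terms are
/// deliberately omitted, matching the overhead-centric model.
fn estimate_node_cost(blur_sigma: Option<f64>, shadow_count: usize, cache_hit: bool) -> f64 {
    if cache_hit {
        return 5.0; // blit of a cached picture
    }
    let mut cost = 11.0; // fixed per-operation overhead (illustrative)
    if let Some(sigma) = blur_sigma {
        cost += blur_cost_us(sigma);
    }
    cost += shadow_count as f64 * 15.0; // per-shadow save_layer (illustrative)
    cost
}

fn main() {
    let warm = estimate_node_cost(None, 0, true);
    let cold = estimate_node_cost(Some(12.0), 2, false);
    println!("warm: {warm:.1}µs, cold blur+shadows: {cold:.1}µs");
}
```

`predicted_cost_us` in FramePlan would then be the sum of this estimate over the visible nodes, which is what the overlay compares against the measured frame time.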
This new HTML fixture illustrates various CSS margin behaviors, including margin collapse, negative margins, and auto margins. It provides visual examples and descriptions to aid understanding of how margins interact in different scenarios, enhancing the testing framework for CSS-related features.
@vercel vercel bot temporarily deployed to Preview – backgrounds April 1, 2026 12:24 Inactive
@vercel vercel bot temporarily deployed to Preview – blog April 1, 2026 12:24 Inactive
@vercel vercel bot temporarily deployed to Preview – grida April 1, 2026 12:24 Inactive
@vercel vercel bot temporarily deployed to Preview – viewer April 1, 2026 12:24 Inactive
@softmarshmallow softmarshmallow marked this pull request as ready for review April 1, 2026 12:33

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 32e43feba3


Comment on lines +424 to +427

```rust
if current_gen == self.last_prefill_generation
    && effective_key_for_tracking == self.last_prefill_variant_key
    && layer_count == self.last_prefill_layer_count
{
```

**P1** — Track visible-node identity before skipping prefill

The new fast-path returns when cache generation, variant key, and layer_count match, but layer_count does not guarantee the same visible nodes. During a pan over a uniform-density scene, the viewport can switch to a different set of nodes with the same count, so this guard skips prefill even though those nodes have no recorded pictures; since the draw fallback renders misses directly (without inserting into scene_cache.picture), the misses can persist across frames until an unrelated cache mutation occurs, causing sustained cache-hit collapse and interaction slowdowns.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (2)
crates/grida-canvas/src/cache/picture.rs (1)

121-124: Avoid bumping generation on no-op node invalidation.

invalidate_node currently increments generation even when the node is absent from both stores, which can unnecessarily disable the prefill-skip fast path on subsequent frames.

♻️ Suggested refinement

```diff
 pub fn invalidate_node(&mut self, id: NodeId) {
-    self.default_store.remove(&id);
-    self.variant_store.retain(|&(nid, _), _| nid != id);
-    self.generation = self.generation.wrapping_add(1);
+    let mut changed = self.default_store.remove(&id).is_some();
+    let before = self.variant_store.len();
+    self.variant_store.retain(|&(nid, _), _| nid != id);
+    changed |= self.variant_store.len() != before;
+    if changed {
+        self.generation = self.generation.wrapping_add(1);
+    }
 }
```
crates/grida-canvas/src/runtime/cost_prediction.rs (1)

116-137: Noise effects are currently unaccounted in node-cost estimation.

The estimator sums several effect classes but omits LayerEffects.noises, which can materially underpredict mixed-effect scenes and reduce overlay/model trust.

Also applies to: 152-153



📥 Commits

Reviewing files that changed from the base of the PR and between 2ababaf and 32e43fe.

📒 Files selected for processing (12)
  • crates/grida-canvas/Cargo.toml
  • crates/grida-canvas/examples/skia_bench/skia_bench_cache_blit.rs
  • crates/grida-canvas/examples/skia_bench/skia_bench_cost_model.rs
  • crates/grida-canvas/examples/skia_bench/skia_bench_scene_scale.rs
  • crates/grida-canvas/src/cache/picture.rs
  • crates/grida-canvas/src/runtime/cost_prediction.rs
  • crates/grida-canvas/src/runtime/mod.rs
  • crates/grida-canvas/src/runtime/scene.rs
  • crates/grida-canvas/src/window/application.rs
  • docs/wg/feat-2d/optimization.md
  • docs/wg/feat-2d/render-cost-prediction.md
  • fixtures/test-html/L0/box-margin.html

Comment on lines +197 to +203

```rust
// Cache miss: full rasterize
let miss_us = bench_draw(surface, &|canvas| {
    (effect.draw)(canvas, dst_rect);
});

// Capture to GPU texture
let cached_image = capture_to_image(surface, size, &*effect.draw);
```

⚠️ Potential issue | 🟠 Major

Miss(µs) is timing a direct draw, not a cache miss.

miss_us is recorded before the offscreen render + snapshot needed to populate the cache. That means the printed ratio is comparing blit vs draw, not blit vs miss + populate, so the "cache hit vs. miss" numbers are systematically optimistic. Either time the populate step inside the miss path or rename the benchmark/columns to reflect what is actually being measured.

Also applies to: 210-215


Comment on lines +26 to +29

```rust
const W: i32 = 1000;
const H: i32 = 1000;
const WARMUP: u32 = 10;
const ITERS: u32 = 50;
```

⚠️ Potential issue | 🟠 Major

The largest samples overflow the benchmark surface, so the area model is wrong.

With W/H = 1000, the 2000 and 4000 cases are clipped by the render target while the later analysis still treats them as 4M and 16M visible pixels. That invalidates the upper end of the linearity table and any fill-rate / C_fixed + area × C_per_pixel numbers derived from those samples. Build the surface per test size, or make the benchmark surface at least as large as the maximum size before using size² as the ground-truth area.

Also applies to: 49-57, 366-367



```rust
println!();
println!("═══════════════════════════════════════════════════════════════════════════════════════════════════");
println!(" SECTION 1: Frame Time vs. Node Count (unstable frames, full Renderer pipeline)");
```

⚠️ Potential issue | 🟡 Minor

Section label says “unstable” but the benchmark runs stable frames.

Line 267 should match the actual run mode (queue_stable + stable-frame comments) to avoid misleading report consumers.


Comment on lines +366 to +372

```rust
/// Picture cache generation + variant key at the time of the last
/// successful prefill. When the cache generation and variant key
/// match, the prefill loop can be skipped entirely — all pictures
/// are already cached from a previous frame.
last_prefill_generation: u64,
last_prefill_variant_key: u64,
last_prefill_layer_count: usize,
```

⚠️ Potential issue | 🟠 Major

Track a fingerprint of the visible nodes, not just layer_count.

generation + variant_key + layer_count is not enough to prove this plan is warm. A pan from one 5k-layer window to another 5k-layer window with no picture-cache mutations will hit this fast path even if none of the newly visible nodes have been recorded yet, so prefill gets skipped on a cold frame. Including a stable fingerprint of the visible ids/indices in the tracking key would make the skip condition sound again.

Also applies to: 401-429, 480-489
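One way to build the suggested fingerprint — a hypothetical sketch, not code from the PR — is to hash the visible node ids in a normalized order, so the same set always produces the same value regardless of traversal order:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Order-normalized fingerprint of the visible node set. DefaultHasher is
/// deterministic within a single process run, which is all a frame-to-frame
/// comparison needs; a keyed hasher could be substituted if cross-run
/// stability ever mattered.
fn visible_fingerprint(visible_ids: &[u64]) -> u64 {
    let mut ids: Vec<u64> = visible_ids.to_vec();
    ids.sort_unstable(); // normalize order so set identity, not order, is hashed
    let mut h = DefaultHasher::new();
    ids.hash(&mut h);
    h.finish()
}

fn main() {
    let a = visible_fingerprint(&[3, 1, 2]);
    let b = visible_fingerprint(&[1, 2, 3]);
    assert_eq!(a, b); // same set, same fingerprint
    println!("fingerprint: {a:#018x}");
}
```

Including such a value in the `last_prefill_*` comparison would make the skip condition reject a pan that swaps in a different window of nodes with the same layer count.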


```rust
compositor_indices: Vec::new(),
display_list_duration: Duration::ZERO,
display_list_size_estimated: 0,
predicted_cost_us: 0.0,
```

⚠️ Potential issue | 🟡 Minor

Zero the metric on render_frame() cache hits too.

These assignments only cover the deferred-plan path. flush_with_plan() goes straight through render_frame(), and its pan/zoom cache-hit returns still hand back the original plan unchanged, so the overlay can show a full-draw prediction on an O(1) blit frame.

🔧 Minimal follow-up

```diff
-                    return FrameFlushStats {
-                        frame: plan,
+                    let mut frame = plan.clone();
+                    frame.predicted_cost_us = 0.0;
+                    return FrameFlushStats {
+                        frame,
```

Apply the same adjustment in both cache-hit return branches inside Renderer::render_frame.

Also applies to: 1186-1186, 1204-1204


Comment on lines 103 to 107
```
save_layer_cost = layer_bounds_area × zoom² × 2 (write to offscreen + read back)
σ ≤ 4.0 and small kernel → single 2D convolution pass (≤28 samples)
σ ≤ 4.0 → two separable 1D passes
σ > 4.0 → downsample until σ ≤ 4.0, blur, upsample (recursive)
```

⚠️ Potential issue | 🟡 Minor

Add language identifiers to fenced code blocks (MD040).

Three fenced blocks are missing language tags, which is currently flagged by markdownlint.

🛠️ Minimal fix

Change the opening ```` ``` ```` to ```` ```text ```` on each of the three fenced blocks: the one containing "σ ≤ 4.0 and small kernel → single 2D convolution pass (≤28 samples)", the one beginning "frame_cost ≈ Σ visible_nodes(", and the one beginning "save_layer_overhead_us = measured via single save_layer + draw + restore".

Also applies to: 246-251, 263-266

🧰 Tools: 🪛 markdownlint-cli2 (0.22.0) — [warning] 103-103: Fenced code blocks should have a language specified (MD040, fenced-code-language)



@softmarshmallow softmarshmallow merged commit 051c321 into main Apr 1, 2026
14 checks passed
