perf(cg): render cost prediction module, prefill skip optimization, and Skia benchmarks #619

softmarshmallow merged 8 commits into `main`
Conversation
Add a monotonically increasing generation counter to PictureCache that increments on any mutation (insert, invalidate). The prefill loop now compares the current generation, variant key, and layer count against the last successful prefill — when all match, the entire O(N) iteration is skipped in O(1). On 135K-node scenes, this eliminates ~800µs of HashMap lookups per cache-warm frame.

Measured improvements on 01-135k.perf.grida:

- rt_pan_fast_fit p50: 111µs → 76µs (-32%)
- rt_pan_fast_fit p95: 263µs → 153µs (-42%)
- pan_settle_slow_fit settle: 1034µs → 709µs (-31%)
- Criterion large_baseline/pan: -14.0% (p < 0.01)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
perf(cg): skip prefill loop on cache-warm frames via generation tracking
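A minimal sketch of the mechanism described in this commit (the crate's real `PictureCache` and prefill types differ; the struct and field names below are illustrative stand-ins):

```rust
use std::collections::HashMap;

// Toy model of a picture cache whose generation counter is bumped on
// every mutation, so "nothing changed" can be proven in O(1).
struct PictureCache {
    store: HashMap<u64, Vec<u8>>, // node id -> recorded picture (stand-in)
    generation: u64,
}

impl PictureCache {
    fn new() -> Self {
        Self { store: HashMap::new(), generation: 0 }
    }
    fn insert(&mut self, id: u64, picture: Vec<u8>) {
        self.store.insert(id, picture);
        self.generation = self.generation.wrapping_add(1);
    }
}

// State captured at the last successful prefill.
struct PrefillState {
    last_generation: u64,
    last_variant_key: u64,
    last_layer_count: usize,
}

impl PrefillState {
    // O(1) warm-frame check that replaces the O(N) per-node lookup loop.
    fn can_skip(&self, cache: &PictureCache, variant_key: u64, layer_count: usize) -> bool {
        cache.generation == self.last_generation
            && variant_key == self.last_variant_key
            && layer_count == self.last_layer_count
    }
}

fn main() {
    let mut cache = PictureCache::new();
    cache.insert(1, Vec::new());
    let state = PrefillState {
        last_generation: cache.generation,
        last_variant_key: 7,
        last_layer_count: 1,
    };
    // Warm frame: everything matches, so the O(N) loop is skipped.
    assert!(state.can_skip(&cache, 7, 1));
    // Any mutation bumps the generation and forces a real prefill.
    cache.insert(2, Vec::new());
    assert!(!state.can_skip(&cache, 7, 1));
}
```

Note the deliberate trade-off: the counter over-invalidates (any mutation forces one full prefill pass) in exchange for a branch-cheap skip on every unchanged frame.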
Add reference document (render-cost-prediction.md) with measured GPU operation costs, Skia blur algorithm analysis, blend mode tiers, and cache hit/miss ratios. All claims labeled as FACT/BENCHMARK/INFERENCE/HEURISTIC. Key finding: fixed per-operation overhead (FBO allocation, ~11-110µs) dominates over pixel-proportional cost at typical node sizes.

Add three validation benchmarks:

- skia_bench_cost_model: per-effect cost at 50²–4000², linearity, blur radius dependence, two-component formula extraction
- skia_bench_cache_blit: cache hit vs miss ratio (~0.05×), blit constancy
- skia_bench_scene_scale: full Renderer pipeline at 1K–136K nodes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Walkthrough

The PR introduces a GPU rendering cost prediction framework with per-node fixed overhead estimation, picture cache generation tracking to skip redundant prefill iterations, three GPU benchmark examples measuring Skia performance under varying conditions, and supporting documentation describing the cost model and cache optimization.
Move render cost estimation into isolated `runtime::cost_prediction` module. Compute `predicted_cost_us` in FramePlan and display in devtools overlay as `pred: Xµs (Y×)` for correlation against actual frame times. This is instrumentation only — no behavioral changes to pan/zoom cache blits, stable/unstable promotion, downscale rendering, or effect quality. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
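The shape of such a predictor can be sketched as a two-component sum, following the document's finding that fixed per-operation overhead dominates at typical node sizes. The constants below are illustrative placeholders, not the crate's tuned values:

```rust
// Hypothetical two-component cost model: fixed save_layer/FBO overhead
// plus a small pixel-proportional term. All values are illustrative.
const SAVE_LAYER_FIXED_US: f32 = 11.0; // low end of the measured ~11-110µs range
const PER_PIXEL_US: f32 = 0.000_05;    // placeholder fill-rate term

fn node_cost_us(width: f32, height: f32, has_effect: bool) -> f32 {
    // Nodes with effects pay the fixed offscreen-layer overhead.
    let fixed = if has_effect { SAVE_LAYER_FIXED_US } else { 0.0 };
    fixed + width * height * PER_PIXEL_US
}

// Frame prediction is just the sum over visible nodes.
fn predicted_cost_us(nodes: &[(f32, f32, bool)]) -> f32 {
    nodes.iter().map(|&(w, h, e)| node_cost_us(w, h, e)).sum()
}

fn main() {
    // For a 200×200 node with an effect, the fixed term (11µs) dwarfs
    // the pixel term (200 × 200 × 0.00005 = 2µs).
    let one = node_cost_us(200.0, 200.0, true);
    assert!((one - 13.0).abs() < 1e-3);
    let frame = predicted_cost_us(&[(200.0, 200.0, true), (100.0, 50.0, false)]);
    assert!(frame > one);
}
```

Because this is instrumentation only, a model of this shape can be validated purely by correlating `pred: Xµs` against actual frame times in the overlay before any behavior depends on it.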
This new HTML fixture illustrates various CSS margin behaviors, including margin collapse, negative margins, and auto margins. It provides visual examples and descriptions to aid understanding of how margins interact in different scenarios, enhancing the testing framework for CSS-related features.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 32e43feba3
```rust
if current_gen == self.last_prefill_generation
    && effective_key_for_tracking == self.last_prefill_variant_key
    && layer_count == self.last_prefill_layer_count
{
```
Track visible-node identity before skipping prefill
The new fast path returns when the cache generation, variant key, and layer_count match, but layer_count does not guarantee the same set of visible nodes. During a pan over a uniform-density scene, the viewport can switch to a different set of nodes with the same count, so this guard skips prefill even though those nodes have no recorded pictures. Since the draw fallback renders misses directly (without inserting into scene_cache.picture), the misses can persist across frames until an unrelated cache mutation occurs, causing sustained cache-hit collapse and interaction slowdowns.
Actionable comments posted: 6
🧹 Nitpick comments (2)
crates/grida-canvas/src/cache/picture.rs (1)
121-124: Avoid bumping generation on no-op node invalidation.

`invalidate_node` currently increments `generation` even when the node is absent from both stores, which can unnecessarily disable the prefill-skip fast path on subsequent frames.

♻️ Suggested refinement

```diff
 pub fn invalidate_node(&mut self, id: NodeId) {
-    self.default_store.remove(&id);
-    self.variant_store.retain(|&(nid, _), _| nid != id);
-    self.generation = self.generation.wrapping_add(1);
+    let mut changed = self.default_store.remove(&id).is_some();
+    let before = self.variant_store.len();
+    self.variant_store.retain(|&(nid, _), _| nid != id);
+    changed |= self.variant_store.len() != before;
+    if changed {
+        self.generation = self.generation.wrapping_add(1);
+    }
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/grida-canvas/src/cache/picture.rs` around lines 121 - 124, invalidate_node is incrementing self.generation even when no entry is removed; change it so generation is only bumped when the call actually removes something. Use the return value of self.default_store.remove(&id) (Option) and check whether self.variant_store changed (e.g., capture its length before and after retain or use a drain_filter/retain pattern that records removals) and only call self.generation = self.generation.wrapping_add(1) when either store had entries removed; reference the invalidate_node method and the fields default_store, variant_store, and generation.

crates/grida-canvas/src/runtime/cost_prediction.rs (1)
116-137: Noise effects are currently unaccounted for in node-cost estimation.

The estimator sums several effect classes but omits `LayerEffects.noises`, which can materially underpredict mixed-effect scenes and reduce overlay/model trust.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/grida-canvas/src/runtime/cost_prediction.rs` around lines 116 - 137, The cost estimator omits noise effects from LayerEffects; update the runtime cost calculation to include effects.noises similar to the shadows block: iterate over &effects.noises, match the noise variant (e.g., FilterNoiseEffect::Grain / other variants) and when a noise entry is active add the appropriate noise cost constant (e.g., COST_NOISE_US) possibly combined with any parameter-based cost (max with a computed noise_cost function or blur_cost_us if applicable) to the cost accumulator; apply the same inclusion at the other occurrence noted (around the later block at 152-153) so noises are considered in both places.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cccc6f1c-9096-4695-8808-a8951c515a17
📒 Files selected for processing (12)
- crates/grida-canvas/Cargo.toml
- crates/grida-canvas/examples/skia_bench/skia_bench_cache_blit.rs
- crates/grida-canvas/examples/skia_bench/skia_bench_cost_model.rs
- crates/grida-canvas/examples/skia_bench/skia_bench_scene_scale.rs
- crates/grida-canvas/src/cache/picture.rs
- crates/grida-canvas/src/runtime/cost_prediction.rs
- crates/grida-canvas/src/runtime/mod.rs
- crates/grida-canvas/src/runtime/scene.rs
- crates/grida-canvas/src/window/application.rs
- docs/wg/feat-2d/optimization.md
- docs/wg/feat-2d/render-cost-prediction.md
- fixtures/test-html/L0/box-margin.html
```rust
// Cache miss: full rasterize
let miss_us = bench_draw(surface, &|canvas| {
    (effect.draw)(canvas, dst_rect);
});

// Capture to GPU texture
let cached_image = capture_to_image(surface, size, &*effect.draw);
```
Miss(µs) is timing a direct draw, not a cache miss.
miss_us is recorded before the offscreen render + snapshot needed to populate the cache. That means the printed ratio is comparing blit vs draw, not blit vs miss + populate, so the "cache hit vs. miss" numbers are systematically optimistic. Either time the populate step inside the miss path or rename the benchmark/columns to reflect what is actually being measured.
Also applies to: 210-215
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/grida-canvas/examples/skia_bench/skia_bench_cache_blit.rs` around
lines 197 - 203, The benchmark currently records miss_us by timing only
bench_draw (a direct draw) then separately calls capture_to_image to populate
the cache, so the reported "miss" time omits the populate cost; update the miss
measurement to include the offscreen render + snapshot by timing the combined
operations (call bench_draw and capture_to_image inside the same timed block
that sets miss_us), or alternatively rename the output/columns to reflect that
miss_us is actually "draw_only"; locate the timing variables around bench_draw,
capture_to_image, and cached_image (and the equivalent block later around the
second benchmark) and adjust them consistently so the printed "hit vs miss"
ratio measures draw+populate vs blit or rename both labels to avoid misleading
results.
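A sketch of the first suggested fix, timing draw and populate inside one block; `render_once` and `populate_cache` are hypothetical stand-ins for the benchmark's `bench_draw` and `capture_to_image`:

```rust
use std::time::{Duration, Instant};

// Stand-ins for the real draw and cache-populate steps.
fn render_once() { std::thread::sleep(Duration::from_micros(50)); }
fn populate_cache() { std::thread::sleep(Duration::from_micros(30)); }

fn main() {
    // Time both steps together so "miss" covers the full miss path:
    // the direct draw plus the offscreen render/snapshot that actually
    // fills the cache.
    let start = Instant::now();
    render_once();
    populate_cache();
    let miss_us = start.elapsed().as_micros();
    // sleep guarantees at least the requested duration, so the combined
    // measurement can never fall below the two costs together.
    assert!(miss_us >= 80);
}
```

The alternative (renaming the column to `draw_only_us`) changes no code but makes the printed ratio honest about what it measures.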
```rust
const W: i32 = 1000;
const H: i32 = 1000;
const WARMUP: u32 = 10;
const ITERS: u32 = 50;
```
The largest samples overflow the benchmark surface, so the area model is wrong.
With W/H = 1000, the 2000 and 4000 cases are clipped by the render target while the later analysis still treats them as 4M and 16M visible pixels. That invalidates the upper end of the linearity table and any fill-rate / C_fixed + area × C_per_pixel numbers derived from those samples. Build the surface per test size, or make the benchmark surface at least as large as the maximum size before using size² as the ground-truth area.
Also applies to: 49-57, 366-367
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/grida-canvas/examples/skia_bench/skia_bench_cost_model.rs` around
lines 26 - 29, The benchmark currently uses fixed constants W and H = 1000 so
large test sizes (e.g., 2000, 4000) are clipped and the area model is invalid;
change the benchmark to either (a) create the Skia render surface for each test
using the current test size (use the loop variable that represents sample size —
e.g., the size or sample_size used in the benchmark) so the surface dimensions
match the test, or (b) set W and H to the maximum test dimension (the largest
sample size) before running all tests so size*size equals the true visible
pixels; update surface construction code (where the Skia surface/context is
created) rather than relying on the fixed W/H constants.
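The geometry behind this finding can be checked directly (a standalone illustration of the clipping arithmetic, not the benchmark's code):

```rust
// Pixels actually rendered when a square sample is drawn into a
// surface of the given dimensions: clipping caps each axis.
fn visible_pixels(sample: i64, surface_w: i64, surface_h: i64) -> i64 {
    sample.min(surface_w) * sample.min(surface_h)
}

fn main() {
    // With a fixed 1000×1000 surface, the 2000² and 4000² samples both
    // clip to the same 1M pixels, while the analysis assumes 4M and 16M.
    assert_eq!(visible_pixels(2000, 1000, 1000), 1_000_000);
    assert_eq!(visible_pixels(4000, 1000, 1000), 1_000_000);
    // Sizing the surface to the sample restores the ground-truth area.
    assert_eq!(visible_pixels(4000, 4000, 4000), 16_000_000);
}
```

Either fix works because both make `size * size` equal the pixels the GPU actually fills, which is the quantity the C_fixed + area × C_per_pixel regression assumes.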
```rust
println!();
println!("═══════════════════════════════════════════════════════════════════════════════════════════════════");
println!(" SECTION 1: Frame Time vs. Node Count (unstable frames, full Renderer pipeline)");
```
Section label says “unstable” but the benchmark runs stable frames.
Line 267 should match the actual run mode (queue_stable + stable-frame comments) to avoid misleading report consumers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/grida-canvas/examples/skia_bench/skia_bench_scene_scale.rs` at line
267, The printed section label is incorrect: change the string in the println
call that currently reads "SECTION 1: Frame Time vs. Node Count (unstable
frames, full Renderer pipeline)" to indicate stable frames instead (e.g.,
"SECTION 1: Frame Time vs. Node Count (stable frames, full Renderer pipeline)")
so it matches the actual run mode using queue_stable and the surrounding
stable-frame comments in skia_bench_scene_scale.rs; update any nearby comments
if needed to keep labels consistent with the queue_stable execution path.
```rust
/// Picture cache generation + variant key at the time of the last
/// successful prefill. When the cache generation and variant key
/// match, the prefill loop can be skipped entirely — all pictures
/// are already cached from a previous frame.
last_prefill_generation: u64,
last_prefill_variant_key: u64,
last_prefill_layer_count: usize,
```
Track a fingerprint of the visible nodes, not just layer_count.
generation + variant_key + layer_count is not enough to prove this plan is warm. A pan from one 5k-layer window to another 5k-layer window with no picture-cache mutations will hit this fast path even if none of the newly visible nodes have been recorded yet, so prefill gets skipped on a cold frame. Including a stable fingerprint of the visible ids/indices in the tracking key would make the skip condition sound again.
Also applies to: 401-429, 480-489
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/grida-canvas/src/runtime/scene.rs` around lines 366 - 372, The current
warm-path check for skipping prefill uses last_prefill_generation,
last_prefill_variant_key and last_prefill_layer_count but needs a stable
fingerprint of the actually visible node ids/indices to avoid skipping when the
visible window changes; add a new field (e.g. last_prefill_visible_fingerprint:
u64) to the Scene state and compute a stable hash/fingerprint of the
set/sequence of visible node ids or indices at the time of prefill (use a
deterministic order), then include that fingerprint in every warm-path
comparison where last_prefill_generation/variant_key/layer_count are compared
(including the other affected blocks around the indicated ranges) so the prefill
is only skipped when generation, variant key, layer count AND
visible-fingerprint all match. Ensure the fingerprint is updated whenever
prefill runs and used in the skip decision.
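The suggested fingerprint can be sketched with a deterministic hash over the visible ids; the hasher choice and names are illustrative (the crate might prefer a seeded or explicitly stable hasher):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Deterministic within a process: DefaultHasher::new() is documented
// to use fixed keys, so the same id sequence in the same order always
// yields the same fingerprint.
fn visible_fingerprint(visible_ids: &[u64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for id in visible_ids {
        id.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    let window_a = visible_fingerprint(&[1, 2, 3, 4, 5]);
    let window_b = visible_fingerprint(&[6, 7, 8, 9, 10]); // same count, different nodes
    // layer_count alone cannot tell these windows apart; the
    // fingerprint distinguishes them, so the warm-path skip stays sound.
    assert_eq!(window_a, visible_fingerprint(&[1, 2, 3, 4, 5]));
    assert_ne!(window_a, window_b);
}
```

Folding this value into the `(generation, variant_key, layer_count)` key costs one hash pass over the visible list per prefill, which is still far cheaper than the HashMap lookups the skip avoids.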
```rust
compositor_indices: Vec::new(),
display_list_duration: Duration::ZERO,
display_list_size_estimated: 0,
predicted_cost_us: 0.0,
```
Zero the metric on render_frame() cache hits too.
These assignments only cover the deferred-plan path. flush_with_plan() goes straight through render_frame(), and its pan/zoom cache-hit returns still hand back the original plan unchanged, so the overlay can show a full-draw prediction on an O(1) blit frame.
🔧 Minimal follow-up
```diff
-    return FrameFlushStats {
-        frame: plan,
+    let mut frame = plan.clone();
+    frame.predicted_cost_us = 0.0;
+    return FrameFlushStats {
+        frame,
```

Apply the same adjustment in both cache-hit return branches inside `Renderer::render_frame`.
Also applies to: 1186-1186, 1204-1204
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/grida-canvas/src/runtime/scene.rs` at line 1128,
Renderer::render_frame currently zeroes predicted_cost_us only on the
deferred-plan path; update both cache-hit return branches inside
Renderer::render_frame so they also set predicted_cost_us to 0.0 (the same
metric reset you applied in flush_with_plan), ensuring any cache-hit early
returns return a plan/overlay with predicted_cost_us = 0.0; locate the two
cache-hit return points in render_frame and add the same assignment to the
returned Plan/Overlay data structure.
| ``` | ||
| save_layer_cost = layer_bounds_area × zoom² × 2 (write to offscreen + read back) | ||
| σ ≤ 4.0 and small kernel → single 2D convolution pass (≤28 samples) | ||
| σ ≤ 4.0 → two separable 1D passes | ||
| σ > 4.0 → downsample until σ ≤ 4.0, blur, upsample (recursive) | ||
| ``` |
Add language identifiers to fenced code blocks (MD040).
Three fenced blocks are missing language tags, which is currently flagged by markdownlint.
🛠️ Minimal fix
````diff
-```
+```text
 σ ≤ 4.0 and small kernel → single 2D convolution pass (≤28 samples)
 σ ≤ 4.0 → two separable 1D passes
 σ > 4.0 → downsample until σ ≤ 4.0, blur, upsample (recursive)
 ```

-```
+```text
 frame_cost ≈ Σ visible_nodes(
   if cache_hit: ~5 µs
   if cache_miss: C_fixed(effect_type)
 )
 ```

-```
+```text
 save_layer_overhead_us = measured via single save_layer + draw + restore
 pixels_per_ms = measured via full-screen solid rect
 ```
````
Also applies to: 246-251, 263-266
🧰 Tools

🪛 markdownlint-cli2 (0.22.0)

[warning] 103-103: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.
In `@docs/wg/feat-2d/render-cost-prediction.md` around lines 103 - 107, the
markdown has three fenced code blocks missing language identifiers (MD040); add
a language tag such as "text" to each opening triple-backtick for the blocks
containing the lines starting with "σ ≤ 4.0 and small kernel → single 2D
convolution pass (≤28 samples)", the block starting with "frame_cost ≈ Σ
visible_nodes(", and the block starting with "save_layer_overhead_us = measured
via single save_layer + draw + restore" (also apply the same fix to the other
occurrences noted around lines 246-251 and 263-266): change ``` to ```text for
each of those fenced code blocks so markdownlint no longer flags MD040.
Summary

- Prefill skip via generation tracking on `PictureCache`. On 135K-node scenes, eliminates ~800µs of HashMap lookups per warm frame (p50 pan: 111µs → 76µs, -32%).
- New `runtime::cost_prediction` module. Computes `predicted_cost_us` in `FramePlan` and displays it in the devtools overlay (`pred: Xµs (Y×)`) — instrumentation only, no behavioral changes.
- `render-cost-prediction.md` reference with measured GPU operation costs, Skia blur analysis, blend mode tiers, and cache hit/miss ratios. All claims labeled as FACT/BENCHMARK/INFERENCE/HEURISTIC.
- Three Skia benchmarks: `skia_bench_cost_model` (per-effect cost), `skia_bench_cache_blit` (cache hit vs miss ratio), `skia_bench_scene_scale` (full pipeline at 1K–136K nodes).
- `box-margin.html` fixture demonstrating CSS margin collapse, negative margins, and auto margins.

Measured improvements (01-135k.perf.grida)

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| `rt_pan_fast_fit` p50 | 111µs | 76µs | -32% |
| `rt_pan_fast_fit` p95 | 263µs | 153µs | -42% |
| `pan_settle_slow_fit` settle | 1034µs | 709µs | -31% |
| Criterion `large_baseline/pan` | — | — | -14.0% (p < 0.01) |

Changed files

- `crates/grida-canvas/src/cache/picture.rs` — generation counter on PictureCache
- `crates/grida-canvas/src/runtime/cost_prediction.rs` — new cost prediction module
- `crates/grida-canvas/src/runtime/scene.rs` — prefill skip logic, cost prediction integration
- `crates/grida-canvas/src/window/application.rs` — devtools overlay for predicted cost
- `crates/grida-canvas/examples/skia_bench/` — three new validation benchmarks
- `docs/wg/feat-2d/render-cost-prediction.md` — reference document
- `docs/wg/feat-2d/optimization.md` — updated optimization notes
- `fixtures/test-html/L0/box-margin.html` — CSS margin test fixture
Release Notes
New Features
Performance Improvements
Documentation