perf improvements #306
Merged
Performance investigation pass on the WebGPU viewer. Ships measurable GPU-time reductions on the rendering path and adds a runtime diagnostic toolkit for ongoing thermal/perf debugging: a `?perf` overlay with real GPU-time-per-frame measurement and `?disable=...` URL flags to ablate individual passes without code changes.

What's in this PR
Rendering perf improvements (always-on)
- shadow-radius: 3 → 2 on the main directional light (`lights.tsx`). ~30% cheaper PCF sampling on every shadow-receiving fragment, visually near-identical.
- 1.5 → 1.25 on coarse-pointer devices (`viewer/index.tsx`). ~30% fewer fragments on every pass for phones/tablets; desktops keep 1.5.
- `groundShape` → `meshBasicMaterial` (`site-renderer.tsx`). The ground fill is just the canvas background color used as a depth-buffer occluder, so PBR + 16 lights + `receiveShadow` was wasted work over almost the whole viewport. The previous `meshStandardMaterial` block is left as a commented reference.
- `glassMaterial`: `MeshStandardNodeMaterial` → `MeshLambertNodeMaterial` (`lib/materials.ts`). Glass doesn't need PBR specular/roughness/metalness; Lambert is a fraction of the cost and still responds to the existing theme-aware light tinting (so the glass shifts naturally between light/dark mode).
- `castShadow` + `receiveShadow` on the window root mesh (`window-renderer.tsx`). The root mesh has its material overridden to an invisible hitbox at runtime, so it was casting shadows of an invisible box every frame.

Diagnostic toolkit (`?perf` and `?disable=...`)

- `?perf` overlay: mounts an upgraded `<PerfMonitor />` showing FPS, real GPU ms per frame (avg + max), per-frame draw calls / triangles, dirty-node count, and a scene-graph breakdown by drawable type (`MESH`, `LINE`, `SPRITE`, `LIGHT`). (`perf-monitor.tsx`)
- GPU-time measurement via `device.queue.onSubmittedWorkDone()` (`lib/gpu-perf.ts`). The custom `RenderPipeline.render()` path bypasses three.js's `trackTimestamp` infrastructure, so timestamp queries can't see our work. Instead we time from CPU submit to GPU-done via the native WebGPU promise: accurate per-frame GPU duration with no instrumentation overhead. Samples are pushed from `post-processing.tsx` (PostProcessing path) and `DebugRenderer` (raw path).
- `info.autoReset = false` + explicit `info.reset()` per window in `PerfMonitor`. The custom `RenderPipeline.render()` path doesn't trigger three.js's automatic per-frame info reset, so `info.render.calls` accumulated across frames and the previous display showed lifetime totals. The overlay now shows true per-frame averages.
- `?disable=...` URL flags (`post-processing.tsx`, `lights.tsx`): comma-separated subset of:
  - `ao`: skip SSGI entirely (and denoise, since denoise has nothing to denoise)
  - `denoise`: keep SSGI but feed raw noisy AO straight through (isolates denoise cost)
  - `outline`: skip the merged-outline node and its 14 internal RTs
  - `postFx`: bypass the whole `RenderPipeline` and use `renderer.render(scene, camera)` directly (isolates raw scene-render cost from any post-FX overhead)
  - `shadows`: skip the shadow-map render pass

Each flag prevents allocation + per-frame work for that stage, so device-temperature deltas across combos isolate which pass is the actual culprit. Picked up once at pipeline build, so reload after changing the URL.
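For reviewers unfamiliar with the submit-to-done timing approach, here is a minimal sketch of how such a sampler could look. It is not the contents of `lib/gpu-perf.ts`: the names `QueueLike`, `GpuTimeSamples`, and `sampleFrame` are hypothetical, and the only real API assumed is `GPUQueue.onSubmittedWorkDone()`, which returns a promise that resolves once previously submitted work has finished. Note that the measured interval is wall-clock time from submit to completion, so it can include queue latency and promise scheduling delay.

```typescript
// Minimal structural type so the sketch compiles without @webgpu/types.
// A real GPUQueue satisfies it: onSubmittedWorkDone() is part of WebGPU.
interface QueueLike {
  onSubmittedWorkDone(): Promise<unknown>;
}

// Hypothetical rolling sample window (illustrative, not the PR's code).
class GpuTimeSamples {
  private samples: number[] = [];
  private readonly capacity: number;

  constructor(capacity: number = 120) {
    this.capacity = capacity;
  }

  // Append one sample and drop the oldest once the window is full.
  push(ms: number): void {
    this.samples.push(ms);
    if (this.samples.length > this.capacity) this.samples.shift();
  }

  // Average and maximum over the current window, as the overlay reports.
  stats(): { avg: number; max: number } {
    if (this.samples.length === 0) return { avg: 0, max: 0 };
    const sum = this.samples.reduce((a, b) => a + b, 0);
    return { avg: sum / this.samples.length, max: Math.max(...this.samples) };
  }
}

// Call right after the frame's work has been submitted to the queue.
// The clock is injectable so the function can be exercised without a GPU.
function sampleFrame(
  queue: QueueLike,
  sink: GpuTimeSamples,
  now: () => number = () => performance.now(),
): Promise<void> {
  const submittedAt = now();
  return queue.onSubmittedWorkDone().then(() => {
    sink.push(now() - submittedAt);
  });
}
```

In a render loop this would be called once per frame, for example `sampleFrame(device.queue, samples)` immediately after the render call, with the overlay reading `samples.stats()` on its own refresh interval.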
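The flag handling described above can be sketched as follows. The flag names are taken from this description; the helper names (`parseDisableFlags`, `isPerfOverlayEnabled`, `isDenoiseActive`) are hypothetical and the actual parsing in `post-processing.tsx` / `lights.tsx` may differ. Unknown tokens are ignored here, which is an assumption rather than documented behavior.

```typescript
// Flag names come from the PR description; everything else is illustrative.
const DISABLE_FLAGS = ["ao", "denoise", "outline", "postFx", "shadows"] as const;
type DisableFlag = (typeof DISABLE_FLAGS)[number];

// Parse `?disable=a,b,c` from a location.search-style string.
// Assumption: unknown or empty tokens are silently dropped.
function parseDisableFlags(search: string): Set<DisableFlag> {
  const raw = new URLSearchParams(search).get("disable") ?? "";
  const known = new Set<string>(DISABLE_FLAGS);
  const out = new Set<DisableFlag>();
  for (const token of raw.split(",")) {
    const name = token.trim();
    if (known.has(name)) out.add(name as DisableFlag);
  }
  return out;
}

// `?perf` is a bare presence flag, so only its existence matters.
function isPerfOverlayEnabled(search: string): boolean {
  return new URLSearchParams(search).has("perf");
}

// Denoise only runs when AO runs: `ao` implies no denoise as well.
function isDenoiseActive(flags: Set<DisableFlag>): boolean {
  return !flags.has("ao") && !flags.has("denoise");
}
```

Because the flags are picked up once at pipeline build, a caller would evaluate `parseDisableFlags(window.location.search)` a single time and pass the resulting set into the pipeline setup rather than re-reading the URL per frame.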
Other
- `item-renderer.tsx` and `node-renderer.tsx`: harmless, pure ordering changes.
- `getMaterialForOriginal` return type relaxed from `MeshStandardNodeMaterial` to `Material` so the glass case (now Lambert) typechecks.

How to use the diagnostic toolkit

The deltas between these tell you exactly how much each pass costs. Reading the GPU number: at 50fps the budget is 20ms, so `gpuMs / 20` ≈ GPU utilization. Sustained 10ms+ is the "warm device" zone; <4ms is cool.

Test plan

- `bun typecheck` from private-editor root: green ✓
- `?perf`: expect a meaningful reduction on idle thermal load
- `?perf&disable=ao,denoise,outline,postFx` still renders (the fallback path)
- `(pointer: coarse)` media query)
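The budget arithmetic from the "How to use the diagnostic toolkit" section can be written down as a small helper. This is a sketch for reading the overlay numbers, not code from the PR: the function names are hypothetical, the 50fps default and the 10ms / 4ms thresholds are the ones quoted above, and the `"moderate"` label for the range in between is an assumption.

```typescript
// Frame budget in milliseconds for a target frame rate (50fps -> 20ms).
function frameBudgetMs(targetFps: number): number {
  return 1000 / targetFps;
}

// Fraction of the frame budget spent on the GPU; 1.0 means fully used.
function gpuUtilization(gpuMs: number, targetFps: number = 50): number {
  return gpuMs / frameBudgetMs(targetFps);
}

// "moderate" is an assumed label; the description only names warm and cool.
type ThermalZone = "cool" | "moderate" | "warm";

// Classify a sustained GPU time using the thresholds quoted above.
function thermalZone(sustainedGpuMs: number): ThermalZone {
  if (sustainedGpuMs >= 10) return "warm";
  if (sustainedGpuMs < 4) return "cool";
  return "moderate";
}
```

For example, a sustained 10ms reading at the 50fps target corresponds to a utilization of 0.5 and falls in the warm zone.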