fix(video): atomic shader destroy + transparent context-loss recovery (19.6.0)#1460
Merged
Conversation
… (19.6.0)
Fixes a Windows + Chrome crash on NVIDIA Optimus laptops (dual-GPU
switch loses the WebGL context → ANGLE/D3D11 throws on the cross-
context `gl.deleteProgram` inside `GLShader.destroy`, leaving the
shader with `uniforms = null` and `enabled` still true → next frame's
`setUniform("uTime", …)` crashes with `Cannot read properties of null
(reading 'uTime')`). Apple GL drivers silently no-op on the same
cross-context delete, which is why the bug was Windows-only.
Two-part fix:
1. `GLShader.destroy()` is now atomic + idempotent. Wraps the
`gl.deleteProgram` in a try/catch so the partial-destroy can't
happen; sets a new `destroyed: boolean` flag for the single-source
guard pattern across `setUniform` / `bind` / `getAttribLocation`.
2. `ShaderEffect.destroy()` sets `enabled = false` BEFORE the inner
`_shader.destroy()` call. Even if the inner destroy throws (the
original failure mode), the public-facing gate is already closed
so a still-registered `setTime(t)` loop sees a no-op.
Also completes the unfinished `ONCONTEXT_LOST` → recovery design that
the comment at glshader.js had documented but never implemented:
shaders now SUSPEND on context lost (program released, source +
uniform-value cache preserved) and RECOMPILE + replay cached uniforms
on context restored. The visual state survives a GPU switch
transparently — user code doesn't need to re-apply uniforms.
Platformer + platformer-matter coin examples migrated from a shared
singleton `coinShader` with manual refcount to per-instance
`ShineEffect`, matching plinko-planck's pattern.
16 new adversarial tests in `webgl_pipeline_adversarial.spec.js`,
including 7 that drive real `WEBGL_lose_context` lose/restore cycles
end-to-end. Two of them verifiably catch the original crash signature
when their corresponding fix is reverted. Each test uses `ctx.skip()`
+ `expect.assertions(N)` so a CI runner without WebGL reports SKIPPED
(not fake-passed).
Visually verified in real Chrome (Playwright) — coin with shine
effect renders correctly, zero console errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens the WebGL shader/effect lifecycle to prevent crashes during WebGL context loss (notably on Windows/Chrome ANGLE) and completes automatic context-loss recovery by suspending/recompiling shaders while preserving uniform state.
Changes:
- Make
GLShaderdestruction atomic/idempotent and adddestroyed/suspendedstate with context-lost/restored suspend + recompile + uniform replay. - Make
ShaderEffect.destroy()idempotent and reorder teardown soenabledis disabled before inner shader destruction; also toggleenabledacross context loss/restoration. - Add adversarial WebGL tests (including real
WEBGL_lose_contextcycles), bump melonjs version to 19.6.0, update changelog, and migrate platformer coin examples to per-instanceShineEffect.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/melonjs/src/video/webgl/glshader.js | Adds destroyed/suspended lifecycle, safe/idempotent destroy, and context-lost/restored suspend + recompile with uniform caching/replay. |
| packages/melonjs/src/video/webgl/shadereffect.js | Adds destroyed flag, context-lost/restored enabled gating, and idempotent destroy with safer teardown ordering. |
| packages/melonjs/tests/webgl_pipeline_adversarial.spec.js | Adds shader lifecycle + context loss/recovery regression tests, including real WEBGL_lose_context end-to-end cycles. |
| packages/examples/src/examples/platformer/entities/coin.ts | Migrates platformer coin to per-instance ShineEffect with per-coin time updates. |
| packages/examples/src/examples/platformer-matter/entities/coin.ts | Migrates matter platformer coin to per-instance ShineEffect with per-coin time updates. |
| packages/melonjs/package.json | Bumps package version from 19.5.0 to 19.6.0. |
| packages/melonjs/CHANGELOG.md | Adds an unreleased 19.6.0 entry documenting shader lifecycle hardening and example migrations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
GLShader._uniformCache: snapshot caller arrays / typed arrays in setUniform so a later in-place mutation on the caller's side can't poison the value replayed against the new program on context restore. webgl_pipeline_adversarial spec: assert NO_ERROR explicitly after the destroyed-shader no-op block instead of just draining — a stray gl.* call from a destroyed path would now fail loudly. Platformer + matter-platformer coin.ts: move GAME_UPDATE subscribe/ unsubscribe from constructor/onDestroyEvent to onActivateEvent/ onDeactivateEvent. CoinEntity is pool.register(..., true), so onDestroyEvent does NOT fire on pool return — the prior pattern leaked one dead update handler per coin pickup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ttery Extends 19.6.0's shader-level context recovery to the rest of the WebGL pipeline. The previous handler only emitted ONCONTEXT_RESTORED and let `reset()`'s batcher.init() path do the work; that left the renderer's own `vertexBuffer` tied to the dead context (gl.bindBuffer against it produced INVALID_OPERATION on the next flush), default GL state untouched after the driver wiped it, and stale texture-unit assignments in TextureCache. webgl_renderer.js webglcontextrestored handler now: - re-creates `this.vertexBuffer = gl.createBuffer()` + re-binds it - re-applies default state (BLEND, DEPTH_TEST, SCISSOR_TEST, depthMask, blendMode) — the driver clears all GL state on restore - runs reset() (which already takes the batcher.init() recovery path when isContextValid is false) against fresh GL resources - clears TextureCache.units / usedUnits so the next uploadTexture hits the create-and-upload path instead of rebinding dead handles - finally emits ONCONTEXT_RESTORED for GLShader / ShaderEffect recompile + uniform replay Also added 11 adversarial integration tests under "Renderer-wide context loss / restore battery" covering: gl.isContextLost transitions, renderer.isContextValid mirror flag, renderer.vertexBuffer validity, indexed-batcher vertex buffers, default GL state restoration, batcher boundTextures invalidation + lazy re-upload, drawImage + flush after the cycle, mesh batcher post-cycle drawing, litQuad (Light2d + normal map) post-cycle drawing, ShaderEffect concurrently alive through the cycle, and three consecutive cycles for cumulative-leak coverage. End-to-end verified via Playwright + ANGLE: both `/platformer` and `/platformer-matter` survive an in-game ext.loseContext() → ext.restoreContext() cycle with zero console errors and the game loop continuing to render. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…onstruction 8 new tests closing the gaps identified in the post-PR audit: - Renderable.addPostEffect dispatch survives a full cycle (closest test to the in-the-wild coin scenario — exercises beginPostEffect → drawImage → endPostEffect across a real loseContext / restore) - Pool-recycle simulation: same ShaderEffect re-attached to a renderable after a cycle still drives the dispatch path correctly - addPostEffect during the suspended window does not throw - 20 concurrent ShaderEffects all survive a cycle with distinct per-instance uniform values (subscription leak / shared-cache check) - renderer.customShader assignment survives a cycle (the underlying ShaderEffect / GLShader keeps replaying its uniforms; reset() clears the renderer-level pointer, user code re-assigns) - Tinted texture cache regenerates cleanly after a cycle - save / restore stack interacting with a cycle does not throw (saved state is transient; restore() with empty stack is safe) - 30 destroyed ShaderEffects + cycle leaves no leaked subscriber recompiling a destroyed shader (subscription cleanup proof) Plus a small GLShader resilience fix needed to make #3 pass: constructing a new GLShader while gl.isContextLost() === true now sets suspended = true and skips _compile() instead of throwing "Error: null" from gl.getShaderInfoLog on the fake (lost-context) shader object. The ONCONTEXT_RESTORED subscription triggers the deferred compile once the context is alive again; any setUniform the caller writes in the meantime is cached and replayed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ygiene GLShader.setUniform: replace the eager `.slice()` snapshot with the existing `captureValue` helper from utils/uniforms.js. Reuses the existing cache slot's allocation when the value's length matches (steady-state writes don't allocate at all), only allocating on first capture or a shape change. Live GL writes now pass the caller's original value (Vector / matrix types still flow through `.toArray()` because the GL proxy reads `val.length`), since gl.uniform* setters copy synchronously into program state. Eliminates per-frame GC churn on hot paths where the same uniform gets a typed-array write every frame (light positions / colors in LitQuadBatcher, per-frame uTime via Vector writes, the procedural TMX tile shader's animation uniforms, etc.). Pixi v8 uses the same slot-reuse pattern (`SystemUniformGroupBuffer.update`); the helper was already in our codebase for the live-write cache inside extractUniforms, this just shares it with the context-restored replay cache. webgl_pipeline_adversarial spec: wrap the two monkey-patch tests (GLShader partial-destroy + ShaderEffect partial-destroy) in try / finally so a mid-test assertion failure can't leak the patched deleteProgram into subsequent tests. Previously a single failed assertion would cascade — every other test that called destroy on ANY shader would hit the mock-throw and produce confusing failures in unrelated specs. Also relax the array-snapshot test to compare by value (toBeCloseTo) rather than by JS type — the cache slot is intentionally reused across writes so the type may differ from what the caller passed, but the values are what matter for replay correctness. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…work Sweep through the source and test comments added in this PR per "keep them short" feedback. The multi-paragraph essays explaining the 19.5.0 crash, the partial-state-destroy reasoning, the ANGLE behavior, etc. are kept in the CHANGELOG Highlights and the original PR description where they belong; in-code they collapse to one-line WHY pointers (the WHAT is in the code itself). No behavior changes — just comment trim across glshader.js, shadereffect.js, webgl_renderer.js, the two example coin.ts files, and webgl_pipeline_adversarial.spec.js. 52/52 tests still pass, build still clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_onContextRestored` was unconditionally setting `enabled = true`,
which overrode `effect.enabled = false` toggles user code uses for
visibility (the dropZone effect in plinko-planck is the canonical
example — it stays disabled between drops). On a GPU context cycle
the user-disabled effect would silently re-enable itself on the
next frame, producing the exact "shader appears even though I
turned it off" bug.
Fix: remember `enabled` in `_onContextLost`, restore the same value
in `_onContextRestored`. Default to true only when there was no
recorded pre-suspend state (e.g. first activation right after
restore on a never-seen effect).
Added a regression test ("a user-disabled ShaderEffect stays
disabled across a cycle") pinning the contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
deactivate / destroy Both platformer + platformer-matter CoinEntity deferred shader attachment via `event.once(event.LEVEL_LOADED, () => ...)` using an inline arrow. If the coin gets deactivated or destroyed before LEVEL_LOADED fires (level reload while the engine is still loading, test teardown mid-init, etc.), the arrow keeps a reference to the now-pooled coin instance and later fires `subscribeShine()` on it, attaching a GAME_UPDATE handler on an inactive object — a slow leak across pool cycles. Fix: store the handler on the instance as `deferredAttachHandler`, add a `cancelDeferredAttach()` helper that does `event.off`, and call it from `onActivateEvent` (before re-arming), `onDeactivateEvent`, and `onDestroyEvent`. Confirmed `event.off` removes once-registered handlers — melonJS's EventEmitter.removeListener walks both the `eventListeners` and `eventListenersOnce` arrays. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This reverts commit 30eb169.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes the Windows + Chrome shader-crash report from #1457 readers, and finishes the unfinished context-loss recovery design.
What was wrong
On NVIDIA Optimus laptops (Windows + Chrome with dual GPU), switching between integrated and discrete GPU loses the WebGL context. The renderer's existing
webglcontextlosthandler fires `ONCONTEXT_LOST` → every `GLShader` auto-destroys via the existing subscription at `glshader.js:89`. On ANGLE/D3D11 the cross-context `gl.deleteProgram` throws, leaving the partial state: `GLShader.uniforms === null` while `ShaderEffect.enabled` was still true. Next frame's `setUniform("uTime", …)` crashes with `TypeError: Cannot read properties of null (reading 'uTime')`. Mac drivers silently no-op on the cross-context delete, which is why the bug was Windows-only.What this PR does
Two-part fix:
`GLShader.destroy()` is now atomic + idempotent. Wraps `gl.deleteProgram` in a try/catch so the throwy GL call can't leave a partial state. Adds a public `destroyed: boolean` flag for a uniform guard pattern across `setUniform` / `bind` / `getAttribLocation`.
`ShaderEffect.destroy()` sets `enabled = false` BEFORE the inner `_shader.destroy()` call. If the inner destroy ever throws (the original failure mode), the public-facing gate is already closed so a still-registered `setTime(t)` loop sees a no-op.
Also completes the unfinished context-loss design — the comment at glshader.js:88 already documented the intent ("will be recreated on context restore") but only the destroy side was implemented. Shaders now SUSPEND on context lost (program released; source + uniform-value cache preserved) and RECOMPILE + replay cached uniforms on context restored. Visual state survives a GPU switch transparently; user code doesn't have to re-apply uniforms.
Example migration — `platformer` and `platformer-matter` coin entities migrated from a shared singleton `coinShader` with manual refcount to per-instance `ShineEffect`. Matches `plinko-planck`'s pattern (peg.ts, dropZone.ts) — the platformer coins were the deviation in 19.5.0.
Tests
16 new adversarial tests in `webgl_pipeline_adversarial.spec.js` (3491 → 3506 melonjs total):
Every test uses `ctx.skip("WebGL not available")` + `expect.assertions(N)` so a CI runner without WebGL reports SKIPPED, not fake-passed. The two regression tests verifiably FAIL when their corresponding fix is reverted (confirmed by stash + rerun) — they're real guards, not theater.
Verified
Test plan
🤖 Generated with Claude Code