Skip to content

fix(video): atomic shader destroy + transparent context-loss recovery (19.6.0)#1460

Merged
obiot merged 9 commits into
masterfrom
fix/19.6-shader-lifecycle
May 22, 2026
Merged

fix(video): atomic shader destroy + transparent context-loss recovery (19.6.0)#1460
obiot merged 9 commits into
masterfrom
fix/19.6-shader-lifecycle

Conversation

@obiot
Copy link
Copy Markdown
Member

@obiot obiot commented May 22, 2026

Closes the Windows + Chrome shader-crash report from #1457 readers, and finishes the unfinished context-loss recovery design.

What was wrong

On NVIDIA Optimus laptops (Windows + Chrome with dual GPU), switching between integrated and discrete GPU loses the WebGL context. The renderer's existing webglcontextlost handler fires `ONCONTEXT_LOST` → every `GLShader` auto-destroys via the existing subscription at `glshader.js:89`. On ANGLE/D3D11 the cross-context `gl.deleteProgram` throws, leaving the partial state: `GLShader.uniforms === null` while `ShaderEffect.enabled` was still true. Next frame's `setUniform("uTime", …)` crashes with `TypeError: Cannot read properties of null (reading 'uTime')`. Mac drivers silently no-op on the cross-context delete, which is why the bug was Windows-only.

What this PR does

Two-part fix:

  1. `GLShader.destroy()` is now atomic + idempotent. Wraps `gl.deleteProgram` in a try/catch so the throwy GL call can't leave a partial state. Adds a public `destroyed: boolean` flag for a uniform guard pattern across `setUniform` / `bind` / `getAttribLocation`.

  2. `ShaderEffect.destroy()` sets `enabled = false` BEFORE the inner `_shader.destroy()` call. If the inner destroy ever throws (the original failure mode), the public-facing gate is already closed so a still-registered `setTime(t)` loop sees a no-op.

Also completes the unfinished context-loss design — the comment at glshader.js:88 already documented the intent ("will be recreated on context restore") but only the destroy side was implemented. Shaders now SUSPEND on context lost (program released; source + uniform-value cache preserved) and RECOMPILE + replay cached uniforms on context restored. Visual state survives a GPU switch transparently; user code doesn't have to re-apply uniforms.

Example migration — `platformer` and `platformer-matter` coin entities migrated from a shared singleton `coinShader` with manual refcount to per-instance `ShineEffect`. Matches `plinko-planck`'s pattern (peg.ts, dropZone.ts) — the platformer coins were the deviation in 19.5.0.

Tests

16 new adversarial tests in `webgl_pipeline_adversarial.spec.js` (3491 → 3506 melonjs total):

  • Idempotent destroy (GLShader + ShaderEffect)
  • Partial-state-safe destroy under throwing `gl.deleteProgram` (mocked ANGLE failure)
  • ShaderEffect destroy reordering — enabled flip BEFORE inner destroy
  • Per-instance shader isolation
  • Synthetic context-lost/restored cycle via event-bus
  • 7 real `WEBGL_lose_context` end-to-end cycles in a trailing block (last so the transient context drop can't poison earlier tests):
    • 1 shader, 1 cycle
    • 2 shaders, no cross-pollination
    • `setUniform` during suspended window caches + replays on restore
    • destroy during suspended window stays destroyed across restore
    • 3 consecutive lose/restore cycles, uniform state intact
    • `ShaderEffect.enabled` flips on each transition
    • destroy during cycle + second lose/restore is harmless (no leaked subscribers)

Every test uses `ctx.skip("WebGL not available")` + `expect.assertions(N)` so a CI runner without WebGL reports SKIPPED, not fake-passed. The two regression tests verifiably FAIL when their corresponding fix is reverted (confirmed by stash + rerun) — they're real guards, not theater.

Verified

  • `pnpm -F melonjs build` clean
  • `pnpm -F melonjs vitest run` — 3506 / 12 skipped (was 3491)
  • `pnpm -F @melonjs/matter-adapter vitest run` — 177
  • `pnpm -F @melonjs/planck-adapter vitest run` — 169
  • Real-Chrome smoke test via Playwright — coin with shine effect renders correctly in `/#/platformer-matter`; zero console errors; in-game introspection confirms `ShineEffect.destroyed === false`, `innerProgramAlive === true`.

Test plan

  • CI lint + test on this branch goes green
  • Verify CHANGELOG renders correctly
  • Optional: someone with a Windows + NVIDIA Optimus machine confirms the original crash is gone

🤖 Generated with Claude Code

… (19.6.0)

Fixes a Windows + Chrome crash on NVIDIA Optimus laptops (dual-GPU
switch loses the WebGL context → ANGLE/D3D11 throws on the cross-
context `gl.deleteProgram` inside `GLShader.destroy`, leaving the
shader with `uniforms = null` and `enabled` still true → next frame's
`setUniform("uTime", …)` crashes with `Cannot read properties of null
(reading 'uTime')`). Apple GL drivers silently no-op on the same
cross-context delete, which is why the bug was Windows-only.

Two-part fix:

1. `GLShader.destroy()` is now atomic + idempotent. Wraps the
   `gl.deleteProgram` in a try/catch so the partial-destroy can't
   happen; sets a new `destroyed: boolean` flag for the single-source
   guard pattern across `setUniform` / `bind` / `getAttribLocation`.
2. `ShaderEffect.destroy()` sets `enabled = false` BEFORE the inner
   `_shader.destroy()` call. Even if the inner destroy throws (the
   original failure mode), the public-facing gate is already closed
   so a still-registered `setTime(t)` loop sees a no-op.

Also completes the unfinished `ONCONTEXT_LOST` → recovery design that
the comment at glshader.js had documented but never implemented:
shaders now SUSPEND on context lost (program released, source +
uniform-value cache preserved) and RECOMPILE + replay cached uniforms
on context restored. The visual state survives a GPU switch
transparently — user code doesn't need to re-apply uniforms.

Platformer + platformer-matter coin examples migrated from a shared
singleton `coinShader` with manual refcount to per-instance
`ShineEffect`, matching plinko-planck's pattern.

16 new adversarial tests in `webgl_pipeline_adversarial.spec.js`,
including 7 that drive real `WEBGL_lose_context` lose/restore cycles
end-to-end. Two of them verifiably catch the original crash signature
when their corresponding fix is reverted. Each test uses `ctx.skip()`
+ `expect.assertions(N)` so a CI runner without WebGL reports SKIPPED
(not fake-passed).

Visually verified in real Chrome (Playwright) — coin with shine
effect renders correctly, zero console errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 22, 2026 05:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the WebGL shader/effect lifecycle to prevent crashes during WebGL context loss (notably on Windows/Chrome ANGLE) and completes automatic context-loss recovery by suspending/recompiling shaders while preserving uniform state.

Changes:

  • Make GLShader destruction atomic/idempotent and add destroyed/suspended state with context-lost/restored suspend + recompile + uniform replay.
  • Make ShaderEffect.destroy() idempotent and reorder teardown so enabled is disabled before inner shader destruction; also toggle enabled across context loss/restoration.
  • Add adversarial WebGL tests (including real WEBGL_lose_context cycles), bump melonjs version to 19.6.0, update changelog, and migrate platformer coin examples to per-instance ShineEffect.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
packages/melonjs/src/video/webgl/glshader.js Adds destroyed/suspended lifecycle, safe/idempotent destroy, and context-lost/restored suspend + recompile with uniform caching/replay.
packages/melonjs/src/video/webgl/shadereffect.js Adds destroyed flag, context-lost/restored enabled gating, and idempotent destroy with safer teardown ordering.
packages/melonjs/tests/webgl_pipeline_adversarial.spec.js Adds shader lifecycle + context loss/recovery regression tests, including real WEBGL_lose_context end-to-end cycles.
packages/examples/src/examples/platformer/entities/coin.ts Migrates platformer coin to per-instance ShineEffect with per-coin time updates.
packages/examples/src/examples/platformer-matter/entities/coin.ts Migrates matter platformer coin to per-instance ShineEffect with per-coin time updates.
packages/melonjs/package.json Bumps package version from 19.5.0 to 19.6.0.
packages/melonjs/CHANGELOG.md Adds an unreleased 19.6.0 entry documenting shader lifecycle hardening and example migrations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/melonjs/src/video/webgl/glshader.js Outdated
Comment thread packages/melonjs/tests/webgl_pipeline_adversarial.spec.js Outdated
Comment thread packages/examples/src/examples/platformer/entities/coin.ts Outdated
Comment thread packages/examples/src/examples/platformer-matter/entities/coin.ts Outdated
obiot and others added 2 commits May 22, 2026 15:16
GLShader._uniformCache: snapshot caller arrays / typed arrays in
setUniform so a later in-place mutation on the caller's side can't
poison the value replayed against the new program on context restore.

webgl_pipeline_adversarial spec: assert NO_ERROR explicitly after the
destroyed-shader no-op block instead of just draining — a stray gl.*
call from a destroyed path would now fail loudly.

Platformer + matter-platformer coin.ts: move GAME_UPDATE subscribe/
unsubscribe from constructor/onDestroyEvent to onActivateEvent/
onDeactivateEvent. CoinEntity is pool.register(..., true), so
onDestroyEvent does NOT fire on pool return — the prior pattern
leaked one dead update handler per coin pickup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ttery

Extends 19.6.0's shader-level context recovery to the rest of the
WebGL pipeline. The previous handler only emitted ONCONTEXT_RESTORED
and let `reset()`'s batcher.init() path do the work; that left the
renderer's own `vertexBuffer` tied to the dead context (gl.bindBuffer
against it produced INVALID_OPERATION on the next flush), default GL
state untouched after the driver wiped it, and stale texture-unit
assignments in TextureCache.

webgl_renderer.js webglcontextrestored handler now:
- re-creates `this.vertexBuffer = gl.createBuffer()` + re-binds it
- re-applies default state (BLEND, DEPTH_TEST, SCISSOR_TEST,
  depthMask, blendMode) — the driver clears all GL state on restore
- runs reset() (which already takes the batcher.init() recovery
  path when isContextValid is false) against fresh GL resources
- clears TextureCache.units / usedUnits so the next uploadTexture
  hits the create-and-upload path instead of rebinding dead handles
- finally emits ONCONTEXT_RESTORED for GLShader / ShaderEffect
  recompile + uniform replay

Also added 11 adversarial integration tests under "Renderer-wide
context loss / restore battery" covering: gl.isContextLost transitions,
renderer.isContextValid mirror flag, renderer.vertexBuffer validity,
indexed-batcher vertex buffers, default GL state restoration, batcher
boundTextures invalidation + lazy re-upload, drawImage + flush after
the cycle, mesh batcher post-cycle drawing, litQuad (Light2d + normal
map) post-cycle drawing, ShaderEffect concurrently alive through the
cycle, and three consecutive cycles for cumulative-leak coverage.

End-to-end verified via Playwright + ANGLE: both `/platformer` and
`/platformer-matter` survive an in-game ext.loseContext() →
ext.restoreContext() cycle with zero console errors and the game loop
continuing to render.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 22, 2026 08:44
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comment thread packages/melonjs/src/video/webgl/glshader.js
Comment thread packages/melonjs/tests/webgl_pipeline_adversarial.spec.js Outdated
obiot and others added 2 commits May 22, 2026 17:03
…onstruction

8 new tests closing the gaps identified in the post-PR audit:

- Renderable.addPostEffect dispatch survives a full cycle (closest
  test to the in-the-wild coin scenario — exercises beginPostEffect
  → drawImage → endPostEffect across a real loseContext / restore)
- Pool-recycle simulation: same ShaderEffect re-attached to a
  renderable after a cycle still drives the dispatch path correctly
- addPostEffect during the suspended window does not throw
- 20 concurrent ShaderEffects all survive a cycle with distinct
  per-instance uniform values (subscription leak / shared-cache check)
- renderer.customShader assignment survives a cycle (the underlying
  ShaderEffect / GLShader keeps replaying its uniforms; reset()
  clears the renderer-level pointer, user code re-assigns)
- Tinted texture cache regenerates cleanly after a cycle
- save / restore stack interacting with a cycle does not throw
  (saved state is transient; restore() with empty stack is safe)
- 30 destroyed ShaderEffects + cycle leaves no leaked subscriber
  recompiling a destroyed shader (subscription cleanup proof)

Plus a small GLShader resilience fix needed to make #3 pass:
constructing a new GLShader while gl.isContextLost() === true now
sets suspended = true and skips _compile() instead of throwing
"Error: null" from gl.getShaderInfoLog on the fake (lost-context)
shader object. The ONCONTEXT_RESTORED subscription triggers the
deferred compile once the context is alive again; any setUniform
the caller writes in the meantime is cached and replayed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ygiene

GLShader.setUniform: replace the eager `.slice()` snapshot with the
existing `captureValue` helper from utils/uniforms.js. Reuses the
existing cache slot's allocation when the value's length matches
(steady-state writes don't allocate at all), only allocating on
first capture or a shape change. Live GL writes now pass the
caller's original value (Vector / matrix types still flow through
`.toArray()` because the GL proxy reads `val.length`), since
gl.uniform* setters copy synchronously into program state.

Eliminates per-frame GC churn on hot paths where the same uniform
gets a typed-array write every frame (light positions / colors in
LitQuadBatcher, per-frame uTime via Vector writes, the procedural
TMX tile shader's animation uniforms, etc.). Pixi v8 uses the same
slot-reuse pattern (`SystemUniformGroupBuffer.update`); the helper
was already in our codebase for the live-write cache inside
extractUniforms, this just shares it with the context-restored
replay cache.

webgl_pipeline_adversarial spec: wrap the two monkey-patch tests
(GLShader partial-destroy + ShaderEffect partial-destroy) in try /
finally so a mid-test assertion failure can't leak the patched
deleteProgram into subsequent tests. Previously a single failed
assertion would cascade — every other test that called destroy on
ANY shader would hit the mock-throw and produce confusing failures
in unrelated specs.

Also relax the array-snapshot test to compare by value (toBeCloseTo)
rather than by JS type — the cache slot is intentionally reused
across writes so the type may differ from what the caller passed,
but the values are what matter for replay correctness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 22, 2026 10:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment thread packages/melonjs/src/video/webgl/shadereffect.js Outdated
obiot and others added 2 commits May 22, 2026 19:03
…work

Sweep through the source and test comments added in this PR per
"keep them short" feedback. The multi-paragraph essays explaining
the 19.5.0 crash, the partial-state-destroy reasoning, the ANGLE
behavior, etc. are kept in the CHANGELOG Highlights and the original
PR description where they belong; in-code they collapse to one-line
WHY pointers (the WHAT is in the code itself).

No behavior changes — just comment trim across glshader.js,
shadereffect.js, webgl_renderer.js, the two example coin.ts files,
and webgl_pipeline_adversarial.spec.js. 52/52 tests still pass,
build still clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_onContextRestored` was unconditionally setting `enabled = true`,
which overrode `effect.enabled = false` toggles user code uses for
visibility (the dropZone effect in plinko-planck is the canonical
example — it stays disabled between drops). On a GPU context cycle
the user-disabled effect would silently re-enable itself on the
next frame, producing the exact "shader appears even though I
turned it off" bug.

Fix: remember `enabled` in `_onContextLost`, restore the same value
in `_onContextRestored`. Default to true only when there was no
recorded pre-suspend state (e.g. first activation right after
restore on a never-seen effect).

Added a regression test ("a user-disabled ShaderEffect stays
disabled across a cycle") pinning the contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 22, 2026 14:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Comment thread packages/examples/src/examples/platformer/entities/coin.ts
Comment thread packages/examples/src/examples/platformer-matter/entities/coin.ts
obiot and others added 2 commits May 23, 2026 07:00
deactivate / destroy

Both platformer + platformer-matter CoinEntity deferred shader
attachment via `event.once(event.LEVEL_LOADED, () => ...)` using an
inline arrow. If the coin gets deactivated or destroyed before
LEVEL_LOADED fires (level reload while the engine is still loading,
test teardown mid-init, etc.), the arrow keeps a reference to the
now-pooled coin instance and later fires `subscribeShine()` on it,
attaching a GAME_UPDATE handler on an inactive object — a slow leak
across pool cycles.

Fix: store the handler on the instance as `deferredAttachHandler`,
add a `cancelDeferredAttach()` helper that does `event.off`, and
call it from `onActivateEvent` (before re-arming), `onDeactivateEvent`,
and `onDestroyEvent`. Confirmed `event.off` removes once-registered
handlers — melonJS's EventEmitter.removeListener walks both the
`eventListeners` and `eventListenersOnce` arrays.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 22, 2026 23:13
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

@obiot obiot merged commit ae86ee9 into master May 22, 2026
7 checks passed
@obiot obiot deleted the fix/19.6-shader-lifecycle branch May 22, 2026 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants