
fix(wasm): eliminate UI hang during prompt prefill #30

Merged
unamedkr merged 1 commit into main from fix/wasm-prefill-hang
Apr 10, 2026

@unamedkr
Collaborator

Problem

Browser freezes 5-15s after pressing Enter. "Thinking..." indicator never shows because the prefill loop blocks the main thread before the browser can paint.

Root cause

The prefill loop (quant.h:15465) runs tq_forward() for each prompt token with zero yield points. The only ASYNCIFY yield, emscripten_sleep(0), lived in the token generation callback, so nothing yielded during prefill.

Fix

  1. quant.h: emscripten_sleep(0) every 2 tokens during prefill (guarded by __EMSCRIPTEN__ — native unaffected)
  2. index.html: double requestAnimationFrame before WASM call guarantees "Thinking..." paints first
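
As a rough illustration of item 1, the sketch below shows one way a prefill loop can yield every 2 tokens while compiling to a plain loop on native builds. It is not the actual quant.h code: `forward_stub`, `prefill_with_yields`, `PREFILL_YIELD_INTERVAL`, and the returned yield count are hypothetical names introduced here, and the stub stands in for the real tq_forward(). Note that a follow-up commit referenced on this page reports that this prefill sleep broke ASYNCIFY and was removed.

```c
#include <assert.h>

#ifdef __EMSCRIPTEN__
#include <emscripten.h> /* emscripten_sleep(); needs an ASYNCIFY build */
#endif

/* Hypothetical constant: the PR describes yielding every 2 prompt tokens. */
#define PREFILL_YIELD_INTERVAL 2

/* Hypothetical stand-in for tq_forward(); it only counts calls. */
static int forward_calls = 0;
static void forward_stub(int token, int pos) {
    (void)token;
    (void)pos;
    forward_calls++;
}

/* Runs every prompt token through the (stubbed) model and returns how many
 * yield points were reached. On native builds the yield is a no-op. */
static int prefill_with_yields(const int *tokens, int n_tokens) {
    assert(n_tokens >= 0);
    int yields = 0;
    for (int i = 0; i < n_tokens; i++) {
        forward_stub(tokens[i], i);
        if ((i + 1) % PREFILL_YIELD_INTERVAL == 0) {
            yields++;
#ifdef __EMSCRIPTEN__
            emscripten_sleep(0); /* hand control back to the browser event loop */
#endif
        }
    }
    return yields;
}
```

With a 5-token prompt this reaches the yield point after tokens 2 and 4. The `#ifdef` guard is what keeps native builds unaffected.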

🤖 Generated with Claude Code

The browser froze for 5-15s after pressing Enter because the prefill
loop (running all prompt tokens through 28 layers) had no yield points.
ASYNCIFY sleep only fired during token generation, not prefill.

Two fixes:
1. quant.h: add emscripten_sleep(0) every 2 tokens in the prefill
   loop. Browser can repaint and stays responsive. Only active on
   __EMSCRIPTEN__ builds — native is unaffected.
2. index.html: double-requestAnimationFrame before WASM call ensures
   "Thinking..." indicator paints before any blocking starts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@unamedkr unamedkr merged commit a2149ef into main Apr 10, 2026
@unamedkr unamedkr deleted the fix/wasm-prefill-hang branch April 10, 2026 08:41
unamedkr added a commit that referenced this pull request Apr 10, 2026
The emscripten_sleep(0) added to quant.h's prefill loop (PR #30)
broke ASYNCIFY for the entire quant_generate call. The call stack
during tq_forward() is too deep (matmul → SIMD kernels) for
ASYNCIFY to unwind/rewind — it silently fails and the generation
callback's sleep stops working too.

Fix: remove prefill sleep entirely. The prefill blocks the browser
for a few seconds (unavoidable without a step-by-step API), but
"Thinking..." is shown before via requestAnimationFrame. Token
streaming during generation works again.

Also: pthreads removed (PR #32) to avoid the pthreads+ASYNCIFY
conflict; build.sh now uses single-thread SIMD + ASYNCIFY only.
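
The control flow this commit message describes can be sketched roughly as follows: prefill runs straight through with no yields, and the only yield sits in the per-token generation callback. This is a simplified assumption-laden sketch, not the real quant_generate; `generate_sketch`, `model_step_stub`, `yield_to_browser`, `on_token`, and the `token_cb` type are hypothetical names, and the "generated" token values are dummies.

```c
#include <assert.h>

#ifdef __EMSCRIPTEN__
#include <emscripten.h> /* emscripten_sleep(); needs an ASYNCIFY build */
#endif

/* Hypothetical callback type invoked once per generated token. */
typedef void (*token_cb)(int token, void *user);

static int yield_count = 0;

/* Yields to the browser on Emscripten builds; only counts on native. */
static void yield_to_browser(void) {
    yield_count++;
#ifdef __EMSCRIPTEN__
    emscripten_sleep(0);
#endif
}

/* Hypothetical stand-in for one forward pass. */
static void model_step_stub(int token, int pos) {
    (void)token;
    (void)pos;
}

/* Example callback: counts emitted tokens, then yields so the UI can stream. */
static void on_token(int token, void *user) {
    (void)token;
    int *emitted = user;
    (*emitted)++;
    yield_to_browser();
}

/* Prefill without yields, then generate with a yield per token (in the
 * callback). Returns the final sequence position. */
static int generate_sketch(const int *prompt, int n_prompt, int max_new,
                           token_cb cb, void *user) {
    assert(n_prompt >= 0 && max_new >= 0);
    int pos = 0;
    for (int i = 0; i < n_prompt; i++) {
        model_step_stub(prompt[i], pos++); /* blocking: no sleep here */
    }
    for (int n = 0; n < max_new; n++) {
        int next = n; /* dummy token id, not a real sample */
        model_step_stub(next, pos++);
        cb(next, user);
    }
    return pos;
}
```

The yield is issued from the callback, outside the forward pass, which matches the commit's description of where the sleep keeps working.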

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>