fix(wasm): eliminate UI hang during prompt prefill by unamedkr · Pull Request #30 · quantumaikr/quant.cpp

unamedkr · 2026-04-10T08:41:53Z

Problem

The browser freezes for 5-15s after pressing Enter. The "Thinking..." indicator never shows because the prefill loop blocks the main thread before the browser can paint.

Root cause

The prefill loop (quant.h:15465) runs tq_forward() for each prompt token with zero yield points. The ASYNCIFY yield, emscripten_sleep(0), was only called in the token generation callback, not during prefill.

Fix

quant.h: emscripten_sleep(0) every 2 tokens during prefill (guarded by __EMSCRIPTEN__ — native unaffected)
index.html: double requestAnimationFrame before WASM call guarantees "Thinking..." paints first
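
The quant.h change described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual patch: `prefill`, `forward_stub`, `PREFILL_YIELD`, and `PREFILL_YIELD_INTERVAL` are hypothetical names, and `forward_stub` stands in for tq_forward(), whose real signature is not shown in this PR. Only `emscripten_sleep` and the `__EMSCRIPTEN__` guard come from the description. On Emscripten builds the sleep needs ASYNCIFY enabled at link time. The later commit message in this thread reports that this prefill sleep was subsequently removed.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
/* WASM build: hand control back to the browser event loop (needs ASYNCIFY). */
#define PREFILL_YIELD() emscripten_sleep(0)
#else
/* Native build: the yield compiles to nothing. */
#define PREFILL_YIELD() ((void)0)
#endif

/* Yield after every 2 prompt tokens, as stated in the PR description. */
#define PREFILL_YIELD_INTERVAL 2

/* Hypothetical stand-in for tq_forward(); the real function runs one token
 * through the model. */
static void forward_stub(int token, int pos) {
    (void)token;
    (void)pos;
}

/* Runs every prompt token through the forward pass and reaches a yield point
 * after each PREFILL_YIELD_INTERVAL tokens. Returns the number of yield points
 * reached, which is only useful for illustrating and testing the cadence. */
static int prefill(const int *tokens, int n_tokens) {
    assert(tokens != NULL || n_tokens <= 0);
    int yields = 0;
    for (int i = 0; i < n_tokens; i++) {
        forward_stub(tokens[i], i);
        if ((i + 1) % PREFILL_YIELD_INTERVAL == 0) {
            PREFILL_YIELD();
            yields++;
        }
    }
    return yields;
}
```

With a 5-token prompt, this sketch reaches a yield point after the 2nd and 4th tokens. On a native build the macro expands to a no-op, so the loop behaves exactly as it would without the guard.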

🤖 Generated with Claude Code

The browser froze for 5-15s after pressing Enter because the prefill loop (running all prompt tokens through 28 layers) had no yield points. ASYNCIFY sleep only fired during token generation, not prefill.

Two fixes:

1. quant.h: add emscripten_sleep(0) every 2 tokens in the prefill loop. Browser can repaint and stays responsive. Only active on __EMSCRIPTEN__ builds; native is unaffected.
2. index.html: double-requestAnimationFrame before WASM call ensures "Thinking..." indicator paints before any blocking starts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The emscripten_sleep(0) added to quant.h's prefill loop (PR #30) broke ASYNCIFY for the entire quant_generate call. The call stack during tq_forward() is too deep (matmul → SIMD kernels) for ASYNCIFY to unwind/rewind: it silently fails and the generation callback's sleep stops working too.

Fix: remove prefill sleep entirely. The prefill blocks the browser for a few seconds (unavoidable without a step-by-step API), but "Thinking..." is shown before via requestAnimationFrame. Token streaming during generation works again.

Also: pthreads removed (PR #32) to avoid pthreads+ASYNCIFY conflict, build.sh now uses single-thread SIMD + ASYNCIFY only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


unamedkr merged commit a2149ef into main Apr 10, 2026

unamedkr deleted the fix/wasm-prefill-hang branch April 10, 2026 08:41

unamedkr mentioned this pull request Apr 10, 2026

fix(wasm): remove prefill sleep — restores token streaming #33

Merged

unamedkr mentioned this pull request Apr 10, 2026

fix(wasm): ccall({async:true}) — fixes ASYNCIFY streaming #34

Merged

unamedkr merged 1 commit into main from fix/wasm-prefill-hang
