Fix BERT embedding pthread pool sizing #7

Merged
leehack merged 1 commit into main from fix/bert-embedding-pthread-pool
May 9, 2026

leehack (Owner) commented May 8, 2026

Summary

  • cap bridge-selected thread counts to the compiled pthread pool size reported by the WASM core
  • default pthread pool strictness to 0 as a fallback against hard aborts from unexpected over-pool requests
  • make the wasm64 BigInt post-link patch tolerate non-minified/debug-style call formatting
  • add an Emscripten thread-detection cache hint so local/CI builds do not fail before llama.cpp configures
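
The capping described in the first bullet can be sketched as follows. This is a minimal illustration, not the code in `js/llama_webgpu_bridge.js`: the function name `capThreadCount` and its treatment of missing or invalid inputs (falling back to a single thread) are assumptions made for the example.

```javascript
// Hypothetical helper: clamp a requested thread count to the compiled pool size.
// Invalid or missing inputs fall back to 1 (an assumption of this sketch).
function capThreadCount(requested, poolSize) {
  const want = Math.max(1, Math.floor(Number(requested) || 1));
  const pool = Math.max(1, Math.floor(Number(poolSize) || 1));
  // Never ask for more threads than the pool was compiled with.
  return Math.min(want, pool);
}
```

For example, an auto-detected request of 8 threads against a pool of 4 would yield 4, which matches the "auto threads capped to 4" behavior reported under Verification.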

Fixes #5.

Verification

  • CCACHE_DIR=/private/tmp/llama_web_bridge_issue5_ccache EM_CACHE=/private/tmp/llama_web_bridge_issue5_emcache LLAMA_CPP_DIR=/opt/UnitySrc/personal/llama/llamadart-native/third_party/llama.cpp BUILD_DIR=/private/tmp/llama_web_bridge_issue5_build5 MEM64_BUILD_DIR=/private/tmp/llama_web_bridge_issue5_build5_mem64 OUT_DIR=/private/tmp/llama_web_bridge_issue5_dist6 WEBGPU_BRIDGE_BUILD_MEM64=1 ./scripts/build_bridge.sh
  • bash -n scripts/build_bridge.sh
  • node --check js/llama_webgpu_bridge.js
  • Local Chromium BERT smoke with jina-embeddings-v2-small-en-Q2_K.gguf: direct runtime load/tokenize/embed/embedBatch passed with auto threads capped to 4
  • Local Chromium BERT smoke with the bridge worker path: load/tokenize/embed/embedBatch passed with auto threads capped to 4

leehack force-pushed the fix/bert-embedding-pthread-pool branch from 22bddfd to 7b312c3 on May 8, 2026 23:49
leehack requested a review from Copilot on May 8, 2026 23:51

Copilot AI left a comment

Pull request overview

This PR addresses silent aborts when running BERT embedding workloads in the browser by aligning JS-selected thread counts with the actual compiled Emscripten pthread pool size, and by making the build/runtime configuration more tolerant of different build modes and toolchains.

Changes:

  • Expose the compiled pthread pool size from the WASM core and use it in the JS runtime to cap selected thread counts.
  • Make PTHREAD_POOL_SIZE_STRICT configurable (defaulting to 0) to avoid hard aborts on unexpected over-pool requests.
  • Make the wasm64 BigInt post-link patch more robust to non-minified/debug-style formatting, and force-enable CMake’s pthread probe hint for newer Emscripten toolchains.
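
Reading the pool size from the core, as the first change describes, might look roughly like the sketch below. Emscripten's `ccall(ident, returnType, argTypes, args)` is a real runtime method, but the export name `hypothetical_get_pthread_pool_size` and the `null` fallback are assumptions for illustration; the PR does not show the actual symbol name or error handling.

```javascript
// Hypothetical sketch: ask the WASM core for its compiled pthread pool size.
// `module` is an Emscripten module object exposing ccall; the export name is
// a placeholder, not the symbol this PR actually adds.
function readPthreadPoolSize(module, exportName = "hypothetical_get_pthread_pool_size") {
  if (!module || typeof module.ccall !== "function") return null;
  try {
    // No arguments; a numeric return value is expected.
    const value = Number(module.ccall(exportName, "number", [], []));
    return Number.isInteger(value) && value > 0 ? value : null;
  } catch (_err) {
    // Older cores may not export the symbol; report "unknown" instead of throwing.
    return null;
  }
}
```

Returning `null` for an unknown pool size leaves the caller free to choose a conservative default rather than guessing a thread count.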

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/llama_webgpu_core.cpp | Adds an exported function to report the compiled pthread pool size. |
| js/llama_webgpu_bridge.js | Reads pool size from core via ccall and uses it to cap runtime thread selection. |
| CMakeLists.txt | Adds a strictness knob, exports the new core symbol, defines pool size for compilation, and adds an Emscripten pthread probe cache hint. |
| scripts/build_bridge.sh | Adds env var plumbing for strictness and improves wasm64 BigInt patching via regex. |
| README.md | Documents the new strictness env var and updated pthread pooling behavior. |
| AGENTS.md | Adds local verification guidance and a regression smoke checklist for pthread/BERT cases. |
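
To illustrate what "tolerant of non-minified/debug-style formatting" can mean for a regex-based post-link patch, here is a generic sketch. It does not reproduce the patch in `scripts/build_bridge.sh`: the target function name `hypotheticalAlloc`, the single-identifier argument shape, and the `BigInt(...)` wrapping are all assumptions chosen only to show whitespace-tolerant matching.

```javascript
// Hypothetical sketch: wrap the single identifier argument of a call in BigInt(),
// whether the source is minified ("f(n)") or formatted ("f (\n  n\n)").
function patchCalls(source, fnName = "hypotheticalAlloc") {
  // Escape regex metacharacters in the function name.
  const escaped = fnName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // \s* tolerates spaces and newlines around the parentheses and the argument.
  const pattern = new RegExp(
    `\\b${escaped}\\s*\\(\\s*([A-Za-z_$][\\w$]*)\\s*\\)`,
    "g"
  );
  return source.replace(pattern, (_match, arg) => `${fnName}(BigInt(${arg}))`);
}
```

Because an already patched call has `BigInt(` rather than a closing parenthesis after the identifier, running the sketch twice leaves the output unchanged.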


Comment thread README.md Outdated
leehack force-pushed the fix/bert-embedding-pthread-pool branch from 7b312c3 to 8b63078 on May 8, 2026 23:58
leehack merged commit a5041e7 into main on May 9, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

BERT embedding silently aborts on web

2 participants