Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

emscripten tbb stuff #653

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Conversation

pca006132
Copy link
Collaborator

This is mainly to reproduce the issues, not intended for merging. There are still weird issues with pthread enabled build, and the performance is not that good when running from the browser (it can run significantly slower).

  1. Enabling the build is actually very easy, just need to specify -pthread to CMAKE_CXX_FLAGS and CMAKE_EXE_LINKER_FLAGS. We also need to set the cross origin policy bla bla bla to use SharedArrayBuffer, which is required for shared memory between workers.
  2. It turns out the vite issue is Importing a file from a web worker that imports said web worker causes Vite to build forever vitejs/vite#7015 and Eliminate the circular dependency between the main loader and the worker emscripten-core/emscripten#20580, and there is also an issue from emscripten that uses require from the worker which does not work with ES6 (is this WASM_WORKERS and ES6 exports emscripten-core/emscripten#17664 ? they said pthread supports ES6 though). I managed to fix by an ugly script.
  3. For some reason I am unable to get anything run with node when pthread and ES6 module is used. This is the major issue and I am unable to get this work with the latest version of emscripten (3.1.51). You may need to modify the patch function to use .\/built\/manifold.worker.js instead of built\/manifold.worker.js. Somehow vite is only happy with the latter while node is only happy with the former.
  4. For extra fun, try
    diff --git a/CMakeLists.txt b/CMakeLists.txt
    index 881182d..a1137d6 100644
    --- a/CMakeLists.txt
    +++ b/CMakeLists.txt
    @@ -57,9 +57,10 @@ endif()
     
     if(EMSCRIPTEN)
       message("Building for Emscripten")
    +  set(CMAKE_EXECUTABLE_SUFFIX_CXX ".mjs")
       set(MANIFOLD_FLAGS -fexceptions -D_LIBCUDACXX_HAS_THREAD_API_EXTERNAL -D_LIBCUDACXX_HAS_THREAD_API_CUDA)
       # ALLOW_MEMORY_GROWTH is slow, so we just allow 2gb of memory usage
    -  set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -sINITIAL_MEMORY=2gb -sALLOW_MEMORY_GROWTH=0 -fexceptions -sDISABLE_EXCEPTION_CATCHING=0")
    +  set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -sMODULARIZE=1 -sEXPORT_ES6=1 -sINITIAL_MEMORY=2gb -sALLOW_MEMORY_GROWTH=0 -fexceptions -sDISABLE_EXCEPTION_CATCHING=0")
       set(MANIFOLD_PYBIND OFF)
       if(MANIFOLD_PAR STREQUAL "TBB")
         set(CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS} -pthread)
    and then you will find that nodejs somehow will not run test/manifold_test.mjs no matter how hard you try... Probably similar issue with 3.

Copy link

codecov bot commented Dec 14, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (e27c9c1) 91.67% compared to head (d578c86) 91.67%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #653   +/-   ##
=======================================
  Coverage   91.67%   91.67%           
=======================================
  Files          37       37           
  Lines        4730     4730           
=======================================
  Hits         4336     4336           
  Misses        394      394           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pca006132 pca006132 changed the title emscripten tbb stuff [DO NOT MERGE] emscripten tbb stuff Dec 14, 2023
@kripken
Copy link

kripken commented Dec 18, 2023

Regarding Emscripten's support for the combination of ES6+node+pthreads, it looks like it works but it does need a little boilerplate. I opened a PR in Emscripten now to add a test for it (this was not tested before, it seems, even though it worked), which contains the working boilerplate, which is basically to create another .mjs file that does

import initModule from "./hello.mjs";
initModule();

So the module needs to be imported, and also executed/called, I think.

@pca006132
Copy link
Collaborator Author

Thanks, will give this a try later.

@pca006132
Copy link
Collaborator Author

I think the main problem now is why manifold worker does not work on vitest.

@elalish
Copy link
Owner

elalish commented Jan 1, 2024

I think the main problem now is why manifold worker does not work on vitest.

Can you elaborate a little? Are 1-4 no longer current? Can you update this PR so that the CI fails in the way you're describing?

@pca006132
Copy link
Collaborator Author

No, this just means that 4 is no longer an issue. I just have some idea about a potential workaround, will try and update this PR.

set(MANIFOLD_PYBIND OFF)
if(MANIFOLD_PAR STREQUAL "TBB")
set(CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS} -pthread)
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -pthread -sPTHREAD_POOL_SIZE=\"(typeof window == 'undefined')?4:(()=>navigator.hardwareConcurrency)()\"")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't set it ourselves, PTHREAD_POOL_SIZE defaults to zero and I think it is a link-time constant, so we must set it to some sensible value here...

name: 'configure-response-headers',
configureServer: (server) => {
server.middlewares.use((_req, res, next) => {
res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the cross-origin policy required for SharedArrayBuffer to work, there is no way around this. Hope that github pages supports this.

import {examples} from './public/examples.js';
import ManifoldWorker from './worker?worker';

const io = setupIO(new WebIO());

const wasm = await Module();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the weird part. For some reason if we load manifold.js multiple times, the subsequent loads will not complete.

Perhaps this is caused by pthread worker loader implementation. No idea about this, will try to come up with a minimal example and submit the issue to emscripten.

I currently workaround this issue by having only a single manifold worker, and move the get properties and genus calls to worker.ts, but this is not ideal.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you ever submit this to emscripten?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to try to reproduce this, will try later.

#/usr/bin/env bash
if test -f $1/manifold.js; then
# I still have no idea why this is needed...
sed -i 's/new Worker/new (ENVIRONMENT_IS_NODE ? global.Worker : Worker)/g' $1/manifold.js
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea why global.Worker != Worker, considering there are no let Worker or var Worker anywhere...

Also, Worker is something like this:

class extends EventTarget {
    constructor(url, options2) {
      super();
      this._vw_workerTarget = new EventTarget();
      this._vw_insideListeners = /* @__PURE__ */ new Map();
      this._vw_outsideListeners = /* @__PURE__ */ new Map();
      this._vw_messageQueue = [];
      this.onmessage = null;
      this.onmessageerror = null;
      this.onerror = null;
      const context = {
        onmessage: null,
        name: options2 == null ? void 0 : options2.name,
        close: () => this.terminate(),
        dispatchEvent: (event) => {
          return this._vw_workerTarget.dispatchEvent(event);
        },

while global.Worker is this:

class Worker extends EventEmitter {
  constructor(filename, options = kEmptyObject) {
    super();
    debug(`[${threadId}] create new worker`, filename, options);
    if (options.execArgv)
      validateArray(options.execArgv, 'options.execArgv');

    let argv;
    if (options.argv) {
      validateArray(options.argv, 'options.argv');
      argv = ArrayPrototypeMap(options.argv, String);
    }

    let url, doEval;
    if (options.eval) {
      if (typeof filename !== 'string') {
        throw new ERR_INVALID_ARG_VALUE(
          'options.eval',
          options.eval,
          'must be false when \'filename\' is not a string',
        );
      }
      url = null;
      doEval = 'classic';

global.Worker is the correct target in this case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that nodejs dynamic import will define Worker for the module being imported, but I am still not sure why the Worker defined here is the wrong one.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From your other comment it sounds like you think this might be a vitest issue? Have we opened anything on their forums?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is related to vitest, but I cannot make a minimal example yet and don't quite understand why it causes an issue, so I have not yet open anything there.

# I still have no idea why this is needed...
sed -i 's/new Worker/new (ENVIRONMENT_IS_NODE ? global.Worker : Worker)/g' $1/manifold.js
fi
sed -i 's/var nodeWorkerThreads=require("worker_threads");/const{createRequire:createRequire}=await import("module");var require=createRequire(import.meta.url);var nodeWorkerThreads=require("worker_threads");/g' $1/*.worker.js
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some ES6 compatibility issue. This is something upstream should fix, and pretty easy to fix.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this one that already got fixed upstream?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I think they have not yet released a new version.

@pca006132
Copy link
Collaborator Author

and I do occasionally observe this kind of weird index out of bound issue when using pthread on the browser...

 ❯ worker.test.js  (9 tests | 1 failed) 8329ms
   ❯ worker.test.js > Examples > Gyroid Module
     → no glbURL
worker sent an error! undefined:undefined: table index is out of bounds

so even if we can make it build and work, not sure if we want to really use this in production for now.

@elalish
Copy link
Owner

elalish commented Jan 3, 2024

@kripken I think we have some more specific emscripten issues to look at now. Do you have any insights on why we're randomly getting undefined:undefined: table index is out of bounds with pthreads? And perhaps you can comment on patch-emscripten.sh?

@kripken
Copy link

kripken commented Jan 3, 2024

table index is out of bounds is unfortunately a very general error, something like a segfault. It's a little more specific than that, it's basically a segfault in the space of function pointers, but still it can be caused by any sort of memory corruption. Emscripten supports several sanitizers, which might help here. See also the function pointers debugging issues section.

I'm not familiar enough with the worker global scope stuff to have a guess at the issues there. Is there a chance those can be reduced to standalone testcases? I ask because the first at least seems like it should be a general node.js issue with pthreads, but we do test that heavily in upstream, so there must be some difference in how you build or run the code (and maybe reducing it to a standalone testcase will lead us to find the difference).

@pca006132
Copy link
Collaborator Author

OK I figured out the potential reason: Worker is defined by vitest.

// a.test.js
import '@vitest/web-worker';
import {test} from 'vitest';
import b from './b';

test('foo', () => { b() })

// b.mjs
export default () => console.log(`worker: ${Worker}`)

this prints

worker: class extends EventTarget {
    constructor(url, options2) {
      super();
      this._vw_workerTarget = new EventTarget();
      this._vw_insideListeners = /* @__PURE__ */ new Map();
      this._vw_outsideListeners = /* @__PURE__ */ new Map();
      this._vw_messageQueue = [];
      this.onmessage = null;
      this.onmessageerror = null;
      this.onerror = null;
      const context = {
        onmessage: null,
        name: options2 == null ? void 0 : options2.name,
        close: () => this.terminate(),
        dispatchEvent: (event) => {
          return this._vw_workerTarget.dispatchEvent(event);
        },
        addEventListener: (...args) => {
          if (args[1])
            this._vw_insideListeners.set(args[0], args[1]);
          return this._vw_workerTarget.addEventListener(...args);
        },
        removeEventListener: this._vw_workerTarget.removeEventListener,
        postMessage: (...args) => {
          if (!args.length)
            throw new SyntaxError('"postMessage" requires at least one argument.');
          debug("posting message %o from the worker %s to the main thread", args[0], this._vw_name);
          const event = createMessageEvent(args[0], args[1], cloneType());
          this.dispatchEvent(event);
        },

No idea why global.Worker is different from Worker. Perhaps some weird scoping issue.

@pca006132
Copy link
Collaborator Author

With the ES6 module issue being fixed in the upstream, we are now left with the function pointer thing and the Worker variable issue.

@pca006132
Copy link
Collaborator Author

pca006132 commented Jan 13, 2024

For some reason address sanitizer and ub sanitizer are really really slow (waited for several minutes for one test case and it did not finish, while it should normally finish within a few seconds) for the js binding, but seems to work fine (albeit slow) with the native test (wasm). Building with -sASSERTIONS=1 did not reveal anything.

My guess is that there is some issue in the js binding that causes the problem, or issue with function table + pthread.

@pca006132
Copy link
Collaborator Author

Interestingly when I tried to test with -sMALLOC=mimalloc on emscripten 3.1.51, the build succeeds but nodejs somehow refuses to utilize multiple threads. This is the only change and removing it makes pthread works again.

@pca006132
Copy link
Collaborator Author

testing on browser with address sanitizer enables shows "RuntimeError: indirect call signature mismatch" without any address sanitizer error (except reporting leaks from tbb)

@pca006132
Copy link
Collaborator Author

OK the function pointer issue is fixed: it turns out we were calling the javascript function from multiple threads and somehow it will cause problem only in the subsequent runs... weird

@pca006132
Copy link
Collaborator Author

pca006132 commented Jan 14, 2024

The performance on the browser is not bad:

Example no pthread with pthread
Rounded Frame 0.16s 0.19s
Scallop 0.69s 0.62s
Menger Sponge 0.54s 0.82s
Stretchy Bracelet 3.19s 1.27s
Gyroid Module 3.02s 2.54s

I am not entirely sure why menger sponge is significantly slower with pthread, probably overhead from parallelization being too fine?

@pca006132
Copy link
Collaborator Author

found the culprit: emscripten does not like parallel copy. the performance is faster in all cases now.

@pca006132 pca006132 changed the title [DO NOT MERGE] emscripten tbb stuff emscripten tbb stuff Jan 14, 2024
@pca006132
Copy link
Collaborator Author

@elalish it is ready now, the only problem here is the weird worker variable issue, remaining part of the patching script is to fix something that the upstream already fixes.

Copy link
Owner

@elalish elalish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work on this! The only change I'd like to make sure gets in before this merges is to make sure our JS gltf I/O is still getting tested.

CMakeLists.txt Outdated
@@ -58,8 +58,13 @@ endif()
if(EMSCRIPTEN)
message("Building for Emscripten")
set(MANIFOLD_FLAGS -fexceptions -D_LIBCUDACXX_HAS_THREAD_API_EXTERNAL -D_LIBCUDACXX_HAS_THREAD_API_CUDA)
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -sALLOW_MEMORY_GROWTH=1 -fexceptions -sDISABLE_EXCEPTION_CATCHING=0")
# ALLOW_MEMORY_GROWTH is slow, so we just allow 2gb of memory usage
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, how slow (roughly)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, this is a compiler warning (WebAssembly/design#1271), and it does not hurt to make it a fixed memory size. This probably affects our js callbacks as they perform quite a lot of memory read/write.

I think a fixed memory size should be fine, it does not occupy physical memory if not that much memory is actually used, and because wasm by default is using 32 bit pointers we cannot use more than 4gb of memory anyway. Forgot why I set it to 2gb though, perhaps just a random choice.

if(MANIFOLD_PAR STREQUAL "TBB")
add_custom_command(
TARGET js_deps POST_BUILD
COMMAND ${CMAKE_SOURCE_DIR}/patch-emscripten.sh ${CMAKE_CURRENT_BINARY_DIR})
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the patch applied both here and in the manifold.yml file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch applied in manifold.yml is for patching the test/manifold_test.js, while the patch here is to patch the binding files. I can remove the package.json so running test/manifold_test.js does not require any patch. I added that mainly for testing and forgot to remove it.

import {examples} from './public/examples.js';
import ManifoldWorker from './worker?worker';

const io = setupIO(new WebIO());

const wasm = await Module();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you ever submit this to emscripten?

if (!docMesh) {
continue;
}
const {mesh} = readMesh(docMesh);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the one problem I see with this refactor is we're no longer testing readMesh, and we're not really testing writeMesh either since the test isn't validating its output at all. These functions aren't actually dependent on Manifold, so perhaps we can slot them back in?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we do test writeMesh, in the worker.ts code I am reading the output of writeMesh for the property.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see it now - okay, that's perfect. Thanks!

# I still have no idea why this is needed...
sed -i 's/new Worker/new (ENVIRONMENT_IS_NODE ? global.Worker : Worker)/g' $1/manifold.js
fi
sed -i 's/var nodeWorkerThreads=require("worker_threads");/const{createRequire:createRequire}=await import("module");var require=createRequire(import.meta.url);var nodeWorkerThreads=require("worker_threads");/g' $1/*.worker.js
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this one that already got fixed upstream?

#/usr/bin/env bash
if test -f $1/manifold.js; then
# I still have no idea why this is needed...
sed -i 's/new Worker/new (ENVIRONMENT_IS_NODE ? global.Worker : Worker)/g' $1/manifold.js
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From your other comment it sounds like you think this might be a vitest issue? Have we opened anything on their forums?

@@ -60,13 +60,14 @@ endif()
if(EMSCRIPTEN)
message("Building for Emscripten")
set(MANIFOLD_FLAGS -fexceptions -D_LIBCUDACXX_HAS_THREAD_API_EXTERNAL -D_LIBCUDACXX_HAS_THREAD_API_CUDA)
# ALLOW_MEMORY_GROWTH is slow, so we just allow 2gb of memory usage
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -sMODULARIZE=1 -sEXPORT_ES6=1 -sINITIAL_MEMORY=2gb -fexceptions -sDISABLE_EXCEPTION_CATCHING=0")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend to remove MODULARIZE and EXPORT_ES6? Or are they already set somewhere else?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, they are already set for the js binding

@@ -103,10 +103,7 @@
emmake make -j''${NIX_BUILD_CORES}
'';
checkPhase = ''
cd test
echo '{"type":"module"}' > package.json
node --experimental-wasm-threads -e '(async () => {const module = await import("./manifold_test.js"); await module.default();})()'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is causing the nix failure somehow?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably, will check

Copy link
Owner

@elalish elalish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm ready when you are for this one.

if (!docMesh) {
continue;
}
const {mesh} = readMesh(docMesh);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see it now - okay, that's perfect. Thanks!

@pca006132
Copy link
Collaborator Author

For some reason I am unable to get nodejs20 and nodejs21 to run with multiple threads... The --experimental-wasm-threads option is removed (people said it is now the default after v12) but I have trouble getting it working. Need more time to investigate...

@pca006132
Copy link
Collaborator Author

I still don't have any idea about why nodejs20/21 fail to use multiple cores. I think we should probably just build everything and look into the nodejs thing later? what do you think @elalish
I can clean this up and prepare for merging.

@elalish
Copy link
Owner

elalish commented Mar 11, 2024

SG to me - basically it's faster unless we have the latest node? That sounds like something we can reasonably follow up on.

@pca006132
Copy link
Collaborator Author

Another question: should we publish the version with tbb on npm as a separate package or replacing the current one?

@elalish
Copy link
Owner

elalish commented Mar 12, 2024

I would prefer to just have one version - what are the pros/cons?

@pca006132
Copy link
Collaborator Author

I think the version with tbb is slower when only a single thread is used. As I don't know how to make nodejs use multiple threads for newer versions, this will likely be the case for users using it on npm. Also, shared array buffer requires Cross-Origin-Embedder-Policy which some websites may not be able to set?

The cons of sticking with the current single threaded implementation is that it is slower than multithreaded one when multiple threads are available, which does work on browser and older version of nodejs for now.

@elalish
Copy link
Owner

elalish commented Mar 12, 2024

Hmm, yeah that COEP thing is an issue - Safari doesn't even support credentialless yet. Okay, maybe two versions is best for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants