Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for MODULARIZE with USE_PTHREADS #7667

Merged
merged 2 commits into from
Feb 7, 2019

Conversation

juj
Copy link
Collaborator

@juj juj commented Dec 14, 2018

This adds MODULARIZE support with USE_PTHREADS. Will probably need some iteration, putting this up already to get a CI run on this, and since this was asked for by some people before.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is very important, good to have this in progress.

src/library_pthread.js Outdated Show resolved Hide resolved
src/postamble.js Outdated Show resolved Hide resolved
src/shell.js Outdated Show resolved Hide resolved
src/worker.js Outdated Show resolved Hide resolved
tools/preprocessor.js Outdated Show resolved Hide resolved
src/settings.js Outdated Show resolved Hide resolved
tools/shared.py Outdated Show resolved Hide resolved
@juj juj force-pushed the MODULARIZE_with_USE_PTHREADS branch 5 times, most recently from 43aa03f to fab6375 Compare January 30, 2019 17:31
@juj juj mentioned this pull request Jan 30, 2019
@juj juj force-pushed the MODULARIZE_with_USE_PTHREADS branch 3 times, most recently from 6185ab6 to eaafb2b Compare January 30, 2019 22:27
@juj
Copy link
Collaborator Author

juj commented Jan 30, 2019

Updated this for a second pass, now ready for a second review.

Unfortunately things do get quite messy here, when the same worker.js file needs to sometimes access via global scope, and other times via Module, depending on what kind of symbol is being accessed, whether MODULARIZE is used, and whether minifying optimizations are being run.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry about the EmscriptenCode change here - that seems like a breaking change?

Overall, I wonder if a way to reduce complexity here might be to say that pthreads requires modularize?

emcc.py Outdated Show resolved Hide resolved
@juj
Copy link
Collaborator Author

juj commented Jan 31, 2019

I worry about the EmscriptenCode change here - that seems like a breaking change?

Unless I did a bug, the EmscriptenCode change should apply only if a) user is building to .html and b) user did not pass -s EXPORT_NAME directive at all. My impression is that such a build mode would generate a broken build anyways, since before this change, it would generate a page that does not work.

If user builds to .js or explicitly passes in -s EXPORT_NAME= directive of their own, the change should not apply.

The main reason I did that is that with such change, I could edit src/settings.js and flip MODULARIZE = 0; to MODULARIZE = 1; and be able to run python tests/runner.py browser.test_pthread* to mass test all pthread tests in modularize enabled mode.

Though I can revert that if it's not desirable, I don't need it for anything anymore now that I have run the tests.

Overall, I wonder if a way to reduce complexity here might be to say that pthreads requires modularize?

I don't like that change, if we did that, I think we should require overall that all builds require MODULARIZE, otherwise we are diverging. I think it is ok for now.

@juj juj force-pushed the MODULARIZE_with_USE_PTHREADS branch 2 times, most recently from 111567c to 04365e4 Compare January 31, 2019 18:18
@juj juj force-pushed the MODULARIZE_with_USE_PTHREADS branch 2 times, most recently from af04ac9 to ca39182 Compare February 1, 2019 15:47
@bvibber
Copy link
Collaborator

bvibber commented Feb 2, 2019

Couple of things I noticed testing this branch with ogv.js:

If I compile with -s ERROR_ON_UNDEFINED_SYMBOLS=1 I get error: undefined symbol: emscripten_webgl_create_context when building my modules. (Don't have a minimal test case handy yet, can write one up later.) Easily worked around by setting it to 0 for now.

I get Uncaught TypeError: Cannot set property '_stdin' of undefined when instantiating a module for a second time, here:

    var _stdin;
        if (ENVIRONMENT_IS_PTHREAD)
            _stdin = PthreadWorkerInit._stdin;
        else
            PthreadWorkerInit._stdin = _stdin = 40384;

Also, a variable ENVIRONMENT_IS_PTHREAD leaks into the window global.... that may be causing this bit to fail:

if (typeof ENVIRONMENT_IS_PTHREAD === "undefined") {
            ENVIRONMENT_IS_PTHREAD = false;
            var PthreadWorkerInit = {}
        } else {
            var buffer = OGVDecoderVideoVP9MTW.buffer;
            var tempDoublePtr = OGVDecoderVideoVP9MTW.tempDoublePtr;
            var TOTAL_MEMORY = OGVDecoderVideoVP9MTW.TOTAL_MEMORY;
            var STATICTOP = OGVDecoderVideoVP9MTW.STATICTOP;
            var DYNAMIC_BASE = OGVDecoderVideoVP9MTW.DYNAMIC_BASE;
            var DYNAMICTOP_PTR = OGVDecoderVideoVP9MTW.DYNAMICTOP_PTR;
            var PthreadWorkerInit = OGVDecoderVideoVP9MTW.PthreadWorkerInit
        }

so it's taking the second path and initializing PthreadWorkerInit to what turns out to be undefined instead of an empty object.

@bvibber
Copy link
Collaborator

bvibber commented Feb 2, 2019

Also I just want to leave a SUPER HUGE THANK YOU to @juj for working on this!

bvibber added a commit to bvibber/ogv.js that referenced this pull request Feb 2, 2019
@juj
Copy link
Collaborator Author

juj commented Feb 2, 2019

Great testing Brion! Instantiating the module multiple times is something I think the browser test suite doesn't have tests for - PRs that add minimal failing test cases to the suite would be highly appreciated, I can try to take a look then.

Something that would be good to verify is that this did not regress anything when using multithreading without MODULARIZE enabled? If so, then it seems like this would be good to land, and subsequent PRs could improve on the different remaining issues.

@bvibber
Copy link
Collaborator

bvibber commented Feb 2, 2019

Great, I'll whip up tests cases tonight or tomorrow for the multiple instantiation. :)

My code's highly module dependent so I can't test the non module path much myself.

@juj juj force-pushed the MODULARIZE_with_USE_PTHREADS branch from ca39182 to bf4253c Compare February 2, 2019 23:24
@juj
Copy link
Collaborator Author

juj commented Feb 4, 2019

CI passes on this, so I hope this could land soon @kripken ? This can further be revised in tree, and this PR carries the support for preprocessing src/worker.js, which my other PRs need, so I'm a bit blocked from authoring further.

@juj juj force-pushed the MODULARIZE_with_USE_PTHREADS branch 2 times, most recently from 4986281 to d24e5b7 Compare February 4, 2019 13:55
emcc.py Outdated
# If we are building directly to .html with -s MODULARIZE=1 (but not -s MODULARIZE_INSTANCE=1), default to exporting
# Emscripten code under function EmscriptenCode() so that it does not collide with the Module object. (When not modularizing,
# or if building with MODULARIZE_INSTANCE, or if building to .js manually, there is no conflict, so default to 'Module' in those cases)
shared.Settings.EXPORT_NAME = 'EmscriptenCode' if final_suffix == '.html' and shared.Settings.MODULARIZE and not shared.Settings.MODULARIZE_INSTANCE else 'Module'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change - current users doing Module() would no longer work.

How about requiring EXPORT_NAME be set, when MODULARIZE?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works as well - let me update to that.

emcc.py Outdated
@@ -2942,18 +2962,28 @@ def un_src(self):
"""Use this if you want to modify the script and need it to be inline."""
if self.src is None:
return
run_module = ''
if shared.Settings.MODULARIZE and not shared.Settings.MODULARIZE_INSTANCE:
run_module = 'script.onload = function() { %s(Module); };' % shared.Settings.EXPORT_NAME
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't get this code, hmm... we should not run anything when MODULARIZE, instead we just provide that function, that users can call. So why does this code call it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I build with

emcc tests\hello_world.c -s MODULARIZE=1 -o a.html -s EXPORT_NAME=MyModule

what should happen? Do we expect to generate a web page that does not run at all? (but user has to open up web console and run MyModule(Module); there? That may be a bit odd out of the box behavior. The above change makes it so that a -o a.html page compiled with -s MODULARIZE=1 will generate an output that runs automatically.

If developer is targeting a -o a.js build, then the above code is avoided, and they can have their "I'll run it manually" feature.

I suppose I am a bit indifferent about this, I can revert this if this may be problematic - just thought I'd make these kind of -s MODULARIZE=1 -o a.html builds run out of the box by default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect to generate a web page that does not run at all?

Yes, exactly. Only a function MyModule should exist in the global scope, and the user is expected to call it in a way that makes sense for their application. Consider that some user may only wany to call MyModule after some special input event, for example. And some users want to call it more than once.

The other behavior, where the code is run, is what MODULARIZE_INSTANCE does: it automatically runs it for you (and exactly once). It sounds like you're expecting MODULARIZE to work the same, but the use cases are different.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MODULARIZE_INSTANCE is different than MODULARIZE in that MODULARIZE_INSTANCE assigns export over to {{{ EXPORT_NAME }}} and kills the module function, whereas MODULARIZE returns the exports, and the function can be called many times. I was not expecting MODULARIZE to work like MODULARIZE_INSTANCE, but I thought that -o a.html would be good to be able to run out of the box, while retaining the module-like structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I still don't get your point here, but "run out of the box" is exactly what modularize is supposed to not do? Making it run automatically would be a breaking and surprising change. I don't think that's different for HTML - the user would still be writing code to call the function themselves, so if you call it that would interfere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha - yeah, then it sounds like the right option is to abandon the autorun when building to html, so this PR no longer has that.

// Generates access to module exports variable in pthreads worker.js
function makeAsmExportAccessInPthread(variable) {
if (MODULARIZE) {
return "Module['" + variable + "']" // 'Module' is defined in worker.js local scope, so not EXPORT_NAME in this case.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this, how about defining var {{{ EXPORT_NAME }}} = Module; in worker.js local scope, then things would just work?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, {{{ EXPORT_NAME }}} is the name of the function that one needs to call to instantiate the compiled JS module - I should not be assigning anything over it?

src/worker.js currently does

Module = {{{ EXPORT_NAME }}}(Module);

i.e. it uses Module as both the import object name, and also as export object name, like non-MODULARIZE-non-pthreads builds do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, {{{ EXPORT_NAME }}} is the name of the function that one needs to call to instantiate the compiled JS module - I should not be assigning anything over it?

True, yes - but I thought we were inside that function, so creating an alias might have simplified things? Did I get that wrong?

That is, what I meant to suggest is that for internal uses, we can do

function ExportName(Module) {
  var ExportName = Module;
  [.. internal uses here can use either Module or ExportName ..]
}

I think we do something similar in the general case, so that ExportName can be used internally.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The worker global scope in src/worker.js is not inside the function scope, but it is in the worker's global scope outside the function, so it will either need to assign variables to global scope to import to the main JS code, and read variables off of global scope or Module, if not building with MODULARIZE. If building with MODULARIZE, imports need to go in an imports object, and exports need to be read off the return value of invoking the module function. I'm not sure how applicable the code snippet you write would be here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so if I understand you now, src/worker.js adds code to the global scope in both modularize and non-modularize mode?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the worker does add code to global scope in both cases. From inside the JS local scope in the module in MODULARIZE mode, it reads variables in the global scope. Strictly speaking this is not polluting, since the global scope of a pthread-hosting worker is not "public", i.e. there's no main .html file or external JS code that would potentially get injected to it. But perhaps in the future it may make sense to either formalize this, or move items to the imports object. I don't know yet.

src/settings.js Outdated
// Default EXPORT_NAME is 'Module', but if building directly to .html
// with -s MODULARIZE=1 but not with -s MODULARIZE_INSTANCE=1, then default EXPORT_NAME
// is 'EmscriptenCode' to not conflict with the default provided Module object.
var EXPORT_NAME = '';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment text is not fully clear, and I still don't get it myself maybe - so we have the MODULARIZE function, and the Module object. What do we use the Module object for, that conflicts here? In MODULARIZE I would hope that it is used for nothing, and any existing uses are bugs we should fix? I know you fixed the separate-asm one, are there more?

Copy link
Collaborator Author

@juj juj Feb 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/shell.html and src/shell_minimal.html both have

var Module = {
  ...
};

and we have tons of code patterns showing Module to be a global configuration object that has both the imports and exports. So this puts out an idiom that people will write var Module = { ... }; configuration objects in their own shells as well.

So then if I do -s MODULARIZE=1 -o a.html, I will get an a.js that has function Module(Module) {, but my a.html also has var Module = { configs here };, which contradict each other.

Or if I do -s MODULARIZE=1 -o a.js and wrote my own a.html to launch it, following tacit practices shown by src/shell.html that showcases that one should write a var Module = { ... }; config object, then again these two conflict, as my a.html has a var Module = { ... };, but when the compiled a.js file is loaded in, it will override it with a function Module(...) {...}.

The above aspects can lead to confusion of "what on earth is 'Module'?". Sometimes it is a global config object that has both imports+exports+library_x.js specific config vars. Sometimes it is the name of the JS module function that is generated (as EXPORT_NAME='Module' was default value in -s MODULARIZE=1 mode). In MODULARIZE=1 mode inside the generated module it is again the imports object (i.e. function {{{EXPORT_NAME}}}(Module) { ... }). In MODULARIZE_INSTANCE=1 builds it is first the imports object, but then is replaced by assigning the export object over it.

Requiring the value of EXPORT_NAME= to come directly from the user and never giving it a default value (especially EXPORT_NAME='Module') is a good step towards clearing the confusion, since people will then less easily mix these together.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, very good points.

I think we should fix this before this PR. What I'd recommend is requiring EXPORT_NAME to not be Module. This would be a breaking change, so we need to prepare users, but it definitely seems necessary given the above - otherwise we'll be stuck trying to work around this in complicated ways in our codebase forever. (Sad we didn't realize this earlier, but better now than later.)

Specifically, what I'm suggesting is that the default value of EXPORT_NAME be something like EmscriptenCode as you suggested. (or maybe just Code? probably too short) and in emcc.py we assert on it not being set to Module, with a clear error.

If you don't have time for this, I could try to find time myself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted away from doing any changes to this, and just focused on the MODULARIZE+USE_PTHREADS combination.

@juj juj force-pushed the MODULARIZE_with_USE_PTHREADS branch 5 times, most recently from 4e9ede5 to 7a46e0b Compare February 5, 2019 14:22
@juj
Copy link
Collaborator Author

juj commented Feb 5, 2019

I changed this PR to avoid the issue altogether, and only concentrate on the MODULARIZE+USE_PTHREADS combination. Adapted the tests to use a custom pattern to run. How does this look now?

@juj
Copy link
Collaborator Author

juj commented Feb 5, 2019

Looks green now on CI, ready for second review?

@juj juj force-pushed the MODULARIZE_with_USE_PTHREADS branch 3 times, most recently from 8134613 to aaf8d44 Compare February 6, 2019 09:32
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good, but I'd like to review another pass.

// Generates access to a global scope variable in pthreads worker.js
function makeAsmGlobalAccessInPthread(variable) {
if (MODULARIZE) {
return "Module['" + variable + "']" // 'Module' is defined in worker.js local scope, so not EXPORT_NAME in this case.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing ; on these returns

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These generate generic expressions to be used for assingment or read, so it's not a return Module['variable']; that would be generated here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, outside of the string:

return "Module['" + variable + "']";

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh, gotcha :)

@@ -1477,6 +1477,34 @@ function makeStaticString(string) {
return '(stringToUTF8("' + string + '", ' + ptr + ', ' + len + '), ' + ptr + ')';
}

// Generates access to module exports variable in pthreads worker.js
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these comments could be better - should explain the stuff we are discussing here. in particular, why pthreads is different.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the comments with more detailed info


// Generates access to both global scope variable and exported Module variable, e.g. "Module['foo'] = foo" or just plain "foo" depending on if we are MODULARIZEing.
// Used the be able to initialize both variables at the same time.
function makeAsmExportAndGlobalAccessInPthread(variable) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for an assignment target, I think? The name could specify that, and so it would not be odd that one code path has side effects and one seems to not (as in all cases this is the assign target, so there are side effects).

How about makeAsmExportAndGlobalAssignTargetInPthread?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

src/shell.js Outdated
@@ -81,11 +81,34 @@ if (Module['ENVIRONMENT']) {
// 2) We could be the application main() thread proxied to worker. (with Emscripten -s PROXY_TO_WORKER=1) (ENVIRONMENT_IS_WORKER == true, ENVIRONMENT_IS_PTHREAD == false)
// 3) We could be an application pthread running in a worker. (ENVIRONMENT_IS_WORKER == true and ENVIRONMENT_IS_PTHREAD == true)
#if USE_PTHREADS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please move the pthreads code from shell into a separate file - I think that will help a lot with readability here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@juj juj force-pushed the MODULARIZE_with_USE_PTHREADS branch from fe8d279 to de9a1ff Compare February 6, 2019 20:18
@juj juj force-pushed the MODULARIZE_with_USE_PTHREADS branch 3 times, most recently from 9f8f146 to e798a43 Compare February 6, 2019 22:43
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks for your patience with the reviewing!

@juj juj force-pushed the MODULARIZE_with_USE_PTHREADS branch from e798a43 to 3934858 Compare February 7, 2019 21:14
@juj juj merged commit 62f6ca6 into emscripten-core:incoming Feb 7, 2019
@VirtualTim
Copy link
Collaborator

So I went to try this out, but noticed that it's not part of 1.38.27. Is this planned for 1.38.28? Any ETA on when this should land?

@juj
Copy link
Collaborator Author

juj commented Feb 12, 2019

This should be part of 1.38.27. Git log looks like this:

1 38 27

@VirtualTim
Copy link
Collaborator

VirtualTim commented Feb 13, 2019

Hm, OK, it looks like my emcc.py just didn't update on my machine, even though emsdk list shows emscripten-1.38.27 and sdk-1.38.27-64bit as installed and activated.

I had a look at the tags, and emcc.py from Feb 6 was tagged as 1.38.27, and these changes came in a day later. That made me think that these changes were not included. Manually downloading the 1.38.27 zip from github showed the expected emcc.py.

So I'm not sure what's going on on my machine, apologies for the false report.

sbc100 added a commit to sbc100/emscripten that referenced this pull request Apr 10, 2024
Last usage looks like it was removed in emscripten-core#7667.
sbc100 added a commit that referenced this pull request Apr 10, 2024
Last usage looks like it was removed in #7667.
impact-maker pushed a commit to impact-maker/emscripten that referenced this pull request Apr 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants