
Loading multiple emscripten modules in one page #3167

Closed
kg opened this issue Feb 2, 2015 · 33 comments

Comments

@kg

kg commented Feb 2, 2015

It would be great to have (at least basic) support for loading multiple emscripten modules in one webpage. This is important for scenarios where a larger application wants to pull in existing native libraries without itself being written in C.

Right now doing this will produce multiple files that blow away Module and potentially shadow each other's names in the global namespace (real__ or whatever), and they don't share an address space.

Sharing an address space is really valuable here because without it, you have to associate each pointer (void*, etc) with the emscripten module it came from, and carry that association around. Carrying it through C++ call boundaries is really tough. I think it's not necessary for anything more than shared address space to happen though - it's fine for the modules to still have their own stack, and even have their own function pointer table and malloc/free. (Though a shared function pointer table would be handy, it's easier to carry annotations around for FPs since they're already more magical than regular data pointers.)

(For reference, my use case is porting C# applications that call native libraries - like SDL2 or sqlite - to the browser by compiling the native libraries with emscripten individually. It seems like at present the only way to do this would be some sort of merged library, assuming merging them is viable without conflicts.)

@kripken
Member

kripken commented Feb 3, 2015

Being on the same page without sharing the HEAP is possible, you just give each its own Module object. For example, by putting each compiled codebase in a function scope where Module is defined. That's how ammo, box2d, etc. work, they are fine on pages with other emscripten-compiled projects.
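A minimal sketch of that pattern in plain JavaScript. The two IIFEs below are hypothetical stand-ins for pasted emscripten output, not real generated code; the point is only that each one sees its own `Module` and its own heap, so an address in one means nothing in the other.

```javascript
// Hypothetical stand-ins for two separately compiled emscripten outputs.
// Each body is wrapped in its own function scope (an IIFE), so the
// `Module` it declares, and the heap it creates, are private to it.
var libA = (function () {
  var Module = { name: "libA" };
  Module.buffer = new ArrayBuffer(64 * 1024); // libA's own heap
  Module.HEAPU8 = new Uint8Array(Module.buffer);
  return Module;
})();

var libB = (function () {
  var Module = { name: "libB" };
  Module.buffer = new ArrayBuffer(64 * 1024); // libB's separate heap
  Module.HEAPU8 = new Uint8Array(Module.buffer);
  return Module;
})();

// Writing at address 16 in libA leaves libB's memory untouched.
libA.HEAPU8[16] = 42;
console.log(libA.HEAPU8[16], libB.HEAPU8[16]); // 42 0
```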

Sharing the HEAP is, on the one hand, simple. We just need to tell emscripten which range in the heap to actually use. This would be incompatible with memory growth, but otherwise ok. However, the more I think about this, the more worried I am ;) Is this safe? They will each have their own malloc and free, for example. Allocating in one and freeing in another will lead to corruption and errors. Calling a method on an object you pass between them will also fail (separate function tables).

It seems like this would only work with simple data, ints and floats, arrays, pointers, etc., and without manipulating it. Very fragile.

@kg
Author

kg commented Feb 3, 2015

Cross-module malloc/free is already broken in most dynamic linking scenarios, though, unless you're compiling all the modules yourself with a single compiler configuration. (I'm coming from a Win32 background, where the odds are good that you've got a bunch of libraries that came from different compilers and have different C runtimes.)

For my scenarios I mostly care about large-scale stuff, where it becomes unreasonable to copy between heaps: like I've got 2 MB of vertex data here, or a texture there, and I want to pass it from one emscripten library to another. If I could just pass a 2 MB Float32Array into an asm.js function I'd do that instead :-)

Cases like calling methods across module boundaries or invoking destructors seem hopelessly complex in comparison, and the returns are much smaller - you can just copy most of those bytes around and it'll be quite fast.
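When copying is acceptable, here is a minimal sketch of moving bytes from one module's heap to another's, assuming each exposes a `Uint8Array` view (in the style of `HEAPU8`) over its own `ArrayBuffer`. The helper `copyBetweenHeaps` and the addresses are hypothetical; in a real program the destination pointer would come from the receiving module's own `_malloc`.

```javascript
// Copy `len` bytes starting at `srcPtr` in one module's heap to `dstPtr`
// in another module's heap. Both heaps are plain Uint8Array views.
function copyBetweenHeaps(srcHeapU8, srcPtr, dstHeapU8, dstPtr, len) {
  if (srcPtr + len > srcHeapU8.length || dstPtr + len > dstHeapU8.length) {
    throw new RangeError("copy out of bounds");
  }
  // subarray() creates a view without copying; set() does one bulk copy.
  dstHeapU8.set(srcHeapU8.subarray(srcPtr, srcPtr + len), dstPtr);
}

// Two separate heaps, as when modules do not share memory.
var heapA = new Uint8Array(new ArrayBuffer(1024));
var heapB = new Uint8Array(new ArrayBuffer(1024));

var ptrA = 128, ptrB = 512, len = 16; // hypothetical addresses
for (var i = 0; i < len; i++) heapA[ptrA + i] = i + 1;

copyBetweenHeaps(heapA, ptrA, heapB, ptrB, len);
console.log(heapB[ptrB], heapB[ptrB + len - 1]); // 1 16
```

For a few kilobytes this is a single fast copy; for multi-megabyte buffers that cross module boundaries repeatedly, the copy is exactly the cost a shared address space would avoid.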

EDIT: If it helps, I'm mostly thinking in terms of address space here: being able to access the bytes at a pointer from moduleA in moduleB. So it's probably intended/best for each module to have its own heap, using a separate reservation of the single shared HEAP memory buffer. Though since emscripten provides its own malloc/free, there could always be one shared between all modules.

@kripken
Member

kripken commented Feb 3, 2015

I see.

So, another option here is to add support for non-asm.js-validating passing of typed arrays. Not sure how practical this would be. All access to the typed array would need to be through a special interface, imagine something like

```cpp
// FloatVec: hypothetical wrapper type whose operator[] compiles to JS calls
void double_vector(FloatVec x) {
  for (int i = 0; i < 1000; i++) {
    x[i] *= 2;
  }
}
```

where FloatVec has overloaded the array operator, and it turns into function calls, which emscripten turns into

```javascript
function double_vector(x) {
  // x is a Float32Array view; no `| 0` here, since that would truncate floats
  for (var i = 0; i < 1000; i++) {
    x[i] = x[i] * 2;
  }
}
```

and x would actually be a typed array view into anything we want. Thoughts?
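To illustrate that last point, a sketch in plain JavaScript, assuming the generated function is handed an ordinary `Float32Array`. The view can cover any byte range of any `ArrayBuffer` (here a hypothetical region at byte offset 256 of some other heap), and writes through it are visible to everything else sharing that buffer.

```javascript
// Stand-in for the generated code, using x.length instead of a fixed
// count. It only touches memory through the view it is given.
function doubleVector(x) {
  for (var i = 0; i < x.length; i++) {
    x[i] = x[i] * 2;
  }
}

// Some other buffer, e.g. another module's heap (hypothetical layout).
var otherHeap = new ArrayBuffer(4096);
var HEAPF32 = new Float32Array(otherHeap);

// Floats live at byte offset 256; build a view covering just those 4 floats.
var byteOffset = 256;
var view = new Float32Array(otherHeap, byteOffset, 4);
view.set([1.5, 2, 3, 4]);

doubleVector(view);
// Writes through `view` are visible through any other view of the buffer.
console.log(HEAPF32[byteOffset / 4], HEAPF32[byteOffset / 4 + 3]); // 3 8
```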

@kg
Author

kg commented Feb 3, 2015

I think that solves the problem pretty well in a general sense but not for existing code. The problem I'm looking at is passing external data into existing libraries (like say zlib or sdl), and the external data isn't easy to get into the heap of individual Modules without copies. That's why I want to share a single buffer between them, since I can just pass the offset between the modules, and then pass that final buffer to WebGL.

If I have to change all the code to use FloatVec and then compile in a non-asm-validating mode, it's probably not going to end up being useful for the cases I care about. It seems like a reasonable solution though, if other people have a use for it in similar scenarios.

The C type system really hurts us here since all we have are addresses, instead of ranges like if we were dealing with known-length arrays and buffers in another language. :/

@kripken
Member

kripken commented Feb 3, 2015

Got it. Ok, sounds like we want to experiment with sharing a single ArrayBuffer over multiple programs.

That commit should provide enough to start experimenting with this. As I suspected, it's pretty easy: emscripten can already put the program's "start of memory" anywhere it wants, using GLOBAL_BASE. With the new ALLOW_MEMORY_SHARING option, you just need to define Module['buffer'] before the program runs, and it will reuse it. The base of memory is defined by GLOBAL_BASE at compile time.
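A plain-JavaScript model of the layout this enables. It does not run emscripten output; it assumes two modules built with different GLOBAL_BASE values and given the same buffer through Module['buffer'], each allocating only within its own range. The ranges and the bump allocator `makeBumpAllocator` are hypothetical stand-ins for each module's real malloc.

```javascript
// One ArrayBuffer shared by every module on the page (what each module
// would receive as Module['buffer']).
var shared = new ArrayBuffer(1 << 20); // 1 MiB
var HEAPF32 = new Float32Array(shared);

// Hypothetical: each module owns a disjoint range starting at its own base.
function makeBumpAllocator(base, limit) {
  var next = base;
  return function malloc(size) {
    var ptr = (next + 7) & ~7;        // 8-byte alignment
    if (ptr + size > limit) return 0; // would leave this module's range
    next = ptr + size;
    return ptr;
  };
}

var mallocA = makeBumpAllocator(8, 512 * 1024);                 // module A
var mallocB = makeBumpAllocator(512 * 1024, shared.byteLength); // module B

// Module A fills a small vertex array...
var count = 3;
var ptr = mallocA(count * 4);
for (var i = 0; i < count; i++) HEAPF32[(ptr >> 2) + i] = i * 0.5;

// ...and module B needs only the address: same buffer, so no copy.
function sumFloats(p, n) {
  var s = 0;
  for (var i = 0; i < n; i++) s += HEAPF32[(p >> 2) + i];
  return s;
}
console.log(sumFloats(ptr, count)); // 1.5
```

A fixed-size buffer is assumed here, since memory growth would replace the ArrayBuffer out from under the other module.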

I don't have an idea for a good test for this, but it might work...

@kg
Author

kg commented Feb 3, 2015

That's perfect! I can add it to my set of basic test cases, at least.

@kg
Author

kg commented Feb 3, 2015

An obstacle here is that because emscripten output files aren't wrapped in an IIFE, it defines Module at global scope and uses it at global scope. So if I capture the Module object from one library then load another one, they are both using the current value of Module instead of the appropriate one. They're going to end up all using a single arbitrarily selected _malloc, printErr, etc.

kripken added a commit that referenced this issue Feb 3, 2015
@kripken
Member

kripken commented Feb 3, 2015

MODULARIZE option in the last commit makes it easy to create wrapped outputs from emscripten. Used in ammo.js: kripken/ammo.js@118998f
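Roughly, MODULARIZE wraps the output in a function assigned to EXPORT_NAME, which you call to get an instance. The sketch below is a simplified, hypothetical approximation of that shape, not the actual generated code; `ModuleX` stands in for whatever EXPORT_NAME you choose and `_counter_increment` for a compiled export. It shows that exports live on the instance and that each call produces an independent instance with a fresh heap.

```javascript
// Simplified, hypothetical model of MODULARIZE output: the whole program
// lives inside a function assigned to EXPORT_NAME (here `ModuleX`).
var ModuleX = function (Module) {
  Module = Module || {};                   // caller may pass settings in
  var buffer = new ArrayBuffer(64 * 1024); // fresh heap on every call
  var HEAP32 = new Int32Array(buffer);

  // Stand-in for a compiled export that keeps state in this instance's heap.
  Module._counter_increment = function () {
    return ++HEAP32[0];
  };
  Module.buffer = buffer;
  return Module;                           // the instance
};

// Exports live on instances, not on the factory itself.
console.log(typeof ModuleX._counter_increment); // undefined

var first = ModuleX();
var second = ModuleX();
first._counter_increment();
first._counter_increment();
console.log(first._counter_increment(), second._counter_increment()); // 3 1
```

Calling the factory again gives a new heap, as if the program had been started from scratch.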

@NWilson
Contributor

NWilson commented Feb 4, 2015

Does that work fully? It looks like you're intending for it to be possible to make multiple instances of an emscripten application, but unless I'm missing something, two instances of the same application will still quarrel over the global exported name:

https://github.com/kripken/emscripten/blob/c4b8ad6343950fcd479908ca11d5dff8a8c4081c/src/shell.js#L153

Maybe something similar to PR #3008 would help in that case? My changes to shell.js could be merged with if MODULARIZE = 1 instead of if EXPORT_NAME = 'noexport'.

@kripken
Member

kripken commented Feb 4, 2015

I agree it would be good to do something like that pull. However, I don't think there is currently a problem: if MODULARIZE is used with multiple modules on the same page, they must use different EXPORT_NAME values anyhow (because we do var EXPORT_NAME = ... in the global scope). That will also make the line in that link work OK, and the modules won't overlap.

But, I wonder if we can't just remove the assignment to window and this as shown in the diff in that pull? Not sure if we need them anymore - the test suite would break if there is a problem, so we should check that.

@kripken
Member

kripken commented Feb 4, 2015

Running test suite now.

@kripken
Member

kripken commented Feb 4, 2015

Well, the test suite seems to not complain, so I pushed this. If it sticks and no one finds fault, I think we can close that pull (or was it doing something more)?

@NWilson
Contributor

NWilson commented Feb 4, 2015

It looks like it's possible to create multiple instances of the module by calling the exported wrapper function twice? That would cause problems because both would try to export the same property. This would effectively be multi-process computing; maybe someone could find some use for implementing fork!

@kripken
Member

kripken commented Feb 5, 2015

Well, I don't think anyone was actually reading from this.EXPORT_NAME, which is why it was safe to remove in the last commit. But good point, until we removed the writes in that commit, it was risky to call the method more than once.

There are definitely use cases where the method is called more than once, I've done it when the app ported was too "messy" to provide a nice interface, so by calling the method with everything in it, you get a new heap, as if you started up the program from scratch. I am sure there are better uses too ;)

@kg
Author

kg commented Feb 5, 2015

MODULARIZE seems to be working well in our test cases so far. I'll let you know once I have something shippable that's actually putting it to use with two modules.

@NWilson
Contributor

NWilson commented Feb 5, 2015

Hooray! I'm happy now too. We ship a JS SDK which hides the fact that it's using emscripten (for encapsulation), so now that's one less patch I have to maintain on our build machines.

MODULARIZE doesn't add much, really: you could achieve the same with some pre-js and post-js, I think, or just a bit of awk after compilation. But I guess it's handy to have a stable option for it.

@kripken
Member

kripken commented Feb 6, 2015

Yeah, it's mainly just to have a stable option, with tests in the test suite, so there is a standard safe way to do this.

@kripken
Member

kripken commented Feb 7, 2015

I think this has been completed.

@kripken kripken closed this as completed Feb 7, 2015
@makc

makc commented Apr 4, 2015

Which version was this added in? I get this

```
WARNING  root: Assigning a non-existent settings attribute "MODULARIZE"
WARNING  root:  - perhaps a typo in emcc's  -s X=Y  notation?
WARNING  root:  - (see src/settings.js for valid values)
```

with 1.29.0 (which is what downloads link to).

@kripken
Member

kripken commented Apr 4, 2015

git says 1.29.9.

edit: fix typo

@makc

makc commented Apr 4, 2015

Nevertheless, MODULARIZE is not found in src/settings.js there. Was it removed later?

@kripken
Copy link
Member

kripken commented Apr 4, 2015

Sorry, I had a typo but I guess you saw an email from github which was before. It was added in 1.29.9, which is later.

@juanprietob

When using both flags, for example

-s MODULARIZE=1 -s EXPORT_NAME=\"'ModuleIMGJS'\"

How is it possible to access the FS library?
The library is not accessible any more after the export was done.

@kripken
Member

kripken commented Nov 10, 2015

The FS is not exported by default. You can add it to Module manually (e.g. in a --post-js), or on incoming I added an option to export it using an -s option, this will work:

emcc tests/hello_world.c -s 'EXTRA_EXPORTED_RUNTIME_METHODS=["FS"]'

Then you can access Module['FS'].

@juanprietob

If the FS module is exported as suggested, do libraries share the FS space? I saw this post and I want to confirm: https://groups.google.com/forum/#!topic/emscripten-discuss/_K61fo-9oKY

As an example:

```javascript
var FSA = ModuleA['FS'];
FSA.mkdir('/temp');
FSA.writeFile('/temp/file.txt', someData);

// ...

var FSB = ModuleB['FS'];
var file = FSB.readFile('/temp/file.txt');
doSomething(file);
```

I would like to have a module that writes some data using FS and then, many different modules can read this data to do some processing.
Is it possible to have an architecture like this?

@kripken
Member

kripken commented Nov 11, 2015

What do you mean by libraries? Shared modules as in https://github.com/kripken/emscripten/wiki/Linking ? Or just separately compiled projects, that are logically libraries?

Shared modules definitely share the FS of the main module (they also share all the system libraries). But separately compiled programs don't have a way to share the FS; they each define one. Things like BrowserFS might help, though. There might also be a simple way to just make them reference the same FS object; however, that object has references to things in memory (like the stdin stream in C), so memory would likely need to be shared, too. That is feasible as well, but it would probably take some hacking.
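One workaround for separately compiled programs, short of sharing memory, is to copy the data from one module's FS into the other's. A minimal sketch, assuming both modules export FS and relying only on `FS.readFile`, `FS.writeFile`, and `FS.mkdir`. The helper `copyFileBetweenFS` and the in-memory stub used to exercise it are hypothetical; the stub only mimics the few behaviors the helper relies on.

```javascript
// Copy one file from module A's filesystem into module B's.
// fsA / fsB: the FS objects exported by two separately compiled modules.
function copyFileBetweenFS(fsA, fsB, path) {
  var data = fsA.readFile(path); // Uint8Array (binary is the default encoding)
  // Recreate the parent directories in B. mkdir throws if a directory
  // already exists; this sketch assumes that is the only failure case.
  var parts = path.split("/").filter(Boolean);
  var dir = "";
  for (var i = 0; i < parts.length - 1; i++) {
    dir += "/" + parts[i];
    try {
      fsB.mkdir(dir);
    } catch (e) {
      // already exists
    }
  }
  fsB.writeFile(path, data); // B now holds its own copy of the bytes
  return data.length;
}

// Tiny in-memory stand-in for an emscripten FS, only to exercise the helper.
function makeStubFS() {
  var files = {};
  var dirs = { "/": true };
  return {
    mkdir: function (p) {
      if (dirs[p]) throw new Error("EEXIST");
      dirs[p] = true;
    },
    writeFile: function (p, d) {
      var parent = p.slice(0, p.lastIndexOf("/")) || "/";
      if (!dirs[parent]) throw new Error("ENOENT");
      files[p] = new Uint8Array(d);
    },
    readFile: function (p) {
      if (!(p in files)) throw new Error("ENOENT");
      return files[p];
    },
  };
}

var fsA = makeStubFS();
var fsB = makeStubFS();
fsA.mkdir("/temp");
fsA.writeFile("/temp/file.txt", new Uint8Array([104, 105])); // "hi"
console.log(copyFileBetweenFS(fsA, fsB, "/temp/file.txt")); // 2
```

Each module ends up holding its own copy of the data, so this suits inputs that are written once and then read by several modules.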

@juanprietob

Yes, I want to have separately compiled libraries.
In that case I'll try BrowserFS.

Thank you.

@noamtcohen
Contributor

Hi

This is how I'm calling emcc:
emcc resample.c -o digi-resampler.js --memory-init-file 0 -O3 -s MODULARIZE=1 -s EXPORT_NAME=\"'ModuleDIGI'\" -s EXPORTED_FUNCTIONS="['_test','_resampler_init']"

And I'm getting this runtime error:
TypeError: ModuleDIGI._resampler_init is not a function

This is how the c function is declared:

```c
__attribute__((used))
ResamplerState *resampler_init(int nb_channels, int in_rate, int out_rate, int quality, int *err)
{
   return resampler_init_frac(nb_channels, in_rate, out_rate, in_rate, out_rate, quality, err);
}
```

Output of emcc -v:

```
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 1.35.0
clang version 3.7.0 (https://github.com/kripken/emscripten-fastcomp-clang dbe68fecd03d6f646bd075963c3cc0e7130e5767) (https://github.com/kripken/emscripten-fastcomp 4e83be90903250ec5142edc57971ed4c633c5e25)
Target: x86_64-apple-darwin14.5.0
Thread model: posix
```

When I don't use MODULARIZE, everything works.

Thanks,
N

@kripken
Member

kripken commented Jan 7, 2016

Are you instantiating the module? When we modularize, you must create an instance (and can create multiple, which is a benefit here, etc.). If it's not that, perhaps look in the test suite for how modularize is used. Also looking in your build output might show the problem.

Or, if you make a standalone testcase showing the issue, I can take a look at that.

@noamtcohen
Contributor

Yup, this was the problem. I should have done:

```javascript
var DIGI = ModuleDIGI();
DIGI._resampler_init
```

Thanks!
Amazing project!

@noamtcohen
Contributor

I've got another question.
This is how I'm using the code from NodeJS:

```javascript
eval(require("fs").readFileSync("./digi-resampler.js").toString());
var DIGI = ModuleDIGI();
```

What happens here is that anything on the module.exports object gets replaced with [Emscripten Module object]. This is a simple problem, and it's easy to get around by assigning values to the exports object after I initialise the module, but it doesn't seem right.
What is the correct way to use a module in nodejs?

Regarding modularizations again, if I have two different modules I compiled with emcc that I want to use in the same page, is it enough to modularize one of them or do I need to modularize both?

@kripken
Member

kripken commented Jan 11, 2016

I don't know node that well, but I'm not sure I follow this. Who is writing to module.exports? If it's something in the emscripten output, that seems like a bug we need to fix. Can you please file a new issue with a small testcase? (The best thing is a pull request containing the test already in the test suite.)

@noamtcohen
Contributor

#4032

6 participants