Investigate ES6 modules as our internal components #5828

Open
kripken opened this issue Nov 22, 2017 · 13 comments

Comments

@kripken
Member

kripken commented Nov 22, 2017

(Forked off from #5794 (comment) )

As a long-term goal, it would be nice if emscripten output

  • Fit in with node.js packaging in a natural way.
  • Fit in with ES6 build system tools.
  • Were overall as modular as possible.

Those might be achievable by splitting all or most of the JS we currently emit into ES6 modules. So the GL code might be one such module, etc. It is less obvious how the runtime would fit, but it could be possible.

This is not something we can do right now (in particular, not all browsers support ES6 modules, and wasm integration there is farther out), but in the linked issue above we discuss various refactorings, and it could be useful to keep the ES6 modular goal in mind, so that we are working towards that.

We'll need to do a bunch of investigation along the way, including

  • Code and compile time implications of such changes. In theory compile-time tools can merge ES6 modules and remove the overhead (of imports etc.), but we should verify that (e.g., it might remove imports but leave more JS objects or IIFEs around as "namespaces"), and we'll need to see how fast those things are.
  • How such modularization would work with our various build flag options (separate module for GL vs GL emulation? or at compile time like now? etc.).
  • How it would fit with our EM_ASM and js-library options.
  • Look at Rust's new approach to how JS and wasm integrate. Still early there, but we should see what they do and how it goes.
@kripken
Member Author

kripken commented Nov 22, 2017

One possible way towards this may be

  • Allow JS libraries to be ES6 modules.
  • Move more things into JS libraries.

@lukewagner
Contributor

Those sound like the right first incremental steps. Additional incremental steps could be:

  • Create some notion of "package" that is a collection of .h, .cpp, and .js files that can be explicitly depended on by other packages.
    • The .js files would be ES modules, importable by the .cpp files.
    • Static linking of multiple packages would simply collect all the .js files from all the packages and optionally merge them into a single .js module with a tool like webpack or rollup.
  • Allow packages to be compiled in a "clean" environment in which the only code they can #include and link against are packages they've explicitly included as dependencies.
  • Break up all standard Emscripten runtime/libraries/glue-code into fine-granularity packages
    • Provide a compatibility mode/flag/command that implicitly depends on all the standard packages, thereby preserving today's behavior
  • Build up a workflow for creating packages, publishing packages to a package manager (e.g., npm), and installing external packages
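
As a rough illustration of the "collect all the .js files from all the packages" step in the list above, here is a minimal sketch. The manifest shape (`js` and `deps` fields) and the function name `collectJsFiles` are hypothetical and only serve the example; nothing like this exists in Emscripten today.

```javascript
// Hypothetical package manifests: each package lists its own ES module
// files and the packages it explicitly depends on.
const packages = {
  app: { js: ["app.js"], deps: ["gl", "libc"] },
  gl: { js: ["gl.js"], deps: ["libc"] },
  libc: { js: ["libc.js"], deps: [] },
};

// Walk the dependency graph depth-first and return every .js file exactly
// once, dependencies before dependents.
function collectJsFiles(manifests, root) {
  const seen = new Set();
  const files = [];
  (function visit(name) {
    if (seen.has(name)) return;
    seen.add(name);
    const pkg = manifests[name];
    if (!pkg) throw new Error(`unknown package: ${name}`);
    for (const dep of pkg.deps) visit(dep);
    files.push(...pkg.js);
  })(root);
  return files;
}

console.log(collectJsFiles(packages, "app")); // [ 'libc.js', 'gl.js', 'app.js' ]
```

The resulting list is what would be handed to a bundler such as webpack or rollup for the optional merge step.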

Following this to its logical conclusion (which may require some of the breaking changes described in #5794 to avoid otherwise-unpackage-able JS glue code) would I think lead to the end state described in my #5984 comment.

@saschanaz
Collaborator

I think we could use ES2015 modules (and even any ES2015+ syntax) before browsers support them by depending on Babel and Webpack. Both depend on Node.js, which we already have, so it is probably okay to have them together.

@stale

stale bot commented Sep 19, 2019

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Sep 19, 2019
@stale stale bot closed this as completed Sep 26, 2019
@RReverser RReverser reopened this Nov 25, 2019
@stale stale bot removed the wontfix label Nov 25, 2019
@caspervonb

caspervonb commented Jan 31, 2020

I've been playing around with this lately in a standalone helper script for LLVM, here's what I've learned so far.

I've done a couple of passes on it. Earlier iterations handled archives in a special way, allowing .js files to be embedded in archives, but the simplest approach turned out to be embedding the source in custom sections. Primarily, this allows most things to be left as-is, including linker flags, since an archive is now just an archive that may or may not have ECMAScript modules embedded within it.

Essentially, I allow ECMAScript modules to be input files just as C source files would be. They're preprocessed with cpp -E -P ... before being embedded into .ll files, which contain the source as NUL-terminated strings in a custom section named esm.

At this stage it makes sense to also mark ECMAScript exports as extern_weak symbols in the .ll file, but without type annotations in the ECMAScript code this is non-trivial and requires additional parsing, so for now I've skipped doing this.

Clang/LLVM then does its thing with the modified arguments, and wasm-ld concatenates the custom sections into a single section during linking, hence the need for a delimiter.

Once wasm-ld spits out a module, I extract each fragment of the esm section into numbered files (each fragment is written to its own file to maintain scope).
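
The extraction step can be sketched with the standard `WebAssembly.Module.customSections` API. This is only an illustration, not the actual helper script: `buildModuleWithCustomSection` and `extractFragments` are made-up names, and the tiny hand-assembled module stands in for real wasm-ld output.

```javascript
// Assemble a minimal wasm binary that contains a single custom section.
// Demo only: lengths are written as single-byte LEB128, so the section
// must stay under 128 bytes.
function buildModuleWithCustomSection(name, payload) {
  const nameBytes = new TextEncoder().encode(name);
  const body = [nameBytes.length, ...nameBytes, ...payload];
  if (body.length >= 128) throw new RangeError("demo supports sections under 128 bytes");
  return new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, // "\0asm" magic
    0x01, 0x00, 0x00, 0x00, // binary format version 1
    0x00, body.length,      // section id 0 (custom) followed by its size
    ...body,
  ]);
}

// Read every custom section with the given name and split its contents on
// the NUL delimiter, dropping the empty piece after the final terminator.
function extractFragments(wasmBytes, sectionName = "esm") {
  const module = new WebAssembly.Module(wasmBytes);
  const decoder = new TextDecoder();
  const fragments = [];
  for (const buffer of WebAssembly.Module.customSections(module, sectionName)) {
    for (const piece of decoder.decode(buffer).split("\0")) {
      if (piece.length > 0) fragments.push(piece);
    }
  }
  return fragments;
}

const payload = new TextEncoder().encode("export const a = 1;\0export const b = 2;\0");
const fragments = extractFragments(buildModuleWithCustomSection("esm", payload));
console.log(fragments); // [ 'export const a = 1;', 'export const b = 2;' ]
```

Each entry in `fragments` would then be written to its own numbered file.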

Once every partial file is extracted, a top-level env.js is generated which re-exports every export from the fragments that were extracted.
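
A generated env.js of this kind could look roughly like the output of the sketch below. The file naming scheme (`./esm.N.js`) and the function name `generateEnvSource` are assumptions made for the example; the comment above does not say how the numbered files are named.

```javascript
// Produce the source text of an env.js that re-exports everything from
// each numbered fragment file. The "./esm.N.js" naming is hypothetical.
function generateEnvSource(fragmentCount) {
  const lines = [];
  for (let i = 0; i < fragmentCount; i++) {
    lines.push(`export * from "./esm.${i}.js";`);
  }
  return lines.join("\n") + "\n";
}

console.log(generateEnvSource(2));
// export * from "./esm.0.js";
// export * from "./esm.1.js";
```

Note that `export *` does not forward default exports, and a name exported by two fragments becomes ambiguous, so a real generator would probably need to list names explicitly.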

Finally, an entry point/loader is generated by looking at the exports of the module. I generate the importsObject by explicitly naming each member I want kept, as this lets compilers/bundlers do a much better job of eliminating code when everything is depended on explicitly.
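
To make the "explicitly naming each member" idea concrete, here is a hedged sketch of a loader generator. It assumes the names come from the wasm module's import list (read with the standard `WebAssembly.Module.imports`), that all imports live in a module called `env`, and that every import name is a valid JavaScript identifier; `generateLoaderSource` and the file names are invented for the example.

```javascript
// A tiny hand-assembled module that imports one function, env.foo: () -> ().
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type section: one () -> () type
  0x02, 0x0b, 0x01,                               // import section: one import
  0x03, 0x65, 0x6e, 0x76,                         // module name "env"
  0x03, 0x66, 0x6f, 0x6f,                         // field name "foo"
  0x00, 0x00,                                     // kind: function, type index 0
]);

// Emit loader source that imports each needed name individually, so a
// bundler can see exactly which exports of env.js are used.
function generateLoaderSource(imports, wasmUrl = "./module.wasm", envUrl = "./env.js") {
  const names = [...new Set(imports.filter((i) => i.module === "env").map((i) => i.name))];
  return [
    `import { ${names.join(", ")} } from ${JSON.stringify(envUrl)};`,
    `const importObject = { env: { ${names.join(", ")} } };`,
    `export const { instance } = await WebAssembly.instantiateStreaming(`,
    `  fetch(${JSON.stringify(wasmUrl)}), importObject);`,
  ].join("\n") + "\n";
}

const wasmImports = WebAssembly.Module.imports(new WebAssembly.Module(wasmBytes));
const loaderSource = generateLoaderSource(wasmImports);
console.log(loaderSource);
```

Because the generated file names `foo` directly instead of passing the whole `env` namespace object, anything else exported from env.js is a candidate for dead-code elimination.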

Code and compile time implications of such changes. In theory compile-time tools can merge ES6 modules and remove the overhead (of imports etc.), but we should verify that (e.g., it might remove imports but leave more JS objects or IIFEs around as "namespaces"), and we'll need to see how fast those things are.

Depends on the bundler, but Rollup leaves everything in a nice single flat namespace; Webpack is more versatile but slower and produces noisier output. Closure Compiler produces the smallest bundles but is also the slowest by far; an empty project takes about 6 seconds to boot.

How such modularization would work with our various build flag options (separate module for GL vs GL emulation? or at compile time like now? etc.).

If emulated features can be designed to just go away when not being called that would be the best way.

Otherwise aim for GL{N} and GLES{N} libraries?

How it would fit with our EM_ASM and js-library options.

As for EM_ASM and EM_JS, while it's possible to do with pre-processing it would be easier with upstream support from LLVM.

This can be done through an attribute. I found that Rust's bindgen tools often use an attribute to embed things like TypeScript typings in custom sections.

__attribute__((section("esm")))
const char* foo = "inline";

Or inline assembly. I'm not a huge fan of this approach, but I saw there has been some work on the parser side in LLVM already and tried to implement it this way; it seems the patch hasn't landed.

__asm__ {
.custom_section esm
  ...
}

Upstream LLVM support in any direction would go a long way here; section attributes would probably be the most useful across toolchains. Rust uses them to embed arbitrary data in various tools, but they have their own front-end attribute for it.

As a side effect, this would resolve #9366 as the section can easily be removed by wasm-ld, wasm-opt etc.

The --js-library option would go away with time, as ECMAScript fragments would be embedded inside objects which are inside an archive, so it would become just -larchive or -lshared_module. The fact that some parts are in JavaScript just becomes an implementation detail, the same way having fragments of Objective-C source code in a library made up of mostly C is just a detail.

@kripken
Member Author

kripken commented Jan 31, 2020

cc @jgravelle-google , see last comment.

@caspervonb

caspervonb commented Feb 1, 2020

Couldn't find the exact issue when writing the comment earlier, but managed to dig it up now: Rust went with a similar approach to embedding data in custom sections, with a wasm_custom_section attribute in their frontend that was merged in rust-lang/rust#48883 and later folded into their link_section attribute, which I assume is their equivalent of the section attribute found in Clang.

@jgravelle-google
Contributor

__attribute__((section("esm")))
const char* foo = "inline";

is close to something I played with a few months ago, looking at smuggling information from clang through to the backend (for use with interface types). For EM_ASM/EM_JS, this would be essentially ideal, so it's just a question of when we get around to it.

EM_ASM is tricky though because it needs to be evaluated in an expression context, whereas EM_JS has more flexibility as a top-level statement. I think ideally EM_ASM could do something like:

// from EM_ASM({ return $0 + $1; }, x, y);
(__attribute__((asm_const("{ return $0 + $1; }"))) const char* code, // comma operator
__em_asm_const(code, x, y))

but I'm pretty sure you can't declare variables in a comma operator. Also, ew.

I do have some ideas here, but they're not really half-baked yet.

@caspervonb

is close to something I played with a few months ago, looking at smuggling information from clang through to the backend (for use with interface types). For EM_ASM/EM_JS, this would be essentially ideal, so it's just a question of when we get around to it.

Neat.

EM_ASM is tricky though because it needs to be evaluated in an expression context, whereas EM_JS has more flexibility as a top-level statement. I think ideally EM_ASM could do something like:

but I'm pretty sure you can't declare variables in a comma operator. Also, ew.

Yeah as far as I know you can't mix expressions and declarations.

@caspervonb

caspervonb commented Feb 6, 2020

Aside from the internal linkage model for ad-hoc glue code, another thing to consider is that, outside of Emscripten, discrete module names are being used for module imports. The most prevalent example right now is that modules compiled with the wasm32-unknown-wasi target triple expect to import a module named wasi.

For system libraries such as OpenGL, it would be beneficial if all the toolchains could share a single implementation on the ECMAScript side of things.

Let's say we had a wagl module which mirrored WebGL but with WebAssembly semantics.

It would be fairly trivial to author thin shim GL/EGL/GLES libraries on top of that in C that provide the appropriate feature set, since Clang/LLVM already supports authoring bindings and libraries this way via the import_name and import_module attributes.

The linker also looks for an imports file adjacent to the linked library, so the compile-time linking story is fairly complete here.

On the runtime side, things can be bundled ahead of time, and the resulting bundles are effectively the same as they are now, except we get a cleaner library authoring and interoperability story.

I did some quick testing with Closure and the minify-imports-and-exports-and-modules pass in Binaryen, which led to wee itty bitty bundles with no namespaces in sight.

It's quite feasible to author libraries in this way today and it's less awkward than the current approach used in Emscripten now.

Outside of bundling, runtime resolution might be a desired trait in some cases.

In the future, when/if WebAssembly gets access to WebIDL interfaces, we'll probably want the ability to decide which module to load at runtime during what will likely be a fairly long transitional period: the ECMAScript WebGL-based one or the WebAssembly WebIDL/WebGL-based one.

But for now the primary benefit of runtime resolution would be caching. In the case of OpenGL you are very likely to be using most if not all of the functions available in the interface, especially with an OpenGL-like API where a central function dispatches to many things, so bundling isn't saving that many bytes. If every WebAssembly/WebGL-based application used the same canonical CDN, then the cache hits would make up for the extra bytes very easily, as program code may change frequently but WebGL is effectively frozen forever.

Being able to do this with ECMAScript module semantics depends on what the future looks like in terms of the esm-integration and import-maps proposals.

The reason is that browser resolution is relative to the URL of the importing module itself: if you were to import https://somecdn.com/wagl and that module expects to import "/env" to get access to linear memory, then "/env" would not resolve relative to the caller's domain as intended but relative to the CDN's domain.
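
The resolution behavior can be reproduced with the standard `URL` constructor, which applies the same base-relative rules that a browser uses for a specifier like "/env". The page URL `https://app.example/index.html` is a made-up stand-in for the calling application.

```javascript
// The module that contains the import statement is the base for resolution.
const importer = "https://somecdn.com/wagl";
// Hypothetical URL of the application that loaded the CDN module.
const page = "https://app.example/index.html";

const resolvedFromImporter = new URL("/env", importer).href;
const resolvedFromPage = new URL("/env", page).href;

console.log(resolvedFromImporter); // https://somecdn.com/env (what the browser actually does)
console.log(resolvedFromPage);     // https://app.example/env (what the author intended)
```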

I've got some ideas, but I haven't yet come up with a good solution for dynamically linking to remote modules that would align with the intentions of the esm-integration proposal.

@stale

stale bot commented Feb 7, 2021

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Feb 7, 2021
@RReverser
Collaborator

Bump.

@stale stale bot removed the wontfix label Feb 7, 2021
@stale

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.
