Sort out main story for wasm32-unknown-unknown bins #18

Closed
aturon opened this issue Jan 17, 2018 · 15 comments

Comments

@aturon
Contributor

aturon commented Jan 17, 2018

Wasm has a concept of a start function, which is called synchronously when you instantiate a wasm module. Until this function returns, instance is not set, which means you can't access the wasm module's memory. As such, this is really more of an initialization function, rather than a "main".
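The ordering problem described above can be illustrated with a toy model in plain Rust. This is only a sketch: `Embedder`, `read_str`, and `instantiate` are hypothetical names invented for illustration, not part of any wasm API. The model assumes the embedder publishes the module's memory only after the start function returns, which is the behavior described in the comment.

```rust
/// Toy embedder: `memory` plays the role of the embedder-side `instance`
/// handle, which stays unset until instantiation has finished.
pub struct Embedder {
    memory: Option<Vec<u8>>,
}

impl Embedder {
    /// Stand-in for an imported function that reads a string out of the
    /// module's memory (hypothetical, for illustration only).
    pub fn read_str(&self, ptr: usize, len: usize) -> Result<String, &'static str> {
        let mem = self.memory.as_ref().ok_or("instance not set yet")?;
        let bytes = mem.get(ptr..ptr + len).ok_or("out of bounds")?;
        Ok(String::from_utf8_lossy(bytes).into_owned())
    }
}

/// Models instantiation: `start` runs first, and only afterwards is the
/// module's memory made visible to the embedder.
pub fn instantiate<F>(module_memory: Vec<u8>, start: F) -> Embedder
where
    F: FnOnce(&Embedder),
{
    let mut embedder = Embedder { memory: None };
    start(&embedder); // any import called from here cannot see the memory
    embedder.memory = Some(module_memory);
    embedder
}
```

In this model, a call to `read_str` made from inside `start` fails, while the same call made after `instantiate` returns succeeds, which mirrors why a `main` that doubles as `start` is so restricted.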

That leads to a thorny question: for wasm32-unknown-unknown binaries, what is the role of main as it relates to wasm's start?

Currently, main is treated as start, which in practice is surprising because its abilities are therefore highly restricted.

An alternative is to document clearly that main is something that must be invoked manually (for the moment), and provide a separate mechanism for hooking into start initialization. In the long run, for JS at least we envision bundlers often being used to package uses of wasm, and they may be able to automatically run main at an appropriate point.

@aturon
Contributor Author

aturon commented Jan 17, 2018

From @alexcrichton:

> What I'm having trouble reconciling though is that if you don't want to hook wasm's start function why use a binary? Is that not exactly what a cdylib is? We have basically 0 official documentation about wasm but I'd imagine we'd be very clear about this from the first paragraph (ish) so I don't think it'd confuse too much?

I think the core issue is that main and start are really quite different things, sadly, and so tying them together for bins is confusing.

@alexcrichton
Contributor

I'd personally love to understand more about the intended use case of the start hook in wasm. Was it made for binaries? Was it made for something else? What is planned for C/C++? (etc)

I'd again still be confused though, why would we have two equivalent output types? If you don't want start, why not use cdylib? Why have a bin where start isn't set?

@fitzgen
Member

fitzgen commented Jan 17, 2018 via email

@pepyakin
Member

pepyakin commented Jan 17, 2018

What is static data constructors? @fitzgen
If you mean something that doesn't require any computation (so the layout is statically known), it is enough to use "data segments": arrays of data placed in memory at some offset.

start, as mentioned by @koute, is needed for something like C++-ish global initializers.
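The distinction between the two cases can be sketched in Rust. Rust itself has no C++-style constructors that run before main, so the second global below is initialized lazily on first use. In C++ the equivalent initializer would run before main, which is the job a start function could host. The names `PRIMES`, `SQUARES`, and `squares` are made up for this example.

```rust
use std::sync::OnceLock;

/// Plain data with a statically known layout. No code has to run to set it
/// up, so a compiler can typically place it in a data segment.
pub static PRIMES: [u32; 4] = [2, 3, 5, 7];

/// A global whose value has to be computed and heap-allocated. This is the
/// kind of object a C++ global initializer would build at startup.
static SQUARES: OnceLock<Vec<u32>> = OnceLock::new();

/// Returns the computed table, building it on the first call.
pub fn squares() -> &'static [u32] {
    SQUARES.get_or_init(|| PRIMES.iter().map(|p| p * p).collect())
}
```

Only the second global needs code to execute before it can be used, and that is the part that "data segments" alone cannot cover.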

@fitzgen
Member

fitzgen commented Jan 17, 2018

> What is static data constructors? @fitzgen

I just mean the constructors for some global of a non-POD type; I could be mixing up terminology.

@pepyakin
Member

Then I think we are talking about the same thing!

@lukewagner
Contributor

It's always an option for Rust to ignore start altogether. But another option we can discuss is to leverage cyclic ES Module imports. If foo.wasm imports foo.js and foo.js imports foo.wasm, then foo.js can have access to the exports of foo.wasm (the Memory, the functions, constant-initialized globals) when called from within foo.wasm's start function and thus start could call main.

Until wasm+ES Module integration, the bundler stage, which is already responsible for emitting the wasm JS API in the proposed Rust workflow, could transform .wasm modules that are involved in cyclic imports into JS API calls that did the equivalent thing (ultimately turning start functions into non-start functions that were automatically called after imports/exports were wired up via JS API).

The C++ folks were actually asking the same question recently. Seems like we have an opportunity to solve this problem once and for all in the shared bundler convention we establish here. CC @sunfishcode

@sunfishcode

I discussed this some more with @lukewagner and calling both "static constructors" and main from the wasm start function sounds like a good fit. There are problems with start functions in tools today, because it's currently common to have JS code that isn't packaged in ES Modules, but with wasm+ES Module integration, it will be possible to put all code in modules, at which point these problems can be solved, and this seems like a desirable place to aim for.

Also, since the wasm start function has no arguments or return values, we can briefly consider how those might be implemented:

Command-line arguments can be provided through imported functions. A program can call an import to ask how many arguments there are, allocate a buffer, and then call another import to request the buffer be filled in with argument information. (There are details to work out, such as whether the individual arguments are all allocated contiguously in one allocation or not, and how to determine their lengths, but that can be designed.) C-family compilers will have to do this in some crt0.o-like startup code; Rust has it easier here since Rust does command-line arguments through std::env::args() rather than arguments to main.
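The query, allocate, fill sequence described above might look roughly like the following. This is a sketch under stated assumptions: the `Host` trait stands in for the imported functions so that the example runs natively, and its method names and the NUL-terminated contiguous layout are assumptions chosen for illustration, not an established convention.

```rust
/// Stand-in for the embedder. In a real module these three methods would be
/// imported functions; the names and buffer layout here are hypothetical.
pub trait Host {
    /// How many arguments there are.
    fn arg_count(&self) -> usize;
    /// Total bytes needed, counting one NUL terminator per argument.
    fn arg_bytes(&self) -> usize;
    /// Writes every argument into `buf`, each followed by a NUL byte.
    fn fill_args(&self, buf: &mut [u8]);
}

/// Guest side: ask for the sizes, allocate one buffer, have the host fill
/// it in, then split the buffer back into individual arguments.
pub fn collect_args(host: &dyn Host) -> Vec<String> {
    let count = host.arg_count();
    let mut buf = vec![0u8; host.arg_bytes()];
    host.fill_args(&mut buf);
    buf.split(|&b| b == 0)
        .take(count) // drops the empty piece after the final terminator
        .map(|s| String::from_utf8_lossy(s).into_owned())
        .collect()
}

/// A fake host used to exercise `collect_args` outside of wasm.
pub struct FakeHost(pub Vec<&'static str>);

impl Host for FakeHost {
    fn arg_count(&self) -> usize {
        self.0.len()
    }
    fn arg_bytes(&self) -> usize {
        self.0.iter().map(|a| a.len() + 1).sum()
    }
    fn fill_args(&self, buf: &mut [u8]) {
        let mut at = 0;
        for a in &self.0 {
            buf[at..at + a.len()].copy_from_slice(a.as_bytes());
            buf[at + a.len()] = 0;
            at += a.len() + 1;
        }
    }
}
```

A single contiguous allocation is only one of the layouts the comment leaves open. Per-argument allocations with a separate length query would work just as well.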

For return values, a simple thing that covers many use cases is just to have a wrapper around the user main function that returns normally on success, and traps on failure. Wasm doesn't have a way to specify a value when trapping, so it wouldn't be able to express arbitrary program exit values. However, when exception handling is added to wasm, the traps can be replaced by code that throws an exception indicating an exit status.
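A minimal sketch of such a wrapper, assuming the user entry point reports failure through a `Result`. The name `run_entry` is hypothetical. `panic!` is used here as a stand-in for a trap so the example can run natively. On wasm builds that use the abort panic strategy, a panic is expected to end in a trap, but treat that as an assumption about the build configuration.

```rust
/// Wraps a user entry point: returns normally on success and "traps" on
/// failure. No exit status can be carried out, only success or failure.
pub fn run_entry<F>(user_main: F)
where
    F: FnOnce() -> Result<(), String>,
{
    if let Err(msg) = user_main() {
        // Stand-in for a wasm trap; the message is for diagnostics only and
        // is not an exit value the embedder can inspect.
        panic!("program failed: {}", msg);
    }
}
```

The wrapper deliberately has no return value, matching the start function's signature. Distinguishing between exit codes would need a different channel, such as the exception-based approach mentioned above.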

@alexcrichton
Contributor

@Diggsey and @koute, do y'all have thoughts on the above?

@koute

koute commented Jan 19, 2018

@sunfishcode wrote that "there are problems with start functions in tools today", but I feel like it's not a tooling problem. The JavaScript API itself for loading WebAssembly modules is broken as it doesn't allow for setting up the environment before it launches the start hook. (It could take a JS callback when instantiating the module and call it before main, for example, but it doesn't.) I think it's silly that we need external tools to fix broken behavior of the API instead of having a sane API in the first place.

Nevertheless, assuming that wasm + ES module integration would fix the issue (that is, allow us to run JS code before main but with access to exports), it would leave an inconsistency where you could technically use a .wasm module as an ES module, but it would be useless as soon as you wanted to load it with WebAssembly.instantiate(). So I think whatever we go for should work well with both, or WebAssembly.instantiate() should be fixed so that hacks like deleting the start hook and creating an export out of it won't be necessary.

@Diggsey

Diggsey commented Jan 19, 2018

I don't think that the ES Module stuff really affects the decision about what rustc should output: the pipeline outlined above involves multiple stages of transformations on the wasm file - it therefore really doesn't matter what rustc outputs, because the bundler stage can rewrite it however it wants.

We should have rustc output an exported "main" function, and a "start" symbol which can be configured through a function attribute. That way the unmolested output is flexible and easy to use. If there is a more complete pipeline set up for compatibility with ES Modules, or indeed other targets, then great! Wasm is not only for javascript, and as we previously established, this issue affects all wasm implementations as it is part of the core specification.
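The shape of that output can be approximated today in a library crate by exposing two separate functions, as in the sketch below. The names `module_init` and `module_main` are hypothetical, and this does not use or imply any particular rustc attribute for marking a start function. In a real cdylib build, each function would also need an export attribute such as `no_mangle`, which is left out here so the snippet compiles unchanged as a plain native test.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Set once initialization has run.
static READY: AtomicBool = AtomicBool::new(false);

/// Hypothetical initialization hook: the part that would be wired to the
/// wasm start section. It must not depend on the embedder's environment.
pub extern "C" fn module_init() {
    READY.store(true, Ordering::SeqCst);
}

/// Hypothetical exported entry point, called explicitly by the embedder
/// after instantiation. Returns 0 on success, 1 if init never ran.
pub extern "C" fn module_main() -> i32 {
    if READY.load(Ordering::SeqCst) {
        0
    } else {
        1
    }
}
```

Keeping the two apart lets the embedder decide when the entry point runs, which is the flexibility the comment argues for.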

Finally, this solution (ES Module cyclic imports) is still very hypothetical - changing the behaviour of the start function is something we can do today that will make everyone's lives much easier.

@aturon
Contributor Author

aturon commented Jan 20, 2018

> I don't think that the ES Module stuff really affects the decision about what rustc should output: the pipeline outlined above involves multiple stages of transformations on the wasm file - it therefore really doesn't matter what rustc outputs, because the bundler stage can rewrite it however it wants.

I wanted to clarify a point here, just to make sure we're all on the same page.

We ultimately want to be able to e.g. publish a Rust crate as an npm package, and do so without consumers of the package knowing about Rust or requiring a Rust toolchain. To do this, we need to publish a compiled .wasm file to npm. Bundlers will then consume the package and its .wasm file and stitch things together.

Given that pipeline, any Rust-specific transformations need to happen prior to publishing to npm -- and hence, prior to the bundlers.

> changing the behaviour of the start function is something we can do today that will make everyone's lives much easier.

Can you clarify a bit what specifically becomes easier today? I'm aware of learning curve issues (in that the way we treat main as start today leads to surprising problems), but is there more than that?

@Diggsey

Diggsey commented Jan 20, 2018

> To do this, we need to publish a compiled .wasm file to npm. Bundlers will then consume the package and its .wasm file and stitch things together.

True, then it might have to happen at one of the earlier stages, it depends what the bundler expects as input: we don't even know what the convention for packages published to npm will be yet! As I've mentioned before, emscripten outputs wasm files with a "start" section that doesn't include "main", so the only convention that has been established so far is to not call "main" from "start".

> Given that pipeline, any Rust-specific transformations need to happen prior to publishing to npm -- and hence, prior to the bundlers.

This is always going to require some npm-specific transformation prior to being published, so either there's going to be a tool to do this transformation where "start" can be rewritten, or it's going to be a separate target in rustc, and the decision made here need not be the same for both targets. This issue is about finding an answer for the -unknown target.

> but is there more than that?

No, that's exactly what I meant.

@aturon
Contributor Author

aturon commented Jan 27, 2018

Circling back to this, I want to articulate a discomfort that I think @Diggsey and I share: it seems problematic to tie Rust's semantics to a bundler convention, not least because bundlers are not the only way to use Rust-generated-wasm. These conventions are still under development, and things may also shift with changes to the wasm standard.

Given that this is all still highly experimental, it seems simplest and clearest to have rustc generate separate start and main symbols, which can then be handled by convention in the rest of the toolchain. If the bundler convention heads in the way that @lukewagner suggests, we'll be able to take advantage of that to have main do "the right thing". And if later on it becomes possible for us to make stronger guarantees about start via new wasm specs, at that point we can drop the special start attribute and have main emit start automatically.

@alexcrichton
Contributor

I believe this was resolved in rust-lang/rust#47102 where we no longer generate the start section, which sounds like the consensus for now at least!
