Implement extensible syscall interface for wasm #47102

Diggsey · 2018-01-01T15:51:48Z

Currently it's possible to run tests with the native wasm target, but it's not possible to tell whether they pass or to capture the output, because libstd throws away stdout, stderr and the exit code. While advanced libstd features should probably require more specific targets (e.g. wasm-unknown-web), I think even the unknown target should at least support basic I/O.

Any solution is constrained by these factors:

- It must not be javascript specific
- There must not be too strong coupling between libstd and the host environment (because it's an "unknown" target)
- WebAssembly does not allow "optional" imports - all imports *must* be resolved.
- WebAssembly does not support calling the host environment through any channel *other* than imports.

The best solution I could find to these constraints was to give libstd a single required import, and implement a syscall-style interface through that import. Each syscall is designed such that a no-op implementation gives the most reasonable fallback behaviour. This means that the following import table would be perfectly valid:

imports.env = { rust_wasm_syscall: function(index, data) {} }

Currently I have implemented these system calls:

- Read from stdin
- Write to stdout/stderr
- Set the exit code
- Get command line arguments
- Get environment variable
- Set environment variable
- Get time

It need not be extended beyond this set if being able to run tests for this target is the only goal.
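For illustration, a minimal host-side sketch of the single-import scheme described above. It is not the shim shipped with this PR: `makeSyscallImport`, the handler table, and the assumption that `data` points at a struct of 32-bit words are all hypothetical, and a plain `ArrayBuffer` stands in for the instance's exported memory. Only index 2 (exit, one field) is taken from the shim excerpt in this thread.

```javascript
// Hypothetical dispatcher: any index without a handler falls back to a no-op.
function makeSyscallImport(memory, handlers, log) {
  // Assumed layout: `fields` consecutive 32-bit words at byte offset `ptr`.
  const viewStruct = (ptr, fields) => new Uint32Array(memory.buffer, ptr, fields);
  return function rust_wasm_syscall(index, data) {
    const entry = handlers[index];
    if (entry === undefined) {
      log("Unsupported syscall: " + index);
      return; // no-op fallback, the module keeps running
    }
    entry.run(viewStruct(data, entry.fields));
  };
}

const host = { exitCode: null, unsupported: [] };
const memory = { buffer: new ArrayBuffer(64) }; // stand-in for wasm memory
const imports = {
  env: {
    rust_wasm_syscall: makeSyscallImport(
      memory,
      { 2: { fields: 1, run: (args) => { host.exitCode = args[0]; } } },
      (msg) => host.unsupported.push(msg)
    ),
  },
};

// Simulate the module storing exit code 3 at offset 8 and making two calls:
// one handled (index 2) and one the host does not know about (index 99).
new Uint32Array(memory.buffer, 8, 1)[0] = 3;
imports.env.rust_wasm_syscall(2, 8);
imports.env.rust_wasm_syscall(99, 0);
```

A host that implements nothing can pass an empty handler table and still satisfy the import, which is the property the description relies on.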

edit:
As part of this PR I had to make a further change. Previously, the rust entry point would be automatically called when the webassembly module was instantiated. This was problematic because from the javascript side it was impossible to call exported functions, access program memory or get a reference to the instance.

To solve this, ~~I changed the default behaviour to not automatically call the entry point, and added a crate-level attribute to regain the old behaviour. (#![wasm_auto_run])~~ I disabled this behaviour when building tests.

rust-highfive · 2018-01-01T15:51:53Z

r? @aidanhs

(rust_highfive has picked a reviewer for you, use r? to override)

est31 · 2018-01-01T18:35:09Z

I agree with the idea of this PR and from a cursory look it looks great. It really is independent from js. It doesn't even use json for the communication, that's great. The only question remaining is about feature gating. I think following established procedures here is important before this is available on stable and nothing about the design can be changed any more. Although I've heard people say that the stability promise is only valid for tier1 targets? Dunno.

RReverser · 2018-01-01T19:02:15Z

In general, as I said before in related issue, I'm very supportive of providing a generic host-agnostic interface for std bindings to WebAssembly, but I'm not sure about using a single syscall entry point - it seems sort of error prone and suboptimal as 1) host might easily miss implementation of some syscalls (especially when Rust side adds support for new ones), in which case they will just silently be no-op and 2) if Rust uses only few syscalls, it's hard to know on the host side which exactly without looking through entire code or the generated wasm, and so it's impossible to eliminate unused code as it's all inside of single function with switch.

IMO it would make sense to split bindings into separate functions so that host could implement only those that are used, and when it misses implementations for some of them, WebAssembly constructor will immediately emit an error about missing imports upon loading, so it will be both easy to track what's going on and eliminate anything unused for constrained targets.

Diggsey · 2018-01-01T19:16:28Z

@RReverser one of the goals of this PR is to make it possible to extend without breaking existing uses. All libstd operations are currently implemented as no-ops, so if we add new ones then existing code will continue to function as it used to. If we added a new import each time we made a change, then every user would have to continually update the code hosting the WebAssembly module.

A generic host-agnostic interface for std bindings using wasm imports is impossible to reconcile with rust's stability guarantees.

est31 · 2018-01-01T20:11:46Z

src/etc/wasm32-shim.js

+      case 2: syscall_exit(viewstruct(data, 1)); break;
+      case 3: syscall_args(viewstruct(data, 3)); break;
+      case 4: syscall_getenv(viewstruct(data, 5)); break;
+      default: console.log("Unsupported syscall: " + index.toString());


I think it is better to return a value to the Rust program instead, indicating "unsupported syscall". This would allow the program to handle the "unsupported syscall" itself, maybe in a non fatal way.

Diggsey · 2018-01-01

This should already handle unsupported syscalls in a non-fatal way. For example, if SetEnv is called, it turns into a no-op and we simply log the fact that it went unhandled. On the rust side there's no way to indicate that an error happened, so either we silently do nothing or we panic. I don't think we get anything from adding a return value?

est31 · 2018-01-01T20:30:27Z

> if Rust uses only few syscalls, it's hard to know on the host side which exactly without looking through entire code or the generated wasm, and so it's impossible to eliminate unused code as it's all inside of single function with switch.

That's a concern when you ship the wasm together with glue code, e.g. the browser embedding use case where you have additional control over the js layer. The browser use case can however already be covered by specific targets and the js! macro.

If you have a wasm-first embedding system, e.g. for your cryptocurrency or for a platform independent plugin system for your DAW, you wouldn't ship the syscall implementations as they are already provided by the host program (of course, the host program would need to ship with an implementation for all the syscall implementations).

The great advantage of @Diggsey's approach is that if a syscall is not implemented, it is not necessarily fatal. This has several upsides:

- if you are prototyping, you can just implement that single function and say "unsupported syscall". You don't need to create stubs for a large range of syscalls.
- if you are just starting out with wasm and don't know anything, having a linker error for one syscall would be better than a large amount for multiple syscalls.
- if you don't use a feature (like printing to stdout) but some crate used by you still emits code that uses stdout printing, and you don't use that code but dead code elimination isn't smart enough so the call still ends up in the wasm, then having a runtime error is perfectly acceptable. This isn't some remote use case I think but very relevant to wasm as a ton of crates assume various std features to be present.

rpjohnst · 2018-01-01T20:34:53Z

The -unknown target must forever and always be able to generate modules with zero imports, not just one possibly-no-op one. Thus, this needs to be configurable much like the global allocator or panic runtime.

The syscall interface should not use syscall numbers. It should use multiple imports; they don't need to be stabilized immediately, if ever. If they are, they're much closer to the target specification format than to std's surface area. Using multiple imports also ties into the portability lint, so that we can check at compile time whether the target environment supports a syscall.

We should not be relying on dead code elimination and runtime errors for the core functionality here.

(The interface also should not look anything like POSIX, though this PR doesn't really attempt to lay out a design for that anyway so we can probably hold off on that discussion for now.)

est31 · 2018-01-01T20:46:06Z

> WebAssembly does not allow "optional" imports - all imports must be resolved.

I've looked up the JS API docs, and while this does seem to be correct, one can easily list all the required imports of a wasm module. So I think we might want to switch to multiple functions after all? I think I'm on the fence on this now.

Diggsey · 2018-01-01T20:50:28Z

@est31 the full list of imports will include user defined imports. You would need to implement a naming scheme to distinguish those imports that should be automatically generated from those which the user is implementing, and the naming scheme would also somehow need to indicate what the arguments and return type should be. It could be quite problematic if you wanted to host wasm in a statically typed language.

est31 · 2018-01-01T22:53:30Z

> You would need to implement a naming scheme to distinguish those imports that should be automatically generated from those which the user is implementing

Yeah. If you take this thought further, you'd arrive close to my suggestion to use the js! macro, as such macros always apply mangling.

> the naming scheme would also somehow need to indicate what the arguments and return type should be. It could be quite problematic if you wanted to host wasm in a statically typed language.

Other embeddings than the JS embedding will have different APIs. If you only want to return an error, you don't need any arguments or return types, do you?

Diggsey · 2018-01-01T23:15:59Z

> Other embeddings than the JS embedding will have different APIs. If you only want to return an error, you don't need any arguments or return types, do you?

Since wasm is strongly typed, it's likely that any wasm bindings to a static language would require that the signatures of your imports match the signatures being imported. It may not even be possible to dynamically generate imports - they may have to be specified at compile time.

It's probably possible with enough work, but at that point what are you really gaining? If we do it via imports, then users have to worry about backwards compatibility, and will have to implement this generator and name de-mangler in their host so that when they update rust their programs keep working. It seems to me they shouldn't have to care about that stuff.

Another problem with the current situation is that we depend on the dead code elimination step. If the dead code elimination ever changes in any way it potentially breaks all users of the wasm target. Because of this, the guarantees that webassembly gives you WRT matching up imports don't really mean all that much...

RReverser · 2018-01-02T00:36:55Z

> You would need to implement a naming scheme to distinguish those imports

Just like in this PR you need to implement an encoding scheme where each syscall ID corresponds to a specific operation; IMO either has equal complexity, while the other one allows names to be more descriptive.

Diggsey · 2018-01-02T00:38:18Z

@RReverser no, in this PR you don't need to do anything other than define one no-op import and you get the same behaviour as you would today.

RReverser · 2018-01-02T00:38:26Z

> If we added a new import each time we made a change, then every user would have to continually update the code hosting the WebAssembly module.

Not at all - new imports are required only for new std functionality, so if the code doesn't use anything new from std, it will continue to work exactly as it used to, as that import simply won't be linked in.

RReverser · 2018-01-02T00:39:33Z

> in this PR you don't need to do anything other than define one no-op import

I'm talking about real-world usecase where all syscalls are implemented, not just the stub - that one is easy bit with any approach.

Diggsey · 2018-01-02T00:42:04Z

> Not at all - new imports are required only for new std functionality, so if the code doesn't use anything new from std, it will continue to work exactly as it used to, as that import simply won't be linked in.

As I said, that depends on dead code elimination, and that's not guaranteed to happen in a multi-crate scenario. That means adding new imports will break code.

> I'm talking about real-world usecase where all syscalls are implemented, not just the stub - that one is easy bit with any approach.

By that definition, the current wasm target is completely useless, so this PR is an improvement either way... Also, the real world usecase is where only some syscalls are implemented. Testing is a real world usecase and it only needs basic IO.

RReverser · 2018-01-02T00:59:32Z

> As I said, that depends on dead code elimination, and that's not guaranteed to happen in a multi-crate scenario. That means adding new imports will break code.

Why on dead code elimination? It's just regular linkage which pulls only required imports from std, and then, correspondingly, from whatever it uses. If you print LLVM IR of "hello world" in debug mode, you'll see that it has only the symbols defined that are actually required for the entry point of the app & for println and not all of std.

RReverser · 2018-01-02T01:01:22Z

> By that definition, the current wasm target is completely useless

Agreed, but

> so this PR is an improvement either way...

is a slippery slope to evaluate PRs IMO, as once we introduce some approach, it will be much harder if not impossible to introduce breaking changes to the ecosystem, so it's worth discussing pros/cons of all options on issue or PR before merging anything.

Diggsey · 2018-01-02T01:21:30Z

> Why on dead code elimination? It's just regular linkage which pulls only required imports from std, and then, correspondingly, from whatever it uses. If you print LLVM IR of "hello world" in debug mode, you'll see that it has only the symbols defined that are actually required for the entry point of all & println and not all of std.

Linking happens at an object file level: if you pull in an object file, the linker pulls in any symbols required by the object as a whole, even if they would not be used by the main program. Dependencies are tracked from object file -> symbol, not from symbol -> symbol. If symbols are removed beyond that, then that's an additional optimisation that we shouldn't be relying on for correctness.

There are other ways using imports can cause issues:

- Disabling a feature when on wasm by branching in the code. In this case, although we're never actually going to call the unsupported feature, the compiler will still require the import.
- Changing the implementation of a libstd method. Let's say we implement a "get time" syscall, to support the time API. Now some time later we decide we need it to be timezone aware, so we change our implementation to use two syscalls: one to get the timezone, and one to get the time. Now existing code requires an extra import. With this PR, the timezone syscall could have a fallback to the previous behaviour.

RReverser · 2018-01-02T02:13:09Z

> Disabling a feature when on wasm by branching in the code.

If we're branching using cfg! (as we should), then, again, without optimisations only one branch will be generated by Rust, as per compile-time config.

> With this PR, the timezone syscall could have a fallback to the previous behaviour.

I think this and few other concerns were addressed above with the "reading all imports" approach? (which, as you noted, will require a naming scheme, but that's not a big problem given that for current PR you also need a unique name, might as well use it as a prefix for namespace)

est31 · 2018-01-02T02:28:38Z

> If we're branching using cfg! (as we should), then, again, without optimisations only one branch will be generated by Rust, as per compile-time config.

cfg! only expands to an expression. It requires (quite trivial) optimisations to get if false { ... } eliminated. The other point is that whether some code not controlled by you performs some syscalls or not might depend on an input parameter and you only call it with "don't perform those syscalls". The optimizer might detect this, then it eliminates the call, or it might not detect this. This might be due to bad design of the code, or maybe is completely legitimate. But you shouldn't be required to ask those people to adapt the code to wasm, or depend on the optimizer. I do agree with @Diggsey that the behaviour for unimplemented syscalls shouldn't be a linker error.

RReverser · 2018-01-02T02:57:43Z

> unimplemented syscalls shouldn't be a linker error

Not a linker, but construct-time (when you still have a chance to check .imports() and provide stubs for everything if that's really what you want). But anyway. It is an error on any other platform, why not on WASM? It's not that different.
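As a rough sketch of the "check the imports and provide stubs" idea, assuming a multiple-imports design: the host can enumerate a module's imports with `WebAssembly.Module.imports` and fill in a no-op for every function it does not implement. The import names `rust_wasm_write` and `rust_wasm_time`, the `env` namespace convention, and the hand-assembled test module are hypothetical and only serve to make the example self-contained.

```javascript
// Build a tiny wasm module that imports each name from "env" as a () -> ()
// function. Names are short, so single-byte lengths are sufficient.
function buildModule(importNames) {
  const name = (s) => [s.length, ...new TextEncoder().encode(s)];
  const entries = importNames.flatMap((n) => [...name("env"), ...name(n), 0x00, 0x00]);
  const importSection = [importNames.length, ...entries];
  return new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
    0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type section: one () -> () type
    0x02, importSection.length, ...importSection,   // import section
  ]);
}

const wasmModule = new WebAssembly.Module(
  buildModule(["rust_wasm_write", "rust_wasm_time"])
);

// The host only implements one of the two imports.
const provided = { rust_wasm_write: () => {} };

const env = {};
const stubbed = [];
for (const imp of WebAssembly.Module.imports(wasmModule)) {
  if (imp.module !== "env" || imp.kind !== "function") continue;
  if (imp.name in provided) {
    env[imp.name] = provided[imp.name];
  } else {
    env[imp.name] = () => {}; // no-op stub for anything the host lacks
    stubbed.push(imp.name);
  }
}

// Instantiation succeeds because every import is now resolved.
const instance = new WebAssembly.Instance(wasmModule, { env });
```

The same loop could throw instead of stubbing, which gives the fail-fast behaviour argued for here; which names count as "std" imports rather than user imports is exactly the naming-scheme question raised earlier in the thread.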

Diggsey · 2018-01-30T22:44:58Z

OK, I'm in the process of updating the PR - should I be adding the #[wasm_start] attribute, stick with #![wasm_autostart], or remove it entirely?

I looked into adding #[wasm_start] and the implementation looks fairly non-trivial, so if that's the direction we're going, I'd appreciate some pointers on how best to implement that.

alexcrichton · 2018-01-30T23:00:02Z

I'd be ok with removing it entirely for now and adding it back on an as-needed basis.

Diggsey · 2018-01-31T00:52:06Z

Alright, PR updated.

alexcrichton · 2018-01-31T06:32:35Z

@bors: r+

bors · 2018-01-31T06:32:36Z

📌 Commit 0e6601f has been approved by alexcrichton

bors · 2018-01-31T23:52:22Z

⌛ Testing commit 0e6601f with merge ddc3b6814c52b2bf912ba53cba66d5b4a06b81d8...

bors · 2018-02-01T02:53:12Z

💔 Test failed - status-appveyor

kennytm · 2018-02-01T07:58:21Z

@bors retry #46903

bors · 2018-02-01T10:24:40Z

⌛ Testing commit 0e6601f with merge d8a8710326bd379d0fff5b698b0dfb0140dd9d91...

bors · 2018-02-01T13:25:06Z

💔 Test failed - status-appveyor

Diggsey · 2018-02-01T13:28:52Z

One of the appveyor targets seems to be timing out at 3 hours. I don't think this is anything I've done?

kennytm · 2018-02-01T14:01:14Z

@bors retry #46903

bors · 2018-02-01T16:22:22Z

⌛ Testing commit 0e6601f with merge acc1b82...

bors · 2018-02-01T19:24:38Z

💔 Test failed - status-appveyor

kennytm · 2018-02-01T19:54:43Z

@bors retry #46903

(cc @alexcrichton you may want to merge this manually; current testing PR is a beta backport, so no need to retry anything.)

bors · 2018-02-02T01:27:20Z

⌛ Testing commit 0e6601f with merge 6741e41...

bors · 2018-02-02T04:19:08Z

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 6741e41 to master...

rust-highfive assigned aidanhs Jan 1, 2018

est31 mentioned this pull request Jan 1, 2018

Tracking issue for supporting asm.js and WebAssembly without Fastcomp #44006

Closed

5 tasks

Diggsey force-pushed the wasm-syscall branch from c982a65 to d1be0d8 Compare January 1, 2018 20:03

est31 reviewed Jan 1, 2018

Diggsey force-pushed the wasm-syscall branch from d1be0d8 to 51e0b20 Compare January 1, 2018 23:04

Diggsey force-pushed the wasm-syscall branch from 51e0b20 to 40d0b22 Compare January 2, 2018 02:14

kennytm added O-wasm Target: WASM (WebAssembly), http://webassembly.org/ S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 2, 2018

Diggsey force-pushed the wasm-syscall branch from 40d0b22 to 8840bb2 Compare January 2, 2018 10:54

Diggsey force-pushed the wasm-syscall branch from 31bbd9e to 011c6b2 Compare January 30, 2018 22:10

Diggsey force-pushed the wasm-syscall branch from 011c6b2 to d009e15 Compare January 30, 2018 23:14

Diggsey added 2 commits January 30, 2018 23:22

Implement extensible syscall interface for wasm

36695a3

Add wasm_syscall feature to build system

0e6601f

Diggsey force-pushed the wasm-syscall branch from d009e15 to 0e6601f Compare January 30, 2018 23:26

kennytm added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 31, 2018

bors merged commit 0e6601f into rust-lang:master Feb 2, 2018

tomaka mentioned this pull request Mar 26, 2018

Put stdweb dependency behind a target feature rust-random/rand#336

Merged
