Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement extensible syscall interface for wasm #47102

Merged
merged 2 commits into from Feb 2, 2018

Conversation

Diggsey
Copy link
Contributor

@Diggsey Diggsey commented Jan 1, 2018

Currently it's possible to run tests with the native wasm target, but it's not possible to tell whether they pass or to capture the output, because libstd throws away stdout, stderr and the exit code. While advanced libstd features should probably require more specific targets (eg. wasm-unknown-web) I think even the unknown target should at least support basic I/O.

Any solution is constrained by these factors:

  • It must not be javascript specific
  • There must not be too strong coupling between libstd and the host environment (because it's an "unknown" target)
  • WebAssembly does not allow "optional" imports - all imports must be resolved.
  • WebAssembly does not support calling the host environment through any channel other than imports.

The best solution I could find to these constraints was to give libstd a single required import, and implement a syscall-style interface through that import. Each syscall is designed such that a no-op implementation gives the most reasonable fallback behaviour. This means that the following import table would be perfectly valid:

imports.env = { rust_wasm_syscall: function(index, data) {} }

Currently I have implemented these system calls:

  • Read from stdin
  • Write to stdout/stderr
  • Set the exit code
  • Get command line arguments
  • Get environment variable
  • Set environment variable
  • Get time

It need not be extended beyond this set if being able to run tests for this target is the only goal.

edit:
As part of this PR I had to make a further change. Previously, the rust entry point would be automatically called when the webassembly module was instantiated. This was problematic because from the javascript side it was impossible to call exported functions, access program memory or get a reference to the instance.

To solve this, I changed the default behaviour to not automatically call the entry point, and added a crate-level attribute to regain the old behaviour. (#![wasm_auto_run]) I disabled this behaviour when building tests.

@rust-highfive
Copy link
Collaborator

r? @aidanhs

(rust_highfive has picked a reviewer for you, use r? to override)

@est31
Copy link
Member

est31 commented Jan 1, 2018

I agree with the idea of this PR and from a cursory look it looks great. It really is independent from js. It doesn't even use json for the communication, that's great. The only question remaining is about feature gating. I think following established procedures here is important before this is available on stable and nothing about the design can't be changed any more. Although I've heard people say that the stability promise is only valid for tier1 targets? Dunno.

@RReverser
Copy link
Contributor

In general, as I said before in related issue, I'm very supportive of providing a generic host-agnostic interface for std bindings to WebAssembly, but I'm not sure about using a single syscall entry point - it seems sort of error prone and suboptimal as 1) host might easily miss implementation of some syscalls (especially when Rust side adds support for new ones), in which case they will just silently be no-op and 2) if Rust uses only few syscalls, it's hard to know on the host side which exactly without looking through entire code or the generated wasm, and so it's impossible to eliminate unused code as it's all inside of single function with switch.

IMO it would make sense to split bindings into separate functions so that host could implement only those that are used, and when it misses implementations for some of them, WebAssembly constructor will immediately emit an error about missing imports upon loading, so it will be both easy to track what's going on and eliminate anything unused for constrained targets.

@Diggsey
Copy link
Contributor Author

Diggsey commented Jan 1, 2018

@RReverser one of the goals of this PR is to make it possible to extend without breaking existing uses. All libstd operations are currently implemented as no-ops, so if we add new ones then existing code will continue to function as it used to. If we added a new import each time we made a change, then every user would have to continually update the code hosting the WebAssembly module.

A generic host-agnostic interface for std bindings using wasm imports is impossible to reconcile with rust's stability guarantees.

case 2: syscall_exit(viewstruct(data, 1)); break;
case 3: syscall_args(viewstruct(data, 3)); break;
case 4: syscall_getenv(viewstruct(data, 5)); break;
default: console.log("Unsupported syscall: " + index.toString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to return a value to the Rust program instead, indicating "unsupported syscall". This would allow the program to handle the "unsupported syscall" itself, maybe in a non fatal way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should already handle unsupported syscalls in a non-fatal way. For example, if SetEnv is called, it turns into a no-op and we simply log the fact that it went unhandled. On the rust side there's no way to indicate that an error happened, so either we silently do nothing or we panic. I don't think we get anything from adding a return value?

@est31
Copy link
Member

est31 commented Jan 1, 2018

if Rust uses only few syscalls, it's hard to know on the host side which exactly without looking through entire code or the generated wasm, and so it's impossible to eliminate unused code as it's all inside of single function with switch.

That's of a concern when you ship the wasm together with glue code, e.g. the browser embedding use case where you have additional control over the js layer. The browser use case can however already be covered by specific targets and the js! macro.

If you have a wasm-first embedding system, e.g. for your cryptocurrency or for a platform independent plugin system for your DAW, you wouldn't ship the syscall implementations as they are already provided by the host program (of course, the host program would need to ship with an implementation for all the syscall implementations).

The great advantage of @Diggsey 's approach is that if a syscall is not implemented, it is not neccessarily fatal. This has several upsides:

  • if you are prototyping, you can just implement that single function and say "unsupported syscall". You don't need to create stubs for a large range of syscalls.
  • if you are just starting out with wasm and don't know anything, having a linker error for one syscall would be better than a large amount for multiple syscalls.
  • if you don't use a feature (like printing to stdout) but some crate used by you still emits code that uses stdout printing, and you don't use that code but dead code elimination isn't smart enough so the call still ends up in the wasm, then having a runtime error is perfectly acceptable. This isn't some remote use case I think but very relevant to wasm as a ton of crates assume various std features to be present.

@rpjohnst
Copy link
Contributor

rpjohnst commented Jan 1, 2018

The -unknown target must forever and always be able to generate modules with zero imports, not just one possibly-no-op one. Thus, this needs to be configurable much like the global allocator or panic runtime.

The syscall interface should not use syscall numbers. It should use multiple imports- they don't need to be stabilized immediately, if ever. If they are, they're much closer to the target specification format than to std's surface area. Using multiple imports also ties into the portability lint, so that we can check at compile time whether the target environment supports a syscall.

We should not be relying on dead code elimination and runtime errors for the core functionality here.

(The interface also should not look anything like POSIX, though this PR doesn't really attempt to lay out a design for that anyway so we can probably hold off on that discussion for now.)

@est31
Copy link
Member

est31 commented Jan 1, 2018

WebAssembly does not allow "optional" imports - all imports must be resolved.

I've looked up JS API docs and that while this does seem to be correct, one can easily list all the required imports of a wasm module. So I think we might want to switch to multiple functions after all? I think I'm on the fence on this now.

@Diggsey
Copy link
Contributor Author

Diggsey commented Jan 1, 2018

@est31 the full list of imports will include user defined imports. You would need to implement a naming scheme to distinguish those imports that should be automatically generated from those which the user is implementing, and the naming scheme would also somehow need to indicate what the arguments and return type should be. It could be quite problematic if you wanted to host wasm in a statically typed language.

@est31
Copy link
Member

est31 commented Jan 1, 2018

You would need to implement a naming scheme to distinguish those imports that should be automatically generated from those which the user is implementing

Yeah. If this is thought further you'd arrive close to my suggestion to use the js! macro, as such macros always apply mangling.

the naming scheme would also somehow need to indicate what the arguments and return type should be. It could be quite problematic if you wanted to host wasm in a statically typed language.

Other embeddings than the JS embedding will have different APIs. If you only want to return an error, you don't need any arguments or return types, do you?

@Diggsey
Copy link
Contributor Author

Diggsey commented Jan 1, 2018

Other embeddings than the JS embedding will have different APIs. If you only want to return an error, you don't need any arguments or return types, do you?

Since wasm is strongly typed, it's likely that any wasm bindings to a static language would require that the signatures of your imports match the signatures being imported. It may not even be possible to dynamically generate imports - they may have to be specified at compile time.

It's probably possible with enough work, but at that point what are you really gaining? If we do it via imports, then users have to worry about backwards compatibility, and will have to implement this generator and name de-mangler in their host so that when they update rust their programs keep working. It seems to me they shouldn't have to care about that stuff.

Another problem with the current situation is that we depend on the dead code elimination step. If the dead code elimination ever changes in any way it potentially breaks all users of the wasm target. Because of this, the guarantees that webassembly gives you WRT to matching up imports don't really mean all that much...

@RReverser
Copy link
Contributor

You would need to implement a naming scheme to distinguish those imports

Just like in this PR you need to implement an encoding scheme where each syscall ID corresponds to specific operation; IMO either has equal complexity, while other one allows names to be more descriptive.

@Diggsey
Copy link
Contributor Author

Diggsey commented Jan 2, 2018

@RReverser no, in this PR you don't need to do anything other than define one no-op import and you get the same behaviour as you would today.

@RReverser
Copy link
Contributor

If we added a new import each time we made a change, then every user would have to continually update the code hosting the WebAssembly module.

Not at all - new imports are required only for new std functionality, so if the code doesn't use anything new from std, it will continue to work exactly as it used to, as that import simply won't be linked in.

@RReverser
Copy link
Contributor

in this PR you don't need to do anything other than define one no-op import

I'm talking about real-world usecase where all syscalls are implemented, not just the stub - that one is easy bit with any approach.

@Diggsey
Copy link
Contributor Author

Diggsey commented Jan 2, 2018

Not at all - new imports are required only for new std functionality, so if the code doesn't use anything new from std, it will continue to work exactly as it used to, as that import simply won't be linked in.

As I said, that depends on dead code elimination, and that's not guaranteed to happen in a multi-crate scenario. That means adding new imports will break code.

I'm talking about real-world usecase where all syscalls are implemented, not just the stub - that one is easy bit with any approach.

By that definition, the current wasm target is completely useless, so this PR is an improvement either way... Also, the real world usecase is where only some syscalls are implemented. Testing is a real world usecase and it only needs basic IO.

@RReverser
Copy link
Contributor

RReverser commented Jan 2, 2018

As I said, that depends on dead code elimination, and that's not guaranteed to happen in a multi-crate scenario. That means adding new imports will break code.

Why on dead code elimination? It's just regular linkage which pulls only required imports from std, and then, correspondingly, from whatever it uses. If you print LLVM IR of "hello world" in debug mode, you'll see that it has only the symbols defined that are actually required for the entry point of the app & for println and not all of std.

@RReverser
Copy link
Contributor

By that definition, the current wasm target is completely useless

Agreed, but

so this PR is an improvement either way...

is a slippery slope to valuate PRs IMO, as once we introduce some approach, it will be much harder if not impossible to introduce breaking changes to the ecosystem, so it's worth discussing pros/cons of all options on issue or PR before merging anything.

@Diggsey
Copy link
Contributor Author

Diggsey commented Jan 2, 2018

Why on dead code elimination? It's just regular linkage which pulls only required imports from std, and then, correspondingly, from whatever it uses. If you print LLVM IR of "hello world" in debug mode, you'll see that it has only the symbols defined that are actually required for the entry point of all & println and not all of std.

Linking happens at an object file level, if you pull in an object file, the linker pulls in any symbols required by the object as a whole, even if they would not be used by the main program: dependencies are tracked from object file -> symbol, not from symbol -> symbol. If symbols are removed beyond that, then that's an additional optimisation that we shouldn't be relying on for correctness.

There are other ways using imports can cause issues:

  • Disabling a feature when on wasm by branching in the code. In this case, although we're never actually going to call the unsupported feature, the compiler will still require the import.
  • Changing the implementation of a libstd method. Let's say we implement a "get time" syscall, to support the time API. Now some time later we decide we need it to be timezone aware, so we change our implementation to use two syscalls: one to get the timezone, and one to get the time. Now existing code requires an extra import. With this PR, the timezone syscall could have a fallback to the previous behaviour.

@RReverser
Copy link
Contributor

Disabling a feature when on wasm by branching in the code.

If we're branching using cfg! (as we should), then, again, without optimisations only one branch will be generated by Rust, as per compile-time config.

With this PR, the timezone syscall could have a fallback to the previous behaviour.

I think this and few other concerns were addressed above with the "reading all imports" approach? (which, as you noted, will require a naming scheme, but that's not a big problem given that for current PR you also need a unique name, might as well use it as a prefix for namespace)

@est31
Copy link
Member

est31 commented Jan 2, 2018

If we're branching using cfg! (as we should), then, again, without optimisations only one branch will be generated by Rust, as per compile-time config.

cfg! only expands to an expression. It requires (quite trivial) optimisations to get if false { ... } eliminated. The other point is that whether some code not controlled by you performs some syscalls or not might depend on an input parameter and you only call it with "don't perform those syscalls". The optimizer might detect this, then it eliminates the call, or it might not detect this. This might be due to bad design of the code, or maybe is completely legitimate. But you shouldn't be required to ask those people to adapt the code to wasm, or depend on the optimizer. I do agree with @Diggsey that the behaviour for unimplemented syscalls shouldn't be a linker error.

@RReverser
Copy link
Contributor

unimplemented syscalls shouldn't be a linker error

Not a linker, but construct-time (when you still have a chance to check .imports() and provide stubs for everything if that's really what you want). But anyway. It is an error on any other platform, why not on WASM? It's not that different.

@kennytm kennytm added O-wasm Target: WASM (WebAssembly), http://webassembly.org/ S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 2, 2018
@Diggsey
Copy link
Contributor Author

Diggsey commented Jan 30, 2018

OK, I'm in the process of updating the PR - should I be adding the #[wasm_start] attribute, stick with #![wasm_autostart], or remove it entirely?

I looked into adding #[wasm_start] and the implementation looks fairly non-trivial, so if that's the direction we're going, I'd appreciate some pointers on how best to implement that.

@alexcrichton
Copy link
Member

I'd be ok with removing it entirely for now and adding it back on an as-needed basis.

@Diggsey
Copy link
Contributor Author

Diggsey commented Jan 31, 2018

Alright, PR updated.

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Jan 31, 2018

📌 Commit 0e6601f has been approved by alexcrichton

@kennytm kennytm added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 31, 2018
@bors
Copy link
Contributor

bors commented Jan 31, 2018

⌛ Testing commit 0e6601f with merge ddc3b6814c52b2bf912ba53cba66d5b4a06b81d8...

@bors
Copy link
Contributor

bors commented Feb 1, 2018

💔 Test failed - status-appveyor

@kennytm
Copy link
Member

kennytm commented Feb 1, 2018

@bors retry #46903

@bors
Copy link
Contributor

bors commented Feb 1, 2018

⌛ Testing commit 0e6601f with merge d8a8710326bd379d0fff5b698b0dfb0140dd9d91...

@bors
Copy link
Contributor

bors commented Feb 1, 2018

💔 Test failed - status-appveyor

@Diggsey
Copy link
Contributor Author

Diggsey commented Feb 1, 2018

One of the appveyor targets seems to be timing out at 3 hours. I don't think this is anything I've done?

@kennytm
Copy link
Member

kennytm commented Feb 1, 2018

@bors retry #46903

@bors
Copy link
Contributor

bors commented Feb 1, 2018

⌛ Testing commit 0e6601f with merge acc1b82...

bors added a commit that referenced this pull request Feb 1, 2018
Implement extensible syscall interface for wasm

Currently it's possible to run tests with the native wasm target, but it's not possible to tell whether they pass or to capture the output, because libstd throws away stdout, stderr and the exit code. While advanced libstd features should probably require more specific targets (eg. wasm-unknown-web) I think even the unknown target should at least support basic I/O.

Any solution is constrained by these factors:
- It must not be javascript specific
- There must not be too strong coupling between libstd and the host environment (because it's an "unknown" target)
- WebAssembly does not allow "optional" imports - all imports *must* be resolved.
- WebAssembly does not support calling the host environment through any channel *other* than imports.

The best solution I could find to these constraints was to give libstd a single required import, and implement a syscall-style interface through that import. Each syscall is designed such that a no-op implementation gives the most reasonable fallback behaviour. This means that the following import table would be perfectly valid:
```javascript
imports.env = { rust_wasm_syscall: function(index, data) {} }
```

Currently I have implemented these system calls:
- Read from stdin
- Write to stdout/stderr
- Set the exit code
- Get command line arguments
- Get environment variable
- Set environment variable
- Get time

It need not be extended beyond this set if being able to run tests for this target is the only goal.

edit:
As part of this PR I had to make a further change. Previously, the rust entry point would be automatically called when the webassembly module was instantiated. This was problematic because from the javascript side it was impossible to call exported functions, access program memory or get a reference to the instance.

To solve this, ~I changed the default behaviour to not automatically call the entry point, and added a crate-level attribute to regain the old behaviour. (`#![wasm_auto_run]`)~ I disabled this behaviour when building tests.
@bors
Copy link
Contributor

bors commented Feb 1, 2018

💔 Test failed - status-appveyor

@kennytm
Copy link
Member

kennytm commented Feb 1, 2018

@bors retry #46903

(cc @alexcrichton you may want to merge this manually; current testing PR is a beta backport, so no need to retry anything.)

@bors
Copy link
Contributor

bors commented Feb 2, 2018

⌛ Testing commit 0e6601f with merge 6741e41...

bors added a commit that referenced this pull request Feb 2, 2018
Implement extensible syscall interface for wasm

Currently it's possible to run tests with the native wasm target, but it's not possible to tell whether they pass or to capture the output, because libstd throws away stdout, stderr and the exit code. While advanced libstd features should probably require more specific targets (eg. wasm-unknown-web) I think even the unknown target should at least support basic I/O.

Any solution is constrained by these factors:
- It must not be javascript specific
- There must not be too strong coupling between libstd and the host environment (because it's an "unknown" target)
- WebAssembly does not allow "optional" imports - all imports *must* be resolved.
- WebAssembly does not support calling the host environment through any channel *other* than imports.

The best solution I could find to these constraints was to give libstd a single required import, and implement a syscall-style interface through that import. Each syscall is designed such that a no-op implementation gives the most reasonable fallback behaviour. This means that the following import table would be perfectly valid:
```javascript
imports.env = { rust_wasm_syscall: function(index, data) {} }
```

Currently I have implemented these system calls:
- Read from stdin
- Write to stdout/stderr
- Set the exit code
- Get command line arguments
- Get environment variable
- Set environment variable
- Get time

It need not be extended beyond this set if being able to run tests for this target is the only goal.

edit:
As part of this PR I had to make a further change. Previously, the rust entry point would be automatically called when the webassembly module was instantiated. This was problematic because from the javascript side it was impossible to call exported functions, access program memory or get a reference to the instance.

To solve this, ~I changed the default behaviour to not automatically call the entry point, and added a crate-level attribute to regain the old behaviour. (`#![wasm_auto_run]`)~ I disabled this behaviour when building tests.
@bors
Copy link
Contributor

bors commented Feb 2, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 6741e41 to master...

@bors bors merged commit 0e6601f into rust-lang:master Feb 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-wasm Target: WASM (WebAssembly), http://webassembly.org/ S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet