Sort out host environment assumptions for wasm-unknown-unknown #16

Open
aturon opened this Issue Jan 17, 2018 · 29 comments

6 participants
@aturon
Contributor

aturon commented Jan 17, 2018

At the moment, the wasm32-unknown-unknown target assumes nothing about its host environment. That means that std, as it stands, cannot even print to stdout.

There's some ongoing debate on a Rust PR about whether and how to approach this issue. I wanted to open an issue here just to get more visibility -- but please comment on the linked PR.

@aturon

Contributor

aturon commented Jan 17, 2018

cc @lukewagner, I bet you have thoughts.

@aturon

Contributor

aturon commented Jan 17, 2018

In a nutshell, the PR proposes to add a general "syscall" function which is assumed to always be imported by the wasm module. That's then leveraged to provide functionality like printing and getting the current time.

@aturon

Contributor

aturon commented Jan 17, 2018

I posted a summary comment which is probably a good way to jump in.

@Diggsey

Diggsey commented Jan 17, 2018

@aturon Great job summarising that mega thread. Here are a few more things related specifically to this issue:

In WebAssembly, all imports must be fulfilled by the host. It's not possible to have optional imports. If we assume that at some point in the future we will care about backwards compatibility, then we're only left with four choices:

  1. Use a single import, and implement something resembling the system call interface I designed in the above PR.
  2. Use multiple imports, and just never add any new ones after stabilisation.
  3. Require hosts to dynamically generate all required imports.
  4. Use a system-call like interface internally to libstd, and define wasm imports in a separate crate, similarly to how we can link in different allocators. Periodically we release a new major version of the "imports" crate which contains any new functions we wanted to import. Old programs can continue linking to older versions of the imports crate.

For 1) some people complained about using numerical identifiers for system calls. I don't personally think this is an issue, but it's worth noting that you could store and export a mapping of indexes to names as part of the compiled wasm file if you really wanted.
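To make option 1 concrete, here is a minimal sketch of the host side of a single numbered entry point. It is written as a plain Rust function rather than a real wasm import. The names `SYS_WRITE`, `SYS_TIME`, `SYSCALL_NAMES`, and `host_syscall` are hypothetical, and the numbering is not the one from the PR.

```rust
/// Hypothetical syscall numbers (illustrative only; not the PR's numbering).
pub const SYS_WRITE: u32 = 0;
pub const SYS_TIME: u32 = 1;

/// Index-to-name mapping that could be exported alongside the compiled module,
/// so a host can resolve calls by name rather than by number.
pub static SYSCALL_NAMES: [&str; 2] = ["write", "time"];

/// Host side of the single import: one entry point, many operations.
/// `args` stands in for the argument buffer a real wasm import would read
/// from linear memory; `stdout` stands in for wherever the host sends output.
pub fn host_syscall(id: u32, args: &[u8], stdout: &mut Vec<u8>) -> Option<u64> {
    match id {
        SYS_WRITE => {
            stdout.extend_from_slice(args);
            Some(args.len() as u64)
        }
        SYS_TIME => Some(0), // hypothetical host with a fixed clock
        // An id this host doesn't know (e.g. one added to std later) is
        // reported as unsupported instead of breaking module instantiation.
        _ => None,
    }
}
```

The backwards-compatibility argument rests on the last match arm. A newer module can request an operation that an older host has never heard of, and the host answers "unsupported" instead of the module failing to instantiate because of a missing import.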

For 2) the downsides are self-evident.

For 3) it places the burden of backwards compatibility on users: we will never be able to provide this generator ourselves, because this target is intended to work anywhere in any host language, and so we must carefully specify exactly what algorithm should be used to generate imports from a list of names. This generator will be complex to implement as we must use name mangling to encode the type signatures of functions that we import. Furthermore, dynamically generating imports may not be easy or even possible for all hosts.
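This toy name-mangling scheme shows why option 3 pushes a specification burden onto every host: the compiler encodes each signature into the import name, and every host's generator must decode it identically. The format (`name:params->ret`, with `i`/`j`/`f`/`d` codes for the four wasm MVP value types and `v` for "no result") is invented for illustration. No such scheme was proposed in the thread.

```rust
/// WebAssembly MVP value types.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ValType {
    I32,
    I64,
    F32,
    F64,
}

impl ValType {
    fn code(self) -> char {
        match self {
            ValType::I32 => 'i',
            ValType::I64 => 'j',
            ValType::F32 => 'f',
            ValType::F64 => 'd',
        }
    }

    fn from_code(c: char) -> Option<ValType> {
        match c {
            'i' => Some(ValType::I32),
            'j' => Some(ValType::I64),
            'f' => Some(ValType::F32),
            'd' => Some(ValType::F64),
            _ => None,
        }
    }
}

/// Compiler side (hypothetical scheme): encode name and signature as one
/// import name, e.g. `write_stdout:ii->i`.
pub fn mangle(name: &str, params: &[ValType], ret: Option<ValType>) -> String {
    let p: String = params.iter().map(|t| t.code()).collect();
    let r = ret.map_or('v', ValType::code);
    format!("{}:{}->{}", name, p, r)
}

/// Host side: every host's import generator must decode names the same way.
pub fn demangle(import: &str) -> Option<(&str, Vec<ValType>, Option<ValType>)> {
    let (head, r) = import.split_once("->")?;
    let (name, p) = head.rsplit_once(':')?;
    let params = p.chars().map(ValType::from_code).collect::<Option<Vec<_>>>()?;
    let ret = if r == "v" {
        None
    } else {
        let mut cs = r.chars();
        let t = cs.next().and_then(ValType::from_code)?;
        if cs.next().is_some() {
            return None; // multi-value results are not covered by this toy scheme
        }
        Some(t)
    };
    Some((name, params, ret))
}
```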

For 4) from the point of view of libstd itself, this is technically the same solution as 1). The benefit is that it provides a more normal interface to the host language, at the cost of increased work to maintain this additional "imports" crate.

@Diggsey

Diggsey commented Jan 17, 2018

Another thing to add is that imports may be fulfilled either via the wasm imports section, or by linking to libraries which export those functions. This means that even if we add imports to libstd, it's possible to generate a wasm file without imports by implementing them in Rust.

@fitzgen

Member

fitzgen commented Jan 17, 2018

It is already very difficult to create small .wasm files, even with wasm32-unknown-unknown, wasm-gc, wasm-opt, and wasm-snip. I am very wary of baking in more code that I do not use and which will likely add more call graph edges to panicking and formatting infrastructure code that I have to go to trouble to remove. This sounds like a different target, if it is making assumptions about capabilities that exist in the host. For the purposes of minimizing code size, having zero assumptions is great.

@Diggsey

Diggsey commented Jan 17, 2018

@fitzgen if you really want to go bare metal, wouldn't it make more sense to go #![no_std]? Libstd is supposed to be precisely the set of things that do require capabilities from the host/os/whatever.

@rpjohnst

rpjohnst commented Jan 18, 2018

The very idea of an -unknown target is that you don't know what your environment provides. The idea of a uniform way for std to talk to the environment is good; putting it in -unknown is bad.

It belongs in another target like -stdweb or -web or -node, depending on how that question gets resolved, just like libc-based targets have -msvc, -gnu, and -musl. (...and -unknown for bare metal!)

@Diggsey

Diggsey commented Jan 18, 2018

@rpjohnst
It doesn't make sense to put it in -stdweb or -web or -node because none of this stuff is specific to one of those targets - it works on any target, hence the -unknown.

I don't think it makes sense on any target to have an implementation of libstd that does nothing - certainly that's not a precedent that has been set with any other targets - we don't implement a stubbed out libstd for bare-metal x86 targets.

Either we should say that if you decide to use libstd on the -unknown-unknown target, then you do have some expectations of your environment, or we should make it a #![no_std]-only target, and add a new target, let's say wasm32-unknown-unknown-rust.

Personally I don't see much value in splitting out the two targets, since one is never going to support libstd, and the other will always behave identically to the first in #![no_std] contexts, but I don't particularly care either way.

@rpjohnst

rpjohnst commented Jan 18, 2018

> It doesn't make sense to put it in -stdweb or -web or -node because none of this stuff is specific to one of those targets - it works on any target, hence the -unknown.

That's fine, call it -rust or -std. That's not the issue.

> Either we should say that if you decide to use libstd on the -unknown-unknown target, then you do have some expectations of your environment, or we should make it a #![no_std]-only target, and add a new target, let's say wasm32-unknown-unknown-rust.

I'm talking about the former. If you use std on an -unknown target, it should fail to build unless the crate graph provides what it needs, not lie about successfully linking when it actually produced an incomplete binary.

This is how everything else works. If you try to build a #![no_std] executable without defining the necessary lang items, the build fails. If you try to build a -sys crate without providing the library, the build fails.

When you treat that incomplete binary as a successful build, you are placing requirements on the environment (just like, say, an installed libc) and must use a target that expresses that.

> the other will always behave identically to the first in #![no_std] contexts

I don't think this is quite enough. If you want to use std without the standard syscall interface, you should be able to use the -unknown target for that. #![no_std] disabling it is insufficient.

This all ignores the portability lint, too. We definitely want to disable parts of std when running on -web, and conflating that with -unknown makes that harder.

@Diggsey

Diggsey commented Jan 18, 2018

> I don't think this is quite enough. If you want to use std without the standard syscall interface, you should be able to use the -unknown target for that. #![no_std] disabling it is insufficient.

That doesn't make sense: how can using #![no_std] not be sufficient? What do you expect using libstd to do if it doesn't have access to any system calls?

> This all ignores the portability lint, too. We definitely want to disable parts of std when running on -web, and conflating that with -unknown makes that harder.

You keep bringing up web but this has nothing to do with web, nothing in the referenced PR or anything I've said is specific to web. I don't expect this syscall interface would even need to be used for web once there is a separate web target.

What this does conflate is a pluggable libstd with no libstd. The rationale being that you can choose between them with the #![no_std] attribute.

In case it's not clear: the whole point of the system call interface is that we don't provide the implementation. The implementation is provided for the specific host, or could even be implemented from rust.

@rpjohnst

rpjohnst commented Jan 18, 2018

> What do you expect using libstd to do if it doesn't have access to any system calls?

In one sense, std can do plenty without any system calls whatsoever; remember, it can run whole test suites as long as you don't care about printing the results.

In the sense I meant, you could have your own host interface (e.g. for fitting into some existing JavaScript library), and you get std running the same way you get #![no_core] binaries or -sys libraries going: by providing it with the necessary functionality before the compiler produces its final output.

I guess you could say I'm arguing for these syscalls to be exposed in Rust for crates to fill in if necessary, rather than forcing them to be wasm imports.

> this has nothing to do with web

You misunderstand me. The web is just an example. I'm not arguing for -unknown not to support std at all. I'm arguing for it not to default to producing wasm modules with dependencies that may not be possible to meet, given that the host is unknown. If you want to use std on -unknown, you declare the imports yourself and hook them up to std yourself (where "yourself" may mean just linking a crate of course).

The reason I brought up the portability lint is that some platforms (including the web, but think of another one if you like) will be missing functionality like, say, the file system. If -unknown starts generating imports for everything you use, and then fails at runtime, that throws out any advantage the portability lint would bring.

> The implementation is provided for the specific host, or could even be implemented from rust.

How could it be provided from Rust if the syscalls are all wasm imports?

@Diggsey

Diggsey commented Jan 18, 2018

> I guess you could say I'm arguing for these syscalls to be exposed in Rust for crates to fill in if necessary, rather than forcing them to be wasm imports.

> How could it be provided from Rust if the syscalls are all wasm imports?

Ah, this discussion makes a lot more sense now: there's no difference between wasm imports and C/Rust imports! When you generate a wasm file, any unresolved extern symbols get added as imports. That means you can satisfy any wasm import by statically linking a C or Rust file that exports the required symbol.

@rpjohnst

rpjohnst commented Jan 18, 2018

That's insufficient. Perhaps another way to phrase this is that most platforms differentiate between static imports, which cause link errors if they're missing, and dynamic imports, which are resolved at runtime. WebAssembly only seems to have one kind of import, which out of necessity gets used as a dynamic import.

When you ask rustc to produce a binary and it claims it did so successfully, but that binary has dynamic (and thus unresolved) imports, it's making an assumption about the environment. This is expected for the -msvc and -gnu targets, which assume a libc exists on the target platform. It is wrong for the -unknown target, because the syscall implementation may not exist on the target platform, and crucially that may be intentional.

So the scenario I'm describing is this: when you pick up a Rust program that uses std and build it for -unknown, you should get link errors, not a successful build with a bunch of imports. Notably, you should also be able to take that program and satisfy those link errors somehow, including by remapping them to dynamic imports that may or may not match the syscall interface.

On the other hand, if you build that program for a new target like -std/-stdweb/-web/-node/-emscripten/etc, I would expect it to produce a wasm binary with imports, often along with the Javascript glue to satisfy them.

@Diggsey

Diggsey commented Jan 18, 2018

> So the scenario I'm describing is this: when you pick up a Rust program that uses std and build it for -unknown, you should get link errors, not a successful build with a bunch of imports.

OK, let's assume that instead of requiring imports from the host, we require one or more symbols to be defined which libstd will import and use.

Today, those two ideas have the same implementation: in the future (if your suggestion is implemented and we must be specific about wasm imports) then they may be different. Is your point simply that we should be explicit about them being C-style imports today rather than wasm imports?

Whichever type of imports we define them to be, we still have to choose from the four options I mentioned above.

@Diggsey

Diggsey commented Jan 18, 2018

Also, there's still no need to have a separate target: even if wasm imports become distinct from C-style imports and we use C-style imports in libstd, then anyone can publish a crate that re-exports the C-style imports as wasm imports.

@rpjohnst

rpjohnst commented Jan 18, 2018

> Is your point simply that we should be explicit about them being C-style imports today rather than wasm imports?

No, my point is about which situations generate those imports. The only way a binary should link successfully is when all its remaining imports are "dynamic"/wasm-style and either a) assumed to be provided by the environment because it's using a full target, or b) the crate has opted into it via an explicit declaration or dependency.

> Also, there's still no need to have a separate target: even if wasm imports become distinct from C-style imports and we use C-style imports in libstd, then anyone can publish a crate that re-exports the C-style imports as wasm imports.

That re-exporting crate would be great to have, but it would need to be opt in. You couldn't just take any random binary crate (say, ripgrep) and build it with cargo build --target=wasm32-unknown-unknown; you'd first have to add that re-exporting crate as a dependency.

The reason for other targets is to let you run cargo build --target=wasm32-unknown-std and have that succeed without modifying the crate, while letting -unknown remain a target with no implicit imports.

@lukewagner

Contributor

lukewagner commented Jan 19, 2018

My inclination here (in both Rust and earlier in symmetric C++ discussions) is that practically nothing is elevated to a special builtin/syscall/runtime level: that anything that requires punching through to an embedding/Web API is expressed as pure Rust code that declares an import and calls it, and then how that import is satisfied is host-specific and happens outside of rustc. On the Web, the import would be satisfied with the export of an ES Module (with the 3 variants of where that ESM comes from enumerated in Lin's diagram).

I could be wrong because of lack of Rust knowledge, but I think this matches what @rpjohnst is advocating as well?

Considering a concrete example, printing to stdout, it seems like there would be two levels of crates here:

  • at the lower level, a broad set of alternative crates, each offering a different choice of where to send stdout, but all implementing the same Rust interface exposed to the higher-level crate:
    • to console.log/console.error
    • to the innerText or innerHTML of a given DOM element, configurable via ESM exports of a .js file defined in this crate and included in the final package (again, according to the process in Lin's diagram)
    • to a WebSocket out to some external debugging process, configurable the same way
  • at the next level up, a crate that implements the common part of a printing library (formatting, etc) that calls down into the lower level crate with the final chars to send.

I don't know enough about Rust/crates to know how, but it seems like the user should be able to choose, for each top-level crate that gets built into wasm and packaged up into npm (again, diagram), which of the lower-level crates to use as the printing backend. Furthermore, I should be able to very easily write my own lower-level crate that sends stdout to who-knows-where in a few lines of Rust that call out to my custom JS and choose that one just as easily. I think this will be a pretty common thing to want to do on the Web for many of the areas of standard library functionality that, on other platforms, have a single obvious impl that we take for granted.
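Here is a rough Rust sketch of the two-level split described above. All names (`OutputBackend`, `BufferBackend`, `Printer`) are hypothetical, and the real console/DOM/WebSocket backends are replaced by an in-memory buffer. The backend is chosen through a generic parameter. That is only one possible mechanism, and the thread does not settle how the choice would actually be made.

```rust
use std::fmt;

/// Lower level: where finished text goes. Each alternative backend crate
/// (console.log, a DOM element, a WebSocket, ...) would implement this.
pub trait OutputBackend {
    fn emit(&mut self, text: &str);
}

/// Stand-in backend that just collects lines in memory.
#[derive(Default)]
pub struct BufferBackend {
    pub lines: Vec<String>,
}

impl OutputBackend for BufferBackend {
    fn emit(&mut self, text: &str) {
        self.lines.push(text.to_owned());
    }
}

/// Upper level: the shared printing logic, generic over the chosen backend.
pub struct Printer<B: OutputBackend> {
    backend: B,
}

impl<B: OutputBackend> Printer<B> {
    pub fn new(backend: B) -> Self {
        Printer { backend }
    }

    /// Do the formatting here, then pass only the final text down.
    pub fn println(&mut self, args: fmt::Arguments<'_>) {
        let text = fmt::format(args);
        self.backend.emit(&text);
    }

    pub fn into_backend(self) -> B {
        self.backend
    }
}
```

Writing a custom backend means implementing one small trait method. The formatting code never changes.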

Sorry if that was incoherent, happy to discuss more :)

@Diggsey

Diggsey commented Jan 19, 2018

> The reason for other targets is to let you run cargo build --target=wasm32-unknown-std and have that succeed without modifying the crate, while letting -unknown remain a target with no implicit imports.

The implementation on -unknown must come from somewhere - surely you'd still have to modify the crate to inject that?

@rpjohnst

rpjohnst commented Jan 19, 2018

I'm not sure what you're asking. Did you mean to write "the implementation on -std must come from somewhere?" If so, std itself would be allowed to include the dynamic/wasm imports in that case, much like it is allowed to call out to libc on -gnu/etc targets. Thus, no modification necessary.

@Diggsey

Diggsey commented Jan 19, 2018

No, I mean -unknown. You said before that you still expected -unknown to be usable without #![no_std].

@rpjohnst

rpjohnst commented Jan 19, 2018

On -unknown without #![no_std], I expect link errors unless you modify the crate to explicitly add wasm imports. The important point is that std-on--unknown not be the one to add them.

@Diggsey

Diggsey commented Jan 19, 2018

I feel like we're talking in circles: as far as libstd goes, that's no different from my original suggestion! The only change is to code generation, and is to make a distinction between wasm imports and C-style imports, which is an important consideration, but quite a separate concern from how libstd is implemented.

You still have to solve the original problem with backwards compatibility that I asked in my first post: which of the four options are you going to go with:

  1. Use a single import, and implement something resembling the system call interface I designed in the above PR.
  2. Use multiple imports, and just never add any new ones after stabilisation.
  3. Require hosts to dynamically generate all required imports.
  4. Use a system-call like interface internally to libstd, and define C imports (rather than wasm imports) in a separate crate, similarly to how we can link in different allocators. Periodically we release a new major version of the "imports" crate which contains any new functions we wanted to import. Old programs can continue linking to older versions of the imports crate.

@aturon

Contributor

aturon commented Feb 8, 2018

I had a chat with @wycats today on this topic, which led to the following proposal as an alternative to today's syscall setup.

```rust
trait WasmHost: Send + 'static {
    fn write_stdout(data: &[u8]) -> io::Result<usize>;
    // etc.
}

static WASM_HOST: RefCell<Option<Box<dyn WasmHost>>> = RefCell::new(None);

// call this in the `start` function
fn set_wasm_host<H: WasmHost>(host: H) {
    let old_host = mem::replace(&mut *WASM_HOST.borrow_mut(), Some(Box::new(host)));
    assert!(old_host.is_none())
}

// Now the `std` implementation can use `WASM_HOST` to dispatch its functionality,
// using `unimplemented!()` as a fallback on `None`
```

This would make it possible for external libraries to provide host bindings, with the caveat that the host must be manually initialized (probably within the start of the wasm module).

While this isn't ideal in the long run, it would make it much easier to experiment out of tree, and would let us build up a more clear-cut picture of the interface between std and the host. There are some connections to the portability proposal I posted yesterday, as well.

Personally, I'd prefer to go this route for now rather than, say, adding a separate target or otherwise trying to nail down shared expectations around a built-in syscall interface. Notably, if you don't set the host, no JS imports are generated. Once we have more experience, we can later revisit the question of standardizing some interface here.

wdyt?

@fitzgen

Member

fitzgen commented Feb 8, 2018

That seems like a good starting point for something we can experiment with now, out of tree.

Nittiest of nitpicks: WasmHost as written isn't object safe because write_stdout doesn't have a &self.
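For reference, here is a minimal sketch of how the proposal could be made to compile on current Rust. It is an adaptation, not the proposal itself. `&self` is added to `write_stdout` as noted above. The global slot becomes a `thread_local!` because a `static RefCell` is not `Sync`. The `None` fallback returns an error instead of calling `unimplemented!()`. The free function `write_stdout` is a hypothetical std-side caller.

```rust
use std::cell::RefCell;
use std::io;

/// The proposed trait with `&self` added, so `Box<dyn WasmHost>` is allowed.
pub trait WasmHost: Send + 'static {
    fn write_stdout(&self, data: &[u8]) -> io::Result<usize>;
    // etc.
}

thread_local! {
    // `RefCell` is not `Sync`, so a plain `static RefCell` is rejected.
    // wasm32-unknown-unknown runs single-threaded, so a thread-local slot
    // is used here as a stand-in for a global.
    static WASM_HOST: RefCell<Option<Box<dyn WasmHost>>> = RefCell::new(None);
}

/// Install the host exactly once, e.g. from the module's `start` function.
pub fn set_wasm_host<H: WasmHost>(host: H) {
    let boxed: Box<dyn WasmHost> = Box::new(host);
    WASM_HOST.with(|slot| {
        let old_host = slot.borrow_mut().replace(boxed);
        assert!(old_host.is_none(), "wasm host was already set");
    });
}

/// What a std-side stdout write could do: dispatch through the host if one
/// is installed, otherwise report an error (the proposal panics here instead).
pub fn write_stdout(data: &[u8]) -> io::Result<usize> {
    WASM_HOST.with(|slot| match slot.borrow().as_ref() {
        Some(host) => host.write_stdout(data),
        None => Err(io::Error::new(io::ErrorKind::Other, "no wasm host installed")),
    })
}
```

An external crate would implement `WasmHost` for its own type and call `set_wasm_host` once at startup. A second call trips the assertion, as in the original.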

@Diggsey

Diggsey commented Feb 8, 2018

@aturon that sounds like a good starting point, but it doesn't (in itself) solve the backwards compatibility issues (i.e. we will want to add/change methods on this trait, even after we stabilise it).

There are a couple of ideas I have to solve this:

  1. Give all methods a default implementation: this is simple and means we can add new methods, but it means we can never change or remove methods.

  2. Never stabilise the trait, but stabilise "versioned forks" of the trait, i.e. when we're somewhat happy that we have a default set, we fork off WasmHostV1. One of the methods on this trait is default-implemented, and performs the conversion to the unstable WasmHost trait object.

Another thing we should do, regardless of which of these we choose, is to make a "no-op" implementation of these traits public, so that it can be deferred to for methods you don't want to implement. We can also use this no-op implementation instead of branching on an Option each time.
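The two versioning ideas could be combined roughly as follows. This is a sketch under assumptions: `WasmHostV1`, `into_host`, `V1Adapter`, `NoopHost`, and the later-added `now_millis` method are all hypothetical names, and only one host method is shown.

```rust
use std::io;

/// Unstable, internal trait: free to gain or change methods over time.
pub trait WasmHost: Send + 'static {
    fn write_stdout(&self, data: &[u8]) -> io::Result<usize>;

    /// Hypothetical method added after V1 was frozen. Its default body keeps
    /// existing implementations compiling (idea 1).
    fn now_millis(&self) -> Option<u64> {
        None
    }
}

/// Frozen, stable "fork" of the trait (idea 2). Hosts implement this one.
pub trait WasmHostV1: Sized + Send + 'static {
    fn write_stdout(&self, data: &[u8]) -> io::Result<usize>;

    /// Default-implemented conversion to the unstable trait object.
    fn into_host(self) -> Box<dyn WasmHost> {
        Box::new(V1Adapter(self))
    }
}

/// Maps the frozen V1 surface onto whatever `WasmHost` currently looks like.
struct V1Adapter<H>(H);

impl<H: WasmHostV1> WasmHost for V1Adapter<H> {
    fn write_stdout(&self, data: &[u8]) -> io::Result<usize> {
        self.0.write_stdout(data)
    }
}

/// Public no-op host: discards output but reports it as written.
pub struct NoopHost;

impl WasmHostV1 for NoopHost {
    fn write_stdout(&self, data: &[u8]) -> io::Result<usize> {
        Ok(data.len())
    }
}
```

Hosts only ever implement the frozen `WasmHostV1`. As a result, std can reshape the unstable `WasmHost` freely as long as the adapter keeps mapping V1 onto it.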

@aturon

Contributor

aturon commented Feb 8, 2018

@Diggsey

> but it doesn't (in itself) solve the backwards compatibility issues (ie. we will want to add/change methods on this trait, even after we stabilise it).

Indeed! That wasn't the goal so much as to:

  • make it easier to experiment with plugging in different "hosts"
  • make it easier to explore the API surface with Rust types
  • avoid any costs if you don't use it

I think stabilization is still a ways off; right now I'm just trying to make experimentation as easy as possible.

We can't use a no-op implementation rather than Option, because doing so would force all wasm modules to include all the pieces needed for that vtable construction. Using Option specifically avoids that issue.

@fitzgen

Member

fitzgen commented Feb 8, 2018

I started writing a comment yesterday but threw it away when I considered the downsides. I'll pull it back out again:

It would be neat if instead of defining all syscalls on a trait, we made the trait return optional capabilities. Something like:

```rust
trait WasmHost: Send + 'static {
    fn stdout(&self) -> Option<&'static RefCell<Box<dyn io::Write>>>;
    // etc.
}
```

This would allow us to add new capabilities with default implementations that return None.

The big downside, and why I binned the comment, is that this would presumably require re-writing a ton of std...
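Here is a minimal sketch of the capability approach. It makes a few adjustments so the code compiles. The returned reference is tied to `&self` instead of `'static`, because a `'static RefCell` would need a non-`Sync` static. The boxed writer is `dyn Write + Send`, so hosts still satisfy the `Send` bound. `Stream`, `SinkHost`, `BareHost`, `print`, and the `stderr` capability are hypothetical.

```rust
use std::cell::RefCell;
use std::io::{self, Write};

/// A shared, mutable output stream.
pub type Stream = RefCell<Box<dyn Write + Send>>;

/// Capability-style host: each accessor returns `None` unless overridden.
pub trait WasmHost: Send + 'static {
    fn stdout(&self) -> Option<&Stream> {
        None
    }
    /// Hypothetical capability added later; existing hosts need no changes.
    fn stderr(&self) -> Option<&Stream> {
        None
    }
}

/// A host that only provides stdout, backed by `io::sink()`.
pub struct SinkHost {
    out: Stream,
}

impl SinkHost {
    pub fn new() -> Self {
        let sink: Box<dyn Write + Send> = Box::new(io::sink());
        SinkHost { out: RefCell::new(sink) }
    }
}

impl WasmHost for SinkHost {
    fn stdout(&self) -> Option<&Stream> {
        Some(&self.out)
    }
}

/// A host with no capabilities at all.
pub struct BareHost;
impl WasmHost for BareHost {}

/// How std-side code might use a capability: check for it, then write.
pub fn print(host: &dyn WasmHost, msg: &str) -> io::Result<()> {
    match host.stdout() {
        Some(stream) => stream.borrow_mut().write_all(msg.as_bytes()),
        None => Err(io::Error::new(io::ErrorKind::Other, "host has no stdout")),
    }
}
```

Adding `stderr` with a `None` default leaves `BareHost` and `SinkHost` untouched. The cost, as noted above, is that std code has to be written against optional capabilities.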

@aidanhs

Contributor

aidanhs commented Feb 8, 2018

I've mentioned it elsewhere (here and linked issues), but I'll note it on this issue too: xargo/cargo sysroots seem like a better solution to this problem than a constantly-changing struct with arguments about what needs changing next (edit: meaning when people want to make a PR for syscall X on the trait but another group wants it implemented in another way).

That said, I recognise that people are looking for a solution yesterday and sysroots are a way off.
