Emscripten & WASI & POSIX #9479

Open
syrusakbary opened this issue Sep 23, 2019 · 26 comments

Comments

@syrusakbary syrusakbary commented Sep 23, 2019

After compiling this simple example with emscripten (using tot-upstream, with em++ issue_577.cpp -s WASM=1 -o issue_577.wasm):

#include <iostream>

int main(int argc, char* argv[]) {
  std::cout << "Does this work?" << std::endl;
  return 0;
}

The WebAssembly file has the following imports:

  (import "env" "__cxa_uncaught_exceptions" (func $env.__cxa_uncaught_exceptions (type $t16)))
  (import "env" "__cxa_atexit" (func $env.__cxa_atexit (type $t1)))
  (import "env" "__syscall6" (func $env.__syscall6 (type $t2)))
  (import "env" "__syscall145" (func $env.__syscall145 (type $t2)))
  (import "env" "__syscall140" (func $env.__syscall140 (type $t2)))
  (import "wasi_unstable" "fd_write" (func $wasi_unstable.fd_write (type $t11)))
  (import "env" "getenv" (func $env.getenv (type $t0)))
  (import "env" "__map_file" (func $env.__map_file (type $t2)))
  (import "env" "__syscall91" (func $env.__syscall91 (type $t2)))
  (import "env" "strftime_l" (func $env.strftime_l (type $t7)))
  (import "env" "__cxa_pure_virtual" (func $env.__cxa_pure_virtual (type $t12)))
  (import "env" "pthread_cond_broadcast" (func $env.pthread_cond_broadcast (type $t0)))
  (import "env" "pthread_cond_wait" (func $env.pthread_cond_wait (type $t2)))
  (import "wasi_unstable" "proc_exit" (func $wasi_unstable.proc_exit (type $t10)))

Note that the wasi_unstable imports are mixed with the Emscripten env ones.

(See attached generated wasm: issue_577.wasm.zip)

This makes things a bit hard for standalone-wasm implementors, as we now have to mix two different ABIs (WASI and Emscripten's POSIX-like one) and make sure they both run properly. This is quite challenging because WASI and Emscripten each use a different data structure in the VM context (in Wasmer, the struct holding the WASI VM data is different from the one holding the Emscripten data).

I think Emscripten should adopt only WASI when all the imports are WASI-like, otherwise use the already existing ABI.
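To make the "all the imports are WASI-like" condition concrete, here is a minimal sketch of how a runtime might classify a module's import list. The names `ImportAbi` and `classify_imports` are hypothetical, and treating every module name that starts with `wasi_` as WASI is an assumption, not something Emscripten or Wasmer defines.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical classification of a module's imports by namespace.
enum class ImportAbi { None, WasiOnly, EnvOnly, Mixed };

using Import = std::pair<std::string, std::string>;  // (module, field)

// Reports which ABI namespaces appear in the import list.
// Assumption: a module name starting with "wasi_" is WASI, and
// everything else belongs to the Emscripten "env"-style ABI.
ImportAbi classify_imports(const std::vector<Import>& imports) {
    bool has_wasi = false;
    bool has_env = false;
    for (const Import& imp : imports) {
        if (imp.first.rfind("wasi_", 0) == 0) {
            has_wasi = true;
        } else {
            has_env = true;
        }
    }
    if (has_wasi && has_env) return ImportAbi::Mixed;
    if (has_wasi) return ImportAbi::WasiOnly;
    if (has_env) return ImportAbi::EnvOnly;
    return ImportAbi::None;
}
```

With the import list above, this would report `Mixed`, which is the case that forces a runtime to wire up both contexts at once.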

Thoughts @kripken?

Collaborator

@sbc100 sbc100 commented Sep 23, 2019

We are trying to move emscripten in the direction of WASI where possible to avoid supporting two different ABIs. This means emscripten's wasm files will start to look more WASI compatible over time. @kripken even recently added a -s STANDALONE_WASM flag which does more of this.

It sounds like you are saying that it would be hard for you to integrate part of the WASI implementation into your emscripten ABI support? But in the long run, if you could do this, wouldn't it actually reduce your code complexity? I admit I haven't looked at your source yet, so I don't fully understand why this would be a problem.

Collaborator

@sbc100 sbc100 commented Sep 23, 2019

Also, what do you think of this -s STANDALONE_WASM option? Would wasmer want to limit its support for emscripten binaries such that it only supports binaries built with this flag? To put it another way: do you want to continue to support arbitrary emscripten binaries found on the web, or can you control the compiler flags of the binaries you want to run?

Supporting only STANDALONE_WASM would simplify your code I imagine.

Member

@kripken kripken commented Sep 23, 2019

Thanks for raising the issue @syrusakbary !

There are multiple things happening here. One is that, regardless of STANDALONE_WASM, emscripten wants to use wasi APIs as much as possible, just so that we're as standards-compliant as we can be and don't have arbitrary odd things. As a result, even without that flag, a hello world program will have some wasi imports like fd_write alongside the env ones.

Over time we hope to remove the other env imports where possible. However, I don't think we'll ever get to zero, since there are things that just only make sense on the Web, that I doubt wasi will support. For example a memory growth API that is JS-friendly seems to have no support.

But another issue is that we don't think it's practical to have two modes, 100% wasi and 100% non-wasi, because wasi is still a work in progress, as is our support for it. In the meantime, custom embedders may use wasi support but also add extra support for non-wasi stuff. I don't see a way around that (even if we assume wasi will eventually have graphics and audio etc., it will take many years).

We do have the EMIT_EMSCRIPTEN_METADATA option, which indicates which ABI the wasm binary uses. As we support more wasi, the ABI will be changing gradually, and we'll update the version there.

That's the big picture from our perspective. I'd like to understand more what specifically is inconvenient for wasmer, and work to find a solution.

Author

@syrusakbary syrusakbary commented Sep 23, 2019

Wasmer's Emscripten implementation and its WASI implementation don't share the same context data.
The reason is that they handle the filesystem differently:

  • Wasmer Emscripten relays the syscalls to the OS (with a bit of wrapping)
  • Wasmer WASI makes all the filesystem calls safely sandboxed.

Because of that, it's hard to use Emscripten and WASI together for the same underlying filesystem.

Why is it hard?

Based on the example that I provided before, the set of imports is:

  (import "env" "__cxa_uncaught_exceptions" (func $env.__cxa_uncaught_exceptions (type $t16)))
  (import "env" "__cxa_atexit" (func $env.__cxa_atexit (type $t1)))
  (import "env" "__syscall6" (func $env.__syscall6 (type $t2)))
  (import "env" "__syscall145" (func $env.__syscall145 (type $t2)))
  (import "env" "__syscall140" (func $env.__syscall140 (type $t2)))
  (import "wasi_unstable" "fd_write" (func $wasi_unstable.fd_write (type $t11)))
  (import "env" "getenv" (func $env.getenv (type $t0)))
  (import "env" "__map_file" (func $env.__map_file (type $t2)))
  (import "env" "__syscall91" (func $env.__syscall91 (type $t2)))
  (import "env" "strftime_l" (func $env.strftime_l (type $t7)))
  (import "env" "__cxa_pure_virtual" (func $env.__cxa_pure_virtual (type $t12)))
  (import "env" "pthread_cond_broadcast" (func $env.pthread_cond_broadcast (type $t0)))
  (import "env" "pthread_cond_wait" (func $env.pthread_cond_wait (type $t2)))
  (import "wasi_unstable" "proc_exit" (func $wasi_unstable.proc_exit (type $t10)))

But let's analyze them in more detail:

Cpp calls (are separated from fs) 👍:

  # This is a Cpp method 👍
  (import "env" "__cxa_uncaught_exceptions" (func $env.__cxa_uncaught_exceptions (type $t16)))
  # This is a Cpp method 👍
  (import "env" "__cxa_atexit" (func $env.__cxa_atexit (type $t1)))
  # This is a Cpp method 👍
  (import "env" "__cxa_pure_virtual" (func $env.__cxa_pure_virtual (type $t12)))

Filesystem calls (non-wasi) ❗️

  # This is a call to close a file ❗️(it should be WASI fd_close)
  (import "env" "__syscall6" (func $env.__syscall6 (type $t2)))
  # This is a call to read a file descriptor ❗️(it should be WASI fd_read)
  (import "env" "__syscall145" (func $env.__syscall145 (type $t2)))
  # This is a call to seek a file descriptor ❗️(it should be WASI fd_seek)
  (import "env" "__syscall140" (func $env.__syscall140 (type $t2)))

WASI filesystem calls

  # This is a WASI call  👍
  (import "wasi_unstable" "fd_write" (func $wasi_unstable.fd_write (type $t11)))

Other calls:

  # This could be a WASI call, but doesn't interact with the fs so it's also ok to not use WASI  👍
  (import "env" "getenv" (func $env.getenv (type $t0)))
  # This map_file can be a bit challenging
  (import "env" "__map_file" (func $env.__map_file (type $t2)))
  # This is munmap, the counterpart of __map_file, so it can be a bit challenging too
  (import "env" "__syscall91" (func $env.__syscall91 (type $t2)))
  # This could be a WASI call, but doesn't interact with the fs so it's also ok to not use WASI  👍
  (import "env" "strftime_l" (func $env.strftime_l (type $t7)))
  # This is a non-filesystem method 👍
  (import "env" "pthread_cond_broadcast" (func $env.pthread_cond_broadcast (type $t0)))
  # This is a non-filesystem method 👍
  (import "env" "pthread_cond_wait" (func $env.pthread_cond_wait (type $t2)))
  # This is a WASI call 👍
  (import "wasi_unstable" "proc_exit" (func $wasi_unstable.proc_exit (type $t10)))
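The filesystem part of that breakdown can be written as a small lookup. This is only a sketch covering the three syscall numbers discussed above; the function name `wasi_equivalent` is hypothetical. Note that WASI's vectored read is called `fd_read` (it takes an iovec array), so that is the name used here for syscall 145.

```cpp
#include <string>

// Returns the WASI function that covers an Emscripten `env.__syscallN`
// import, or an empty string when this sketch knows of no counterpart.
// Only the three numbers from the listing above are included.
std::string wasi_equivalent(int syscall_number) {
    switch (syscall_number) {
        case 6:   return "fd_close";  // close
        case 145: return "fd_read";   // readv; fd_read accepts an iovec array
        case 140: return "fd_seek";   // _llseek
        default:  return "";
    }
}
```

A runtime could use a table like this to report which `env` filesystem imports still block it from routing everything through its WASI sandbox.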

Ideal solution

Right now, the filesystem calls are intermixed between WASI (wasi_unstable.fd_write) and Emscripten (env.__syscall6, env.__syscall145, env.__syscall140), and this makes the implementation tightly coupled to both.

Our WASI implementation relies on different sandboxed filesystem descriptors that can't be reused in the normal Emscripten context.

It would be awesome if, for the filesystem, we could rely completely on WASI (so it can be easily decoupled from the other Emscripten syscalls).
Perhaps it would be easier to use either all or none, as an incremental approach will be much more challenging on the standalone WASM runtimes' side.

Adding also @AndrewScheidecker to the thread as he might have some extra input / ideas

Collaborator

@sbc100 sbc100 commented Sep 24, 2019

The plan is to move as many of the emscripten syscalls as possible to WASI syscalls. So the ones you mention will be moving to WASI very soon, I imagine. fd_write was chosen as the first one to transition since it's what you need for hello world.

However, there will inevitably be syscalls in emscripten that don't map to WASI syscalls. This might well include filesystem syscalls that take the same file descriptors as the WASI syscalls. I imagine we will get to a place where the emscripten syscalls represent a superset of the WASI syscalls.

For the time being, since you have two different sandbox models, I imagine you will need to continue to maintain two different versions of the WASI syscalls (i.e. fd_write for emscripten will not be the same function as fd_write for WASI). However, in the long term I would hope that both WASI and emscripten runtimes would use the same sandboxed implementation. Is there any reason you wouldn't want the same filesystem sandbox when running emscripten-built binaries?

Member

@kripken kripken commented Sep 24, 2019

Yes, as @sbc100 said, we intend to fix many of those soon. E.g. __cxa_uncaught_exceptions is a trivial fix to just move some JS into C. Full filesystem support may take a while, though, I'm not sure offhand how easy that will be.

I do understand that the intermediate state may be harder to support for your embedding. But we have to move incrementally in Emscripten. One option might be to say that wasmer doesn't support Emscripten versions in that intermediate state - so wasmer would support older versions, and eventually new-enough versions once that work is done, but not versions X-Y in the middle.

But also as @sbc100 said, I expect we'll always be a superset of wasi in some form or other. Hopefully not for filesystem I/O, but definitely for other stuff (graphics, audio, etc.).

Author

@syrusakbary syrusakbary commented Sep 24, 2019

> Is there any reason you wouldn't want the same filesystem sandbox when running emscripten-built binaries?

We would love to have the same sandbox in Emscripten. But it's just quite hard if the filesystem calls are intermixed between emscripten and WASI (meaning: it's hard if we use wasi_unstable.fd_write, env.__syscall6, env.__syscall145, env.__syscall140 all at the same time).
It will be easier if all the filesystem calls are handled with WASI (as opposed to only fd_write).

> I imagine we will get to a place where the emscripten syscalls represent a superset of the WASI syscalls.

I think that would be awesome ❤️

> But also as @sbc100 said, I expect we'll always be a superset of wasi in some form or other. Hopefully not for filesystem I/O, but definitely for other stuff (graphics, audio, etc.).

Yeah, I think that's a good approach. I just wish all of WASI (regarding the fs) was implemented at once so we could reuse the WASI sandboxed code that we have :)

Collaborator

@sbc100 sbc100 commented Sep 25, 2019

@syrusakbary what do you think about the STANDALONE_WASM mode? Do you want to be able to run arbitrary emscripten binaries or would you be ok requiring they be built with STANDALONE_WASM ? (i.e. do you build all your binaries yourself or do you want to run emscripten binaries from the web?)

@AndrewScheidecker AndrewScheidecker commented Sep 25, 2019

I think it's a good move to use the WASI ABI for as much of the Emscripten functionality as possible. The ABI changes are a short-term burden for WAVM, but in the long-term it will be much less of a burden if the WASI and Emscripten environments can share code.

> But also as @sbc100 said, I expect we'll always be a superset of wasi in some form or other. Hopefully not for filesystem I/O, but definitely for other stuff (graphics, audio, etc.).

Don't rule out adding graphics and audio APIs to WASI! I would implement them in WAVM if they existed.

> @syrusakbary what do you think about the STANDALONE_WASM mode? Do you want to be able to run arbitrary emscripten binaries or would you be ok requiring they be built with STANDALONE_WASM ? (i.e. do you build all your binaries yourself or do you want to run emscripten binaries from the web?)

I think it's ok to have a flag to produce binaries that work in non-browser environments. It needs to be possible to unambiguously detect a binary compiled without it and produce a nice error message, though.

@MarkMcCaskey MarkMcCaskey commented Sep 25, 2019

> Don't rule out adding graphics and audio APIs to WASI! I would implement them in WAVM if they existed.

That's true, I'm also very excited about these in WASI, but I think/hope WASI won't have the same graphics APIs that Emscripten has. I spent some time experimenting with the SDL and OpenGL stuff and got it kind of working in Wasmer, but it had some serious issues. The two most prominent are the main loop and the security. The main loop is inverted and works with callbacks which isn't that natural outside the web or JS. Securely executing OpenGL is really tricky and we have to care about and handle the differences between OpenGL, OpenGL ES, and WebGL. There's been some discussion about using WebGPU in WASI, which apparently solves some or all of these problems.
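The "inverted main loop" point can be illustrated with a small sketch. Everything here is hypothetical (`run_blocking`, `HostLoop`, and its methods are illustration names, not Emscripten or Wasmer APIs); it only contrasts a program that owns its loop with one that hands a per-frame callback to the host.

```cpp
#include <functional>
#include <utility>

// Native style: the program owns the loop and returns only when it is done.
int run_blocking(int frames, const std::function<void(int)>& render) {
    int i = 0;
    for (; i < frames; ++i) render(i);
    return i;
}

// Web style: the program registers a per-frame callback and returns;
// the host (a browser, or a runtime emulating one) decides when to call it.
class HostLoop {
public:
    void set_frame_callback(std::function<bool()> cb) { callback_ = std::move(cb); }

    // Runs at most max_frames frames and stops early once the callback
    // returns false. Returns the number of frames actually run.
    int drive(int max_frames) {
        int ran = 0;
        while (callback_ && ran < max_frames) {
            ++ran;
            if (!callback_()) break;
        }
        return ran;
    }

private:
    std::function<bool()> callback_;
};
```

In the second style the guest never blocks, so a standalone runtime has to supply the driving loop itself, which is the part that feels unnatural outside the web or JS.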

To the general topic: it's tricky. Being able to execute Wasm from the web directly is really neat. However, there are already some issues with this because of versioning and how often Emscripten changes. Emscripten compiles to a complete working solution because it can generate the relevant JS and Wasm that work together. The issue is that this makes supporting arbitrary Emscripten Wasm from the web non-trivial, because it's a moving target. I'm not sure if Emscripten stores its version info in the Wasm anywhere, but we're currently not detecting it or using it at Wasmer, we just vaguely target the latest version.

If Emscripten can move all its FS calls into WASI eventually, that would be a good change in my opinion. However, if it can't, then things may start to get really complex. In the future WASI fds will be opaque references, so Emscripten's will have to be too, or you'll need extra layers of abstraction to keep track of the relationship between Emscripten file handles and WASI file handles, and you'll have to sync metadata between them. Any calls outside the WASI ones need to interact with the sandbox appropriately.

I think this bad-case scenario of partial migration will introduce the complexity primarily on the Emscripten compiler side, which will need to be reimplemented on the host side.

Collaborator

@sbc100 sbc100 commented Sep 25, 2019

Specifically regarding WASI using reference types: if/when that happens, both emscripten libc and wasi-libc will need to convert between ref types and integer fds anyway, because libc is based on file descriptors. I don't see a problem doing this. If anything, I see an opportunity to one day share libc code between wasi-libc and emscripten's libc.
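The conversion between integer fds and opaque references could look roughly like the table below. This is a sketch under assumptions: `OpaqueHandle` and `FdTable` are hypothetical names, and the handle is modeled as a plain integer only so the example stays self-contained (a real reference type would not be an integer).

```cpp
#include <cstdint>
#include <unordered_map>

// Stand-in for an opaque host reference.
struct OpaqueHandle {
    std::uint64_t id;
};

// Maps libc-style integer file descriptors to opaque handles.
class FdTable {
public:
    // Stores the handle under the lowest unused fd, as POSIX open() does.
    int insert(OpaqueHandle handle) {
        int fd = 0;
        while (table_.count(fd) != 0) ++fd;
        table_.emplace(fd, handle);
        return fd;
    }

    // Returns the handle for fd, or nullptr if fd is not open.
    const OpaqueHandle* lookup(int fd) const {
        auto it = table_.find(fd);
        return it == table_.end() ? nullptr : &it->second;
    }

    // Forgets fd; returns false if it was not open.
    bool close(int fd) { return table_.erase(fd) == 1; }

private:
    std::unordered_map<int, OpaqueHandle> table_;
};
```

Keeping the table inside libc means the code above it still sees ordinary integer fds, whichever ABI sits underneath.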

Member

@kripken kripken commented Sep 25, 2019

@MarkMcCaskey

> I'm not sure if Emscripten stores its version info in the Wasm anywhere, but we're currently not detecting it or using it at Wasmer, we just vaguely target the latest version.

We do have an option (EMIT_EMSCRIPTEN_METADATA) to emit such metadata, but it's not on by default. But maybe wasmer could say that it only runs ones with the metadata?
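A runtime that wants to require such metadata could scan the binary for a custom section before running it. The sketch below walks the standard wasm section layout (8-byte header, then id, LEB128 size, payload; custom sections have id 0 and start with a length-prefixed name). The helper names are hypothetical, and the exact section name Emscripten emits is not stated in this thread, so any name passed in is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reads an unsigned LEB128 value at pos and advances pos.
// Returns false on truncated or over-long input.
bool read_leb_u32(const std::vector<std::uint8_t>& b, std::size_t& pos, std::uint32_t& out) {
    out = 0;
    for (int shift = 0; shift < 35; shift += 7) {
        if (pos >= b.size()) return false;
        std::uint8_t byte = b[pos++];
        out |= static_cast<std::uint32_t>(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) return true;
    }
    return false;
}

// Returns true if the module has a custom section (id 0) with this name.
bool has_custom_section(const std::vector<std::uint8_t>& wasm, const std::string& name) {
    static const std::uint8_t header[8] = {0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00};
    if (wasm.size() < 8) return false;
    for (std::size_t i = 0; i < 8; ++i) {
        if (wasm[i] != header[i]) return false;  // not "\0asm" version 1
    }
    std::size_t pos = 8;
    while (pos < wasm.size()) {
        std::uint8_t id = wasm[pos++];
        std::uint32_t size = 0;
        if (!read_leb_u32(wasm, pos, size) || size > wasm.size() - pos) return false;
        std::size_t end = pos + size;
        if (id == 0) {
            std::size_t p = pos;
            std::uint32_t len = 0;
            if (read_leb_u32(wasm, p, len) && p <= end && len <= end - p) {
                std::string found(wasm.begin() + static_cast<std::ptrdiff_t>(p),
                                  wasm.begin() + static_cast<std::ptrdiff_t>(p + len));
                if (found == name) return true;
            }
        }
        pos = end;  // skip to the next section
    }
    return false;
}
```

A call such as `has_custom_section(bytes, "emscripten_metadata")` would then gate execution, assuming that is the section name the option emits.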

> I think this bad-case scenario of partial migration will introduce the complexity primarily on the Emscripten compiler side, which will need to be reimplemented on the host side.

Yeah, we definitely don't plan to do a partial migration - the goal is to have something simple at the end, hopefully just using wasi for filesystem stuff. However, there is some chance of encountering problems with using the wasi API.

@kripken kripken changed the title from "Emscripten & WASI" to "Emscripten & WASI & POSIX" Oct 16, 2019
Member

@kripken kripken commented Oct 16, 2019

After doing more work here, I realized that Emscripten switching to 100% wasi for our ABI would preclude full POSIX support.

One example: in NODERAWFS mode we literally propagate file operations to node's FS API directly. For example, if we ask to create a file with mode 0644, then node will do that for us. However, the wasi API doesn't have POSIX-like owner/group/world levels of permissions. So this just wouldn't work, and would be a regression for us, unless I'm missing something.

Likewise, there are plenty of open() etc. flags that POSIX supports which wasi doesn't, so this is probably not limited to the file mode.

I don't think we want to regress this, as it's useful to compile to js+wasm and run in node with full POSIX powers!

As a result, I think we may want to think about something like this:

  • Use wasi APIs where there is no downside in size or capabilities. That's hopefully most of them.
  • Keep POSIX-like APIs for the remainder, allowing us to preserve POSIX support. In particular we'd keep the musl open syscall, or maybe an emscripten_open.
  • Add a PURE_WASI (or some other name) flag which uses WASI even for things with downsides. In this mode we would lose our full POSIX support and may regress code size. It would be simplest if in this mode we only ran in wasi VMs, so that this mode would be implemented only on the C side, translating POSIX APIs into wasi ones - as if we want to allow running this wasm in JS as well, we'd need to translate the WASI APIs back into POSIX on the JS side (when really it should be recompiled in normal mode).
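The three bullets above amount to a per-API decision, which can be sketched as a small function. This is one reading of the proposal, not an Emscripten implementation: `Backend`, `ApiTraits`, and `choose_backend` are hypothetical names, `pure_wasi` models the proposed PURE_WASI flag, and treating an API with no WASI counterpart as unsupported in that mode is an assumption.

```cpp
// Which ABI an operation is routed through.
enum class Backend { Wasi, Posix, Unsupported };

struct ApiTraits {
    bool has_wasi_equivalent;  // WASI offers a call for this operation
    bool wasi_has_downside;    // using it would cost size or capabilities
};

// Picks a backend for one API according to the three bullets above.
Backend choose_backend(const ApiTraits& api, bool pure_wasi) {
    if (!api.has_wasi_equivalent) {
        // Assumption: pure mode gives up on what WASI cannot express.
        return pure_wasi ? Backend::Unsupported : Backend::Posix;
    }
    if (!api.wasi_has_downside) return Backend::Wasi;   // first bullet
    return pure_wasi ? Backend::Wasi : Backend::Posix;  // third vs. second bullet
}
```

Under this reading, a plain `fd_write` lands on WASI in both modes, while an `open` with a POSIX mode stays on the POSIX-like path unless the flag is set.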

In practice, I think the majority of programs doing pure computation, and maybe some printf logging, would not need PURE_WASI. The flag would only be needed when doing more general operations on files, where wasi and POSIX diverge.

Maybe there's a better solution that I'm missing, though?

Collaborator

@sbc100 sbc100 commented Oct 16, 2019

Your approach sounds reasonable.

My only question would be: what is the value of the PURE_WASI mode? It seems that any reasonably sized app that targets the web would not be able to use it anyway, and small or non-web codebases might as well use wasi-sdk instead. It seems like the PURE_WASI mode, while kind of cool, might not be very useful since most of the customers for that would probably be better off with wasi-sdk.

Member

@kripken kripken commented Oct 16, 2019

One advantage would be that users with a web port don't need a new toolchain to get a wasi build, they can just flip a flag. Another advantage is that Emscripten could support wasi + other stuff, like say OpenGL, which is not in the wasi SDK. That might make sense in say a game engine plugin, if their runtime already has wasi support for printing and files.

But, yeah, I wish we could do better here - there would be more value for users if Emscripten emitted wasi by default. But abandoning POSIX support doesn't seem worth that. Curious if others feel otherwise though!

Collaborator

@sbc100 sbc100 commented Oct 16, 2019

I'm not totally against a PURE_WASI option if there are users out there who would want it.

In terms of giving up POSIX compatibility and/or accepting regressions in size and/or performance, I agree that we should not sacrifice those things for WASI compatibility. We could make compromises here and there of course if the loss is negligible, and we can continue to push the WASI standard in places where we think it makes sense (as you have started to do already).

Author

@syrusakbary syrusakbary commented Oct 18, 2019

Thanks for the updates and for keeping us in the loop!

> And small or non-web codebases might as well use wasi-sdk instead. It seems like the PURE_WASI mode, while kind of cool, might not be very useful since most of the customers for that would probably be better off with wasi-sdk.

Agreed, we are now investigating more ways to compile projects easily to WASI. Perhaps it will be tricky for Emscripten to adopt WASI fully, given the possible regressions in size/performance. You probably have much more context than I do on the feasibility of this :)

Member

@kripken kripken commented Oct 18, 2019

@syrusakbary

What do you think about the POSIX issue mentioned earlier? I'm curious if wasmer is interested to run applications that use more POSIX than WASI can support.

As a concrete example, you can't implement the command-line tool ls in WASI, AFAIK, because WASI can't represent the owner/group/world permission levels (it has a simpler system). That is, the rwx etc. stuff here:

$ ls -al foo
-rwxrwxr-x 1 alon alon 117312 Oct 17 19:10 foo
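For reference, the `rwx` column is derived from the nine owner/group/world bits of the POSIX file mode. A minimal sketch of that formatting (the function name `permission_string` is hypothetical, and the file-type bits are ignored, so the leading character is always `-`):

```cpp
#include <string>

// Formats the low nine permission bits of a POSIX mode the way ls does,
// with a leading '-' as for a regular file.
std::string permission_string(unsigned mode) {
    const char symbols[] = "rwxrwxrwx";
    std::string out = "-";
    for (int i = 0; i < 9; ++i) {
        unsigned bit = 1u << (8 - i);  // owner r is bit 8, world x is bit 0
        out += (mode & bit) ? symbols[i] : '-';
    }
    return out;
}
```

Here `permission_string(0775)` yields `-rwxrwxr-x`, matching the listing above. These nine bits are the information an ABI needs to carry for `ls -l` to print a faithful permissions column.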

Similarly, you can't port something like Python 100% in WASI, because people can use it to look at those permissions.

That stuff does work in Emscripten's POSIX support currently. How important do you think it is?

Member

@kripken kripken commented Oct 18, 2019

@AndrewScheidecker

I'd be curious to hear your thoughts on how important POSIX support is in WAVM, specifically POSIX stuff that doesn't fit into WASI, see the above example.

@AndrewScheidecker AndrewScheidecker commented Oct 18, 2019

I do want to support as much of POSIX as I can in WAVM, one way or another. That would ideally be through some standardized ABI like WASI. I don't see why WASI wouldn't eventually support all of POSIX.

Member

@kripken kripken commented Oct 21, 2019

Thanks @AndrewScheidecker! Good to know.

Author

@syrusakbary syrusakbary commented Oct 21, 2019

> What do you think about the POSIX issue mentioned earlier? I'm curious if wasmer is interested to run applications that use more POSIX than WASI can support.

Yeah, I think WASI would eventually support all of POSIX. Or, at least, that's where we would like to move towards :)

> As a concrete example, you can't implement the commandline tool ls in WASI

We actually got ls working perfectly with WASI (you can install coreutils from wapm). It's true that the concept of groups/owner doesn't exist... but it's easy to alleviate!

Member

@kripken kripken commented Oct 21, 2019

> It's true that the concept of groups/owner doesn't exist... but it's easy to alleviate!

Oh, interesting! How did you do it?

Author

@syrusakbary syrusakbary commented Oct 21, 2019

Member

@kripken kripken commented Oct 21, 2019

Oh, but doesn't that commit just skip the POSIX stuff that WASI can't do? Or did I misunderstand it?

@appcypher appcypher commented Oct 23, 2019

> Add a PURE_WASI (or some other name) flag which uses WASI even for things with downsides.

I think this would be a preferable approach IMHO.
