Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete basic support for wasm32-wasip2 #3607

Open
RalfJung opened this issue May 16, 2024 · 9 comments
Open

Complete basic support for wasm32-wasip2 #3607

RalfJung opened this issue May 16, 2024 · 9 comments
Labels
A-shims Area: This affects the external function shims C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement

Comments

@RalfJung
Copy link
Member

RalfJung commented May 16, 2024

The goal is to get this command to pass:

./miri test --target wasm32-wasip2 empty_main integer vec string btreemap hello hashmap heap_alloc align

I think all that's missing here is

  • support for printing to stdout/stderr
  • support for getting randomness

(Also, we're looking for a target maintainer for wasm, so please let us know if you're up to that -- basically someone we'd ping when there are wasm questions, and in case std starts using a new wasm API we'd hope they would be able to provide a Miri implementation.)

@RalfJung RalfJung added C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement A-shims Area: This affects the external function shims E-good-first-issue A good way to start contributing, mentoring is available labels May 16, 2024
@RossSmyth
Copy link
Contributor

Looking at this last night, it seems that the FFI library basically casts everything to i32 before passing. But the specified signatures use proper types like pointers & such.

For example for the random_get function the specified signature is:

(@interface func (export "random_get")
  ;;; The buffer to fill with random data.
  (param $buf (@witx pointer u8))
  (param $buf_len $size)
  (result $error (expected (error $errno)))
)

So essentially random_get(buf: const* u8, buf_len: usize) -> errno

But the FFI wrapper uses random_get(arg0: i32, arg1: i32) -> i32

Essentially this program is how I see it:
https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=d17cb441fcafbf87168fe879375b52b0

So the question is, should we get an exposed pointer from the address passed? There is no reason why the implementer of the FFI function would call from_exposed_addr on the buf parameter based upon the specification. But if we don't, essentially every function will fail because we will read just a raw addr without any provenance.

This is definitely a bug in the FFI bindings. I remember this pattern used to be semi-pervasive before the concept of provenance was popularized in the Rust community. Reading around the WASI github, they are applying the WASM machine implementation to the Rust FFI bindings. I see a few times saying that they are only reflecting the reality that WASM is just linear memory in the signatures.

It seems that in the latest version this is not done though thankfully. The specified signature matches the FFI decl signatures.

Another interesting note is in the specification of the latest version wrt to miri:

Deterministic environments must omit this function, rather than implementing it with deterministic data.

Specifically for the secure random

@RalfJung
Copy link
Member Author

RalfJung commented May 18, 2024

Looking at this last night, it seems that the FFI library basically casts everything to i32 before passing. But the specified signatures use proper types like pointers & such.

That's unfortunate, makes writing wrappers in Miri a lot more annoying and users will get warning about the int2ptr casts.

So the question is, should we get an exposed pointer from the address passed? There is no reason why the implementer of the FFI function would call from_exposed_addr on the buf parameter based upon the specification.

That is exactly what the implementor must be doing, if it is implemented in Rust.

Essentially this program is how I see it:
https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=d17cb441fcafbf87168fe879375b52b0

Is the implementation written in Rust or wasm? If it is written in Rust, then this looks like UB to me, at least if there's any chance of the declaration and implementation being part of the same Rust Abstract Machine (e.g., any kind of link-time optimizations).

If it is implemented in wasm, then linking happens on the wasm level and there's no more provenance, so this would work. It's still not a good choice IMO.

they are applying the WASM machine implementation to the Rust FFI bindings. I see a few times saying that they are only reflecting the reality that WASM is just linear memory in the signatures.

That makes little sense. x86 memory is also linear and we don't write our FFI bindings like that for x86 either. When you're writing Rust code, why not use Rust types? They're useful!


Another interesting note is in the specification of the latest version wrt to miri:

Deterministic environments must omit this function, rather than implementing it with deterministic data.

Specifically for the secure random

We're interpreting Rust code, not wasm code, so I don't feel particularly bound to whatever the wasm standard says about wasm runtimes.

@RossSmyth
Copy link
Contributor

That is exactly what the implementor must be doing, if it is implemented in Rust.

Only if you generate the bindings with that specific version of their tool. If you use a newer version of their tool, or write the signatures yourself based upon the spec then you will not have the pervasive integers, and end up in the situation of what the playground example shows. Which is why I ask if we just reflect the reality of what Rust's stdlib links today, or if we reflect the spec of the functions.

In reality I do not know if anyone regenerates bindings or writes the declarations themselves.

Is the implementation written in Rust or wasm?

They seem to be built-in functions of the WASI environment, so WASM.

Looking into it further I did have concerns about LTO, but it seems that the C API uses Hungarian notation so there isn't risk with LTO.

@bjorn3
Copy link
Member

bjorn3 commented May 18, 2024

but it seems that the C API uses Hungarian notation so there isn't risk with LTO.

Even if the C code used the same name it still wouldn't conflict as the wasi imports are specifically declared as imported from the wasi_snapshot_preview1 wasm module. Rustc will keep the symbol mangled like it would for non-foreign functions and instead attach some metadata to tell the linker to import it from wasi_snapshot_preview1 after linking.

@RossSmyth
Copy link
Contributor

RossSmyth commented May 19, 2024

Oh, yeah. Here's exactly what I was talking about
https://github.com/bytecodealliance/wasmtime/blob/a77e87732e4e94ab6a2cee65bd5d74670bafe4ad/crates/wasi-preview1-component-adapter/src/lib.rs#L1528-L1533

Their Rust implementation uses pointer arguments.

@RalfJung
Copy link
Member Author

Only if you generate the bindings with that specific version of their tool. If you use a newer version of their tool, or write the signatures yourself based upon the spec then you will not have the pervasive integers

So... do we just need to update std to use bindings generated with a newer version then?

@RalfJung
Copy link
Member Author

RalfJung commented May 19, 2024

Looks like std uses wasi crate version 0.11. In 0.12 the API changed (both the public API of the crate and the underlying wasi APIs it uses) but internally it is still using i32 for the pointer type. But in 0.13 it says *mut i32.

It probably doesn't make a ton of sense to implement the old preview1 functions in Miri. So ideally std would be updated to wasi 0.13 and then we can implement the preview2 (?) APIs, and we can expect the argument to have provenance as it should.

@RalfJung RalfJung changed the title Complete basic support for wasm32-wasi Complete basic support for wasm32-wasip2 May 19, 2024
@RalfJung
Copy link
Member Author

Even with wasm32-wasip2, std calls the preview 1 functions. Strange.

@RalfJung RalfJung removed the E-good-first-issue A good way to start contributing, mentoring is available label May 19, 2024
@RalfJung
Copy link
Member Author

Based on the Zulip discussion, it seems like updating std to use the preview-2 APIs may take a while.

So we can either wait, or implement the APIs with internal int2ptr casts in the mean time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-shims Area: This affects the external function shims C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement
Projects
None yet
Development

No branches or pull requests

3 participants