Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an unsafe method Uint8Array::view_mut_raw(ptr: *mut u8, length: usize) so that JS can efficiently initialize a wasm buffer #1643

Open
mstange opened this issue Jul 5, 2019 · 13 comments
Labels
good first issue This is a good issue for people who have never contributed to wasm-bindgen before help wanted We could use some help fixing this issue! js-sys Issues related to the `js-sys` crate

Comments

@mstange
Copy link
Contributor

mstange commented Jul 5, 2019

Motivation

I am using the following pattern to efficiently initialize a large buffer inside wasm memory from JavaScript without copies (see #1079 for full motivation):

#[wasm_bindgen]
pub struct WasmMemBuffer {
    buffer: Vec<u8>,
}

#[wasm_bindgen]
impl WasmMemBuffer {
    #[wasm_bindgen(constructor)]
    pub fn new(byte_length: u32, f: &js_sys::Function) -> Self {
        let buffer = vec![0; byte_length as usize];
        unsafe {
            let array = js_sys::Uint8Array::view(&buffer);
            f.call1(&JsValue::NULL, &JsValue::from(array))
                .expect("The callback function should not throw");
        }
        Self { buffer }
    }
}

fn compute_buffer_hash_impl(data: &[u8]) -> u32 { ... }

#[wasm_bindgen]
pub fn compute_buffer_hash(data: &WasmMemBuffer) -> u32 {
    compute_buffer_hash_impl(&data.buffer)
}
let buffer = new WasmMemBuffer(100000, array => {
  // "array" wraps a piece of wasm memory. Fill it with some values.
  for (let i = 0; i < array.length; i++) {
    array[i] = Math.floor(Math.random() * 256);
  }
});
let hash = compute_buffer_hash(buffer); // No buffer copy when passing this argument. Yay!
buffer.free();
console.log(hash);

There are two problems with this:

  1. There's a mutation here that the rust compiler doesn't know about. Uint8Array::view takes an immutable slice ref, but then I go ahead and mutate the contents. Lying to the compiler is dangerous.
  2. The vec is zero-initialized before its contents are overwritten, which wastes almost 400ms in my use case: https://perfht.ml/2K0rMKw

If I wanted to eliminate the zero initialization, I could do that using reserve + set_len on the vec. However, rust forbids wrapping uninitialized values in a slice - you have to use raw pointers when initializing uninitialized memory. But the current signature of Uint8Array::view requires me to make a slice.

Proposed Solution

I propose adding a new method to Uint8Array (and maybe to the other typed array variants):

pub unsafe fn view_mut_raw(ptr: *mut u8, length: usize)

This would let me change my rust code to the following:

#[wasm_bindgen]
pub struct WasmMemBuffer {
    buffer: Vec<u8>,
}

#[wasm_bindgen]
impl WasmMemBuffer {
    #[wasm_bindgen(constructor)]
    pub fn new(byte_length: u32, f: &js_sys::Function) -> Self {
        let mut buffer: Vec<u8> = Vec::new();
        buffer.reserve(byte_length as usize);
        unsafe {
            let array =
                js_sys::Uint8Array::view_mut_raw(buffer.as_mut_ptr(),
                                                 byte_length as usize);
            f.call1(&JsValue::NULL, &JsValue::from(array))
                .expect("The callback function should not throw");
            buffer.set_len(byte_length as usize);
        }
        Self { buffer }
    }
}

It solves the problems: We've eliminated the zero initialization, we're not constructing a slice around uninitialized memory, and we're not lying to the compiler about the mutation anymore.

Alternatives

I'm not aware of any obvious alternatives.

@mstange mstange added the enhancement New feature or request label Jul 5, 2019
@alexcrichton
Copy link
Contributor

Seems reasonable to me to add! I'm not 100% convinced we need this for mutability reasons, it seems a long stretch for anything bad to happen. Having this for unininitialized memory though is more compelling to me and seems pretty reasonable!

@alexcrichton alexcrichton added good first issue This is a good issue for people who have never contributed to wasm-bindgen before help wanted We could use some help fixing this issue! js-sys Issues related to the `js-sys` crate and removed enhancement New feature or request labels Jul 8, 2019
@That3Percent
Copy link

In the same vein, I'd like to see a version of copy_to for typed arrays that can take a &mut MaybeUninit<[T]>. Right now the only way to access the memory in a typed array as a slice is through the copy_to function, but it doesn't make sense to have to initialize the memory first. Should that be opened as a separate issue?

@That3Percent
Copy link

Oops, I think I meant &mut [MaybeUninit<T>], sorry.

@lshlyapnikov
Copy link
Contributor

view_mut_raw has been merged. Do we want the copy_to addressed as part of the same issue?

@alexcrichton
Copy link
Contributor

Seems reasonable to me to add as well!

@lshlyapnikov
Copy link
Contributor

lshlyapnikov commented Nov 8, 2019

but it doesn't make sense to have to initialize the memory first. Should that be opened as a separate issue?

@That3Percent Do you mean you want to be able to pass the uninitialized MaybeUninit to the copy_to_xxx and let it deal with the allocation?

@lshlyapnikov
Copy link
Contributor

is there any way to check if MaybeUninit has been initialized?

@That3Percent
Copy link

@ibaryshnikov

Do you mean you want to be able to pass the uninitialized MaybeUninit to the copy_to_xxx and let it deal with the allocation?

MaybeUninit deals with memory that is already allocated, but not yet initialized. So, copy_to_xxx would not have to deal with any allocation. It would merely fulfill the contract of initializing the memory by writing valid data to it.

is there any way to check if MaybeUninit has been initialized?
No there is not, but for the purposes of this task that would not be necessary because it would be the responsibility of copy_to_xxx to initialize the memory by copying the buffer.

Here's the workflow as it exists today and how it could be improved by the issue.

Task: Given some TypedArray owned by the JavaScript runtime, make it's contents readable as a &[T]

Today:

  1. Allocate some buffer with the same length as the TypedArray. This can be done with eg: Vec::.with_capacity or similar
  2. Initialize the memory by writing something to it. Usually by zeroing. (Steps 1 and 2 can be done in a single call, but are still separate steps as far as the program is concerned)
  3. Pass the initialized memory to copy_to_xxx, which overwrites all the zeroes that were just written

Step 2 is what we want to get rid of. Why write 0 when it's going to be overridden immediately? But, it's not possible without invoking Undefined Behavior, because it's not valid to even have a &[u32] over uninitialized memory, even if unused.

Proposed:

  1. Allocate some buffer with the same length as the TypedArray. This can be done with eg: second-stack, RawVec or similar.
  2. Using MaybeUninit, pass the allocated but uninitialized slice to copy_to_xxx, which initializes the memory by virtue of writing to it on the other side of the FFI boundary.

@That3Percent
Copy link

@lshlyapnikov

I just took a look at #1855 . I think that what you built there is a decent step toward a more performant version of the existing to_vec method. If you look at what that method does today, it executes steps 1-3 that I outlined - allocate, initialize, overwrite. Whereas your code does the more performant version. The API for copy_to_maybe_uninit does not make a great deal of sense though. (eg: why take an MaybeUninit at all if the method internally allocates a Vec?). I think you should refactor the code you have into to_vec so that everyone can enjoy the performance gains, without the usage of MaybeUninit at all. The copy_to_maybe_uninit method should be modified to accept a (I think) &mut [MaybeUninit<T>] to move control of the allocation/destination to the caller.

@lshlyapnikov
Copy link
Contributor

@That3Percent does this match your use case?
https://github.com/rustwasm/wasm-bindgen/pull/1855/files#diff-a4985f313137ee0bff2488dce289c41cR238-R251

MaybeUninit is pointing to an uninitialized Vec. sut is copy_to_unsafe

It is not exactly what you asked for, the signature of the function is:

pub unsafe fn copy_to_unsafe(&self, dst: &mut mem::MaybeUninit<&mut [$ty]>) {

@lshlyapnikov
Copy link
Contributor

lshlyapnikov commented Nov 16, 2019

@That3Percent I think you can use the existing copy_to with uninitialized memory. Why do you want another method? The below works with the existing js_sys::Uint8Array::copy_to and uninitialized slice:

js_sys::Uint8Array::copy_to(&js_arr, *(mvec.as_mut_ptr()));`

Do I miss anything? Here is the entire test case:

    let expected: Vec<u8> = (0..10).collect();
    let js_arr = js_sys::Uint8Array::from(expected.as_slice());

    let mut vec: Vec<u8> = Vec::with_capacity(expected.len());
    unsafe {
        vec.set_len(expected.len());
    }
    let mut mvec: mem::MaybeUninit<&mut [u8]> = mem::MaybeUninit::new(&mut vec);

    unsafe {
        js_sys::Uint8Array::copy_to(&js_arr, *(mvec.as_mut_ptr()));
        assert_eq!(*mvec.as_ptr(), expected.as_slice());
        assert_eq!(mvec.assume_init(), expected.as_slice());
    }
    assert_eq!(vec, expected);

@lshlyapnikov
Copy link
Contributor

Can anyone review the copy_to_mem_raw
#1855 (comment)

@adrian17
Copy link
Contributor

adrian17 commented Apr 8, 2022

However, rust forbids wrapping uninitialized values in a slice - you have to use raw pointers when initializing uninitialized memory. But the current signature of Uint8Array::view requires me to make a slice.

This touches more APIs, for example: glReadPixels's point is to fill a possibly uninitialized buffer, but given the wasm-bindgen WebGL signature:

    pub fn read_pixels_with_opt_u8_array(
        this: &WebGlRenderingContext,
        (...)
        pixels: Option<&mut [u8]>,
    ) -> Result<(), JsValue>;

Currently there's no (legal and sound from Rust's POV) way of avoiding zero-initialization without changing this API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue This is a good issue for people who have never contributed to wasm-bindgen before help wanted We could use some help fixing this issue! js-sys Issues related to the `js-sys` crate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants