Skip to content

Generated binding code does not free malloc'd data after passed to Rust if it's a Vec<u8> #1515

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
19h opened this issue May 4, 2019 · 13 comments
Labels

Comments

@19h
Copy link

19h commented May 4, 2019

As opposed to the generated code when accepting a &str, wasm-bindgen will not free Vec<u8> aka Uint8Array after it has been passed to Rust.

We investigated this issue after several customers reported a Chrome crash related to memory exhaustion and found big Uint8Array passed to the WebAssembly module to keep growing the memory use, eventually running into memory issues.


Example code:

#[wasm_bindgen]
pub fn process(data: String) -> String {
    process_bytes(data.to_string().to_vec())
}

Generated binding code:

__exports.process = function(data) {
    const ptr0 = passStringToWasm(data);
    const len0 = WASM_VECTOR_LEN;
    const retptr = globalArgumentPtr();
    try {
        wasm.process(retptr, ptr0, len0);
        const mem = getUint32Memory();
        const rustptr = mem[retptr / 4];
        const rustlen = mem[retptr / 4 + 1];

        const realRet = getStringFromWasm(rustptr, rustlen).slice();
        wasm.__wbindgen_free(rustptr, rustlen * 1);
        return realRet;
    } finally {
        wasm.__wbindgen_free(ptr0, len0 * 1);
    }
}

Note how ptr0 (const ptr0 = passStringToWasm(data);) is freed in the finally block at the end of the method: wasm.__wbindgen_free(ptr0, len0 * 1);. That said, the return realRet; in the try block will probably prevent ptr0 from ever being freed in the finally block.

Compare the output when expecting a Vec<u8>:


Example code:

#[wasm_bindgen]
pub fn process(data: Vec<u8>) -> String {
    process_bytes(data)
}

Generated binding code:

__exports. process = function(data) {
    const ptr0 = passArray8ToWasm(data);
    const len0 = WASM_VECTOR_LEN;
    const retptr = globalArgumentPtr();
    wasm.process(retptr, ptr0, len0);
    const mem = getUint32Memory();
    const rustptr = mem[retptr / 4];
    const rustlen = mem[retptr / 4 + 1];

    const realRet = getStringFromWasm(rustptr, rustlen).slice();
    wasm.__wbindgen_free(rustptr, rustlen * 1);
    return realRet;
}

ptr0 is allocated (const ptr0 = passArray8ToWasm(data);) but never freed.

@19h 19h added the bug label May 4, 2019
@devsnek
Copy link
Contributor

devsnek commented May 5, 2019

That said, the return realRet; in the try block will probably prevent ptr0 from ever being freed in the finally block.

Just to clarify, the finally block runs even if there's a return. It's quite an odd semantic in js.

> function a() { try { return 'returned'; } finally { console.log('finally') } }
undefined
> a()
'finally'
'returned'
>

@devsnek
Copy link
Contributor

devsnek commented May 5, 2019

seems the issue is that the finally block doesn't get emitted because this condition is false:

if arg.is_by_ref() || arg.is_clamped_by_ref() {

i guess it should also take into account things that get turned into references even if they aren't originally, like views over uint8arrays?

@ibaryshnikov
Copy link
Member

Thanks for your report!
I've compiled several examples to clarify the memory management.
First is about String

#[wasm_bindgen]
pub fn pass_string(s: String) -> String {
    s.clone()
}

generated js is

module.exports.pass_string = function(s) {
    const ptr0 = passStringToWasm(s);
    const len0 = WASM_VECTOR_LEN;
    const retptr = globalArgumentPtr();
    wasm.pass_string(retptr, ptr0, len0);
    const mem = getUint32Memory();
    const rustptr = mem[retptr / 4];
    const rustlen = mem[retptr / 4 + 1];

    const realRet = getStringFromWasm(rustptr, rustlen).slice();
    wasm.__wbindgen_free(rustptr, rustlen * 1);
    return realRet;
};

And in the webassembly there is a line

call $__rust_dealloc

Now let's go through the details what's going on here.
fn pass_string(s: String) means we pass the ownership from JavaScript to Rust. And the Rust will be responsible for deallocating the memory inside the .wasm binary. The return type of the function -> String means that the ownership of a result is passed to JavaScript. That's why there's wasm.__wbindgen_free(rustptr, rustlen * 1); in generated .js file.

The second example is about &str

#[wasm_bindgen]
pub fn pass_str(s: &str) -> String {
    s.to_owned()
}
module.exports.pass_str = function(s) {
    const ptr0 = passStringToWasm(s);
    const len0 = WASM_VECTOR_LEN;
    const retptr = globalArgumentPtr();
    try {
        wasm.pass_str(retptr, ptr0, len0);
        const mem = getUint32Memory();
        const rustptr = mem[retptr / 4];
        const rustlen = mem[retptr / 4 + 1];

        const realRet = getStringFromWasm(rustptr, rustlen).slice();
        wasm.__wbindgen_free(rustptr, rustlen * 1);
        return realRet;

    } finally {
        wasm.__wbindgen_free(ptr0, len0 * 1);
    }
};

Here the argument is a reference, &str. That means that the ownership stays in JavaScript world. That's why there are two calls of __wbindgen_free in generated .js: one for the argument and another for the returned value.
The Vec<u8> behaves exacly like String from the first example.

@19h
Copy link
Author

19h commented May 5, 2019

Thanks for your elaborate reply!

Passing the ownership to the WebAssembly module something I had considered, but it seemed semantically unintuitive as the difference between extern-alloc-extern-free and extern-alloc-intern-free introduces needless tight-coupling to the underlying behaviour of the WebAssembly module (aka complexity).

What I mean is that #[wasm_bindgen] nearly completely abstracts the malloc and JS-wasm boundary logic away from the developer, and therefore I'd consider it to be appropriate to ensure the passed data to be freed at the same location where it was allocated -- given that the allocation wasn't ever explicitly coded by the developer, but by the abstraction logic.

🐝

@alexcrichton
Copy link
Contributor

Thanks for the report @KenanSulayman! Can you gist the code that you're running into this with? The description in the OP disagrees with what @ibaryshnikov posted above in terms of the bindings generated for fn(String) -> String, so I'm wondering if a typo snuck in by accident somewhere?

@19h
Copy link
Author

19h commented May 6, 2019

@alexcrichton that's accurate, it was meant to say pub fn process(data: &str) -> String {.

The actual code involved is:

// outside rust
const buf = new Uint8Array(target.result as ArrayBuffer);
const docs_raw = this.props.parser.process(buf);
// inside rust
#[wasm_bindgen]
pub fn process(data: &[u8]) -> String {
    types::StatusDTO::from_data(data).to_string()
}

The original implementation read as follows:

// inside rust
#[wasm_bindgen]
pub fn process(data: Vec<u8>) -> String {
    types::StatusDTO::from_data(&data).to_string()
}

Confusion was caused by the significant difference of freeing behaviour&[u8] vs Vec<u8>.

@alexcrichton
Copy link
Contributor

Ah ok thanks for the clarification, that makes sense! So reviewing the generated code again I don't think there's any leaks from the implicit transfer of the bytes and/or string. Are you sure that this is where the leak is coming from? There's a few different ways that memory can be leaked in the wasm heap, and I've also heard reports in the past where Chrome may have an internal wasm bug which causes applications to look like they OOM when it works in other browsers (like Firefox).

@ibaryshnikov
Copy link
Member

@alexcrichton my colleague once reported RangeError: WebAssembly Instantiation: Out of memory: wasm memory. Is it the same thing you mentioned?

@19h
Copy link
Author

19h commented May 7, 2019

@ibaryshnikov yes, that is exactly the same issue that led us to investigate possible leaks

@alexcrichton
Copy link
Contributor

Googling around that error message, perhaps brave/brave-browser#3366 can shed some light on the issue too?

@ibaryshnikov
Copy link
Member

@alexcrichton it's not a wasm-bindgen issue I guess?

@19h
Copy link
Author

19h commented Jun 4, 2019

I would close it, but I also think it's useful to have it around to update it when those memory issues aren't an issue anymore. Although it could use its own tracking issue.

@alexcrichton
Copy link
Contributor

Ah yeah I'm going to close this because I don't think it's a wasm-bindgen issue, but I think unfortunately the root cause wasn't learned. If we get around to that we can certainly document it in wasm-bindgen!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants