Web assembly permanently breaks after calling function with large string #11135

Closed
serg06 opened this issue May 12, 2020 · 13 comments

@serg06

serg06 commented May 12, 2020

I'm trying to make a WASM string parser from C but when I call the function with a large string it breaks. Not only that, all future calls to all WASM functions (with ccall/cwrap) break!

This is my C code:

#include <stdio.h>
#include <emscripten/emscripten.h>

int main(int argc, char ** argv) {
    printf("WASM loaded successfully.\n");
}

int EMSCRIPTEN_KEEPALIVE string_parse(char* str) {
    return 0;
}

Simple, right?

This is my compile command. I added all the possible fixes I found online, but none of them worked:

emcc hello4.c \
     -o hello4.js \
     -g \
     -fsanitize=undefined \
     -s ASSERTIONS=1 \
     -s SAFE_HEAP=1 \
     -s TOTAL_MEMORY=512MB \
     -s DETERMINISTIC=1 \
     -s BINARYEN_MEM_MAX=2147418112 \
     -s ALLOW_MEMORY_GROWTH=1 \
     -s "EXTRA_EXPORTED_RUNTIME_METHODS=['ccall', 'cwrap']"

And this is what running it looks like:

WASM loaded successfully.
> let string_parse = Module.cwrap('string_parse', 'number', ['string']);
> string_parse("hello");
0
> string_parse("hello".repeat(1000 * 1000));
segmentation fault
hello4.js:1807 segmentation fault
abort @ hello4.js:1807
segfault @ hello4.js:784
SAFE_HEAP_STORE_i32_4_4 @ 0001e07e:1
string_parse @ 0001e07e:1
Module._string_parse @ hello4.js:2498
ccall @ hello4.js:882
(anonymous) @ hello4.js:894
(anonymous) @ VM1534:1
hello4.js:1818 Uncaught RuntimeError: abort(segmentation fault) at Error
    at jsStackTrace (http://localhost:8000/hello3/hello4.js:2231:17)
    at stackTrace (http://localhost:8000/hello3/hello4.js:2248:16)
    at abort (http://localhost:8000/hello3/hello4.js:1812:44)
    at segfault (http://localhost:8000/hello3/hello4.js:784:3)
    at SAFE_HEAP_STORE_i32_4_4 (wasm-function[106]:0x5f12)
    at string_parse (wasm-function[10]:0x462)
    at Module._string_parse (http://localhost:8000/hello3/hello4.js:2498:40)
    at ccall (http://localhost:8000/hello3/hello4.js:882:18)
    at http://localhost:8000/hello3/hello4.js:894:12
    at <anonymous>:1:1
    at abort (http://localhost:8000/hello3/hello4.js:1818:9)
    at segfault (http://localhost:8000/hello3/hello4.js:784:3)
    at SAFE_HEAP_STORE_i32_4_4 (wasm-function[106]:0x5f12)
    at string_parse (wasm-function[10]:0x462)
    at Module._string_parse (http://localhost:8000/hello3/hello4.js:2498:40)
    at ccall (http://localhost:8000/hello3/hello4.js:882:18)
    at http://localhost:8000/hello3/hello4.js:894:12
    at <anonymous>:1:1

And then when I try to call it again with the string that worked the first time, it no longer works!

> string_parse("hello");
<... huge error from above ...>
@sbc100
Collaborator

sbc100 commented May 12, 2020 via email

@serg06
Author

serg06 commented May 12, 2020

> If you want to pass a string to WebAssembly you somehow need to serialize
> that string into the WebAssembly memory. You cannot pass JavaScript objects
> such as strings directly to WebAssembly.
>
> See: https://emscripten.org/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html

On your link I see this:

> The types are “number” (for a JavaScript number corresponding to a C integer, float, or general pointer), “string” (for a JavaScript string that corresponds to a C char* that represents a string) or “array” (for a JavaScript array or typed array that corresponds to a C array; for typed arrays, it must be a Uint8Array or Int8Array).

Doesn't that mean that I can pass a string?

@sbc100
Collaborator

sbc100 commented May 12, 2020

Oh sorry, I didn't see that you are using cwrap. My apologies.

Presumably you have some code that created the string_parse function? Something akin to int_sqrt = Module.cwrap('int_sqrt', 'number', ['number']). Can you include that here?

It looks to me like this should work. I think we should be able to create a test case from it and figure out what is going wrong.

@serg06
Author

serg06 commented May 12, 2020

> Oh sorry, I didn't see that you are using cwrap. My apologies.
>
> Presumably you have some code that created the string_parse function? Something akin to int_sqrt = Module.cwrap('int_sqrt', 'number', ['number']). Can you include that here?
>
> It looks to me like this should work. I think we should be able to create a test case from it and figure out what is going wrong.

Yep!

let string_parse = Module.cwrap('string_parse', 'number', ['string']);

@bvibber
Collaborator

bvibber commented May 12, 2020

If I'm not mistaken, that 5-megabyte string is being allocated onto the stack by cwrap/ccall. Something may be writing beyond the end of the stack, either corrupting heap data or causing some other problem leading to the segfault?

It looks like the current TOTAL_STACK default is 5*1024*1024, which leaves relatively little headroom for anything other than that string. Try either increasing the stack size, or explicitly mallocing heap space for the string and passing a pointer.

@bvibber
Collaborator

bvibber commented May 12, 2020

Also, for strings ccall and friends allocate 4 * str.length + 1 bytes just in case it contains a lot of multibyte UTF-8 characters. While this is fast (saves the trouble of doing two passes through the UTF-16 source string to precalculate the exact byte length) it also means you're more likely than you think to overflow the stack when passing large strings.
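
A quick check of that arithmetic, as a sketch only: it uses the 4 * str.length + 1 worst case and the 5*1024*1024 default mentioned in this thread. The names worstCaseStackBytes and fitsOnStack are hypothetical helpers, not Emscripten APIs, and the check is optimistic because other data also lives on the stack.

```javascript
// Default stack size quoted in this thread: 5 * 1024 * 1024 bytes.
const DEFAULT_TOTAL_STACK = 5 * 1024 * 1024; // 5,242,880 bytes

// Worst-case bytes reserved for a 'string' argument, per the formula above.
function worstCaseStackBytes(str) {
  return 4 * str.length + 1;
}

// Hypothetical helper: true if the worst-case copy would fit in the stack.
// Assumes nothing else is using the stack, so it is an upper bound only.
function fitsOnStack(str, stackSize = DEFAULT_TOTAL_STACK) {
  return worstCaseStackBytes(str) <= stackSize;
}

const small = "hello";
const large = "hello".repeat(1000 * 1000); // 5,000,000 UTF-16 code units

console.log(worstCaseStackBytes(small), fitsOnStack(small)); // 21 true
console.log(worstCaseStackBytes(large), fitsOnStack(large)); // 20000001 false
```

So the 5 MB string from the report can reserve roughly 20 MB, about four times the default stack.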

@serg06
Author

serg06 commented May 13, 2020

> It looks like the current TOTAL_STACK default is 5*1024*1024, which leaves relatively little headroom for anything other than that string.

Oh, I see.

> Try either increasing the stack size

How can I increase the stack size?

> or explicitly mallocing heap space for the string and passing a pointer.

How can I malloc and pass a pointer to the string before even entering the C code?

@sbc100
Collaborator

sbc100 commented May 13, 2020

The stack size is controlled by the -s TOTAL_STACK setting.

If you want to pass huge strings that exceed the stack size, then perhaps ccall isn't right for you. There are several other methods listed in https://emscripten.org/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html.

To allocate space you should be able to call Module['_malloc'] and copy your string into the resulting space using stringToUTF8.
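
The malloc approach could look roughly like the sketch below. callWithHeapString is a hypothetical helper, not part of Emscripten. It assumes the module exposes _malloc, _free, lengthBytesUTF8, stringToUTF8 and ccall; depending on the Emscripten version, some of these may need to be added to the exported functions or exported runtime methods at compile time.

```javascript
// Hypothetical helper: copy a JS string into heap memory and pass the pointer,
// instead of letting ccall/cwrap copy a 'string' argument onto the stack.
function callWithHeapString(Module, funcName, str) {
  // Exact UTF-8 byte length plus one byte for the terminating NUL.
  const size = Module.lengthBytesUTF8(str) + 1;
  const ptr = Module._malloc(size);
  if (ptr === 0) {
    throw new Error("malloc failed for " + size + " bytes");
  }
  try {
    // Write the string at ptr, using at most `size` bytes including the NUL.
    Module.stringToUTF8(str, ptr, size);
    // Declare the argument as 'number' so the pointer is passed through as-is.
    return Module.ccall(funcName, "number", ["number"], [ptr]);
  } finally {
    // Release the buffer whether or not the call threw.
    Module._free(ptr);
  }
}
```

With the code from this issue, a call might be callWithHeapString(Module, 'string_parse', "hello".repeat(1000 * 1000)). The string still has to fit in the heap, so this moves the limit from the stack size to the available memory.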

@serg06
Author

serg06 commented May 14, 2020

> The stack size is controlled by the -s TOTAL_STACK setting.

That fixed the 500MB case, but it doesn't let me increase the stack size past 2GB:

emcc: error: INITIAL_MEMORY must be larger than TOTAL_STACK, was 536870912 (TOTAL_STACK=2147418112)
emcc: error: INITIAL_MEMORY must be less than 2GB due to current spec limitations

> To allocate space you should be able to call Module['_malloc'] and copy your string into the resulting space using stringToUTF8.

Oh thanks! Unfortunately it seems like I'm still limited to a maximum of 4GB total memory?

wasm-ld: error: maximum memory too large, cannot be greater than 4294967296

@sbc100
Collaborator

sbc100 commented May 14, 2020

WebAssembly uses 32-bit pointers, so there is literally no way to address more than 4GB.

Furthermore, if you want to run on the web, you are currently limited in practice to a lot less than this.
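
For reference, the numbers in the error messages and flags in this thread follow from simple arithmetic. The sketch below assumes only the 32-bit pointer width stated above and the 64 KiB WebAssembly page size; the constant names are illustrative.

```javascript
// WebAssembly memory is sized in 64 KiB pages.
const WASM_PAGE_SIZE = 64 * 1024;

// 32-bit pointers can address at most 2^32 bytes.
const ADDRESS_SPACE = 2 ** 32;

// Largest page-aligned size strictly below 2 GiB.
const TWO_GIB = 2 ** 31;
const maxBelowTwoGiB = TWO_GIB - WASM_PAGE_SIZE;

console.log(ADDRESS_SPACE);  // 4294967296, the bound in the wasm-ld error
console.log(maxBelowTwoGiB); // 2147418112, the BINARYEN_MEM_MAX value used above
```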

@serg06
Author

serg06 commented May 14, 2020

Oh, I see. I guess that's it then. Should I leave this issue open so someone can add a check to fix the permanent break and print a nice error message?

@kripken
Member

kripken commented May 14, 2020

It turns out our normal stack checks fail to catch this, leading to the confusing error: WebAssembly/binaryen#2850

However, you should get a better error message using asan with -fsanitize=address.

ericmandel added a commit to ericmandel/js9 that referenced this issue Oct 6, 2020
  -- the Emscripten ccall passes args on the stack (which is about 5Mb in size)
  -- if a FITS header is too large, it blows the stack
  -- see: emscripten-core/emscripten#11135
  -- so we allocate space on the heap, copy the header, and pass the heap ptr
  -- (similar technique used in zscale)
  -- also, parameterize the size of hlength passed to initwcs (def: 256000)
@stale

stale bot commented Jun 2, 2021

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Jun 2, 2021
@stale stale bot closed this as completed Jul 9, 2021