Maximum call stack error when running pyodide within a web worker #441

Open
SimonBiggs opened this issue May 25, 2019 · 17 comments

@SimonBiggs
Contributor

SimonBiggs commented May 25, 2019

System combinations that were tested, with the bug confirmed on each:

  • Ubuntu 19.04
  • Firefox Quantum 67.0 (64-bit)
  • Chrome Version 74.0.3729.157 (Official Build) (64-bit)

Description of Bug

This issue appears to be introduced by running Pyodide within a web worker.

Firefox

On Firefox I get the following error message:

image

Oddly, on Firefox the error does not appear on the first page load, but it does appear after a plain refresh (F5).

Chrome

On Chrome I get the following error message:

image

How issue was introduced

It occurs by including the line ctx.pyodide.runPython('import matplotlib') at the following location:

https://github.com/pymedphys/pymedphys/blob/f6bdf5ce9e8858714d9f6c0fa629ce34d91e067b/app/src/observables/webworker-messaging/pyodide.worker#L39-L50

Demo of app causing issue

See a live version of this commit over at https://5ce927378b870500077d8c9c--app-pymedphys.netlify.com/

@SimonBiggs
Contributor Author

@jstafford in your use of pyodide with webworkers have you come across this issue?

@SimonBiggs
Contributor Author

Could this be related? emscripten-core/emscripten#5316

@SimonBiggs
Contributor Author

If I understand correctly, that issue in emscripten's archive says that the use of "outlining" causes stack overflows within web workers.

@jeffrafter

@damaneice and I were digging into a problem that had the same behavior in jasoncharnes/run.rb#7 and tried a lot of different configurations. The reason this is happening has something to do with the _compile_array internals and the problem is solved using emscripten 1.38.32+ and adding the flag -Os or -Oz (https://emscripten.org/docs/optimizing/Optimizing-Code.html#optimizing-code-size) to the emcc call when building the WASM.

@SimonBiggs
Contributor Author

I'm having a look through the following link to see what might be needed to port the current emscripten patches over:

emscripten-core/emscripten@1.38.30...1.38.32

@rth
Member

rth commented May 27, 2020

@SimonBiggs #480 is another attempt to rebuild for 1.38.35, approaching this more iteratively. Building just CPython does work, but there is an issue with actually loading pyodide that I'm still investigating.

@subwaymatch
Contributor

For me, Pyodide worker works in Chrome and Firefox (for both Windows 10/Mac 10.14), but I haven't tried loading any heavy packages. In Safari, I get a maximum call stack error as soon as I try to initialize Pyodide in a worker.

image

I'm switching back to a non-worker solution... 😞

Thanks @SimonBiggs for reporting this issue.

@georgiastuart

I am also getting this error on Safari (but not Chrome). Is there a way to catch it so I can fall back to non-web-worker Pyodide?

@joemarshall
Contributor

I just hit this too. It still happens in Safari 14.4.

@hoodmane
Member

@joemarshall Did you get a fatal error now? How deep is the Python traceback vs the Javascript traceback? This info might help us set the Python recursion limit more appropriately.

But the issue is probably unfixable on our side: Safari needs to give web workers a bigger call stack. A lot of Python code was written with the assumption that the recursion limit would be 1000. If it turns out the recursion limit is 50, it might be hard to get that code to run.
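
To make the recursion-limit point concrete, here is a minimal sketch in plain CPython (nothing here is Pyodide-specific) that measures how many nested Python calls succeed under a given limit. The helper name `max_python_depth` is hypothetical; `sys.setrecursionlimit` and `RecursionError` come from the standard library.

```python
import sys


def max_python_depth(limit):
    """Return how many nested calls succeed when the recursion limit is `limit`.

    Hypothetical helper for illustration; it is not part of Pyodide.
    """
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        def recurse(n):
            try:
                return recurse(n + 1)
            except RecursionError:
                # The interpreter refused to go deeper: report this depth.
                return n

        return recurse(1)
    finally:
        # Always restore the caller's limit.
        sys.setrecursionlimit(previous)
```

Calling `max_python_depth(50)` versus `max_python_depth(1000)` shows how much less room recursive code has under a small limit. The limit that Pyodide actually sets in a given browser is not shown in this thread, so the numbers are only illustrative.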

@alexmojaki
Contributor

I got an error similar to the first screenshot in the issue in Chrome: when I imported one library, it led to a cascade of imports that went too deep. The solution for me was to import the deeper library first, so that it didn't need to be imported again when I ran my actual import. Applying that idea above, you could import signal, then unittest, then numpy, then matplotlib. Of course, none of that helps if you can't run any code in Safari at all.
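
The import-ordering workaround can be sketched as a small helper. This is only an illustration under stated assumptions: `import_in_order` is a hypothetical name, and it only guards against missing modules, not against a stack overflow inside the WASM runtime.

```python
import importlib


def import_in_order(names):
    """Import each module in `names` one at a time, in the given order.

    Listing the deepest dependencies first means each later import finds
    them already in sys.modules, so its own import chain is shallower.
    Returns the names that imported successfully.
    """
    imported = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Module not installed in this environment: skip it.
            continue
        imported.append(name)
    return imported
```

For the case described above, the call would be `import_in_order(["signal", "unittest", "numpy", "matplotlib"])`, run before the code that needs matplotlib.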

@rth
Member

rth commented Apr 28, 2021

Thanks for the report! There is some related discussion in #1541 to measure and potentially decrease the stack frame size (to be able to do deeper recursions).

@joemarshall
Contributor

In Safari I got exactly 100 lines of trace, but given that the top and bottom both appear to be wasm functions, I think it may trim the stack trace down to 100 lines or so. I also didn't have a debug build to hand, so there are no function names (I was on a student's Mac).

   exception thrown: RangeError: Maximum call stack size exceeded.,<?>.wasm-function[1693]@[wasm code]
<?>.wasm-function[1646]@[wasm code]
<?>.wasm-function[23777]@[wasm code]
<?>.wasm-function[1697]@[wasm code]
<?>.wasm-function[2738]@[wasm code]
<?>.wasm-function[17136]@[wasm code]
<?>.wasm-function[2737]@[wasm code]
<?>.wasm-function[747]@[wasm code]
<?>.wasm-function[748]@[wasm code]
<?>.wasm-function[16372]@[wasm code]
<?>.wasm-function[2745]@[wasm code]
<?>.wasm-function[2738]@[wasm code]
<?>.wasm-function[17136]@[wasm code]
<?>.wasm-function[2737]@[wasm code]
<?>.wasm-function[747]@[wasm code]
<?>.wasm-function[748]@[wasm code]
<?>.wasm-function[16372]@[wasm code]
<?>.wasm-function[2745]@[wasm code]
<?>.wasm-function[2738]@[wasm code]
<?>.wasm-function[17136]@[wasm code]
<?>.wasm-function[2737]@[wasm code]
<?>.wasm-function[747]@[wasm code]
<?>.wasm-function[748]@[wasm code]
<?>.wasm-function[16372]@[wasm code]
<?>.wasm-function[2745]@[wasm code]
<?>.wasm-function[2738]@[wasm code]
<?>.wasm-function[17136]@[wasm code]
<?>.wasm-function[2734]@[wasm code]
<?>.wasm-function[748]@[wasm code]
<?>.wasm-function[16372]@[wasm code]
<?>.wasm-function[2745]@[wasm code]
<?>.wasm-function[2738]@[wasm code]
<?>.wasm-function[17136]@[wasm code]
<?>.wasm-function[2734]@[wasm code]
<?>.wasm-function[748]@[wasm code]
<?>.wasm-function[16372]@[wasm code]
<?>.wasm-function[795]@[wasm code]
<?>.wasm-function[23266]@[wasm code]
<?>.wasm-function[2745]@[wasm code]
<?>.wasm-function[2738]@[wasm code]
<?>.wasm-function[17136]@[wasm code]
<?>.wasm-function[2734]@[wasm code]
<?>.wasm-function[748]@[wasm code]
<?>.wasm-function[16372]@[wasm code]
<?>.wasm-function[795]@[wasm code]
<?>.wasm-function[23266]@[wasm code]
<?>.wasm-function[2745]@[wasm code]
<?>.wasm-function[2738]@[wasm code]
<?>.wasm-function[17136]@[wasm code]
<?>.wasm-function[2734]@[wasm code]
<?>.wasm-function[748]@[wasm code]
<?>.wasm-function[16372]@[wasm code]
<?>.wasm-function[2745]@[wasm code]
<?>.wasm-function[2738]@[wasm code]
<?>.wasm-function[17136]@[wasm code]
<?>.wasm-function[2737]@[wasm code]
<?>.wasm-function[747]@[wasm code]
<?>.wasm-function[748]@[wasm code]
<?>.wasm-function[16372]@[wasm code]
<?>.wasm-function[2745]@[wasm code]
<?>.wasm-function[2738]@[wasm code]
<?>.wasm-function[17136]@[wasm code]
<?>.wasm-function[2737]@[wasm code]
<?>.wasm-function[747]@[wasm code]
<?>.wasm-function[748]@[wasm code]
<?>.wasm-function[16372]@[wasm code]
<?>.wasm-function[765]@[wasm code]
<?>.wasm-function[766]@[wasm code]
<?>.wasm-function[3063]@[wasm code]
<?>.wasm-function[2738]@[wasm code]
<?>.wasm-function[17136]@[wasm code]
<?>.wasm-function[2734]@[wasm code]
<?>.wasm-function[2733]@[wasm code]
<?>.wasm-function[2679]@[wasm code]
<?>.wasm-function[24224]@[wasm code]
<?>.wasm-function[1606]@[wasm code]
<?>.wasm-function[23763]@[wasm code]
<?>.wasm-function[743]@[wasm code]
<?>.wasm-function[752]@[wasm code]
<?>.wasm-function[2738]@[wasm code]
<?>.wasm-function[17136]@[wasm code]
<?>.wasm-function[2734]@[wasm code]
<?>.wasm-function[748]@[wasm code]
<?>.wasm-function[16372]@[wasm code]
<?>.wasm-function[2745]@[wasm code]
<?>.wasm-function[2738]@[wasm code]
<?>.wasm-function[17136]@[wasm code]
<?>.wasm-function[2737]@[wasm code]
<?>.wasm-function[747]@[wasm code]
<?>.wasm-function[748]@[wasm code]
<?>.wasm-function[16372]@[wasm code]
<?>.wasm-function[2745]@[wasm code]
<?>.wasm-function[2738]@[wasm code]
<?>.wasm-function[17136]@[wasm code]
<?>.wasm-function[2737]@[wasm code]
<?>.wasm-function[747]@[wasm code]
<?>.wasm-function[748]@[wasm code]
<?>.wasm-function[16372]@[wasm code]
<?>.wasm-function[2745]@[wasm code]
<?>.wasm-function[2738]@[wasm code] 
ERROR in worker [object ErrorEvent] 
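
Without a debug build, one way to see which frames dominate a trace like the one above is to count the wasm function indices. The following is a rough sketch; `FRAME_RE` and `frame_counts` are hypothetical names, and the indices are only meaningful for the specific build that produced the trace.

```python
import re
from collections import Counter

# Matches the numeric index in entries such as "<?>.wasm-function[2738]@[wasm code]".
FRAME_RE = re.compile(r"wasm-function\[(\d+)\]")


def frame_counts(trace):
    """Count how often each wasm function index appears in a stack trace."""
    return Counter(int(index) for index in FRAME_RE.findall(trace))
```

Passing the pasted trace as a string and calling `.most_common()` on the result lists the indices that repeat most often, which are the likely members of the recursive cycle.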

It also reproduces in the Linux WebKit build that ships with Playwright, so it is possible to debug on Linux or WSL.

Looks like mono has seen this too:
mono/mono#15981

@alexmojaki
Contributor

When I first came across this thread it gave me the impression that Pyodide was completely unable to run any code in a Safari web worker, particularly this heavily liked comment:

In Safari, I get a maximum call stack error as soon as I try to initialize Pyodide in a worker.

Maybe that was the case at the time, but it doesn't seem to be any more. In particular https://github.com/dodona-edu/papyros has it working, including the ability to use some reasonably complex libraries. This is on Safari 14.1 so I don't think it's a change in Safari. Did something change in Pyodide to prevent these recursion errors?

Having said that, we did find a place where some deep recursion happens in Pyodide itself which breaks Safari. Under the right conditions which I haven't quite figured out, eval_code produces a traceback like this:

  File "/lib/python3.9/site-packages/_pyodide/_base.py", line 494, in eval_code_async
    await CodeRunner(
  File "/lib/python3.9/site-packages/_pyodide/_base.py", line 249, in compile
    self._gen.send(self.ast)
  File "/lib/python3.9/site-packages/_pyodide/_base.py", line 155, in _parse_and_compile_gen
    _last_expr_to_raise(mod)
  File "/lib/python3.9/site-packages/_pyodide/_base.py", line 116, in _last_expr_to_raise
    raise_expr = deepcopy(_raise_template_ast)
  File "/lib/python3.9/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/lib/python3.9/copy.py", line 270, in _reconstruct
    state = deepcopy(state, memo)
  File "/lib/python3.9/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/lib/python3.9/copy.py", line 230, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/lib/python3.9/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)

[...skipping many deepcopy frames...]

  File "/lib/python3.9/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/lib/python3.9/copy.py", line 264, in _reconstruct
    y = func(*args)
  File "/lib/python3.9/copy.py", line 263, in <genexpr>
    args = (deepcopy(arg, memo) for arg in args)
  File "/lib/python3.9/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/lib/python3.9/copy.py", line 210, in _deepcopy_tuple
    y = [deepcopy(a, memo) for a in x]
  File "/lib/python3.9/copy.py", line 210, in <listcomp>
    y = [deepcopy(a, memo) for a in x]

I suggest that in general pyodide should avoid using deepcopy as much as possible. In this case, replacing raise_expr = deepcopy(_raise_template_ast) with raise_expr = ast.parse... should work well.
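
As a rough illustration of the suggestion, a fresh AST node can be built by parsing source each time instead of deep-copying a stored template. The template source below is a hypothetical stand-in: the real template in `_pyodide/_base.py` is not shown in this thread, and `fresh_raise_stmt` is an invented name.

```python
import ast

# Hypothetical stand-in for the template; the real source used by Pyodide
# is not shown in this thread.
_TEMPLATE_SOURCE = "raise placeholder"


def fresh_raise_stmt():
    """Build a new Raise node by parsing source, without calling copy.deepcopy."""
    return ast.parse(_TEMPLATE_SOURCE).body[0]
```

Each call returns an independent `ast.Raise` node, so callers can mutate it freely, and no Python-level `deepcopy` recursion is involved.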

@rth
Member

rth commented Dec 21, 2021

@alexmojaki Could you try with the 0.19.0a1 released yesterday?

Maybe that was the case at the time, but it doesn't seem to be any more

Yes, I think the situation with Safari has been improving over time, particularly for 14.1 and later.

@alexmojaki
Contributor

It's not that easy to try, I only use Safari through Browserstack. It'd be great if Pyodide had an official demo of every version running in a web worker, even if it didn't support synchronous IO.

Reproduction of that stack trace involves running pyodide.eval_code("1/0") twice and maybe singing the right incantations to a full moon.

But what I'm asking is what changed in Pyodide between those earlier comments and 0.18.1. At least from this thread none of the links seem to indicate a change.

@rth
Member

rth commented Dec 21, 2021

In 0.18.0 there was #1699, which increased the possible recursion depth; in 0.19.0 there are even more improvements in that area. See https://blog.pyodide.org/posts/function-pointer-cast-handling/ for more details.

It'd be great if Pyodide had an official demo of every version running in a web worker,

Yes, we clearly need it. Related to #1498


8 participants