Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_compiler_recursion_limit fails on wasm32-wasi #95335

Closed
iritkatriel opened this issue Jul 27, 2022 · 20 comments
Closed

test_compiler_recursion_limit fails on wasm32-wasi #95335

iritkatriel opened this issue Jul 27, 2022 · 20 comments
Labels
OS-wasi tests Tests in the Lib/test dir

Comments

@iritkatriel
Copy link
Member

iritkatriel commented Jul 27, 2022

The problem showed up after we merged #95107
and a couple of test were disabled for WASI in #95296

With the patch in iritkatriel@4ce85e8 we get:

Here is a full WASM stack trace generated with

WASMTIME_BACKTRACE_DETAILS=1 \
  wasmtime run --env PYTHONPATH=/builddir/wasi/$(cat pybuilddir.txt) --mapdir /::../../ -- \
  python.wasm -m test -v test_compile -m test_compiler_recursion_limit

wasi_stacktrace.txt

test_compiler_recursion_limit (test.test_compile.TestSpecifics.test_compiler_recursion_limit) ... Error: failed to run main module `python.wasm`

Caused by:
    0: failed to invoke command default
    1: wasm trap: out of bounds memory access
       wasm backtrace:
           0: 0x15d8eb - compiler_visit_expr
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5868:37
           1: 0x15ee62 - compiler_call
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:4846:5
                     - compiler_visit_expr1
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5802:16
                     - compiler_visit_expr
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5870:15
           2: 0x15ee62 - compiler_call
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:4846:5
                     - compiler_visit_expr1
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5802:16
                     - compiler_visit_expr
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5870:15
           3: 0x15ee62 - compiler_call
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:4846:5
                     - compiler_visit_expr1
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5802:16
                     - compiler_visit_expr
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5870:15
           4: 0x15ee62 - compiler_call
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:4846:5
                     - compiler_visit_expr1
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5802:16
                     - compiler_visit_expr
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5870:15
           5: 0x15ee62 - compiler_call
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:4846:5
                     - compiler_visit_expr1
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5802:16
                     - compiler_visit_expr
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5870:15
           6: 0x15ee62 - compiler_call
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:4846:5
                     - compiler_visit_expr1
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5802:16
                     - compiler_visit_expr
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5870:15

Originally posted by @tiran in #93678 (comment)

@tiran tiran changed the title test_compiler_recursion_limit fails on WASM test_compiler_recursion_limit fails on wasm32-wasi Jul 27, 2022
@tiran
Copy link
Member

tiran commented Jul 27, 2022

  • wasmtime-cli 0.39.1
  • WASI-SDK 16.0 with clang version 14.0.4

@tiran
Copy link
Member

tiran commented Jul 27, 2022

To reproduce a similar problem before #95107, add sys.setrecursionlimit(1700) at the beginning of test_compiler_recursion_limit. The default recursion limit on WASI is 600. It will result in a call stack exhausted trap with 3,609 calls instead of a out of bounds memory access trap after 991 calls.

test_compiler_recursion_limit (test.test_compile.TestSpecifics.test_compiler_recursion_limit) ... Error: failed to run main module `python.wasm`

Caused by:
    0: failed to invoke command default
    1: wasm trap: call stack exhausted
       wasm backtrace:
           0: <unknown>!validate_keywords
           1: 0x15c9e4 - compiler_call
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:4844:9
                     - compiler_visit_expr1
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5813:16
                     - compiler_visit_expr
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5881:15
           2: 0x15cde6 - compiler_call
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:4857:5
                     - compiler_visit_expr1
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5813:16
                     - compiler_visit_expr
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5881:15
           3: 0x15cde6 - compiler_call
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:4857:5
                     - compiler_visit_expr1
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5813:16
                     - compiler_visit_expr
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5881:15
           4: 0x15cde6 - compiler_call
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:4857:5
                     - compiler_visit_expr1
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5813:16
                     - compiler_visit_expr
                           at /python-wasm/cpython/builddir/wasi/../../Python/compile.c:5881:15
...
         3602: 0x1b6c85 - pymain_run_module
                           at /python-wasm/cpython/builddir/wasi/../../Modules/main.c:300:14
         3603: 0x1b63f1 - pymain_run_python
                           at /python-wasm/cpython/builddir/wasi/../../Modules/main.c:604:21
                     - Py_RunMain
                           at /python-wasm/cpython/builddir/wasi/../../Modules/main.c:689:5
         3604: 0x1b70d7 - pymain_main
                           at /python-wasm/cpython/builddir/wasi/../../Modules/main.c:719:12
         3605: 0x1b7186 - Py_BytesMain
                           at /python-wasm/cpython/builddir/wasi/../../Modules/main.c:743:12
         3606: 0x4c85 - __main_argc_argv
                           at /python-wasm/cpython/builddir/wasi/../../Programs/python.c:15:12
         3607: 0x321d5b - <unknown>!__main_void
         3608: 0x4c69 - <unknown>!_start
         3609: 0x339092 - <unknown>!_start.command_export

@markshannon
Copy link
Member

This reminds me of #30855 (comment)

Fiddling with the code to keep the stack use below some arbitrary limit imposed by the compiler and platform is not the way to fix this or other bugs of the sort.

The proper fix is use some sort of C stack check: #91079

For now, I would recommend reducing the recursion limit in the compiler.

TBH, we shouldn't care if some really weird code f()()()()()()()... fails in the compiler, as long as it fails gracefully.

The only thing we need to handle is long chains of if: .. elifs, which should be handled iteratively to avoid consuming excessive stack.

@markshannon
Copy link
Member

Until we implement #91079 we should use the standard recursion limit counter in the compiler, otherwise we can crash on this code:

def f(n):
    if n: 
        f(n-1)
    else:
        compile_deeply_recursive_ast()
f(900)

As we consume most of the stack in f, then crash in the compiler.

@iritkatriel
Copy link
Member Author

I’m not actually sure my PR added function calls that get repeated in a recursion. I don’t really know what I’m my PR would have caused this.

@markshannon
Copy link
Member

It was probably passing locations by value, instead of by pointer, that increased the stack consumption.
But, as I've said, it could be any change that tips the stack use over the limit.

We need to reduce the recursion by using iterative approaches, not fiddle with the code.

@markshannon
Copy link
Member

Converting recursion into iteration is a pain. So, I'd just reduce the depth for everything but if/elifs and handle those iteratively.

@markshannon
Copy link
Member

@pablogsal might have thoughts on converting if/elifs into a list from a tree in the parser.

@iritkatriel
Copy link
Member Author

I’m not sure the locations are passed down deep recursions though.

@markshannon
Copy link
Member

Probably, not. But they still need stack space to be passed into the ADDOP...s

@tiran
Copy link
Member

tiran commented Jul 28, 2022

It is very well possible that your commit change some unrelated properties of the code that results in different optimizations.

I asked the wasmtime developers about stack limits in my ticket bytecodealliance/wasmtime#4214 .

Currently there is no way for a wasm module to determine its own stack limit or otherwise see how big stack frames are, so there's no way for the wasm module itself to detect what its recursion limit should be.

The stack limit also depends on the WASM runtime and possible other parameters. So far we have been testing with wasmtime only.

@pablogsal
Copy link
Member

@pablogsal might have thoughts on converting if/elifs into a list from a tree in the parser.

I personally not very enthusiastic but also don't mind much if we are going to make our life easier as that's an allowed change we can make but also will break everyone analyzing ASTs.

@markshannon
Copy link
Member

Let's not break everyone.

Can the parser produce the tree without being recursive?

@pablogsal
Copy link
Member

Can the parser produce the tree without being recursive?

I will look into it, but the parser is an RDP so it's very nature is to recurse. We have a prototype with a stack machine but it was a pain to debug, a bit slower and it was not really better so we never used it.

I am slightly confused, why do you want the parser not to recurse? The parser is not reaching any limits here, no?

@markshannon
Copy link
Member

The parser is not reaching any limits here, no?

Not yet, no, but if we change the the compiler to not recurse, then the parser will be the limiting factor.

This is really only an issue for if/elif statements. We can quite reasonable reject expressions hundreds deep, but we need to accept if/elifs with thousands of clauses.

What about a grammar change, so that the parser iterates?
It would still need post-processing to create the current AST, though.

@pablogsal
Copy link
Member

pablogsal commented Aug 1, 2022

What about a grammar change, so that the parser iterates?
It would still need post-processing to create the current AST, though.

It depends on what do you mean by "iterating". If you are referring to make the elif be represented as a list, then that's a simple change but we need to alter the AST. If you are referring that the parser needs to iterate over the clauses and construct a linear version and then the tree-version, then as I mentioned previously, in general, that goes strongly about the current parser design. Not only iterating is quite strange to the parser design but also we put quite a lot of effort to make the parser emit the final AST with no post-processing needed. I would prefer not to go this way.

If you want the parser to fully not recurse we need to bring back the prototype we had to make the parser work as a bytecode interpreter. That would be the best solution, but that's not without downsides, though.

In any case, there are even more limiting factors: the ast constructors, optimizers, visitors and validators will recurse and they will become the limiting factor because they currently use the C stack to go down. So even if you have a tree created without recursion, the fact that the tree is very deep will crash these other parts unless you also redesign them.

@pablogsal
Copy link
Member

pablogsal commented Aug 1, 2022

I think the best change here is to bite the bullet and change the ast so clauses are in a list instead of a nested tree. That would break everyone but it won't put a lot of crazy redesign burden on the interpreter.

Another alternative is just the status quo, which is suboptimal but it is not that bad. In my machine the parser handles a conditional with less than 10000 elifs with no problem s(although the parser fails and this is platform specific). More than that we raise MemoryError. I agree that having arbitrarily big chains of conditionals would be a great thing, but if the cost is redesigning several parts of the compiler pipeline, I don't think is worth the effort and the risk of introducing bugs, and behaviour changes and other stuff.

@markshannon
Copy link
Member

It depends on what do you mean by "iterating".

I meant changing the grammar, not the parser.
Something like (in vaguely EBNF syntax)

if_stmt:
   "if" cond ":" ( "elif" cond ":" clause )* ("else" ":" clause)

We could then post-process to convert back to the deeper tree the AST currently has.
The transformation is not complicated, but it does slow things down.

In my machine the parser handles a conditional with less than 10000 elifs with no problem

On linux, sure. But we want portability, which means controlling recursion.

@pablogsal
Copy link
Member

pablogsal commented Aug 1, 2022

Ah, I see what you mean. 👍

We could then post-process to convert back to the deeper tree the AST currently has.

I would be ok with that if the parser is the one doing the transformation (no-post processing needed). I argued before why is important that the parser produces the final AST with no post-processing needed. The key would be here how to avoid consuming C stack during the processing which I think should not be a problem. Something like (pseudocode):

current_node = root
for clause in clauses:
    current_node.elif = Elif_Node(clause)
    current_node = current_node.elif 

This will consume double memory for the elif chain because all nodes need to be alive at the same time.

The transformation is not complicated, but it does slow things down.

I suppose it depends on how slower this gets, but we should be aware that the tradeoff may not be worth it.


In any case, I think is going to be much much much better for everyone in the long run if we do something like you propose (( "elif" cond ":" clause )* ) but we just make the elif a list in the AST instead of use a tree branch. That would solve it for every single piece of the pipeline. The only downside is that it will break every AST tool, but this is a change we are allowed to make.

In any case, this problem exists for more things, for example:

x+x+x+x+x... thousands of times ... +x+x+x

also creates a tree that will exhaust the stack in the compiler, the parser and every single piece in the middle.

@iritkatriel iritkatriel added the tests Tests in the Lib/test dir label Nov 23, 2023
@brettcannon brettcannon added type-bug An unexpected behavior, bug, or error and removed type-bug An unexpected behavior, bug, or error labels Mar 1, 2024
@brettcannon
Copy link
Member

Since the tests are all passing on WASI at this point, I'm closing this as fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OS-wasi tests Tests in the Lib/test dir
Projects
None yet
Development

No branches or pull requests

5 participants