Deadlock involving dealloc #250

astoff · 2023-11-03T20:09:29Z

I'm facing a deadlock which I think cannot be avoided from the Python API alone. The situation is as follows:

A LuaRuntime is running on thread B.
It calls a Python function. The argument is a Lua object.
The Lua object gets diligently converted to a pure-Python data structure.
The Python data structure is passed to thread A.
Thread B blocks until thread A returns something.

I've traced the execution and the deadlock happens on this line of _LuaObject.__dealloc__:

Line 823 in 67ba194

locked = lock_runtime(self._runtime)

My assumption is that the "diligent conversion" step 3 generates some garbage that gets collected a bit later, when thread A is running. As evidence to this theory:

I haven't observed a deadlock yet if I call gc.collect(0) between steps 3 and 4.
I observe no deadlock if I remove the __dealloc__ method.

(As a data point for you, in case it is hard to find a fully general solution to this issue, my Lua runs are realtively short and I don't mind imperfect garbage collection as in the second point above.)

The text was updated successfully, but these errors were encountered:

scoder · 2023-11-04T11:39:21Z

Hmm, interesting case. That makes locking the runtime appear much less appealing inside of __dealloc__. It's difficult to avoid, though.

The lock that Lupa uses is re-entrant, so this rarely poses problems. It probably requires a setup as in your case to trigger it: passing Python objects over to other Lua threads inside of a Lua call.

One possible solution that comes to my mind is to request the lock in a non-blocking way in cases where we cannot afford waiting (in __dealloc__ specifically), and if we fail to acquire it, add the object to a list in the LuaRuntime and deallocate that list the next time right before we release the lock (i.e. when we definitely own it any way). That would probably keep some objects alive longer than necessary, and there might also be situations where this has other unexpected effects (we're dealing with threads here, so anything can happen, really), but it should still work as before in most cases and only behave differently in cases where it's currently dead-lock prone already. And probably provides an improvement in that case.

astoff · 2023-11-06T21:45:48Z

I agree with what you wrote, and I can contribute a test case which currently fails for me:

def test_lupa_gc_deadlock():

    def assert_no_deadlock(thread):
        thread.start()
        thread.join(1)
        assert not thread.is_alive(), "thread didn't finish"

    def trigger_gc(ref):
        del ref[0]

    lua = LuaRuntime()
    ref = [lua.eval("{}")]
    lua.execute(
        "f,x=...; f(x)",
        assert_no_deadlock,
        Thread(target=trigger_gc, args=[ref]),
    )

Closes #250

scoder · 2024-01-28T12:30:35Z

I started fixing this, but the test that you provided is still failing. Might just be a missing cleanup call somewhere.

See #255

scoder added a commit that referenced this issue Jan 28, 2024

Try to solve deadlock when deallocating Python objects in threads.

a0c8c1d

Closes #250

scoder mentioned this issue Jan 28, 2024

Try to solve deadlock when deallocating Python objects in threads #255

Merged

scoder closed this as completed in #255 May 31, 2024

scoder closed this as completed in 6029ef1 May 31, 2024

scoder added this to the 2.2 milestone May 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deadlock involving dealloc #250

Deadlock involving dealloc #250

astoff commented Nov 3, 2023

scoder commented Nov 4, 2023

astoff commented Nov 6, 2023

scoder commented Jan 28, 2024

Deadlock involving __dealloc__ #250

Deadlock involving __dealloc__ #250

Comments

astoff commented Nov 3, 2023

scoder commented Nov 4, 2023

astoff commented Nov 6, 2023

scoder commented Jan 28, 2024

Deadlock involving dealloc #250

Deadlock involving dealloc #250