Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock involving __dealloc__ #250

Closed
astoff opened this issue Nov 3, 2023 · 3 comments · Fixed by #255
Closed

Deadlock involving __dealloc__ #250

astoff opened this issue Nov 3, 2023 · 3 comments · Fixed by #255
Milestone

Comments

@astoff
Copy link

astoff commented Nov 3, 2023

I'm facing a deadlock which I think cannot be avoided from the Python API alone. The situation is as follows:

  1. A LuaRuntime is running on thread B.
  2. It calls a Python function. The argument is a Lua object.
  3. The Lua object gets diligently converted to a pure-Python data structure.
  4. The Python data structure is passed to thread A.
  5. Thread B blocks until thread A returns something.

I've traced the execution and the deadlock happens on this line of _LuaObject.__dealloc__:

locked = lock_runtime(self._runtime)

My assumption is that the "diligent conversion" step 3 generates some garbage that gets collected a bit later, when thread A is running. As evidence to this theory:

  • I haven't observed a deadlock yet if I call gc.collect(0) between steps 3 and 4.
  • I observe no deadlock if I remove the __dealloc__ method.

(As a data point for you, in case it is hard to find a fully general solution to this issue, my Lua runs are realtively short and I don't mind imperfect garbage collection as in the second point above.)

@scoder
Copy link
Owner

scoder commented Nov 4, 2023

Hmm, interesting case. That makes locking the runtime appear much less appealing inside of __dealloc__. It's difficult to avoid, though.

The lock that Lupa uses is re-entrant, so this rarely poses problems. It probably requires a setup as in your case to trigger it: passing Python objects over to other Lua threads inside of a Lua call.

One possible solution that comes to my mind is to request the lock in a non-blocking way in cases where we cannot afford waiting (in __dealloc__ specifically), and if we fail to acquire it, add the object to a list in the LuaRuntime and deallocate that list the next time right before we release the lock (i.e. when we definitely own it any way). That would probably keep some objects alive longer than necessary, and there might also be situations where this has other unexpected effects (we're dealing with threads here, so anything can happen, really), but it should still work as before in most cases and only behave differently in cases where it's currently dead-lock prone already. And probably provides an improvement in that case.

@astoff
Copy link
Author

astoff commented Nov 6, 2023

I agree with what you wrote, and I can contribute a test case which currently fails for me:

def test_lupa_gc_deadlock():

    def assert_no_deadlock(thread):
        thread.start()
        thread.join(1)
        assert not thread.is_alive(), "thread didn't finish"

    def trigger_gc(ref):
        del ref[0]

    lua = LuaRuntime()
    ref = [lua.eval("{}")]
    lua.execute(
        "f,x=...; f(x)",
        assert_no_deadlock,
        Thread(target=trigger_gc, args=[ref]),
    )

@scoder
Copy link
Owner

scoder commented Jan 28, 2024

I started fixing this, but the test that you provided is still failing. Might just be a missing cleanup call somewhere.

See #255

@scoder scoder closed this as completed in 6029ef1 May 31, 2024
@scoder scoder added this to the 2.2 milestone May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants