Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.1.0 causes multiprocessing workers to hang on exit #255

Open
reticulatedpines opened this issue Apr 11, 2023 · 4 comments
Open

1.1.0 causes multiprocessing workers to hang on exit #255

reticulatedpines opened this issue Apr 11, 2023 · 4 comments
Labels

Comments

@reticulatedpines
Copy link

Possible regression from 1.0.0, it seems that 1.1.0 doesn't let multiprocessing workers exit (even though they join). This means the main script also refuses to exit. Only tested on Linux (Debian Bullseye, and Fedora 37, identical behaviour).

I don't have much experience with vncdotool, multiprocessing or twisted, so I could easily be missing something obvious here. I do have a simple repro script though so it's hopefully easy to test. I tested in a venv with only these listed by freeze:

attrs==22.2.0
Automat==22.10.0
constantly==15.1.0
hyperlink==21.0.0
idna==3.4
incremental==22.10.0
Pillow==9.5.0
pycryptodomex==3.17
six==1.16.0
Twisted==22.10.0
typing-extensions==4.5.0
vncdotool==1.1.0
zope.interface==6.0

I didn't control for system packages, please let me know if that's relevant.

vncdotool version
1.1.0

VNC server and version
None required.

Steps to reproduce
Make a python3 venv, install vncdotool 1.1.0. Run the following repro script, it will fail to exit. Or fail to repro, let's find out :)
Downgrade to 1.0.0 and it should exit.

#!/usr/bin/env python3

import sys
import multiprocessing

import twisted
import vncdotool
from vncdotool import api

def test_worker():
    return

def main():
    print("vncdotool version: %s" % vncdotool.__version__)
    print("twisted version: %s" % twisted.__version__)

    with vncdotool.api.connect(":1234") as v:
        # no server required, ConnectionRefused still triggers the hang
        pass

    worker = multiprocessing.Process(target=test_worker)
    worker.start()
    worker.join()
    print("please can we exit now?")
    exit(0)
    # this hangs for me, with vncdotool==1.1.0, twisted==22.10.0
    # hangs with 1.1.0, 21.2.0
    # exits okay with 1.0.0 + 22.10.0
    # exits okay with 1.0.0 + 21.2.0

if __name__ == "__main__":
    # setting spawn start method here doesn't prevent the problem
    multiprocessing.set_start_method("spawn")

    main()

Expected result
Above repro script should exit.

Which erroneous result did you get instead
Script hangs just after final print statement.

Additional information
Possibly related to #188
But that lists using spawn start method as a workaround, and that doesn't help me.

@pmhahn
Copy link
Collaborator

pmhahn commented Apr 11, 2023

Using vncdotool.api creates a background-daemon-thread, which runs the Twisted Reactor: If not explicitly terminated by calling api.shutdown() it will keep running. While it is marked as a daemon thread, which should not prevent Python from terminating, but mixing threading with multi-processing leads to all kinds of obscure issues.

gdb -p 117319 --batch -ex 'thread apply all py-bt'

Thread 3 (Thread 0x7f3fa3fff700 (LWP 117322) "python3"):
Traceback (most recent call first):
  File "/usr/lib/python3.9/threading.py", line 312, in wait
    waiter.acquire()
  File "/usr/lib/python3.9/queue.py", line 171, in get
    self.not_empty.wait()
  File "vncdotool/venv/lib/python3.9/site-packages/twisted/_threads/_threadworker.py", line 46, in work
    for task in iter(queue.get, _stop):
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.9/threading.py", line 912, in _bootstrap
    self._bootstrap_inner()

Thread 2 (Thread 0x7f3fa8b2d700 (LWP 117321) "python3"):
Traceback (most recent call first):
  File "vncdotool/venv/lib/python3.9/site-packages/twisted/internet/epollreactor.py", line 227, in doPoll
    l = self._poller.poll(timeout, len(self._selectables))
  File "vncdotool/venv/lib/python3.9/site-packages/twisted/internet/base.py", line 1331, in mainLoop
    reactorBaseSelf.doIteration(t)
  File "vncdotool/venv/lib/python3.9/site-packages/twisted/internet/base.py", line 1318, in run
    self.mainLoop()
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.9/threading.py", line 912, in _bootstrap
    self._bootstrap_inner()

Thread 1 (Thread 0x7f3faab4c740 (LWP 117319) "python3"):
Traceback (most recent call first):
  File "/usr/lib/python3.9/threading.py", line 1428, in _shutdown
    lock.acquire()

Looking at the threads and their .name attribute:

  1. is the main thread, which is calling exit()
  2. is Twisted, the main reactor by vncdotool.api.connect()
  3. is PoolThread-twisted.internet.reactor-0 from twisted/python/threadpool.py; According to the code it inherits daemon from its parent thread 2.

PS: As stated above calling api.shutdown() before sys.exit(0) mitigates the issue, but we should find a real fix.

@reticulatedpines
Copy link
Author

Ah, great, thanks for the info and quick workaround! I didn't know where to even start debugging twisted + multiprocessing. I can confirm that shutdown() lets it exit cleanly for me here, too. I thought it was weird that join() still succeeded, makes sense now.

It's curious that 1.0.0 doesn't exhibit this behaviour, I've run 100s of tests and it's been consistent, so I'd guess not a race or similar. It's beyond my ability to diagnose, so I'll leave that up to you.

Possibly docstring for api.connect() could mention shutdown() as being wise to call before attempting to exit? I don't know if that's an appropriate place - I would have seen it, but who knows if I'm typical.

Maybe the api object could be made into a context manager too, then there's an obvious place to automatically call shutdown(), though you'd have to nest two contexts just to get a client which is kind of ugly. Since nobody is using api as a context manager currently it shouldn't break existing code.

    with vncdotool.api as a:
        with a.connect(":1234") as client:
            client.stuff()
        # client teardown happens here
    # api teardown here

I don't love it, but I guess it's not terrible looking.

Less likely to be appropriate, shutdown() could implicitly occur on leaving the context manager block for the connect()? Perhaps that's too surprising a side effect, since it's the client that leaves scope, not the api. More likely to break existing code.

Possibly, use atexit: https://docs.python.org/3/library/atexit.html
I have no experience using this, it reads like conceptually the right thing to do. Register the handler to call shutdown() whenever the main reactor is started. It's stdlib which is nice.

@reticulatedpines
Copy link
Author

Sadly for me, vncdotool.api.shutdown() doesn't help me in the larger code I was working on. Perhaps I found two different hang bugs. I'll try and work up another clean repro case.

@reticulatedpines
Copy link
Author

Found my mistake: in the more complex code, I had multiple processes each handling a VNC connection. I'd missed adding api.shutdown() to one exit route. Your workaround is still effective.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants