add support for dumping current threads

We are seeing some strange hangs in CI, e.g.

https://github.com/qtile/qtile/actions/runs/8239452558/job/22532678205

which prints lots of stacks, but they are only the test suite's stacks, not
qtile's. However, you can see the test suite waiting and eventually
SIGKILLing qtile:

        Killing qtile forcefully
        qtile exited with exitcode: -9

so clearly qtile is deadlocked somewhere, but we have no idea where.

Since Python 3.3, the faulthandler module can print the stack of every thread
in a running process (here, qtile) on demand, which might give us a clue where
it's deadlocked.
The output looks like:

    Thread 0x00007f5e1bfff640 (most recent call first):
      File "/usr/lib/python3.10/concurrent/futures/thread.py", line 81 in _worker
      File "/usr/lib/python3.10/threading.py", line 953 in run
      File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
      File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

    Thread 0x00007f5e2892a640 (most recent call first):
      File "/usr/lib/python3.10/concurrent/futures/thread.py", line 81 in _worker
      File "/usr/lib/python3.10/threading.py", line 953 in run
      File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
      File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

    Thread 0x00007f5e29b91640 (most recent call first):
      File "/usr/lib/python3.10/concurrent/futures/thread.py", line 81 in _worker
      File "/usr/lib/python3.10/threading.py", line 953 in run
      File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
      File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

    Thread 0x00007f5e29280640 (most recent call first):
      File "/usr/lib/python3.10/concurrent/futures/thread.py", line 81 in _worker
      File "/usr/lib/python3.10/threading.py", line 953 in run
      File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
      File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap

    Current thread 0x00007f5e2cebb1c0 (most recent call first):
      File "/usr/lib/python3.10/selectors.py", line 469 in select
      File "/usr/lib/python3.10/asyncio/base_events.py", line 1871 in _run_once
      File "/usr/lib/python3.10/asyncio/base_events.py", line 603 in run_forever
      File "/usr/lib/python3.10/asyncio/base_events.py", line 636 in run_until_complete
      File "/usr/lib/python3.10/asyncio/runners.py", line 44 in run
      File "/home/tycho/.local/lib/python3.10/site-packages/libqtile/core/manager.py", line 204 in loop
      File "/home/tycho/.local/lib/python3.10/site-packages/libqtile/scripts/start.py", line 109 in start
      File "/home/tycho/.local/lib/python3.10/site-packages/libqtile/scripts/main.py", line 83 in main
      File "/home/tycho/.local/bin/qtile", line 8 in <module>
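
The same kind of dump can be produced directly from Python. A minimal sketch using only the standard library; dump_all_threads and the blocked worker thread are hypothetical names for illustration. faulthandler.dump_traceback(all_threads=True) writes one block per thread, in the format shown above:

```python
import faulthandler
import tempfile
import threading


def dump_all_threads() -> str:
    """Return the current traceback of every Python thread as text."""
    # faulthandler writes straight to a file descriptor, so it needs an
    # object with a real fileno(); an io.StringIO will not work.
    with tempfile.TemporaryFile(mode="w+") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read()


if __name__ == "__main__":
    stop = threading.Event()
    # A worker blocked on an Event stands in for a thread stuck on a lock.
    worker = threading.Thread(target=stop.wait)
    worker.start()
    try:
        # Prints a "Thread 0x..." block for the worker and a
        # "Current thread 0x..." block for the main thread.
        print(dump_all_threads())
    finally:
        stop.set()
        worker.join()
```

Because the dump is written from C straight to a file descriptor, it does not depend on the stuck Python code making progress, which is what makes it useful for a deadlocked process.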

Hopefully this will help us debug some of these random failures.

Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
tych0 committed Mar 13, 2024
1 parent 04354d5 commit c4d3c6e
2 changed files with 13 additions and 1 deletion.

libqtile/scripts/main.py: 7 additions, 0 deletions
@@ -1,5 +1,7 @@
 import argparse
+import faulthandler
 import logging
+import signal
 import sys
 from pathlib import Path
 
@@ -24,6 +26,11 @@ def check_folder(value):
 
 
 def main():
+    faulthandler.enable(all_threads=True)
+    # This is a bit unfortunate. We use SIGUSR1&2 for reloading config &
+    # restarting qtile, so we overload SIGWINCH here to dump threads.
+    faulthandler.register(signal.SIGWINCH, all_threads=True)
+
     parent_parser = argparse.ArgumentParser(add_help=False)
     parent_parser.add_argument(
         "-l",
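
The two faulthandler calls added to main() can be tried on their own. A minimal sketch, assuming a Unix system since faulthandler.register() is not available on Windows; install_thread_dumper and capture_dump_on_signal are hypothetical helpers, not qtile APIs:

```python
import faulthandler
import os
import signal
import sys
import tempfile


def install_thread_dumper(signum=signal.SIGWINCH, file=sys.stderr):
    """Hypothetical helper mirroring the two calls added to main()."""
    # Dump all threads if the process dies on a fatal signal (SIGSEGV, ...).
    faulthandler.enable(file=file, all_threads=True)
    # Dump all threads on demand when `signum` arrives; the process then
    # keeps running.
    faulthandler.register(signum, file=file, all_threads=True)


def capture_dump_on_signal(signum=signal.SIGWINCH) -> str:
    """Register a dump handler on a temp file, signal ourselves, return the dump."""
    with tempfile.TemporaryFile(mode="w+") as f:
        faulthandler.register(signum, file=f, all_threads=True)
        try:
            # raise_signal() delivers to the calling thread, so the dump is
            # complete when it returns. From a shell, `kill -WINCH <pid>`
            # triggers the same handler in a running process.
            signal.raise_signal(signum)
        finally:
            # Restore whatever handler was installed before register().
            faulthandler.unregister(signum)
        f.seek(0)
        return f.read()


if __name__ == "__main__":
    install_thread_dumper()
    print(f"pid {os.getpid()}: dumping threads to stderr via SIGWINCH")
    os.kill(os.getpid(), signal.SIGWINCH)
```

The signal choice matters only in that it must not collide with signals the program already handles, which is why the comment in main() rules out SIGUSR1 and SIGUSR2.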

test/helpers.py: 6 additions, 1 deletion
@@ -9,6 +9,7 @@
 import logging
 import multiprocessing
 import os
+import signal
 import subprocess
 import sys
 import tempfile
@@ -227,10 +228,14 @@ def terminate(self):
             self.proc.join(10)
 
             if self.proc.is_alive():
+                # uh oh, we're hung somewhere. give it another second to print
+                # some stack traces
+                self.proc.join(1)
+                os.kill(self.proc.pid, signal.SIGWINCH)
                 print("Killing qtile forcefully", file=sys.stderr)
                 # desperate times... this probably messes with multiprocessing...
                 try:
-                    os.kill(self.proc.pid, 9)
+                    os.kill(self.proc.pid, signal.SIGKILL)
                     self.proc.join()
                 except OSError:
                     # The process may have died due to some other error
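
The escalation in terminate() can be reproduced in isolation. A minimal sketch, assuming a Unix system (SIGWINCH and faulthandler.register() are Unix-only) and the "fork" start method; hung_child, stop_with_dump and run_demo are hypothetical names, and the child simply ignores SIGTERM to stand in for a deadlock:

```python
import faulthandler
import multiprocessing
import os
import signal
import sys
import tempfile
import time


def hung_child(dump_path, ready):
    """Hypothetical stand-in for a deadlocked qtile process."""
    dump_file = open(dump_path, "w")
    # Same idea as the main.py change: dump every thread on SIGWINCH.
    faulthandler.register(signal.SIGWINCH, file=dump_file, all_threads=True)
    # Simulate a hang by refusing to exit on SIGTERM.
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    ready.set()
    while True:
        time.sleep(0.1)


def stop_with_dump(proc, grace=10.0, dump_wait=1.0):
    """SIGTERM, then SIGWINCH for a thread dump, then SIGKILL as a last resort."""
    proc.terminate()  # sends SIGTERM
    proc.join(grace)
    if proc.is_alive():
        # Still hung: ask faulthandler in the child to print its threads and
        # give it a moment to write them out before the final SIGKILL.
        os.kill(proc.pid, signal.SIGWINCH)
        proc.join(dump_wait)
        print("Killing child forcefully", file=sys.stderr)
        try:
            os.kill(proc.pid, signal.SIGKILL)
            proc.join()
        except OSError:
            pass  # the process already exited on its own
    return proc.exitcode


def run_demo(grace=0.5, dump_wait=0.5):
    """Start a hung child, stop it, and return (exitcode, dump text)."""
    ctx = multiprocessing.get_context("fork")
    with tempfile.TemporaryDirectory() as tmp:
        dump_path = os.path.join(tmp, "threads.txt")
        ready = ctx.Event()
        proc = ctx.Process(target=hung_child, args=(dump_path, ready))
        proc.start()
        ready.wait(5)  # make sure SIGTERM is ignored before we send it
        exitcode = stop_with_dump(proc, grace=grace, dump_wait=dump_wait)
        with open(dump_path) as f:
            return exitcode, f.read()


if __name__ == "__main__":
    code, dump = run_demo()
    print(f"exit code: {code}")
    print(dump)
```

This sketch sends SIGWINCH before the short join, so the child has that whole window to write its dump before SIGKILL arrives; the child writes to a file only so the dump can be inspected afterwards. multiprocessing reports death by signal N as exit code -N, so run_demo() returns -9, matching the "qtile exited with exitcode: -9" line above.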