Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"thread sanitizer" (tsan) CI job is hanging #118729

Closed
ericsnowcurrently opened this issue May 7, 2024 · 3 comments
Closed

"thread sanitizer" (tsan) CI job is hanging #118729

ericsnowcurrently opened this issue May 7, 2024 · 3 comments
Labels
3.13 bugs and security fixes topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@ericsnowcurrently
Copy link
Member

ericsnowcurrently commented May 7, 2024

Bug report

Bug description:

At least since yesterday, the "tsan" CI jobs (Thread sanitizer (free threading) / Thread sanitizer) have been regularly hanging on test_importlib. When that happens, the job must be cancelled and restarted, sometimes multiple times.

Example logs:

(expand)

(from https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712)

Run ./python -m test --tsan -j4
Using random seed: 1572451380
0:00:00 load avg: 1.28 Run 22 tests in parallel using 4 worker processes
0:00:04 load avg: 1.28 [ 1/22] test_capi.test_pyatomic passed
0:00:09 load avg: 1.49 [ 2/22] test_code passed
0:00:14 load avg: 1.70 [ 3/22] test_functools passed
0:00:17 load avg: 1.96 [ 4/22] test_capi.test_mem passed
0:00:30 load avg: 2.33 [ 5/22] test_httpservers passed -- running (1): test_enum (30.0 sec)
0:00:34 load avg: 2.33 [ 6/22] test_imaplib passed -- running (1): test_enum (34.5 sec)
0:00:47 load avg: 2.70 [ 7/22] test_enum passed (47.0 sec) -- running (1): test_importlib (30.1 sec)
/usr/bin/llvm-symbolizer-18: error: '[stack]': No such file or directory
0:01:17 load avg: 5.66 running (4): test_importlib (1 min), test_logging (42.5 sec), test_io (47.0 sec), test_queue (30.0 sec)
0:01:24 load avg: 5.53 [ 8/22] test_logging passed (49.2 sec) -- running (3): test_importlib (1 min 6 sec), test_io (53.7 sec), test_queue (36.7 sec)
/home/runner/work/cpython/cpython/Lib/multiprocessing/popen_fork.py:67: DeprecationWarning: This process (pid=8735) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
0:01:54 load avg: 5.73 running (3): test_importlib (1 min 36 sec), test_io (1 min 23 sec), test_queue (1 min 6 sec)
0:02:19 load avg: 5.66 [ 9/22] test_signal passed (55.7 sec) -- running (3): test_importlib (2 min 2 sec), test_io (1 min 49 sec), test_queue (1 min 32 sec)
0:02:37 load avg: 5.18 [[10](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:11)/22] test_io passed (2 min 7 sec) -- running (2): test_importlib (2 min 20 sec), test_queue (1 min 50 sec)
/usr/bin/llvm-symbolizer-18: error: '[stack]': No such file or directory
0:02:43 load avg: 5.00 [[11](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:12)/22] test_queue passed (1 min 56 sec) -- running (1): test_importlib (2 min 26 sec)
/usr/bin/llvm-symbolizer-18: error: '[stack]': No such file or directory
0:02:49 load avg: 5.00 [[12](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:13)/22] test_sqlite3 passed -- running (1): test_importlib (2 min 32 sec)
0:02:55 load avg: 4.85 [[13](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:14)/22] test_syslog passed -- running (2): test_importlib (2 min 37 sec), test_socket (35.2 sec)
/usr/bin/llvm-symbolizer-18: error: '[stack]': No such file or directory
0:03:01 load avg: 4.70 [[14](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:15)/22] test_thread passed -- running (2): test_importlib (2 min 43 sec), test_socket (41.4 sec)
0:03:04 load avg: 4.70 [[15](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:16)/22] test_threadedtempfile passed -- running (2): test_importlib (2 min 47 sec), test_socket (45.0 sec)
/usr/bin/llvm-symbolizer-18: error: '[stack]': No such file or directory
0:03:08 load avg: 4.88 [[16](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:17)/22] test_ssl passed -- running (2): test_importlib (2 min 51 sec), test_socket (49.1 sec)
0:03:16 load avg: 4.59 [[17](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:18)/22] test_threading_local passed -- running (2): test_importlib (2 min 59 sec), test_socket (56.9 sec)
0:03:24 load avg: 4.38 [[18](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:19)/22] test_threadsignals passed -- running (2): test_importlib (3 min 7 sec), test_socket (1 min 4 sec)
0:03:43 load avg: 4.21 [[19](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:20)/22] test_socket passed (1 min 23 sec) -- running (2): test_importlib (3 min 25 sec), test_threading (38.3 sec)
0:04:13 load avg: 3.86 running (3): test_importlib (3 min 55 sec), test_threading (1 min 8 sec), test_weakref (48.3 sec)
0:04:22 load avg: 3.80 [[20](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:21)/22] test_threading passed (1 min 18 sec) -- running (2): test_importlib (4 min 5 sec), test_weakref (58.1 sec)
0:04:52 load avg: 3.17 running (2): test_importlib (4 min 35 sec), test_weakref (1 min 28 sec)
0:04:55 load avg: 3.08 [[21](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:22)/22] test_weakref passed (1 min 30 sec) -- running (1): test_importlib (4 min 37 sec)
0:05:25 load avg: 2.26 running (1): test_importlib (5 min 7 sec)
0:05:55 load avg: 1.76 running (1): test_importlib (5 min 37 sec)
0:06:25 load avg: 1.46 running (1): test_importlib (6 min 7 sec)
0:06:55 load avg: 1.28 running (1): test_importlib (6 min 37 sec)
0:07:25 load avg: 1.23 running (1): test_importlib (7 min 7 sec)
0:07:55 load avg: 1.15 running (1): test_importlib (7 min 38 sec)
0:08:25 load avg: 1.09 running (1): test_importlib (8 min 8 sec)
0:08:55 load avg: 1.05 running (1): test_importlib (8 min 38 sec)
0:09:25 load avg: 1.03 running (1): test_importlib (9 min 8 sec)
0:09:55 load avg: 1.08 running (1): test_importlib (9 min 38 sec)
0:10:25 load avg: 1.05 running (1): test_importlib (10 min 8 sec)
0:10:55 load avg: 1.03 running (1): test_importlib (10 min 38 sec)
0:11:25 load avg: 1.02 running (1): test_importlib (11 min 8 sec)
0:11:55 load avg: 1.01 running (1): test_importlib (11 min 38 sec)
0:12:25 load avg: 1.00 running (1): test_importlib (12 min 8 sec)
0:12:55 load avg: 1.00 running (1): test_importlib (12 min 38 sec)
0:13:25 load avg: 1.00 running (1): test_importlib (13 min 8 sec)
0:13:55 load avg: 1.00 running (1): test_importlib (13 min 38 sec)
0:14:25 load avg: 1.05 running (1): test_importlib (14 min 8 sec)
0:14:55 load avg: 1.03 running (1): test_importlib (14 min 38 sec)
0:15:25 load avg: 1.02 running (1): test_importlib (15 min 8 sec)
0:15:55 load avg: 1.01 running (1): test_importlib (15 min 38 sec)
0:16:25 load avg: 1.00 running (1): test_importlib (16 min 8 sec)
0:16:55 load avg: 1.00 running (1): test_importlib (16 min 38 sec)
0:17:25 load avg: 1.00 running (1): test_importlib (17 min 8 sec)
0:17:55 load avg: 1.00 running (1): test_importlib (17 min 38 sec)
0:18:25 load avg: 1.00 running (1): test_importlib (18 min 8 sec)
0:18:55 load avg: 1.00 running (1): test_importlib (18 min 38 sec)
0:19:25 load avg: 1.06 running (1): test_importlib (19 min 8 sec)
0:19:55 load avg: 1.04 running (1): test_importlib (19 min 38 sec)
0:20:25 load avg: 1.02 running (1): test_importlib (20 min 8 sec)
0:20:55 load avg: 1.01 running (1): test_importlib (20 min 38 sec)
0:21:25 load avg: 1.01 running (1): test_importlib (21 min 8 sec)
0:21:55 load avg: 1.00 running (1): test_importlib (21 min 38 sec)
0:[22](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:23):25 load avg: 1.00 running (1): test_importlib (22 min 8 sec)
0:22:55 load avg: 1.00 running (1): test_importlib (22 min 38 sec)
0:23:25 load avg: 1.00 running (1): test_importlib (23 min 8 sec)
0:[23](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:24):55 load avg: 1.06 running (1): test_importlib (23 min 38 sec)
0:[24](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:25):25 load avg: 1.04 running (1): test_importlib (24 min 8 sec)
0:24:55 load avg: 1.02 running (1): test_importlib (24 min 38 sec)
0:[25](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:26):25 load avg: 1.01 running (1): test_importlib (25 min 8 sec)
0:25:55 load avg: 1.01 running (1): test_importlib (25 min 38 sec)
0:26:25 load avg: 1.07 running (1): test_importlib ([26](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:27) min 8 sec)
0:26:55 load avg: 1.04 running (1): test_importlib (26 min 38 sec)
0:27:25 load avg: 1.02 running (1): test_importlib ([27](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:28) min 8 sec)
0:27:55 load avg: 1.01 running (1): test_importlib (27 min 38 sec)
0:[28](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:29):25 load avg: 1.06 running (1): test_importlib (28 min 8 sec)
0:28:55 load avg: 1.10 running (1): test_importlib (28 min 38 sec)
0:[29](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:30):25 load avg: 1.06 running (1): test_importlib (29 min 8 sec)
0:29:55 load avg: 1.11 running (1): test_importlib (29 min 38 sec)
0:[30](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:31):25 load avg: 1.07 running (1): test_importlib (30 min 8 sec)
0:30:55 load avg: 1.04 running (1): test_importlib (30 min 38 sec)
0:[31](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:32):25 load avg: 1.02 running (1): test_importlib (31 min 8 sec)
0:31:55 load avg: 1.01 running (1): test_importlib (31 min 38 sec)
0:[32](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:33):25 load avg: 1.08 running (1): test_importlib (32 min 8 sec)
0:32:55 load avg: 1.10 running (1): test_importlib (32 min 38 sec)
0:[33](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:34):25 load avg: 1.06 running (1): test_importlib (33 min 8 sec)
0:33:55 load avg: 1.09 running (1): test_importlib (33 min 38 sec)
0:[34](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:35):25 load avg: 1.05 running (1): test_importlib (34 min 8 sec)
0:34:55 load avg: 1.08 running (1): test_importlib (34 min 38 sec)
0:[35](https://github.com/python/cpython/actions/runs/8987721571/job/24686950755?pr=118712#step:12:36):25 load avg: 1.05 running (1): test_importlib (35 min 8 sec)
0:35:55 load avg: 1.08 running (1): test_importlib (35 min 38 sec)
Error: The operation was canceled.

Logs from a successful run (test_importlib finishes at a bit over a minute):

(expand)

(from https://github.com/python/cpython/actions/runs/8979326734/job/24661801478)

Run ./python -m test --tsan -j4
Using random seed: 1782869376
0:00:00 load avg: 1.17 Run 22 tests in parallel using 4 worker processes
0:00:04 load avg: 1.40 [ 1/22] test_capi.test_pyatomic passed
0:00:08 load avg: 1.61 [ 2/22] test_code passed
0:00:14 load avg: 1.80 [ 3/22] test_functools passed
0:00:16 load avg: 1.80 [ 4/22] test_capi.test_mem passed
0:00:29 load avg: 2.21 [ 5/22] test_httpservers passed
0:00:34 load avg: 2.43 [ 6/22] test_imaplib passed -- running (1): test_enum (33.9 sec)
0:00:45 load avg: 2.67 [ 7/22] test_enum passed (44.7 sec)
LLVMSymbolizer: error reading file: No such file or directory
0:01:15 load avg: 5.14 running (4): test_importlib (58.3 sec), test_logging (40.8 sec), test_io (45.7 sec), test_queue (30.0 sec)
0:01:24 load avg: 5.96 [ 8/22] test_logging passed (50.6 sec) -- running (3): test_importlib (1 min 8 sec), test_io (55.5 sec), test_queue (39.8 sec)
/home/runner/work/cpython/cpython/Lib/multiprocessing/popen_fork.py:67: DeprecationWarning: This process (pid=6705) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
0:01:31 load avg: 6.04 [ 9/22] test_importlib passed (1 min 15 sec) -- running (2): test_io (1 min 2 sec), test_queue (46.7 sec)
0:02:01 load avg: 5.99 running (3): test_signal (36.9 sec), test_io (1 min 32 sec), test_queue (1 min 16 sec)
0:02:19 load avg: 6.27 [[10](https://github.com/python/cpython/actions/runs/8979326734/job/24661801478#step:12:11)/22] test_signal passed (54.9 sec) -- running (3): test_socket (48.0 sec), test_io (1 min 50 sec), test_queue (1 min 34 sec)
0:02:32 load avg: 6.47 [[11](https://github.com/python/cpython/actions/runs/8979326734/job/24661801478#step:12:12)/22] test_sqlite3 passed -- running (3): test_socket (1 min), test_io (2 min 2 sec), test_queue (1 min 47 sec)
0:02:35 load avg: 6.83 [[12](https://github.com/python/cpython/actions/runs/8979326734/job/24661801478#step:12:13)/22] test_io passed (2 min 5 sec) -- running (2): test_socket (1 min 3 sec), test_queue (1 min 50 sec)
LLVMSymbolizer: error reading file: No such file or directory
0:02:41 load avg: 8.93 [[13](https://github.com/python/cpython/actions/runs/8979326734/job/24661801478#step:12:14)/22] test_queue passed (1 min 56 sec) -- running (1): test_socket (1 min 9 sec)
LLVMSymbolizer: error reading file: No such file or directory
0:02:41 load avg: 8.93 [[14](https://github.com/python/cpython/actions/runs/8979326734/job/24661801478#step:12:15)/22] test_syslog passed -- running (1): test_socket (1 min 9 sec)
LLVMSymbolizer: error reading file: No such file or directory
0:02:45 load avg: 8.61 [[15](https://github.com/python/cpython/actions/runs/8979326734/job/24661801478#step:12:16)/22] test_threadedtempfile passed -- running (1): test_socket (1 min 13 sec)
0:02:47 load avg: 8.61 [[16](https://github.com/python/cpython/actions/runs/8979326734/job/24661801478#step:12:17)/22] test_thread passed -- running (1): test_socket (1 min 15 sec)
0:02:53 load avg: 7.67 [[17](https://github.com/python/cpython/actions/runs/8979326734/job/24661801478#step:12:18)/22] test_threading_local passed -- running (1): test_socket (1 min 21 sec)
0:02:57 load avg: 7.67 [[18](https://github.com/python/cpython/actions/runs/8979326734/job/24661801478#step:12:19)/22] test_ssl passed -- running (1): test_socket (1 min 25 sec)
0:03:00 load avg: 7.21 [[19](https://github.com/python/cpython/actions/runs/8979326734/job/24661801478#step:12:20)/22] test_socket passed (1 min 28 sec)
0:03:00 load avg: 7.21 [[20](https://github.com/python/cpython/actions/runs/8979326734/job/24661801478#step:12:21)/22] test_threadsignals passed
0:03:30 load avg: 5.24 running (2): test_weakref (33.6 sec), test_threading (45.8 sec)
0:03:46 load avg: 4.52 [[21](https://github.com/python/cpython/actions/runs/8979326734/job/24661801478#step:12:22)/22] test_threading passed (1 min 1 sec) -- running (1): test_weakref (49.8 sec)
0:04:15 load avg: 3.21 [22/22] test_weakref passed (1 min 18 sec)

== Tests result: SUCCESS ==

All [22](https://github.com/python/cpython/actions/runs/8979326734/job/24661801478#step:12:23) tests OK.

Total duration: 4 min 15 sec
Total tests: run=5,876 skipped=162
Total test files: run=22/22
Result: SUCCESS

I noticed this with my one of my PRs, gh-118157, which makes some changes related to importing extension modules. However, I don't know if that PR is at fault. I definitely don't have much insight into what might be causing the hangs.

CC @colesbury

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Linked PRs

@ericsnowcurrently ericsnowcurrently added type-bug An unexpected behavior, bug, or error 3.13 bugs and security fixes topic-free-threading labels May 7, 2024
@colesbury
Copy link
Contributor

I think it's due to #118727. test_importlib.py runs pool_in_threads.py, which may crash. When it crashes it can hang the job (possibly due to issues with faulthandler; I'm not 100% sure).

@colesbury
Copy link
Contributor

We may want to add a skip for test_multiprocessing_pool_circular_import, which runs pool_in_threads.py, until the underlying issue is fixed.

colesbury added a commit to colesbury/cpython that referenced this issue May 7, 2024
…_import`

The `pool_in_threads.py` test file may crash in free-threaded builds,
which can lead to the Tsan test hanging. Skip it for now until we fix
the underlying issue.
Yhg1s pushed a commit that referenced this issue May 7, 2024
…t` (#118732)

The `pool_in_threads.py` test file may crash in free-threaded builds,
which can lead to the Tsan test hanging. Skip it for now until we fix
the underlying issue.
SonicField pushed a commit to SonicField/cpython that referenced this issue May 8, 2024
…_import` (python#118732)

The `pool_in_threads.py` test file may crash in free-threaded builds,
which can lead to the Tsan test hanging. Skip it for now until we fix
the underlying issue.
@colesbury
Copy link
Contributor

The bug was fixed and the test was reenabled in #118745.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes topic-free-threading type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants