Commit c99d475

Document SharedMemory × subint_forkserver incompat
New `ai/conc-anal/` doc: `mp.SharedMemory` is fork-without-exec unsafe. The child inherits the parent's `resource_tracker` fd → EBADF on the first shm op, and a leaked `/shm_list` cascades `FileExistsError` across parametrize variants. Canonical CPython issue class, NOT a tractor bug. Includes two longer-term mitigation paths (reset the inherited tracker fd vs. migrate off `mp.shared_memory`).

Also, update `tests/test_shm.py`:
- comment out `subint_forkserver` from the skip list
- rewrite the reason with precise failure-mode descriptions + a link to the analysis doc

(this patch was generated in some part by [`claude-code`][claude-code-gh])

[claude-code-gh]: https://github.com/anthropics/claude-code
1 parent 6d76b60 commit c99d475

2 files changed

Lines changed: 134 additions & 5 deletions


ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md

Lines changed: 125 additions & 0 deletions

@@ -0,0 +1,125 @@
# `subint_forkserver` × `multiprocessing.SharedMemory`: incompatible-by-mp-design

Surfaced by `tests/test_shm.py` under
`--spawn-backend=subint_forkserver`. Both test functions
fail, with distinct symptoms that share one root cause:
**`multiprocessing.resource_tracker` is fork-without-exec
unsafe.**

## TL;DR

`mp.shared_memory.SharedMemory` registers each shm
allocation with the per-process
`multiprocessing.resource_tracker` singleton. The
tracker itself is a separate daemon process, launched
lazily on first use, and the parent holds the write end
of a pipe to it (`_resource_tracker._fd`). When the
parent forks without exec-ing into a `subint_forkserver`
child, the child inherits that fd, but the fd refers
to the *parent's* tracker, which the child has no
business writing to.
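
A minimal sketch of the inheritance, assuming a plain `os.fork()` is a fair stand-in for the forkserver child (no tractor involved; POSIX only). It reads the CPython-private `resource_tracker._resource_tracker._fd` attribute, which is not a stable API; `tracker_fd_in_forked_child()` is a hypothetical name for illustration.

```python
import os
from multiprocessing import resource_tracker, shared_memory


def tracker_fd_in_forked_child() -> tuple[int, int]:
    """Return (parent's tracker fd, tracker fd seen by a forked child).

    `_resource_tracker._fd` is CPython-private; illustrative only.
    """
    # Creating a segment registers it, which lazily launches the tracker.
    shm = shared_memory.SharedMemory(create=True, size=64)
    try:
        parent_fd = resource_tracker._resource_tracker._fd
        r, w = os.pipe()
        pid = os.fork()  # fork WITHOUT exec, like a forkserver child
        if pid == 0:
            try:
                os.close(r)
                # No new tracker is launched: the child just holds a
                # copy of the parent's pipe fd (same number).
                child_fd = resource_tracker._resource_tracker._fd
                os.write(w, str(child_fd).encode())
            finally:
                # Exit immediately so the child never runs the shm
                # cleanup in the outer `finally`.
                os._exit(0)
        os.close(w)
        child_fd = int(os.read(r, 64))
        os.close(r)
        os.waitpid(pid, 0)
        return parent_fd, child_fd
    finally:
        shm.close()
        shm.unlink()
```

Both values come back equal: the forked child holds a copy of the parent's tracker pipe rather than a tracker of its own.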

Two manifestations:

1. **`test_child_attaches_alot`**: the child calls
   `attach_shm_list()` 1000× in a loop. The first
   `mp.SharedMemory` call in the child triggers
   `resource_tracker._ensure_running_and_write` →
   `_teardown_dead_process` → `os.close(self._fd)` on
   an fd the child should never have touched. This
   surfaces as `OSError: [Errno 9] Bad file descriptor`
   wrapped in `tractor.RemoteActorError`.

2. **`test_parent_writer_child_reader[*]`**: the first
   parametrize variant "passes" (with a
   `resource_tracker: leaked shared_memory` warning)
   because nobody ever cleans up `/shm_list`.
   Subsequent variants then fail with
   `FileExistsError: '/shm_list'` because the leak
   persists across the parametrize loop and forkserver
   children can't `SharedMemory(create=True)` (i.e.
   `shm_open` with `O_CREAT | O_EXCL`) an existing
   key. The trio backend doesn't surface this because
   each subactor `exec`s a fresh interpreter →
   independent resource tracker per subactor → no
   inherited-fd issue, and the test's pre-existing
   leak is masked because each subactor's own tracker
   unlinks whatever that subactor leaked when it exits.
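
The second failure mode can be reproduced in miniature without any fork: a segment that is closed but never unlinked stays in the shm namespace, so a later `create=True` on the same name raises `FileExistsError`. A sketch, where `unlink_stale_shm()` and `leak_then_recreate()` are hypothetical helpers; a per-test fixture calling something like `unlink_stale_shm()` before each parametrize variant is an assumption, not something `tests/test_shm.py` does today.

```python
from multiprocessing import shared_memory


def unlink_stale_shm(name: str) -> bool:
    """Hypothetical cleanup helper: unlink a leaked segment if one exists."""
    try:
        stale = shared_memory.SharedMemory(name=name)  # attach, don't create
    except FileNotFoundError:
        return False
    stale.close()
    stale.unlink()  # removes the name and unregisters it from the tracker
    return True


def leak_then_recreate(name: str) -> tuple[bool, bool]:
    """Simulate one leaking run, then a second `create=True` on the same key.

    Returns (second create was blocked, stale segment was cleaned up).
    """
    leaked = shared_memory.SharedMemory(name=name, create=True, size=64)
    leaked.close()  # drop our mapping but never unlink: the segment persists
    try:
        dup = shared_memory.SharedMemory(name=name, create=True, size=64)
    except FileExistsError:
        blocked = True
    else:  # not expected on POSIX; clean up rather than leak a second handle
        dup.close()
        blocked = False
    recovered = unlink_stale_shm(name)
    return blocked, recovered
```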

## Why the trio backend works

Under `--spawn-backend=trio`, each subactor is born
via `python -m tractor._child` (full `execve`) →
fresh interpreter → fresh module-level globals →
`mp.resource_tracker._resource_tracker._fd` is `None`
until first use → `mp.SharedMemory` launches the
subactor's own tracker and talks to its own pipe-fd.
No cross-process fd inheritance.

Under `subint_forkserver`, the child is
`os.fork()`'d from a worker thread of the parent
(no `exec`) → inherits the parent's
`mp.resource_tracker._resource_tracker._fd` →
EBADF / cross-talk on the first `mp.SharedMemory`
operation in the child.
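
For contrast, a freshly exec'd interpreter starts with no tracker connection at all. This sketch assumes `subprocess.run([sys.executable, "-c", ...])` is a reasonable stand-in for the trio backend's `python -m tractor._child` spawn; it again reads the CPython-private `_fd` attribute, and `tracker_fd_in_exec_child()` is a hypothetical name.

```python
import subprocess
import sys

# Executed in a brand-new interpreter: module globals start from scratch.
PROBE = (
    "from multiprocessing import resource_tracker\n"
    "print(resource_tracker._resource_tracker._fd)\n"
)


def tracker_fd_in_exec_child() -> str:
    """Return the exec'd child's `_resource_tracker._fd`, as it printed it."""
    out = subprocess.run(
        [sys.executable, "-c", PROBE],
        capture_output=True,
        text=True,
        check=True,
    )
    return out.stdout.strip()
```

The child reports `None`, so its first `mp.SharedMemory` call would launch a tracker of its own. `subprocess` also closes non-standard fds in the child by default (`close_fds=True`), so nothing could be inherited anyway.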

## Status

**Not a tractor bug.** This is the canonical
"fork-without-exec breaks `multiprocessing`
internals" class; see CPython issues:

- https://bugs.python.org/issue38119
- https://bugs.python.org/issue45209

Fork-without-exec hazards are also why `mp` itself moved
its default start method off pure `fork`: to `spawn` on
macOS (since 3.8) and to `forkserver` on Linux and other
non-macOS POSIX platforms (as of 3.14).

## Mitigation

`tests/test_shm.py` carries a module-level
`pytest.mark.skipon_spawn_backend(...)` mark whose
`reason=` describes the `subint_forkserver` failure
modes and points at this doc.

Two longer-term options if we ever want shm tests under
`subint_forkserver`:

1. **Reset the inherited tracker fd in the child
   prelude.**
   `tractor/spawn/_subint_forkserver.py::_child_target`
   already calls `_close_inherited_fds()`. We could
   additionally reset the inherited state of
   `multiprocessing.resource_tracker._resource_tracker`
   (its `_fd`/`_pid`) so the child launches a fresh
   tracker on its first shm op. Resetting the attributes,
   rather than replacing the singleton object, matters
   because the module-level `register`/`unregister`
   helpers are bound to that instance at import time.
   **Caveat**: each forkserver-subactor then spawns its
   own resource-tracker daemon process, multiplying the
   daemon-proc count by the subactor count. The mp
   authors deliberately avoided this; the tracker is
   meant to be a per-mp-context singleton.

2. **Stop using `multiprocessing.shared_memory`**:
   migrate to `posix_ipc` directly (no resource
   tracker), or finish the `hotbaud`-based ringbuf
   transport that already supersedes shm in many
   `tractor` IPC paths.

Neither is in scope for the
`subint_forkserver`-backend-lands PR; both are tracked
as future work.
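
Option 1 might look roughly like the following sketch. `reset_inherited_resource_tracker()` and `tracker_pids_across_fork()` are hypothetical (not existing tractor or CPython APIs); they poke the CPython-private `_fd`/`_pid` attributes of `resource_tracker._resource_tracker`, which are not guaranteed stable across releases, and a plain `os.fork()` stands in for the forkserver child.

```python
import os
from multiprocessing import resource_tracker, shared_memory


def reset_inherited_resource_tracker() -> None:
    """HYPOTHETICAL child-prelude hook (not a tractor or CPython API).

    Makes a forked child forget the parent's tracker so its next shm op
    lazily launches a fresh, child-owned tracker. Uses private state.
    """
    tracker = resource_tracker._resource_tracker
    # Deliberately not closing `tracker._fd`: in tractor's prelude,
    # `_close_inherited_fds()` may already have closed it, and that fd
    # number could since have been reused for an unrelated file.
    tracker._fd = None
    tracker._pid = None


def tracker_pids_across_fork() -> tuple[int, int]:
    """Return (parent's tracker pid, tracker pid the forked child uses)."""
    resource_tracker.ensure_running()
    parent_pid = resource_tracker._resource_tracker._pid
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: fork without exec, as under `subint_forkserver`
        code = 1
        try:
            os.close(r)
            reset_inherited_resource_tracker()
            # With `_fd` cleared, registering this segment launches a new tracker.
            shm = shared_memory.SharedMemory(create=True, size=64)
            os.write(w, str(resource_tracker._resource_tracker._pid).encode())
            shm.close()
            shm.unlink()
            code = 0
        finally:
            os._exit(code)
    os.close(w)
    child_tracker_pid = int(os.read(r, 64))
    os.close(r)
    os.waitpid(pid, 0)
    return parent_pid, child_tracker_pid
```

The child ends up reporting a different tracker pid than its parent, which is exactly the per-subactor daemon multiplication described in the caveat.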

## Reproducer

```sh
# fail mode 1 (EBADF on resource_tracker._fd):
./py314/bin/python -m pytest \
    tests/test_shm.py::test_child_attaches_alot \
    --spawn-backend=subint_forkserver --tb=short

# fail mode 2 (FileExistsError on /shm_list):
./py314/bin/python -m pytest \
    tests/test_shm.py::test_parent_writer_child_reader \
    --spawn-backend=subint_forkserver

# baseline (passes):
./py314/bin/python -m pytest \
    tests/test_shm.py --spawn-backend=trio
```

tests/test_shm.py

Lines changed: 9 additions & 5 deletions
@@ -16,14 +16,18 @@

 pytestmark = pytest.mark.skipon_spawn_backend(
     'subint',
-    'subint_forkserver',
+    # 'subint_forkserver',
     reason=(
         'subint: GIL-contention hanging class.\n'
         'subint_forkserver: `multiprocessing.SharedMemory` '
-        'has known issues with fork-without-exec (mp\'s '
-        'resource_tracker and SharedMemory internals assume '
-        'fresh-process state). RemoteActorError surfaces from '
-        'the shm-attach path. TODO, put issue link!\n'
+        'is fork-without-exec unsafe — child inherits parent\'s '
+        '`resource_tracker` fd → EBADF on first shm op '
+        '(`test_child_attaches_alot`); leaked `/shm_list` from '
+        'a "passing" run cascades into `FileExistsError` across '
+        'parametrize variants (`test_parent_writer_child_reader`). '
+        'Canonical CPython issue class, NOT a tractor bug; full '
+        'tracker doc:\n'
+        'ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md'
     )
 )
2933
