| 1 | +# `subint_forkserver` × `multiprocessing.SharedMemory`: incompatible-by-mp-design |
| 2 | + |
| 3 | +Surfaced by `tests/test_shm.py` under |
| 4 | +`--spawn-backend=subint_forkserver`. Both test functions |
| 5 | +fail with distinct symptoms that share one root cause: |
| 6 | +**`multiprocessing.resource_tracker` is fork-without-exec |
| 7 | +unsafe.** |
| 8 | + |
| 9 | +## TL;DR |
| 10 | + |
| 11 | +`mp.shared_memory.SharedMemory` registers each shm |
| 12 | +allocation with the per-process |
| 13 | +`multiprocessing.resource_tracker` singleton. The |
| 14 | +tracker is a daemon process started lazily, and the |
| 15 | +parent owns a unix-pipe-fd to it. When the parent |
| 16 | +forks-without-execing into a `subint_forkserver` |
| 17 | +child, the child inherits that fd — but the fd refers |
| 18 | +to the *parent's* tracker, which the child has no |
| 19 | +business writing to. |
| 20 | + |
| 21 | +Two manifestations: |
| 22 | + |
| 23 | +1. **`test_child_attaches_alot`** — child loops 1000× |
| 24 | + `attach_shm_list()`. First `mp.SharedMemory` call |
| 25 | + in the child triggers |
| 26 | + `resource_tracker._ensure_running_and_write` → |
| 27 | + `_teardown_dead_process` → `os.close(self._fd)` on |
| 28 | + an fd the child should never have touched. Surfaces |
| 29 | + as `OSError: [Errno 9] Bad file descriptor` |
| 30 | + wrapped in `tractor.RemoteActorError`. |
| 31 | + |
| 32 | +2. **`test_parent_writer_child_reader[*]`** — first |
| 33 | + parametrized variant "passes" (with a |
| 34 | + `resource_tracker: leaked shared_memory` warning) |
| 35 | + because nobody ever cleans up `/shm_list`. |
| 36 | + Subsequent variants then fail with |
| 37 | + `FileExistsError: '/shm_list'` because the leak |
| 38 | + persists across the parametrize loop and forkserver |
| 39 | + children can't create (`create=True`) a segment under an |
| 40 | + existing key. The trio backend doesn't surface this because |
| 41 | + each subactor `exec`s a fresh interpreter → |
| 42 | + independent resource tracker per subactor → no |
| 43 | + inherited-fd issue, and the test's pre-existing |
| 44 | + leak is masked by the per-process tracker reset. |
| 45 | + |
| 46 | +## Why trio backend works |
| 47 | + |
| 48 | +Under `--spawn-backend=trio`, each subactor is born |
| 49 | +via `python -m tractor._child` (full `execve`) → |
| 50 | +fresh interpreter → fresh module-level globals → |
| 51 | +`mp.resource_tracker._resource_tracker` is `None` |
| 52 | +until first use → `mp.SharedMemory` constructs its |
| 53 | +own tracker, talks to its own pipe-fd. No cross- |
| 54 | +process fd inheritance. |
| 55 | + |
| 56 | +Under `subint_forkserver`, the child is |
| 57 | +`os.fork()`'d from a worker thread of the parent |
| 58 | +(no `exec`) → inherits parent's |
| 59 | +`mp.resource_tracker._resource_tracker._fd` → |
| 60 | +EBADF / cross-talk on first `mp.SharedMemory` |
| 61 | +operation in the child. |
| 62 | + |
| 63 | +## Status |
| 64 | + |
| 65 | +**Not a tractor bug.** This is the canonical |
| 66 | +"fork-without-exec breaks `multiprocessing` |
| 67 | +internals" class — see CPython issues: |
| 68 | + |
| 69 | +- https://bugs.python.org/issue38119 |
| 70 | +- https://bugs.python.org/issue45209 |
| 71 | + |
| 72 | +The pure-`fork` start method has the same class of problem; |
| 73 | +that's partly why `mp` itself defaults to `spawn` on macOS |
| 74 | +(since 3.8) and to `forkserver` on Linux as of 3.14. |
| 75 | + |
| 76 | +## Mitigation |
| 77 | + |
| 78 | +`tests/test_shm.py` is module-marked with |
| 79 | +`pytest.mark.skipon_spawn_backend('subint_forkserver', |
| 80 | +'subint', reason=...)` pointing at this doc. |
| 81 | + |
| 82 | +Two longer-term options if we ever want shm tests under |
| 83 | +`subint_forkserver`: |
| 84 | + |
| 85 | +1. **Reset the inherited tracker fd in the child |
| 86 | + prelude** — |
| 87 | + `tractor/spawn/_subint_forkserver.py::_child_target` |
| 88 | + already calls `_close_inherited_fds()`. We could |
| 89 | + additionally reset |
| 90 | + `multiprocessing.resource_tracker._resource_tracker` |
| 91 | + so the child re-creates a fresh tracker on first |
| 92 | + shm op. **Caveat**: this means each |
| 93 | + forkserver-subactor spawns its own resource-tracker |
| 94 | + daemon-process, multiplying daemon-proc count by |
| 95 | + subactor count. mp authors deliberately avoided |
| 96 | + this — the tracker is meant to be a per-mp-context |
| 97 | + singleton. |
| 98 | + |
| 99 | +2. **Stop using `multiprocessing.shared_memory`** — |
| 100 | + migrate to `posix_ipc` directly (no resource |
| 101 | + tracker) or finish the `hotbaud`-based ringbuf |
| 102 | + transport that already supersedes shm in many |
| 103 | + `tractor` IPC paths. |
| 104 | + |
| 105 | +Neither is in scope for the |
| 106 | +`subint_forkserver`-backend-lands PR; both are tracked |
| 107 | +as future work. |
| 108 | + |
| 109 | +## Reproducer |
| 110 | + |
| 111 | +```sh |
| 112 | +# fail mode 1 (EBADF on resource_tracker._fd): |
| 113 | +./py314/bin/python -m pytest \ |
| 114 | + tests/test_shm.py::test_child_attaches_alot \ |
| 115 | + --spawn-backend=subint_forkserver --tb=short |
| 116 | + |
| 117 | +# fail mode 2 (FileExistsError on /shm_list): |
| 118 | +./py314/bin/python -m pytest \ |
| 119 | + tests/test_shm.py::test_parent_writer_child_reader \ |
| 120 | + --spawn-backend=subint_forkserver |
| 121 | + |
| 122 | +# baseline (passes): |
| 123 | +./py314/bin/python -m pytest \ |
| 124 | + tests/test_shm.py --spawn-backend=trio |
| 125 | +``` |
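
Mitigation option 1 above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: `reset_inherited_resource_tracker` is a hypothetical helper name, and `_fd` / `_pid` are private CPython attributes of the module-level `ResourceTracker` instance that may change between versions. It resets the instance's state rather than rebinding the module global, because `resource_tracker.register` and its siblings are bound methods of the original instance. It also assumes the inherited descriptor itself is dealt with elsewhere in the child (for example by `_close_inherited_fds()`).

```python
from multiprocessing import resource_tracker


def reset_inherited_resource_tracker() -> None:
    """Make a fork-without-exec child forget the parent's tracker.

    Hypothetical helper; relies on private CPython attributes.
    """
    tracker = resource_tracker._resource_tracker
    # Drop the inherited pipe fd and tracker pid without closing
    # anything here: the fd is assumed to be handled by the caller.
    tracker._fd = None
    tracker._pid = None
```

In a child prelude this would run once, before any `SharedMemory` use, so that the next registration launches a fresh tracker. The caveat from option 1 still applies: each child then spawns its own tracker daemon.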