Commit 7cd47ef

Doc ruled-out fix + capture-pipe aside
Two new sections in `subint_forkserver_test_cancellation_leak_issue.md` documenting continued investigation of the `test_nested_multierrors[subint_forkserver]` peer-channel-loop hang:

1. **"Attempted fix (DID NOT work) — hypothesis (3)"**: tried sync-closing peer channels' raw socket fds from `_serve_ipc_eps`'s finally block (iterate `server._peers`, `_chan._transport.stream.socket.close()`). Theory was that sync close would propagate as `EBADF` / `ClosedResourceError` into the stuck `recv_some()` and unblock it. Result: identical hang. Either trio holds an internal fd reference that survives external close, or the stuck recv isn't even the root blocker. Either way: ruled out, experiment reverted, skip-mark restored.

2. **"Aside: `-s` flag changes behavior for peer-intensive tests"**: noticed `test_context_stream_semantics.py` under `subint_forkserver` hangs with default `--capture=fd` but passes with `-s` (`--capture=no`). Working hypothesis: subactors inherit pytest's capture pipe (fds 1 and 2, which `_close_inherited_fds` deliberately preserves); verbose subactor logging fills the buffer, writes block, deadlock. Fix direction (if confirmed): redirect subactor stdout/stderr to `/dev/null` or a file in `_actor_child_main`. Not a blocker on the main investigation; deserves its own mini-tracker.

Both sections are diagnosis-only; no code changes in this commit.

(this patch was generated in some part by [`claude-code`][claude-code-gh])

[claude-code-gh]: https://github.com/anthropics/claude-code
1 parent 76d1206 commit 7cd47ef

1 file changed

Lines changed: 56 additions & 0 deletions

ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md

@@ -395,6 +395,62 @@ Candidate follow-up experiments:
re-raise means it should still exit. Unless
something higher up swallows it.

### Attempted fix (DID NOT work) — hypothesis (3)

Tried: in `_serve_ipc_eps` finally, after closing
listeners, also iterate `server._peers` and
sync-close each peer channel's underlying stream
socket fd:

```python
for _uid, _chans in list(server._peers.items()):
    for _chan in _chans:
        try:
            _stream = _chan._transport.stream if _chan._transport else None
            if _stream is not None:
                _stream.socket.close()  # sync fd close
        except (AttributeError, OSError):
            pass
```

Theory: closing the socket fd from outside the stuck
recv task would make the recv see `EBADF` /
`ClosedResourceError` and unblock.

Result: `test_nested_multierrors[subint_forkserver]`
still hangs identically. Either:
- The sync `socket.close()` doesn't propagate into
  trio's in-flight `recv_some()` the way I expected
  (trio may hold an internal reference that keeps the
  fd open even after an external close), or
- The stuck recv isn't even the root blocker and the
  peer handlers never reach the finally for some
  reason I haven't understood yet.

Either way, the sync-close hypothesis is **ruled
out**. Reverted the experiment, restored the skip-mark
on the test.

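A possible mechanism behind the first bullet, sketched with the stdlib only. Assumptions, none of them verified against this hang: the platform is Linux (where trio's I/O loop is epoll-based), and the closed fd was the last reference to the socket. Under those conditions the kernel silently drops the fd from the epoll interest list, so a task waiting for readability gets no `EBADF` and no hangup event; it just never wakes. `events_after_external_close` is a hypothetical helper name, not anything in the codebase.

```python
import select
import socket


def events_after_external_close(timeout: float = 0.2) -> list:
    """Register a socket with epoll, close it, then poll (Linux-only)."""
    a, b = socket.socketpair()
    ep = select.epoll()
    try:
        ep.register(a.fileno(), select.EPOLLIN)
        # Sync close "from outside", analogous to the experiment's
        # `_stream.socket.close()`. `a` held the only fd for this socket,
        # so the kernel removes it from the epoll interest list.
        a.close()
        # Nothing is reported for the closed fd: a waiter is never woken.
        return ep.poll(timeout)
    finally:
        ep.close()
        b.close()
```

If this is what happens, a follow-up worth trying might be waking the waiter explicitly before the close; trio documents `trio.lowlevel.notify_closing()` for that purpose. That is an untested idea, and it does nothing for the second bullet's scenario.
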
### Aside: `-s` flag changes behavior for peer-intensive tests

While exploring, noticed
`tests/test_context_stream_semantics.py` under
`--spawn-backend=subint_forkserver` hangs with
pytest's default `--capture=fd` but passes with
`-s` (`--capture=no`). Hypothesis (unverified): fork
children inherit pytest's capture pipe for stdout/stderr
(fds 1 and 2; we preserve these in
`_close_inherited_fds`). When subactor logging is
verbose, the capture pipe buffer fills, writes block,
child can't progress, deadlock.

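Two small stdlib helpers that might help confirm or refute this. Both names are hypothetical. `fd_kind` reports what an fd actually points at, so it could be called on fds 1 and 2 inside a subactor to check whether they really are pipes under `--capture=fd`. `bytes_until_pipe_blocks` only illustrates the general mechanism the hypothesis relies on (a pipe nobody drains eventually stops accepting writes); it says nothing about what pytest's capture fds are.

```python
import os
import stat


def fd_kind(fd: int) -> str:
    """Classify what an open fd refers to."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "char-device"  # e.g. a tty or /dev/null
    if stat.S_ISREG(mode):
        return "regular-file"
    return "other"


def bytes_until_pipe_blocks(chunk: int = 4096) -> int:
    """Write to a pipe that nobody reads until the kernel buffer is full."""
    r, w = os.pipe()
    os.set_blocking(w, False)  # raise BlockingIOError instead of hanging
    written = 0
    try:
        while True:
            try:
                written += os.write(w, b"x" * chunk)
            except BlockingIOError:
                # A blocking writer (e.g. a logging call) would be stuck here.
                return written
    finally:
        os.close(r)
        os.close(w)
```

If `fd_kind(1)` in the child turns out not to be `"pipe"`, the buffer-fill explanation would need revisiting before any fix is attempted.
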
If confirmed, fix direction: redirect subactor
stdout/stderr to `/dev/null` (or a file) in
`_actor_child_main` so subactors don't hold pytest's
capture pipe open. Not a blocker on the main
peer-chan-loop investigation; deserves its own
mini-tracker.

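A minimal sketch of what that redirect could look like, assuming it runs early in the child before any logging. `redirect_fds` is a hypothetical helper, not existing code, and where exactly it would be called from `_actor_child_main` is left open.

```python
import os


def redirect_fds(fds=(1, 2), target=os.devnull) -> None:
    """Point each fd in `fds` at `target`, dropping what it referred to before."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    target_fd = os.open(target, flags, 0o644)
    try:
        for fd in fds:
            # dup2 atomically closes the old `fd` (e.g. an inherited
            # capture fd) and makes it an alias of `target_fd`.
            os.dup2(target_fd, fd)
    finally:
        # Keep the fd if os.open happened to hand back one of the targets.
        if target_fd not in fds:
            os.close(target_fd)
```

Because the redirect happens at the fd level, Python's `sys.stdout` / `sys.stderr` and any logging handlers writing to them follow it without further changes. The cost is that subactor output is lost unless `target` is a real file.
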
## Stopgap (landed)

`test_nested_multierrors` skip-marked under
