# Surviving fork() #1614
I'm a newbie when it comes to the inner workings of the garbage collector, so apologies if this is way off, but …
@Harrison88 Yeah, the problem is that I only want to freeze a specific set of objects, not all of them. All these options involve compromises so it's still worth considering, but that's the downside. I also asked here, and didn't get any better answers so far: https://stackoverflow.com/questions/62387453/how-can-i-intentionally-leak-an-object-so-that-it-will-never-ever-be-garbage-co

Anyway, long story short, it sounds like the best we can do is to use a ctypes hack on CPython, and on PyPy rely on the fact that they don't try to collect module-level globals during shutdown; if any other Python VMs show up, then we'll figure out what to do with them then.
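For reference, the ctypes hack discussed in that Stack Overflow question can be sketched roughly like this. This is a CPython-only trick, and `leak` is a hypothetical helper name, not anything Trio ships:

```python
import ctypes

def leak(obj):
    # CPython-only: bump the object's reference count through the C API.
    # Nothing ever decrements it again, so the object (and anything it
    # references) can never be garbage collected, not even during
    # interpreter shutdown.
    ctypes.pythonapi.Py_IncRef(ctypes.py_object(obj))
```

The appeal is that, unlike stashing objects in a module global, this survives shutdown-time collection of module globals; the obvious downside is that it only works on CPython.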
if fork detection is implemented, what will happen if after forking one continues calling asynchronous methods, such as …? it seems that …

is this not safe enough for trio purposes?
I think the goal is that after forking, the child will effectively stop being in trio and switch back to synchronous mode. There's no way to let both processes continue calling …. But, if we do things properly, it should be possible to call ….

That point about …. But now that I think about it, shutting down without ….

Documenting that forked children must call ….

In principle, with enough work, I guess we could make it so that if the task that called fork completes, its outcome gets propagated out of the child's copy of the original ….

Maybe the best option then is to adjust the yield and task-finished paths to check for fork, and if detected then print a message and call ….
I guess the other option is to provide some API for forking, that takes care of calling ….

Oh, another thing we need to figure out what to do with in the child is our ….
Actually, maybe this is fine, because we can use ….

I always wondered why we had a ….
but this isn't a trio problem, is it? even if i get a shiny new trio context, i can still shoot myself in the leg by trying to read that socket in both processes. what if i don't do that? for example, this code seems to work exactly like you'd expect:

```python
import trio
import os

async def print_after(text, delay):
    await trio.sleep(delay)
    print(text)

async def main():
    print("hello")
    async with trio.open_nursery() as nursery:
        name = "child" if not os.fork() else "parent"
        nursery.start_soon(print_after, f"hello from {name}", 1)
        nursery.start_soon(print_after, f"bye from {name}", 2)

trio.run(main)
```

is this code dangerous in some way? perhaps the I/O thing that's used under the hood gets shared between the processes in a way?
if i add an object with a finalizer to the above code,

```python
class Foo:
    def __del__(self):
        print("finalizing foo")

foo = Foo()
```

i'll be getting two `finalizing foo` messages. in other words, if a process-launching library doesn't prevent finalizers from running in the child, it's bound to cause problems regardless of whether the code is using trio or not. this would be a problem with that library, not trio, wouldn't it?
Yes, exactly – I mean things like, process A calls ….
Running finalizers twice isn't always bad... often it's what you'd want. But in Trio we have a very specific issue:
My solution would be to go through the process's open file descriptors and close them all on the OS level (well, except for stdin/out/err and the one we opened to communicate between parent and child). Then send some descendant of ….

Thus, our high-level interface could be along the lines of ….
@smurfix if the user calls ….

The goal here isn't to add a high-level ….
I'm not sure I'm qualified to write here, but to me @smurfix's suggestion looks reasonable if the closing of FDs is limited to FDs that trio owns at the time of …. Something like:
I realize that this does invoke the destructors that @njsmith doesn't wanna call, but I don't see this as such a big problem: ….

With all that said, it may not be worth it at all… For instance ….
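The OS-level half of that suggestion might look something like this sketch. Both the function name and the idea that the caller can enumerate "FDs that trio owns" are assumptions for illustration; the point is that `os.close` releases the descriptor in the kernel without running any Python-level `close()` or `__del__` logic:

```python
import os

def close_fds_in_child(trio_owned_fds, keep=()):
    # Close the parent's descriptors at the OS level, without touching
    # the Python objects that wrap them -- so no Python-level close()
    # or finalizer logic runs; the kernel just releases the descriptors.
    for fd in trio_owned_fds:
        if fd in keep:
            continue
        try:
            os.close(fd)
        except OSError:
            pass  # already closed, or not valid in this process
```

The `keep` set would hold stdin/stdout/stderr and any parent-child communication descriptor, as suggested above.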
Of course you are!
I guess another option we could consider would be to raise an error from a pre-fork hook, so that libraries that try to use ….

I'm not 100% certain that pre-fork hooks can raise an exception; we'd need to check that.
Ah, turns out that exceptions in at-fork handlers are ignored:

```pycon
>>> os.register_at_fork(before=lambda: 1/0)
>>> os.fork()
Exception ignored in: <function <lambda> at 0x7f39ca5760d0>
Traceback (most recent call last):
  File "<stdin>", line 1, in <lambda>
ZeroDivisionError: division by zero
326598
>>> 0
```

So never mind that.
Originally posted by @oakkitten in #608 (comment)
It would certainly be nicer if this worked! (Though note, there is a workaround for multiprocessing in the mean time: use `set_start_method("spawn")` or `set_start_method("forkserver")`.)

So, how can Trio survive if you call `os.fork()` inside a `trio.run()`? Which breaks down into two questions really: (1) how do we detect when a `fork` has happened? (2) once we've detected it, how do we clean things up in the nicest way possible?

## Detection
The simplest approach is to save the value of `os.getpid()` when the runner is created, and then whenever we access the runner we check whether `os.getpid() == the_saved_value`. If our PID changes, then we're in a new process. This is what asyncio does.

Doing this everywhere that we access `GLOBAL_RUN_STATE` does add some overhead. (Contrary to what you may have heard, with recent glibc on Linux `getpid()` always does a full syscall, and syscalls have gotten more expensive in these days of meltdown/spectre/etc.)

The other option is to use `os.register_at_fork`. This totally removes the overhead. The downsides are that it's 3.7+ only. And, in theory, it's not quite as robust (it relies on the code that calls `fork()` also informing the Python interpreter that it has called `fork()`). But these limitations aren't too terrible. 3.6 is at the trailing edge of our support window, and we'll drop support within the next few years; in the mean time, telling folks who need `fork()` survival to upgrade to 3.7+ seems reasonable. And there are a lot of reasons why calling fork and then trying to continue running Python code will go badly if you don't tell the Python interpreter what you're doing; it's not just Trio that can break.

So my feeling is we should probably start with `os.register_at_fork`. And maybe consider adding a `getpid()` check at the top of `trio.run` as a just-in-case belt-and-suspenders kind of thing?

## Cleaning up afterwards
So the obvious thing we need to do is to clear the child's `GLOBAL_RUN_CONTEXT`, so that Trio functions in the child won't try to mess with the parent's state.

There's also a question about what to do with the old run state. We could try to clean it up, by closing things. We could drop references to it, and let the GC try to clean it up. Or we could intentionally leak it.

Of these, I think leaking is really the only viable option... there are a few bits of state that we could plausibly clean up (e.g. closing the epoll/kqueue/IOCP handle), but the big thing is all the tasks and their stacks and any `__del__` methods they might have. Trying to do explicit clean up on these will have unpredictable results, since it requires running arbitrary user code, that's definitely not expecting to be run inside a forked child. (I guess in theory we could reclaim the memory while avoiding running any destructors, but this would still leak all the descriptors etc., and anyway Python doesn't give us any way to do that.)

This is also why if you use `fork()` in a process with multiple threads, all the other thread stacks are just leaked: there's nothing else you can reasonably do with them.

So I guess what we want to do is just stash the old run state into a global somewhere, so it's pinned in memory until the process exits?
I'm actually not 100% sure how to reliably convince Python to never garbage collect an object graph. During shutdown, Python does try to collect module globals, and I'm not finding any great docs on that...
Maybe spawn a daemon thread that holds the pinned references on its stack, and then have it sleep forever? Weird gross hack but might work...
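That hack could look something like this (a sketch; `pin_forever` is a made-up name):

```python
import threading
import time

def pin_forever(*objects):
    # Hold references to `objects` on a daemon thread's stack. The thread
    # never exits, so its frame (and everything it references) is never
    # torn down -- including during the interpreter's shutdown-time
    # collection of module globals. Being a daemon thread, it doesn't
    # block process exit either.
    def sleep_holding_refs(refs):
        while True:
            time.sleep(3600)

    thread = threading.Thread(
        target=sleep_holding_refs, args=(objects,), daemon=True
    )
    thread.start()
    return thread
```

Weird and gross, as advertised, but it sidesteps the module-globals problem without needing any CPython-specific C API tricks.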