Surviving fork() #1614

Open · njsmith opened this issue Jun 14, 2020 · 13 comments

@njsmith (Member) commented Jun 14, 2020

I ran into [the "no calling run inside run" error message] while testing with pytest-trio. I'm launching another process that uses Trio via multiprocessing, but since I'm launching it from an async fixture, I get this error. I suppose the issue is that multiprocessing forks, so the context remains.

Originally posted by @oakkitten in #608 (comment)

It would certainly be nicer if this worked! (Though note, there is a workaround for multiprocessing in the meantime: use set_start_method("spawn") or set_start_method("forkserver").)
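
A minimal sketch of that workaround (the worker function here is hypothetical; set_start_method is the real multiprocessing API):

import multiprocessing

def worker():
    # Under "spawn" or "forkserver" this runs in a fresh interpreter,
    # so there is no inherited Trio run state and trio.run() works.
    import trio
    trio.run(trio.sleep, 0)

if __name__ == "__main__":
    # Must be called before any Process or Pool is created.
    multiprocessing.set_start_method("spawn")  # or "forkserver" on Unix
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()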

So, how can Trio survive if you call os.fork() inside a trio.run()? This really breaks down into two questions: (1) how do we detect that a fork has happened? (2) once we've detected it, how do we clean things up in the nicest way possible?

Detection

The simplest approach is to save the value of os.getpid() when the runner is created, and then whenever we access the runner we check whether os.getpid() == the_saved_value. If our PID changes, then we're in a new process. This is what asyncio does.
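
A minimal sketch of that check (the names below are hypothetical, not Trio's actual internals):

import os

GLOBAL_RUN_STATE = object()   # stand-in for the real run state
_runner_pid = os.getpid()     # saved when the runner is created

def current_run_state():
    # Every access re-checks the PID: a mismatch means os.fork() has
    # happened since the runner was created, so the saved state
    # belongs to the parent process.
    if os.getpid() != _runner_pid:
        raise RuntimeError("run state belongs to a different process")
    return GLOBAL_RUN_STATE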

Doing this everywhere that we access GLOBAL_RUN_STATE does add some overhead. (Contrary to what you may have heard, with recent glibc on Linux getpid() always does a full syscall, and syscalls have gotten more expensive in these days of meltdown/spectre/etc.)

The other option is to use os.register_at_fork. This removes the overhead entirely. The downsides are that it's 3.7+ only, and that in theory it's not quite as robust (it relies on the code that calls fork() also informing the Python interpreter that it has called fork()). But these limitations aren't too terrible. 3.6 is at the trailing edge of our support window, and we'll drop support within the next few years; in the meantime, telling folks who need fork() survival to upgrade to 3.7+ seems reasonable. And there are a lot of reasons why calling fork() and then trying to continue running Python code will go badly if you don't tell the Python interpreter what you're doing; it's not just Trio that can break.

So my feeling is we should probably start with os.register_at_fork. And maybe consider adding a getpid() check at the top of trio.run as a just-in-case belt-and-suspenders kind of thing?
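
Roughly, the register_at_fork version might look like this sketch (the callback body is a placeholder; the real one would clear the run context as discussed below):

import os

def _after_fork_in_child():
    # Runs in the child immediately after fork() -- but only if the
    # fork went through Python's wrappers, hence the robustness
    # caveat above.
    print("fork detected; would clear the inherited run context here")

os.register_at_fork(after_in_child=_after_fork_in_child)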

Cleaning up afterwards

So the obvious thing we need to do is to clear the child's GLOBAL_RUN_CONTEXT, so that Trio functions in the child won't try to mess with the parent's state.

There's also a question about what to do with the old run state. We could try to clean it up, by closing things. We could drop references to it, and let the GC try to clean it up. Or we could intentionally leak it.

Of these, I think leaking is really the only viable option... there are a few bits of state that we could plausibly clean up (e.g. closing the epoll/kqueue/IOCP handle), but the big thing is all the tasks and their stacks and any __del__ methods they might have. Trying to do explicit cleanup on these will have unpredictable results, since it requires running arbitrary user code that's definitely not expecting to be run inside a forked child. (I guess in theory we could reclaim the memory while avoiding running any destructors, but this would still leak all the descriptors etc., and anyway Python doesn't give us any way to do that.)

This is also why if you use fork() in a process with multiple threads, all the other thread stacks are just leaked: there's nothing else you can reasonably do with them.

So I guess what we want to do is just stash the old run state into a global somewhere, so it's pinned in memory until the process exits?

I'm actually not 100% sure how to reliably convince Python to never garbage collect an object graph. During shutdown, Python does try to collect module globals, and I'm not finding any great docs on that...

Maybe spawn a daemon thread that holds the pinned references on its stack, and then have it sleep forever? Weird gross hack but might work...
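
Something like this sketch:

import threading

def _pin_forever(pinned):
    # 'pinned' stays referenced by this daemon thread's stack frame
    # until the process exits, so it can never be collected.
    threading.Event().wait()  # sleep forever

def leak(old_run_state):
    threading.Thread(
        target=_pin_forever, args=(old_run_state,), daemon=True
    ).start()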

@Harrison88

I'm a newbie when it comes to the inner workings of the garbage collector, so apologies if this is way off, but gc.freeze() sounds like what you're talking about for preventing collection of an object graph: https://docs.python.org/3/library/gc.html#gc.freeze
Or do you need to be able to freeze only that one object, and not every object currently tracked by the garbage collector?
This blog post about Instagram's dealing with the gc also seems relevant: https://instagram-engineering.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172
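
For reference, gc.freeze() is all-or-nothing over whatever the collector currently tracks:

import gc

# Moves every currently-tracked object into a permanent generation
# that the collector never scans; there is no per-object variant.
gc.freeze()
print(gc.get_freeze_count())  # number of objects now frozen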

@njsmith (Member, Author) commented Jun 18, 2020

@Harrison88 Yeah, the problem is that I only want to freeze a specific set of objects, not all of them. All these options involve compromises so it's still worth considering, but that's the downside.

I also asked here, and didn't get any better answers so far: https://stackoverflow.com/questions/62387453/how-can-i-intentionally-leak-an-object-so-that-it-will-never-ever-be-garbage-co

Anyway, long story short, it sounds like the best we can do is to use a ctypes hack on CPython, and on PyPy rely on the fact that they don't try to collect module-level globals during shutdown, and if any other Python VMs show up then we'll figure out what to do with them then.
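
The ctypes hack boils down to manually bumping an object's refcount so it can never drop to zero (CPython-only by construction):

import ctypes

def leak(obj):
    # Add a reference that nothing will ever release: the object (and
    # everything reachable from it) now survives every collection
    # pass, including interpreter shutdown.
    ctypes.pythonapi.Py_IncRef(ctypes.py_object(obj))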

@oakkitten

If fork detection is implemented, what will happen if, after forking, one continues calling asynchronous methods such as await trio.sleep(1) in both the child and the parent? It seems to work for simple cases; should I expect this to blow up?

...it seems that multiprocessing uses os._exit(), which forgoes calling __del__:

Exit the process with status n, without calling cleanup handlers, flushing stdio buffers, etc.

Note: The standard way to exit is sys.exit(n). _exit() should normally only be used in the child process after a fork().

Is this not safe enough for trio purposes?
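
For concreteness, the pattern the multiprocessing docs are describing is (minimal POSIX-only sketch):

import os

pid = os.fork()
if pid == 0:
    # Child: do its work, then exit without running __del__ methods,
    # atexit handlers, or stdio flushing -- roughly what
    # multiprocessing does.
    os._exit(0)
os.waitpid(pid, 0)  # parent continues normally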

@njsmith (Member, Author) commented Jun 19, 2020

I think the goal is that after forking, the child will effectively stop being in trio and switch back to synchronous mode. There's no way to let both processes continue calling await trio.whatever(), because after a fork they end up with parts of their I/O state shared, and parts not shared – so for example you might try to read from a socket in one process, but then the other process gets the data. Better if trying to await in the child just gives you a straightforward error up front.

But, if we do things properly, it should be possible to call trio.run again in the child process to start a new independent event loop, and then of course you'd be able to await there.

That point about _exit is interesting, actually... It's helpful that multiprocessing uses it, since that makes supporting multiprocessing easier; but I was thinking that we want to support more than just multiprocessing, so we shouldn't rely on details like this.

But now that I think about it, shutting down without _exit is actually not trivial! It's easy enough to disable trio functionality in the child so it acts like it's in synchronous mode. But we'll still have trio's coroutine driver on the call stack, so eventually we'll either have to kill the program via _exit or else unwind through the trio core code.

Documenting that forked children must call _exit would certainly be the simplest solution. But I do prefer to minimize the number of places where the docs have to sternly insist you do things a certain way; users are only human, and we all make mistakes.

In principle, with enough work, I guess we could make it so that if the task that called fork completes, its outcome gets propagated out of the child's copy of the original trio.run. (sort of promoting that task to become the main task.) This sounds pretty messy though. And I'm not sure it's really useful... I think most forks will be buried behind some abstraction layer (like multiprocessing), so when the child finishes, it has no way of knowing how far away the actual end of the task is – it might be way down the task's call stack.

Maybe the best option then is to adjust the yield and task-finished paths to check for fork, and if detected then print a message and call _exit?

KeyboardInterrupt might cause particular problems, since if you write a plain if not fork(): .... child code .... inside a trio task, then a KeyboardInterrupt will tend to escape that and start trying to propagate up the task tree. I guess there's not a whole lot we can do about that though.

I guess the other option is to provide some API for forking that takes care of calling _exit for you at appropriate times? I think this would just be a finally: _exit() block though, so not providing a lot of value, especially if we assume most forks will be coming from existing libraries that don't know about trio.


Oh, another thing we need to figure out in the child is what to do with our KeyboardInterrupt handler. (Ugh, and do open_signal_receiver handlers need anything special? Maybe they want their own fork hooks, separate from the core. Mask signals in before-fork, unmask in parent-after-fork, restore-original-handlers-then-unmask in child-after-fork? Of course if you fork from a background thread, then the before-fork handler can't access the signal API, even though the thread will become the main thread in the child, so that's fun.)

@njsmith (Member, Author) commented Jun 19, 2020

Of course if you fork from a background thread, then the before-fork handler can't access the signal API, even though the thread will become the main thread in the child, so that's fun.

Actually, maybe this is fine, because we can use pthread_sigmask to block signals just in whatever thread we're on.

I always wondered why we had a pthread_sigmask wrapper in the signal module, since Python's signal handling model seems to make it useless. Maybe this is why!
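
A sketch of that mask/unmask dance, with hypothetical hook names (pthread_sigmask only affects the calling thread, which is what makes this work even from a background thread):

import os
import signal

_ALL_SIGNALS = range(1, signal.NSIG)

def _before_fork():
    # Block everything in the forking thread so no handler runs while
    # process state is inconsistent. (SIGKILL/SIGSTOP are silently
    # left unblockable by the kernel.)
    signal.pthread_sigmask(signal.SIG_BLOCK, _ALL_SIGNALS)

def _after_fork_in_parent():
    signal.pthread_sigmask(signal.SIG_UNBLOCK, _ALL_SIGNALS)

def _after_fork_in_child():
    # The real hook would restore all of Trio's original handlers
    # before unblocking; this sketch only restores SIGINT.
    signal.signal(signal.SIGINT, signal.default_int_handler)
    signal.pthread_sigmask(signal.SIG_UNBLOCK, _ALL_SIGNALS)

os.register_at_fork(
    before=_before_fork,
    after_in_parent=_after_fork_in_parent,
    after_in_child=_after_fork_in_child,
)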

@oakkitten

so for example you might try to read from a socket in one process, but then the other process gets the data

But this isn't a trio problem, is it? Even if I get a shiny new trio context, I can still shoot myself in the foot by trying to read that socket in both processes. What if I don't do that? For example, this code seems to work exactly like you'd expect:

import trio
import os

async def print_after(text, delay):
    await trio.sleep(delay)
    print(text)

async def main():
    print("hello")
    async with trio.open_nursery() as nursery:
        name = "child" if not os.fork() else "parent"
        nursery.start_soon(print_after, f"hello from {name}", 1)
        nursery.start_soon(print_after, f"bye from {name}", 2)

trio.run(main)

Is this code dangerous in some way? Perhaps the I/O thing that's used under the hood gets shared between the processes in a way?

we want to support more than just multiprocessing, so we shouldn't rely on details like this

If I add an object with a finalizer to the above code,

class Foo:
    def __del__(self):
        print("finalizing foo")

foo = Foo()

I'll be getting two "finalizing foo" lines in the output if I run it. As I understand it, even if you prevent the trio context and friends from being finalized, the finalizer of foo will still run twice, so I am already shooting myself in the foot here. As long as the internals of trio themselves can survive being garbage collected twice, I wouldn't expect trio to help me out here.

In other words, if a process-launching library doesn't prevent finalizers from running in the child, it's bound to cause problems regardless of whether the code is using trio or not. This would be a problem with that library, not trio, wouldn't it?

@njsmith (Member, Author) commented Jun 20, 2020

perhaps the I/O thing that's used under the hood gets shared between the processes in a way?

Yes, exactly – I mean things like, process A calls read, and then process B – which hasn't called read – gets notified about the data arriving, and process A doesn't.

I'll be getting two "finalizing foo" lines in the output if I run it. As I understand it, even if you prevent the trio context and friends from being finalized, the finalizer of foo will still run twice, so I am already shooting myself in the foot here. As long as the internals of trio themselves can survive being garbage collected twice, I wouldn't expect trio to help me out here.

Running finalizers twice isn't always bad... often it's what you'd want. But in Trio we have a very specific issue:

  • Because of the IO mixup described above, the child can't keep executing whatever Trio tasks the user has running. We can't even cancel them to stop them running, because if a task did async with some_trio_related_object: ..., then the cancellation will try running the __aexit__ method, which will probably try to access Trio state, which doesn't work. So the child has to just "freeze" all the user's tasks in place.
  • The user's tasks are represented by Python objects, specifically "coroutine objects"
  • And coroutine objects have a built-in finalizer that we can't disable, which tries to execute the task a little further to give __exit__ blocks etc. a chance to run (see the sketch below). But that's exactly what we can't allow to happen. So we have to stop Python from running those finalizers somehow...
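
That built-in finalizer is easy to demonstrate outside Trio (a self-contained sketch; on CPython the del triggers finalization immediately):

class Suspend:
    def __await__(self):
        yield  # suspend indefinitely

async def task():
    try:
        await Suspend()
    finally:
        # Runs when the suspended coroutine object is finalized:
        # close() throws GeneratorExit into the frame, resuming it --
        # exactly the kind of resumption a forked child can't allow.
        print("finally block ran during finalization")

coro = task()
coro.send(None)  # start the task; it suspends at the yield
del coro         # finalizer -> GeneratorExit -> the finally runs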

@smurfix (Contributor) commented Jun 20, 2020

My solution would be to go through the process's open file descriptors and close them all at the OS level (well, except for stdin/out/err and the one we opened to communicate between parent and child). Then send some descendant of BaseException to every coroutine, repeatedly, until they all terminate. Then run the child's code.

Thus, our high-level interface could be along the lines of socket_to_child = await trio.fork(new_main, *new_args), or maybe some variant of trio.open_process, and the child process would behave exactly as if we had called trio.run(new_main, socket_to_parent, *new_args).

@njsmith (Member, Author) commented Jun 20, 2020

@smurfix if the user calls os.fork() directly, or via some independent library like multiprocessing, then we don't know which file descriptors are being used to communicate between parent and child.

The goal here isn't to add a high-level multiprocessing competitor, just to make the low-level infrastructure robust enough that high-level libraries have a chance at working :-)

@ntninja commented Jun 24, 2020

I'm not sure I'm qualified to write here, but to me @smurfix's suggestion looks reasonable if the closing of FDs is limited to FDs that trio owns at the time of os.fork(): the internal event loop and anything it either currently polls or may poll at some point later.

Something like:

  1. Register a global handler with os.register_at_fork on import that cleans up trio.run(…) contexts on fork.
  2. When looking at the run context of the current thread in the after_in_child callback:
     a. Mark all tasks, except the coroutine that called os.fork() and possibly some system tasks, as cancelled in that context.
     b. Mark all trio-managed objects of that context as raising “BrokenPipeError” and close their underlying FDs (if applicable), including any place that trio directly or indirectly returns a:
        • trio.socket.SocketType
        • trio.open_file result
        • trio.run_process result
        • trio.lowlevel.FdStream
        • probably more?
     c. Close any FDs related to the context's event loop and recreate the event loop from scratch.
     d. Schedule all cancelled coroutines until they all exit; if they try to do I/O they should get “BrokenPipeError” per the above – how would this interact with trio.sleep() calls inside cancellation shielding? Would the shield just be ignored, or would these still work?
  3. When looking at a run context of another thread in that callback:
     a. Mark all tasks of that context as cancelled.
     b. Perform step 2b.
     c. Close the event loop for good.
     d. Perform step 2d.
  4. Return from the callback; the task that called os.fork() should now be the only trio task left running.

I realize that this does invoke the destructors that @njsmith doesn't want to call, but I don't see this as such a big problem: fork() is weird, and the semantics of fork in the presence of user-mode multitasking/coroutines aren't defined. And with all parent-shared OS resources (that we know of) being closed before the coroutines are destroyed, the possibility of disaster should be limited.

With all that said, it may not be worth it at all… For instance asyncio just gave up on it: https://bugs.python.org/issue21998

@pquentin (Member)

I'm not sure I'm qualified to write here

Of course you are!

@njsmith (Member, Author) commented Jun 24, 2020

I guess another option we could consider would be to raise an error from a pre-fork hook, so that libraries that try to use os.fork() will fail early? (And the error message would say something like "if you're trying to use multiprocessing, here's the workaround to avoid fork: ...")

I'm not 100% certain that pre-fork hooks can raise an exception; we'd need to check that.

@njsmith (Member, Author) commented Jun 24, 2020

Ah, turns out that exceptions in at-fork handlers are ignored:

>>> os.register_at_fork(before=lambda: 1/0)
>>> os.fork()
Exception ignored in: <function <lambda> at 0x7f39ca5760d0>
Traceback (most recent call last):
  File "<stdin>", line 1, in <lambda>
ZeroDivisionError: division by zero
326598
>>> 0

So never mind that.
