Possible solution to "stalled" Runner.run ? #29

dwinston · 2018-06-14T09:38:56Z

I don't know whether this is still an issue or has been solved, but I recall folks like @montoyjh mentioning that builder runs mysteriously stalled for a long time without apparently doing any process_item/update_targets work. I came across the following on a mailing list I subscribe to, and I think it might apply to our MultiprocProcessor:

> So, how is multiprocessing.Pool broken? The way it works is to start subprocesses by using the POSIX (aka Unix) fork() syscall, which clones all the memory in the process. Typically you then call some variant of execv(), which replaces the copy with a completely new process, but not multiprocessing.Pool. So, when you start a subprocess like this it’s a copy of everything in the parent Python process.
>
> For example, any state stored in a Python module will be copied. I’ve seen child processes in the pool get really confused and try to rotate the parent process' logs, because the Python standard library logging module stores some state about log handling and that gets copied wholesale.
>
> Also, any threads in the parent process are now missing in the forked copy. If your library starts some threads in the background your library will be broken in the subprocess (it’s obscure, but I maintain a library that actually does this). More commonly, thread locks might end up in a bad state, which is my guess for what happened in my case.
>
> In short, the end result is a pool of processes that work some of the time—but sometimes break mysteriously. Python does support a process pool that doesn’t do this horrible fork()-only trick, but you need to know it’s there, and you need to know that the default is broken.
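The "thread locks might end up in a bad state" point can be reproduced in a few lines. This is a minimal, POSIX-only sketch (it uses the "fork" start method explicitly): a background thread holds a lock at the moment the process forks, so the child inherits a locked copy with no thread left to release it. The names `lock`, `holder`, `child` and `fork_while_locked` are illustrative only, and the child uses a 0.5 s timeout so the demo reports the problem instead of hanging forever, which is what a plain `acquire()` would do.

```python
import multiprocessing as mp
import sys
import threading

# Illustrative module-level lock, standing in for a lock that some library
# (or our own processor) uses internally.
lock = threading.Lock()
held = threading.Event()
release = threading.Event()


def holder():
    """Background thread: hold the lock until the parent says to let go."""
    with lock:
        held.set()
        release.wait()


def child():
    # The forked child gets a copy of the lock in its *held* state, but not
    # the thread that owns it. A plain lock.acquire() would block forever.
    acquired = lock.acquire(timeout=0.5)
    sys.exit(0 if acquired else 1)


def fork_while_locked():
    """Fork while another thread holds `lock`; return the child's exit code."""
    held.clear()
    release.clear()
    t = threading.Thread(target=holder)
    t.start()
    held.wait()  # from here on the lock is definitely held
    p = mp.get_context("fork").Process(target=child)
    p.start()
    p.join()
    release.set()  # only now let the parent's thread release its lock
    t.join()
    return p.exitcode  # 1 means the child found the lock stuck


if __name__ == "__main__":
    code = fork_while_locked()
    print("child acquired the lock" if code == 0 else "child found the lock stuck")
```

Note that the parent releasing its lock does not help: after the fork, parent and child each have an independent copy.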

The key idea is to override the default "fork" start method, e.g. via a multiprocessing context. Fork is the default, but the docs state that "safely forking a multithreaded process is problematic." I know MultiprocProcessor starts a separate thread for updating targets and uses locks for synchronization, so something screwy may be happening that could be avoided, e.g. by setting the start method to "forkserver" where the host OS supports it, and to "spawn" otherwise.
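A minimal sketch of that suggestion using only the standard library: `multiprocessing.get_all_start_methods()` reports what the platform supports, and `multiprocessing.get_context()` returns a context bound to one start method without changing the global default the way `set_start_method()` would. How this would be wired into MultiprocProcessor is not shown; `pick_start_method` and `make_pool` are hypothetical helpers.

```python
import multiprocessing as mp


def pick_start_method():
    """Return 'forkserver' if this platform supports it, otherwise 'spawn'."""
    if "forkserver" in mp.get_all_start_methods():
        return "forkserver"
    return "spawn"


def make_pool(processes=2):
    """Create a Pool whose workers are started with pick_start_method()."""
    # The context object mirrors the multiprocessing module's API (Pool,
    # Process, Queue, ...), but everything it creates uses one start method.
    ctx = mp.get_context(pick_start_method())
    return ctx.Pool(processes=processes)


if __name__ == "__main__":
    # abs is a builtin, so it pickles cleanly for spawn/forkserver workers.
    with make_pool() as pool:
        print(pick_start_method(), pool.map(abs, [-3, -2, 1]))
```

One trade-off to keep in mind: with "spawn" or "forkserver", workers no longer inherit the parent's memory, so the functions and arguments sent to them must be picklable and importable, and worker startup is slower than with "fork".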


montoyjh · 2018-06-14T19:21:33Z

At least part of the reason I was having issues is that if update_targets hits an unforeseen bug, the update_targets process seems to hang and somehow hold up the remaining processes.
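One generic way this kind of hang arises: a consumer (here, whatever runs update_targets) dies on an exception, and the producer then blocks forever on a full queue that nobody drains. The sketch below is hypothetical and is not MultiprocProcessor's actual code; `run_with_updater`, the bounded `queue.Queue`, the sentinel and the 0.1 s poll interval are all assumptions. It records the consumer's exception and re-raises it in the producer instead of stalling.

```python
import queue
import threading


def run_with_updater(items, update_targets, maxsize=2):
    """Feed items to a background updater thread; re-raise its errors.

    Hypothetical sketch: `update_targets` is any callable taking one item.
    """
    q = queue.Queue(maxsize=maxsize)
    errors = []
    sentinel = object()

    def updater():
        while True:
            item = q.get()
            if item is sentinel:
                return
            try:
                update_targets(item)
            except Exception as exc:  # record the failure for the producer
                errors.append(exc)
                return

    def put(x):
        # Poll with a timeout so a dead updater cannot block us forever.
        while True:
            if errors:
                raise RuntimeError("update_targets failed") from errors[0]
            try:
                q.put(x, timeout=0.1)
                return
            except queue.Full:
                pass

    t = threading.Thread(target=updater, daemon=True)
    t.start()
    for item in items:
        put(item)
    put(sentinel)
    t.join()
    if errors:
        raise RuntimeError("update_targets failed") from errors[0]
```

With a plain blocking `q.put(item)`, the failing case would hang once the queue filled up, which matches the "seems to hang" symptom described above.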

shyamd closed this as completed Aug 28, 2019
