Possible solution to "stalled" Runner.run ? #29

Closed
dwinston opened this issue Jun 14, 2018 · 1 comment

Comments

@dwinston
Member

I don't know if this is still an issue / was solved, but I recall folks like @montoyjh mentioning that builder runs mysteriously stalled for a long time without seemingly doing any process_item/update_targets work. I came across the following in an email list to which I subscribe, and I thought it might be applicable to our MultiprocProcessor:

> So, how is multiprocessing.Pool broken? The way it works is to start subprocesses by using the POSIX (aka Unix) fork() syscall, which clones all the memory in the process. Typically you then call some variant of execv(), which replaces the copy with a completely new process, but not multiprocessing.Pool. So, when you start a subprocess like this it’s a copy of everything in the parent Python process.
>
> For example, any state stored in a Python module will be copied. I’ve seen child processes in the pool get really confused and try to rotate the parent process' logs, because the Python standard library logging module stores some state about log handling and that gets copied wholesale.
>
> Also, any threads in the parent process are now missing in the forked copy. If your library starts some threads in the background your library will be broken in the subprocess (it’s obscure, but I maintain a library that actually does this). More commonly, thread locks might end up in a bad state, which is my guess for what happened in my case.
>
> In short, the end result is a pool of processes that work some of the time—but sometimes break mysteriously. Python does support a process pool that doesn’t do this horrible fork()-only trick, but you need to know it’s there, and you need to know that the default is broken.

The key idea is to override the context's default "fork" start method. The docs state that "safely forking a multithreaded process is problematic." I know MultiprocProcessor starts a separate thread for updating targets and uses locks for synchronization, so there may be something screwy happening that could be avoided, e.g. by setting the start method to "forkserver" if supported on the host OS and "spawn" otherwise.
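A minimal sketch of that selection logic, using only the standard library. The helper name `pick_start_method` is made up for illustration, and nothing here is taken from MultiprocProcessor itself; it only shows how a pool could be built from an explicit context instead of the default one.

```python
import multiprocessing as mp


def pick_start_method():
    """Prefer "forkserver" where the host supports it, else use "spawn".

    Hypothetical helper for illustration; not part of any existing API.
    """
    available = mp.get_all_start_methods()
    return "forkserver" if "forkserver" in available else "spawn"


if __name__ == "__main__":
    # get_context() returns an object with the same API as the
    # multiprocessing module, bound to the chosen start method, so the
    # process-wide default is left untouched.
    ctx = mp.get_context(pick_start_method())
    with ctx.Pool(processes=2) as pool:
        # A builtin is used as the work function so the example stays
        # picklable under any start method. Real worker functions must be
        # defined at module top level for "spawn"/"forkserver" to find them.
        print(pool.map(abs, [-2, -1, 0, 1, 2]))
```

Using a context object rather than `multiprocessing.set_start_method()` keeps the choice local to the code that creates the pool, which matters if other libraries in the same process rely on the default.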

@montoyjh
Contributor

At least part of the reason I was having issues is that when there are unforeseen bugs in update_targets, the update_targets process seems to hang and somehow limit the remaining processes.
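One way this kind of stall can arise, sketched under assumptions: if a background updater dies on an exception while the producer keeps blocking on a bounded queue, everything appears to hang. The names `run_updater`, `process_all`, and the queue layout below are hypothetical and are not how MultiprocProcessor is actually written; the sketch only shows the general pattern of reporting the updater's failure back to the producer instead of waiting forever.

```python
import queue
import threading


def run_updater(items, update_targets, errors):
    """Consume batches until a None sentinel; report the first failure."""
    while True:
        batch = items.get()
        if batch is None:
            return
        try:
            update_targets(batch)
        except Exception as exc:
            # Hand the error to the producer rather than dying silently.
            errors.put(exc)
            return


def _put_or_raise(items, errors, value):
    """Put `value` on the bounded queue, but give up if the updater failed."""
    while True:
        if not errors.empty():
            raise RuntimeError("update_targets failed") from errors.get()
        try:
            # Short timeout instead of a blocking put, so a dead consumer
            # cannot stall the producer indefinitely.
            items.put(value, timeout=0.1)
            return
        except queue.Full:
            continue


def process_all(batches, update_targets, maxsize=2):
    """Feed batches to a background updater thread and surface its errors."""
    items = queue.Queue(maxsize=maxsize)
    errors = queue.Queue()
    worker = threading.Thread(
        target=run_updater, args=(items, update_targets, errors), daemon=True
    )
    worker.start()
    for batch in batches:
        _put_or_raise(items, errors, batch)
    _put_or_raise(items, errors, None)  # sentinel: no more work
    worker.join()
    if not errors.empty():
        raise RuntimeError("update_targets failed") from errors.get()
```

With this shape, a bug inside `update_targets` shows up as a `RuntimeError` in the caller, with the original exception attached as its cause, rather than as a run that silently stops making progress.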

@shyamd shyamd closed this as completed Aug 28, 2019