
Allow registering after-fork initializers in multiprocessing #74014

Closed
pitrou opened this issue Mar 16, 2017 · 11 comments
Labels: 3.7 (EOL), stdlib, type-feature

Comments

@pitrou (Member) commented Mar 16, 2017

BPO 29828
Nosy @pitrou, @njsmith, @1st1, @applio
Superseder
  • bpo-16500: Allow registering at-fork handlers
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2017-05-27.15:52:57.054>
    created_at = <Date 2017-03-16.16:45:39.265>
    labels = ['3.7', 'type-feature', 'library']
    title = 'Allow registering after-fork initializers in multiprocessing'
    updated_at = <Date 2017-05-27.15:52:57.052>
    user = 'https://github.com/pitrou'

    bugs.python.org fields:

    activity = <Date 2017-05-27.15:52:57.052>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-05-27.15:52:57.054>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2017-03-16.16:45:39.265>
    creator = 'pitrou'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 29828
    keywords = []
    message_count = 11.0
    messages = ['289721', '289726', '289728', '289730', '289731', '289733', '289735', '294134', '294237', '294238', '294598']
    nosy_count = 5.0
    nosy_names = ['pitrou', 'njs', 'sbt', 'yselivanov', 'davin']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '16500'
    type = 'enhancement'
    url = 'https://bugs.python.org/issue29828'
    versions = ['Python 3.7']

    @pitrou (Member, Author) commented Mar 16, 2017

    Currently, multiprocessing has hard-coded logic to re-seed the Python random generator (in the random module) whenever a process is forked. This is present in two places: Popen._launch in popen_fork.py and serve_one in forkserver.py (for the "fork" and "forkserver" spawn methods, respectively).

    However, other libraries would like to benefit from this mechanism. For example, Numpy has its own random number generator that would also benefit from re-seeding after fork(). Currently, this is solvable using multiprocessing.Pool, which has an initializer argument. However, concurrent.futures' ProcessPoolExecutor offers no such facility; nor do other ways of launching child processes, such as (simply) instantiating a new Process object.

    Therefore, I'd like to propose adding a new top-level function in multiprocessing (and also a new Context method) to register a new initializer function for use after fork(). That way, each library can add its own initializers if desired, freeing users from the burden of doing so in their applications.
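
The hard-coded behaviour described above can be observed directly: with the "fork" start method, two child processes draw different values from the stdlib random module even though both inherit the parent's PRNG state. A minimal sketch (Unix-only; the helper name `sample` is mine, and `random.random()` stands in for any library-level PRNG):

```python
import multiprocessing as mp
import random

def sample(q):
    # multiprocessing re-seeds the stdlib PRNG in each forked child,
    # so these draws differ despite both children inheriting the
    # parent's PRNG state
    q.put(random.random())

ctx = mp.get_context("fork")  # Unix-only start method
q = ctx.Queue()
procs = [ctx.Process(target=sample, args=(q,)) for _ in range(2)]
for p in procs:
    p.start()
a, b = q.get(), q.get()
for p in procs:
    p.join()
print(a != b)  # True: without the re-seed, both children would draw the same value
```

Without the re-seed, both children would continue the parent's PRNG sequence from the same point and produce identical draws; the proposal is about letting third-party libraries hook into this same mechanism.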

    pitrou added the 3.7 (EOL), stdlib, and type-feature labels on Mar 16, 2017
    @1st1 (Member) commented Mar 16, 2017

    Maybe a better way would be to proceed with http://bugs.python.org/issue16500?

    @pitrou (Member, Author) commented Mar 16, 2017

    That issue seems to have stalled: it focused on low-level APIs, and it also proposes a new module, with the API-design questions that entails.

    Another possible stance is that os.fork() should be left as-is, as a low-level primitive, and this functionality should be provided by the higher-level multiprocessing module.

    @applio (Member) commented Mar 16, 2017

    Having a read through bpo-16500 and bpo-6721, I worry that this could again become bogged down with similar concerns.

    With the specific example of NumPy, I am not sure I would want its random number generator to be reseeded with each forked process. There are many situations where I very much need to preserve the original seed and/or current PRNG state.

    I do not yet see a clear, motivating use case even after reading those two older issues. I worry that if this were added it would (almost?) never get used, either because the need is rare or because developers would sooner handle it in their own target functions at startup. A top-level function and a Context method make good sense as the place to offer such a thing, but is there a clearer use case?

    @njsmith (Contributor) commented Mar 16, 2017

    I think ideally, on numpy's end, we would re-seed iff the RNG was unseeded. Now that I think about it, I'm a little surprised we haven't had more complaints about this, so I guess it's not a super urgent issue; still, that would be an improvement over the status quo, I think.

    @pitrou (Member, Author) commented Mar 16, 2017

    The use case is quite clear here. The specific need to re-seed the Numpy PRNG has already come up in two different projects I work on: Numba and Dask. I wouldn't be surprised if other libraries have similar needs.

    If you want a reproducible RNG sequence, you should actually use a specific, explicit seed (and possibly instantiate a dedicated random state instead of using the default one). When not using an explicit seed, people expect different random numbers regardless of whether a function is executed in one or several processes.

    Note that multiprocessing *already* re-seeds the stdlib PRNG after fork, so re-seeding the Numpy PRNG is consistent with current behaviour.

    About it being rarely used: the aim is not use by application developers but by library authors; e.g. Numpy itself could register the re-seeding callback, which would free users from doing it themselves. It doesn't have to be used a lot to be useful.

    @1st1 (Member) commented Mar 16, 2017

    BTW, why can't you use pthread_atfork in numpy?

    @pitrou (Member, Author) commented May 22, 2017

    > BTW, why can't you use pthread_atfork in numpy?

    I suppose Numpy could use that, but it's much more involved than registering a Python-level callback. Also it would only work because Numpy actually implements its random generator in C.

    @pitrou (Member, Author) commented May 23, 2017

    For those who don't follow both issues: I finally submitted a PR for http://bugs.python.org/issue16500, aka have at-fork handlers that work with all Python-issued fork() calls (including subprocess).
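
For reference, the bpo-16500 work mentioned above landed in Python 3.7 as os.register_at_fork(), which takes before, after_in_parent, and after_in_child callbacks and is honoured by every fork() made through Python, multiprocessing included. A minimal sketch (Unix-only):

```python
import os

calls = []
os.register_at_fork(
    before=lambda: calls.append("before"),           # runs just before fork()
    after_in_parent=lambda: calls.append("parent"),  # runs after fork(), parent side
    after_in_child=lambda: calls.append("child"),    # runs after fork(), child side
)

pid = os.fork()
if pid == 0:
    # the child observed ["before", "child"]; exit without atexit handlers
    os._exit(0 if calls == ["before", "child"] else 1)

os.waitpid(pid, 0)
print(calls)  # the parent observes ["before", "parent"]
```

A library like Numpy can thus register its re-seeding callback once at import time, without importing multiprocessing at all, which is exactly the import-cost concern raised in the next comment.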

    @pitrou (Member, Author) commented May 23, 2017

    The motivation for that decision might seem a bit secondary, but I was worried that libraries that want to register a multiprocessing-based at-fork handler would always have to pay the import cost of multiprocessing. With the registration logic inside the posix module, that import cost goes away.

    @pitrou (Member, Author) commented May 27, 2017

    Superseded by bpo-16500, which has now been resolved!

    4 participants