Compileall script: add option to use multiple cores #60308
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
assignee = 'https://github.com/brettcannon' closed_at = <Date 2014-09-12.14:40:30.609> created_at = <Date 2012-10-01.20:46:54.073> labels = ['type-feature', 'library'] title = 'Compileall script: add option to use multiple cores' updated_at = <Date 2020-02-27.08:02:38.705> user = 'https://github.com/dholth'
activity = <Date 2020-02-27.08:02:38.705> actor = 'gregory.p.smith' assignee = 'brett.cannon' closed = True closed_date = <Date 2014-09-12.14:40:30.609> closer = 'brett.cannon' components = ['Library (Lib)'] creation = <Date 2012-10-01.20:46:54.073> creator = 'dholth' dependencies =  files = ['33079', '34339', '34368', '34381', '34383', '34384', '34400', '34401', '34404', '35018', '35054', '35055', '35056', '35106', '36590'] hgrepos =  issue_num = 16104 keywords = ['patch'] message_count = 28.0 messages = ['171744', '171758', '205805', '213200', '213209', '213298', '213301', '213303', '213304', '213307', '213308', '213317', '213340', '213417', '213419', '213422', '214450', '217118', '217173', '217261', '217264', '217399', '217586', '226684', '226822', '226823', '226824', '362786'] nosy_count = 7.0 nosy_names = ['brett.cannon', 'gregory.p.smith', 'eric.araujo', 'dholth', 'Claudiu.Popa', 'python-dev', 'Jim.Jewett'] pr_nums =  priority = 'low' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue16104' versions = ['Python 3.5']
Here's a new version which catches ImportError for concurrent.futures and raises ValueError in
Added a new patch with improvements suggested by Jim. Thanks!
I removed the handling of processes=1, because a single worker can still be useful: it acts as a background worker that processes the files received from _walk_dir. The patch also checks that compile_dir receives a positive *processes* value and raises a ValueError otherwise. As a side note, I just found that ProcessPoolExecutor / ThreadPoolExecutor don't verify the value of processes, which leads to certain types of errors (see bpo-21362 for more details).
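For readers following along, the up-front check described above might look roughly like the sketch below. The helper name `validate_processes` is hypothetical and is not taken from the patch; accepting `None` as "not specified" is also an assumption made only for illustration.

```python
def validate_processes(processes):
    """Hypothetical helper: reject non-positive worker counts before any pool is created.

    Assumption: None means "not specified" and is passed through unchanged.
    """
    if processes is not None and processes < 1:
        raise ValueError("processes must be a positive integer")
    return processes
```

Validating before constructing the executor keeps the error message close to the caller's mistake instead of relying on whatever the pool does with a bad value.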
Trying to put bounds on the disagreements. Does anyone disagree with any of the following:
(1) compileall currently runs single-threaded in a single process.
(2) This enhancement intends to allow parallelization by process.
(3) Users MAY need to express whether they (require/forbid/are expressly apathetic concerning) parallelization.
(3A) There is some doubt that this even needs to be user-controlled.
(3B) If it is user-controlled, the patch proposes adding a "processes" parameter to do this.
(3C) There have been suggestions of other names (notably "workers"), but *if* it is user-controlled, the idea of a new parameter is not controversial.
(4) Users MAY need to control the degree of parallelization.
(4A) If so, setting the value of the new parameter to a positive integer > 1 is an acceptable solution.
(4B) There is not yet consensus on how to represent "Use multi-processing, with the default degree for this system.", "Do NOT use multiprocessing.", or "I don't care."
(4C) Suggested values have included 1, 0, -1, any negative number, None, and specific strings. The precise mapping between some of these and the three cases of 4B is not agreed.
(5) If multiprocessing is explicitly requested, what should happen when it is not available?
(5A) Fall back to the current way, without multi-processing.
(5B) Fall back to the current way, without multi-processing, but issue a Warning.
(5C) Raise an Exception. (ValueError, ImportError, NotImplemented?)
(6) Portions of the documentation unrelated to this should be fixed. But ideally, that would be done separately, and it will NOT be a prerequisite to this patch.
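To make the three options under (5) concrete, here is a minimal sketch. The function `choose_mode`, its `pool_available` flag, and the `on_missing` policy names are all hypothetical, introduced only to compare the behaviours; using ImportError for (5C) is one of the candidates listed above, not a decision.

```python
import warnings


def choose_mode(pool_available, on_missing="fallback"):
    """Return "parallel" or "serial" given whether a process pool can be used.

    on_missing selects the behaviour when parallelism was requested but is
    unavailable: "fallback" (5A), "warn" (5B), or "raise" (5C).
    """
    if pool_available:
        return "parallel"
    if on_missing == "fallback":
        # (5A) silently compile the current, single-process way
        return "serial"
    if on_missing == "warn":
        # (5B) same fallback, but tell the user about it
        warnings.warn("multiprocessing is unavailable; compiling serially",
                      RuntimeWarning)
        return "serial"
    if on_missing == "raise":
        # (5C) refuse to continue; the exception type is still under discussion
        raise ImportError("multiprocessing was requested but is unavailable")
    raise ValueError(f"unknown policy: {on_missing!r}")
```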
Another potential value set:
None (the default) ==> let the system parallelize as best it can -- as it does in multiprocessing. If the system picks "not in parallel at all", that is also OK, and no warning is raised.
0 ==> Do not parallelize.
positive integers ==> Use that many processes.
negative ==> ValueError
Would these uses of 0 and negative be too surprising for someone?
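The value set proposed above could be read as the following mapping. This is only a sketch of the proposal in this comment, not necessarily the behaviour that was eventually committed; `interpret_processes` and the mode labels are hypothetical names.

```python
def interpret_processes(processes=None):
    """Map a proposed *processes* value to a (mode, count) pair.

    None     -> ("auto", None): let the system pick, possibly no parallelism
    0        -> ("serial", 0): do not parallelize
    positive -> ("parallel", n): use exactly n processes
    negative -> ValueError
    """
    if processes is None:
        return ("auto", None)
    if processes < 0:
        raise ValueError("processes must not be negative")
    if processes == 0:
        return ("serial", 0)
    return ("parallel", processes)
```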
Updated patch according to the python-dev thread: