Compileall script: add option to use multiple cores #60308
Comments
compileall would benefit approximately linearly from additional CPU cores. There should be an option. The noisy output would have to change: right now it prints "compiling" and then "done" synchronously with doing the actual work.
This should probably use concurrent.futures instead of multiprocessing directly, but yes, it would be useful. Then again, the whole module should probably be rewritten to use importlib as well.
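A rough sketch of what a `concurrent.futures`-based parallel compile could look like (the helper name and signature here are illustrative, not the actual patch):

```python
import py_compile
from concurrent.futures import ProcessPoolExecutor

def compile_files_parallel(files, processes=None):
    """Byte-compile *files* using a pool of worker processes.

    processes=None lets ProcessPoolExecutor pick os.cpu_count() workers.
    Returns True if every file compiled cleanly.
    """
    success = True
    with ProcessPoolExecutor(max_workers=processes) as executor:
        # py_compile.compile returns the written .pyc path on success,
        # or None (after printing the error) when compilation fails.
        for result in executor.map(py_compile.compile, files):
            success = success and result is not None
    return success
```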
Hello! Here's a draft patch. It adds a new *processes* parameter to *compile_dir* and a new command line parameter as well.
Patch looks good. Some comments on Rietveld.
Thank you for the review, Éric! Here's the updated patch.
FTR, py_compile and compileall use importlib in 3.4.
This looks ready to me. One thing: “make -j0” is the spelling for “run using all available cores”, whereas “compileall -j0” will use one process. I don’t know if this should be documented, changed, or ignored.
I vote for changed, so that -j0 uses all available cores, as reported by os.cpu_count().
I agree. I'll modify the patch.
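The "-j0 means all cores" mapping being discussed could look roughly like this (the helper name and flag spelling are illustrative, not the committed code):

```python
import argparse
import os

def effective_workers(j):
    # "make -j"-style semantics: 0 (or any non-positive count) means
    # "use every available core", as reported by os.cpu_count().
    if j <= 0:
        return os.cpu_count() or 1   # os.cpu_count() can return None
    return j

parser = argparse.ArgumentParser()
parser.add_argument("-j", "--processes", type=int, default=1)
args = parser.parse_args(["-j", "0"])
print(effective_workers(args.processes))  # the core count of this machine
```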
Regarding the patch line "+ if args.processes <= 0:": could you add a test for -j0? (i.e. check that “compileall -j0” calls the function with “processes=os.cpu_count()”)
regrtest does that, checking for j <= 0.
Here's a test checking that -j0 maps to os.cpu_count().
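The kind of test being asked for could be sketched with unittest.mock; note that `run_cli` below is an illustrative stand-in for the patch's actual CLI glue, not code from the patch itself:

```python
import os
import unittest
from unittest import mock

# Hypothetical stand-in for the patch's command-line handling: it maps
# a -j value onto the *processes* argument of compile_dir.
def run_cli(jobs, compile_dir):
    processes = os.cpu_count() if jobs <= 0 else jobs
    compile_dir("Lib", processes=processes)

class TestJobZero(unittest.TestCase):
    def test_j0_maps_to_cpu_count(self):
        compile_dir = mock.Mock()
        run_cli(0, compile_dir)
        compile_dir.assert_called_once_with("Lib",
                                            processes=os.cpu_count())

# Run with: python -m unittest <this file>
```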
Importing ProcessPoolExecutor at the top level means compileall will crash on systems which don't have multiprocessing support.
Here's a new patch which addresses Éric's last comments.
Neither do I, but you will probably get an ImportError of some sort.
Here's a new version which catches ImportError for concurrent.futures and raises ValueError in that case.
What can I do to move this forward? I believe all concerns have been addressed and it seems ready to me.
Added a new version of the patch which incorporates suggestions made by Jim. Thanks for the review!
ProcessPoolExecutor already defaults to using os.cpu_count() if max_workers is None. Consistency with that might be useful too. (And a default of 1, meaning nothing in parallel, is sensible...)
Added a new patch with improvements suggested by Jim. Thanks! I removed the handling of processes=1, because it can still be useful: having a background worker which processes the files received from _walk_dir. Also, it checks that compile_dir receives a positive *processes* value, otherwise it raises a ValueError. As a side note, I just found that ProcessPoolExecutor / ThreadPoolExecutor don't verify the value of processes, leading to certain types of errors (see bpo-21362 for more details).
Added a new patch with fixes proposed by Berker Peksag. Thanks for the review. Hopefully, this is the last iteration of this patch.
Trying to put bounds on the disagreements. Does anyone disagree with any of the following?

(1) compileall currently runs single-threaded in a single process.
(2) This enhancement intends to allow parallelization by process.
(3) Users MAY need to express whether they (require/forbid/are expressly apathetic concerning) parallelization.
    (3A) There is some doubt that this even needs to be user-controlled.
    (3B) If it is user-controlled, the patch proposes adding a "processes" parameter to do this.
    (3C) There have been suggestions of other names (notably "workers"), but *if* it is user-controlled, the idea of a new parameter is not controversial.
(4) Users MAY need to control the degree of parallelization.
    (4A) If so, setting the value of the new parameter to a positive integer > 1 is an acceptable solution.
    (4B) There is not yet consensus on how to represent "Use multiprocessing, with the default degree for this system.", "Do NOT use multiprocessing.", or "I don't care."
    (4C) Suggested values have included 1, 0, -1, any negative number, None, and specific strings. The precise mapping between some of these and the three cases of 4B is not agreed.
(5) If multiprocessing is explicitly requested, what should happen when it is not available?
    (5A) Fall back to the current way, without multiprocessing.
    (5B) Fall back to the current way, without multiprocessing, but issue a Warning.
    (5C) Raise an Exception. (ValueError, ImportError, NotImplemented?)
(6) Portions of the documentation unrelated to this should be fixed. But ideally, that would be done separately, and it will NOT be a prerequisite to this patch.

Another potential value set:

    None (the default) ==> let the system parallelize as best it can, as it does in multiprocessing. If the system picks "not in parallel at all", that is also OK, and no warning is raised.
    0 ==> Do not parallelize.
    positive integers ==> Use that many processes.
    negative ==> ValueError

Would these uses of 0 and negative be too surprising for someone?
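The value set proposed above (None / 0 / positive / negative) could be expressed as a small helper; the function name is illustrative, not from the patch:

```python
import os

def resolve_workers(processes):
    """Map the proposed *processes* values onto a worker count.

    None     -> let the system decide (ProcessPoolExecutor's own default,
                i.e. os.cpu_count())
    0        -> do not parallelize at all
    positive -> use exactly that many worker processes
    negative -> ValueError
    """
    if processes is None:
        return os.cpu_count() or 1
    if processes < 0:
        raise ValueError("processes must be non-negative or None")
    return processes   # 0 means "run serially"
```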
Updated patch according to the python-dev thread.
If there is nothing left to do for this patch, can it be committed? |
New changeset 9efefcab817e by Brett Cannon in branch 'default': |
Thanks for the patch, Claudiu! |
Thank you for committing it. :-) |
This caused a regression in behavior. compileall.compile_dir()'s ddir= parameter no longer does the right thing for any subdirectories. |
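For background on what ddir= is for: it controls the source path recorded in the compiled files (the path shown in tracebacks), independent of where the source actually lives; compile_dir is supposed to compute it for every file it finds, including ones in subdirectories. A minimal sketch using py_compile's per-file analogue, the dfile parameter:

```python
import os
import py_compile
import tempfile

# Compile a throwaway module, recording a different source path in the
# resulting .pyc, the way compile_dir(..., ddir=...) does per file.
src_dir = tempfile.mkdtemp()
src = os.path.join(src_dir, "mod.py")
with open(src, "w") as f:
    f.write("x = 1\n")

# dfile is the name embedded in the bytecode for error messages and
# tracebacks; the path chosen here is purely illustrative.
pyc = py_compile.compile(src, dfile="/usr/lib/python3/mod.py")
print(pyc is not None and pyc.endswith(".pyc"))  # True on success
```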