Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compileall script: add option to use multiple cores #60308

Closed
dholth mannequin opened this issue Oct 1, 2012 · 28 comments
Closed

Compileall script: add option to use multiple cores #60308

dholth mannequin opened this issue Oct 1, 2012 · 28 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@dholth
Copy link
Mannequin

dholth mannequin commented Oct 1, 2012

BPO 16104
Nosy @brettcannon, @gpshead, @merwok, @dholth, @PCManticore, @JimJJewett
Files
  • compileall_v1.patch
  • issue16104.patch: Remove trailing whitespaces
  • issue16104_1.patch
  • issue16104_2.patch: Minor doc fixes.
  • issue16104_3.patch
  • issue16104_4.patch
  • issue16104_5.patch
  • issue16104_6.patch
  • issue16104_7.patch: Skip a couple of tests if concurrent.futures is unavailable
  • issue16104_8.patch
  • issue16104_9.patch
  • issue16104_10.patch
  • issue16104_11.patch: Minor doc update
  • issue16104_12.patch
  • 16104.patch: Update the patch for the tip.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/brettcannon'
    closed_at = <Date 2014-09-12.14:40:30.609>
    created_at = <Date 2012-10-01.20:46:54.073>
    labels = ['type-feature', 'library']
    title = 'Compileall script: add option to use multiple cores'
    updated_at = <Date 2020-02-27.08:02:38.705>
    user = 'https://github.com/dholth'

    bugs.python.org fields:

    activity = <Date 2020-02-27.08:02:38.705>
    actor = 'gregory.p.smith'
    assignee = 'brett.cannon'
    closed = True
    closed_date = <Date 2014-09-12.14:40:30.609>
    closer = 'brett.cannon'
    components = ['Library (Lib)']
    creation = <Date 2012-10-01.20:46:54.073>
    creator = 'dholth'
    dependencies = []
    files = ['33079', '34339', '34368', '34381', '34383', '34384', '34400', '34401', '34404', '35018', '35054', '35055', '35056', '35106', '36590']
    hgrepos = []
    issue_num = 16104
    keywords = ['patch']
    message_count = 28.0
    messages = ['171744', '171758', '205805', '213200', '213209', '213298', '213301', '213303', '213304', '213307', '213308', '213317', '213340', '213417', '213419', '213422', '214450', '217118', '217173', '217261', '217264', '217399', '217586', '226684', '226822', '226823', '226824', '362786']
    nosy_count = 7.0
    nosy_names = ['brett.cannon', 'gregory.p.smith', 'eric.araujo', 'dholth', 'Claudiu.Popa', 'python-dev', 'Jim.Jewett']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue16104'
    versions = ['Python 3.5']

    @dholth
    Copy link
    Mannequin Author

    dholth mannequin commented Oct 1, 2012

    compileall would benefit approximately linearly from additional CPU cores. There should be an option.

    The noisy output would have to change. Right now it prints "compiling" and then "done" synchronously with doing the actual work.

    @brettcannon
    Copy link
    Member

    brettcannon commented Oct 1, 2012

    This should probably use concurrent.futures instead of multiprocessing directly, but yes it would be useful.

    Then again, the whole module should probably be rewritten to use importlib as well.

    @brettcannon brettcannon added the stdlib Python modules in the Lib dir label Oct 1, 2012
    @brettcannon brettcannon self-assigned this Mar 26, 2013
    @brettcannon brettcannon removed their assignment Apr 28, 2013
    @PCManticore
    Copy link
    Mannequin

    PCManticore mannequin commented Dec 10, 2013

    Hello!

    Here's a draft patch. It adds a new *processes* parameter to *compile_dir* and a new command line parameter as well.

    @merwok
    Copy link
    Member

    merwok commented Mar 12, 2014

    Patch looks good. Some comments on Rietveld.

    @merwok merwok added the type-feature A feature request or enhancement label Mar 12, 2014
    @PCManticore
    Copy link
    Mannequin

    PCManticore mannequin commented Mar 12, 2014

    Thank you for the review, Éric! Here's the updated patch.

    @merwok
    Copy link
    Member

    merwok commented Mar 12, 2014

    FTR, py_compile and compileall use importlib in 3.4.

    @merwok
    Copy link
    Member

    merwok commented Mar 12, 2014

    This looks ready to me.

    One thing: “make -j0” is the spelling for “run using all available cores”, whereas “compileall -j0” will use one process. I don’t know if this should be documented, changed or ignored.

    @brettcannon
    Copy link
    Member

    brettcannon commented Mar 12, 2014

    I vote for changed so that -j0 uses all available cores as os.cpu_count() states.

    @PCManticore
    Copy link
    Mannequin

    PCManticore mannequin commented Mar 12, 2014

    I agree. I'll modify the patch.

    @merwok
    Copy link
    Member

    merwok commented Mar 12, 2014

    + if args.processes <= 0:
    Is that correct? For make, I think I’ve always seen “-j0”, not negative values.

    Could you add a test for -j0? (i.e. check that “compileall -j0” calls the function with “processes=os.cpu_count()”)

    @PCManticore
    Copy link
    Mannequin

    PCManticore mannequin commented Mar 12, 2014

    regrtest does that, checking for j <=0.

    @PCManticore
    Copy link
    Mannequin

    PCManticore mannequin commented Mar 12, 2014

    Here's a test for j0 == os.cpu_count.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 13, 2014

    Importing ProcessExecutor at the top-level means compileall will crash on systems which don't have multiprocessing support.

    @PCManticore
    Copy link
    Mannequin

    PCManticore mannequin commented Mar 13, 2014

    Here's a new patch which addresses Éric's last comments.
    Antoine, I don't have at my disposal a system without multiprocessing support. How does it crash?

    @pitrou
    Copy link
    Member

    pitrou commented Mar 13, 2014

    Here's a new patch which addresses Éric's last comments.
    Antoine, I don't have at my disposal a system without multiprocessing support. How does it crash?

    Neither do I, but you will probably get an ImportError of some sort.

    @PCManticore
    Copy link
    Mannequin

    PCManticore mannequin commented Mar 13, 2014

    Here's a new version which catches ImportError for concurrent.futures and raises ValueError in compile_dir if processes was specified and concurrent.futures is unavailable. The only issue is that I don't know if this should be a ValueError or not. For instance, zipfile uses RuntimeError if lzma is unavailable.

    @PCManticore
    Copy link
    Mannequin

    PCManticore mannequin commented Mar 22, 2014

    What can I do to move this forward? I believe all concerns have been addressed and it seems ready to me.

    @terryjreedy terryjreedy changed the title Use multiprocessing in compileall script Compileall script: add option to use multiple cores Apr 23, 2014
    @PCManticore
    Copy link
    Mannequin

    PCManticore mannequin commented Apr 24, 2014

    Added a new version of the patch which incorporates suggestions made by Jim. Thanks for the review!

    @jimjjewett
    Copy link
    Mannequin

    jimjjewett mannequin commented Apr 25, 2014

    ProcessPoolExecutor already defaults to using cpu_count if max_workers is None. Consistency with that might be useful too. (and a default of 1 to mean nothing in parallel is sensible...)

    @PCManticore
    Copy link
    Mannequin

    PCManticore mannequin commented Apr 27, 2014

    Added a new patch with improvements suggested by Jim. Thanks!

    I removed the handling of processes=1, because it can still be useful: having a background worker which processes the files received from _walk_dir. Also, it checks that compile_dir receives a positive *processes* value, otherwise it raises a ValueError. As a side note, I just found that ProcessPoolExecutor / ThreadPoolExecutor don't verify the value of processes, leading to certain types of errors (see bpo-21362 for more details).
    Jim, the default for processes is still None, meaning "do not use multiple process", because the purpose of ProcessPoolExecutor makes it easy for it to use processes=None=os.cpu_count(). Here we want the user to be explicit about wanting multiple processes or not.

    @PCManticore
    Copy link
    Mannequin

    PCManticore mannequin commented Apr 27, 2014

    Add new patch with fixes proposed by Berker Peksag. Thanks for the review. Hopefully, this is the last iteration of this patch.

    @jimjjewett
    Copy link
    Mannequin

    jimjjewett mannequin commented Apr 28, 2014

    Trying to put bounds on the disagreements. Does anyone disagree with any of the following:

    (1) compileall currently runs single-threaded in a single process.

    (2) This enhancement intends to allow parallelization by process.

    (3) Users MAY need to express whether they (require/forbid/are expressly apathetic concerning) paralellization.

    (3A) There is some doubt that this even needs to be user-controlled.

    (3B) If it is user-controlled, the patch proposes adding a "processes" parameter to do this.

    (3C) There have been suggestions of other names (notably "workers"), but *if* it is user-controlled, the idea of a new parameter is not controversial.

    (4) Users MAY need to control the degree of parallelization.

    (4A) If so, setting the value of the new parameter to a positive integer > 1 is an acceptable solution.

    (4B) There is not yet consensus on how to represent "Use multi-processing, with the default degree for this system.", "Do NOT use multiprocessing.", or "I don't care."

    (4C) Suggested values have included 1, 0, -1, any negative number, None, and specific strings. The precise mapping between some of these and the three cases of 4B is not agreed.

    (5) If multiprocessing is explicitly requested, what should happen when it is not available?

    (5A) Fall back to the current way, without multi-processing.

    (5B) Fall back to the current way, without multi-processing, but issue a Warning.

    (5C) Raise an Exception. (ValueError, ImportError, NotImplemented?)

    (6) Portions of the documentation unrelated to this should be fixed. But ideally, that would be done separately, and it will NOT be a pre-requisite to this patch.

    ---------------------------------------------------

    Another potential value set

    None (the default) ==> let the system parallelize as best it can -- as it does in multiprocessing. If the system picks "not in parallel at all", that is also OK, and no warning is raised.

    0 ==> Do not parallelize.

    positive integers ==> Use that many processes.

    negative ==> ValueError

    Would these uses of 0 and negative be too surprising for someone?

    @PCManticore
    Copy link
    Mannequin

    PCManticore mannequin commented Apr 30, 2014

    Updated patch according to the python-dev thread:

    • processes renamed to workers
    • workers defaults to 1
    • When workers is equal to 0, then os.cpu_count will be used
    • When workers > 1, multiple processes will be used
    • When workers == 1, run normally (no multiple processes)
    • Negative values raises a ValueError
    • Will raise NotImplementedError if multiprocessing can't be used
      (when workers equals to 0 or > 1)

    @PCManticore
    Copy link
    Mannequin

    PCManticore mannequin commented Sep 10, 2014

    If there is nothing left to do for this patch, can it be committed?

    @brettcannon brettcannon self-assigned this Sep 10, 2014
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 12, 2014

    New changeset 9efefcab817e by Brett Cannon in branch 'default':
    Issue bpo-16104: Allow compileall to do parallel bytecode compilation.
    http://hg.python.org/cpython/rev/9efefcab817e

    @brettcannon
    Copy link
    Member

    brettcannon commented Sep 12, 2014

    Thanks for the patch, Claudiu!

    @PCManticore
    Copy link
    Mannequin

    PCManticore mannequin commented Sep 12, 2014

    Thank you for committing it. :-)

    @gpshead
    Copy link
    Member

    gpshead commented Feb 27, 2020

    This caused a regression in behavior. compileall.compile_dir()'s ddir= parameter no longer does the right thing for any subdirectories.

    https://bugs.python.org/issue39769

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants