
Faster pyc compilation #1422

Merged
merged 14 commits into from Feb 2, 2022

Conversation

chrisburr
Contributor

@chrisburr chrisburr commented Jan 31, 2022

This PR includes all but one of the changes I've mentioned on Gitter. My C++ knowledge is lacking so I'm not sure how idiomatic this PR is; feel free to suggest major changes. I'm also happy for someone else to take over this PR if that's easier.

Fixes #1413

Performance comparison

I picked a few packages I'm familiar with to use for benchmarking here:

micromamba create --prefix /tmp/test-env --yes celery pandas tqdm sqlalchemy uproot scikit-hep jupyterlab

I ran the command several times to make sure the caches were populated.

I've also tried various other sets of packages, installers generated with conda-constructor, and running on slower filesystems (NFS, Ceph volumes, ...). All of my test cases have been significantly faster.

Latest micromamba release from inside an existing environment

Slowed by calling deactivate (see #1413)

________________________________________________________
Executed in   45.73 secs    fish           external
   usr time   41.04 secs  610.00 micros   41.04 secs
   sys time    8.31 secs  193.00 micros    8.31 secs

Latest micromamba release outside an existing environment

________________________________________________________
Executed in   19.87 secs    fish           external
   usr time   16.14 secs    1.67 millis   16.13 secs
   sys time    3.48 secs    0.00 millis    3.48 secs

This PR

________________________________________________________
Executed in    9.57 secs    fish           external
   usr time   11.92 secs  800.00 micros   11.92 secs
   sys time    1.81 secs    0.00 micros    1.81 secs

void compile_python_sources(std::ostream& out)
{
    out << "from compileall import compile_file\n";
    out << "from concurrent.futures import ProcessPoolExecutor\n";
Member

Hmm, this was introduced in Python 3.2.

I am not sure we should break compatibility with older Pythons; 2.7 especially might still be used (at least to recreate historical envs).

Contributor Author

Definitely (I'm still forced to use Python 2.7 myself).

I have a fallback to use -m compileall for Python < 3.5 (basically the same condition that used to be used for adding -j0 before).

Worst case scenario is that the pyc files fail to be installed and will be generated on first import (most of the time at least; permissions or read-only filesystems might mess things up).
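The version gate described above can be sketched roughly like this; the function name and the returned labels are purely illustrative, not micromamba's actual implementation:

```python
# Hedged sketch of the version-gated fallback: newer target Pythons get
# the parallel script from this PR, older ones fall back to the stdlib
# `python -m compileall` CLI. Names here are illustrative only.
def pyc_strategy(version_info):
    """Pick how to byte-compile for a given target Python version."""
    if version_info >= (3, 5):
        # Modern interpreters: pipe a script using concurrent.futures.
        return "parallel-script"
    # Older interpreters (down to 2.7): plain `-m compileall`.
    return "compileall-cli"

print(pyc_strategy((2, 7)))   # compileall-cli
print(pyc_strategy((3, 10)))  # parallel-script
```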

Collaborator

Why not simply use multiprocessing?

I wonder if we should add batching because the number of pyc files may easily grow to 100k and this will create as many jobs as pyc files.

Collaborator

@jonashaag jonashaag Jan 31, 2022

We might also want to expose this Python code and not hide it here in the C++ code so that people can use it outside the install process. We could even make it compatible to the command-line interface of compileall

Contributor Author

Why not simply use multiprocessing?

I started from the implementation in CPython, which uses concurrent.futures, and saw no reason not to use it.

I wonder if we should add batching because the number of pyc files may easily grow to 100k and this will create as many jobs as pyc files.

With max_workers=None, ProcessPoolExecutor uses the number of cores and takes care of the batching for sending data to the subprocesses.
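The one-job-per-file pattern looks roughly like this; this is a stand-alone approximation of the generated script, not micromamba's exact code:

```python
# Sketch: submit one compile_file job per source file and let
# ProcessPoolExecutor(max_workers=None) size the pool to the CPU count.
import os
import tempfile
from compileall import compile_file
from concurrent.futures import ProcessPoolExecutor

def compile_paths(paths):
    """Byte-compile each .py file in `paths`; return one bool per file."""
    with ProcessPoolExecutor(max_workers=None) as executor:
        futures = [executor.submit(compile_file, p, quiet=1) for p in paths]
        return [bool(f.result()) for f in futures]

if __name__ == "__main__":
    # Demo on a throwaway module so the sketch is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "w") as fh:
            fh.write("x = 1\n")
        print(compile_paths([src]))  # [True] when compilation succeeds
```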

We might also want to expose this Python code and not hide it here in the C++ code so that people can use it outside the install process. We could even make it compatible to the command-line interface of compileall

It already exists in CPython; however, it:

  • Waits for stdin to be closed before compilation starts
  • Only parallelises when recursing into directories

I'm inclined to improve it upstream rather than focusing on it too much here, so that -m compileall can be used for Python 3.11+ if they approve of the change.

Collaborator

Yes, that sounds like a much better plan.

Collaborator

FYI, with multiprocessing you don't get the batching for free; you'll have to do it manually.
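The manual batching a plain multiprocessing.Pool would need can be sketched like this (the worker body is a placeholder, not real compilation code):

```python
# Sketch of manual batching with multiprocessing: chunk the file list
# yourself, then hand one chunk per task to the pool.
from multiprocessing import Pool

def chunked(items, size):
    """Split `items` into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def compile_batch(batch):
    # Placeholder worker: a real one would call compileall.compile_file
    # on every path in the batch and return per-file success flags.
    return len(batch)

if __name__ == "__main__":
    files = ["pkg/mod%d.py" % i for i in range(10)]
    with Pool(processes=2) as pool:
        print(pool.map(compile_batch, chunked(files, 4)))  # [4, 4, 2]
```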

@wolfv
Member

wolfv commented Jan 31, 2022

Super exciting! I will give this a proper review tomorrow :)

@dhirschfeld
Contributor

Aside: plotly is a good package to test pyc compilation times as it has historically taken a long time (xref: #1206)

@wolfv
Member

wolfv commented Feb 1, 2022

I tried running this, but in verbose mode I see the following output:

      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
        raise RuntimeError('''
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.

            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.

And no pyc files are compiled.

Any ideas? :)

Full traceback
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
        exitcode = _main(fd, parent_sentinel)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
        prepare(preparation_data)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
        main_content = runpy.run_path(main_path,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 269, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 96, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/var/folders/4x/plfnxvhs0rg43pttd200crxr0000gn/T/mambafT6MLqfEJA2", line 11, in <module>
        results.append(executor.submit(compile_file, name, quiet=1))
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 705, in submit
        self._adjust_process_count()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 683, in _adjust_process_count
        p.start()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/process.py", line 121, in start
        self._popen = self._Popen(self)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/context.py", line 284, in _Popen
        return Popen(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
        super().__init__(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
        self._launch(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
        _check_not_importing_main()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.

            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.

    (The same traceback was printed by each spawned worker process, interleaved.)
            child processes and you have forgotten to use the proper idiom
            in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.
    Traceback (most recent call last):
      File "/var/folders/4x/plfnxvhs0rg43pttd200crxr0000gn/T/mambafT6MLqfEJA2", line 11, in <module>
        results.append(executor.submit(compile_file, name, quiet=1))
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 689, in submit
        raise BrokenProcessPool(self._broken)
    concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore

Transaction finished

@wolfv
Member

wolfv commented Feb 1, 2022

I modified the script slightly:

from compileall import compile_file
from concurrent.futures import ProcessPoolExecutor
import sys

def main():
    results = []
    with sys.stdin:
        with ProcessPoolExecutor(max_workers=None) as executor:
            while True:
                name = sys.stdin.readline().strip()
                if not name:
                    break
                results.append(executor.submit(compile_file, name, quiet=1))
            success = all(r.result() for r in results)
    return success

if __name__ == "__main__":
    success = main()
    sys.exit(int(not success))

And that seems to improve the situation (testing on M1 osx-arm64)

@chrisburr
Contributor Author

Ah yes, I forgot that macOS and Windows don't properly support fork (bpo-33725). Thanks for fixing it.
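
For context, the platform difference can be checked directly (a minimal sketch; bpo-33725 has the details):

```python
import multiprocessing as mp

# Default start method: "fork" on Linux, "spawn" on macOS (since
# Python 3.8) and on Windows. With "spawn", each worker re-imports the
# parent's __main__ module, so the code creating the pool must sit
# behind an `if __name__ == "__main__":` guard -- otherwise the
# re-import re-runs the pool creation and raises the "bootstrapping
# phase" RuntimeError.
method = mp.get_start_method()
print(method)
```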

@wolfv
Member

wolfv commented Feb 1, 2022

@chrisburr did you have a chance to test this PR on Windows? Not sure, but it seems to hang indefinitely.

@chrisburr
Contributor Author

I have no access to a Windows machine unfortunately. I might be able to spin up a VM, but it'll probably take me a while to figure out how to get an environment up and running.

Does it work if you try installing a python 2.7 environment? (i.e. triggering the fallback to using -m compileall)

@wolfv
Member

wolfv commented Feb 1, 2022

OK, no worries, I can have a look later on the conda-forge Windows VM.

@wolfv
Member

wolfv commented Feb 1, 2022

@chrisburr I was under the impression that the python process should terminate when it receives an "empty" filename (e.g. "").

However, I can't find a place where we would send such an empty string via stdin. Am I looking at it wrong?

The Windows process seems stuck waiting at the "draining" point.

@chrisburr
Contributor Author

There must be something different about how Windows handles the end of input streams. It's supposed to terminate when this is triggered:

    void TransactionContext::wait_for_pyc_compation() {
        if (m_pyc_process) {
            std::error_code ec = m_pyc_process->close(reproc::stream::in);

Reproducing in a shell:

echo -n -e "hello\nworld\n" | python -c '
import sys
while x := sys.stdin.readline():
    print(repr(x))
print("Ended", repr(sys.stdin.readline()))
print("Ended", repr(sys.stdin.readline()))'

prints:

'hello\n'
'world\n'
Ended ''
Ended ''

I've not found any mention of this from searching; perhaps we just need to explicitly send a newline before closing.
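
The whole stdin protocol can also be reproduced end-to-end in pure Python (a serial sketch of the worker, without the process pool):

```python
import pathlib
import subprocess
import sys
import tempfile
import textwrap

# A serial sketch of the worker protocol: read newline-separated paths
# from stdin, compile each one, and exit once readline() returns ""
# after the parent closes our stdin.
worker = textwrap.dedent("""
    from compileall import compile_file
    import sys
    ok = True
    with sys.stdin:
        while True:
            name = sys.stdin.readline().strip()
            if not name:
                break
            ok &= bool(compile_file(name, quiet=1))
    sys.exit(int(not ok))
""")

with tempfile.TemporaryDirectory() as tmpdir:
    tmp = pathlib.Path(tmpdir)
    (tmp / "worker.py").write_text(worker)
    (tmp / "a.py").write_text("x = 1\n")
    proc = subprocess.Popen(
        [sys.executable, str(tmp / "worker.py")],
        stdin=subprocess.PIPE, text=True,
    )
    proc.stdin.write(str(tmp / "a.py") + "\n")
    proc.stdin.close()  # EOF: the worker's readline() now returns ""
    assert proc.wait() == 0
    assert (tmp / "__pycache__").exists()
    print("compiled OK")
```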

@wolfv
Member

wolfv commented Feb 1, 2022

@chrisburr you also used options.env.behavior = reproc::env::empty; which is usually really nice. The downside is that a dynamically linked micromamba doesn't work (at least on Windows) because it can't find DLLs without the proper %PATH%.

I wanted to see if that influences the tests here.

@wolfv
Member

wolfv commented Feb 1, 2022

I did actually find this issue on reproc, which sounds similar to what we're experiencing here: DaanDeMeyer/reproc#41

You may well be right, btw. I need to debug further to see what's going on on Windows.

@wolfv
Member

wolfv commented Feb 1, 2022

Locally on Windows this seems to run fine, btw! Let's see if we get more enlightening info out of this CI run

@@ -97,6 +97,7 @@ construct(const fs::path& prefix, bool extract_conda_pkgs, bool extract_tarball)
std::string pkg_name = index["name"];

index["fn"] = entry.path().filename();
bool found_match = false;
Member

Is this also part of the other PR?

Contributor Author

It's mostly there as it would have made debugging conda/constructor#487 much easier.

This can be either PR or forgotten about entirely, as you prefer.

@wolfv
Member

wolfv commented Feb 1, 2022

Looks like the removal of the environment cleaning has fixed this. I think that's only because we are linking against DLLs in the tests, and if micromamba's dependencies are not found at runtime, the micromamba activate calls don't work.

I guess there are two workarounds:

  • Put everything necessary for activation inside the script generated by the wrapped_subprocess call so that micromamba doesn't even need to be runnable for that call -- then we should be able to run with a completely cleaned env
  • Keep the current behavior with a non-cleaned env on Windows, and only when we're in dynamic link mode (controlled by a flag from CMake)

It looks like this is working really great. I did some checking whether all files are compiled and "it looks like it", but maybe we should add tests?

And is this also doing the proper bookkeeping in conda-meta/my-pkg-123.321-hash.json? Adding all compiled files there and SHA256 sums?

Actually, in the future we could put the compiled files as part of the pkgs cache folder and that would make recreating environments with the same Python version even faster (or at least less CPU hungry!).

@chrisburr
Contributor Author

I guess there are two workarounds:

Would inheriting only the PATH variable be good enough?

It looks like this is working really great. I did some checking whether all files are compiled and "it looks like it", but maybe we should add tests?

Definitely, I can try to look at this. Actually, I've already found a bug: non-CPython bytecode isn't handled correctly:

return directory / fs::path("__pycache__")
/ concat(py_file_stem.c_str(), ".cpython-", py_ver_nodot, ".pyc");
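
For reference, importlib.util.cache_from_source computes the correct path for any interpreter (a sketch of the check, not necessarily how the fix should look):

```python
import importlib.util
import sys

# cache_from_source builds the bytecode path for the *running*
# interpreter using sys.implementation.cache_tag (e.g. "cpython-310"
# on CPython but a "pypy…" tag on PyPy), so hard-coding a
# ".cpython-<ver>.pyc" suffix breaks on non-CPython interpreters.
src = "/tmp/site-packages/mod.py"  # hypothetical source path
pyc = importlib.util.cache_from_source(src)
print(pyc)  # .../site-packages/__pycache__/mod.<cache_tag>.pyc
```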

And is this also doign the proper bookkeeping in conda-meta/my-pkg-123.321-hash.json? Adding all compiled files there and SHA256 sums?

I don't think they're computed for these files?

$ cat ~/mambaforge/conda-meta/sphinx-4.3.1-pyh6c4a22f_0.json
...
      {
        "_path": "lib/python3.9/site-packages/sphinx/writers/__pycache__/text.cpython-39.pyc",
        "path_type": "pyc_file"
      },
      {
        "_path": "lib/python3.9/site-packages/sphinx/writers/__pycache__/xml.cpython-39.pyc",
        "path_type": "pyc_file"
      },
...

@chrisburr
Contributor Author

Actually, should these files always be added to conda-meta/my-pkg-123.321-hash.json even if compilation fails or is disabled? The reasoning being that they'll be automatically generated regardless in most cases, and we want to make sure they're cleaned up.

@wolfv
Member

wolfv commented Feb 1, 2022

True, I just checked conda's behavior and you're right, they don't record the SHA256 for those files (so we don't have to, either).

Regarding recording them even if compilation doesn't succeed -- tough question. I wish there was a specification :)
I think historically we haven't recorded them if compilation fails, but I would need to double check also conda's behavior there.

@wolfv
Member

wolfv commented Feb 1, 2022

Also, yes, I think inheriting PATH only should be good enough!
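
In Python terms, the idea is roughly this (a sketch only; the PR itself configures the environment through reproc options in C++):

```python
import os
import subprocess
import sys

# Spawn the compile process with a stripped environment that inherits
# only PATH. On Windows, SystemRoot is also commonly required for
# child processes to start at all, so keep it there too.
env = {"PATH": os.environ.get("PATH", "")}
if os.name == "nt":
    env["SYSTEMROOT"] = os.environ.get("SYSTEMROOT", "")

proc = subprocess.run(
    [sys.executable, "-c", "import os; print(sorted(os.environ))"],
    env=env, capture_output=True, text=True,
)
print(proc.stdout.strip())
```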

@chrisburr
Contributor Author

Regarding recording them even if compilation doesn't succeed -- tough question. I wish there was a specification :)
I think historically we haven't recorded them if compilation fails, but I would need to double check also conda's behavior there.

I checked the code and it appears to always register them.

To check for sure, try installing for ppc64le:

$ CONDA_SUBDIR=linux-ppc64le conda create --name test python celery -v
...
pyc file failed to compile successfully (run_command failed)
python_exe_full_path: /home/cburr/miniconda3/envs/test/bin/python3.9
py_full_path: /home/cburr/miniconda3/envs/test/lib/python3.9/site-packages/prompt_toolkit/layout/utils.py
pyc_full_path: /home/cburr/miniconda3/envs/test/lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jpcntx.cpython-39.pyc
compile rc: 255
compile stdout:
compile stderr: qemu-ppc64le-static: Could not open '/lib64/ld64.so.2': No such file or directory
...


$ ls /home/cburr/miniconda3/envs/test/lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jpcntx.cpython-39.pyc
ls: cannot access '/home/cburr/miniconda3/envs/test/lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jpcntx.cpython-39.pyc': No such file or directory


$ rg -C 3 jpcntx.cpython-39.pyc /home/cburr/miniconda3/envs/test/conda-meta/
/home/cburr/miniconda3/envs/test/conda-meta/pip-22.0.2-pyhd8ed1ab_0.json
721-    "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/gb2312prober.cpython-39.pyc",
722-    "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/hebrewprober.cpython-39.pyc",
723-    "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jisfreq.cpython-39.pyc",
724:    "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jpcntx.cpython-39.pyc",
725-    "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/langbulgarianmodel.cpython-39.pyc",
726-    "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/langgreekmodel.cpython-39.pyc",
727-    "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/langhebrewmodel.cpython-39.pyc",
--
5484-        "path_type": "pyc_file"
5485-      },
5486-      {
5487:        "_path": "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jpcntx.cpython-39.pyc",
5488-        "path_type": "pyc_file"
5489-      },
5490-      {

@maresb
Contributor

maresb commented Feb 2, 2022

This is very exciting! I wonder to what extent this helps with the lack of speedup here jupyter/docker-stacks#1213 (comment), which has always baffled me.

@wolfv
Member

wolfv commented Feb 2, 2022

@maresb with "regular" mamba we're using the same "linking" methods that conda would use, so I think in that particular case nothing should change (for now). But maybe docker-stacks wants to switch to micromamba, and then hopefully they'd see a speed increase :)

@maresb
Contributor

maresb commented Feb 2, 2022

@wolfv, great points! My recent contributions to micromamba-docker and micromamba have been partially motivated by wanting a docker-stacks replacement. With 0.20.0 we squished some bugs which made bootstrapping mamba with micromamba infeasible, so indeed now may be the time for another attempt.

@jonashaag
Collaborator

FYI python -c 'import site; print(site.getsitepackages()[0])'

@@ -180,13 +179,8 @@ namespace mamba
m_pyc_process = std::make_unique<reproc::process>();

reproc::options options;
#ifndef _WIN32
Member

we might be able to not do this with static micromamba builds. We can add a variable or something like that later.

@wolfv
Member

wolfv commented Feb 2, 2022

Hmm, looks like the env cannot be solved on Windows because vc 9.* is needed for python 2.7. Maybe we used to get that from the defaults channel?

@wolfv
Member

wolfv commented Feb 2, 2022

I think one last thing we might want to add here is the ability to optionally set the number of threads to be used. It currently maxes out the CPU, and on some systems that's not great (e.g. constrained cloud workers).

We could use the same computation as for extract_threads
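
A sketch of what that could look like (the helper name is hypothetical, and I'm assuming extract_threads-style semantics: 0 means all cores, a negative value leaves that many cores free):

```python
import os

def pyc_compile_workers(requested: int = 0) -> int:
    # Hypothetical helper; assumes extract_threads-like semantics:
    #   0  -> use all cores
    #  -n  -> leave n cores free
    #  +n  -> use exactly n workers
    cores = os.cpu_count() or 1
    if requested > 0:
        return requested
    return max(1, cores + requested)

print(pyc_compile_workers(0), pyc_compile_workers(-1), pyc_compile_workers(4))
```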

@wolfv wolfv merged commit d7cce6c into mamba-org:master Feb 2, 2022
@wolfv
Member

wolfv commented Feb 2, 2022

Let's see how this holds up in real life! :)

@baszalmstra
Contributor

baszalmstra commented Feb 2, 2022

Once this is released we'll start testing it right away! I'll report back here ASAP.

Successfully merging this pull request may close these issues.

Installation is notably slower when running in an active environment