Faster pyc compilation #1422

chrisburr · 2022-01-31T16:52:11Z

This PR included all but one of the changes I've mentioned on gitter. My C++ knowlege is lacking so I'm not sure how idiomatic this PR is, feel free to sugest major changes. I'm also happy for someone else to take over this PR if it's easier.

Fixes #1413

Performance comparision

I picked a few packages I'm familiar with to use for benchmarking here:

micromamba create --prefix /tmp/test-env --yes celery pandas tqdm sqlalchemy uproot scikit-hep jupyterlab

I've ran the command several times to make sure the caches were populated.

I've also tried various other sets of packages, installers generated using conda-constructor and running on slower filesystems (NFS, Ceph volumes, ...). All of my test cases have been significantly faster.

Lastest micromamba release from inside an exisiting environment

Slowed by calling deactivate (see #1413)

________________________________________________________
Executed in   45.73 secs    fish           external
   usr time   41.04 secs  610.00 micros   41.04 secs
   sys time    8.31 secs  193.00 micros    8.31 secs

Lastest micromamba release outside an exisiting environment

________________________________________________________
Executed in   19.87 secs    fish           external
   usr time   16.14 secs    1.67 millis   16.13 secs
   sys time    3.48 secs    0.00 millis    3.48 secs

This PR

________________________________________________________
Executed in    9.57 secs    fish           external
   usr time   11.92 secs  800.00 micros   11.92 secs
   sys time    1.81 secs    0.00 micros    1.81 secs

wolfv · 2022-01-31T19:38:01Z

libmamba/src/core/transaction_context.cpp

+    void compile_python_sources(std::ostream& out)
+    {
+        out << "from compileall import compile_file\n";
+        out << "from concurrent.futures import ProcessPoolExecutor\n";


Hmm, this was introduced in python 3.2

I am not sure we should break compatibility with older pythons, especially 2.7 might still be used (at least to recreate historical envs).

Definitely (I'm still forced to Python 2.7 myself).

I have a fallback to use -m compileall for Python < 3.5 (basically the same condition that used to be used for adding -j0 before).

Worst case scenario is that the pyc files fail to be installed and will be generated on first import (most of the time at least, permissions or read only filesystems might mess things up).

Why not simply use multiprocessing?

I wonder if we should add batching because the number of pyc files may easily grow to 100k and this will create as many jobs as pyc files.

We might also want to expose this Python code and not hide it here in the C++ code so that people can use it outside the install process. We could even make it compatible to the command-line interface of compileall

Why not simply use multiprocessing?

I started from the implementation in cpython which uses concurrent.futures and saw not reason not to use it.

I wonder if we should add batching because the number of pyc files may easily grow to 100k and this will create as many jobs as pyc files.

With max_workers=None ProcessPoolExecutor uses the number of cores and takes care of the batching for sending data the subprocesses.

We might also want to expose this Python code and not hide it here in the C++ code so that people can use it outside the install process. We could even make it compatible to the command-line interface of compileall

It already exists in cpython however it:

Waits for stdin to be closed before compilation starts

It only parallelises when recursing into directories

I'm inclined to improve upstream rather than focusing on it to much here so that -m compileall can be used for Python 3.11+ if they approve of the change.

Yes that sounds like a much better plan

FYI with multiprocessing you don't get the batching for free, you'll have to do it manually.

wolfv · 2022-01-31T19:38:35Z

Super exciting! I will give this a proper review tomorrow :)

libmamba/src/core/package_handling.cpp

dhirschfeld · 2022-02-01T00:29:02Z

Aside: plotly is a good package to test pyc compilation times as it has historically taken a long time (xref: #1206)

wolfv · 2022-02-01T09:04:49Z

I tried running this but in verbose mode I see the following output:

      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
        raise RuntimeError('''
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.

            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program

And no pyc files are compiled.

Any ideas? :)

Full traceback

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "<string>", line 1, in <module>
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
        exitcode = _main(fd, parent_sentinel)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
        prepare(preparation_data)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
        _fixup_main_from_path(data['init_main_from_path'])
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
        exitcode = _main(fd, parent_sentinel)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
        main_content = runpy.run_path(main_path,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 269, in run_path
        exitcode = _main(fd, parent_sentinel)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
        prepare(preparation_data)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
        exitcode = _main(fd, parent_sentinel)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
        prepare(preparation_data)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
        prepare(preparation_data)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
        exitcode = _main(fd, parent_sentinel)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
        main_content = runpy.run_path(main_path,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 269, in run_path
        _fixup_main_from_path(data['init_main_from_path'])
        _fixup_main_from_path(data['init_main_from_path'])
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
        main_content = runpy.run_path(main_path,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 269, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 96, in _run_module_code
        exitcode = _main(fd, parent_sentinel)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
        _run_code(code, mod_globals, init_globals,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 86, in _run_code
        return _run_module_code(code, init_globals, run_name,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 96, in _run_module_code
        prepare(preparation_data)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
        exec(code, run_globals)
      File "/var/folders/4x/plfnxvhs0rg43pttd200crxr0000gn/T/mambafT6MLqfEJA2", line 11, in <module>
        prepare(preparation_data)
        _run_code(code, mod_globals, init_globals,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 86, in _run_code
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
        return _run_module_code(code, init_globals, run_name,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 96, in _run_module_code
        _fixup_main_from_path(data['init_main_from_path'])
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
        results.append(executor.submit(compile_file, name, quiet=1))
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 705, in submit
        exec(code, run_globals)
      File "/var/folders/4x/plfnxvhs0rg43pttd200crxr0000gn/T/mambafT6MLqfEJA2", line 11, in <module>
        results.append(executor.submit(compile_file, name, quiet=1))
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 705, in submit
        main_content = runpy.run_path(main_path,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 269, in run_path
        main_content = runpy.run_path(main_path,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 269, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 96, in _run_module_code
        _fixup_main_from_path(data['init_main_from_path'])
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
        return _run_module_code(code, init_globals, run_name,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 96, in _run_module_code
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
        _run_code(code, mod_globals, init_globals,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 86, in _run_code
        _run_code(code, mod_globals, init_globals,
        _run_code(code, mod_globals, init_globals,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 86, in _run_code
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
        main_content = runpy.run_path(main_path,
      File "/var/folders/4x/plfnxvhs0rg43pttd200crxr0000gn/T/mambafT6MLqfEJA2", line 11, in <module>
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 269, in run_path
        exec(code, run_globals)
      File "/var/folders/4x/plfnxvhs0rg43pttd200crxr0000gn/T/mambafT6MLqfEJA2", line 11, in <module>
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
        results.append(executor.submit(compile_file, name, quiet=1))
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 705, in submit
        return _run_module_code(code, init_globals, run_name,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 96, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 86, in _run_code
        results.append(executor.submit(compile_file, name, quiet=1))
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 705, in submit
        exec(code, run_globals)
      File "/var/folders/4x/plfnxvhs0rg43pttd200crxr0000gn/T/mambafT6MLqfEJA2", line 11, in <module>
        results.append(executor.submit(compile_file, name, quiet=1))
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 705, in submit
        exitcode = _main(fd, parent_sentinel)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
        exec(code, run_globals)
      File "/var/folders/4x/plfnxvhs0rg43pttd200crxr0000gn/T/mambafT6MLqfEJA2", line 11, in <module>
        prepare(preparation_data)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
        results.append(executor.submit(compile_file, name, quiet=1))
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 705, in submit
        self._adjust_process_count()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 683, in _adjust_process_count
        self._adjust_process_count()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 683, in _adjust_process_count
        self._adjust_process_count()
        self._adjust_process_count()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 683, in _adjust_process_count
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 683, in _adjust_process_count
        _fixup_main_from_path(data['init_main_from_path'])
        self._adjust_process_count()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
        self._adjust_process_count()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 683, in _adjust_process_count
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 683, in _adjust_process_count
        p.start()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/process.py", line 121, in start
        p.start()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/process.py", line 121, in start
        p.start()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/process.py", line 121, in start
        main_content = runpy.run_path(main_path,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 269, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 96, in _run_module_code
        p.start()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/process.py", line 121, in start
        p.start()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/process.py", line 121, in start
        self._popen = self._Popen(self)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/context.py", line 284, in _Popen
        _run_code(code, mod_globals, init_globals,
        self._popen = self._Popen(self)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/context.py", line 284, in _Popen
        self._popen = self._Popen(self)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/context.py", line 284, in _Popen
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 86, in _run_code
        self._popen = self._Popen(self)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/context.py", line 284, in _Popen
        p.start()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/process.py", line 121, in start
        self._popen = self._Popen(self)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/context.py", line 284, in _Popen
        self._popen = self._Popen(self)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/context.py", line 284, in _Popen
        exec(code, run_globals)
      File "/var/folders/4x/plfnxvhs0rg43pttd200crxr0000gn/T/mambafT6MLqfEJA2", line 11, in <module>
        return Popen(process_obj)
        return Popen(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
        results.append(executor.submit(compile_file, name, quiet=1))
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 705, in submit
        return Popen(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
        return Popen(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
        return Popen(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
        return Popen(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
        super().__init__(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
        super().__init__(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
        super().__init__(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
        super().__init__(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
        super().__init__(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
        super().__init__(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
        self._adjust_process_count()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 683, in _adjust_process_count
        self._launch(process_obj)
        self._launch(process_obj)
        self._launch(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
        self._launch(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
        self._launch(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
        _check_not_importing_main()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
        prep_data = spawn.get_preparation_data(process_obj._name)
        _check_not_importing_main()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
        self._launch(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
        raise RuntimeError('''
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.

            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.
        raise RuntimeError('''
        prep_data = spawn.get_preparation_data(process_obj._name)
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.

            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
        p.start()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/process.py", line 121, in start
        _check_not_importing_main()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
        _check_not_importing_main()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.

            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.    _check_not_importing_main()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
        _check_not_importing_main()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
        self._popen = self._Popen(self)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/context.py", line 284, in _Popen
        raise RuntimeError('''
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.

            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.
        raise RuntimeError('''
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.

            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.

        raise RuntimeError('''
        return Popen(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.

            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.
        super().__init__(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
        self._launch(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
        _check_not_importing_main()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.

            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
        exitcode = _main(fd, parent_sentinel)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
        prepare(preparation_data)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
        main_content = runpy.run_path(main_path,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 269, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 96, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/var/folders/4x/plfnxvhs0rg43pttd200crxr0000gn/T/mambafT6MLqfEJA2", line 11, in <module>
        results.append(executor.submit(compile_file, name, quiet=1))
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 705, in submit
        self._adjust_process_count()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 683, in _adjust_process_count
        p.start()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/process.py", line 121, in start
        self._popen = self._Popen(self)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/context.py", line 284, in _Popen
        return Popen(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
        super().__init__(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
        self._launch(process_obj)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
        _check_not_importing_main()
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError:
            An attempt has been made to start a new process before the
            current process has finished its bootstrapping phase.

            This probably means that you are not using fork to start your
            child processes and you have forgotten to use the proper idiom
            in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce an executable.
    Traceback (most recent call last):
      File "/var/folders/4x/plfnxvhs0rg43pttd200crxr0000gn/T/mambafT6MLqfEJA2", line 11, in <module>
        results.append(executor.submit(compile_file, name, quiet=1))
      File "/Users/wolfvollprecht/micromamba/envs/testpyc/lib/python3.10/concurrent/futures/process.py", line 689, in submit
        raise BrokenProcessPool(self._broken)
    concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore

Transaction finished

wolfv · 2022-02-01T09:23:02Z

I modified the script slightly:

from compileall import compile_file
from concurrent.futures import ProcessPoolExecutor
import sys

def main():
    results = []
    with sys.stdin:
        with ProcessPoolExecutor(max_workers=None) as executor:
            while True:
                name = sys.stdin.readline().strip()
                if not name:
                    break
                results.append(executor.submit(compile_file, name, quiet=1))
            success = all(r.result() for r in results)
    return success

if __name__ == "__main__":
    success = main()
    sys.exit(int(not success))

And that seems to improve the situation (testing on M1 osx-arm64)

chrisburr · 2022-02-01T09:58:24Z

Ah yes, I forgot macOS and windows not properly supporting fork (bpo-33725). Thanks for fixing it.

wolfv · 2022-02-01T10:24:34Z

@chrisburr did you have a chance to test this PR on Windows? Not sure, but it seems to hang indefinitely.

chrisburr · 2022-02-01T10:45:11Z

I have no access to a windows machine unfortunately. I might be able to spin up a VM it’ll probably take me a while to figure out how to get an environment up and running.

Does it work if you try installing a python 2.7 environment? (i.e. triggering the fallback to using -m compileall)

wolfv · 2022-02-01T10:47:41Z

OK, no worries, I can have a look later on the conda-forge Windows VM.

wolfv · 2022-02-01T14:47:47Z

@chrisburr I was under the impression that the python process should terminate when it receives an "empty" file (e.g. "").

However, I can't find a place where we would send such an empty string via stdin. Am I looking at it wrong?

The Windows process seems stuck waiting at the "draining" point.

chrisburr · 2022-02-01T15:13:39Z

There must be something different about how Windows handles the end of input streams. It's supposed to terminate when this is triggered:

    void TransactionContext::wait_for_pyc_compation() {
        if (m_pyc_process) {
            std::error_code ec = m_pyc_process->close(reproc::stream::in);

Reproducing in a shell:

echo -n -e "hello\nworld\n" | python -c '
import sys
while x := sys.stdin.readline():
    print(repr(x))
print("Ended", repr(sys.stdin.readline()))
print("Ended", repr(sys.stdin.readline()))'

prints:

'hello\n'
'world\n'
Ended ''
Ended ''

I've not found any mention about this from searching, perhaps we just need to explicitly send a newline before closing.

wolfv · 2022-02-01T15:25:06Z

@chrisburr you also used options.env.behavior = reproc::env::empty; which is usually really nice. The downside is that a dynamically linked micromamba doesn't (at least on Windows) because it can't find DLLs without the proper %PATH%.

I wanted to see if that influencees the tests here.

wolfv · 2022-02-01T15:28:58Z

I did actually find this isseu on reproc which sounds similar to what we're experiencing here: DaanDeMeyer/reproc#41

You may well be right, btw. I need to debug further to see what's going on on Windows.

wolfv · 2022-02-01T15:38:33Z

Locally on Windows this seems to run fine, btw! Let's see if we get more enlightening info out of this CI run

wolfv · 2022-02-01T16:08:41Z

micromamba/src/constructor.cpp

@@ -97,6 +97,7 @@ construct(const fs::path& prefix, bool extract_conda_pkgs, bool extract_tarball)
                std::string pkg_name = index["name"];

                index["fn"] = entry.path().filename();
+                bool found_match = false;


Is this also part of the other PR?

It's mostly there as it would have made debugging conda/constructor#487 much easier.

This can be either PR or forgotten about entirely, as you prefer.

wolfv · 2022-02-01T16:14:14Z

Looks like the removal of the environment cleaning has fixed this. I think that's only because we are linking against dll's in the tests and if micromamba dependencies are not found at runtime, the micromamba activate calls don't work.

I guess there are two workarounds:

Put everything necessary for activation inside the script generated by wrapped_subprocess call so that micromamba doesn't even need to be runnable for that call -- then we should be able to run with a completely cleaned env
keep the current behavior with non-cleaned env on Windows and only when we're in a dynamic link mode (controlled by a flag from CMake)

It looks like this is working really great. I did some checking if all files are compiled and "it looks like it" but maybe we shoudl add tests?

And is this also doign the proper bookkeeping in conda-meta/my-pkg-123.321-hash.json? Adding all compiled files there and SHA256 sums?

Actually, in the future we could put the compiled files as part of the pkgs cache folder and that would make recreating environments with the same Python version even faster (or at least less CPU hungry!).

chrisburr · 2022-02-01T18:13:32Z

I guess there are two workarounds:

Would inheriting only inheriting the PATH variable be good enough?

It looks like this is working really great. I did some checking if all files are compiled and "it looks like it" but maybe we shoudl add tests?

Definitely, I can try to look at this. Actually I've already found a bug, non-cpython bytecode isn't handled correctly:

mamba/libmamba/src/core/link.cpp

Lines 81 to 82 in 5961d76

    
           return directory / fs::path("__pycache__") 
        
                  / concat(py_file_stem.c_str(), ".cpython-", py_ver_nodot, ".pyc");

And is this also doign the proper bookkeeping in conda-meta/my-pkg-123.321-hash.json? Adding all compiled files there and SHA256 sums?

I don't think they're computed for these files?

$ cat ~/mambaforge/conda-meta/sphinx-4.3.1-pyh6c4a22f_0.json
...
      {
        "_path": "lib/python3.9/site-packages/sphinx/writers/__pycache__/text.cpython-39.pyc",
        "path_type": "pyc_file"
      },
      {
        "_path": "lib/python3.9/site-packages/sphinx/writers/__pycache__/xml.cpython-39.pyc",
        "path_type": "pyc_file"
      },
...

chrisburr · 2022-02-01T18:18:03Z

Actually should these files always be added to conda-meta/my-pkg-123.321-hash.json even if compilation fails or is disabled? The reasoning being that they'll be automatically generated regardless in most cases and we want to make sure they're cleaned up.

wolfv · 2022-02-01T18:43:52Z

true, i just checked conda's behavior and you're right, they don't record the SHA256 for those files (so we don't have to, either).

Regarding recording them even if compilation doesn't succeed -- tough question. I wish there was a specification :)
I think historically we haven't recorded them if compilation fails, but I would need to double check also conda's behavior there.

wolfv · 2022-02-01T18:44:14Z

Also, yes, I think inheriting PATH only should be good enough!

chrisburr · 2022-02-01T19:20:07Z

Regarding recording them even if compilation doesn't succeed -- tough question. I wish there was a specification :)
I think historically we haven't recorded them if compilation fails, but I would need to double check also conda's behavior there.

I checked the code and it appears to always register them.

To check for sure, try installing for ppc64le:

$ CONDA_SUBDIR=linux-ppc64le conda create --name test python celery -v
...
pyc file failed to compile successfully (run_command failed)
python_exe_full_path: /home/cburr/miniconda3/envs/test/bin/python3.9
py_full_path: /home/cburr/miniconda3/envs/test/lib/python3.9/site-packages/prompt_toolkit/layout/utils.py
pyc_full_path: /home/cburr/miniconda3/envs/test/lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jpcntx.cpython-39.pyc
compile rc: 255
compile stdout:
compile stderr: qemu-ppc64le-static: Could not open '/lib64/ld64.so.2': No such file or directory
...


$ ls /home/cburr/miniconda3/envs/test/lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jpcntx.cpython-39.pyc
ls: cannot access '/home/cburr/miniconda3/envs/test/lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jpcntx.cpython-39.pyc': No such file or directory


$ rg -C 3 jpcntx.cpython-39.pyc /home/cburr/miniconda3/envs/test/conda-meta/
/home/cburr/miniconda3/envs/test/conda-meta/pip-22.0.2-pyhd8ed1ab_0.json
721-    "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/gb2312prober.cpython-39.pyc",
722-    "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/hebrewprober.cpython-39.pyc",
723-    "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jisfreq.cpython-39.pyc",
724:    "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jpcntx.cpython-39.pyc",
725-    "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/langbulgarianmodel.cpython-39.pyc",
726-    "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/langgreekmodel.cpython-39.pyc",
727-    "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/langhebrewmodel.cpython-39.pyc",
--
5484-        "path_type": "pyc_file"
5485-      },
5486-      {
5487:        "_path": "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jpcntx.cpython-39.pyc",
5488-        "path_type": "pyc_file"
5489-      },
5490-      {

maresb · 2022-02-02T07:42:56Z

This is very exciting! I wonder to what extent this helps with the lack of speedup here jupyter/docker-stacks#1213 (comment) which have always baffled me.

wolfv · 2022-02-02T07:48:04Z

@maresb with "regular" mamba we're using the same "linking" methods that conda would use so I think in that particular case nothing should change (for now). But maybe docker-stacks wants to switch to micromamba and then .. hopefully they'd see a speed increase :)

maresb · 2022-02-02T07:56:38Z

@wolfv, great points! My recent contributions to micromamba-docker and micromamba have been partially motivated by wanting a docker-stacks replacement. With 0.20.0 we squished some bugs which made bootstrapping mamba with micromamba infeasible, so indeed now may be the time for another attempt.

jonashaag · 2022-02-02T10:36:58Z

FYI python -c 'import site; print(site.getsitepackages()[0])'

wolfv · 2022-02-02T10:54:42Z

libmamba/src/core/transaction_context.cpp

@@ -180,13 +179,8 @@ namespace mamba
        m_pyc_process = std::make_unique<reproc::process>();

        reproc::options options;
+#ifndef _WIN32


we might be able to not do this with static micromamba builds. We can add a variable or something like that later.

wolfv · 2022-02-02T11:37:21Z

Hmm, looks like the env cannot be solved on Windows because vc 9.* is needed for python 2.7. Maybe we used to get that from the defaults channel?

wolfv · 2022-02-02T13:52:51Z

I think one last thing we might want to add here is to be able to optionally set a number of threads to be used. It does currently max out the CPU and on some systems that's not great (e.g. if they are constrained cloud workers etc).

We could use the same computation as for extract_threads

wolfv · 2022-02-02T17:46:54Z

Let's see how this holds up in real life! :)

baszalmstra · 2022-02-02T17:48:06Z

Once this is released we’ll start testing it right away! Ill report back here asap.

wolfv reviewed Jan 31, 2022

View reviewed changes

jonashaag reviewed Jan 31, 2022

View reviewed changes

libmamba/src/core/package_handling.cpp Outdated Show resolved Hide resolved

chrisburr mentioned this pull request Feb 1, 2022

Parallel extraction for constructor #1427

Open

wolfv force-pushed the faster-pyc branch from 054f9a4 to e307596 Compare February 1, 2022 15:23

wolfv reviewed Feb 1, 2022

View reviewed changes

chrisburr force-pushed the faster-pyc branch from 0923019 to 99bf7fa Compare February 2, 2022 07:16

chrisburr force-pushed the faster-pyc branch from be391a6 to 1e53a49 Compare February 2, 2022 07:45

chrisburr force-pushed the faster-pyc branch from 1e53a49 to 12a3126 Compare February 2, 2022 08:04

chrisburr force-pushed the faster-pyc branch from 12a3126 to 6408916 Compare February 2, 2022 08:35

chrisburr and others added 10 commits February 2, 2022 09:43

Only share environment with subprocesses if needed

61c5de7

Show a warning if repodata_record.json generation fails in constructor

336ca3a

Optimise compilation of python sources for "noarch: python"

56ad68e

use __main__ function in compile pyc.py

f58c60b

use correct file name for include

f90fe57

correct typo, do not use empty env when compiling

73271a0

Add test of pyc compilation

63d9fa9

Always write pyc files to conda-meta/*.json

389261d

Disable testing pypy for test_pyc_compilation as it's expected to fail

92d1eb3

Clean environment for compiling pyc files

eaa4112

chrisburr force-pushed the faster-pyc branch from 6408916 to eaa4112 Compare February 2, 2022 08:44

Fix test_pyc_compilation for Windows

b4d8e56

Keep the host environment for pyc compilation on Windows

59eb620

wolfv reviewed Feb 2, 2022

View reviewed changes

Python 2.7 on Windows requires -c default

43b3d21

Respect MAMBA_EXTRACT_THREADS when compiling pyc sources

ea5c0ef

wolfv merged commit d7cce6c into mamba-org:master Feb 2, 2022

chrisburr deleted the faster-pyc branch February 2, 2022 18:11

austin1howard mentioned this pull request Feb 10, 2022

[micromamba] 0.21.0 breaks PIP_ config env variables #1468

Closed

Faster pyc compilation #1422

Faster pyc compilation #1422

Conversation

chrisburr commented Jan 31, 2022 • edited

Performance comparision

Lastest micromamba release from inside an exisiting environment

Lastest micromamba release outside an exisiting environment

This PR

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jonashaag Jan 31, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wolfv commented Jan 31, 2022

dhirschfeld commented Feb 1, 2022

wolfv commented Feb 1, 2022 • edited

wolfv commented Feb 1, 2022

chrisburr commented Feb 1, 2022

wolfv commented Feb 1, 2022

chrisburr commented Feb 1, 2022

wolfv commented Feb 1, 2022

wolfv commented Feb 1, 2022

chrisburr commented Feb 1, 2022

wolfv commented Feb 1, 2022

wolfv commented Feb 1, 2022

wolfv commented Feb 1, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wolfv commented Feb 1, 2022

chrisburr commented Feb 1, 2022

chrisburr commented Feb 1, 2022

wolfv commented Feb 1, 2022

wolfv commented Feb 1, 2022

chrisburr commented Feb 1, 2022

maresb commented Feb 2, 2022

wolfv commented Feb 2, 2022

maresb commented Feb 2, 2022

jonashaag commented Feb 2, 2022

Choose a reason for hiding this comment

wolfv commented Feb 2, 2022

wolfv commented Feb 2, 2022

wolfv commented Feb 2, 2022

baszalmstra commented Feb 2, 2022 • edited

chrisburr commented Jan 31, 2022 •

edited

jonashaag Jan 31, 2022 •

edited

wolfv commented Feb 1, 2022 •

edited

baszalmstra commented Feb 2, 2022 •

edited