Faster pyc compilation #1422
Conversation
```cpp
void compile_python_sources(std::ostream& out)
{
    out << "from compileall import compile_file\n";
    out << "from concurrent.futures import ProcessPoolExecutor\n";
```
Hmm, this was introduced in Python 3.2.
I am not sure we should break compatibility with older Pythons; 2.7 in particular might still be used (at least to recreate historical envs).
Definitely (I'm still forced to Python 2.7 myself).
I have a fallback to use `-m compileall` for Python < 3.5 (basically the same condition that used to be used for adding `-j0` before).
Worst case scenario is that the pyc files fail to be installed and will be generated on first import (most of the time, at least; permissions or read-only filesystems might mess things up).
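For illustration, a minimal sketch of that version gate (hypothetical function name; the actual dispatch lives in the C++ linking code):

```python
import subprocess  # used in the example call below

def compileall_args(python_version):
    # compileall's parallel "-j" flag only exists on Python >= 3.5,
    # the same boundary used for the fallback described above.
    args = ["-m", "compileall", "-q"]
    if python_version >= (3, 5):
        args.append("-j0")  # parallelise over all cores (3.5+ only)
    return args

# e.g.: subprocess.check_call([python_exe] + compileall_args((2, 7)) + files)
```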
Why not simply use `multiprocessing`?
I wonder if we should add batching because the number of pyc files may easily grow to 100k and this will create as many jobs as pyc files.
We might also want to expose this Python code and not hide it here in the C++ code, so that people can use it outside the install process. We could even make it compatible with the command-line interface of `compileall`.
> Why not simply use `multiprocessing`?

I started from the implementation in CPython, which uses `concurrent.futures`, and saw no reason not to use it.
> I wonder if we should add batching because the number of pyc files may easily grow to 100k and this will create as many jobs as pyc files.

With `max_workers=None`, `ProcessPoolExecutor` uses the number of cores and takes care of the batching when sending data to the subprocesses.
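As a rough sketch of that pattern (not the PR's exact script; the `chunksize` value here is purely illustrative), `executor.map` can batch many small jobs per worker round-trip:

```python
from compileall import compile_file
from concurrent.futures import ProcessPoolExecutor

def compile_all(py_files, chunksize=64):
    # max_workers=None sizes the pool to os.cpu_count(); chunksize groups
    # several files into each task sent to a worker process.
    with ProcessPoolExecutor(max_workers=None) as executor:
        results = executor.map(compile_file, py_files, chunksize=chunksize)
    return all(results)
```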
> We might also want to expose this Python code and not hide it here in the C++ code so that people can use it outside the install process. We could even make it compatible with the command-line interface of `compileall`.

It already exists in CPython, however it:

- waits for stdin to be closed before compilation starts
- only parallelises when recursing into directories

I'm inclined to improve upstream rather than focusing on it too much here, so that `-m compileall` can be used for Python 3.11+ if they approve of the change (see the sketch after this comment).
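For reference, the upstream interface being discussed can already be driven like this: `-i -` makes `compileall` read the file list from stdin (and, as noted above, it waits for stdin to close before starting), while `-j 0` (Python 3.5+) spreads the work over all cores. A small driver sketch, with placeholder paths:

```python
import subprocess
import sys

file_list = b"pkg/a.py\npkg/b.py\n"  # placeholder paths
subprocess.run(
    [sys.executable, "-m", "compileall", "-q", "-j", "0", "-i", "-"],
    input=file_list,  # stdin is closed once this has been written
    check=False,
)
```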
Yes, that sounds like a much better plan.
FYI with `multiprocessing` you don't get the batching for free; you'll have to do it manually.
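A sketch of what that manual batching could look like, assuming we chunk the file list ourselves before handing it to a `multiprocessing` pool:

```python
from compileall import compile_file
from multiprocessing import Pool

def compile_batch(paths):
    # Each worker compiles a whole batch, amortising the IPC overhead.
    return [bool(compile_file(p, quiet=1)) for p in paths]

def compile_in_batches(py_files, batch_size=100):
    batches = [py_files[i:i + batch_size]
               for i in range(0, len(py_files), batch_size)]
    with Pool() as pool:
        results = pool.map(compile_batch, batches)
    return all(ok for batch in results for ok in batch)
```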
Super exciting! I will give this a proper review tomorrow :)
Aside: I tried running this, but in verbose mode I see the following output [output not captured], and no pyc files appear. Any ideas? :)

Full traceback: [not captured]
I modified the script slightly [diff not captured], and that seems to improve the situation (testing on M1 osx-arm64).
Ah yes, I forgot about macOS and Windows not properly supporting `fork`.
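For background (general Python behaviour, not from this thread): Windows, and macOS since Python 3.8, default to the "spawn" start method, which re-imports the worker module instead of forking, so code relying on fork semantics breaks there. A minimal sketch pinning the start method explicitly (`mp_context` needs Python >= 3.7; the file names are placeholders):

```python
import multiprocessing as mp
from compileall import compile_file
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":  # required under the spawn start method
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(mp_context=ctx) as executor:
        list(executor.map(compile_file, ["a.py", "b.py"]))  # placeholders
```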
@chrisburr did you have a chance to test this PR on Windows? Not sure, but it seems to hang indefinitely.
I have no access to a Windows machine, unfortunately. I might be able to spin up a VM, but it'll probably take me a while to figure out how to get an environment up and running. Does it work if you try installing a Python 2.7 environment (i.e. triggering the fallback to using `-m compileall`)?
OK, no worries, I can have a look later on the conda-forge Windows VM.
@chrisburr I was under the impression that the Python process should terminate when it receives an "empty" file (e.g. `""`). However, I can't find a place where we would send such an empty string via stdin. Am I looking at it wrong? The Windows process seems stuck waiting at the "draining" point.
There must be something different about how Windows handles the end of input streams. It's supposed to terminate when this is triggered: [code reference not captured]

Reproducing in a shell:

```shell
echo -n -e "hello\nworld\n" | python -c '
import sys
while x := sys.stdin.readline():
    print(repr(x))
print("Ended", repr(sys.stdin.readline()))
print("Ended", repr(sys.stdin.readline()))'
```

prints `'hello\n'` and `'world\n'`, then `Ended ''` twice, since `readline()` returns an empty string once stdin is closed.
I've not found any mention of this from searching; perhaps we just need to explicitly send a newline before closing.
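A Python stand-in for the parent side of this exchange (the real code uses reproc in C++; this just demonstrates the EOF behaviour under discussion):

```python
import subprocess
import sys

child = subprocess.Popen(
    [sys.executable, "-c",
     "import sys\n"
     "while True:\n"
     "    line = sys.stdin.readline()\n"
     "    if not line:  # '' means stdin was closed (EOF)\n"
     "        break\n"
     "    print('got', line.strip())"],
    stdin=subprocess.PIPE,
)
child.stdin.write(b"hello\nworld\n")  # trailing newline on the last entry
child.stdin.close()                   # close stdin: readline() returns ''
child.wait()
```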
@chrisburr you also used […]; I wanted to see if that influences the tests here.
I did actually find this issue on reproc which sounds similar to what we're experiencing here: DaanDeMeyer/reproc#41

You may well be right, btw. I need to debug further to see what's going on on Windows.
Locally on Windows this seems to run fine, btw! Let's see if we get more enlightening info out of this CI run.
```diff
@@ -97,6 +97,7 @@ construct(const fs::path& prefix, bool extract_conda_pkgs, bool extract_tarball)
     std::string pkg_name = index["name"];

+    index["fn"] = entry.path().filename();
     bool found_match = false;
```
Is this also part of the other PR?
It's mostly there as it would have made debugging conda/constructor#487 much easier.
This can go in either PR, or be forgotten about entirely, as you prefer.
Looks like the removal of the environment cleaning has fixed this. I think that's only because we are linking against […]. I guess there are two workarounds: […]
It looks like this is working really great. I did some checking whether all files are compiled, and "it looks like it", but maybe we should add tests? And is this also doing the proper bookkeeping in […]? Actually, in the future we could put the compiled files as part of the […]
Would inheriting only the […]?
Definitely, I can try to look at this. Actually, I've already found a bug: non-CPython bytecode isn't handled correctly (see `mamba/libmamba/src/core/link.cpp`, lines 81 to 82 at 5961d76).
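For context (not from the thread): the standard library can compute the correct PEP 3147 cache path for whichever interpreter is running, which avoids hard-coding the CPython naming scheme:

```python
import importlib.util

# The implementation-specific cache tag is part of the path:
print(importlib.util.cache_from_source("pkg/mod.py"))
# CPython 3.9 -> pkg/__pycache__/mod.cpython-39.pyc
# PyPy        -> a pypy*-tagged name (exact tag varies by version)
```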
I don't think they're computed for these files?

```shell
$ cat ~/mambaforge/conda-meta/sphinx-4.3.1-pyh6c4a22f_0.json
...
    {
      "_path": "lib/python3.9/site-packages/sphinx/writers/__pycache__/text.cpython-39.pyc",
      "path_type": "pyc_file"
    },
    {
      "_path": "lib/python3.9/site-packages/sphinx/writers/__pycache__/xml.cpython-39.pyc",
      "path_type": "pyc_file"
    },
...
```
Actually, should these files always be added to […]?
True, I just checked conda's behavior and you're right, they don't record the SHA256 for those files (so we don't have to, either). Regarding recording them even if compilation doesn't succeed: tough question. I wish there was a specification :)
Also, yes, I think inheriting […].
I checked the code and it appears to always register them. To check for sure, try installing for ppc64le:

```shell
$ CONDA_SUBDIR=linux-ppc64le conda create --name test python celery -v
...
pyc file failed to compile successfully (run_command failed)
python_exe_full_path: /home/cburr/miniconda3/envs/test/bin/python3.9
py_full_path: /home/cburr/miniconda3/envs/test/lib/python3.9/site-packages/prompt_toolkit/layout/utils.py
pyc_full_path: /home/cburr/miniconda3/envs/test/lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jpcntx.cpython-39.pyc
compile rc: 255
compile stdout:
compile stderr: qemu-ppc64le-static: Could not open '/lib64/ld64.so.2': No such file or directory
...
$ ls /home/cburr/miniconda3/envs/test/lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jpcntx.cpython-39.pyc
ls: cannot access '/home/cburr/miniconda3/envs/test/lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jpcntx.cpython-39.pyc': No such file or directory
$ rg -C 3 jpcntx.cpython-39.pyc /home/cburr/miniconda3/envs/test/conda-meta/
/home/cburr/miniconda3/envs/test/conda-meta/pip-22.0.2-pyhd8ed1ab_0.json
721- "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/gb2312prober.cpython-39.pyc",
722- "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/hebrewprober.cpython-39.pyc",
723- "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jisfreq.cpython-39.pyc",
724: "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jpcntx.cpython-39.pyc",
725- "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/langbulgarianmodel.cpython-39.pyc",
726- "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/langgreekmodel.cpython-39.pyc",
727- "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/langhebrewmodel.cpython-39.pyc",
--
5484-   "path_type": "pyc_file"
5485- },
5486- {
5487:   "_path": "lib/python3.9/site-packages/pip/_vendor/chardet/__pycache__/jpcntx.cpython-39.pyc",
5488-   "path_type": "pyc_file"
5489- },
5490- {
```
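Purely as a hypothetical illustration of the alternative behaviour being debated (this is not what conda does today): the registration step could filter the pyc list down to files that actually exist after compilation:

```python
import os

def paths_to_record(pyc_paths):
    # Hypothetical: drop pyc entries whose compilation silently failed.
    return [p for p in pyc_paths if os.path.exists(p)]
```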
This is very exciting! I wonder to what extent this helps with the lack of speedup in jupyter/docker-stacks#1213 (comment), which has always baffled me.
@maresb with "regular" mamba we're using the same "linking" methods that conda would use, so I think in that particular case nothing should change (for now). But maybe […]
@wolfv, great points! My recent contributions to micromamba-docker and micromamba have been partially motivated by wanting a docker-stacks replacement. With 0.20.0 we squashed some bugs that made bootstrapping mamba with micromamba infeasible, so indeed now may be the time for another attempt.
FYI […]
```diff
@@ -180,13 +179,8 @@ namespace mamba
     m_pyc_process = std::make_unique<reproc::process>();

     reproc::options options;
#ifndef _WIN32
```
We might be able to avoid doing this with static micromamba builds. We can add a variable or something like that later.
Hmm, looks like the env cannot be solved on Windows because […].
I think one last thing we might want to add here is the ability to optionally set the number of threads to be used. It currently maxes out the CPU, and on some systems that's not great (e.g. constrained cloud workers). We could use the same computation as for […] (see the sketch below).
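A minimal sketch of that option, assuming a hypothetical `configured_threads` setting where a value of 0 or less means "use all cores":

```python
import os
from concurrent.futures import ProcessPoolExecutor

def make_executor(configured_threads=0):
    # Hypothetical setting: positive values cap the pool size,
    # anything else falls back to one worker per core.
    workers = configured_threads if configured_threads > 0 else os.cpu_count()
    return ProcessPoolExecutor(max_workers=workers)
```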
Let's see how this holds up in real life! :) |
Once this is released we’ll start testing it right away! I’ll report back here ASAP.
This PR includes all but one of the changes I've mentioned on gitter. My C++ knowledge is lacking, so I'm not sure how idiomatic this PR is; feel free to suggest major changes. I'm also happy for someone else to take over this PR if it's easier.
Fixes #1413
Performance comparison
I picked a few packages I'm familiar with to use for benchmarking here:
I ran the command several times to make sure the caches were populated.
I've also tried various other sets of packages, installers generated using conda-constructor and running on slower filesystems (NFS, Ceph volumes, ...). All of my test cases have been significantly faster.
Latest micromamba release from inside an existing environment

Slowed by calling `deactivate` (see #1413).

Latest micromamba release outside an existing environment

This PR