Override the number of parallel compilation processes #1857

Open
6 tasks done
JustAnotherArchivist opened this issue Mar 26, 2021 · 8 comments
Labels
third-party: python the problem is in the specific Python version(s)

Comments

@JustAnotherArchivist

JustAnotherArchivist commented Mar 26, 2021

  • Platform information (e.g. Ubuntu Linux 16.04): Debian buster
  • OS architecture (e.g. amd64): x86_64
  • pyenv version: 3086e6e
  • Python version: system 3.7.3, installing 3.9.2
  • C Compiler information (e.g. gcc 7.3): 8.3.0
  • Please attach verbose build log as gist: useless, see below

I'm trying to figure out how to set the number of parallel processes used for compilation in a pyenv install 3.9.2. The background is that I have a small VM on a bigger host machine; the VM sees all host CPU cores but only has a limited amount of RAM, and compiling with the default parallelism causes the loadavg to shoot into the dozens and the machine to virtually freeze within seconds. As a result, I cannot actually run a build to completion, and the partial build log contains no information whatsoever on this parallelisation. (Let me know if it's needed anyway.)

I tried to set MAKE_OPTS and/or PYTHON_MAKE_OPTS to -j 4, but this appears to be ignored for the actual CPython build; specifically, the ./python -E ./setup.py build process (as it shows up in htop) still spawns as many gcc processes as there are CPUs regardless of those environment variables.

Digging in CPython a bit, it looks like this parallel compilation was only introduced relatively recently with 3.8.0 (bpo-36786). The patch there hardcodes -j0 in CPython's Makefile.pre.in, and I don't see any obvious way to override that through an environment variable or similar. If this is correct, the proper fix would likely require an upstream change, but even if that lands immediately, it won't help with existing 3.8.x and 3.9.x versions, so a workaround in pyenv may still be necessary. However, I attempted to patch either Makefile.pre.in or Lib/compileall.py without success, so my understanding here may well be incorrect.

@JustAnotherArchivist
Author

After some more digging, I figured out that this happens in the extension building step due to Python's setup.py blindly setting the parallelism to True (which later uses os.cpu_count) if -j is used, and the actual value is completely ignored. This ugly patch works to enforce MAKE_OPTS=-j4:

diff --git a/setup.py b/setup.py
--- a/setup.py	2021-02-19 12:31:44.000000000 +0000
+++ b/setup.py	2021-03-26 18:40:33.171643349 +0000
@@ -352,7 +352,7 @@
         self.missing = []
         self.disabled_configure = []
         if '-j' in os.environ.get('MAKEFLAGS', ''):
-            self.parallel = True
+            self.parallel = 4
 
     def add(self, ext):
         self.extensions.append(ext)

Upstream issue: https://bugs.python.org/issue43634
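
To make the behaviour described above concrete, here is a minimal model of how a `parallel` setting can end up as a worker count. It is a sketch of the logic as this comment describes it (`True` falls back to `os.cpu_count`), not a copy of the CPython or distutils source; the function name `resolve_workers` is hypothetical.

```python
import os


def resolve_workers(parallel):
    """Map a `parallel` setting to a worker count (illustrative model only).

    True     -> one worker per CPU the OS reports
    int N    -> exactly N workers
    None / 0 -> serial build (one worker)
    """
    if parallel is True:
        # os.cpu_count() may return None, so fall back to 1.
        return os.cpu_count() or 1
    if not parallel:
        return 1
    return int(parallel)


if __name__ == "__main__":
    print("parallel=True ->", resolve_workers(True))
    print("parallel=4    ->", resolve_workers(4))
```

Under this model, setting `self.parallel = True` discards whatever number followed `-j`, while the patched value `4` is used directly.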

@native-api
Member

Since this is a limitation of setup.py, this is an issue of whatever Python you're compiling, not of Pyenv.
Pyenv diligently passes MAKE_OPTS/MAKEOPTS to the make invocation it makes and has no control of what Python's Makefile does after that.

@JustAnotherArchivist
Author

I'm aware of that, which is why I filed an issue on bpo. But pyenv already fixes a number of bugs in CPython, and I figured this may be worth such a patch as well. If I'm the only person who'd ever need this though, I guess it doesn't warrant the effort. And seeing that nobody has reacted to this in almost six weeks, I suppose that may well be the case.

@native-api
Member

@JustAnotherArchivist You can add a patch for the Python version(s) you're experiencing problems with. See the patches in Python-build on how it's done.

@native-api native-api added the third-party the problem is in third-party software label May 6, 2021
@JustAnotherArchivist
Author

That is in fact what I did to get it to work on that system. It's a rather unusual case I guess, and my patch was very hacky and is definitely not suitable for inclusion in pyenv. A proper patch would have to parse MAKEFLAGS and then assign self.parallel appropriately – which gets pretty complicated with all the different valid forms that I believe could appear in that string (-j4, -j=4, --jobs 4, etc.). If bpo-43634 gets fixed, I might consider backporting that patch for the unfixed CPython versions, but I don't plan to spend any time on it otherwise.
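
A rough sketch of the parsing step such a patch would need, assuming the job count appears literally in the string as `-j4`, `-j 4`, `--jobs=4` or `--jobs 4`. The helper name `parse_jobs` is hypothetical, and other spellings (for example short flags bundled together) are not handled.

```python
import re

# Matches -j4, -j 4, --jobs=4 and --jobs 4 as whitespace-delimited flags.
# The leading (?:^|\s) keeps "--jobserver-auth=..." from matching "-j".
_JOBS_RE = re.compile(r"(?:^|\s)(?:-j\s*|--jobs(?:=|\s+))(\d+)(?=\s|$)")


def parse_jobs(makeflags):
    """Return the job count named in a MAKEFLAGS-style string, or None."""
    matches = _JOBS_RE.findall(makeflags)
    if not matches:
        return None
    # Assumption of this sketch: when the flag is repeated, the last one wins.
    return int(matches[-1])


if __name__ == "__main__":
    for flags in ("-j4", "-j 4", "--jobs=4", "--jobs 4", "-j", ""):
        print(repr(flags), "->", parse_jobs(flags))
```

A patched setup.py could assign the parsed number to `self.parallel` and fall back to `True` when no number is found. That wiring is not shown here and would need checking against each affected CPython version.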

@native-api native-api added third-party: python the problem is in the specific Python version(s) and removed third-party the problem is in third-party software labels May 6, 2021
@rogerengelmann03

Just for the record, I'm having a related problem with parallelism. I am running pyenv to install Python on a very large machine (perhaps 500 CPUs). At one point, the Python installation calls "compileall.py" with the "-j0" option. That spawns 500+ compileall.py processes, and the installation halts with "Too many open files", presumably because 500 processes are all trying to do file manipulation. So yes, it would be nice to be able to turn off the "-j0" flag. Here is the relevant part of the installation:
PYTHONPATH=/home/roger/.pyenv/versions/3.10.7/lib/python3.10 LD_LIBRARY_PATH=/scratch/tmp/python-build.20230930061844.188012/Python-3.10.7 \
    ./python -E -Wi /home/roger/.pyenv/versions/3.10.7/lib/python3.10/compileall.py \
    -j0 -d /home/roger/.pyenv/versions/3.10.7/lib/python3.10 -f \
    -x 'bad_coding|badsyntax|site-packages|lib2to3/tests/data' \
    /home/roger/.pyenv/versions/3.10.7/lib/python3.10
Listing '/home/roger/.pyenv/versions/3.10.7/lib/python3.10'...
Traceback (most recent call last):
  File "/home/roger/.pyenv/versions/3.10.7/lib/python3.10/compileall.py", line 462, in <module>
    exit_status = int(not main())
  File "/home/roger/.pyenv/versions/3.10.7/lib/python3.10/compileall.py", line 439, in main
    if not compile_dir(dest, maxlevels, args.ddir,
  File "/home/roger/.pyenv/versions/3.10.7/lib/python3.10/compileall.py", line 103, in compile_dir
    results = executor.map(partial(compile_file,
  File "/home/roger/.pyenv/versions/3.10.7/lib/python3.10/concurrent/futures/process.py", line 761, in map
    results = super().map(partial(_process_chunk, fn),
  File "/home/roger/.pyenv/versions/3.10.7/lib/python3.10/concurrent/futures/_base.py", line 610, in map
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
  File "/home/roger/.pyenv/versions/3.10.7/lib/python3.10/concurrent/futures/_base.py", line 610, in <listcomp>
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
  File "/home/roger/.pyenv/versions/3.10.7/lib/python3.10/concurrent/futures/process.py", line 733, in submit
    self._start_executor_manager_thread()
  File "/home/roger/.pyenv/versions/3.10.7/lib/python3.10/concurrent/futures/process.py", line 673, in _start_executor_manager_thread
    self._launch_processes()
  File "/home/roger/.pyenv/versions/3.10.7/lib/python3.10/concurrent/futures/process.py", line 700, in _launch_processes
    self._spawn_process()
  File "/home/roger/.pyenv/versions/3.10.7/lib/python3.10/concurrent/futures/process.py", line 709, in _spawn_process
    p.start()
  File "/home/roger/.pyenv/versions/3.10.7/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/roger/.pyenv/versions/3.10.7/lib/python3.10/multiprocessing/context.py", line 281, in _Popen
    return Popen(process_obj)
  File "/home/roger/.pyenv/versions/3.10.7/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/roger/.pyenv/versions/3.10.7/lib/python3.10/multiprocessing/popen_fork.py", line 65, in _launch
    child_r, parent_w = os.pipe()
OSError: [Errno 24] Too many open files

Thank you,
--- Roger
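
For reference, `compileall.compile_dir` takes a `workers` argument, and `0` (like `-j0` on the command line) means "use as many workers as `os.cpu_count()` reports". The sketch below shows one way to byte-compile a tree with a capped worker count instead. The helper names `capped_workers` and `compile_tree` and the default limit of 4 are assumptions for illustration, and this is not what the CPython Makefile does.

```python
import compileall
import os


def capped_workers(limit):
    """Return a worker count no larger than `limit` (hypothetical helper)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # os.cpu_count() may return None, so fall back to 1.
    return min(os.cpu_count() or 1, limit)


def compile_tree(path, limit=4):
    """Byte-compile every .py file under `path` with at most `limit` workers.

    Returns True if all files compiled successfully.
    """
    return bool(
        compileall.compile_dir(
            path,
            quiet=1,      # print errors only
            force=True,   # recompile even if timestamps are current
            workers=capped_workers(limit),
        )
    )
```

With a cap like this, the number of worker processes (and the pipes opened for them) stays bounded no matter how many CPUs the machine reports.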

@native-api
Member

As specified earlier, we'll accept a patch from any interested parties. There are only a few ways to pass -j, it's not prohibitive to recognize them all, e.g. with a regex.

@JustAnotherArchivist
Author

FWIW, the issue got fixed upstream in CPython 3.12 since extensions get built via a Makefile now. Backporting that sounds painful though.

@native-api native-api reopened this Sep 30, 2023
3 participants