-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parallel install with fall-back to serial install when no multiprocessing available #8215
Conversation
6d26d2e
to
cede64d
Compare
Hi Guys,
I get this single line: p.s. EDIT: |
3090c27
to
68f673a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be nice to have the multiprocess
utils in a separate module cover the following:
freeze_support()
as import phase- Import guard of
Pool
- Abstracted
map_multiprocess
which handles errors like that raised duringPool
creation
@McSinyx at the end I reverted the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
disabled the multiprocessing Pool on Windows Python 2.7, since python 2.7 is EOL anyhow, and the issue itself seems to be Windows' fork incompatibility in 2.7 only, I opted for a quick win. What do think?
I think it's OK, long as the behavior is correct for both w/ and w/o multiprocessing.
93093f9
to
48a2a8d
Compare
Heads up: This is on my "take a look at" list, but I'll have my hands full right now at least for the next few days, with pip 20.1.1. |
Hi @pradyunsg, since 21.1.1 is out, any chance you can take look :) |
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
48a2a8d
to
4ddd162
Compare
4ddd162
to
7e347f5
Compare
Hi @bmartinn I see that the last few commits contain linting changes which are failing. May I suggest Running linters locally |
Thanks @deveshks ! |
@bmartinn Do the errors not show up with the |
@pradyunsg plain mypy passes, |
@bmartinn No worries, and no need to apologize. Looks like you're hitting limitations of typeshed (the source of mypy's type annotations for Python's standard library) and it's definitely possible we're hitting the issue that it's not your code but typeshed annotations that are missing. :) I'm also curious whether this points to a broader issue with our linting setup, so I'd appreciate if you could answer a couple of questions:
|
Thanks @pradyunsg , but I missed the "running linters" part, I was assuming my local flake8/mypy was enough.
Raw log :) |
260d4fe
to
47d0ccf
Compare
03fdc55
to
5e7332e
Compare
except KeyError: | ||
install_result = _single_install( | ||
install_args, req, suppress_exception=False) | ||
logger.debug('Successfully installed %s serially', req.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logs a successful install but just after, it may throw an exception: isinstance(install_result, BaseException)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbidoul when calling _single_install
with suppress_exception=False
it will not suppress the exception but raise it. This means you will not get isinstance(install_result, BaseException)
in this section, hence if we passed the _single_install
call, the installation was successful.
I did a quick test. On a my 4 cores nvme laptop, I tried installing 230 wheels with individual sizes in the few kb to 150 MB range (with --no-index and --no-deps to eliminate network and resolver effects). It took 23s with parallel-install and 35s without. This is significant (but not as much as I was hoping for if the install process is CPU bound). Now with my release manager hat on, I see this as part of the 21.1 milestone but I'm uncomfortable with merging this right now: although the feature is optional, the PR significantly changes the install process even if the case the feature is not enabled. So I feel we are too close to the release for such a big change to go in. |
The idea behind the implementation is not to change the default install process when the the feature is turned off (and as far as I can see it does not), that said I do see why it might look like it does. Basically if the feature is turned off it will just call |
@bmartinn I understand the default behaviour is meant to be the same, but the code to achieve that changes a lot and that's what makes me uncomfortable, as the RM, so close to the release. If I take the point of view of a reviewer (sorry to be late to the party), I'd say that this part of the code base always required me to squint to understand what was going on in the error handling. And the change arguably does not make it easier to reason about. I wonder if there could be some prior refactoring that would make parallel install easier to implement and read. |
Thanks @sbidoul totally understood :) |
Hi @sbidoul, as far as I understand milestone 21.1 is behind us, any reason not to merge? |
Hey @sbidoul not to be pushy but any chance to get some kind of deadline when will you have time for this? |
Hi @bmartinn, I had a look at this again, thanks for your patience. To be frank my review comment above still applies:
For instance, passing Also one thing I don't quite understand is is why there is this intermixing of parallel and sequential installation with this index array of results, and this "except KeyError" that signals that it was a sequential install. That makes the code hard to understand to me (although I may very well have missed a good reason). Would it be leaner if we'd split the requirements first then calling one function for parallel install and another one for sequential installs, and then grouping the results for display ? |
Hi @sbidoul
I think this is the major point to stress, not all requirements can be parallelized, only pre-built wheels can be safely installed in parallel as this is basically unzipping. Any local/git package install might actually run code, and hence for those we need to keep the execution order as set by the resolver. This is why we first have to install all wheels in parallel and then go over the other required packages, and install by order. There is no easy way out of it, at least as far as I can currently see... Now regrading the implementation itself, I have to admit that by now I lost context of your original comment. p.s. |
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
Sorry for the slow turnaround here.
I understand that. Nevertheless I suspect the code could be made easier to understand and maintain by doing something like
Also, in terms of correctness, installation may imply uninstallation. And in some cases, uninstallation may not be multiprocess-safe ? I'm thinking in particular to the uninstallation of packages that were installed with Pushing this reasoning a bit further I wonder if we should not be even stricter and says that parallel installation can only be enabled when all InstallRequirement are wheels (and are not already legacy installed). This would give us a clean code path without legacy baggage for parallel installation, and we can keep the legacy path untouched. |
I like this idea - it's a lot cleaner, and it emphasises that the legacy code paths are just that - legacy, and people shouldn't expect to see new features/improvements like this unless they move away from the legacy approaches. Encouraging people to use more modern approaches would be an extra benefit of this change, if we took that route. |
Or can be built as a wheel, I persume? In other words, it is still possible to parallelise the installation if some of the requirements are sdists or source archives, as long as they can all be (serially) built into wheels first. I’m +1 if that’s the case. |
@sbidoul actually for this specific reason any package that needs to be reinstalled (i.e. uninstall+install) will be installed serially (from the code:
@uranusjr I "think" that as part of pip's caching strategy, any source code based install (think installing directly from github for example), is first built into a wheel (before the "actual" install is called), then the temporary/cached wheel is passed to the install function. @sbidoul, can we agree on this approach? basically if all packages are wheels, install in parallel, otherwise install serially ? (with "reinstalled" packages installed serially, post the parallel process) |
Good! I had missed that bit.
I'd say if all package are wheels and not installed, install in parallel, otherwise install serially. This can later evolve to "if all packages are wheel and not legacy installed, install in parallel" |
another wrinkle to this is if any package is already installed and needs to be upgraded to another version (or |
Yep, I agree. |
Closing this out given the rebase conflicts and the lack of activity here! Thanks for filing this @bmartinn; please do feel welcome to pick this back up and file a new PR if you find the time to do so! ^>^ |
Based on the discussion in #8187 and #8169
First part of pip install parallelization, is installing the wheels (wheels only) using a multiprocessing Pool.
The reason for choosing Processes over Threads is the limiting factor here is actually CPU cycles in the installation process. Also the Wheel unpacking and copying is totally independent from one another and the rest of the process, and lastly order has no meaning in this stage as the requirements order is only important when building packages (and we are just unzipping the wheels).
Implementation details:
Run the wheel install in a multiprocessing Pool, this has x1.5 speedup factor when installing cached packages.
Packages that could not be installed (exception raised), will be installed serially once the Pool is done.
If multiprocessing.Pool is not supported by the platform, fall-back to serial installation.