[fast-deps] Parallelize wheel download #8771

Closed
wants to merge 9 commits into from


McSinyx
Contributor

@McSinyx McSinyx commented Aug 16, 2020

Example output:

$ pip --use-feature=2020-resolver --use-feature=fast-deps install scipy
WARNING: pip is using lazily downloaded wheels using HTTP range requests to obtain dependency information. This experimental feature is enabled through --use-feature=fast-deps and it is not ready for production.
Collecting scipy
  Obtaining dependency information from scipy 1.5.2
Collecting numpy>=1.14.5
  Obtaining dependency information from numpy 1.19.1
Downloading 2 files (40.3 MB)
   |████████████████████████████████| 40.3 MB 822 kB/s 
Installing collected packages: numpy, scipy
Successfully installed numpy-1.19.1 scipy-1.5.2

@McSinyx McSinyx force-pushed the parallel-dl branch 6 times, most recently from 46b364c to 83ccf72 Compare August 19, 2020 15:06
@McSinyx McSinyx force-pushed the parallel-dl branch 2 times, most recently from 2a76451 to 846ec17 Compare August 20, 2020 04:59
@McSinyx McSinyx marked this pull request as ready for review August 20, 2020 05:02
@uranusjr
Member

Sorry for being lazy, but one question before reading the implementation: does this single progress bar apply only to fast-deps, or unconditionally to all downloads?

@McSinyx
Contributor Author

McSinyx commented Aug 20, 2020

It only applies to fast-deps. My bad: I added the news entry but forgot to update the title, and the example implicitly assumed my config, as usual.

@McSinyx McSinyx changed the title Parallelize wheel download [fast-deps] Parallelize wheel download Aug 20, 2020
@McSinyx McSinyx closed this Aug 24, 2020
@McSinyx McSinyx reopened this Aug 24, 2020
Member

@pradyunsg pradyunsg left a comment

LGTM!

@pradyunsg pradyunsg added the type: enhancement Improvements to functionality label Aug 25, 2020
@McSinyx
Contributor Author

McSinyx commented Aug 26, 2020

@pfmoore and @uranusjr, does this step in the right direction of what we discussed in GH-8698? If so, may I have an implementation review?

@McSinyx
Contributor Author

McSinyx commented Sep 2, 2020

cdc6c38 provides _should_hide_progress, which would be useful for solving GH-8825. Is there any friction preventing this PR from being merged in the near future, or should I split the mentioned commit into a separate PR to facilitate the process? To be fair, I'm not exactly comfortable with the second idea, since the commit is quite tangled with other stuff in this PR, and I've burnt myself multiple times rebasing complex cases.

I suppose that for this to be merged, we only need to consider the following points:

  • There's no change in pip's behavior without fast-deps
  • There's no foreseeable bug this introduces to fast-deps

For convenience purposes, I've made a small write-up on the behavior of pip with this patch.

@McSinyx
Contributor Author

McSinyx commented Sep 8, 2020

Gentle ping! Also, I hesitate to mark this as something that solves GH-825: should I?

@McSinyx
Contributor Author

McSinyx commented Oct 1, 2020

Gentle ping! Off-topic: wow, it has been 23 days since the last ping. Time has flown like crazy since I got back to school 😞

@pradyunsg
Member

Pong. /cc @cosmicexplorer

I guess updating this PR is a good idea. :P

@pradyunsg
Member

solves GH-825

Nah, let it be for now -- we'll make a big-ish celebratory post once all the little things are wrapped up nicely. :)

@McSinyx
Contributor Author

McSinyx commented Oct 2, 2020

I guess updating this PR is a good idea. :P

Hmmm, the conflict was introduced 9 days ago, so why didn't BrownTruck tell me anything? BTW do you have 2FA enabled and if so how do you authenticate pushes? I'm putting a token in .git/config now, and that's only as safe as storing a password in plain text, so I wonder how 2FA makes things any more secure.

we'll make a big-ish celebratory post once all the little things are wrapped up nicely.

Woo-hoo 😄

@pradyunsg
Member

BTW do you have 2FA enabled and if so how do you authenticate pushes?

SSH keys - using the ssh variant of the git URLs and adding my public key to the GitHub account.

@McSinyx
Contributor Author

McSinyx commented Oct 2, 2020

Thanks, that's much better!

@McSinyx
Contributor Author

McSinyx commented Oct 9, 2020

Would you mind taking a look at this, @uranusjr? I'd be really happy to get this to land in 20.3!

Contributor

@cosmicexplorer cosmicexplorer left a comment

This is extremely cool!!! 🔥 🔥 🔥 I am currently making others excited about this!!

@@ -125,12 +133,24 @@ def _get_http_response_filename(resp, link):
     return filename


-def _http_get_download(session, link):
-    # type: (PipSession, Link) -> Response
+def _http_get_download(session, link, head=False):
Contributor

The results of HEAD and GET seem so different that I wonder whether we should instead turn this method's error handling into a @contextmanager. This is also driven by the thought that we might want to extract link.url.split('#', 1)[0] into its own named method, because it's not immediately clear what it's doing if you don't know or remember that pypi urls are structured this way.

To avoid breaking other callers, we could keep _http_get_download() undisturbed, making use of the new url splitting method and wrapping the session.get(...) call in the new contextmanager. That is just what first came to my mind when thinking of how to reuse the error-handling mechanisms in this method -- not a blocking review.

Contributor

I understand the utility of factoring this away into a method though, so def _http_head(session, link):, which wraps its session.head(...) in the error-handling contextmanager could perhaps be a way to keep the _FileToDownload object clean. Just suggestions.
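To make the suggestion above concrete, here is a minimal sketch of what a shared error-handling context manager with thin GET and HEAD wrappers might look like. It is not the code in this PR: `NetworkError`, `http_errors`, `http_get`, `http_head`, and `FakeSession` are hypothetical names invented for illustration, and the error translation is a placeholder for whatever `_http_get_download()` actually does.

```python
from contextlib import contextmanager


class NetworkError(Exception):
    """Hypothetical error type standing in for pip's own exceptions."""


@contextmanager
def http_errors(url):
    """Translate low-level I/O failures into a single error type."""
    try:
        yield
    except OSError as exc:  # ConnectionError is a subclass of OSError
        raise NetworkError(f"request for {url} failed: {exc}") from exc


def http_get(session, url):
    # GET wrapper: only the session call differs from http_head().
    with http_errors(url):
        return session.get(url, stream=True)


def http_head(session, url):
    # HEAD wrapper sharing the same error handling.
    with http_errors(url):
        return session.head(url, allow_redirects=True)


class FakeSession:
    """Stand-in for a requests-style session so the sketch runs offline."""

    def get(self, url, **kwargs):
        return ("GET", url)

    def head(self, url, **kwargs):
        raise ConnectionError("simulated network failure")
```

With this shape, each wrapper stays a two-line function and no boolean flag is needed to switch between the two HTTP methods.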

Contributor Author

@McSinyx McSinyx Oct 9, 2020

Thanks for the review!

we might want to extract link.url.split('#', 1)[0] into its own named method, because it's not immediately clear what it's doing if you don't know or remember that pypi urls are structured this way.

I also think that this is a good idea: the 1/0 numbers look really cryptic. I'll add a reference to PEP 503 to clarify it for future readers. Edit: many spots in the codebase use this same expression. I'd prefer to do this in a follow-up since it does not exactly fit in the scope of this PR.

Regarding splitting out a separate method for HEAD, however, I don't feel that it would make the callers' code any cleaner, i.e. this level of abstraction feels just right to me, although it's just personal taste. I'll rename s/head=/just_head=/ so that it hopefully makes more sense.
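For readers unfamiliar with the `link.url.split('#', 1)[0]` expression discussed above, the following sketch shows what a named helper could look like. The function name `url_without_fragment` and the example URL (including its made-up hash) are assumptions for illustration only; PEP 503 is the document that describes file URLs carrying a `#<hashname>=<hashvalue>` fragment.

```python
from urllib.parse import urldefrag


def url_without_fragment(url):
    """Drop a trailing '#...' fragment, such as the '#sha256=...' hash
    that PEP 503 index pages may append to file URLs."""
    # Split at the first '#' only and keep the part before it.
    return url.split("#", 1)[0]


# Made-up URL and hash, purely for demonstration.
example = "https://files.example.invalid/numpy-1.19.1-py3-none-any.whl#sha256=abc123"
stripped = url_without_fragment(example)

# The standard library offers an equivalent for well-formed URLs.
stdlib_stripped = urldefrag(example).url
```

A URL without any `#` passes through unchanged, since `split` then returns a single-element list.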

Member

Regarding head, it won’t make the caller’s code cleaner, but it is still a good idea, since having a boolean flag switch between behaviours is a smell in general. It is better to extract the session.method(...) call out of this function.

Contributor Author

@McSinyx McSinyx Oct 19, 2020

@cosmicexplorer and @uranusjr, does 307c4ae make it look better to you (sorry for the delay though)?

src/pip/_internal/network/download.py (outdated, resolved)
BatchDownloader will need to handle the progress bars by itself.

In addition, this commit changes the minimum response size
to have a progress bar from 40000 bytes to 4 times chunk size.
By using HEAD requests for information about the file,
we can avoid opening too many connections at the same time
and filling up the urllib3 connection pool.
Unfortunately, HEAD responses do not carry caching information,
so all files will be assumed to be downloaded from the index.
Hence, the speed report might be inaccurate if there are cached files.
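
As a rough illustration of the commit message above, the sketch below sums the sizes reported by HEAD responses and applies a "4 times chunk size" threshold before showing a progress bar. It is not the code from this PR: `CHUNK_SIZE`, `total_size`, and `should_show_progress` are hypothetical, the chunk size value is assumed, and whether the real comparison is strict or inclusive is not stated in the source.

```python
CHUNK_SIZE = 10240  # assumed chunk size in bytes, for illustration only


def total_size(head_headers):
    """Sum Content-Length over a list of HEAD response header mappings.

    Returns None when any response omits the header, because the total
    is then unknown.
    """
    total = 0
    for headers in head_headers:
        length = headers.get("Content-Length")
        if length is None:
            return None
        total += int(length)
    return total


def should_show_progress(size, chunk_size=CHUNK_SIZE):
    """Show a bar only when the size is known and exceeds four chunks."""
    return size is not None and size > 4 * chunk_size


# Header mappings as they might come back from two HEAD requests.
headers = [{"Content-Length": "30000"}, {"Content-Length": "20000"}]
size = total_size(headers)
show = should_show_progress(size)
```

Because only headers are inspected, nothing here can tell whether a file would be served from the cache, which is the limitation the commit message points out.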
@uranusjr
Member

uranusjr commented Oct 9, 2020

How difficult would it be to have some tests around this? pip does not currently have a lot of tests around downloading packages via HTTP (since it currently just uses requests, which is quite mature, and only in very simple ways), which this would significantly change.

@BrownTruck
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label Feb 4, 2021
@pradyunsg
Member

Closing as this is significantly out of date, and has merge conflicts. Please do feel free to resubmit an updated PR for this! ^>^

@pradyunsg pradyunsg closed this Mar 5, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 1, 2021
Labels
needs rebase or merge PR has conflicts with current master type: enhancement Improvements to functionality
5 participants