Enhancement: pushing requirements #2502

Closed
ghost opened this issue Oct 12, 2017 · 6 comments

Comments

@ghost

ghost commented Oct 12, 2017

This is long term, but the idea is this: warehouse cannot know for sure the dependencies for a package; that's impossible with the current packaging system (which includes source distributions). But what warehouse can know is this: for a given environment (32 bit Python 2.7 on Windows or such), pip is going to download certain dependencies for a certain package. Maybe not all of the time, but at least 95+% of the time pip will download the same dependencies for the same package for a given environment.

So when the next pip comes along, warehouse should provide an API that says "hey, the last person with your environment downloaded these dependencies, so you should probably download them too." pip wouldn't rely on these dependencies for the final installation, but it would download them in case they were needed, which would be the case in practically every installation.

@brainwane
Contributor

@xoviat Thanks for your feature suggestion, and sorry for the slow response!

For context: Warehouse's maintainers have gotten limited funding to concentrate on improving and deploying it, and have kicked off work towards our development roadmap. Right now, the most urgent task is to improve Warehouse to the point where we can redirect pypi.python.org to pypi.org so the site is more sustainable and reliable.

> what warehouse can know is this: for a given environment (32 bit Python 2.7 on Windows or such), pip is going to download certain dependencies for a certain package. Maybe not all of the time, but at least 95+% of the time pip will download the same dependencies for the same package for a given environment.

I think I'm missing something here, perhaps some aspect of the distribution toolchain I am unaware of. How can pip know this? And even if pip knows it, are we sure that means Warehouse knows it?

Since this feature isn't something that the legacy site has, I've moved it to a future milestone. I think you might also be interested in looking at packaging-problems#54.

Thanks and sorry again for the wait.

@ghost
Author

ghost commented Feb 17, 2018

> I think I'm missing something here, perhaps some aspect of the distribution toolchain I am unaware of. How can pip know this? And even if pip knows it, are we sure that means Warehouse knows it?

Consider source distribution A. If installed in environment X, source distribution A will require distribution B. If installed in environment Y, source distribution A will require distribution D. Sort of like this:

import sys

# The requirement list is computed when setup.py runs on the installing machine.
if sys.platform == 'win32':
    install_requires = ['B']
else:
    install_requires = ['D']

Because a source distribution's setup.py can run arbitrary (Turing-complete) code, it's not possible for Warehouse to deduce these dependencies using static analysis. But what Warehouse can do is allow a client (pip) to report the requirements that it found, together with the current environment (where the environment is the Python implementation, the Python version, and the operating system). Subject to statistical analysis, Warehouse can then inform the next client what the requirements will probably be.
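To make the report-and-suggest idea concrete, here is a minimal in-memory sketch. Nothing in it is an existing Warehouse or pip API: the class name `RequirementStats`, its methods, and the `min_share` and `min_reports` thresholds are all hypothetical, and a real service would need persistent storage and abuse protection.

```python
from collections import Counter, defaultdict


class RequirementStats:
    """Hypothetical aggregate of client-reported requirements (not a real API)."""

    def __init__(self, min_share=0.95, min_reports=20):
        self.min_share = min_share      # assumed agreement threshold
        self.min_reports = min_reports  # assumed minimum sample size
        # (package, environment) -> Counter of reported requirement sets
        self._counts = defaultdict(Counter)

    def report(self, package, environment, requirements):
        # Only an aggregate count is updated; the individual report is not kept.
        self._counts[(package, environment)][frozenset(requirements)] += 1

    def suggest(self, package, environment):
        """Return the dominant requirement set, or None if there is no clear one."""
        counts = self._counts.get((package, environment))
        if not counts:
            return None
        total = sum(counts.values())
        if total < self.min_reports:
            return None
        requirements, seen = counts.most_common(1)[0]
        if seen / total < self.min_share:
            return None
        return sorted(requirements)
```

Here the environment is assumed to be a hashable tuple such as `("cpython", "2.7", "win32")`. A client would treat the result of `suggest` purely as a prefetch hint and still resolve the real requirements itself.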

In addition, in the case of wheels, it's possible to deduce dependencies from static analysis. All of this information will allow the client to download dependencies significantly faster than it would have otherwise been able to do.
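For wheels, the declared dependencies live in the `METADATA` file inside the archive's `.dist-info` directory, as `Requires-Dist` headers. The sketch below reads them using only the standard library; the function name `wheel_requirements` is made up for illustration, and it assumes a well-formed wheel with exactly one `METADATA` file.

```python
import zipfile
from email.parser import Parser


def wheel_requirements(wheel_path):
    """Return the Requires-Dist entries declared in a wheel's METADATA file."""
    with zipfile.ZipFile(wheel_path) as wheel:
        # Assumption: a well-formed wheel has one *.dist-info/METADATA member.
        metadata_name = next(
            name for name in wheel.namelist()
            if name.endswith(".dist-info/METADATA")
        )
        text = wheel.read(metadata_name).decode("utf-8")
    # METADATA uses RFC 822-style headers, which the email parser can read.
    message = Parser().parsestr(text)
    return message.get_all("Requires-Dist") or []
```

No code from the package is executed here, which is what makes this kind of analysis safe to run on the server side.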

@brainwane
Contributor

@pradyunsg Take a look at this thought when you have a chance?

@pradyunsg
Contributor

Couple of things that I think might be a problem with this:

  • Privacy Concerns
    • This is actively using environment information of some users to expose different behaviors for other users.
  • Not cacheable
    • I'm pretty sure these requests would not be cacheable. They need server-side processing backed by a database that stores users' environment info.

If either of them is true, it'll make this a no-go.


IMO, a better way to do this is to store the dependency information of source distributions statically. Wheels already do this (though the info is not always exposed), and making source distributions expose it too is the next step here.

At the end of the day, if we can have better install-requires data on Warehouse, pip can use that to speed up resolution of package dependencies (by skipping useless downloads) and platforms like pyup.io can use that information for their infrastructure.
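As one illustration of what static dependency data can look like, the platform check in the earlier setup.py-style example can be written as plain strings with PEP 508 environment markers. The sketch below is hedged accordingly: `INSTALL_REQUIRES` and `split_requirement` are hypothetical names, and the naive split is only for illustration (a real tool would use a proper requirement parser).

```python
# Declarative counterpart of the if/else example: the condition is data,
# so an index or installer can read it without executing any package code.
INSTALL_REQUIRES = [
    'B; sys_platform == "win32"',
    'D; sys_platform != "win32"',
]


def split_requirement(requirement):
    """Naively split a requirement string into (name, marker)."""
    name, _, marker = requirement.partition(";")
    return name.strip(), marker.strip()
```

With requirements in this form, both branches are visible up front, so a resolver could pick the relevant one for its own environment before downloading anything.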

(yes, this is something I suggest for the PyPA roadmap)

@KOLANICH

KOLANICH commented May 7, 2018

> Privacy Concerns
> This is actively using environment information of some users to expose different behaviors for other users.

I wonder if it is possible to immediately use the information for incrementing stats counter and then immediately discard it?

Also, how about setting up a Tor hidden service and using it by default for downloading packages from PyPI, to mitigate privacy risks and other risks like watering hole attacks?

@dstufft
Member

dstufft commented Aug 12, 2018

I'm going to close this issue because, given the technical and privacy-related hurdles, it's almost certainly never going to be something that we implement here. As Pradyun mentioned, the path forward for this is to achieve a higher fidelity for the information that we can glean from the package files themselves. I'd also include moving more installs away from source distributions and onto installing from wheels.


5 participants