Have a way to filter packages by date #6215

Closed
astrofrog opened this issue Jul 20, 2019 · 11 comments

Comments

@astrofrog

What's the problem this feature will solve?

There is currently no way to reproduce a pip install ... command from a certain date in the past. Even if I were to run e.g. pip install astropy==2.0.11, recent versions of dependencies would get picked up. It would be great if there was a way through the PyPI API to exclude packages added after a certain date, which could then be extended to a pip flag or configuration option. This would effectively enable reproducible installs from PyPI.

Describe the solution you'd like

I've developed a small package called pypi-timemachine which implements this feature by running a tiny local proxy PyPI server that excludes packages released after a certain date. This means that it's possible to then run a pip install command that will install the full stack of dependencies as if the user was at some date in the past.
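To make the proxy idea concrete, here is a minimal sketch of how a date-filtering proxy could work. It is not pypi-timemachine's actual code. It assumes PyPI's JSON simple API (PEP 691) with the per-file upload-time key (PEP 700), both of which postdate this comment. The names parse_time, render_project_page and FilteringHandler are hypothetical.

```python
# Hypothetical sketch of a date-filtering simple-index proxy; NOT pypi-timemachine's code.
# Assumes PyPI's JSON simple API (PEP 691) exposes "upload-time" per file (PEP 700).
import html
import json
import sys
import urllib.error
import urllib.request
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "https://pypi.org/simple/"
JSON_ACCEPT = "application/vnd.pypi.simple.v1+json"


def parse_time(value: str) -> datetime:
    """Parse '2021-01-01' or a full ISO timestamp; treat naive values as UTC."""
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def render_project_page(project: dict, cutoff: datetime) -> str:
    """Build a PEP 503 HTML page listing only files uploaded at or before cutoff."""
    links = []
    for entry in project.get("files", []):
        uploaded = entry.get("upload-time")
        if uploaded is None or parse_time(uploaded) > cutoff:
            continue  # hide files that are too new or have no known upload time
        href = entry["url"]
        sha256 = entry.get("hashes", {}).get("sha256")
        if sha256:
            href += "#sha256=" + sha256
        attrs = f'href="{html.escape(href)}"'
        if entry.get("requires-python"):
            # keep Python-version metadata so pip can skip incompatible releases
            attrs += f' data-requires-python="{html.escape(entry["requires-python"])}"'
        links.append(f"<a {attrs}>{html.escape(entry['filename'])}</a>")
    return "<!DOCTYPE html>\n<html><body>\n" + "<br>\n".join(links) + "\n</body></html>\n"


class FilteringHandler(BaseHTTPRequestHandler):
    cutoff = datetime.now(timezone.utc)  # overwritten from the command line

    def do_GET(self) -> None:
        # pip requests /<normalized-project-name>/ relative to --index-url
        name = self.path.strip("/").split("/")[0]
        if not name:
            self.send_error(404, "root listing is not implemented in this sketch")
            return
        request = urllib.request.Request(UPSTREAM + name + "/", headers={"Accept": JSON_ACCEPT})
        try:
            with urllib.request.urlopen(request) as resp:
                project = json.load(resp)
        except urllib.error.HTTPError as exc:
            self.send_error(exc.code)
            return
        body = render_project_page(project, self.cutoff).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python proxy.py 2021-01-01 [port]
    FilteringHandler.cutoff = parse_time(sys.argv[1])
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 5000
    HTTPServer(("localhost", port), FilteringHandler).serve_forever()
```

With the sketch running as `python proxy.py 2021-01-01 5000`, pip can be pointed at it with `--index-url http://localhost:5000/`. The hrefs stay absolute, so the files themselves still download straight from PyPI's file host; only the listing is filtered.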

Additional context

There are two ways I can see to implement this in PyPI: either the main index URL for PyPI could accept various filters as GET arguments, e.g.:

https://pypi.org/simple?date-max=2013-03-02T10:30:23

or new routes could be added such as

https://pypi.org/simple/snapshot/2013-03-02T10:30:23

There might be other ways to do this that I haven't thought of though. If there is interest in this kind of functionality, I'd be happy to try and give it a go. If there is no interest, feel free to close this and I can continue to maintain pypi-timemachine as a separate project.
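Server-side, either proposed URL style boils down to pulling a cutoff timestamp out of the request. A hypothetical sketch, assuming that timestamps without an offset mean UTC. Neither route exists on PyPI, and extract_cutoff is an illustrative name.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

SNAPSHOT_PREFIX = "/simple/snapshot/"  # route shape from the second proposal


def extract_cutoff(url: str) -> Tuple[str, Optional[datetime]]:
    """Return (index path, cutoff) for either proposed URL style.

    Both URL styles are proposals from this issue, not real PyPI routes.
    """
    parts = urlsplit(url)
    path, raw = parts.path, None
    if path.startswith(SNAPSHOT_PREFIX):
        # /simple/snapshot/<timestamp>[/<project>/]
        raw, _, tail = path[len(SNAPSHOT_PREFIX):].partition("/")
        path = "/simple/" + tail
    else:
        # /simple?date-max=<timestamp>
        values = parse_qs(parts.query).get("date-max")
        if values:
            raw = values[0]
    if not raw:
        return path, None
    cutoff = datetime.fromisoformat(raw)
    if cutoff.tzinfo is None:
        cutoff = cutoff.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
    return path, cutoff
```

Both forms map onto the same underlying listing plus a cutoff, which is why the choice between them is mostly about URL aesthetics and cacheability.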

@jamadden
Contributor

It would be great if there was a way through the PyPI API to exclude packages added after a certain date, which could then be extended to a pip flag or configuration option. This would effectively enable reproducible installs from PyPI.

Interesting!

Unless I'm missing something, this seems like an implicit form of pip freeze > requirements.txt and then pip install -r requirements.txt. Can you talk about why one would want to use implicit date-based "freezing" rather than explicit version-based freezing?

@astrofrog
Author

astrofrog commented Jul 20, 2019

@jamadden - just to give a very specific use case, I had to give a demo at SciPy last week that involved a package which pulled in dozens of dependencies. I had also given this demo a year ago. However, something in one of the dependencies had broken in the meantime, I didn't have time to debug it and figure out which one was the culprit, and I had no easy way of knowing which dependency versions I had used back then. What I did know was that the demo had worked a year or so earlier, and on what date I had given it. I hadn't run pip freeze at the time, but by using the approach described here I was able to install the full stack I had for last year's successful demo.

The approach described here can actually be combined with pip freeze - that is, what if you wanted a requirements file for a specific date in the past, but forgot to run pip freeze then?

To take this one step further, this also allows time-based bisection - if I can bisect in time to find when things started to break, I can identify which specific release of which specific package caused the issue.

(edited for clarity)
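The time-based bisection idea is an ordinary binary search over dates. A sketch, assuming a hypothetical works(day) callback that, for example, builds a fresh environment through a date-filtered index and runs the demo, and assuming that once the stack breaks it stays broken:

```python
from datetime import date, timedelta
from typing import Callable


def bisect_dates(good: date, bad: date, works: Callable[[date], bool]) -> date:
    """Return the first date on which `works` fails.

    Assumes works(good) is True, works(bad) is False, and that breakage is
    monotonic (once broken, it stays broken).
    """
    if not good < bad:
        raise ValueError("good must be earlier than bad")
    # invariant: `good` passes, `bad` fails
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if works(mid):
            good = mid
        else:
            bad = mid
    return bad
```

Each call to works costs a full install, so roughly log2(days) installs are needed: a one-year window takes about nine. Once the first bad day is known, the releases uploaded on that day are the suspects.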

@pradyunsg
Contributor

pradyunsg commented Jul 21, 2019

FWIW, if we want to do this, we'd do this with the installer cooperating with the index. PyPI is not going to do any filtering on its end.

This matters because it keeps PyPI's responses easy to cache.

Two ways of doing this have come up in a private discussion elsewhere:

  • Add an optional data-upload-datetime attribute to all links.
    • This bloats all of our requests for a... fairly new "niche" usage.
    • Interaction with existing attributes?
  • Add a new optional route to the simple API for getting upload dates for artifacts.
    • More pages to be cached (not sure how that'd work on our infrastructure)

(apologies for terseness, going for a nap)
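Under the first option, an installer would read the proposed attribute straight from each link on a PEP 503 page. A sketch using only the standard library's html.parser; data-upload-datetime is the attribute name floated in the discussion and is treated here as hypothetical, with its value assumed to be an ISO 8601 timestamp (UTC when no offset is given).

```python
from datetime import datetime, timezone
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect (link text, data-upload-datetime value) pairs from a PEP 503 page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, Optional[str]]] = []
        self._uploaded: Optional[str] = None
        self._in_link = False
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # hypothetical attribute from the proposal; None when absent
            self._uploaded = dict(attrs).get("data-upload-datetime")
            self._in_link = True
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append(("".join(self._text).strip(), self._uploaded))
            self._in_link = False


def files_before(page: str, cutoff: datetime) -> List[str]:
    """Filenames whose upload time is known and not later than cutoff (aware datetime)."""
    collector = LinkCollector()
    collector.feed(page)
    kept = []
    for filename, uploaded in collector.links:
        if uploaded is None:
            continue  # undated link: its age is unknown, so leave it out
        ts = datetime.fromisoformat(uploaded.replace("Z", "+00:00"))
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
        if ts <= cutoff:
            kept.append(filename)
    return kept
```

The cost pradyunsg mentions is visible here: every link on every page carries the extra attribute, whether or not the client ever filters by date.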

@pfmoore
Contributor

pfmoore commented Jul 23, 2019

Actually, I'd like to say that the pypi-timemachine project sounds awesome, and seems like a good, practical example of how to solve problems like this without needing to change the existing infrastructure or tools.

I'd be interested to know in what ways using a standalone proxy like pypi-timemachine is insufficient, and to explore ways of addressing those issues, rather than just building the functionality directly into PyPI.

@Amusesmile

Does this work? I was just imagining a tool like this to fix many abandoned Colab notebooks from old research that no longer compile.

@astrojuanlu
Copy link

@Amusesmile yes, pypi-timemachine worked the last time I tried it.

@schymans
Copy link

I think it is really a shame that this is not being integrated into PyPI. I keep running into the same problem again and again: trying to reuse an old set of Jupyter notebooks whose environment is described in a requirements.txt file without explicit package versions, because the user did not run pip freeze at the time. I know the date on which the notebooks ran successfully (e.g. 2021-01-01), but now they no longer run, because some Python packages were deprecated or updated in a backwards-incompatible way. Instead of specifying, for every package in requirements.txt, the version that was current on 2021-01-01, I would like to create an environment with the package versions that were up to date on that date. This can be done by running pypi-timemachine for 2021-01-01 in one terminal window (pypi-timemachine 2021-01-01 --port 5000) and then executing, in another terminal window, e.g. pip install --index-url http://localhost:5000/ -r requirements.txt. However, this is very difficult to automate, so it would be really great if the date could be passed directly to the pip install command.
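Until an installer supports this natively, the two-terminal workflow can at least be scripted. A sketch that wraps exactly the commands quoted in this comment; it assumes pypi-timemachine is on PATH and accepts the date and --port arguments as shown, and build_commands, wait_for_port and install_as_of are hypothetical helper names.

```python
import socket
import subprocess
import sys
import time
from typing import List, Tuple


def build_commands(day: str, port: int, requirements: str) -> Tuple[List[str], List[str]]:
    """The proxy command and the pip command from the workflow above."""
    server = ["pypi-timemachine", day, "--port", str(port)]
    # `python -m pip` installs into the environment running this script
    install = [sys.executable, "-m", "pip", "install",
               "--index-url", f"http://localhost:{port}/", "-r", requirements]
    return server, install


def wait_for_port(port: int, timeout: float = 30.0) -> None:
    """Block until something accepts connections on localhost:port."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection(("localhost", port), timeout=1):
                return
        except OSError:
            time.sleep(0.2)
    raise TimeoutError(f"nothing listening on port {port} after {timeout}s")


def install_as_of(day: str, requirements: str = "requirements.txt", port: int = 5000) -> int:
    """Start the proxy, run pip against it, and always shut the proxy down."""
    server_cmd, install_cmd = build_commands(day, port, requirements)
    server = subprocess.Popen(server_cmd)
    try:
        wait_for_port(port)
        return subprocess.call(install_cmd)
    finally:
        server.terminate()
        server.wait()


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python install_as_of.py 2021-01-01
    sys.exit(install_as_of(sys.argv[1]))
```

The port check only proves that something is listening, so the sketch assumes port 5000 is otherwise free.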

@pfmoore
Copy link
Contributor

pfmoore commented May 16, 2024

For what it's worth, uv has an --exclude-newer option that you might want to take a look at.

@pradyunsg
Copy link
Contributor

In theory, this is something that could be implemented in all installers now, since we now have an upload-time key in the simple API's JSON responses. :)

I think this can be closed from PyPI's end and should instead be a feature request on pip's end.
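With the upload-time key, an installer (or a plain script) can do the date filtering client-side, with no change to PyPI. A sketch assuming PyPI's PEP 691 JSON content negotiation and PEP 700's upload-time field; fetch_project and files_as_of are illustrative names, and files without an upload time are dropped conservatively.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import List

# PEP 691 JSON form of the simple API, requested via content negotiation
JSON_ACCEPT = "application/vnd.pypi.simple.v1+json"


def fetch_project(name: str, index: str = "https://pypi.org/simple/") -> dict:
    """Download one project's JSON simple-API page."""
    req = urllib.request.Request(f"{index}{name}/", headers={"Accept": JSON_ACCEPT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def files_as_of(project: dict, cutoff: datetime) -> List[str]:
    """Filenames whose PEP 700 "upload-time" is at or before `cutoff` (aware datetime).

    Files without an upload time are excluded, since their age is unknown.
    """
    kept = []
    for entry in project.get("files", []):
        uploaded = entry.get("upload-time")
        if uploaded is None:
            continue
        ts = datetime.fromisoformat(uploaded.replace("Z", "+00:00"))
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)
        if ts <= cutoff:
            kept.append(entry["filename"])
    return kept
```

For example, `files_as_of(fetch_project("astropy"), datetime(2019, 7, 20, tzinfo=timezone.utc))` lists the astropy files that existed when this issue was opened. A real installer would then parse versions out of those filenames and resolve as usual.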

@schymans
Copy link

I didn't know PyPI and pip were separate. How do I make a feature request in pip?

@di
Copy link
Member

di commented May 17, 2024

https://github.com/pypa/pip/issues


8 participants