Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added upgrade-all command #10491

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

JensTimmerman
Copy link

fixes #4551

I'm a new contributor to pip, so any hints on what kinds of tests to add for this command?

If this is more the direction you would like to go in I'll spend some time to fixing the #TODO's and refactor the code a little bit so a lot less copy pasting is going on between install command and update-all command.

@DiddiLeija
Copy link
Member

I just have a question: Is it update-all or upgrade-all?

@JensTimmerman JensTimmerman changed the title added update-all command added upgrade-all command Sep 20, 2021
@JensTimmerman
Copy link
Author

@DiddiLeija I had an error in my commit message indeed, thx for pointing this out. It is upgrade-all but this is trivial to change (or make it work for all)

@pfmoore
Copy link
Member

pfmoore commented Sep 20, 2021

This probably needs more comprehensive documentation. In particular "does exactly what this Unix one-liner does" is not going to be helpful for a lot of people (Windows users, people unused to command line tools, etc). In fact, I am familiar with command line tools, and I know that xargs combines arguments "up to some system defined limit" - so it's not necessarily equivalent to upgrading everything in one invocation (some people have environments with really huge numbers of packages installed...) I'd rather see the docs explain the behaviour clearly without assuming a background in CLI tools.

Regarding tests, start with some basic ones, like installing an old version of one package, then checking that upgrade-all upgrades it. Then try 2 packages A 1.0 and B 1.0, where A 2.0 and B 2.0 are on the index, and A (both versions) depends on B 1.0, check that upgrade-all upgrades A but not B. Do a few of these "check that the things you'd expect to work, do" tests.

Then add some tests for nasty things that might happen to make sure that the code doesn't break. For example, you have A 1.0 and B 1.0 installed. On the index, though, there's no B 1.0 (let's say it was installed from a local directory). Does upgrade-all correctly leave B unchanged? What if the installed packages fail pip check? Suppose A 1.0 is installed and depends on B 1.0, but B 2.0 is installed? What does upgrade-all do? What should it do? (Probably raise an error saying that the original environment is broken). Suppose A 1.0 is installed and it depends on B, but there's no B installed and no B on the index - same question, what should upgrade-all do (and does it do it)?

IMO this is a pretty complex feature, and making sure we have good test coverage of the behaviour will be very important. TBH, I'm not actually sure anyone has even thought about a lot of questions like this, generally people say "we want upgrade-all" thinking only of the very basic case, and they leave it to whoever develops the feature to catch all the weird edge cases 🙁

@pfmoore
Copy link
Member

pfmoore commented Sep 20, 2021

I'm confused about the difference between this and #10040.

Can you start with an overview of what the approach is here, how it addresses the issues discussed on #10040, and summarise all of the discussion in one place? As things stand, there's loads of context scattered across various issues and PRs.

@JensTimmerman
Copy link
Author

JensTimmerman commented Sep 20, 2021

I’ll try to address some of the issues

Instead of trying to mimic the carefully crafted command line invocation and adding a dependency between two pip commands, we should use lower level components of pip to implement this instead.

the issue is that the underlying lower level components all rely on having the options object with all options available.
So I made these available in the cli to allow got maximum flexibility. Another approach would be to hard code them so only one very defined action is possible using this new command instead of a lot of different code paths.
I prefer the flexibility since I can see people asking for extra options 3 hours after the feature is released ;)

But this approach ada the options explicitly and parses them. Whereas the previous pr would just magically start implementing all options of the install and list command.

Specifically, we'd need to determine which packages were explicitly installed by the user (i.e. aren't marked with PEP 376's REQUIRED file)

is there an easy list command for this?
should I add this list command first ?

make the dependency resolver ignore already installed packages in the environment

Does it not do this yet if we give it all packages in the environment?

add the ability to uninstall dependencies that are no longer required in the environment.

Should this happen by default? Can this be a separate “auto clean” command ? ( like apt autoremove is seperate from apt upgrade)

we'd want to backtrack in case of incompatible dependencies and only upgrade to the most recent version that's compatible with the entire package set.

I figured the resolver would take care of this?

But at least upgrade-all should provide a mechanism for the user to say something like “upgrade everything to latest possible but also respect a[z]”.

what happens now if you run ‘pip install a -U’
Does this not perform the wanted action?
how does this differ from ‘pip install -U a[z]’
In which case, should I fix this first ?

@JensTimmerman
Copy link
Author

JensTimmerman commented Sep 20, 2021

This probably needs more comprehensive documentation.

Yes, it does, but I wanted some feedback on the approach first.

The current approach boils down to:
Pip list
Pip install -U “output of pip list”

where all options of pip install that are needed by the lower level functions will keep working.

@pfmoore
Copy link
Member

pfmoore commented Sep 20, 2021

I wanted some feedback on the approach first.

I thought a lot of this had been covered in previous discussions, but part of the problem I have is that I'm not entirely clear we've even pinned down what people want when they say they want upgrade-all. This feature has been stuck for a long time because at a bare minimum we needed the new resolver, which ensures (hopefully!) that the upgraded environment will not contain conflicts. But I don't think that was the only thing that was blocking it.

In reality, when I say "I want to upgrade everything" what I most likely mean is "I want to upgrade all of the top-level packages that I installed, and I want a consistent set of dependencies for them that's as up to date as possible". In 99% of cases, this is identical to "grab everything that's installed, and upgrade them all" - but there are cases where the two differ. For example, the em-keyboard package recently switched from depending on xerox to depending on pyperclip. Both provide the same functionality (clipboard support). If I upgrade an environment that contains em-keyboard, then I don't want xerox to be upgraded and pyperclip added as well. What I want is for em-keyboard to be upgraded, xerox removed, and pyperclip added. Unless, of course, something else in the environment still depends on xerox... Or I manually installed xerox because I use it for one of my scripts that I run by hand.

It's very hard to discuss the upgrade-all command without coming back to the need to categorise installed packages as "user requested" vs "installed as a dependency". And that's a distinction that's near-impossible to make accurately as environments get modified over time. Arguably, having a manually maintained requirements.txt and regularly running pip install -U -r requirements.txt or even deleting and rebuilding the environment from the requirements file, is a more reliable way of keeping an environment up to date.

Which is why I'm saying we need to clearly explain what the upgrade-all command is intended to do. I need to be able to read the docs, and understand that it won't do the right thing in the em-keyboard case above - it's no good if I do the upgrade and only afterwards realise that it didn't work the way I wanted. Maybe it's OK that we don't support cases like this - saying that we won't remove obsolete requirements if a package that was a dependency is no longer required by anything else. But I don't honestly know, and it's that design-level question that I feel remains unanswered. And frankly, it's possible that we'll find that there's simply no sufficiently good overlap between "what people want" and "what we can implement" - in which case, maybe we should not be implementing this feature at all.

@JensTimmerman
Copy link
Author

JensTimmerman commented Sep 20, 2021

How did pip install -U em-keyboard handle this case?

This implementation will follow the same semantics.
I am not arguing that this is not a problem to solve. But if I follow the current semantics the current documentation should apply right?

If the current implementation of pip install -U is not behaving as wanted, I can make a pr to fix it if someone gives a few good example cases in how it should behave.

Will this case of removing “hanging packages”be solved by adding an auto-remove command?

Personally I am against deleting packages without confirmation/ a second command. What if I had em-keyboard installed and wrote a script that uses xerox. I never had the need to manually install xerox because it was available on my system. I would not expect the xerox package to go away because I updated em-keyboard, unless I explicitly asked for a deletion.

But indeed, I do plan to document with a good number of examples how this behaves once the implementation is agreed upon.

@pfmoore
Copy link
Member

pfmoore commented Sep 20, 2021

How did pip install -U em-keyboard handle this case?

Depends on the upgrade strategy. (The example isn't something I actually did, I had em-keyboard installed via pipx which does its own thing - upgrade left xerox around, reinstall rebuilt the lot). But pip install -U doesn't uninstall stuff. I'm simply arguing here that people might want different semantics from a "upgrade all" command. Or maybe I'm arguing that if the semantics are identical, upgrade-all doesn't offer enough benefit over install -U to be worthwhile.

This implementation will follow the same semantics.
If the current implementation of pip install -U is not behaving as wanted

Well, "same" if you specify the same requirements to install -U as upgrade-all generated. And again, I'm not disputing that's what it does, I'm asking whether it should. If we're adding a whole new command just to save piping the output of pip list into pip install -U, I don't think that the cost is justified.

Personally I am against deleting packages without confirmation/ a second command. What if I had em-keyboard installed and wrote a script that uses xerox. I never had the need to manually install xerox because it was available on my system. I would not expect the xerox package to go away because I updated em-keyboard, unless I explicitly asked for a deletion.

See? You understand precisely why this case is complex. And what I'm saying is that a command that left me with both pyperclip and xerox in the environment wouldn't be of any use to me. I'm questioning here whether we even need an upgrade-all command if this is the best we can manage (and to be clear, I was someone who argued for such a command originally, before I'd appreciated the complexities involved).

Looking back at the history in #59, note the comment here:

#4551 exists and it would be nice to have a fresh discussion on this; when #988 is done.

We're at the point now where #988 is done. But we haven't really had a discussion on the functionality yet - we seem to have gone straight to implementing something based on one comment that suggested that pip list | xargs pip install -U was the basic workflow. I'm trying to push back on that statement, saying that I have never wanted an upgrade-all command that worked like that. There have definitely been times when I would have liked an upgrade-all command, but I'm not actually sure I could have articulated what I wanted it to do, so in the end, the lack of such a command probably helped me, by making me think more about my requirements, rather than just blindly upgrading everything.

🤷 I don't really have the energy for an extended debate on this right now. Let's just say I'm -1 on the proposal as it stands. You'll need to convince at least one pip maintainer that this is a good idea before it can proceed, but I doubt that is going to be me.

@JensTimmerman
Copy link
Author

Thank you for the extensive feedback! I mainly made this pr to give an idea how I would like to see it implemented and as such an example of how it could be implemented and get the discussion started again ;)

So please, discuss away!

@JensTimmerman
Copy link
Author

I just did notice some issues with this approach, I'm not sure where they're comming from,

pip upgrade-all will upgrade celery to latest version 5.1.2 whilst I have django-task-api installed that requests a celery==4.*

pip install -U django-task-api celery does not attempt this update

@BlueskyFR
Copy link

hi! any update on this?

@AkechiShiro
Copy link

Hi @pfmoore, so I take it first of all, there is a need to define semantically what means "upgrade-all" for pip ?
Could this be through a PEP or PEP are only for Python stuff ?
Maybe once, there is something polished on that regards, can such a command start to be implemented ?

@pfmoore
Copy link
Member

pfmoore commented Aug 4, 2023

It doesn't need a PEP, as it's a pip-specific matter. It "just" needs someone to actually think through all of the possibilities, and not just the obvious ones or the ones where what the user intends is clear.

@AkechiShiro
Copy link

AkechiShiro commented Aug 4, 2023

End goal : User upgrades all its pip packages.

So far, I believe that an "all-upgrade" should :

  • check each requirements of the latest release of a package, it should try and diff the new dependencies from the older ones (recursively).
    • Try and resolve/install any new dependencies (recursively too).
    • Remove any older dependencies that are no longer needed (recursively too).
      • In case, the dependency is needed by any other dependencies then it should not be removed.

Maybe in order to achieve this, I don't know but maybe building 2 graphs/list of all dependencies (older ones and newer ones required) and then do some kind of diff between the 2, to see new dependencies introduced and older that are removed.

Specifically, we'd need to determine which packages were explicitly installed by the user (i.e. aren't marked with PEP 376's REQUIRED file), make the dependency resolver ignore already installed packages in the environment and add the ability to uninstall dependencies that are no longer required in the environment.

This would be a bit more complex compared to the current implementation, but I think it'd be worth it for the correctness -- we'd want to backtrack in case of incompatible dependencies and only upgrade to the most recent version that's compatible with the entire package set.

There is this, which I'm not sure exactly what it is about, PEP 376 mentions this : https://peps.python.org/pep-0376/#implementation-details is this anything of concern ?

I also don't see the REQUIRED file mentioned in PEP 376 but only :

METADATA: contains metadata, as described in PEP 345, PEP 314 and PEP 241.
RECORD: records the list of installed files
INSTALLER: records the name of the tool used to install the project
REQUESTED: the presence of this file indicates that the project installation was explicitly requested (i.e., not installed as a dependency).

I also read the test cases needed, I do agree that extensive testing should be done.

@pfmoore is there anything that isn't clearly defined or a design decision that has not yet been answered that would hinder implementing this feature ?

@pfmoore
Copy link
Member

pfmoore commented Aug 4, 2023

Lots. If I have A 1.0 installed, and B 1.0 installed, and A 2.0 and B 2.0 exist, but A 2.0 depends on B<2.0, and B 2.0 depends on A<2.0, which of A or B should an upgrade-all command update? You can't update both (they are incompatible) but there's no reason to prefer one over another. And you can't ask the user, because pip is often run in non-interactive environments.

I can probably think of many other similar cases with a little more time, but to be blunt, this is the point. I don't have the time to spend looking for exceptional cases like this, which is why the responsibility is on a contributor to think about all these sorts of possibility, and come up with a proposal that addresses them. No-one's expecting a first-draft proposal to be perfect, but it must reflect the fact that this idea has been round for over 12 years, and if no-one has come up with a solution yet, it's very unlikely that a simplistic answer will work 🙁

@JensTimmerman
Copy link
Author

JensTimmerman commented Aug 4, 2023

@pfmoore what is wrong with the current semantics of pip upgrade and treating the upgrade-all command as if someone just runs pip install --upgrade with a lists all the packages that are on their system? Do you want to tell us that that semantic is also broken and should be thought about?

There is currently already the option to run
pip install --upgrade A B in your hypothetical situation, and pip will do things, these things are already defined and designed.

If you meant the order of A B vs B A currently makes a difference, then indeed, one needs to think about an acceptable ordering. e.g. alphabetical. For a first draft this could be acceptable?

@notatallshaw
Copy link
Member

notatallshaw commented Aug 4, 2023

Lots. If I have A 1.0 installed, and B 1.0 installed, and A 2.0 and B 2.0 exist, but A 2.0 depends on B<2.0, and B 2.0 depends on A<2.0, which of A or B should an upgrade-all command update? You can't update both (they are incompatible) but there's no reason to prefer one over another. And you can't ask the user, because pip is often run in non-interactive environments.

In backtracking this is done by eventually falling back to the original user order of requirements, as I understand the intention of this PR it's equivalent to taking the output of pip list as the user requirements, and I believe that is in alphabetical order(?) so I think A would be prioritized over B.

conda implements an update --all command and there's an even more fun extension of this scenario that regularly comes up in the real world when you use it:

  • A 2.0 and B 2.0 are installed
  • A 3.0 and B 3.0 are released
  • A 3.0 depends on B==1.0
  • So when you update all, A upgrades to 3.0 and B downgrades to 1.0

In general I think users would find this surprising, if I had to support this logic I would prefer the equivalent of this Unix Shell script so that packages don't downgrade:

pip freeze --exclude-editable --quiet | sed 's/==/\>=/g' > upgrade-all.txt
pip install -r upgrade-all.txt --upgrade

Both choices (priority order) and whether a package can downgrade should obviously be documented.

@notatallshaw
Copy link
Member

notatallshaw commented Aug 4, 2023

There is currently already the option to run pip upgrade A B in your hypothetical situation, and pip will do things, these things are already defined and designed.

Ah but pip install A B --upgrade is different to pip install B A --upgrade, and the two commands can absolutely produce different results, with an upgrade-all the user has no way of providing an order so it needs to be documented what will happen.

@JensTimmerman
Copy link
Author

JensTimmerman commented Aug 4, 2023

Thanks for mentioning the ordering @notatallshaw, I was not currently aware that made difference, (this explains the issue I saw in #10491 (comment) ) I guess indeed this needs to be documented and perhaps allow for other orderings, e.g. put the most behind packages first, or put the most up to date packages first,

another option needs to be not to take any dependencies into account and just update every package to the latest possible version.
This is actually the semantics I'm looking for, I'd rather have the latest greatest version of every package than to have a consistent graph of dependencies.

If this means some things break, so be it, I'll fix the breaks and create a pr to the package and bump their dependency.
Since I noticed a lot of packages fix their dependencies with == without having a good reason for it. A lot of times things just keep working if you upgrade to newer versions.

@notatallshaw
Copy link
Member

If this means some things break, so be it, I'll fix the breaks and create a pr to the package and bump their dependency.
Since I noticed a lot of packages fix their dependencies with == without having a good reason for it. A lot of times things just keep working if you upgrade to newer versions.

There are several other discussions about implementing similar behavior for the use case where the user wants to potentially install a "broken" environment:

@pfmoore
Copy link
Member

pfmoore commented Aug 4, 2023

Thanks for mentioning the ordering @notatallshaw, I was not currently aware that made difference, I guess indeed this needs to be documented and perhaps allow for other orderings, e.g. put the most behind packages first, or put the most up to date packages first

Thanks for this. I didn't particularly intend that to be a complicated or difficult example, just something that needs to be considered, and which any proposal needs to have an answer for. The proposal by @AkechiShiro didn't cover it, that's all.

The key point here is that there's just a bunch of rather tedious but important thinking about edge cases to be done. So far the history has been one of people posting ideas, and then others saying "but what about X?" It's not proved a very productive approach, as people nearly always get tired of responding to such questions1 long before they've got enough of the detail sorted out to write a PR. I'm suggesting that it would be more productive for someone to do this thinking (and also researching the historical discussion on this feature) before putting together a proposal - one that's more complete than just a paragraph or two here sketching an idea2. I may be wrong, such an approach may also end up going nowhere, but at least it's better than just trying the same approach over and over again 🙂

Footnotes

  1. Which I suspect they view as people shooting down their proposal, sadly.

  2. @AkechiShiro's proposal is actually one of the more complete ones, to be fair.

@notatallshaw
Copy link
Member

notatallshaw commented Aug 4, 2023

The key point here is that there's just a bunch of rather tedious but important thinking about edge cases to be done

I think the issue here is Pip isn't a package or environment manager, it's just a package installer, so there's no precedent for what one chooses in these edge cases. And "upgrade-all" is more in the realm of package or environment management than "just" package installation.

For example in @AkechiShiro proposal it states:

Remove any older dependencies that are no longer needed (recursively too).
In case, the dependency is needed by any other dependencies then it should not be removed.

Pip has no way of knowing what is needed, Poetry has a golden source of requirements and Conda knows exactly how it built the environment, so in both cases there are more natural answers to "what is needed" but Pip has nothing.

E.g. taking the ever hard example of extras, at any time you can just run pip install "pandas[xml]", if something changes about your environment how do you determine if pandas or lxml is still needed? And as it's an extra I don't even think there's any way to determine after the fact that pandas "needs" lxml as you might have just run pip install pandas and pip install lxml at different times for different reasons.

Or a more simple example, what if you use and installed numpy and then you use and install pandas, let's say Pandas 3.0 drops Numpy as a back end and only uses PyArrow, when you upgrade to Pandas 3.0 should Pip uninstall numpy? Pip has no way to know you actually use that independently of Pandas.

@pfmoore
Copy link
Member

pfmoore commented Aug 4, 2023

I think the issue here is Pip isn't a package or environment manager, it's just a package installer, so there's no precedent for what one chooses in these edge cases. And "upgrade-all" is more in the realm of package or environment management than "just" package installation.

Yes, that's a really important point that I hadn't properly thought about. In fact, it's arguable based on that view that upgrade-all is actually out of scope for pip, because we don't have a concept of "the contents of the environment" (and in particular, of "what the user wants to be there).

This is probably also why this issue has lingered for so long. People who want this functionality end up using an environment manager that provides it directly, so they no longer care that much that pip doesn't provide it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an upgrade-all command
6 participants