Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to restrict package indexes for specific packages (downstream pipenv issue 4637 #10964

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

matteius
Copy link
Member

@matteius matteius commented Mar 15, 2022

Greetings, this has been a long 24 hours for me as I began debugging this particular issue early Sunday on the pipenv side, and I have never opened a PR with pip directly before so please help guide me along the way.

Problem statement: There should be a way to marry up the pipenv package index restrictions to the Pip resolver so that it does not openly scan indexes that are package restricted.

This was revealed to me while triaging pipenv issue pypa/pipenv#4637 and in my hours of debugging the code, I had to conclude that pip resolver does not have a way to restrict only certain packages in the resolution or install process. After several attempts at arriving at a solution, I found this change to be cleanest and you can see it in the context of Pipenv with a passing build here: pypa/pipenv#4983

The actual test example of pipenv issue 4637 is complex and I haven't figured out adding that to the test runner yet, but that branch does in fact resolve the packages properly when particular packages are pinned to a particular index and multiple package indices are available thanks in part to this modification to pip internals.

I have no idea how the pip community will react to this PR. I am hoping that it will kick off the CI so I can see if I break any pip tests and perhaps get some guidance on how/where to add tests if this is in fact the right direction to head in. If this is not the right direction to head in, then please guide me towards the solution as I've spent many hours on this particular issue already dedicated days to improving the pipenv project with the help of others in the community.

This is the only current issue I know of in the pipenv backlog that seems very high severity, appears isn't already resolved upstream, and I believe relates to a number of other issues we are concerned with, such as package confusion where a package with the same name in pypi can be preferred over one in a private pypi server.

Note: I will be happy to add a news fragment if some version of this gains tractions/acceptance.

…ns to the Pip resolver so that it does not openly scan indexes that are package restricted.
@pfmoore
Copy link
Member

pfmoore commented Mar 15, 2022

This change seems to be purely adding functionality to the internal pip API, but not using those changes for any user-visible feature. I assume therefore that this is intended to be in support of some usage in pipenv of pip's internals? Assuming that's the case, the problem here is that pip doesn't have a supported programmatic API, and what pipenv is (presumably) doing is unsupported. We don't accept API changes like this, because we won't offer guarantees to keep them working or to maintain them across pip releases.

So unfortunately for your case, the fix needs to happen within pipenv. Ideally by pipenv using pip as a command line program, in line with the documented and supported approach but more realistically I assume that won't be possible, and pipenv will continue to try to work with the internals and take the risk that cases like this come up where the internals aren't sufficient for what you want.

Sorry I can't offer a more positive response, but the question of pip's (lack of) programmatic API has been discussed many times in the past, and this remains the current policy.

@matteius
Copy link
Member Author

This change seems to be purely adding functionality to the internal pip API, but not using those changes for any user-visible feature
The user-viable feature would be to allow index restrictions of specific packages when running a single pip install perhaps. You are right, as I stated in my description, I was lead to this trying resolve a major security bug and somewhat common use case of pipenv. However I am surprised pip doesn't support something similar; to me it would be great if pip did support restricting package indices and to do so would require adding/using similar code to what I have proposed here I think.

the problem here is that pip doesn't have a supported programmatic API

Yeah, that seems like a big problem. We are using the same finder and resolver type setup that pip uses its true, but from what I have seen of this topic in the past seems kind of ridiculous -- pipenv is going to be a better product if it can rely on the pip resolver and installer. One upon a time it had its own resolver I guess and it was not great. Pip is the python package installer, so it makes sense to me that pipenv would be attempting to use it. Its too bad that such important features are overlooked on the CLI.

Ideally by pipenv using pip as a command line program, in line with the documented and supported approach but more realistically I assume that won't be possible

I believe the specific reason is that its not possible to issue separate pip commands because pipenv is trying to resolve the whole tree with respect to the other packages, and so issuing separate pip commands does not allow the resolver to determine package conflicts between versions that were resolved/installed by separate runs of pip commands into a single lockfile.

For what pip CLI allows is overriding the primary package index, and optionally specifying additional extra-index but this doesn't allow specifying example comes from private-index1 and the rest can resolve/install freely.

pipenv will continue to try to work with the internals and take the risk that cases like this come up where the internals aren't sufficient for what you want.
I guess so, it seems to me though that it would be great for pip CLI to also allow specifying index restrictions per package. The new resolver seems pretty great and capable, but it is deficient at how it handles multiple indexes or the ability to restrict an index to a specific packages and resolve the whole lot of them against multiple indices.

the question of pip's (lack of) programmatic API has been discussed many times in the past, and this remains the current policy.
If you have any of these discussions available to link that might might be help. It seems especially not great to try and go in a direction where we maintain a separate resolver to what pip has implemented or what some of the other thoughts/recommendations has been. Aside from that my own experience is that: Keeping the pip patch files up to date within pipenv is awful; I had spent days trying to vendor pip 22.0.3 only to realize it had some regression that was rolled back in 22.0.4 and that 22.0.4 works again with the existing code and while that's not Pip's problem, I can complain about it. This situation is not ideal when the internal API is 99.8% of what we need and just a couple edge cases and policy help create an difficult to maintain downstream project within the same python packaging authority space.

Finally, thanks for your response @pfmoore.

@pfmoore
Copy link
Member

pfmoore commented Mar 15, 2022

If you have any of these discussions available to link that might might be help

The master issue for this is #3121. Ultimately, the conclusion there was mostly that the existing internals are not structured as a formal public API, and we have neither the resources, nor the intention, to rewrite pip around a formally supported (and therefore stable) public API. There are only a few projects that use pip's internals, and they seem, if not happy, then at least accepting of the current state of affairs.

In an ideal world, yes, there would be a fully maintained library that did all of the various packaging steps, and pip would simply be one consumer of that library. But the history isn't like that, and while we're working on standardising as many parts of pip as we can, and splitting them off into libraries, it needs other people to actually do that work. The pip maintainers have their hands full simply keeping pip as it stands running 🙁

It seems especially not great to try and go in a direction where we maintain a separate resolver to what pip has implemented

Pip's resolver is a separate project, surugaku/resolvelib, so you certainly shouldn't need to maintain a resolver. You might need to write your own interface to it, if you choose to go down that route. There are libraries for other parts of pip's functionality, not all of which pip uses itself. So many of the parts you want do exist. But integrating them will take work.

while that's not Pip's problem, I can complain about it

Well, you can, but not to the pip maintainers 🙂 We never said what you are doing would be easy or supported. I'm actually rather surprised you're not more aware of this - @uranusjr is both a pip maintainer and (I believe) a pipenv maintainer (as well as the author of resolvelib!) so I'd have thought the pipenv project is very well aware of the status of their pip integration...

@matteius
Copy link
Member Author

That master issue report #3121 is locked so open discussions cannot be had.

@pfmoore
Copy link
Member

pfmoore commented Mar 15, 2022

It is. I provided it to you for context. There's not much discussion to be had as far as I can see. What specific, actionable proposal do you have here around the question of whether pip has a supported programmatic API?

To be clear, I'm reluctant to accept this PR because it will set an expectation in your mind that we won't simply break the API (probably accidentally, we're not going to do it just to mess with you, but maybe deliberately because it makes it harder to do something else we need to do). I don't want to set such an expectation, and I'm trying to explain why by providing some background.

Also, have you considered that if you are using these internal pip APIs, others might be as well? And by proposing this change to the signature of various functions, you might be breaking their usage? I appreciate that this is unlikely, because you've made all the new parameters optional, but it's a possibility we have to consider once we start accepting that we support people using the internal API in any way. Would you be happy if we accepted a PR from someone else that helped their usage of pip's internals, and broke pipenv?

You're welcome, of course, to copy code out of pip, and patch your own version of that code. If you do that, the responsibility is on you to avoid breaking anything, just as it currently is when you use pip's internal code. I'm pretty sure pipenv used to do this (having a complete patched copy of pip internal to their code) but dropped it because the maintenance burden was too high. (Actually, it looks like they still do - https://github.com/pypa/pipenv/tree/main/pipenv/patched/notpip. I have no idea how that copy is used, but it looks a bit out of date - it claims to be pip 21.2.2. Maybe you should be looking at that, as well?)

Hmm, actually, I see that you already created a pipenv PR that patches their internal copy of pip. Is this PR simply to avoid pipenv having to carry that patch? It would have saved me a lot of time if you'd been clear about that up front 🙁

@matteius
Copy link
Member Author

@pfmoore Thanks again for your reply and I can definitely see you have thought a lot about pip internals in general. I am going to respond now to some of your points but it will definitely take me more time to consider some of what you bring up, especially how to frame my thoughts around this I defer for the moment on

What specific, actionable proposal do you have here around the question of whether pip has a supported programmatic API?

(Actually, it looks like they still do - https://github.com/pypa/pipenv/tree/main/pipenv/patched/notpip. I have no idea how that copy is used, but it looks a bit out of date - it claims to be pip 21.2.2. Maybe you should be looking at that, as well?)

Yes so as I have learned the way it works in the pipenv project is we vendor in a specific version of pip which goes into that patched/notpip directory. Why its called notpip is very confusing because it is essentially pip, but I assume a namespace issue combined with the fact of a few patches. Since we rely on pip internals and it is expected pip could change them at any time, we have to maintain a copy of pip that is compatible with the current version of the pipenv code, but the goal is to not have to patch it.

I have no idea how that copy is used, but it looks a bit out of date - it claims to be pip 21.2.2. Maybe you should be looking at that, as well?)

I already have it out for a separate PR to upgrade pipenv's vendoring to pip==22.0.4 -- this essentially pulls the newest pip version into the project, and reapplies the old patches that have been updated for where the new code lives. It actually fixes us a number of resolver bugs to go from pip 21.x to 22.x and is awaiting some upstream PRs to requirementslib and pipshims to support the new version, which I also have PRs out for. Ref point: pypa/pipenv#4969

Hmm, actually, I see that you already created a pipenv PR that patches their internal copy of pip. Is this PR simply to avoid pipenv having to carry that patch?

The template for Pipenv PR's state this:

Please try to refrain from submitting patches directly to `vendor` or `patched`, but raise your issue to the upstream project instead, and inform Pipenv to upgrade when the upstream project accepts the fix.

A pull request to upgrade vendor packages is strongly discouraged, unless there is a very good reason (e.g. you need to test Pipenv’s integration to a new vendor feature). Pipenv audits and performs vendor upgrades regularly, generally before a new release is about to drop.

If your patch is not or cannot be accepted by upstream, but is essential to Pipenv (make sure to discuss this with maintainers!), please remember to attach a patch file in `tasks/vendoring/patched`, so this divergence from upstream can be recorded and replayed afterwards.

So that is why I opened a PR against pip to find out if conceptually there are items of value here that will help move pip forward. Ultimately I have been trying to be more involved in contributing to the open source community and pipenv has given me that opportunity over the last few months. So I'll wrap up this current thread with a few notes/replies to some of your questions.

Also, have you considered that if you are using these internal pip APIs, others might be as well?

Sure, but I am told that not many are because it is discouraged, and I doubt that those who are using it are also within the pypa organization which currently helps to maintain pipenv.

Would you be happy if we accepted a PR from someone else that helped their usage of pip's internals, and broke pipenv?

It has happened before, I guess it is the risk we take on by being pipenv.

It would have saved me a lot of time if you'd been clear about that up front

I apologize, but I did try to be as clear as I could in the description of this PR I opened that it originated as an issue within pipenv; I linked to the issue and the PR I have open for it. Since this is my first PR to this project, hopefully not last--I expected some amount of churn and uncertainty, but I don't feel as though I am wasting anyone's time by asking for more consideration on ways to restrict the package indices of certain installation requirements while allowing others to remain open.

I did this in the form a PR rather than an issue report because that was the stage at which I found myself working on the underlying problem. My thought is that perhaps there is some use here for pip itself to make use of and I certainly wouldn't want to have this code merged into pip without minimally adding unit tests demonstrating why it exists, and the ideal would be pip itself could offer similar functionality though maybe that is not on the road map. Seeing as the existing historical maintainers for pipenv, except for @frostming have been fairly unresponsive as of late, some of the earlier decisions of the project are not totally transparent and so I am learning as I go.

@pfmoore
Copy link
Member

pfmoore commented Mar 15, 2022

I think probably the conclusion here should be that (because it has no supported programmatic API) pip is a special case, and the comments in the pipenv codebase don't necessarily apply here. To my knowledge, no-one from pipenv has ever asked us for internal API changes, which suggests that might well be the case.

In any case, thanks for your contributions to pipenv, and best of luck with getting more involved 🙂

@matteius
Copy link
Member Author

Hmmm, well the patched directory is primarily about pip I believe, so maybe our template needs to have a different message about changes to that directory. Anyway thanks for your help, I hope to debate further at some point the merits of pip actually offering/supporting an internal API that can do the things it can already do, that the CLI doesn't expose so that we can move away from this medieval process but for now I'll plan on moving forward with patching pipenv's pip and hoping someone else chimes in here on the usefulness of restricting package indices in isolation.

@uranusjr
Copy link
Member

My personal stance on this is pipenv should eventually move off using pip internals directly (either by copying pip’s implementation, or depend on something else altogether), so I lean more toward changing this in pipenv.

However, pip’s code base does need some changes to move it toward implementing better index selection in general, namely #8606 (comment) or the narrower-scoped #9715. From my investigation into it, the work would also involve a similar design that “remembers” which packages should be included/excluded by an index, so it’s likely this can go into pip if you could also help with those issues.

@pfmoore
Copy link
Member

pfmoore commented Mar 15, 2022

Thanks @uranusjr for the context. Just to be clear @matteius this isn't in conflict with my earlier comments - if there's a functional change in pip that triggers an internal API change, that's fine. And if that internal API change is useful to pipenv, then that's also fine. It doesn't make the API any more supported or stable, but apart from that, there's no harm in pipenv benefitting from new pip features. It's just changes to the API with no user-visible impact that I was talking about.

@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Jun 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants