
Selecting dependency source doesn't work for transitive dependencies #2610

Closed
3 tasks done
BenWhetton opened this issue Jul 1, 2020 · 16 comments
Labels
area/sources Releated to package sources/indexes/repositories kind/feature Feature requests/implementations status/wontfix Will not be implemented

Comments

@BenWhetton

  • I am on the latest Poetry version.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option). N/A
  • OS version and name: Ubuntu 18.04
  • Poetry version: 1.0.9

Issue

TL;DR: The source field for individual dependencies (see #908) is ignored for transitive dependencies. This greatly limits its usefulness and exposes users to supply chain attacks.

I have three private repos: numpy, lib and app (I know it would be unwise to call my package numpy; this is to illustrate the problem).
lib depends on numpy and app depends on lib. They all live in Acme Corp's private PyPI repository at http://my-pypi/simple/.

I have set up my-pypi as the primary source in pyproject.toml for lib and app.
To make sure that my numpy is used by lib, its pyproject.toml includes this line:

numpy = { version = ">0", source = "my-pypi" }

This works fine. When I install lib using poetry install, my private numpy package is installed as a dependency.

app depends on lib, app's pyproject.toml includes this line:

lib = { version = ">0", source = "my-pypi" }

Now, when I run "poetry install" for app, the public PyPI version of numpy is installed!
Is this expected and/or intended? As a user, I would expect app to use my private numpy in this case. If that isn't the case, it should be clearly specified in the documentation. The current behavior exposes users to supply chain attacks which they might expect to avoid using the source field.

In fact, the source field isn't documented anywhere, although it has been in Poetry since v1.0.0. I found it described in #908.


I believe this may be similar to #1356, but it concerns pure Poetry workflows, so I don't think it is a duplicate.


lib's pyproject.toml

[tool.poetry]
name = "lib"
version = "0.1.0"
description = "A library"
authors = ["me"]

[[tool.poetry.source]]
name = "my-pypi"
url = "http://my-pypi/simple/"

[tool.poetry.dependencies]
python = "^3.6"
numpy = {version = ">0", source = "my-pypi"}

app's pyproject.toml

[tool.poetry]
name = "app"
version = "0.1.0"
description = "An application"
authors = ["me"]

[[tool.poetry.source]]
name = "my-pypi"
url = "http://my-pypi/simple/"

[tool.poetry.dependencies]
python = "^3.6"
lib = {version = ">0", source = "my-pypi"}
@BenWhetton BenWhetton added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Jul 1, 2020
@BenWhetton
Author

Any comments?
Is this the expected/desired behavior and/or is there a different way to solve this problem that I am not aware of?
We are evaluating tools at work and this might be a deal breaker.

@lioman

lioman commented Oct 1, 2020

Since PyPI is still configured as an index, Poetry will look there first. You need to set your private index as the default source: https://python-poetry.org/docs/repositories/#using-a-private-repository

And you can disable pypi: https://python-poetry.org/docs/repositories/#disabling-the-pypi-repository
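A minimal sketch of that suggestion applied to app's pyproject.toml from the original report, assuming the Poetry 1.x `default` source key used elsewhere in this thread (later Poetry releases express this with a `priority` setting instead). With a default source configured, PyPI is no longer consulted, so numpy has to be resolved from my-pypi even though app only names lib:

```toml
[tool.poetry]
name = "app"
version = "0.1.0"
description = "An application"
authors = ["me"]

# Marking the private index as the default source disables PyPI (Poetry 1.x),
# so every package, including transitive ones like numpy, comes from my-pypi.
[[tool.poetry.source]]
name = "my-pypi"
url = "http://my-pypi/simple/"
default = true

[tool.poetry.dependencies]
python = "^3.6"
lib = ">0"
```

The trade-off is that anything not mirrored on my-pypi can no longer be installed from PyPI.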

@bigpick

bigpick commented Sep 26, 2022

After much searching, this appears to be the closest issue to what I am seeing, also.

Using custom repo sources, when fetching transitive deps, Poetry doesn't seem to respect the fact that the parent of the transitive dep had an explicit source set in the driving pyproject.toml:

Basically, the following:

[[tool.poetry.source]]
name = "repo_one"
url = "https://some.url"
secondary = true
default = false

[[tool.poetry.source]]
name = "repo_two"
url = "https://some.other.url"
default = true
secondary = false


[tool.poetry.dependencies]
package1 = { version = "*", source = "repo_two" }

We have multiple custom repositories defined: a default and a secondary. When we install a package after explicitly setting its source to repo_two, Poetry still, for some reason, attempts to fetch its child/transitive deps from repo_one, which causes the install to fail entirely. Everything required is definitely in the specified source (for example, removing the unspecified repo_one block makes the install succeed, but that breaks our ability to use repo_one for other things).

@neersighted
Member

Transitive dependencies are not associated with a source constraint right now by design, as far as I am aware. If you want to control them you must specify them as top-level dependencies, just like if you wanted to explicitly control their version.
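Applied to the original report, that workaround would look roughly like this in app's pyproject.toml (a sketch reusing the report's names): numpy is promoted to a top-level dependency purely so that its source can be pinned, even though app never imports it.

```toml
[[tool.poetry.source]]
name = "my-pypi"
url = "http://my-pypi/simple/"

[tool.poetry.dependencies]
python = "^3.6"
lib = { version = ">0", source = "my-pypi" }
# Not used directly by app; declared here only to constrain where the
# transitive numpy requirement (coming from lib) is resolved from.
numpy = { version = ">0", source = "my-pypi" }
```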

Thanks for bringing attention to this as we're revisiting some of the semantics of custom sources recently -- @dimbleby @radoering I'd love to hear what both of you think regarding this being desirable/if it's practical to teach the solver to make source 'contagious'.

@bigpick

bigpick commented Sep 26, 2022

If you want to control them you must specify them as top-level dependencies, just like if you wanted to explicitly control their version.

😅 Woof, yeah, OK maybe time to look into doing something different in the meantime while that hopefully gets worked out; I assume that's work that's centered around the discussion at #5984?

I wasn't sure whether that work would also implicitly cover transitive deps: if we used something like that proposed "private" label, would Poetry use the defined source for the declared dep and all of its transitive deps, or would we hit the same everything-gets-searched issue, since the label would still only cover the top-level package?

@neersighted
Member

neersighted commented Sep 26, 2022

Not directly related/covered, but worth talking about at the same time. Stewing on this a little, honestly I don't think there's much we can do.

If we make source = contagious, what happens when one of your private deps depends on, say, numpy? Now you have to mirror numpy in your private repo. I think that overall this might be wontfix, as I simply don't see a way to make the cure (source = spreading) less painful than the sickness (having to specify the source of private transitive deps when you care).

This also mirrors the existing semantics: "If you care about versions (for wheels), constraints, or source, it's now a top-level dep even if you don't directly import it -- top level deps represent deps you care about and not importable code"

@bigpick
Copy link

bigpick commented Sep 26, 2022

If we make source = contagious, what happens when one of your private deps depends on say, numpy? Now you have to mirror numpy in your private repo.

In our use case, this is required to happen for various reasons, so I would be fine with that behavior (and would prefer it). Consider the Remote+Local=>Virtual Artifactory PyPI use case: if I have a project that depends on an internally published package that has numpy as a dep, the numpy transitive dep would already have been fetched and cached. By using the same top-level virtual source in the pyproject.toml for the parent dep, it's not (or shouldn't be) re-downloading a new numpy per se, right? The transitive dep was already pulled into the Artifactory cache when the dependent package itself was built; I just need Poetry not to fetch from or look at other places when I know what it needs is already there and not elsewhere.

I can see why others wouldn't want to have to mirror the entire world, but I'm just throwing it out there in case others might +1 the use case where all of their packages have to come from internal repositories anyway.

@neersighted
Member

Remote+Local Artifactory is certainly a use case, but not the norm -- most users are using small indexes with a subset of packages available to supplement PyPI. I don't think we can force everyone over to your proposed new behavior for that reason. Likewise, adding a knob and supporting both seems a bit fraught.

I suppose that if we tried to implement such a feature, we might introduce a new source = {name = "name", recursive = true} optional syntax. I am quite nervous of the users who then ask us to allow opting out of this recursion on a per-package basis 😆
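For illustration only, the proposed opt-in might have looked something like this. It is hypothetical: the idea was never implemented (this issue was closed as wontfix), so it is not valid Poetry configuration.

```toml
# HYPOTHETICAL syntax from this discussion; Poetry does not support it.
# "recursive = true" would have applied the source to lib's whole dependency tree.
[tool.poetry.dependencies]
lib = { version = ">0", source = { name = "my-pypi", recursive = true } }
```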

I do wonder, if you're using a Virtual repository, why you can't just set default = true and not use source = at all/be done? It seems like that would be by far the easiest solution.

If I have a project that depends on an internally published package that has numpy as a dep, the numpy transitive dep would already have been fetched and cached. By using the same top-level virtual source in the pyproject.toml for the parent dep, it's not (or shouldn't be) re-downloading a new numpy per se, right?

Besides my concerns about the Artifactory workflow being hardly universal, I don't see how this is the case. I think most users would find it more unexpected if source = was contagious -- e.g. users of ML packages like torch that are distributed using an alternate index would find this to be the exact opposite of what is expected.

@bigpick

bigpick commented Sep 26, 2022

I do wonder, if you're using a Virtual repository, why you can't just set default = true and not use source = at all/be done? It seems like that would be by far the easiest solution.

I'm not sure I follow this; isn't that what I already have above? From my understanding, this would only work if we didn't also use any additional secondary repos (and only specified the one custom default repo source), but we don't do that...

... which, IIUC, we have to do because using the same URL for both fetching and publishing doesn't seem to work with /simple (so we have one custom source with default = true and /simple/ for downloading packages, and another custom source with essentially the same URL minus /simple/ and secondary = true strictly for publishing).


... As you said, though, "just use the one custom default source" was what I was considering moving to as a "solution", since it's probably the fastest way to the desired behavior, and then injecting the custom publishing source right before publication (but after build). I was just hoping to keep it in pyproject.toml with everything else, instead of having to inject it dynamically as part of CI.

@neersighted
Member

I think I'm starting to understand what's going on here -- you don't want to put a publishing target into pyproject.toml -- it's meant to be configured in poetry.toml or config.toml depending on if you use poetry config with the --local flag.

See https://python-poetry.org/docs/repositories/#publishable-repositories.
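As a sketch of that split, assuming a hypothetical repository name (my-artifactory) and URL: running `poetry config --local repositories.my-artifactory <url>` stores the publishing target in the project's poetry.toml, roughly as below, and `poetry publish -r my-artifactory` then uploads to it, while pyproject.toml keeps only the download source.

```toml
# poetry.toml (project-local Poetry config, not pyproject.toml).
# Publishing target only; it is never consulted when resolving dependencies.
[repositories.my-artifactory]
url = "https://artifactory.example.com/artifactory/api/pypi/pypi-local/"
```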

I do believe your existing use case is well-handled by a single source with default = True as you will fetch all of your deps from it, and your Local repository will take priority over the Remote PyPI repository with Artifactory.

FWIW, including less redacted URLs would have made it easier to figure out what was going on, but I think we got there. If there is any spot in the documentation that you think misled you to believe publishing targets were specified like package indexes, please let us know or submit a PR clarifying.

@bigpick

bigpick commented Sep 26, 2022

Poetry treats repositories to which you publish packages as user specific and not project specific configuration unlike package sources

Ah, I see; don't know how I missed this🤦 Reading through though, makes sense.

I do believe your existing use case is well-handled by a single source with default = True as you will fetch all of your deps from it, and your Local repository will take priority over the Remote PyPI repository with Artifactory.

Yep, agree! And apologies, that's my bad for not pointing out that it was a convoluted setup for the same Artif source just trying to use it differently

Thank you!

@neersighted neersighted added kind/feature Feature requests/implementations status/needs-consensus Consensus among maintainers required area/sources Releated to package sources/indexes/repositories and removed kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Sep 26, 2022
@dimbleby
Contributor

I'd be happy enough to close this as wontfix:

  • wanting exclusively to use a private repository is a common use case and is supported (that seems to be what today's conversation is about)
  • wanting to get only some specific packages from a secondary repository is a common use case and more-or-less supported, albeit with improvements possible per Don't search secondary repositories if not required #5984
  • wanting to name packages so that they collide with public packages, and not be explicit about what source they come from, but also relying on that working out in some particular way... yuk.
    • original report wanted their implicit numpy requirement to resolve to their private package, but if poetry did behave that way then some other user could just as legitimately raise a bug report saying that they wanted public numpy installed the whole time...

Of course the goal is not "the same as pip", but as a rough level-set for what's typical... pip supports

  • --index-url, approximately equivalent to a private repository with default = true
  • --extra-index-url, approximately equivalent to a secondary repository
  • (so far as I know) nothing remotely equivalent to what is being asked for here
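For comparison, a rough pyproject.toml analogue of `pip install --index-url ... --extra-index-url ...`, using the Poetry 1.x `default` and `secondary` keys seen earlier in this thread (source names and URLs are placeholders, and the equivalence is only approximate):

```toml
# ~ --index-url: replaces PyPI as the main index
[[tool.poetry.source]]
name = "main-index"
url = "https://main.example/simple/"
default = true

# ~ --extra-index-url: consulted in addition to the main index
[[tool.poetry.source]]
name = "extra-index"
url = "https://extra.example/simple/"
secondary = true
```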

@neersighted neersighted added status/wontfix Will not be implemented and removed status/needs-consensus Consensus among maintainers required labels Sep 27, 2022
@neersighted
Member

I think I am going to go ahead and close this as wontfix for now -- it's not impossible to be added ever, but given Poetry's current codebase and established semantics this would be a deeply surprising change.

If a use case that can't be met with Poetry's regular semantics materializes (plus after existing plans to refactor sources are completed), we could revisit at a later date.

@neersighted neersighted closed this as not planned Won't fix, can't repro, duplicate, stale Sep 27, 2022
@domenix

domenix commented Oct 27, 2023

If we make source = contagious, what happens when one of your private deps depends on, say, numpy? Now you have to mirror numpy in your private repo. I think that overall this might be wontfix, as I simply don't see a way to make the cure (source = spreading) less painful than the sickness (having to specify the source of private transitive deps when you care).

We do have every transitive dependency in our private repo, or at least that's the goal. PyPI is not allowed in the CI pipeline, but for local development PyPI is also used. Specifying each transitive dependency as top-level is not a viable option.
I want to provide an easy way for developers to transition from PyPI to the private repo. The current idea is to enforce the private repo by default and install new packages with PyPI as their source, evaluate them, later upload them to the private repo, remove the source specifier from pyproject.toml, and repeat.
Currently, Poetry queries the private repo for the transitive dependencies, which fails, as there is no way to specify an explicit priority for transitive dependencies. I can't make PyPI "primary", because then Poetry will pull dependencies from it even if they don't have the PyPI source specifier.

This project looks really promising and I've been reading the docs all week, but to no avail. Do you have any suggestions on how this process can be achieved with Poetry?

@domenix

domenix commented Oct 30, 2023

Any thoughts about this @neersighted ?

I'd like to be able to specify two sources, with the private repo as the default, but I don't want Poetry to silently pull packages from PyPI, so I have to declare it as explicit. But then this issue happens: the explicit source does not apply to transitive dependencies... not sure if I'm missing something.


This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024