Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

poetry-core: consider *not* to vendor dependencies #2346

Closed
decathorpe opened this issue Apr 3, 2020 · 20 comments
Closed

poetry-core: consider *not* to vendor dependencies #2346

decathorpe opened this issue Apr 3, 2020 · 20 comments
Labels
area/core Related to the poetry-core library

Comments

@decathorpe
Copy link
Contributor

Python projects are starting to rely on pyproject.toml / poetry based build systems now, and this is also affecting fedora packages, for example. In this context, it's important to have a working, "native" poetry package.

However, using vendored dependencies - like in this project (which looks like will be required by what will be poetry 1.1.0) - will make it almost impossible to package poetry, let alone package it correctly.

Please reconsider the decision to use bundled dependencies.

@abn
Copy link
Member

abn commented Apr 3, 2020

Embedded/Bundled dependencies have long been a problem (JARs anyone?). Unfortunately, packaging and distributing these kinds of projects are still not straightforward (this includes golang and java packages too).

To your specific concern, doesn't pip do the same thing? In cases like poetry and pip vendoring base dependencies solves a lot of usability issues to a large portion of the end-users. So, I am not sure if it makes sense to not vendor.

In the specific context of fedora packages, I'd think poetry would be handled similar to what is done in python-pip.spec.

@decathorpe
Copy link
Contributor Author

Well, pip is really a special case in fedora, because - if I understand correctly - it's handled this way to solve bootstrap issues when upgrading to newer python versions, that doesn't apply to poetry.

And I really want to avoid to have to de-bundle things from poetry(-core).

@encukou
Copy link

encukou commented Apr 3, 2020

The issue with bundled stuff is that if one of those dependencies fixes a security vulnerability, a LTS distro needs to fix it everywhere – potentially in old versions of poetry, long after poetry has moved on and introduced incompatible changes.

pip is quite special because it's needed by Python itself. It's an example of something that's hard to maintain in a distro, not of something every build backend should do.

@leycec
Copy link

leycec commented Apr 4, 2020

Gentoo packager here. I'm packaging poetry >= 1.1.0a1 and poetry-core >= 1.0.0a5 for our humble source-based distro and can only concur with everything and everyone above – excluding @abn in his apologetic "But pip does this bad thing, so poetry should too!" screed. Appeal to authority is no justification for continued bad behaviour. pip is irrelevant; this is poetry.

As distro packagers, our core contention with bundled dependencies is exactly as described by @encukou: each bundled dependency increases attack surface. That's bad. While bundling dependencies may slightly ease the installation burden for end users installing poetry via the non-standard get-poetry.py installer, it does so only by substantially increasing the security risk for all users regardless of installation pathway.

When a major insecurity outweighs a minor convenience, the former takes precedence.

@leycec

This comment has been minimized.

@abn
Copy link
Member

abn commented Apr 4, 2020

@leycec no one is saying "But pip does this bad thing, so poetry should too!". The point of this issue is to discuss this matter and, if reasonable, figure out an alternative that makes a distro package maintainer's life easier.

As @encukou points out, there are inherent risks of bundling requirements. But these risks need to be weighed against multiple factors, this includes usability, access vectors, exploitability etc.

I disagree with @leycec that this is a "major insecurity" for a few reasons,

  1. poetry is a development tool
  2. currently, we strip out compiled binaries
  3. exploitation of any vulnerability or weakness in a vendored package requires the compromise of the host, trusted third-party services (PyPi, self-hosted repositories) etc.

As for a "minor inconvenience", it should be noted that poetry is expected to work with multiple python versions as a common use case is in environments that have mulitple python versions managed via pyenv. This is made a bit more trickier due to the fact that end users are on various platforms. Additionally,

  1. If a system managed dependent library is updated or changed, this will impact poetry's functionality.
  2. If (1) occurs, PEP-517 builds that depend on core will cease to work, ie. developers cannot install packages (ie. breaks a primary functionality of the tool).

@leycec, as for your last comment, poetry-core uses vendoring to keep in-tree bundled packages in sync. If you check here, you will see that the packages are in fact bundled similar to your reference project. We do not do any dynamic patching/installation at installation time.

All that said, we do also think there is room for improvement in the way we do things. This includes our installer get-poetry.py and also figuring out in what ways we can help reduce the pain of getting poetry packaged for distros without having to sacrifice usability in other scenarios.

To this end, any constructive feedback on this is always welcome.

@hroncok
Copy link
Contributor

hroncok commented Apr 8, 2020

One of the Fedora Python packagers here, hello. Few notes:

I completely understand and respect reasons why upstream projects like pip, pipetry or pipenv decide to bundle dependencies. The reasons are valid for upstreams.

As distributors, we don't like that very much and we try to avoid that for reasons other have already noted. Not only security fixes, but also other kinds of fixes are backported in distributions and maintaining and patching 8 different requests bundled in different projects is a nightmare. I understand that @abn wants to work toward understanding our reasons and respecting them -- thanks for this.

In Fedora, we keep bundled dependencies in pip out of necessity, not that we would want that. Considering that we build pip wheels and use those in venv and virtualenv in all the various Python versions we ship, debundling (devendoring) would become very complex and fragile. I can talk more about that, but it would get off topic.

A reasonable compromise would be to make unbundling supported and trivial. Currently, we would need to sed/patch each import statement when we unbundle. That makes upgrading the packaged poetry a very unpleasant and error-prone task. Can we work together to make the vendoring of the dependencies easily revertible at our build time?

Also, I'd like to cooperate here, not argue and fight. That isn't helpful.

@encukou
Copy link

encukou commented Apr 8, 2020

If a system managed dependent library is updated or changed, this will impact poetry's functionality.

You're either using:

  • distro-packaged poetry, which should be tested with the dependencies the distro provides, or
  • poetry installed from get-poetry.py, wheels, or pip, where you can pin dependencies as tight as you like. This version should not depend on system libraries; that would definitely be dangerous.

All that said, we do also think there is room for improvement in the way we do things. This includes our installer get-poetry.py and also figuring out in what ways we can help reduce the pain of getting poetry packaged for distros without having to sacrifice usability in other scenarios.

Quite a lot of the libraries poetry vendors are backports of Python's standard library, which is has quite strong backwards-compatibility guarantees. How hard would it be to use stdlib versions of those on Python versions that have them?

For the others, it seems that:

  • Poetry is a tool, not a library to be imported.
  • Most of the patching is there to replace imports with the poetry.core._vendor namespace.

Is that right?

One idea is to have the tool put its _vendor directory in sys.path at startup (before other imports), so all the libraries are importable using standard names. That would remove the need for most patches.
Distros could then empty the _vendor directory, and ensure that compatible versions of the packages are available elsewhere.

Does anyone foresee problems with this approach?

@hroncok
Copy link
Contributor

hroncok commented Apr 8, 2020

Does anyone foresee problems with this approach?

Not with unpatched dependencies, I don't really.

@decathorpe
Copy link
Contributor Author

One idea is to have the tool put its _vendor directory in sys.path at startup (before other imports), so all the libraries are importable using standard names. That would remove the need for most patches.
Distros could then empty the _vendor directory, and ensure that compatible versions of the packages are available elsewhere.

Does anyone foresee problems with this approach?

This sounds like a good solution that would satisfy both use cases. poetry without modifications would use the vendored dependencies, and distros could easily remove the sys.path modification and just drop the _vendor directory.

@abn
Copy link
Member

abn commented Apr 8, 2020

As @encukou noted, quite a few of the packages vendored are done so to allow compatibility with older python versions. These will go away soon as we do plan on dropping 2.7 and 3.5 support in a future release.

The suggested apprach might work. However, I do need to run this by the other maintainers to make sure I have not missed something obvious, especially since I have not worked on the vendoring here. Will also need to test it out to make sure it does not break anything.

Will see if I can get this done before we get to 1.1.0.

@abn
Copy link
Member

abn commented Apr 23, 2020

Initial fix for this has been merged into master.

For debundling dependencies you can simply drop poetry.core._vendor. Versions of bundled requirements are available in vendor.txt under that directory.

We'll look into any issues that crop up as they are identified.

@decathorpe
Copy link
Contributor Author

This looks great, thank you!

I'll report back if I'm encountering any issues in the packaging context :)

@hroncok
Copy link
Contributor

hroncok commented Apr 23, 2020

As @encukou noted, quite a few of the packages vendored are done so to allow compatibility with older python versions.

Just a question: Are those needed to run Poetry on the older Python versions, or also to manage older Python versions projects? E.g. if Poetry itself runs on Python 3.8 to manage a Python 3.6 project, is importlib-metadata ever used?

@abn
Copy link
Member

abn commented Apr 25, 2020

@hroncok with the change I have done the following.

  1. Removed any and all unused dependencies
  2. Debundled Python 2.7 compatibility dependencies

As for the remaining bundled dependencies; you will notice that everything except six is not related to backwards compatibility. The importlib-metadata package is no longer used by poetry-core. We patch its use by jsonschema so that we do not have have to vendor it.

Also, it is important to not that this is specific to poetry-core. Poetry itself, does use importlib-metadata for Python versions less than 3.8, but it will be explicitly indicated. Poetry itself does not, and will not vendor dependencies for the PyPI published package. The only case where we vendor is when Poetry is installed using the get-poetry.py script, this is done in order for it to work seamlessly in multi-python environments.

Let me know if I can provide more information on this or if something needs to be clarified further.

@abn abn transferred this issue from python-poetry/poetry-core Apr 25, 2020
@abn abn changed the title Consider *not* to vendor dependencies poetry-core: consider *not* to vendor dependencies Apr 25, 2020
@abn abn added the area/core Related to the poetry-core library label Apr 25, 2020
@hroncok
Copy link
Contributor

hroncok commented Apr 25, 2020

Thanks!

@abn
Copy link
Member

abn commented Apr 25, 2020

@hroncok I realised I might not have answered your question.

E.g. if Poetry itself runs on Python 3.8 to manage a Python 3.6 project, is importlib-metadata ever used?

Poetry, when run on 3.8 to manage a 3.6 project, it does not need importlib-metadata package. The dependencies are for Poetry itself. Poetry, will be able to manage Python 2.7 projects even after Poetry itself drops runtime support for Python 2.7.

@abn
Copy link
Member

abn commented Jun 30, 2020

This was resolved via python-poetry/poetry-core#25

@pradyunsg
Copy link
Contributor

doesn't pip do the same thing?

pip's has a published rationale for doing so: https://pip.pypa.io/en/latest/development/vendoring-policy/#rationale. As noted by others, it's uniquely positioned in this situation. :)

Copy link

github-actions bot commented Mar 3, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/core Related to the poetry-core library
Projects
None yet
Development

No branches or pull requests

6 participants