Multi-architecture wheel selection across disparate sources (with markers) #5205

Closed
3 tasks done
merberich opened this issue Feb 16, 2022 · 11 comments · Fixed by #6679
Labels
area/sources Related to package sources/indexes/repositories kind/enhancement Not a bug or feature, but improves usability or performance

Comments

@merberich

merberich commented Feb 16, 2022

  • I am on the latest Poetry version.

  • I have searched the issues of this repo and believe that this is not a duplicate.

  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).

  • OS version and name: Ubuntu 18.04 on both aarch64 and x86_64

  • Poetry version: 1.1.12

  • Link of a Gist with the contents of your pyproject.toml file: (see below for variants)

Issue

I'm trying to build a project that will be used in multi-architecture environments (aarch64 and x86_64 specifically). One of my project's dependencies (jaxlib) only offers prebuilt wheels for x86_64 via PyPI. As a solution, I've built and hosted the wheel for aarch64 separately, which necessitates adding secondary dependency sources in pyproject.toml like so:

[tool.poetry]
name = "temp"
version = "0.1.0"
description = ""
authors = ["merberich <merberich@gmail.com>"]

[tool.poetry.dependencies]
python = "^3.9"
cupy-cuda102 = "^10.1.0"
jaxlib = [
  { version = "0.1.71" },
  { url = "<valid-url-to-aarch64-wheel-host>" }
]
jax = "0.2.20"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

Building a new poetry.lock for this project produces jaxlib references:

cat poetry.lock | grep jaxlib
cpu = ["jaxlib (==0.1.71)"]
cuda102 = ["jaxlib (==0.1.71+cuda102)"]
cuda111 = ["jaxlib (==0.1.71+cuda111)"]
minimum-jaxlib = ["jaxlib (==0.1.69)"]
tpu = ["jaxlib (==0.1.71)", "libtpu-nightly (==0.1.dev20210809)", "requests"]
name = "jaxlib"
jaxlib = [
    {file = "jaxlib-0.1.71-cp37-none-macosx_10_9_x86_64.whl", hash = "sha256:ad36895ceb68782bb74f0657da127cece06c19ce1e72ad9c660f79bce549618e"},
    {file = "jaxlib-0.1.71-cp37-none-manylinux2010_x86_64.whl", hash = "sha256:059eb572121e3f13e3b841c07137a72f3d0aeb76dc6ddf178922a327994f60b8"},
    {file = "jaxlib-0.1.71-cp38-none-macosx_10_9_x86_64.whl", hash = "sha256:c823b65c95aa7b6d8eb4a45bc6389d43c8d6dd20fe0c531345753819c70cff54"},
    {file = "jaxlib-0.1.71-cp38-none-manylinux2010_x86_64.whl", hash = "sha256:8f4447d6053b55bab9565e365b60c80ff186b0cb05407146ea844d328eaba2bf"},
    {file = "jaxlib-0.1.71-cp39-none-macosx_10_9_x86_64.whl", hash = "sha256:f5fc1873c25a7b07f9406bcb09540cef6e2b151aa139be52e34cf35f7b8390b0"},
    {file = "jaxlib-0.1.71-cp39-none-manylinux2010_x86_64.whl", hash = "sha256:c66a1bdb57934093938fd6e1767216c198bd1a102b8f5636fdb1f9a5b0d11067"},

Note that for each of these wheels, the supported architecture is always x86_64... which my aarch64 machine does not use (IMO these wheels should not be considered valid sources on my aarch64 machine). Instead, poetry install loads the secondary source specified in pyproject.toml, which breaks the lockfile boundary:

poetry install
Installing dependencies from lock file

Package operations: 10 installs, 0 updates, 0 removals

  • Installing numpy (1.22.2)
  • Installing six (1.16.0)
  • Installing absl-py (1.0.0)
  • Installing fastrlock (0.8)
  • Installing flatbuffers (2.0)
  • Installing opt-einsum (3.3.0)
  • Installing scipy (1.6.1)
  • Installing cupy-cuda102 (10.1.0)
  • Installing jax (0.2.20)
  • Installing jaxlib (0.1.71 <valid-url-to-aarch64-wheel-host>)

So the real issue here is that the lockfile does not contain the secondary source, which is necessary to build on this machine - instead, poetry install has to reference the pyproject.toml to identify secondary sources, which defeats the purpose of having a lockfile...
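To illustrate the architecture mismatch noted above, here is a minimal sketch (not Poetry's code) that reads the platform tag out of a wheel filename, following the wheel naming convention `{name}-{version}[-{build}]-{python}-{abi}-{platform}.whl`. The helper names `wheel_platform_tags` and `usable_on` are made up for illustration, and matching on the machine suffix alone is a simplification: it ignores the OS part of the tag and the real tag-compatibility rules.

```python
def wheel_platform_tags(filename: str) -> list[str]:
    """Return the platform tags encoded in a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    stem = filename[: -len(".whl")]
    # The last dash-separated field of the stem is the platform tag.
    platform_field = stem.split("-")[-1]
    # Compressed tag sets are joined with dots.
    return platform_field.split(".")


def usable_on(filename: str, machine: str) -> bool:
    """Simplified check: does any platform tag end with this machine name?"""
    tags = wheel_platform_tags(filename)
    return any(tag == "any" or tag.endswith("_" + machine) for tag in tags)


# Two of the files listed in the lockfile excerpt above.
locked_files = [
    "jaxlib-0.1.71-cp39-none-macosx_10_9_x86_64.whl",
    "jaxlib-0.1.71-cp39-none-manylinux2010_x86_64.whl",
]

for machine in ("x86_64", "aarch64"):
    matches = [f for f in locked_files if usable_on(f, machine)]
    print(machine, matches)
```

Under this simplified check, none of the locked files is a candidate for aarch64, which is the situation described in this report.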

Possibly worth noting: I've also tried to use markers to have the lockfile generation see that both sources are needed to cover all relevant architectures:

jaxlib = [
  { version = "0.1.71", markers = "platform_machine == 'x86_64'" },
  { url = "<valid-url-to-aarch64-wheel-host>", markers = "platform_machine == 'aarch64'" }
]

Unfortunately, this didn't change the results.
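For reference, the intent of those markers can be sketched in plain Python. PEP 508 defines `platform_machine` as the value of `platform.machine()`. The sketch below is a hand-rolled evaluator that only understands the simple `platform_machine == '...'` and `platform_machine != '...'` forms; the names `ALTERNATIVES`, `marker_matches` and `select` are hypothetical and this is not how Poetry evaluates markers.

```python
import platform
import re

# The two alternatives from the pyproject.toml snippet above,
# with the placeholder URL left as a placeholder.
ALTERNATIVES = [
    {"version": "0.1.71", "markers": "platform_machine == 'x86_64'"},
    {"url": "<valid-url-to-aarch64-wheel-host>", "markers": "platform_machine == 'aarch64'"},
]

# Only the simplest marker form is supported in this sketch.
_SIMPLE_MARKER = re.compile(
    r"^\s*platform_machine\s*(==|!=)\s*['\"]([^'\"]+)['\"]\s*$"
)


def marker_matches(marker: str, machine: str) -> bool:
    """Evaluate a simple platform_machine marker against a machine name."""
    m = _SIMPLE_MARKER.match(marker)
    if m is None:
        raise ValueError(f"unsupported marker: {marker!r}")
    op, value = m.groups()
    return (machine == value) if op == "==" else (machine != value)


def select(alternatives, machine=None):
    """Return the alternatives whose marker matches the given machine."""
    machine = machine or platform.machine()
    return [alt for alt in alternatives if marker_matches(alt["markers"], machine)]


print(select(ALTERNATIVES, "aarch64"))
print(select(ALTERNATIVES, "x86_64"))
```

The expectation in this report is that a lockfile would record both alternatives, so that this kind of selection can happen at install time without consulting pyproject.toml.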

Our workaround will be to host an alternative package registry with dependencies provided for both x86_64 and aarch64 (and this way ALL wheels should appear as 'valid' sources in the lockfile), but that seems like a lot of extra infrastructure just to account for something the lockfile could/should contain.

I think the correct solution would involve having the lockfile list alternatives for cases where markers are used to specify multiarch environments. The behavior to avoid would be having poetry look outside the lockfile when one exists.

@merberich merberich added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Feb 16, 2022
@sneakers-the-rat
Contributor

+1, experiencing this same thing trying to package something for x86_64 and armv7l (the raspberry pi). The lockfile references packages from either piwheels or pypi, but not both (as it should, choosing depending on architecture). Have tried to look through the source for a while to try and contribute to this, but still can't quite figure out where this would be decided. If you give me a pointer I'll try to write a patch if wanted

@sneakers-the-rat
Contributor

Relatedly: when I have piwheels set as a source repository, I would expect it to also add entries from pypi as fallbacks, at least the source distributions so wheels could be built.

Instead, given this:

[[tool.poetry.source]]
name = "piwheels"
url = "https://www.piwheels.org/simple"

I just get an installation error for packages when installing on x86_64 because only the armv6l and armv7l wheels are listed in the lockfile, eg.

markupsafe = [
    {file = "MarkupSafe-2.1.0-cp37-cp37m-linux_armv6l.whl", hash = "sha256:39e62bbd6852fe4655201a2d334a23426ad519f80dbe81bd8079fb0cc4fe6a0f"},
    {file = "MarkupSafe-2.1.0-cp37-cp37m-linux_armv7l.whl", hash = "sha256:39e62bbd6852fe4655201a2d334a23426ad519f80dbe81bd8079fb0cc4fe6a0f"},
    {file = "MarkupSafe-2.1.0-cp39-cp39-linux_armv6l.whl", hash = "sha256:47cd11e5cbd1f7beb2c6324d7876707b738ed8499723c89f8eb46806a448cb56"},
    {file = "MarkupSafe-2.1.0-cp39-cp39-linux_armv7l.whl", hash = "sha256:47cd11e5cbd1f7beb2c6324d7876707b738ed8499723c89f8eb46806a448cb56"},
]

@sneakers-the-rat
Contributor

sneakers-the-rat commented Mar 8, 2022

OK I've gotten as far as getting the files and hashes for multiple repositories to be stored in poetry.lock, but still need to figure out how to make the installer add the additional repository urls as --extra-index-url.

So when choosing a version, we arrive at this line, which gets a complete version of the package including its files and hashes:

package = self._provider.complete_package(package)

This is after a single package is chosen despite there being multiple matching packages (eg. from having multiple matches on separate repositories)

package = packages[0]

So I found that if you add the files from the remaining packages like this:

version = self._provider.complete_package(version)
# get the file hashes for the other versions of the file from other repositories
if 'packages' in locals() and len(packages)>1:
    for _package in packages[1:]:
        # _package = self._provider.complete_package(_package)
        _package = self._provider._pool.package(
            _package.name,
            _package.version.text,
            repository=_package.source_reference)

        _package.files = [f for f in _package.files if f not in version.files]
        version._package.files.extend(_package.files)

then I end up with a lockfile like this, where I am correctly keeping wheels from PyPI and from piwheels (note the armv6l and v7l for the raspi)

cryptography = [
    {file = "cryptography-36.0.1-cp36-abi3-manylinux_2_24_x86_64.whl", hash = "sha256:94ae132f0e40fe48f310bba63f477f14a43116f05ddb69d6fa31e93f05848ae2"},
    {file = "cryptography-36.0.1-cp36-abi3-musllinux_1_1_aarch64.whl", hash = "sha256:7be0eec337359c155df191d6ae00a5e8bbb63933883f4f5dffc439dac5348c3f"},
    {file = "cryptography-36.0.1-cp36-abi3-musllinux_1_1_x86_64.whl", hash = "sha256:e0344c14c9cb89e76eb6a060e67980c9e35b3f36691e15e1b7a9e58a0a6c6dc3"},
    {file = "cryptography-36.0.1-cp36-abi3-win32.whl", hash = "sha256:4caa4b893d8fad33cf1964d3e51842cd78ba87401ab1d2e44556826df849a8ca"},
    {file = "cryptography-36.0.1-cp36-abi3-win_amd64.whl", hash = "sha256:391432971a66cfaf94b21c24ab465a4cc3e8bf4a939c1ca5c3e3a6e0abebdbcf"},
    {file = "cryptography-36.0.1-cp37-cp37m-linux_armv6l.whl", hash = "sha256:d36a95532c74adae3c11dd375652d042437417a766b2750835cc8bed2af32fa6"},
    {file = "cryptography-36.0.1-cp37-cp37m-linux_armv7l.whl", hash = "sha256:d36a95532c74adae3c11dd375652d042437417a766b2750835cc8bed2af32fa6"},
    {file = "cryptography-36.0.1-cp39-cp39-linux_armv6l.whl", hash = "sha256:77ddd77d3850a9fee13d3504332d6229b1c501268f5f8518d1ed4ba6fce773ee"},
    {file = "cryptography-36.0.1-cp39-cp39-linux_armv7l.whl", hash = "sha256:77ddd77d3850a9fee13d3504332d6229b1c501268f5f8518d1ed4ba6fce773ee"}
]

I think I'm being too general in my file adding step there, as I probably need to filter by matching version, but since elsewhere all hashes are added to the install stage, then if the secondary repository was to be added as an --extra-index-url then that would solve the problem. ya?
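The merge step described above, including the version filter mentioned as still missing, can be sketched independently of Poetry's internals. This is only an illustration under stated assumptions: `Candidate` and `merge_files` are hypothetical names, the file entries are plain dicts shaped like the lockfile excerpts, and the hashes are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical stand-in for one repository's view of a package."""
    name: str
    version: str
    source: str
    files: list = field(default_factory=list)  # dicts with "file" and "hash"


def merge_files(chosen: Candidate, others: list) -> list:
    """Combine file entries for the same release found on other repositories."""
    merged = list(chosen.files)
    seen = {(f["file"], f["hash"]) for f in merged}
    for other in others:
        # Only merge candidates that describe the same release.
        if (other.name, other.version) != (chosen.name, chosen.version):
            continue
        for f in other.files:
            key = (f["file"], f["hash"])
            if key not in seen:
                seen.add(key)
                merged.append(f)
    return merged


pypi = Candidate("cryptography", "36.0.1", "pypi", [
    {"file": "cryptography-36.0.1-cp36-abi3-manylinux_2_24_x86_64.whl", "hash": "sha256:<placeholder-a>"},
])
piwheels = Candidate("cryptography", "36.0.1", "piwheels", [
    {"file": "cryptography-36.0.1-cp39-cp39-linux_armv7l.whl", "hash": "sha256:<placeholder-b>"},
])
older = Candidate("cryptography", "35.0.0", "piwheels", [
    {"file": "cryptography-35.0.0-cp39-cp39-linux_armv7l.whl", "hash": "sha256:<placeholder-c>"},
])

for entry in merge_files(pypi, [piwheels, older]):
    print(entry["file"])
```

Files from the 35.0.0 candidate are skipped, so only wheels for the chosen release end up in the merged list.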

@merberich
Author

This definitely looks like a promising direction, and would at least solve my problem. As long as all of the available dependencies for different platforms end up in the lockfile, that's the scope for this bug.

Separately, and outside of scope (probably in a separate Issue), it might warrant investigation how we would select the source for plat-B in a setup like:

repo 1:

  • plat-A
  • plat-B

repo 2:

  • plat-B
  • plat-C

So the lockfile would end up with at least:

package = [
  { file = "repo-1-plat-A", hash = "..." },
  { file = "repo-2-plat-C", hash = "..." }
]

But the selection mechanism for which repo is preferred for plat-B would be my question. I think your implementation would just default to the first repository scanned. Is that always desired behavior?
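The "first repository scanned wins" behavior in question can be written out as a small sketch. The function name `assign_platforms` and the dict layout are assumptions for illustration only and say nothing about how Poetry orders its sources.

```python
def assign_platforms(repos: dict) -> dict:
    """Map each platform to the first repository, in scan order, that offers it."""
    chosen = {}
    for repo_name, platforms in repos.items():
        for plat in platforms:
            # setdefault keeps the earlier repository if the platform is already taken.
            chosen.setdefault(plat, repo_name)
    return chosen


# The setup from the example above; dict order is the scan order.
repos = {
    "repo 1": ["plat-A", "plat-B"],
    "repo 2": ["plat-B", "plat-C"],
}

assignment = assign_platforms(repos)
for plat, repo in assignment.items():
    print(plat, "->", repo)
```

With this rule plat-B always comes from repo 1; reversing the scan order would hand it to repo 2, which is why the preference probably needs to be stated explicitly rather than implied by ordering.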

@sneakers-the-rat
Contributor

sneakers-the-rat commented Mar 9, 2022

Very good questions. I think this is wrapped up in a bunch of issues that I'm seeing open, and tried to start a meta-discussion over here: #4137 (comment)

I think basically the same set of parameters that are true for package dependencies might generalize to sources (eg. use repo-1 for Plat-A, or even just a refinement of the existing specifications for source and markers), but I'm still pretty new and am getting the hang of the library -- hence wanting to start a discussion because this seems like it's part of a larger constellation of bugs :)

edit: I also think that the lockfile should save the source of a given file/hash combination, that makes sense to me (otherwise the strategy of the PipInstaller seems to be to add extra --extra-index-urls, where if source was specified then it would be possible to simplify that to pair a specific set of hashes with a single repository URL)

index_url = repository.authenticated_url
args += ["--index-url", index_url]
if (
    self._pool.has_default()
    and repository.name != self._pool.repositories[0].name
):
    args += [
        "--extra-index-url",
        self._pool.repositories[0].authenticated_url,
    ]

@merberich
Author

Oh wow, that linked comment walks through a bunch of other methods I ended up trying XD

Very glad this is getting looked at, thank you!

@sneakers-the-rat
Contributor

sneakers-the-rat commented Mar 10, 2022

to add to examples:

markupsafe = [
    {version="^2.1.0", markers="platform_machine!='x86_64'", source='piwheels'},
    {version="^2.1.0", markers="platform_machine=='x86_64'", source="pypi"}
]

results in

   0: Duplicate dependencies for markupsafe
   0: Merging requirements for markupsafe (>=2.1.0,<3.0.0)

even though they are definitely not duplicates.

edit: on 1.1.13
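A small sketch of why the two entries are not duplicates: they only look identical if marker and source are left out of the comparison. The `Requirement` class here is hypothetical and is not a claim about how Poetry detects duplicates internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """Hypothetical record of one dependency entry from pyproject.toml."""
    name: str
    constraint: str
    marker: str
    source: str


reqs = [
    Requirement("markupsafe", "^2.1.0", "platform_machine!='x86_64'", "piwheels"),
    Requirement("markupsafe", "^2.1.0", "platform_machine=='x86_64'", "pypi"),
]

# Keyed on name and constraint alone, the two entries collapse into one.
by_name_and_constraint = {(r.name, r.constraint) for r in reqs}

# Keyed on every field, they stay distinct.
by_full_identity = set(reqs)

print(len(by_name_and_constraint), len(by_full_identity))
```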

@neersighted
Member

neersighted commented Oct 2, 2022

I've done a quick skim and haven't looked too deeply at the individual issues discussed here, as they are disparate. Many of the attempted uses of Poetry here are misconceptions (e.g. a URL dependency cannot augment an index package, and Poetry will not mix wheels from two different package sources (imagine what would happen to the Torch folks!)).

However, there are two example usages that either should work (as in, the code that enables them is buggy), or (more likely) we want to make work in the future, as they fit the architecture of Poetry:

From @merberich:

jaxlib = [
  { version = "0.1.71", markers = "platform_machine == 'x86_64'" },
  { url = "<valid-url-to-aarch64-wheel-host>", markers = "platform_machine == 'aarch64'" }
]

From @sneakers-the-rat:

markupsafe = [
    {version="^2.1.0", markers="platform_machine!='x86_64'", source='piwheels'},
    {version="^2.1.0", markers="platform_machine=='x86_64'", source="pypi"}
]

Essentially, the use of markers is critical to make it clear to Poetry what you want. The latter would also be enhanced by #5984 (comment) as we could make usage of piwheels much faster by allowing non-pip semantics.

Anyway, those two examples are what I would consider 'correct' given the design of Poetry. However, I have not tried to reproduce them on 1.2, or dug into what might be required to fix them (or even introduce this as a feature -- I'm not 100% sure the solver will support this currently, and it may be a lot of work).

Still, hopefully this helps put those who might be interested in contributing on the right track -- these examples should hopefully be possible one day 😄

@neersighted neersighted changed the title Poetry does not consider wheel architecture when adding to lockfile Multi-architecture wheel selection across disparate sources (with markers) Oct 2, 2022
@neersighted neersighted added kind/enhancement Not a bug or feature, but improves usability or performance status/needs-reproduction Issue needs a minimal reproduction to be confirmed and removed status/triage This issue needs to be triaged status/needs-reproduction Issue needs a minimal reproduction to be confirmed labels Oct 2, 2022
@radoering
Member

The first example (version and url dependency) should already work with poetry 1.2.1. The second example (two version dependencies from different sources) should be fixed by #6679.

@neersighted neersighted removed the kind/bug Something isn't working as expected label Oct 2, 2022
@sneakers-the-rat
Contributor

thanks y'all <3


github-actions bot commented Mar 1, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 1, 2024
6 participants
@neersighted @sneakers-the-rat @merberich @radoering @togetherwithasteria and others