Package Pants as ~scie. #18145

Closed
jsirois opened this issue Feb 1, 2023 · 6 comments

@jsirois
Contributor

jsirois commented Feb 1, 2023

Currently, and for most of its life, Pants has been distributed as one or more Python projects published on PyPI. Pants v2 has settled on the pantsbuild.pants project to deliver the pants console script and the pantsbuild.pants.testutils test infrastructure project for writing plugins whose code lives outside the https://github.com/pantsbuild/pants repo. The ./pants script has been checked into Pants-using repos as the primary means of installing and running the right version of Pants in the right way: it installs pantsbuild.pants in a venv and forwards to its pants console script. Recently, migration from the ./pants script to the scie-pants binary has begun.

The scie-pants binary works with the pantsbuild.pants PyPI project to maintain the status quo for installing and executing current and older Pants versions, but it also provides an avenue to alter:

  1. Pants' primary distribution channel (PyPI).
  2. Pants' secondary / utility distribution channel via our self-paid-for S3 bucket at binaries.pantsbuild.org.
  3. Pants' assembly as a primary Python distribution with an embedded Rust C-extension + a transitive dependency set.

It's desirable to alter 1, our primary PyPI distribution, to reduce the storage and transfer costs we impose on PyPI. We're currently among the top 50 users of storage space (31st, using 36.5GB as of today; see https://pypi.org/stats/). It's desirable to alter 2, our self-paid-for S3 bucket, for similar reasons, but on behalf of the Pantsbuild non-profit organization itself.

If we alter 3, Pants assembly, to a single-file or few-file format, we can cut down on flaky transfers (one or a few connections tend to be more reliable than many) and gain easier insight into what makes up a Pants release. The natural alternative here is distribution as a PEX. Pants already releases a PEX via GitHub Releases (https://github.com/pantsbuild/pants/releases), but it seems to be a little-used facility. This is probably due partly to the current PEX being very large (it's multi-platform), partly to the https://pantsbuild.org docs not recommending it front and center, and partly to it being non-optimal: it adds latency over a pure venv.

The scie-pants launcher could be altered to handle Pants released as a PEX on GitHub Releases, since it implements complex install logic efficiently and in isolation from the hot path of a Pants run. One way of doing this would be to release a Pants.pex per supported platform (today: {Linux, Mac} x {x86_64, aarch64}) along with a sha256 checksum of each of these PEXes. This would obviate the need for both PyPI (1) releases and binaries.pantsbuild.org (2) releases and, currently, cost the Pantsbuild organization nothing, since GitHub Releases are currently free without limits for OSS projects. To be clear, there is no magic here: we'd be getting away with murder and costing the world just as much in storage and transfer costs; we'd just be letting Microsoft foot the bill until we're forced to reconsider.
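As an illustration of the per-platform scheme, here is a minimal Python sketch of the client side: pick the PEX for the current platform and check it against its published sha256. The asset naming pattern `pants-<version>-<os>-<arch>.pex` and the function names are hypothetical (the issue does not specify any), and this is not scie-pants' actual implementation.

```python
import hashlib
import platform
from typing import Optional

# Hypothetical mapping onto the release matrix named above:
# {Linux, Mac} x {x86_64, aarch64}.
_OS_NAMES = {"Linux": "linux", "Darwin": "macos"}
_ARCH_NAMES = {"x86_64": "x86_64", "amd64": "x86_64", "arm64": "aarch64", "aarch64": "aarch64"}


def pex_asset_name(version: str, system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return the (hypothetical) per-platform PEX asset name for a release."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if system not in _OS_NAMES or machine not in _ARCH_NAMES:
        raise ValueError(f"unsupported platform: {system}/{machine}")
    return f"pants-{version}-{_OS_NAMES[system]}-{_ARCH_NAMES[machine]}.pex"


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Check downloaded bytes against the published sha256 checksum."""
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()
```

A launcher would download the asset named by `pex_asset_name(version)` plus its checksum file and refuse to install when `sha256_matches` returns False. Unsupported platforms fail loudly rather than falling back to a multi-platform PEX.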

If we transition to only releasing Pants as a PEX that is installed by scie-pants we can eliminate the pantsbuild.pants wheels and just directly add the Pants code to the PEX via -D src/python. This eliminates some BUILD jank partially tracked in #7344. It also allows us to eventually ship the Pants native client as a separate binary with sha256 checksums via GitHub Releases and use it as the hot-path executable scie-pants invokes.

@jsirois
Contributor Author

jsirois commented Feb 1, 2023

This issue is framed pretty prescriptively, so extra care should be taken to question its assumptions. The main points are to simplify and robustify the distribution and consumption of Pants while shifting financial and operational burden away from OSS entities and towards OSS benefactors like Microsoft.

@jsirois
Contributor Author

jsirois commented Feb 1, 2023

The idea proposed here is releasing Pants as a PEX on GitHub Releases and, later, releasing the Pants client binary as well. These releases could be consumed in 2 obvious ways:

  1. The scie-pants binary grows new install logic to handle these releases. It downloads and checksums the PEX / native client and creates a venv from the PEX for hot path execution, etc.
  2. The Pants project publishes a scie containing the install and delegation logic that would otherwise have to live in scie-pants.
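For approach 1, the install logic could be kept off the hot path by keying each install on the PEX's sha256, so a warm run is a single directory-existence check. The following is a minimal Python sketch under that assumption; `ensure_installed` and the `install` hook (which would, for example, create a venv from the PEX) are hypothetical names, not scie-pants APIs.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def ensure_installed(
    pex_path: Path,
    expected_sha256: str,
    cache_root: Path,
    install: Callable[[Path, Path], None],  # hypothetical hook, e.g. build a venv from the PEX
) -> Path:
    """Return the install dir for a verified PEX, installing it only on first use."""
    target = cache_root / expected_sha256.lower()
    if target.is_dir():
        # Hot path: an earlier run already verified and installed this PEX.
        return target
    digest = hashlib.sha256(pex_path.read_bytes()).hexdigest()
    if digest != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {pex_path}: got {digest}")
    cache_root.mkdir(parents=True, exist_ok=True)
    # Install into a private temp dir on the same filesystem, then rename it
    # into place so no run ever observes a half-built install.
    tmp = Path(tempfile.mkdtemp(dir=cache_root, prefix=".tmp-"))
    try:
        install(pex_path, tmp)
        os.rename(tmp, target)
    except OSError:
        # A concurrent run may have won the race; accept its install.
        if not target.is_dir():
            raise
    finally:
        if tmp.exists():
            shutil.rmtree(tmp)
    return target
```

The atomic rename is the design choice that matters here: several Pants invocations can race on a cold cache, and each either publishes a complete install or adopts the one that won.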

The second approach is appealing for separation of concerns. The scie-pants binary would only need to know a cut-over version for PyPI wheel -> GitHub Releases scie. Pants could then evolve its install and execution logic separately, while scie-pants just handles invoking the right Pants version and ~nothing more. It stays simple and very robust while allowing Pants to add complexity and nuance on a per-release basis without worrying about cross-project breakage. Towards this end, Pants need not even release a scie; it could just release a lift manifest + PEX (+ eventually native client). The lift manifest would point to the PEX (and eventually native client), locking in URL / size / checksum as well as the install and execution logic for that release. The scie-pants binary could then just download the lift manifest, verify its checksum, and use it to locally assemble a Pants scie to delegate to in the hot path. Here a-scie/jump#10 would be useful, allowing scie-pants to not even build a scie at all but just "execute" the Pants lift manifest.
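The "locking in URL / size / checksum" idea can be sketched independently of the real format. The manifest below is a simplified, hypothetical JSON stand-in (the actual scie-jump lift manifest has its own schema), and `PinnedFile`, `load_pinned_files` and `check_pin` are illustrative names only.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PinnedFile:
    """One file pinned by a simplified, hypothetical release manifest."""
    name: str
    url: str
    size: int
    sha256: str


def load_pinned_files(manifest_json: str) -> List[PinnedFile]:
    """Parse the file pins out of the simplified manifest."""
    doc = json.loads(manifest_json)
    return [PinnedFile(**entry) for entry in doc["files"]]


def check_pin(pin: PinnedFile, data: bytes) -> None:
    """Raise ValueError unless the fetched bytes match the pinned size and hash."""
    # Size is checked first: it is cheap and catches truncated downloads.
    if len(data) != pin.size:
        raise ValueError(f"{pin.name}: expected {pin.size} bytes, got {len(data)}")
    digest = hashlib.sha256(data).hexdigest()
    if digest != pin.sha256:
        raise ValueError(f"{pin.name}: sha256 mismatch ({digest})")
```

With such pins in hand, the launcher only has to trust the manifest's own checksum; every file it references (for example `https://example.invalid/pants.pex`, a placeholder URL) is then verified transitively.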

@stuhood
Sponsor Member

stuhood commented Feb 6, 2023

Whether shipping a PEX or a scie to GitHub, this would also involve pants_requirements growing the ability to load itself from Pants' own runtime environment... or perhaps scie-pants could execute a pre-extraction step that continues to use find-links, but from within an nce directory.

@kaos
Member

kaos commented Sep 20, 2023

@thejcannon has done a ton of work in this area. Maybe worth summarizing what is remaining to do here.

@thejcannon
Member

Not sure about the remaining work, but when thinking about what we'd gain, I think the big victory has now diminished to:

  • scie-pants has to know which version of Python to use for each version of Pants. If/when we bump to Python 3.11, scie-pants has to know. If Pants were a scie, it would already pin its own Python.

There are likely other secondary benefits, but I think that's the one big one (and even that is less beneficial, since actually supporting this in scie-pants isn't really that challenging).

@jsirois
Contributor Author

jsirois commented Sep 21, 2023

I'll just close this as obsolete. I don't think it was used as guidance for the migration and the migration is nearly complete.

@jsirois jsirois closed this as completed Sep 21, 2023