
[Roadmap] Implement PEP 740 #15871

Open
7 of 18 tasks
woodruffw opened this issue Apr 26, 2024 · 24 comments

Comments

@woodruffw
Member

woodruffw commented Apr 26, 2024

PEP 740 is in a final but not yet approved state. This issue is intended to lay out the dependencies/subcomponents of implementing it, including things that can be done in a preliminary manner.

Index side

Uploader/publish side

CC @di @webknjaz @facutuesca for visibility

@webknjaz
Member

I've gone ahead and filed a tracking issue in pip-tools as well: jazzband/pip-tools#2080.

@webknjaz
Member

  • Concretely, this means gh-action-pypi-publish should use sigstore-python to sign the attestation payload defined in PEP 740

With the exciting changes suggested in pypa/gh-action-pypi-publish#230, it looks like we'll be able to leverage bits of composite actions and even call the action itself (if it makes sense to do so), as an alternative to using a Python library within the container like the rest of the tooling.

@webknjaz
Member

  • Concretely, this means gh-action-pypi-publish should use sigstore-python to sign the attestation payload defined in PEP 740

Should this be done by the uploader, though? My initial feeling was that this should be done right after the job building the artifacts. Though, perhaps not by pypa/build (because of the privilege elevation concerns I often complain about). Are we going to treat the artifacts that the uploader gets as coming from trusted sources? Should we be emphasizing the security considerations and teaching the users more about the least privilege principle?

@woodruffw
Member Author

Should this be done by the uploader, though?

Hmm -- I suppose it doesn't have to be, although for targeting publish provenance (and getting provenance generation exposed to the largest number of users) I think it's the most straightforward path.

Long term, PEP 740 is designed to enable multiple attestations: the building step(s) can generate one, as can the publishing step. But I believe getting that into a workable state will require a decent bit more coordination, so I want to target an MVP of "the artifact is signed by the same identity that pushed it to the index" 😅
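For reference, a rough sketch of the per-file attestation object shape from the current PEP 740 draft (paraphrased from my reading of the PEP text; field names may shift before approval):

```python
# Rough, paraphrased shape of one PEP 740 attestation object (per the current
# draft); values are placeholders and field names may change before approval.
attestation = {
    "version": 1,
    "verification_material": {
        # base64-encoded signing certificate (e.g. a Fulcio-issued certificate)
        "certificate": "<base64 certificate>",
        # one or more transparency log entries (e.g. Rekor inclusion proofs)
        "transparency_entries": ["<log entry>"],
    },
    "envelope": {
        # base64-encoded in-toto Statement over the distribution file
        "statement": "<base64 statement>",
        # base64-encoded signature over that statement
        "signature": "<base64 signature>",
    },
}
```

Each distribution file can carry several of these objects, which is what makes the "build attestation + publish attestation" combination possible later on.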

@webknjaz
Member

@woodruffw fair! I just wanted to make sure we're on the same page.

@sethmlarson
Contributor

@webknjaz Glad you're thinking about build provenance already, I'm interested in tackling build provenance for Python too 🚀

These initial attestations are similar to NPM's publish provenance, so they are best generated by the uploader. If the attestation can show "this artifact was uploaded by this workflow, which matches a Trusted Publisher configuration for the project", we can already build some interesting things like verified GitHub URLs, exposing to users which artifacts have Trusted Publishers configured, monitoring for changes in provenance, etc.

@webknjaz
Member

@sethmlarson fair enough! I remember similar ideas being brought up during the initial private beta of trusted publishing. It's nice to have some more or less specified goals recorded and synchronized, though, which is why I posted my thoughts :)

Looking back, I'm glad that I encouraged implementing it in my action from the very beginning, months before it went GA, so everybody could start using it right away with almost no changes. So I'm hoping that the Sigstore integration happens in a similar fashion, enabling any early adopters to start uploading signatures with zero changes to their workflows (if they already use trusted publishing, they already have the OIDC access set up).

@di
Member

di commented Apr 29, 2024

Should this be done by the uploader, though? My initial feeling was that this should be done right after the job building the artifacts.

I think your initial feeling is valid: since it's not guaranteed that the workflow using the publish action is also the workflow that built the artifacts, this isn't necessarily "build provenance".

That said, having an attestation ('publish provenance') from the uploader in the absence of any other attestation being provided to it would still be very valuable here I think, especially for projects that aren't using Trusted Publishing yet.

@facutuesca
Contributor

@woodruffw What behavior do we want when uploading attestations to a file already in PyPI? Currently, the behavior for duplicate files is either stop the upload process and return OK (if the new file is the same as the existing one), or fail (if the files don't match).
How do we want to deal with those cases when the request also includes attestations?

@woodruffw
Member Author

What behavior do we want when uploading attestations to a file already in PyPI? Currently, the behavior for duplicate files is either stop the upload process and return OK (if the new file is the same as the existing one), or fail (if the files don't match).

For the MVP, I think we want to skip if the distribution file already exists, for consistency with the existing upload API behavior.

Longer term, this probably falls under the "Upload 2.0 API" PEP -- we'll want a way to augment existing uploaded distributions with new attestations. But that's out of scope for now, IMO 🙂
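A simplified, hypothetical sketch of that MVP branch (the in-memory store and helper names are illustrative, not Warehouse's actual code):

```python
# Hypothetical, simplified sketch of the MVP duplicate-handling rule above.
# The in-memory dict stands in for Warehouse's real storage.

class FileAlreadyExists(Exception):
    pass

index: dict[str, dict] = {}  # filename -> {"digest": ..., "attestations": [...]}

def handle_upload(filename: str, digest: str, attestations: list) -> str:
    existing = index.get(filename)
    if existing is not None:
        if existing["digest"] == digest:
            # Same file re-uploaded: skip it (and any attached attestations),
            # returning OK for consistency with the existing upload API.
            return "ok (skipped duplicate)"
        # Mismatched duplicate: fail, exactly as uploads do today.
        raise FileAlreadyExists(filename)
    # New file: persist it along with any PEP 740 attestations it carries.
    index[filename] = {"digest": digest, "attestations": attestations}
    return "ok"
```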

@webknjaz
Member

FYI: there's some new developments within GitHub, it seems — pypa/gh-action-pypi-publish#236 (comment).

@webknjaz
Member

* [ ]  Update PyPUG and adjacent documentation to remove references to `gh-action-sigstore-python`

I think this one is somewhat separate — it signs built artifacts (in that snippet), while the pypi-publish integration signs whatever's uploaded (whether it was built within the workflow or downloaded).

@woodruffw
Member Author

Something that @haydentherapper raised: Trusted Publishing supports private repos.

This isn't a "disclosure" issue at the moment since we don't expose the Trusted Publisher metadata anywhere, but with PEP 740 we will.

Some points:

  1. No matter what, we will continue supporting Trusted Publishing from private repos.
  2. Conceptually, there's nothing stopping someone from generating a private attestation, although its semantics are unclear (since a publish or build attestation from a private repo can't really be analyzed by third parties, beyond a basic "GitHub promises this is right" check).
  3. There's a (fair) PII concern about leaking a private repository name.

Some engineering details:

  • Both GitLab and GitHub provide a visibility type claim, which either PyPI or the publishing workflow could filter on in terms of including attestations. It should probably be PyPI though, since PyPI is already handling the JWT claims.
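A rough sketch of what that server-side filter could look like. The claim names are assumptions on my part (`repository_visibility` for GitHub, `project_visibility` for GitLab) and should be double-checked against each provider's OIDC token docs:

```python
# Illustrative server-side filter on the visibility claim. Claim names are
# assumptions ("repository_visibility" for GitHub, "project_visibility" for
# GitLab) and should be verified against each provider's token documentation.

VISIBILITY_CLAIMS = ("repository_visibility", "project_visibility")

def attestations_allowed(verified_claims: dict) -> bool:
    """Decide whether to accept attestations for this upload, given the OIDC
    claims PyPI has already verified for the Trusted Publisher."""
    for claim in VISIBILITY_CLAIMS:
        visibility = verified_claims.get(claim)
        if visibility is not None:
            return visibility == "public"
    # No visibility claim present: be conservative and reject attestations.
    return False
```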

@woodruffw
Member Author

Per NPM's docs, it looks like they require a public repo for provenance:

Ensure your package.json is configured with a public repository that matches (case-sensitive) where you are publishing with provenance from.

This doesn't quite mesh with PyPI's model since each distribution file has its own metadata, but in theory we could check them all for consistency. But that is potentially more brittle than just rejecting attestations if the accompanying OIDC JWT doesn't have the right visibility.

@woodruffw
Member Author

From discussion in the Sigstore Slack: one route forward here is to reject attestations from private repositories, unless the project is explicitly configured (either on PyPI or in the project metadata) to allow them. This eliminates the risk of a user inadvertently leaking a private repository name (or similar) through their Trusted Publisher configuration and subsequent publish attestations.

If configured on PyPI, this could be something like a new checkbox under /manage/<project>/settings, e.g. "allow attestations from private Trusted Publishers."

If configured via project metadata, things get a little hairier (since the metadata is duplicated once per distribution, and there's no free-form metadata at the moment). So doing it via PyPI is probably easier.

@woodruffw
Member Author

@facutuesca and I had a conversation about the persistence side of this the other day; summarizing:

  • For the time being, we should treat publish provenance as a baseline: in other words, PyPI can accept SLSA Provenance as well if provided, but every attestation set should come with a Publish Provenance attestation at a minimum (since they're all coming from Trusted Publishing)
  • With Publish Provenance as a baseline, we can add a File <-> OIDCPublisher relationship that tracks the Trusted Publisher that was verified to have published the given File
  • That relationship then allows us to bootstrap verified URLs, etc.

Some related thoughts:

  • Having a File <-> OIDCPublisher relationship means that OIDCPublishers are never deleted once one or more files are published against them, even if no Project has that OIDCPublisher registered as an active publisher anymore. This may or may not be an issue; just highlighting as a "lifetime" change.
  • We need to determine the appropriate "policy" for rendering things like provenance markers on the project-level page (i.e. /p/<project>): in addition to verified URLs, we might want a little checkmark similar to NPM's that says the project is provenanced. For that, it probably makes sense to require all files in the latest release to have publish attestations, even if the data layer underneath doesn't prohibit "heterogeneous" uploads.
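A minimal sketch of the File <-> OIDCPublisher link described above, as hypothetical SQLAlchemy models (table and column names are illustrative, not Warehouse's actual schema):

```python
# Hypothetical SQLAlchemy sketch of the File <-> OIDCPublisher relationship;
# table/column names are illustrative, not Warehouse's actual schema.
from sqlalchemy import Column, ForeignKey, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class OIDCPublisher(Base):
    __tablename__ = "oidc_publishers"
    id = Column(String, primary_key=True)
    # Files verified as published by this publisher; this back-reference is what
    # keeps the publisher row alive even after every project has removed it.
    published_files = relationship("File", back_populates="publisher")

class File(Base):
    __tablename__ = "release_files"
    id = Column(String, primary_key=True)
    publisher_id = Column(String, ForeignKey("oidc_publishers.id"), nullable=True)
    publisher = relationship("OIDCPublisher", back_populates="published_files")
```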

@facutuesca
Contributor

facutuesca commented Jun 20, 2024

Having a File <-> OIDCPublisher relationship means that OIDCPublishers are never deleted once one or more files are published against them, even if no Project has that OIDCPublisher registered as an active publisher anymore

This might add some complexity, since right now there is no way of modelling an "inactive" publisher: if a publisher exists in the DB, then it's active. Do we want to add a new column to the table, an is_active field that becomes False if the user deletes the corresponding Trusted Publisher? (more specifically, it becomes False iff the user deletes it and there exists at least one file that was uploaded with a publish attestation signed by that publisher)

@woodruffw
Member Author

This might be problematic, since right now there is no way of modelling an "inactive" publisher: if a publisher exists in the DB, then it's active.

Hmm, I hadn't thought of this. A new is_active column makes sense to me, or alternatively we could define an is_active helper that checks whether len(projects) != 0. I think the latter would work approximately as well, without requiring a DB migration, although I might have missed something 🙂
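For illustration, the helper-based option could be as small as a property on the hypothetical publisher model sketched above, assuming the existing `projects` relationship:

```python
# Illustrative helper-based option: no migration, just a property on the
# hypothetical OIDCPublisher model sketched earlier (assumes the existing
# `projects` relationship on the publisher).
class OIDCPublisher(Base):
    # ... existing columns and the `projects` relationship ...

    @property
    def is_active(self) -> bool:
        # Active iff at least one project still has this publisher registered.
        return len(self.projects) != 0
```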

@sethmlarson
Contributor

With Publish Provenance as a baseline, we can add a File <-> OIDCPublisher relationship that tracks the Trusted Publisher that was verified to have published the given File

I was imagining a separate model for a "verified attestation" that's separate from the publisher, due to the publisher potentially changing over time.

@woodruffw
Member Author

I was imagining a separate model for a "verified attestation" that's separate from the publisher, due to the publisher potentially changing over time.

I think there are two separate pieces of state here:

  1. The OIDCPublisher that uploaded the File, which we'd track with the relationship noted above
  2. Zero or more attestations associated with the File, which we'll need to track with a new model (AttestationFile? Attestation?) and will also need a File relationship

(2) may also need a relationship to the "thing" that can verify it, which for now will always be an OIDCPublisher (but in the future could be an email address, or public key, or something else).

There might be a better way to model all of that, of course. But we'll definitely need both the verified attestation and the thing that verified it, i.e. the OIDCPublisher 🙂
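Sketching (2) as a hypothetical model, extending the illustrative schema from earlier in the thread (again, names are not Warehouse's actual code):

```python
# Hypothetical Attestation model for (2), extending the earlier illustrative
# schema; names are not Warehouse's actual code.
from sqlalchemy import Column, ForeignKey, String
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import relationship

class Attestation(Base):
    __tablename__ = "attestations"
    id = Column(String, primary_key=True)

    # (2) zero or more attestations per File.
    file_id = Column(String, ForeignKey("release_files.id"), nullable=False)
    file = relationship("File", backref="attestations")

    # The "thing" that can verify this attestation: an OIDCPublisher today,
    # potentially an email address or public key in the future.
    verified_by_id = Column(String, ForeignKey("oidc_publishers.id"), nullable=True)
    verified_by = relationship("OIDCPublisher")

    # The raw PEP 740 attestation object as uploaded.
    body = Column(JSONB, nullable=False)
```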

@facutuesca
Contributor

After further discussion with @woodruffw, we think that it might be better to verify a release's URLs once (at the time the release is created, during the first file's upload), and store the fact that the URL was verified in the release metadata. This implies:

  1. When creating a release, if the first uploaded file includes a publish attestation that passes verification, and the URL of the Trusted Publisher matches the release metadata, we mark those URLs as "verified" in the release metadata
  2. Further files uploaded to that same release won't affect the release metadata or the verified status.
  3. If the first file uploaded does not include a publish attestation (or the URL does not match the release metadata), then the release metadata URLs are marked as unverified. Future files uploaded to this same release will not affect the verified status.

In short, whether a release's metadata URLs are verified or not depends exclusively on the first file uploaded (and its corresponding attestation).

The decision comes from the fact that PEP-740 allows uploading releases where only some of the distributions have attestations, so whether a release has "verified" URLs can be ambiguous if some of the files have publish attestations and others don't. Instead, we choose to use the first file uploaded (the one that creates the release) as the source of truth for verifying the release's URLs.

Also, by verifying the URL once and storing the result, we skip having the File <-> OIDCPublisher relationship, and the complexities that come with having to keep "inactive" OIDCPublishers forever.
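A rough sketch of the "verify once, at release creation" rule (all names hypothetical; this is not Warehouse's actual code):

```python
# Illustrative sketch of the "verify once, at release creation" rule above.
# All names are hypothetical; this is not Warehouse's actual code.

def mark_verified_urls(release, publish_attestation, publisher) -> None:
    """Run only for the first (release-creating) file upload of a release."""
    if publish_attestation is None or publisher is None:
        # No publish attestation on the release-creating upload: URLs stay
        # unverified, and later files won't change that.
        release.verified_urls = []
        return
    publisher_url = publisher.publisher_url  # e.g. the source repository URL
    release.verified_urls = [
        url
        for url in release.metadata_urls
        if url_matches(url, publisher_url)
    ]

def url_matches(candidate: str, publisher_url: str) -> bool:
    # Simplistic prefix check for the sketch; a real check would normalize more.
    return candidate.rstrip("/").startswith(publisher_url.rstrip("/"))
```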

@sethmlarson
Contributor

sethmlarson commented Jun 20, 2024

@facutuesca @woodruffw Is there a risk in checking and updating the verified status of URLs for each artifact? It was noted somewhere that many projects use multiple build infrastructures to upload artifacts, so it would be unfortunate to lose the "verified" status of metadata based on a race between build infrastructures.

@woodruffw
Member Author

Is there a risk in checking and updating the verified status of URLs for each artifact?

I don't think there's a risk per se, but it's unfortunately not straightforward to do with Warehouse's current File <-> Release relationship: the Release contains the "reference" metadata for rendering purposes (including URLs), and it's only created once (from the first File uploaded for the version). So, in effect, there's really only one set of URLs that PyPI "knows" for the entire Release, regardless of differences between individual Files.

TL;DR: to check each artifact, I think we'd need a stronger consistency guarantee about the metadata recorded in the Release model vs. each linked File, and (IIRC) that was excluded as a constraint from PEP 740 due to concerns that it would hamper adoption.

(The alternative here would be to consider the Release model's URLs mutable/subject to changes in verification state, e.g. consider them unverified at first but then mark them as verified if any File under Release can be verified. The main "risk" there is that it'll be complicated to engineer, and might result in a "chimera" render on PyPI, where some metadata comes from file A and some from file B.)

@di
Member

di commented Jul 3, 2024

Hmm, I hadn't thought of this. A new is_active column makes sense to me, or alternatively we could define an is_active helper that checks whether len(projects) != 0. I think the latter would work approximately as well, without requiring a DB migration, although I might have missed something

I don't really love this, as it requires us to make sure we check is_active everywhere we want to use a publisher.

I think I'd prefer storing details on the event instead, which would allow us to persist the details even if the publisher is deleted.

(The alternative here would be to consider the Release model's URLs mutable/subject to changes in verification state

I think we should probably do that eventually, but it's OK to maintain the status quo for now and only consider the first file published.
