Provenance v1.0: initial draft #525

Merged
merged 47 commits into from
Jan 20, 2023
Changes from 38 commits
Commits
b20437c
WIP
MarkLodato Oct 20, 2022
2a3eb61
WIP: finished proto for v1.0
MarkLodato Oct 21, 2022
ad7e823
WIP: inputArtifacts, parameters
MarkLodato Oct 21, 2022
cdecf52
WIP
MarkLodato Oct 25, 2022
2ef3659
WIP: apply feedback
MarkLodato Oct 27, 2022
e7dd1e0
WIP: replace proto extension with Markdown link
MarkLodato Oct 27, 2022
c3089de
Update example to use latest version of proto
MarkLodato Oct 28, 2022
25c78c3
artifacts: go back to map, uri
MarkLodato Oct 28, 2022
1015166
WIP: topLevelInputs and buildDependencies
MarkLodato Oct 31, 2022
861d844
WIP: make examples more realistic
MarkLodato Oct 31, 2022
f499fad
WIP: add cue file
MarkLodato Oct 31, 2022
cec3785
WIP: add Tekton example and TODO
MarkLodato Oct 31, 2022
82a3c28
WIP: merge everything into markdown file
MarkLodato Nov 1, 2022
df6bec9
WIP: remove extra divs
MarkLodato Nov 1, 2022
668f41a
WIP: rewrite intro
MarkLodato Nov 1, 2022
5a96d3a
WIP: rename Artifact to ArtifactReference
MarkLodato Nov 1, 2022
c397e76
Use headings in change history
MarkLodato Nov 1, 2022
f1adaf3
Make draft URL work
MarkLodato Nov 1, 2022
f5a4b06
fix lint errors
MarkLodato Nov 1, 2022
b661ee1
Address PR feedback
MarkLodato Nov 4, 2022
5aab59b
Add builderDependencies
MarkLodato Nov 7, 2022
78ae06f
WIP: lowercase purl, move TODO
MarkLodato Nov 9, 2022
11459e6
WIP
MarkLodato Nov 11, 2022
5d1e791
WIP: refactor - external vs system parameters
MarkLodato Nov 14, 2022
32f9d2f
Merge branch 'main' into provenance-refactor
MarkLodato Nov 15, 2022
d044c3d
WIP: fix typo in URL
MarkLodato Nov 15, 2022
3ae85a9
WIP: add todo
MarkLodato Nov 15, 2022
e60f742
Use the generic SLSA generator for the example.
MarkLodato Nov 15, 2022
479b7b3
Make builder.version a map
MarkLodato Nov 30, 2022
9ca7346
Replace .artifacts[name] with [name].artifact.
MarkLodato Nov 30, 2022
d63f6ba
Replace map with array of name/value pairs
MarkLodato Nov 30, 2022
03a0660
Revert "Replace map with array of name/value pairs"
MarkLodato Dec 8, 2022
0ca0d69
Update provenance build model
MarkLodato Jan 4, 2023
8b32358
Merge branch 'main' into provenance-refactor
MarkLodato Jan 4, 2023
622c0b5
Disable lint for blank lines betwen blockqutoes
MarkLodato Jan 4, 2023
77d5814
proto nits: consistent required/optional syntax
MarkLodato Jan 4, 2023
a18326b
More iteration on model
MarkLodato Jan 5, 2023
3b728b4
Move github actions to separate file; revise text
MarkLodato Jan 6, 2023
a4494fb
Only include major version in provenance URL.
MarkLodato Jan 6, 2023
31094f7
WIP: move to Markdown
MarkLodato Jan 18, 2023
472ba94
Finish Markdown conversion, add other param types
MarkLodato Jan 18, 2023
fdce758
address comments
MarkLodato Jan 18, 2023
268a64d
Merge branch 'main' into provenance-refactor
MarkLodato Jan 18, 2023
aba878e
fix mdlint
MarkLodato Jan 18, 2023
1987abf
add TODO about creating other build types
MarkLodato Jan 19, 2023
40aeb77
Fix typos in provenance v1.0
MarkLodato Jan 20, 2023
4210074
drop .md from link
MarkLodato Jan 20, 2023
3 changes: 3 additions & 0 deletions .markdownlint.yaml
@@ -14,6 +14,9 @@ MD025:
  # Disable checking of YAML frontmatter.
  front_matter_title: ""

# MD028/no-blanks-blockquote - Blank line inside blockquote
MD028: false

# MD029/ol-prefix - Ordered list item prefix
MD029:
  # List style
10 changes: 10 additions & 0 deletions docs/_data/versions.yml
@@ -29,6 +29,9 @@ provenance:
      name: Version 0.1
    v0.2:
      name: Version 0.2
    v1.0:
      name: Version 1.0 (DRAFT)
      draft: true
  current: v0.2

verification_summary:
@@ -38,3 +41,10 @@ verification_summary:
    v0.2:
      name: Version 0.2
  current: v0.2

github-actions-workflow:
  versions:
    v0.1:
      name: Version 0.1 (DRAFT)
      draft: true
  current: v0.1
44 changes: 44 additions & 0 deletions docs/github-actions-workflow/v0.1/example.json
@@ -0,0 +1,44 @@
{
  "predicateType": "https://slsa.dev/provenance/v1?draft",
  "predicate": {
    "buildDefinition": {
      "buildType": "https://slsa.dev/github-actions-workflow/v0.1?draft",
      "externalParameters": {
        "inputs_build_id": { "value": "123456768" },
        "inputs_deploy_target": { "value": "deployment_sys_1a" },
        "inputs_perform_deploy": { "value": "true" },
        "source": {
          "artifact": {
            "uri": "git+https://github.com/octocat/hello-world@refs/heads/main",
            "digest": { "sha1": "c27d339ee6075c1f744c5d4b200f7901aad2c369" }
          }
        },
        "workflow_path": { "value": ".github/workflow/release.yml" }

Contributor

@laurentsimon laurentsimon Jan 9, 2023

for the GitHub runner that runs a GitHub workflow, these inputs will be the workflow_dispatch inputs since it's the defined interface for a user to provide inputs.

For a re-usable workflow, the inputs will be the inputs passed to the re-usable workflow interface (e.g., https://github.com/slsa-framework/slsa-github-generator/blob/main/.github/workflows/generator_generic_slsa3.yml#L31), correct?

Do you want @asraa and I to try to come up with a format for a re-usable workflow builder? Those are used by orgs as well, e.g., https://github.com/sigstore/sigstore/blob/main/.github/workflows/reusable-release.yml#L5-L30

Member Author

Correct.

I was wondering about reusable workflows. Can/should this format be expanded to work for both, or is the model sufficiently different that we should have a separate format? (I thought about it for a while but couldn't make up my mind.)

Contributor

@laurentsimon laurentsimon Jan 11, 2023

the workflow_path is the field that is a bit awkward: the GitHub runner runs the workflow so it is an input, but a re-usable workflow runs some code based on the input provided by the caller, and the workflow_path is not an input. It's fine to keep it, but it has a slightly different meaning.

Commenter

If we generalize the ParameterValue as proposed here https://github.com/slsa-framework/slsa/pull/525/files#r1066962916

Would it make sense to have two "namespaces", invocation (where the workflow_path would reside) and an input where the inputs to the reusable workflow could be in. Or does this just complicate things too much?

Contributor

The invocation could help, yes. But I thought runDetails was the new 'invocation'?

Commenter

Looking at the migration plan, it's mixed: some data from the v0.2 invocation goes into externalParameters and some into systemParameters. Only the invocationId goes into runDetails.
Assuming the migration plan is updated :)

Contributor

some data from v0.2 invocation goes into externalParameters and some into systemParameters. Only the invocationId goes into runDetails.

When I was thinking of the reusable workflow case, the workflow inputs seemed to make the most sense in externalParameters (since they're top-level controlled). If, say, the reusable workflow does something internally that is derived from those inputs and is run-dependent (transforming the inputs, passing them to an action), then to me those details belong in runDetails and not systemParameters, since the latter carries trusted, builder-provided info. Since no field is defined for them in runDetails, either (1) a field is missing or (2) they are not important. I lean toward (2) right now, treating them as "implicit" in the buildType: the user can recreate them from the top-level inputs and the builder.

Member Author

the workflow_path is the field that is a bit awkward: the GitHub runner runs the workflow so it is an input, but a re-usable workflow runs some code based on the input provided by the caller, and the workflow_path is not an input. It's fine to keep it, but it has a slightly different meaning.

Really source + workflow_path combined specifies which workflow is being run, while inputs specifies the extra parameters passed in (plus vars - see other thread). I think that's the same for both top-level and reusable workflows. The difference is that a reusable workflow also gets the repo + path of the top-level workflow that triggered the reusable workflow. Given that it has two repo + path pairs, I'm not sure what to name them such that it makes sense for both cases.

In other words, top-level workflow has:

  • source + workflow_path specifies which workflow was run
  • inputs specifies the workflow_dispatch inputs
  • vars specifies the variables

While reusable workflows have:

  • One repo + path for the reusable workflow (i.e. the thing being executed)
  • Another repo + path for the top-level workflow (i.e. the thing that triggered the reusable workflow; this gets passed in via the context so it's a parameter.)
  • inputs of the reusable workflow + top-level workflow (mashed together!)
  • vars (same as top-level)

It's awkward.

Member Author

Is this thread OK to resolve, or is there a specific request here (hard to follow thread)?

Contributor

I think it's fine, yes

      },
      "systemParameters": {
        "github_actor": { "value": "MarkLodato" },
        "github_event_name": { "value": "workflow_dispatch" }
      },
      "resolvedDependencies": [
        {
          "uri": "https://github.com/actions/virtual-environments/releases/tag/ubuntu20/20220515.1"
        }
      ]
    },
    "runDetails": {
      "builder": {
        "id": "https://github.com/slsa-framework/slsa-github-generator/.github/workflows/builder_go_slsa3.yml@refs/tags/v0.0.1"
      },
      "metadata": {
        "invocationId": "https://github.com/octocat/hello-world/actions/runs/1536140711/attempts/1",

Contributor

Love having the full URL in here!

        "startedOn": "2023-01-01T12:34:56Z"
      }
    }
  },
  "subject": [
    {
      "name": "_",
      "digest": { "sha256": "fe4fe40ac7250263c5dbe1cf3138912f3f416140aa248637a60d65fe22c47da4" }
    }
  ]
}
106 changes: 106 additions & 0 deletions docs/github-actions-workflow/v0.1/index.md
@@ -0,0 +1,106 @@
---
title: "Build Type: GitHub Actions Workflow"
layout: standard
hero_text: |
  A [SLSA Provenance](../../provenance/v1.0) `buildType` that describes the
  execution of a GitHub Actions workflow.
---

## Description

This `buildType` describes the execution of a top-level [GitHub Actions]
workflow (as a whole).

Note: This type is not meant to describe execution of subsets of the top-level
workflow, such as an action, a job, or a reusable workflow.

[GitHub Actions]: https://docs.github.com/en/actions

## Build Definition

### External parameters

All external parameters are REQUIRED.

<table>
<tr><th>Parameter<th>Type<th>Description

<tr id="inputs"><td><code>inputs_*</code><td>string<td>

The [inputs context], with each `inputs.<name>` renamed to `inputs_<name>`.
Every non-empty input value MUST be recorded. Empty values SHOULD be omitted.

Note: Only `workflow_dispatch` events and reusable workflows have inputs.

<tr id="source"><td><code>source</code><td>artifact<td>

The git repository containing the top-level workflow YAML file.

This can be computed from the [github context] using
`"git+" + github.server_url + "/" + github.repository + "@" + github.ref`.

<tr id="workflow_path"><td><code>workflow_path</code><td>string<td>

The path to the workflow YAML file within `source`.

Note: this cannot be computed directly from the [github context]: the
`github.workflow` context field only provides the *name* of the workflow, not
the path. See [getEntryPoint] for one possible implementation.

[getEntryPoint]: https://github.com/slsa-framework/slsa-github-generator/blob/ae7e58c315b65aa92b9440d5ce25d795845b3b2a/slsa/buildtype.go#L94-L135

</table>

[github context]: https://docs.github.com/en/actions/learn-github-actions/contexts#github-context
[inputs context]: https://docs.github.com/en/actions/learn-github-actions/contexts#inputs-context
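
The external parameter rules above can be sketched in Python as follows. This is a minimal illustration, not part of the specification: the function name `external_parameters`, its arguments, and the choice to pass the commit digest in explicitly are all assumptions made for the example. The `github` and `inputs` arguments stand in for the corresponding contexts, already loaded as string maps.

```python
from typing import Mapping


def external_parameters(github: Mapping[str, str],
                        inputs: Mapping[str, str],
                        workflow_path: str,
                        commit_sha1: str) -> dict:
    """Build an externalParameters map for a top-level workflow run.

    Hypothetical helper: names and signature are illustrative only.
    """
    params = {}
    # inputs.<name> is renamed to inputs_<name>; empty values are omitted.
    for name, value in inputs.items():
        if value != "":
            params[f"inputs_{name}"] = {"value": value}
    # "git+" + github.server_url + "/" + github.repository + "@" + github.ref
    uri = f"git+{github['server_url']}/{github['repository']}@{github['ref']}"
    params["source"] = {
        "artifact": {
            "uri": uri,
            # Assumption: the caller supplies the resolved commit digest.
            "digest": {"sha1": commit_sha1},
        }
    }
    # The path is passed in because it cannot be read from the github context.
    params["workflow_path"] = {"value": workflow_path}
    return params
```

Called with the values from `example.json`, this sketch yields the same `source.artifact.uri` and `inputs_*` entries shown there.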

### System parameters

> TODO: None of these are really "parameters", per se, but rather metadata
> about the build. Perhaps they should go in `runDetails` instead? The problem
> is that we don't have an appropriate field for it currently.

All system parameters are OPTIONAL. Each corresponds to the [github context]
value of the same name, with `github.<name>` renamed to `github_<name>`. The
list only includes parameters that are likely to have an effect on the build and
that are not already captured elsewhere.

| Parameter | Type | Description |
| -------------------- | -------- | ----------- |
| `github_actor` | string | The username of the user that triggered the initial workflow run. |
| `github_event_name` | string | The name of the event that triggered the workflow run. |
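
The renaming rule for system parameters can be sketched as below. This is an illustrative Python fragment only; `SYSTEM_PARAMETER_FIELDS` and `system_parameters` are hypothetical names, and `github` is assumed to be the [github context] loaded as a string map.

```python
from typing import Mapping

# The two github context fields listed in the table above.
SYSTEM_PARAMETER_FIELDS = ("actor", "event_name")


def system_parameters(github: Mapping[str, str]) -> dict:
    """Map github.<name> to github_<name> for the listed fields.

    All system parameters are OPTIONAL, so absent fields are skipped.
    """
    return {
        f"github_{field}": {"value": github[field]}
        for field in SYSTEM_PARAMETER_FIELDS
        if github.get(field)
    }
```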

> TODO: What about `actor_id`, `repository_id`, and `repository_owner_id`? Those
> are not part of the context so they're harder to describe, and the repository
> ones should arguably go on the `source` parameter rather than be here.
>
> Also `base_ref` and `head_ref` are similar in that they are annotations about
> `source` rather than a proper parameter.

### Resolved dependencies

The resolved dependencies MAY contain any artifacts known to be input to the
workflow, such as the specific versions of the virtual environments used.

## Run details

### Metadata

The `invocationId` SHOULD be set to `github.server_url + "/" + github.repository +
"/actions/runs/" + github.run_id + "/attempts/" + github.run_attempt`.
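
As a rough Python sketch of this construction: the `invocationId` in `example.json` includes the repository path (`octocat/hello-world`), so the sketch inserts `github.repository` after the server URL. Treat that, and the function name `invocation_id`, as assumptions of the example rather than normative text.

```python
from typing import Mapping


def invocation_id(github: Mapping[str, str]) -> str:
    """Build a run URL of the shape used in example.json (hypothetical helper)."""
    return (
        f"{github['server_url']}/{github['repository']}"
        f"/actions/runs/{github['run_id']}"
        f"/attempts/{github['run_attempt']}"
    )
```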

## Example

```json
{% include_relative example.json %}
```

Note: The `builder.id` in the example assumes that the build runs under
[slsa-github-generator](https://github.com/slsa-framework/slsa-github-generator).
If GitHub itself generated the provenance, the `id` would be different.

## Version history

### v0.1

Initial version
51 changes: 51 additions & 0 deletions docs/provenance/v1.0.cue
@@ -0,0 +1,51 @@
{
  // Standard attestation fields:
  "_type": "https://in-toto.io/Statement/v0.1",
  "subject": [...],

  // Predicate:
  "predicateType": "https://slsa.dev/provenance/v1.0?draft",
  "predicate": {
    "buildDefinition": {
      "buildType": string,
      "externalParameters": { [string]: #ParameterValue },
      "systemParameters": { [string]: #ParameterValue },
      "resolvedDependencies": [ ...#ArtifactReference ],

Contributor

Trying to map the discussion in #508 back to this PR and the updated requirements. We had recommended adding fields for evidence for the "scripted build" and "build as code" requirements. Even though the requirements have been updated, we still want to make sure we include a field in the provenance where the producer can attest to having used a specific build script and/or service addressing the Producer requirements, even if they do not want to reveal the build service or script in the provenance.

Member Author

Based on the discussion, I think we do not want such a field. There is an out-of-band process for certifying that a given builder.id implies that various requirements have been met. It is not a property per build.

Contributor

@marcelamelara marcelamelara Jan 9, 2023

Ok, so the builder.id would establish the binding between the provenance and the infrastructure provider questionnaire? However, presumably, in the case of a third-party verifier service who certified the build level, does this mean that the provenance consumer needs to go separately query the verifier service for the certification?

My main concern is that consumers of the provenance won't be able to (easily) link/connect the provenance to the evidence (i.e., questionnaire) for the claimed build level, so I'm still wondering why the provenance might not provide a hint for where to find the certification of the builder.id, especially in the case when the consumer doesn't rely on the third-party verifier.

Contributor

@marcelamelara marcelamelara Jan 9, 2023

Put differently, does this current approach prevent attackers from producing "fake" provenance as builder.id that a consumer/verifier would deem valid, based on the questionnaire? Without a binding in the provenance to the authenticated questionnaire (or a verifier's certification), I don't see how this would be detected by a verifier.

Contributor

@laurentsimon laurentsimon Jan 9, 2023

IIUC, the concern is that a builder.id inside the provenance can be forged by anyone, and a builder is defined not just by its ID, but by its keys. So you're asking how a verifier knows how to verify, i.e., not just the builder.id verification but also the signature verification.
If there is an official tool like the slsa-verifier that supports the builder, users should know about it. If there is none, what do we do? Do we expect corporate builders to provide their own verifying tool...? Or shall we encourage folks to add support in the same tool (e.g., slsa-verifier) and work on a plug-in support slsa-framework/slsa-verifier#275 to scale that?

Member

This might be just me, but I had always assumed that verification requires you to at least trust the identity behind the signature to report accurate information. I had also assumed the certification would be external, something like: Alice certified on this date that Bob's builder service is SLSA 3.

Member Author

Yes, as @laurentsimon said, the questionnaire is tied to signature verification plus an optional builder.id.

  • By "signature verification" this really means some sort of public key infrastructure that identifies the attestor, i.e. who signed the attestation. It could be a simple public key, or an X.509 cert chain like Sigstore, or something like SPIFFE.

  • The builder.id exists to allow a single attestor identity to represent two different logical build systems, with two different SLSA properties. For example, suppose Awesome Builder meets SLSA Build L3 but allows users to disable isolation between builds, which allows tampering between builds but does not allow forging of the provenance. Awesome Builder could represent this in two ways:

    • (a) As two different attestor identities, e.g. two different public keys.
    • (b) As two different builder.id values, signed by the same attestor identity.

I'm not 100% sure if (b) is needed. If not, we could drop builder.id which would simplify things a lot, I think. But I hesitate to do that until we have confidence it's not needed. Maybe just a bigger warning around it for now?

If there is an official tool like the slsa-verifier that supports the builder, users should know about it. If there is none, what do we do? Do we expect corporate builders to provide their own verifying tool...? Or shall we encourage folks to add support in the same tool (e.g., slsa-verifier) and work on a plug-in support slsa-framework/slsa-verifier#275 to scale that?

Sorry, I'm not sure who this comment was addressed to, or if you'd like to see anything in this PR about it.
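
To make option (b) concrete, here is a small verifier-side sketch in Python. Everything in it is hypothetical: the table `TRUSTED_BUILDERS`, the key name, the builder URLs, and the levels assigned are invented for illustration, and it assumes the attestor identity has already been established by signature verification before the lookup happens.

```python
from typing import Optional

# Hypothetical out-of-band trust table kept by the verifier.
# Key: (attestor identity proven by the signature, builder.id from provenance).
# Value: the build level the verifier is willing to accept for that pair.
TRUSTED_BUILDERS = {
    ("awesome-builder-key", "https://awesome.example/builder/isolated"): 3,
    ("awesome-builder-key", "https://awesome.example/builder/shared"): 2,
}


def accepted_level(attestor_id: str, builder_id: str) -> Optional[int]:
    """Return the accepted level, or None if the pair is not trusted.

    attestor_id must come from signature verification, never from the
    provenance body, since anyone can write an arbitrary builder.id.
    """
    return TRUSTED_BUILDERS.get((attestor_id, builder_id))
```

With option (a), the same table would instead be keyed by attestor identity alone, with one key per logical build system.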

Member

That second bullet point I think we just need to be clear about. I'm thinking about implementation issues where having the same builder use different identities in different contexts could complicate things. I think (b) probably makes the most sense though it could also lead to a compromised builder claiming a higher SLSA level.

Contributor

Thanks for clarifying. I think that the verification flow was not fully clear to me since we added in the infrastructure questionnaire step. There is a dependency between the artifact/provenance and the infrastructure on which they were produced, but what we're really saying is that the verification of this dependency can be handled OOB because the infrastructure for a given builder.id isn't expected to change very often. And because the builder.id maps to a specific signing key for the infrastructure, that's all a provenance consumer needs to verify the build infrastructure.

As @mlieberman8 said, the documentation should be clear on all these assumptions and expectations.

Member Author

Is this thread OK to resolve, or is there a specific request here (hard to follow thread)?

    },
    "runDetails": {
      "builder": {
        "id": string,
        "version": string,
        "builderDependencies": [ ...#ArtifactReference ],
      },
      "metadata": {
        "invocationId": string,
        "startedOn": #Timestamp,
        "finishedOn": #Timestamp,
      },
      "byproducts": [ ...#ArtifactReference ],
    }
  }
}

#ParameterValue: {
  "artifact": #ArtifactReference
} | {
  "value": string
}

#ArtifactReference: {
  "uri": string,
  "digest": {
    "sha256": string,
    "sha512": string,
    "sha1": string,
    // TODO: list the other standard algorithms
    [string]: string,
  },
  "localName": string,
  "downloadLocation": string,
  "mediaType": string,
}

#Timestamp: string // <YYYY>-<MM>-<DD>T<hh>:<mm>:<ss>Z
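
For readers less familiar with CUE, the three helper definitions above can be approximated by the following Python checks. This is a rough sketch under stated assumptions, not a replacement for the schema: the function names are invented, every `#ArtifactReference` field is treated as optional (the example provenance only sets `uri` and `digest`), and the timestamp pattern checks shape only, not calendar validity.

```python
import re

# Shape of #Timestamp: <YYYY>-<MM>-<DD>T<hh>:<mm>:<ss>Z
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")
ARTIFACT_STRING_FIELDS = ("uri", "localName", "downloadLocation", "mediaType")


def is_timestamp(value) -> bool:
    """True if value is a string in the #Timestamp layout."""
    return isinstance(value, str) and TIMESTAMP_RE.fullmatch(value) is not None


def is_artifact_reference(obj) -> bool:
    """True if obj looks like an #ArtifactReference (all fields optional here)."""
    if not isinstance(obj, dict):
        return False
    for field in ARTIFACT_STRING_FIELDS:
        if field in obj and not isinstance(obj[field], str):
            return False
    # digest maps algorithm name to digest value, both strings.
    digest = obj.get("digest", {})
    return isinstance(digest, dict) and all(
        isinstance(k, str) and isinstance(v, str) for k, v in digest.items()
    )


def is_parameter_value(obj) -> bool:
    """True if obj is exactly one of {"artifact": ...} or {"value": string}."""
    if not isinstance(obj, dict) or len(obj) != 1:
        return False
    if "artifact" in obj:
        return is_artifact_reference(obj["artifact"])
    return isinstance(obj.get("value"), str)
```

Note that `{"value": true}` (a JSON boolean) fails this check, which matches the example's use of the string `"true"` for `inputs_perform_deploy`.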