Are subjects conjunctive or disjunctive? #292

Open
woodruffw opened this issue Oct 19, 2023 · 14 comments

Comments

@woodruffw
Contributor

(Copying from a conversation on the Sigstore Slack)

To my understanding: an in-toto formatted attestation has a set of subjects, and a predicate. The subjects can be thought of as principals for the attestation, and the predicate is a collection of arbitrary metadata that can "refine" the statement about the subject(s). If this understanding is incorrect, please correct me!

This raises the question: is the predicate "conjunctive" over the subjects, or "disjunctive"? In other words, is attestation verification defined as:

(subject[0] AND predicate) AND (subject[1] AND predicate) AND ...

or:

(subject[0] AND predicate) OR (subject[1] AND predicate) OR ...

My intuition is that it's the former, but I couldn't find that stated clearly in the spec 🙂

CC @marcelamelara

@TomHennen
Contributor

Hey William,

Can you provide a concrete example? I'm afraid I don't quite follow as I don't think about statements as being boolean logic (but I'm not really a math person :/).

IMO it's saying that every subject listed has the properties specified in the predicate.

@woodruffw
Contributor Author

> Can you provide a concrete example? I'm afraid I don't quite follow as I don't think about statements as being boolean logic (but I'm not really a math person :/).

Yep! Sorry for not doing so initially.

This example is a little contrived, but consider an attestation for two packages with a predicate signaling that a package was uploaded after a given date:

subject:
  - name: pkg:foo
    digest:
      sha256: abcd...
  - name: pkg:bar
    digest:
      sha256: ef12...

predicateType: blah
predicate:
  uploaded-after: 2023-10-19

(please excuse my use of YAML)

From here: is this attestation valid if either pkg:foo or pkg:bar satisfies uploaded-after, or do both need to satisfy uploaded-after?
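
To make the two readings concrete, here is a minimal Python sketch. The upload dates are invented for illustration and are not part of the attestation; only the threshold date comes from the example above.

```python
from datetime import date

# Hypothetical upload dates, invented purely for illustration.
uploaded = {
    "pkg:foo": date(2023, 11, 1),
    "pkg:bar": date(2023, 9, 1),
}

# The uploaded-after value from the example predicate.
threshold = date(2023, 10, 19)

# One boolean per subject: does the predicate hold for that subject?
holds = [uploaded[name] > threshold for name in ("pkg:foo", "pkg:bar")]

conjunctive = all(holds)  # every subject must satisfy the predicate
disjunctive = any(holds)  # one satisfying subject is enough
```

With these made-up dates the two readings disagree, which is exactly the ambiguity in question.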

@adityasaky
Member

Not a maintainer of this spec but I'm curious about "is this attestation valid" in your sentence.

In my mental model, an attestation is not "valid" in isolation. An attestation may be valid in a particular verification workflow (i.e., based on the policy) if it's signed by an authorized actor (declared in policy), contains claims that meet the policy, etc. In this approach, you'd be verifying pkg:foo or pkg:bar or both, so the validity of the attestation is strictly in the context of the artifact you're verifying.

The attestation in your example can be used to verify pkg:foo and pkg:bar, together or separately. But inherently, it's not "valid" or "invalid". I don't mean to be pedantic but I think this may be worth clarifying both here and in the spec? 😄

@woodruffw
Contributor Author

Being pedantic in this context is good! I appreciate it.

Let me rephrase: instead of "valid or invalid," what is a set of subjects intended to communicate?

For example, contrasting to PKIX: X.509 certificates can contain SANs, which can specify one or more subject identities. Under RFC 5280 and CABF, each is a sufficient identity for verification purposes, meaning that they're effectively disjunctive: a verifier only needs to accept one to consider the certificate valid for the stated purpose(s).

Essentially, my concern is that potential end users (like myself) will attempt to communicate one thing by specifying multiple subjects, while other consumers of what I produce will interpret it in another (possibly insecure) way. The more general problem here resembles PKCS#7/CMS: if the specs themselves don't have strong opinions on these things, then users end up creating differentially exploitable behavior at the policy layer.

@adityasaky
Member

The set of subjects strictly communicates the resources the attestation applies to. So the claims contained in the predicate apply to each of the recorded subjects.

In the certificate example you shared, you're verifying the certificate itself. You do not verify an attestation for the purposes of verifying an attestation, instead you verify an artifact using 1+ attestations that record it (or related artifacts) as subjects and an explicit policy (such as an in-toto layout) declaring what attestation claims are valid.

Going back to the original example:

subject:
  - name: pkg:foo
    digest:
      sha256: abcd...
  - name: pkg:bar
    digest:
      sha256: ef12...

predicateType: blah
predicate:
  uploaded-after: 2023-10-19

Unlike the certificate example, you wouldn't have a verifier that looks at this attestation and determines whether to accept it based on foo and bar's characteristics. Instead, your verifier would be invoked for pkg:foo with some policy about when it must have been uploaded after. Similarly, your verifier would be invoked for pkg:bar with a similar policy. This attestation can be used in both verification scenarios.

Notice that here, while the attestation records a claim for two subjects, it may be valid in one case and invalid in another. The uploaded-after policy for foo and bar can differ, so the claim may be valid for one but not the other. You could also have an attestation with a different uploaded-after claim for the same resource. That's again where the policy steps in to determine which claim(s) to trust. Both these attestations, taken by themselves, are equally valid and invalid.
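
A rough Python sketch of this per-artifact model, under stated assumptions: the dictionary layout, the `applies_to` and `verify_artifact` helpers, the shortened digests, and the policy dates are all hypothetical and not taken from the in-toto spec or any in-toto library.

```python
from datetime import date

# Hypothetical in-memory form of the example attestation.
# The digests are shortened placeholders, not real hashes.
statement = {
    "subject": [
        {"name": "pkg:foo", "digest": {"sha256": "abcd"}},
        {"name": "pkg:bar", "digest": {"sha256": "ef12"}},
    ],
    "predicateType": "blah",
    "predicate": {"uploaded-after": date(2023, 10, 19)},
}


def applies_to(statement, sha256):
    """Return True if any subject entry records this digest."""
    return any(
        entry["digest"].get("sha256") == sha256
        for entry in statement["subject"]
    )


def verify_artifact(statement, sha256, required_after):
    """Check one artifact against a per-artifact policy (hypothetical)."""
    if not applies_to(statement, sha256):
        return False  # the attestation says nothing about this artifact
    # The claim meets the policy if it is at least as strict as required.
    return statement["predicate"]["uploaded-after"] >= required_after


# The same attestation, used in two separate verifications with
# different (made-up) policies for foo and bar.
foo_ok = verify_artifact(statement, "abcd", date(2023, 1, 1))
bar_ok = verify_artifact(statement, "ef12", date(2024, 1, 1))
```

Here the verifier starts from an artifact digest and a policy, and the other subject entries are simply ignored, so the same attestation can pass for one artifact and fail for the other.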

Does this help?

@woodruffw
Contributor Author

> In the certificate example you shared, you're verifying the certificate itself. You do not verify an attestation for the purposes of verifying an attestation, instead you verify an artifact using 1+ attestations that record it (or related artifacts) as subjects and an explicit policy (such as an in-toto layout) declaring what attestation claims are valid.

To be precise, you don't verify a certificate just for its own sake either -- you do it to establish trust in a public key for subsequent use (e.g. TLS session initiation) 🙂. The analogy is potentially still wrong, but I want to make sure I understand why it's wrong!

> Unlike the certificate example, you wouldn't have a verifier that looks at this attestation and determines whether to accept it based on foo and bar's characteristics. Instead, your verifier would be invoked for pkg:foo with some policy about when it must have been uploaded after. Similarly, your verifier would be invoked for pkg:bar with a similar policy. This attestation can be used in both verification scenarios.

This is helpful, thank you. I was thinking of both subjects being used at the same time in a single shot, whereas this makes it sound like a verifier (from in-toto's perspective) starts with a specific subject and attempts to match the predicates against it.

Given that, I think my underlying question (about conjunction and disjunction) is probably the wrong question to ask: the underlying use case I have in mind here is expressing a statement about a single thing (a Homebrew bottle), so conceptually there's only a single subject.

@TomHennen
Contributor

@adityasaky said it all much better than I could.

The only thing I have to add is that the validation model might help clarify things here?

@woodruffw
Contributor Author

> The only thing I have to add is that the validation model might help clarify things here?

It does, although it looks like the output that goes into the policy engine includes matchedSubjects, i.e. more than one subject.

So the basic question about AND/OR remains, although it sounds like the ultimate answer is that it just isn't specified here (and instead assumes that the policy engine does the right thing for the user's needs).

@marcelamelara
Contributor

To summarize and formulate an actionable outcome from this issue, it sounds like there are a couple clarifications we could make in the spec for in-toto Statements and the validation:

  • Make explicit the assumption that the predicate MAY apply to any or all subjects in a single Statement
  • State explicitly that the "validity" of an attestation with respect to each of its subjects is determined at policy verification time (based on a consumer's policy)

Anything to add, and any updates we may want to make to the validation model in this context?

@MarkLodato
Contributor

@woodruffw I think I understand perfectly what you are asking, and the x509 analogy with SAN is apt IMO. Could you link to the specific section of the RFC that says SAN is a disjunction? I couldn't find it. We need not go into more detail on the analogy in this thread, but knowing how x509 defines it would be helpful prior art.

I think the right way to think about it is that the subject is a single collection of artifacts. Exactly how the predicate applies to this collection is defined by the predicateType. Taking a real-world example, https://slsa.dev/provenance/v1.0 says that the collection of artifacts subject were produced by a single process described in the predicate.

In terms of verification, a consumer usually just verifies a single artifact at a time. For example, to accept a binary with hash H at upload time, they might require there to exist an attestation whose subject contains H, predicateType is SLSA provenance, and predicate says build platform X and source repo Y. In this case, they would ignore all other entries of subject.

However, a consumer MAY rely on the fact that multiple artifacts appeared in a single subject. Two contrived examples:

  • A consumer has two files and requires them to have been built from the same build process, i.e. they both appear in the subject of a single SLSA provenance attestation.
  • A consumer has a binary named foo with sha256 hash 1234... and requires evidence that tests have passed. They might implement this by "chaining" the provenance using a process like:
    • Find a SLSA provenance attestation with a subject containing name = "foo" and digest.sha256 = "1234..." (plus some requirements on predicate).
    • In that same attestation, let X be the digest.sha256 of the entry in subject whose name = "foo.testlog".
    • Find some other "test result" attestation with a subject containing digest.sha256 = X (plus some requirements on the predicate).
    • ALLOW if such a pair of attestations exists, else DENY.

My interpretation of the spec is that both of the above are acceptable.
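
The chaining steps above could be sketched roughly as follows in Python. This is only an illustration: the dictionary layout, the `find_subject` and `allow` helpers, and the `TEST_RESULT` predicate type are hypothetical, and the "requirements on predicate" are left out entirely.

```python
# Predicate type URI for SLSA v1 provenance.
SLSA_PROVENANCE = "https://slsa.dev/provenance/v1"
# Hypothetical predicate type for a "test result" attestation.
TEST_RESULT = "https://example.com/test-result/v1"


def find_subject(statement, name=None, sha256=None):
    """Return the first subject entry matching the given name and/or digest."""
    for entry in statement["subject"]:
        if name is not None and entry.get("name") != name:
            continue
        if sha256 is not None and entry["digest"].get("sha256") != sha256:
            continue
        return entry
    return None


def allow(attestations, name, sha256):
    """ALLOW (True) if a provenance + test-result pair exists, else DENY."""
    for prov in attestations:
        if prov["predicateType"] != SLSA_PROVENANCE:
            continue
        # Step 1: the provenance must list the artifact being verified.
        if find_subject(prov, name=name, sha256=sha256) is None:
            continue
        # Step 2: in the same attestation, look up the test log's digest X.
        log = find_subject(prov, name=name + ".testlog")
        if log is None:
            continue
        x = log["digest"]["sha256"]
        # Step 3: find a test-result attestation whose subject contains X.
        for other in attestations:
            if other["predicateType"] == TEST_RESULT and find_subject(other, sha256=x):
                return True
    return False
```

Note that step 2 is the only place where the policy relies on two entries appearing in the same subject list; steps 1 and 3 each look at a single entry.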

/cc @SantiagoTorres, since we had discussed this very topic back in the early days of the design, as I recall.

@adityasaky
Member

I think I see the X.509 parallel (nb: my X.509 knowledge has a lot of gaps) if we consider "validity" to be applies-to, which is how @woodruffw clarified the question in an earlier response as well. As I see it, in the case of the cert, having established the cert can be trusted based on its signatures, we consider it valid if it applies to the subject we're verifying the cert for. In that case, the subject is indeed a disjunction. IMO, this is also exactly the case for an in-toto attestation's subject field, we evaluate if the attestation applies to the artifact we're verifying and select it for further checks against the policy. I think we also see here that it is impossible to be a conjunction because we don't know whether the policy includes any checks such that the claims hold for all the subjects. I'm not sure I've encountered a supply chain process (that'd be represented by a unique predicate type) that intrinsically requires the predicate's claims to apply to all subjects.

@MarkLodato I agree with your points on how attestations are typically produced and verified. On the testlog example, I see how a producer and consumer may agree to include that information but to me it reads as counter to the Provenance spec, because the testlog artifact is not generated by the build but rather a prior process. Without additional information, it'd appear as if the testlog file was produced by the build process and we have its provenance, which is not actually the case?

In the other example, the attestation still applies to both subjects individually. Only the policy requires them to exist together (which is how "consumer may rely..." would apply here?). The attestation's subject, without knowledge of the policy it's used with, would apply to each subject separately. Indeed, it may be used to verify a layout for artifact A that doesn't care about artifact B, and another layout for artifact B that does care about artifact A (i.e., it has a REQUIRE rule for each artifact).

The question is, does the statement layer spec need further clarification? Currently it reads:

Set of software artifacts that the attestation applies to. Each element represents a single software artifact. Each element MUST have digest set.

Should it be updated to indicate the attestation may be used to verify one or more artifacts and it entirely depends on the policy?

@woodruffw
Contributor Author

> @woodruffw I think I understand perfectly what you are asking, and the x509 analogy with SAN is apt IMO. Could you link to the specific section of the RFC that says SAN is a disjunction? I couldn't find it. We need not go into more detail on the analogy in this thread, but knowing how x509 defines it would be helpful prior art.

I don't believe this is specified in any RFC anywhere, but it's evidenced by how the Web PKI (and other PKIs) work: when you connect to example.com, you may be served with a leaf certificate that contains example.com and www.example.com (or any number of other SANs), but the only subject you're actually verifying against is the domain you're connecting to (example.com).

(If I were to standards-lawyer it, I would argue that "Subject Alternative Name" implies that the names in the SAN are disjunctive. But the "real" answer AFAICT is "it has to be disjunctive because that's the only way it works on the Internet.")

> Should it be updated to indicate the attestation may be used to verify one or more artifacts and it entirely depends on the policy?

I would find this clarification helpful, personally 🙂

@MarkLodato
Contributor

Sorry, my examples were really contrived. A better, real-world example is a TensorFlow SavedModel, which has multiple files on disk:

saved_model.pb
variables/variables.data-00000-of-00001
variables/variables.data-00001-of-00002
variables/variables.index

You (probably) don't want to verify each file individually, but instead want to verify it as a whole. The fact that it's broken up into multiple files is just an implementation detail. But because a set of files has no well-defined serialization (and thus hash), you need to either:

  • Define a convention for hashing the entire SavedModel and represent the model as a single entry in subject. IMO this is the "right" long-term solution but is not standardized yet.

  • Hash each file individually and represent the model as a collection of entries in subject. This is the use case I was referring to. To verify, you would check that each file on disk matches a corresponding entry in subject.
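
A minimal Python sketch of the second option, assuming (this is not specified anywhere above) that each subject entry's name is the file's path relative to the model directory. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path):
    """Hash a file in chunks so large variable shards need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def model_matches_subject(model_dir, subject):
    """Check that every file on disk matches a corresponding subject entry."""
    # Assumption: subject names are POSIX-style paths relative to model_dir.
    recorded = {entry["name"]: entry["digest"]["sha256"] for entry in subject}
    root = Path(model_dir)
    files = [p for p in root.rglob("*") if p.is_file()]
    if not files:
        return False  # nothing on disk to verify
    for path in files:
        name = path.relative_to(root).as_posix()
        if recorded.get(name) != sha256_file(path):
            return False  # unrecorded file or digest mismatch
    return True
```

This sketch only checks the disk-to-subject direction described above; whether every subject entry must also be present on disk is a separate policy decision.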

Does that help at all? I think we're basically saying the same thing, so I'm not sure it really matters.

@adityasaky
Member

Yeah, I think we're all on the same page. I'll cede to the @in-toto/attestation-maintainers on the clarification to the spec. I think using this tensorflow example and pointing to how layouts can parse subjects in different contexts will be the best additions.
