Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: should Rekor canonicalize the Rekor entries it takes in? #1162

Open
znewman01 opened this issue Nov 2, 2022 · 16 comments
Open

Question: should Rekor canonicalize the Rekor entries it takes in? #1162

znewman01 opened this issue Nov 2, 2022 · 16 comments
Labels
enhancement New feature or request

Comments

@znewman01
Copy link

znewman01 commented Nov 2, 2022

Right now, Rekor signs the representation of the Rekor entry as-provided. Most Rekor entries are JSON, so there's no canonical encoding.

The current implementation of the Bundle format relies on reconstructing the Rekor entry type in order to check the signature in the SignedEntryTimestamp. However, this is a huge pain—one client might produce different JSON from another, which means that verifying is a pain.

Questions:

  • Should we be relying on the bytes of the Rekor entry getting passed in for verification? (that is, it would be in the Sigstore bundle; see Proposal: Include raw Rekor Bundle in TransparencyLogEntry protobuf-specs#9 )
    • This would solve the immediate problem but still feels like a hack.
  • Should we consider having Rekor canonicalize the JSON entry types? This is a can of worms because JSON canonicalization is a nightmare—there are many competing JSON canonicalization formats and some aren't actually JSON.
    • Would this be a breaking change? It's very possible that a client would assume that Rekor signs the bytes it's given.
  • Should we try to move later iterations of Rekor entry types to a different format that's more canonicalization-friendly (e.g. COSE)?

CC @codysoyland @bobcallaway

@znewman01 znewman01 added the enhancement New feature or request label Nov 2, 2022
@bobcallaway
Copy link
Member

Right now, Rekor signs the representation of the Rekor entry as-provided. Most Rekor entries are JSON, so there's no canonical encoding.

That is incorrect, rekor canonicalizes the entry before writing it to the log and after verifying the validity of the signature.

func CanonicalizeEntry(ctx context.Context, entry EntryImpl) ([]byte, error) {

The current implementation of the Bundle format relies on reconstructing the Rekor entry type in order to check the signature in the SignedEntryTimestamp. However, this is a huge pain—one client might produce different JSON from another, which means that verifying is a pain.

Questions:

  • Should we be relying on the bytes of the Rekor entry getting passed in for verification? (that is, it would be in the Sigstore bundle; stay tuned for a link to a protobuf-specs issue about this)

    • This would solve the immediate problem but still feels like a hack.
  • Should we consider having Rekor canonicalize the JSON entry types? This is a can of worms because JSON canonicalization is a nightmare—there are many competing JSON canonicalization formats and some aren't actually JSON.

    • Would this be a breaking change? It's very possible that a client would assume that Rekor signs the bytes it's given.
  • Should we try to move later iterations of Rekor entry types to a different format that's more canonicalization-friendly (e.g. COSE)?

CC @codysoyland @bobcallaway

@znewman01
Copy link
Author

Oh huh, I didn't realize that.

It seems like there is a canonicalization step, but it often involves using Go's .json.marshall. which isn't actually canonicalization. So I think my overall point stands—should we be canonicalizing the JSON as part of the Canonicalize step?

@bobcallaway
Copy link
Member

bobcallaway commented Nov 2, 2022

The type specific code does call json.Marshall, but is wrapped by this call:

return jsoncanonicalizer.Transform(canonicalEntry)

which uses github.com/cyberphone/json-canonicalization/go/src/webpki.org/jsoncanonicalizer

@bobcallaway
Copy link
Member

There are many other canonicalization topics to consider, including PKI artifacts (DER vs PEM encoding), PGP binary VS armored keys, etc.

@bobcallaway
Copy link
Member

Simplifying the input and output formats has been on my mind (triggered by much of the bundle dialog), so would love to collaborate on a doc / proto on this to help simplify (as well as to document this)

@asraa
Copy link
Contributor

asraa commented Nov 2, 2022

(that is, it would be in the Sigstore bundle; stay tuned for a link to a protobuf-specs issue about this)

Does this mean that Sigstore bundle would contain the canonicalized Rekor entry (and that way it can compute the correct entry body when validating the SET)? Honestly, I think that's the only way. Otherwise clients would need to recreate the canonicalization logic individually. Is that a really big issue? If you tamper with the bundle, expect it to be invalid.

reconstructing the Rekor entry type in order to check the signature in the SignedEntryTimestamp

Why not just store the Rekor entry type as is?

In https://github.com/kommendorkapten/cosign/blob/bundle_verification/specs/dotsigstore_bundle/verify.md#input, I do see:

Recreate the Rekor entry from the bundle and blob (if provided).

But I'd expect a sub-field of the bundle to be the Rekor entry payload itself. Honestly, that's a good reason for Rekor to directly serve you the bundle-stuff.

EDIT: Note that there are some signature-validation properties required. Validate the SET of the rekor entry in the bundle AND also validate that the signature matches the one in the bundle. The signature is what we care about for timestamp validation.

@codysoyland
Copy link
Member

Does this mean that Sigstore bundle would contain the canonicalized Rekor entry (and that way it can compute the correct entry body when validating the SET)? Honestly, I think that's the only way. Otherwise clients would need to recreate the canonicalization logic individually. Is that a really big issue? If you tamper with the bundle, expect it to be invalid.

I agree 100%. This week, I have been taking a stab at writing a bundle verifier following the spec in https://github.com/sigstore/protobuf-specs/ and I got SET verification working for a subset of my test bundles, but it's very error-prone. For example, the intoto v0.0.2 entry expects a user-provided hash over the DSSE (the hash field). This hash is not provided in the bundle so must be computed on the DSSE. This means that the Rekor entry must be created with a hash over a canonicalized DSSE, which is not enforced, so clients must agree to always hash the canonicalized DSSE. I will open an issue related to this in protobuf-specs.

@bobcallaway
Copy link
Member

I realized that the requirement for the user to compute a hash was present in intoto:0.0.2 when researching #1164 and this is not desirable behavior, as the server should be computing this.

@bobcallaway
Copy link
Member

bobcallaway commented Nov 3, 2022

Also, would appreciate @codysoyland @kommendorkapten @asraa @bdehamer feedback on this (as an example for a broader doc):

In trying to resolve #1164 and #1139, we'll need to add another version of the intoto type. I've started work on this based on the following draft schema, which I think is an improvement on both v001 and v002 in that it tries to more clearly segment what should be sent to Rekor VS what is written to the log and returned to users when fetching the entry:

  • proposedContent are the client-provided fields when uploading an entry into the log
    • envelope is specified as a string (to contain JSON) since depending on the server to marshal it in the exact same order isn't safe, and this allows Rekor (server-side) to compute a SHA256 digest over the string
    • publicKeys are base64 encoded values (since clients might send DER or PEM encoded, unless we want to restrict this)
  • envelopeHash, payloadHash, and signatures are the server-computed fields that get canonicalized and persisted into the log
    • the signature value should be copied directly from the envelope provided from the client, and not re-encoded server side
    • the publicKey value is a canonicalized version of the key
    • This mapping between the signature within the DSSE envelope and the public key provided by the client might be awkward for a client to manually recreate when trying to verify the bundle... thoughts?

@bdehamer
Copy link

bdehamer commented Nov 3, 2022

envelope is specified as a string (to contain JSON) since depending on the server to marshal it in the exact same order isn't safe, and this allows Rekor (server-side) to compute a SHA256 digest over the string

This would need to be done in conjunction w/ some sort of change in the bundle format too, right? At the moment, the envelope is not stored as a string in the bundle so it would be difficult to match an offline-computed hash to the server-computed one.

@znewman01
Copy link
Author

This would need to be done in conjunction w/ some sort of change in the bundle format too, right? At the moment, the envelope is not stored as a string in the bundle so it would be difficult to match an offline-computed hash to the server-computed one.

Yup, see sigstore/protobuf-specs#9

@kommendorkapten
Copy link
Member

kommendorkapten commented Nov 4, 2022

To answer the question if Rekor should canonicalize the entries first. I think yes, but in a much simpler and robust form than what it does today, as this has been proven hard to verify unless the exact Rekor bundle is kept, which I think is not the desired end-state.

I would suggest something like this:
The signature (SET) is performed over the PASETO PAE (Pre Auth Encoding) over the following fields:

  • integrated time (UNIX time, as a string)
  • log id
  • log index (as a string)
  • kind
  • version
  • signature over the artifact (base64 encoded)

As PASETO PAE only accepts string the signature has to be encoded. Just for reference, DSSE originally relied on PASETO PAE, but as there was always only two components (payload type and the payload) they rolled to their own, simpler PAE which is pure ASCII, but pretty much otherwise similar to PASETO. For a bit of analysis of that we can read this good blog post by Soatok.

When dealing with PAE like this, the order is significant, but I don't see this as an issue. This can be documented. The output of PAE is a byte stream. As the PAE encoding is very easy to reconstruct, I don't see a reason that we need to either return it or store it (we of course need to store and return the calculated signature (SET)).

I don't think we need to include the digest of the real payload, nor the key or certificate during PAE construction. I would argue that they are implicit via the signature over the artifact.

edit: clarified that we need to store and return SET.
edit2: clarified more.

@kommendorkapten
Copy link
Member

In trying to resolve #1164 and #1139, we'll need to add another version of the intoto type.

I think the proposed new schema makes sense, transfering the envelope as a string should simplify things.

This mapping between the signature within the DSSE envelope and the public key provided by the client might be awkward for a client to manually recreate when trying to verify the bundle

Maybe, and consider a scenario where the envelope has multiple signatures, but the client only have access to one key. In this scenario, the client would not be able to recreate the Rekor entry. Unless the keys are part of the response from Rekor, then they could be added to the sigstore bundle, similar to how the certificate chain are stored. If this is done, we must be very careful to *not" use this key material during verification of the artifact's signature, only for reconstructing the Rekor entry.

Until we have a better idea on how to improve the general canonicalization model for Rekor, I'm inclined to agree with @znewman01 and @codysoyland on including the unmodified Rekor entry into the bundle. I also believe this is a hack though :) This would require a robust validation test, as I believe there was a bug found by @codysoyland where the Rekor entry was verified but it was not verified that the entry matched the provided artifact. Requiring the client to reconstruct the entry would close this class of bugs.

@bobcallaway
Copy link
Member

Maybe, and consider a scenario where the envelope has multiple signatures, but the client only have access to one key. In this scenario, the client would not be able to recreate the Rekor entry. Unless the keys are part of the response from Rekor, then they could be added to the sigstore bundle, similar to how the certificate chain are stored. If this is done, we must be very careful to *not" use this key material during verification of the artifact's signature, only for reconstructing the Rekor entry.

The keys are part of the canonicalized entry (see https://github.com/bobcallaway/rekor/blob/10d63e565bf035791d6794ef8ba6a41ea013bff9/pkg/types/intoto/v0.0.3/intoto_v0_0_3_schema.json#L43)

@kommendorkapten
Copy link
Member

The keys are part of the canonicalized entry

Yes, but they need to be delivered to the client too, right? So the client can reconstruct the Rekor entry (I'm assuming the client does not have access to the Rekor entry). The sigstore bundle can not carry this information as it's written today. Or am I mixing things up here?

@asraa
Copy link
Contributor

asraa commented Nov 4, 2022

). The sigstore bundle can not carry this information as it's written today. Or am I mixing things up here?

Right: I think so. I think it seems like:

  1. The Sigstore bundle right now does NOT contain the rekor entry payload. So clients may not be able to reconstruct the total entry if multiple sigs because the rekor entry contains all of them and maybe they only have one that's relevant. They also have difficulty canonicalizing.
  2. The Sigstore bundle COULD contain the whole Rekor entry canonicalized. Then clients will need to do the additional check to make sure the Rekor entry applies to the artifact -> this could be hard.
  3. The Sigstore bundle COULD be simplified to only have an easy to form "signature" payload like mentioned above. We still need some logic around what counts as a "signature". Just the signature payload inside a DSSE envelope signatures[x].sig? For a JAR signed object, the whole signature file? I think we could do this. It's still complicated, but less than (1) EDIT: AND if this happens in conjunction with ripping out types or simplifying the number of types allowed, then this would be easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants