Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: When verifying, only use data after it is considered acceptable #2482

Closed
wants to merge 43 commits into from

Conversation

mtrmac
Copy link
Contributor

@mtrmac mtrmac commented Nov 25, 2022

Summary

pkg/cosign.verifyInternal keeps growing conditions (A) at the end of the function, long after already relying (B) on data that is constrained by these conditions. It effectively makes the outcome of B indeterminate until A happens; that’s hard to track, and A might not happen at all if the function returns early. This makes me nervous: personally for me, it makes following the operation and rules of the code much harder than I think is necessary.

So, this is a move somewhat towards, but not completely, the verification organization suggested in #1648 (comment).

Most data in verifyInternal, and closest utility functions, is now stored with variables named untrusted…. Only after all checks on that data are done, it may be stored in a variable named acceptable….

Testing relies on existing unit tests; no new tests were added.

See individual commit messages for details. Most of the commits are trivial variable renames.

Behavior changes:

  • Certificate expiry is now not tested if we don’t rely on the certificate at all (if the user provides a trusted public key)
  • If we do rely on the certificate, we now always test expiry against some time. Previously that didn’t happen if there was no RFC 3161 timestamp and Rekor log presence was opted out. (I.e. this should allow using ordinary X.509 CA certificates without relying on a timestamp / Rekor at all, the way the TLS ecosystem does.)
  • If there is a RFC 3161 timestamp, and Rekor log presence is processed but not required, we only require the certificate to be valid as of the timestamp, not as of the current time.

Basic summary of the ordering changes:

  • The verification flow now:
    • First ensures timestamp / Rekor presence; that obtains a trusted signature creation timestamp, if any
    • Then, if necessary, validates the certificate (requires the timestamp from previous step)
    • Cryptographically verifies the signature/payload
    • Only at the very end, processes the payload
  • VerifyBundle now first ensures that the entry was actually stored on Rekor, and only afterwards uses it to see whether it matches the supposedly-logged signature.

Release Note

Certificate expiry checking was made a bit more accurate.

Documentation

N/A.

@codecov-commenter
Copy link

codecov-commenter commented Nov 25, 2022

Codecov Report

Merging #2482 (deebfc5) into main (09f023f) will increase coverage by 0.18%.
The diff coverage is 69.32%.

@@            Coverage Diff             @@
##             main    #2482      +/-   ##
==========================================
+ Coverage   30.14%   30.33%   +0.18%     
==========================================
  Files         139      139              
  Lines        8595     8624      +29     
==========================================
+ Hits         2591     2616      +25     
- Misses       5629     5632       +3     
- Partials      375      376       +1     
Impacted Files Coverage Δ
cmd/cosign/cli/verify/verify.go 19.36% <0.00%> (ø)
cmd/cosign/cli/verify/verify_attestation.go 0.00% <0.00%> (ø)
pkg/cosign/verify.go 42.74% <70.70%> (+1.43%) ⬆️
cmd/cosign/cli/sign/sign.go 15.84% <100.00%> (ø)
cmd/cosign/cli/verify/verify_blob.go 49.56% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Contributor

@znewman01 znewman01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for this! Very good idea overall and I'm strongly in favor.

I'd like even better to enforce this at the type system layer, rather than simply via naming. You might be able to do some cool encapsulation tricks. (Don't make it possible to instantiate a Foo struct from outside the module; instead, you must create an UnverifiedFoo and then turn it into a Foo.)

But in the shorter term—is there any chance you could break this up into 2 or 3 smaller PRs? I find once I'm reviewing more than 100–200 lines of code my eyes gloss over and I find it difficult to actually make sure the changes are correct.

@haydentherapper
Copy link
Contributor

+1, for PRs that both refactor and modify behavior, we should separate those into separate PRs, especially for critical parts of the code base like verify. Very minimal test changes for refactors, with more unit tests for behavior changes.

If we do rely on the certificate, we now always test expiry against some time. Previously that didn’t happen if there was no RFC 3161 timestamp and Rekor log presence was opted out. (I.e. this should allow using ordinary X.509 CA certificates without relying on a timestamp / Rekor at all, the way the TLS ecosystem does.)

Is this correct? We default to the current time when checking expiration time. If no bundle or timestamp is provided, we use the current time. There should be no case when cert expiration is not checked, but let me know if you found one.

If there is a RFC 3161 timestamp, and Rekor log presence is processed but not required, we only require the certificate to be valid as of the timestamp, not as of the current time.

This sounds like a good change! Note that if both a bundle and 3161 time stamp are present, we want to check the times for both. Maybe we make this configurable in the future, but for now, I’d like to continue checking both, because we do expect both to be valid.

@asraa
Copy link
Contributor

asraa commented Nov 28, 2022

YAY! Thank you -- I'll give a start reviewing soon. Also, I'm catching up on some lost context over the holiday week so apologies if I'm behind

Is this correct? We default to the current time when checking expiration time. If no bundle or timestamp is provided, we use the current time. There should be no case when cert expiration is not checked, but let me know if you found one.

We introduced a few as we were making the cosign 2.0 changes, but I ran out of time before the holidays to clean them up. Looking at main right now, you could get around if, let's say, you did skip-tlog-verify without TSACerts specified to validate an RFC 3161 timestamp. We fixed up the later ones, but I like that this one is a lot more explicit about the check.

@mtrmac
Copy link
Contributor Author

mtrmac commented Nov 28, 2022

Hey, thanks for this! Very good idea overall and I'm strongly in favor.

I'd like even better to enforce this at the type system layer, rather than simply via naming. You might be able to do some cool encapsulation tricks. (Don't make it possible to instantiate a Foo struct from outside the module; instead, you must create an UnverifiedFoo and then turn it into a Foo.)

Go doesn’t have a strong enough visibility system for that, other than using separate subpackages — and I think that would spread the core verification logic over too many separate files, making the end resulting logic hard to follow. (That’s especially the case for certificates, which really have several mostly-independent correctness criteria, like the various subject restrictions that can be configured.)

There’s certainly much more that could be done (e.g. maybe the oci.Signature type should name all fields Untrusted…). I wanted to get the core verification logic ordered, and I can’t immediately spend significantly more time on a style proposal beyond that.

But in the shorter term—is there any chance you could break this up into 2 or 3 smaller PRs? I find once I'm reviewing more than 100–200 lines of code my eyes gloss over and I find it difficult to actually make sure the changes are correct.

Sure, the set of commits should be correct at every point, so It can split in arbitrary places.

@asraa
Copy link
Contributor

asraa commented Nov 28, 2022

Sure, the set of commits should be correct at every point, so It can split in arbitrary places.

@mtrmac could we start with the timestamp check changes in the verification flow and do some renaming as follow-ups once there's no functional changes?

I'm looking at
3b21d42
eb1f060
0a343bc

Copy link
Contributor

@asraa asraa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is looking great! i've just reviewing the verify main command.

return false, fmt.Errorf("offline verification failed")
}

if co.RekorClient != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check was always very confusing to me. It seems like if you aren't skipping TLOG verification, then if there's no rekor client, somehow you can still pass without checking the tlog.

do you think maybe it should be an error that co.RekorClient is nil when we have to do tlog verification and never got a bundle?

Copy link
Contributor Author

@mtrmac mtrmac Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, the API is such that external callers can call VerifyImageSignature and get (bundleVerified = false, err = nil). So this PR doesn’t change that, it’s a large enough set of changes already.

I agree that it seems dangerous and undesirable; IMHO external users of pkg/cosign should just provide a policy, and not be relied upon to correctly enforce other conditions on top. Conceptually, I think “change the API philosophy” and “change the implementation philosophy” are fairly separate conversations — OTOH, admittedly, the question of the right timestamp to use for certificate expiry means the two can’t separated that cleanly.

@@ -154,7 +154,7 @@ func (c *VerifyCommand) Exec(ctx context.Context, images []string) (err error) {
if err != nil {
return fmt.Errorf("getting Fulcio roots: %w", err)
}
co.IntermediateCerts, err = fulcio.GetIntermediates()
co.UntrustedIntermediateCerts, err = fulcio.GetIntermediates()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we keep the same name, IntermediateCerts?

I would prefer we don't used "Untrusted" in the name. This term comes from openssl from what I've seen, and I personally don't think it's correct to only use intermediate certificates for chain building ("untrusted"). Intermediates should be trusted, or at least come from a trusted source.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is confusing. These particular ones are coming from a trusted source, so I can agree to see it named the same. The ones found on the registry through CertChain are untrusted intermediates

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, see co.UntrustedIntermediateCerts = untrustedPool. So, in that sense, I think changing the name of CheckOpts to UntrustedIntermediateCerts has been a success exactly because the trusted nature of those coming from fulcio.GetIntermediates is insufficient to make the field trusted.

(I’m also not sure what a “trusted intermediate” means. Is that a root of trust, i.e. a non-intermediate root? Or an intermediate that we still require to be correctly signed by a root of trust, i.e. something where we don’t actually require any trust?)


Actually, what I really think is that the caller-provided policy should be immutable and should not be modified to hold single-instance data within verify.go at all. Then we might even more precisely differentiate between

  • completely-untrusted signature-carried data
  • externally Fulcio-provided data (acceptable if the connection to Fulcio is trusted)
  • completely locally-provided 100%-trusted policy

and potentially make different decisions based on those details. That would be a different PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a core library function, I think we should not be opinionated if the certificates are trusted or untrusted. For example, we don't know where the root certificate came from either when the checkopts is populated. I don't see a reason to distinguish between untrusted and trusted in the CheckOpts. Internally, we can name variables differently based on their source.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not be opinionated if the certificates are trusted or untrusted.

I… don’t know what that would mean.

The verification code must be certain about which data is a part of the root of trust that is, by necessity, assumed valid, and which data is part of the external attack surface that is, by default, suspect. At the very least these two categories are 100% distinct. There might be even more categories in between (e.g. the intermediate certificates might be more trusted than the underlying signatures, but they are still potentially a part of the attack surface).

(And then the API should help the caller of the API express its assumptions clearly, and to help expose any incorrect assumptions the caller might be making.

From that point of view, calling the certificates “untrusted” is good for the caller, because it tells the caller the caller doesn’t need to make much effort at all to ensure validity of the data. “A trusted system is one whose failure would break a security policy”.)


I don’t care at all about the individual naming of the options. If you don’t want the field to be named Untrusted, and to instead have a CheckOpts.{TrustedSigVerifier,TrustedRootCerts}, sure, that might be even more explicit. If you think the distinction between trusted and untrusted data is sufficient to express in comments and doesn’t need to be in field names, that’s not my preference but I can live with that just fine.

But I read the above as suggesting that it doesn’t make a difference whether the data is trusted or not, and I just don’t know what to do with that. Probably I’m completely misunderstanding your point.

@haydentherapper
Copy link
Contributor

My 2cents: I see multiple types of refactoring in this PR - Name changes, reordering, etc. Given the criticality of this package, I think we should scope down the PR even more. I see some of the commits are behavior changes too - These definitely should be in their own PR, especially since I don't see tests that accompany them (which suggests we don't have sufficient testing in Cosign, which we don't :) ).

@mtrmac
Copy link
Contributor Author

mtrmac commented Dec 2, 2022

Right, I have rebased this first to have a baseline from which I can extract a smaller subset.

Then I intend to keep this PR to be a catch-all that merges branches of the smaller subsets, keeping a record of what is merged/being reviewed / outstanding (and dropping the rejected parts). If/as time allows.

@haydentherapper
Copy link
Contributor

@mtrmac Do you think it'd be possible to have a PR just with the behavior changes first? We wanted to cut a release candidate soon for cosign 2.0, and one of the behavior changes in particular is a blocker for release imo:

If we do rely on the certificate, we now always test expiry against some time. Previously that didn’t happen if there was no RFC 3161 timestamp and Rekor log presence was opted out. (I.e. this should allow using ordinary X.509 CA certificates without relying on a timestamp / Rekor at all, the way the TLS ecosystem does.)

This should result in a verification failure. Reproduced with cosign verify-blob --certificate blob.cert --signature blob.sig --insecure-skip-tlog-verify blob, using an expired cert, it says it succeeds.

fyi @priyawadhwa

@mtrmac
Copy link
Contributor Author

mtrmac commented Dec 2, 2022

@mtrmac could we start with the timestamp check changes in the verification flow and do some renaming as follow-ups once there's no functional changes?

I'm looking at 3b21d42 eb1f060 0a343bc


@mtrmac Do you think it'd be possible to have a PR just with the behavior changes first? We wanted to cut a release candidate soon for cosign 2.0, and one of the behavior changes in particular is a blocker for release imo:

If we do rely on the certificate, we now always test expiry against some time. Previously that didn’t happen if there was no RFC 3161 timestamp and Rekor log presence was opted out. (I.e. this should allow using ordinary X.509 CA certificates without relying on a timestamp / Rekor at all, the way the TLS ecosystem does.)

This should result in a verification failure. Reproduced with cosign verify-blob --certificate blob.cert --signature blob.sig --insecure-skip-tlog-verify blob, using an expired cert, it says it succeeds.

#2504: as it happens, I was planning to add that one on top of the previous request.

If you want to urgently backport the other change, feel free to; I’m afraid I’m done for the week.

@mtrmac
Copy link
Contributor Author

mtrmac commented Dec 2, 2022

This PR is now based on top of #2504 (and includes an explicit merge commit). So I’m marking it as a draft to direct reviews and conversations to the #2504 subset for now.

@mtrmac mtrmac marked this pull request as draft December 2, 2022 20:24
@haydentherapper haydentherapper mentioned this pull request Dec 5, 2022
3 tasks
asraa pushed a commit that referenced this pull request Dec 7, 2022
Summary
This is a subset of #2482 as requested in #2482 (comment) (a bit larger than that):

User-observable: verify RFC 3161 timestamps, and Rekor data, first, before processing certificates. Conceptually we need to do that because certificate processing requires a timestamp (although that’s not yet what this PR finishes doing). This can affect user-observable error messages, which can now complain about untrusted signed Rekor data instead of untrusted signed certificates.
Behavior change: If there is a certificate, always verify it against some timestamp (in particular, the current time, if there is neither a RFC 3161 timestamp nor any Rekor data. (This is more than was asked — I included it because the current behavior is such a surprise for callers, and because always verifying actually makes the code simpler. I can split that into some future PR, and add an explicit “are we dealing with Rekor” condition instead, to preserve the current behavior.)
Code consolidation: verifyInternal now has one place (although not yet the correct place) to do certificate expiry checks, instead of that happening in VerifyRFC3161Timestamp
See #2482 for previous discussion, and the general rationale of only using data after it is verified (which is not what this PR really does, but it’s a prerequisite).

Release Note
cosign verify-blob --cert-chain … --certificate … --skip-tlog-verify now requires the leaf certificate to not be expired.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Co-authored-by: Hayden B <hblauzvern@google.com>
@haydentherapper
Copy link
Contributor

We've merged the initial PR. Note that there have been a lot of changes in verify.go, so you'll likely have conflicts.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
... to track the trust status of the Rekor bundle

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
And validate the public key first, the signature second, just for
consistency with the flow of trust (although that really doesn't
mattere here, we want all of (key, signature, payload) to match).

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
…61 timestamp

Reorganize the certificate expiry checks, so that we check against
the accepted timestamps if any, and only fall back to current time
if there is no data.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Use certWithUnverifiedExpiry , which we already have,
and which we care about.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Finally, only create the verifier based on an actually acceptable certificate,
instead of creating it first and then hoping not to forget to validate preconditions.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
... to document the individual stages.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
@mtrmac
Copy link
Contributor Author

mtrmac commented Dec 7, 2022

Sure, rebased.

Do you have a preference for a subsection to propose next? Some options:

  • Pure variable renames
  • The certificate validation functions, leading up to the split of validateCertIssuanceAndSubject from ValidateAndUnpackCert
  • The reordering of VerifyBundle (probably smallest)

@haydentherapper
Copy link
Contributor

If you want quick reviews, we can start with the smallest, reordering VerifyBundle, then var renames?

@github-actions
Copy link

github-actions bot commented Jan 7, 2023

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions
Copy link

This PR was closed because it has been stalled for 10 days with no activity.

@github-actions github-actions bot closed this Jan 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants