Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fix fetching updated targets from TUF root (based off of #1921) #1932

Closed
wants to merge 9 commits into from

Conversation

vaikas
Copy link
Contributor

@vaikas vaikas commented May 30, 2022

Summary

This change refactors the cosign TUF client and hopefully aims to simplify the logic behind embedded TUF repository and targets, and the writeable on-disk/in-memory repository and targets. Roughly, I structured this so that the cosign TUF client contains (1) A client.LocalStore to hold TUF repository metadata updates and (2) A targetImpl to hold downloaded and cached target files.

Previously, the two were "out of sync" -- when starting with an embedded workflow, we would create an embedded root repository and an embedded targetImpl. However, the embedded targetImpl ONLY retrieved embedded targets, not the updated ones!

b, err := embeddedRootRepo.ReadFile(path.Join("repository", "targets", p))

Embedded workflows never used their updated targets, despite writing them to the underlying storage, and failed to retrieve new targets referenced by updated metadata.

Conceptually now, there are 4 main objects in the TUF client:

  1. An memoryCache targetImpl: A map that stores target files by name.
  2. A diskCache targetImpl: Stores targets on disk.
  3. An embeddedWrapper targetImpl: This wraps the underlying (memory, disk) cache. By default it Gets embedded targets. If any targets were downloaded and Set, then Get transfers to retrieving from the underlying cache.
  4. An embeddedLocalStore that wraps either a MemoryLocalStore or FileLocalStore. Similar to above, by default it gets embedded repo metadata, until any new metadata needed to be downloaded and set. Then any GetMeta operations get the cached metadata.

Other:

  • To make testing easier, interfaced the default embedded repo and default remote mirror.
  • Tested the addition of a new target in an update. The same test at HEAD fails to find the target.
  • I tested this against a binary of cosign that pointed to the default mirror as brokenv3's GCS bucket and it successfully found the new target fulcio_interemediate_v1.crt.pem

If this design looks good, I will continue to add testing and clean-up test code. It's a pain to manually write out TUF updates, so I'd like to unify those functions and also add testing for consistent snapshotting (note brokenv3 enabled that so I'm 90% sure that works with this fix).

Ticket Link

Fixes #1899

Release Note

* bug fix: Fixes bug in retrieving newly added verification material from TUF root.

asraa added 3 commits May 25, 2022 13:23
Signed-off-by: Asra Ali <asraa@google.com>

add comment

Signed-off-by: Asra Ali <asraa@google.com>

update

Signed-off-by: Asra Ali <asraa@google.com>

update

Signed-off-by: Asra Ali <asraa@google.com>

possible fix windows

Signed-off-by: Asra Ali <asraa@google.com>

lint

Signed-off-by: Asra Ali <asraa@google.com>

fix windows maybe

Signed-off-by: Asra Ali <asraa@google.com>

fix close

Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>

update fix

Signed-off-by: Asra Ali <asraa@google.com>

update and add some debug

Signed-off-by: Asra Ali <asraa@google.com>

add debug

Signed-off-by: Asra Ali <asraa@google.com>

 no cache

Signed-off-by: Asra Ali <asraa@google.com>

remove debug

Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@codecov-commenter
Copy link

codecov-commenter commented May 30, 2022

Codecov Report

Merging #1932 (2805823) into main (5f09c42) will decrease coverage by 0.03%.
The diff coverage is 57.25%.

@@            Coverage Diff             @@
##             main    #1932      +/-   ##
==========================================
- Coverage   34.01%   33.97%   -0.04%     
==========================================
  Files         153      153              
  Lines        9977     9974       -3     
==========================================
- Hits         3394     3389       -5     
- Misses       6202     6206       +4     
+ Partials      381      379       -2     
Impacted Files Coverage Δ
cmd/cosign/cli/fulcio/fulcioroots/fulcioroots.go 36.36% <0.00%> (ø)
pkg/cosign/tlog.go 29.12% <18.18%> (-0.59%) ⬇️
pkg/cosign/tuf/client.go 62.07% <66.33%> (+0.39%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5f09c42...2805823. Read the comment docs.

Makefile Outdated
@@ -118,7 +118,7 @@ cross:
golangci-lint:
rm -f $(GOLANGCI_LINT_BIN) || :
set -e ;\
GOBIN=$(GOLANGCI_LINT_DIR) go install github.com/golangci/golangci-lint/cmd/golangci-lint@v1.43.0 ;\
GOBIN=$(GOLANGCI_LINT_DIR) go install github.com/golangci/golangci-lint/cmd/golangci-lint@v1.46.0 ;\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can bump to 1.46.2

Signed-off-by: Ville Aikas <vaikas@chainguard.dev>
@vaikas
Copy link
Contributor Author

vaikas commented May 30, 2022

Looks like the Windows tests are unhappy with this:

--- FAIL: TestGetRekorPubKeys (0.21s)
    tlog_test.go:25: Unexpected error calling GetRekorPubs, expected nil: getting trusted root: open repository\root.json: file does not exist
    tlog_test.go:28: expected 1 or more keys, got 0
Repository initialized
Staged snapshot.json metadata with expiration date: 2022-06-06 08:58:35 +0000 UTC
Staged timestamp.json metadata with expiration date: 2022-05-31 08:58:35 +0000 UTC
Committed successfully

Maybe a path separator issue? Seems to work on everything else just fine.

Signed-off-by: Ville Aikas <vaikas@chainguard.dev>
Signed-off-by: Ville Aikas <vaikas@chainguard.dev>
@@ -380,6 +385,17 @@ func VerifyTLogEntry(ctx context.Context, rekorClient *client.Rekor, e *models.L
rekorPubKeys[keyID] = RekorPubKey{PubKey: pubFromAPI, Status: tuf.Active}
}

rekorPubKeysTuf, err := GetRekorPubs(ctx)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be cleaner to have all the override behaviour in one place, and move L371-386 to GetRekorPubs and have everything there. But, that would mean plumbing RekorClient through to GetRekorPubs, so wasn't sure if we wanted to do that. One thing that we might want to consider is plumbing the RekorClient into the ctx passed in and pull it from there so we wouldn't have to change the signature. I'd be happy to do that if we think that makes sense.

Signed-off-by: Ville Aikas <vaikas@chainguard.dev>
hectorj2f
hectorj2f previously approved these changes May 31, 2022
Copy link
Contributor

@hectorj2f hectorj2f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Signed-off-by: Ville Aikas <vaikas@chainguard.dev>
Signed-off-by: Ville Aikas <vaikas@chainguard.dev>
@dlorenc
Copy link
Member

dlorenc commented May 31, 2022

Nice work @vaikas and @asraa!

When @asraa looks we can just merge this one.

@vaikas vaikas changed the title Trust rekor pub key if so specified. fix: fix fetching updated targets from TUF root (based off of #1921) May 31, 2022
@vaikas
Copy link
Contributor Author

vaikas commented May 31, 2022

I cribbed the PR subject / description from #1921 here in case this is what we end up merging.

Copy link
Contributor

@znewman01 znewman01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works for the time being, appears to fix the relevant bug.

I have some apprehensions about this whole chunk of code generally (that predate this PR), I'm going to file a cleanup bug to come back and address this because this code is pretty important and should be easy to understand/test.

// It should theoretically be safe to do this everywhere - but the files only seem to get mutated on Windows so
// let's only change them back there.
if runtime.GOOS == "windows" {
return bytes.ReplaceAll(b, []byte("\r\n"), []byte("\n")), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did this go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 410d6cf

@znewman01
Copy link
Contributor

Actually can we hold off a day on merging this? I think #1921 is pretty close

Copy link
Contributor

@asraa asraa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that this doesn't solve the problem: Does GetRekorPubs still fail in the e2e test? It seems like the error in getting the TUF root Rekor public key is going "silent" now to allow the test to pass, so I just want to double check the logs and make sure that we could retrieve the TUF keys in that environment

@asraa
Copy link
Contributor

asraa commented May 31, 2022

Does GetRekorPubs still fail in the e2e test?

I don't see "unable to fetch" anymore, so I will assume it's stable for now -- but AI to track/add a test whether verification failure will cause a some path to not close the TUF object and cause the "resource unavailable" error.

@haydentherapper
Copy link
Contributor

This PR does a lot, and I don't want to submit this without submitting #1921 first. We can't rush this large PR through to unblock tests, there's a risk with breaking something else accidentally.

@asraa
Copy link
Contributor

asraa commented May 31, 2022

This PR does a lot, and I don't want to submit this without submitting #1921 first. We can't rush this large PR through to unblock tests, there's a risk with breaking something else accidentally.

Hey! FWIW they're the same PR, so #1921 incorporates the CI fixes from @vaikas

@dlorenc
Copy link
Member

dlorenc commented May 31, 2022

This PR does a lot, and I don't want to submit this without submitting #1921 first. We can't rush this large PR through to unblock tests, there's a risk with breaking something else accidentally.

Hey! FWIW they're the same PR, so #1921 incorporates the CI fixes from @vaikas

Yep - we only need to merge one of them. We do need to get one of them merged though as this is holding up a pretty large release.

@haydentherapper
Copy link
Contributor

Totally fair, but I also want to make sure we take our time reviewing the PR to not introduce other subtle bugs.

@inferno-chromium
Copy link

Some folks are concerned that this might be just hiding a test failure, let's take the time to review this in depth in the original PR #1921. @lukehinds @bobcallaway for another pair of eyes. What is the release timeline, we should be able to make that one.

@dlorenc
Copy link
Member

dlorenc commented May 31, 2022

We can close this one now, the commits got merged in with #1921.

FWIW they were always the same PR as @asraa noted, it was just done to expedite getting the tests fixed because this was blocking the release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Investigate issue in TUF root upgrades
9 participants