Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement sget blob downloads #1190

Closed
wants to merge 3 commits into from

Conversation

shibumi
Copy link
Contributor

@shibumi shibumi commented Dec 11, 2021

Summary

This commit adds a "blob" subcommand to the sget tool.
The "blob" subcommand provides an easy and convenient way to download
and verify binary large objects (BLOBs).

Breaking Changes: The sget image download command can be now found as subcommand: sget image

Example Output:

❯ ./sget blob https://github.com/shibumi/mnemonic/releases/download/v0.3.2/mnemonic_0.3.2_linux_arm64.tar.gz --signature https://github.com/shibumi/mnemonic/releases/download/v0.3.2/mnemonic_0.3.2_linux_arm64.tar.gz.sig > mnemonic_0.3.2_linux_arm64.tar.gz
Certificate is trusted by Fulcio Root CA
Email: []
URI: https://github.com/shibumi/mnemonic/.github/workflows/goreleaser.yml@refs/tags/v0.3.2
Issuer:  https://token.actions.githubusercontent.com
Verified OK
tlog entry verified with uuid: "7afc36d7e17268f0bf3b1f6e415e304aa96795b8c49963588e15a20225c3fa17" index: 932675

Ticket Link

None

Release Note

breaking change: sget has obtained two new subcommands: image and blob. Blob allows to download and verify BLOBs..

@shibumi shibumi marked this pull request as draft December 11, 2021 10:16
@shibumi
Copy link
Contributor Author

shibumi commented Dec 11, 2021

CC: @dlorenc @mattmoor

@shibumi
Copy link
Contributor Author

shibumi commented Dec 11, 2021

@mattmoor I modified the sget tool a little bit. With this patch it would obtain support for downloading every artifact. IMO, sget should be able to handle both images and blobs.

@shibumi shibumi changed the title feat: implement sget blob downloads WIP: feat: implement sget blob downloads Dec 11, 2021
@shibumi shibumi force-pushed the shibumi/sget-expansion branch 2 times, most recently from 475c171 to bc27fb8 Compare December 11, 2021 11:15
@shibumi
Copy link
Contributor Author

shibumi commented Dec 11, 2021

I have added an e2e-test for the new sget feature... do we want one for sget image as well?

@shibumi shibumi force-pushed the shibumi/sget-expansion branch 2 times, most recently from fb0805e to 845e604 Compare December 11, 2021 11:31
Currently, golangci does not print filenames and line numbers due to some issue with the golangci log format. See also: golangci/golangci-lint-action#119 (comment)

Signed-off-by: Christian Rebischke <chris@shibumi.dev>
This commit adds a "blob" subcommand to the sget tool.
The "blob" subcommand provides an easy and convenient way to download
and verify binary large objects (BLOBs).

Breaking Changes: The sget image download command can be now found as subcommand: sget image

Signed-off-by: Christian Rebischke <chris@shibumi.dev>
@shibumi
Copy link
Contributor Author

shibumi commented Dec 11, 2021

6ce0edd fixes an issue with golangci. It does not print the filename + line number on failure.. this is a known bug and tracked here: golangci/golangci-lint-action#119 (comment)

This commit adds a "blob" subcommand to the sget tool.
The "blob" subcommand provides an easy and convenient way to download
and verify binary large objects (BLOBs).

Breaking Changes: The sget image download command can be now found as subcommand: sget image

Signed-off-by: Christian Rebischke <chris@shibumi.dev>
@shibumi shibumi marked this pull request as ready for review December 11, 2021 12:23
@shibumi shibumi changed the title WIP: feat: implement sget blob downloads feat: implement sget blob downloads Dec 11, 2021
@shibumi
Copy link
Contributor Author

shibumi commented Dec 11, 2021

I think that's ready for review now :)

}

// TODO(shibumi): we might want to export this function or add this to our package?!
func extractCerts(e *models.LogEntryAnon) ([]*x509.Certificate, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to export this function and add it to another package and then re-import the function here?
Right now we are having the code twice (once in the sget code and once in the cosign code).

}

// TODO(shibumi): we might want to export this function or add this to our package?!
func isb64(data []byte) bool {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to export this function and add it to another package and then re-import the function here?
Right now we are having the code twice (once in the sget code and once in the cosign code).

@@ -143,3 +143,4 @@ jobs:
with:
# Required: the version of golangci-lint is required and must be specified without patch version: we always use the latest patch version.
version: latest
args: "--out-${NO_FUTURE}format colored-line-number"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Line fixes the golangci linter... without that line golangci will show errors, but will not show "where" (filename + line numbers) in the github actions workflow, due to golangci's weird log format (Why has Github Workflows a problem with logs starting with "::" at all?! Please god.. don't let this be some other weird issue, we just had log4j a few days ago XD

- uses: actions/setup-go@v2
with:
go-version: '1.17.x'
- name: build cosign and check
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make sense to expand this test with more flag combinations, but I am not sure if this is really necessary.

We should also test the image subcommand.

@shibumi
Copy link
Contributor Author

shibumi commented Dec 11, 2021

I have added a few comments. Let me know what you think :)

Also, I would like to underline that the current implementation only supports auto-lookups via rekor log. IMO, this should be the default behavior, because it's the best way to ensure that the lookup happens via the tlog.

Nevertheless, do we want to add functionality for supporting passing a certificate via CLI? Is this necessary? Opinions?

@dlorenc
Copy link
Member

dlorenc commented Dec 11, 2021

I need to think through this one a bit. The sget behavior today technically can fetch blobs, or any arbitrary files, they just have to be stored in an OCI registry.

This change does a few things:

  • Adds support for fetching objects NOT in an OCI registry
  • Separates the two into subcommands

For OCI storage, we currently allow getting an object by digest or by tag. If you fetch by tags, the object must be signed.

If we extend to non-OCI storage, we should try to keep things as safe as possible. I think to make a download "safe" from an arbitary URL we could require that the user specify a key to verify signatures against, OR a hash. The hash MUST be local in this case, not at a URL.

I like the idea of allowing other ways to fetch things, and encouraging safe behavior, so this overall makes sense.

What do you think of trying to merge the commands back though. To a user, the thing getting downloaded is actually the same, it just differs in how they're retrieved and validated.

@shibumi
Copy link
Contributor Author

shibumi commented Dec 11, 2021

What do you think of trying to merge the commands back though. To a user, the thing getting downloaded is actually the same, it just differs in how they're retrieved and validated.

This is exactly what I thought about at first. I struggled with deciding how we branch into the different code segments. How do we know it's a file and not an OCI registry? I thought about looking for https:// as prefix of the "artifact". What do you think?
Do we want to support local files, too?

EDIT: Ok... OCI registries would start with this prefix as well.. we have to go for another way. Maybe trying to download it as OCI image first, if it fails we fallback to a normal GET request and blob verification?

If we extend to non-OCI storage, we should try to keep things as safe as possible. I think to make a download "safe" from an arbitary URL we could require that the user specify a key to verify signatures against, OR a hash. The hash MUST be local in this case, not at a URL.

I see a problem here, but maybe you have an idea. My assumption was that the URL is safe (HSTS, DNSSEC, etc). If it does not, we are having a little trust problem, because from where would the user get the hash? Very likely from the same website the user gets the blob from, right? Does it really make a difference if we provide --hash <hash of the blob flag? If we use this flag we should toggle to the blob download-mode with it, but I do not know if it's really more secure to rely on a given hash from the same source.

What do you think? Can you explain this point to me? :)

@shibumi
Copy link
Contributor Author

shibumi commented Dec 13, 2021

I think we have a following options for merging the blob and image subcommand again:

  1. We download the artifact and parse the header or the MIME type and look for the vnd oci identifier..
  2. We let the user decide with a command line flag instead of a subcommand
  3. We "detect" the type of artifact via parsing the URL. For example: We could look for the colon in oci registry URLs.
  4. We use the package URL specification (purl): https://github.com/package-url/purl-spec

Every one of those has its own advantages and disadvantages.

@dlorenc
Copy link
Member

dlorenc commented Dec 14, 2021

Would this work for discovery? https://docs.docker.com/registry/spec/api/#api-version-check

You could parse the URL like a docker image one, and then make sure it's served from a registry.

@shibumi
Copy link
Contributor Author

shibumi commented Dec 14, 2021

Would this work for discovery? https://docs.docker.com/registry/spec/api/#api-version-check

Good idea! I will work something out and update this PR.

@shibumi
Copy link
Contributor Author

shibumi commented Dec 15, 2021

Would this work for discovery? https://docs.docker.com/registry/spec/api/#api-version-check
You could parse the URL like a docker image one, and then make sure it's served from a registry.

This does not seem to work:

❯ curl https://ghcr.io/v2
404 page not found

shouldn't I get some JSON response? With docker hub this does not work as well. Only index.docker.io seem to work:

❯ curl https://index.docker.io/v2 -L
{"errors":[{"code":"UNAUTHORIZED","message":"authentication required","detail":null}]}

@dlorenc The alternative is that we just check for :<tag> as suffix in the URL., but as soon as a URL has a colon in it we will get into trouble.

What about using sget <image ref> on default and sget blob <artifact ref> for blobs?

@dlorenc
Copy link
Member

dlorenc commented Dec 17, 2021

@jonjohnsonjr any idea the best way to tell if a URL points to a container image?

@dlorenc
Copy link
Member

dlorenc commented Dec 17, 2021

What about using sget <image ref> on default and sget blob <artifact ref> for blobs?

I don't love this one either because anything is kind of a blob... Images also store blobs...

What about something like sget <ref> --protocol=oci|http|etc Something that kind of sets us up for PURL but maybe doesn't go the entire way? I like the idea of PURL still, but agree with your concern that it would convey we support all the types.

@shibumi
Copy link
Contributor Author

shibumi commented Dec 18, 2021

What about something like sget --protocol=oci|http|etc

this sounds reasonable for me. We could default to oci.

@dlorenc
Copy link
Member

dlorenc commented Dec 18, 2021

I actually wouldn't mind if sget turned into a safe, universal artifact fetcher over time. That's kind of the idea actually, now that I think about it :)

Are there any other projects that fetch artifacts over the full purl spectrum?

@shibumi
Copy link
Contributor Author

shibumi commented Dec 18, 2021

Are there any other projects that fetch artifacts over the full purl spectrum?

I don't think so. There is a go package for validating/creating package URLs. We do not need to support all package definitions from day 0. The list is very long: https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst

Does the OCI definition meets your expectations?

https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#oci

IMO, if we implement purl, we should be consistent. That measn that

$ sget us.gcr.io/dlorenc-vmtest2/readme@sha256:4aa3054270f7a70b4528f2064ee90961788e1e1518703592ae4463de3b889dec

will become:

$ sget pkg:oci/readme@sha256:4aa3054270f7a70b4528f2064ee90961788e1e1518703592ae4463de3b889dec?repository_url=us.gcr.io/dlorenc-vmtest2/readme&tag=latest

TWill package URL be a standard in the future that we can rely on? I am fine with supporting purl, I am just worried that we might implement something that will never "take off". On the other hand: If nobody starts implementing it nobody will implement it, I guess. :D

@shibumi
Copy link
Contributor Author

shibumi commented Dec 18, 2021

@dlorenc FYI: I forked the purl go implementation (it hasn't received an update since 2 years). Might be useful for SPDX as well. Looks like SPDX implements package URL.

@dlorenc
Copy link
Member

dlorenc commented Dec 23, 2021

TWill package URL be a standard in the future that we can rely on? I am fine with supporting purl, I am just worried that we might implement something that will never "take off". On the other hand: If nobody starts implementing it nobody will implement it, I guess. :D

Idk if it'll take off, but it is seeing usage in the SBOM formats and in the SLSA spec. I'm not aware of anything equivalent either.

@shibumi
Copy link
Contributor Author

shibumi commented Dec 23, 2021

Idk if it'll take off, but it is seeing usage in the SBOM formats and in the SLSA spec. I'm not aware of anything equivalent either.

If it's in the SBOM formats and the SLSA spec it should be fine, I guess. I got good feedback for my package-url-go PR: package-url/packageurl-go#28

if it isn't merged until new year I might hack something together with my fork, just for knowing it would work the way we want it to.

@dlorenc
Copy link
Member

dlorenc commented Dec 24, 2021

if it isn't merged until new year I might hack something together with my fork, just for knowing it would work the way we want it to.

I'm fine using your fork if we need to because the other one is stale. The only real risk of a fork is drift, but if nothing is moving upstream we don't have much drift to worry about.

@jonjohnsonjr
Copy link
Contributor

@jonjohnsonjr any idea the best way to tell if a URL points to a container image?

HEAD and check the Content-Type?

@shibumi
Copy link
Contributor Author

shibumi commented Dec 29, 2021

@jonjohnsonjr is the content-type special for an OCI registry? I just checked this with index.docker.io and what I got is content-type: application/json. JSON alone is not enough for identifying an OCI registry.

Anyway: We are playing around with pkg urls as explicit artifact identifier.. this seem to be more promising right now.

@jonjohnsonjr
Copy link
Contributor

If you want to check "is this host a registry?" you can do this:

$ curl -s -I -L https://ghcr.io/v2/ | grep docker
docker-distribution-api-version: registry/2.0

$ curl -s -I -L https://index.docker.io/v2/ | grep docker
docker-distribution-api-version: registry/2.0
www-authenticate: Bearer realm="https://auth.docker.io/token",service="registry.docker.io"

And just check for docker-distribution-api-version.

If you want to check if a URL points to a specific container image you have to jump through some more hoops to transform the image ref to the API call and (sometimes) do the auth handshake:

$ curl -s -I \
  -H "Accept: application/vnd.docker.distribution.manifest.list.v2+json" \
  -H "Authorization: Bearer $(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/debian:pull" | jq -r .token)" \ https://index.docker.io/v2/library/debian/manifests/latest | grep content-type
content-type: application/vnd.docker.distribution.manifest.list.v2+json

@github-actions
Copy link

github-actions bot commented Sep 2, 2022

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions
Copy link

This PR was closed because it has been stalled for 10 days with no activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants