how do I check if a layout contains all of the blobs? #838
Comments
I would point you towards I wish I had seen this before you wrote up the PR, sorry! |
Haha! No worries. It was a fun little exercise, if unnecessary. So those in the validate package do the same thing? They pull down all manifests and indexes, and check the accessibility of layers and configs without having to actually pull them all? Will they work for an image based on v1/layout and v1/remote, and pretty much anywhere I get an Image or ImageIndex? |
Those can get expensive, as they actually calculate layer hashes, but I can live with it for now. |
Yep, that's the idea. I use these to "cheat" at test coverage in a lot of places.
Except for this bit. We will hit every byte at least once (sometimes twice), so it's pretty expensive, but if you're doing this on a local disk, it should be pretty fast. For remote, yeah it might be slow. In general, I don't love the idea of a "does this stuff exist?" method because you generally shouldn't have to care about that. The read and write implementations should do everything as lazily-accessed as possible, so you shouldn't have to care. |
Come back to the original use case. I want to check if everything is there, before pulling remotely. Or because I want to check before going offline. |
In this instance, you could just read the index from your layout, then attempt to From earlier:
This would only incur the token handshake and a single manifest GET (assuming everything is already there). For most use cases, I think that's pretty cheap, but I agree there are some scenarios (air-gapped or firewalled environments) where this doesn't work... but in these scenarios, I do think you'd really want to |
Yup, then I am just going to use I am going to close this issue out. |
Oh wait. It is closed! 😂 |
Hmm, there may still be an issue. I am not 100% positive that is a good idea, but it is worth discussing. |
How slow? I'm kind of surprised by that if it's just hitting local disk unless your images are enormous. What exactly are you doing? I can try to reproduce it.
Not currently.
As in something like
Yeah... we don't have a generic way to ask an image "does this layer exist?" -- just "give me the bytes for this layer". |
I was surprised too, but I am certain that it is in the hashing stage. Truth is, the more I think about it, the more it makes sense. If I am validating a local directory, I should be hashing. Validation means "valid", not "I have the files and hope they are valid." So we live with that. If it comes up again as unusually slow, I will raise a separate issue (and attach pprof and a flamegraph as well, to make it useful). |
I am coming back to this one. I find that if it is an index, with several manifests, each of which has several decent sized layers, hashing can take a while. Is there any way to validate existence of all of the parts of an image (and index) without calculating the hash? |
I think I tracked it down. I stepped through it, looked at the source, and compared to typical utilities like I timed a 160MB blob. I get why we do it, we are validating the index, the hashes of manifests, the hashes of layers and configs, and the diffids in the configs. But it does come with a big price. |
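For reference, the per-blob cost being discussed is roughly the following: stream every byte through SHA-256 and compare the result with the digest. This is a minimal standard-library sketch, not the validate package's actual code. `verifyFile` is a hypothetical name, and only `sha256:` digests are handled.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"strings"
)

// verifyFile streams the file at path through SHA-256 and compares the result
// with want, which is expected in "sha256:<hex>" form.
// Hypothetical helper for illustration only.
func verifyFile(path, want string) error {
	alg, wantHex, ok := strings.Cut(want, ":")
	if !ok || alg != "sha256" {
		return fmt.Errorf("unsupported or malformed digest %q", want)
	}
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	h := sha256.New()
	// io.Copy reads the whole file; this is the part that scales with blob size.
	if _, err := io.Copy(h, f); err != nil {
		return err
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != wantHex {
		return fmt.Errorf("digest mismatch: got sha256:%s, want %s", got, want)
	}
	return nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: verify <blob-file> sha256:<hex>")
		return
	}
	if err := verifyFile(os.Args[1], os.Args[2]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("ok")
}
```

Checking a diffid repeats the same kind of pass over the decompressed layer stream, which would account for each layer being read more than once.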
It does everything twice, actually, because we access it through both I think I'd be okay with adding options to these functions to speed things up. I can imagine: An option to skip calculating diffids would skip all of this and this. (There's also this assumption in there that everything is a tarball, which doesn't hold up. We should only do that for certain media types.) Another option would be to add something that does just existence checks by calling |
OK, which do you prefer? I looked at it like this: if I am opening an image to run it, I probably validate hashes and diffids; if I am containerd or a registry checking the input of an image when loaded, I probably only check that the blob (and config and manifest) hashes match what is expected. |
Maybe benchmark something that just skips the diffids, and we can see if that's fast enough? |
I keep running into this. We need some way to do an existence check on a blob that explicitly isn't lazy but doesn't give you all the bytes. This would translate into a HEAD request or stat of a file. |
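For a local layout, the stat half of that idea can be sketched without the library at all. The following is a minimal sketch, not go-containerregistry API. It assumes the standard OCI layout (`index.json` plus `blobs/<alg>/<hex>`) and assumes every entry under `manifests` is itself a manifest or a nested index. The names `descriptor`, `document`, `blobPath` and `findMissing` are made up for illustration. Configs and layers are only stat'ed, so nothing is hashed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// descriptor holds the one field of an OCI content descriptor needed here.
type descriptor struct {
	Digest string `json:"digest"`
}

// document covers both shapes: an index lists "manifests",
// an image manifest lists a "config" and "layers".
type document struct {
	Manifests []descriptor `json:"manifests"`
	Config    *descriptor  `json:"config"`
	Layers    []descriptor `json:"layers"`
}

// blobPath maps "sha256:abc" to <root>/blobs/sha256/abc.
func blobPath(root, digest string) (string, error) {
	alg, hex, ok := strings.Cut(digest, ":")
	if !ok || alg == "" || hex == "" || strings.ContainsAny(digest, `/\`) {
		return "", fmt.Errorf("malformed digest %q", digest)
	}
	return filepath.Join(root, "blobs", alg, hex), nil
}

// findMissing walks raw (an index or a manifest) and returns the digests of
// referenced blobs for which exists reports false. read is called only for
// entries under "manifests", never for configs or layers.
func findMissing(raw []byte, exists func(string) bool, read func(string) ([]byte, error)) ([]string, error) {
	var doc document
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	var missing []string
	// Leaves: config and layers are checked for presence only.
	leaves := doc.Layers
	if doc.Config != nil {
		leaves = append([]descriptor{*doc.Config}, leaves...)
	}
	for _, d := range leaves {
		if !exists(d.Digest) {
			missing = append(missing, d.Digest)
		}
	}
	// Children: manifests and nested indexes must be read to find their parts.
	for _, d := range doc.Manifests {
		if !exists(d.Digest) {
			missing = append(missing, d.Digest)
			continue
		}
		child, err := read(d.Digest)
		if err != nil {
			return nil, err
		}
		sub, err := findMissing(child, exists, read)
		if err != nil {
			return nil, err
		}
		missing = append(missing, sub...)
	}
	return missing, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: checklayout <layout-dir>")
		return
	}
	root := os.Args[1]
	exists := func(digest string) bool {
		p, err := blobPath(root, digest)
		if err != nil {
			return false
		}
		_, err = os.Stat(p) // stat only: no bytes are read or hashed
		return err == nil
	}
	read := func(digest string) ([]byte, error) {
		p, err := blobPath(root, digest)
		if err != nil {
			return nil, err
		}
		return os.ReadFile(p)
	}
	raw, err := os.ReadFile(filepath.Join(root, "index.json"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	missing, err := findMissing(raw, exists, read)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, d := range missing {
		fmt.Println("missing:", d)
	}
}
```

Passing `exists` and `read` as functions keeps the walk independent of where the blobs live, so the same walk could in principle be backed by HEAD requests against a registry instead of `os.Stat`. That remote variant is not shown here.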
I am not happy with this because of all the type wrapping nonsense :/ |
You got there ahead of me! I was going to, but been swamped this week. Thanks! :-) |
Is it possible to somehow try to "resolve", or walk, my local v1 layout from an index that is there and see if all parts are there?
Use case: let's say I have an image as follows. I am using part of
docker.io/library/alpine:3.11
For the above, I have all of the parts from root index through manifests, configs, and layers for linux/amd64 and linux/arm64.
If my local layout directory has some of those, but not linux/arm64 parts, then it might look like this:
How do I check if all of the parts are there without going to docker.io? I can do the following:
and it will go to docker.io, get the index, and then download any missing parts. However, it should be possible to somehow try to "resolve" my local index and see if all parts are there.
As usual, happy to open a PR on it.