Add expected_hash to DataVolume #1520
Comments
This Stack Overflow answer may provide an elegant way to integrate hashing into our stream readers.
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so with /lifecycle stale
/remove-lifecycle stale
/remove-lifecycle stale

@mhenriks, could you explain here why this is nontrivial to implement?
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so with /lifecycle rotten
/remove-lifecycle rotten
/lifecycle frozen

I want this too :) I think this is essential. For containerDisks this is kind of built in (at least I hope that skopeo does these checks).
Can confirm that this pattern works great. We use it, for instance, here: https://github.com/kubevirt/containerdisks/blob/main/pkg/http/http.go#L38
IMO it is not secure to compute the checksum at t0 and assume it is the same at t1, especially with an http (not https) URL. The only truly secure way to do this is to download the file to scratch space, compute the checksum (which can be done while downloading), and then use the downloaded file. This is definitely something we can do, but it will make certain operations slower (http(s) qcow2 or raw).
Could you explain what you mean? What is the difference between computing the checksum during the download (stream) and first downloading and then computing it? I am not suggesting to first compute the checksum from the remote location and then download the file. Edit: Oh, you are probably talking about directly pushing and releasing it while it streams through. Yes, right. If the target has no toggle which can be triggered after the import to make it usable, then yes, definitely.
Yeah, for example we can currently stream a qcow2 file directly to raw via http. This requires no scratch space. Validating the qcow2 checksum will require downloading the qcow2 file to scratch space.
I am not sure I understand this example. Are you talking about cases where you internally use a tool which requires an http source and does the download itself? Otherwise it seems to me like you could always calculate the checksum while you convert, with a tee reader, or not? You could still fail the import after the full download, once you finally know that the checksum is bad.
nbdkit is used to give CDI one interface to a bunch of files/formats on the other end of a URL. My assumption is that when a checksum is provided, CDI will first download/validate the file and then point nbdkit at the local file rather than at http.
In this case I would probably aim for providing sockets or file descriptors (e.g. pipes) to nbdkit (unless nbdkit supports checksums directly).
If nbdkit does not access the file sequentially, I don't think that we will be able to efficiently compute checksums. And I'm pretty sure it does not access qcow sequentially.
If this is needed, definitely :)
Will a Merkle hash be better than a linear hash? See for example https://listman.redhat.com/archives/libguestfs/2023-February/030908.html, which describes how blkhash can come up with faster hash results by using a Merkle tree.
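The core idea behind such block-based hashing can be sketched as follows. This is a heavily simplified illustration, not the actual blkhash algorithm: each fixed-size block is hashed independently, and the concatenated block digests are hashed into a single root. Because the per-block hashes are independent, they can be computed in parallel, which is where the speedup over a single linear hash comes from.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// blockHash computes a simplified two-level hash in the spirit of blkhash:
// hash each fixed-size block, then hash the concatenation of the block
// digests into one root. Illustrative sketch only.
func blockHash(data []byte, blockSize int) [32]byte {
	root := sha256.New()
	for off := 0; off < len(data); off += blockSize {
		end := off + blockSize
		if end > len(data) {
			end = len(data)
		}
		sum := sha256.Sum256(data[off:end]) // independent per block, parallelizable
		root.Write(sum[:])
	}
	var out [32]byte
	copy(out[:], root.Sum(nil))
	return out
}

func main() {
	data := make([]byte, 1<<20) // 1 MiB of zeros, standing in for an image
	fmt.Printf("root: %x\n", blockHash(data, 64*1024))
}
```

Note that the resulting root depends on the chosen block size, so both ends would have to agree on the parameters for the digests to be comparable.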
Hey @mhenriks, do you think this would make sense in the context of populators?
Created a Jira card to track this issue: https://issues.redhat.com/browse/CNV-31631. Since a single VolumeImportSource can be used for several PVCs, I think this might be more useful for populators.
/kind enhancement
qemu images may be garbled en route from an http server to CDI's target. To identify this condition, many servers (e.g. Fedora) provide the expected checksum of the image.
I would like to state the expected hash in the DataVolume spec, and have the import fail if any of the provided hashes does not match the imported data.
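A hypothetical sketch of what the requested spec could look like (the `checksum` field name, its placement, and its value format are purely illustrative; no such field is implemented):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: fedora-dv
spec:
  source:
    http:
      url: https://example.com/Fedora-Cloud-Base.qcow2
      # Hypothetical field: the import would fail if the downloaded
      # data does not hash to this value.
      checksum: sha256:<expected-digest>
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
```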