baremetal: incorrect checksum used for bootstrap vm with bootstrapOSImage override #2845

Closed
hardys opened this issue Dec 18, 2019 · 12 comments
Labels
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
platform/baremetal: IPI bare metal hosts platform

Comments

@hardys
Contributor

hardys commented Dec 18, 2019

Since #2757 landed it is possible to override the bootstrap image for disconnected installs, and I tried that like this:

platform:
  baremetal:
    bootstrapOSImage: http://192.168.111.1/images/rhcos-43.81.201912110942.0-qemu.x86_64.qcow2.gz?sha256=fb31404bbd8b7cb4726799e0a839799060a496679e5c67b06a17929d757e5e9e

However, although the installer cache code downloads from the expected location, the checksum is not recalculated before being passed to Terraform, so validation fails:

level=info msg="Obtaining RHCOS image file from 'http://192.168.111.1/images/rhcos-43.81.201912110942.0-qemu.x86_64.qcow2.gz?sha256=fb31404bbd8b7cb4726799e0a839799060a496679e5c67b06a17929d757e5e9e'"
level=debug msg="Unpacking file into \"/home/shardy/.cache/openshift-installer/image_cache/59c9306c6a41ee6d900e99b4a0b2697a\"..."
level=debug msg="content type of /home/shardy/.cache/openshift-installer/image_cache/59c9306c6a41ee6d900e99b4a0b2697a is application/x-gzip"
level=error msg="File sha256 checksum is invalid."
level=fatal msg="failed to fetch Terraform Variables: failed to generate asset \"Terraform Variables\": failed to get baremetal Terraform variables: failed to use cached bootstrap libvirt image: Checksum mismatch for /home/shardy/.cache/openshift-installer/image_cache/59c9306c6a41ee6d900e99b4a0b2697a; expected=fb31404bbd8b7cb4726799e0a839799060a496679e5c67b06a17929d757e5e9e found=6a019c55a13c6ff4c6527d8b2c965bdc657bf444258ee7a420694d6f3ab3a8e8"

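Below are the compressed and uncompressed checksums: the "expected" value in the error is the checksum of the compressed file, and the "found" value is the checksum of the uncompressed file. I think that when bootstrapOSImage is specified, the sha256 should be used to validate the downloaded image, and a new checksum should then be calculated for the gunzipped file?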

$ sha256sum /home/dev-scripts/ironic/html/images/rhcos-43.81.201912110942.0-qemu.x86_64.qcow2.gz
fb31404bbd8b7cb4726799e0a839799060a496679e5c67b06a17929d757e5e9e  /home/dev-scripts/ironic/html/images/rhcos-43.81.201912110942.0-qemu.x86_64.qcow2.gz
$ gunzip -c /home/dev-scripts/ironic/html/images/rhcos-43.81.201912110942.0-qemu.x86_64.qcow2.gz | sha256sum
6a019c55a13c6ff4c6527d8b2c965bdc657bf444258ee7a420694d6f3ab3a8e8  -
@hardys
Contributor Author

hardys commented Dec 18, 2019

/label platform/baremetal

@openshift-ci-robot openshift-ci-robot added the platform/baremetal IPI bare metal hosts platform label Dec 18, 2019
@hardys
Contributor Author

hardys commented Dec 18, 2019

@kirankt FYI I found this while testing with openshift-metal3/dev-scripts#867

I'm not yet sure why this isn't playing nicely wrt #2657

@kirankt
Contributor

kirankt commented Dec 18, 2019

@hardys I left a comment earlier in openshift-metal3/dev-scripts#867 that we need to use the uncompressed sha256 hash. It's confusing, but that's what the cache algorithm expects.

@hardys
Contributor Author

hardys commented Dec 18, 2019

OK, thanks for confirming @kirankt. IMHO that is probably a bug. I see it appends the uncompressed hash here:

baseURL += "?sha256=" + meta.Images.QEMU.UncompressedSHA256

It's a counterintuitive interface though: given a URL like this, I would expect the checksum to relate to the file actually downloaded:

level=info msg="Obtaining RHCOS image file from 'http://192.168.111.1/images/rhcos-43.81.201912110942.0-qemu.x86_64.qcow2.gz?sha256=fb31404bbd8b7cb4726799e0a839799060a496679e5c67b06a17929d757e5e9e'"
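
For illustration, here is a minimal Go sketch of how a `?sha256=` parameter could be split off an image URL like the one above. It is not the installer's actual code: `splitChecksum` is a hypothetical name, and the sketch deliberately takes no position on whether the value describes the compressed or the uncompressed file, which is the open question in this issue.

```go
package main

import (
	"fmt"
	"net/url"
)

// splitChecksum is a hypothetical helper (not from the installer). It removes
// the "sha256" query parameter from an image URL and returns the remaining
// URL together with the checksum value.
func splitChecksum(raw string) (string, string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	q := u.Query()
	sum := q.Get("sha256")
	if sum == "" {
		return "", "", fmt.Errorf("no sha256 parameter in %q", raw)
	}
	// Drop the checksum so the remaining URL names only the file to fetch.
	q.Del("sha256")
	u.RawQuery = q.Encode()
	return u.String(), sum, nil
}

func main() {
	// Shortened, made-up URL in the same shape as the one in the log line.
	imageURL, sum, err := splitChecksum("http://192.168.111.1/images/example.qcow2.gz?sha256=abc123")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("download:", imageURL)
	fmt.Println("checksum:", sum)
}
```

The URL alone does not say which form of the file the checksum covers, so whatever code consumes the pair has to make that choice explicitly.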

@RobertKrawitz, hi, do you have any thoughts on this? Would it be reasonable to rework the cache code to checksum the download, uncompress it, and then provide the uncompressed checksum as input, e.g. to Terraform?

Perhaps you already looked into that and it'd be good to save some time if we can share any experiences you have there, thanks! :)

@RobertKrawitz
Contributor

We discussed doing that, but decided that it's best to not have to store (temporarily) two copies of the image on disk.

If a .gz file is explicitly requested, perhaps the code should decompress it on the fly (perhaps calculating the compressed checksum on passthrough), but only land the uncompressed file on disk.
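
A rough Go sketch of that streaming idea, assuming a gzip source and Go 1.16+ for `io.Discard`. The function names are hypothetical and this is not the installer's cache code. The compressed bytes are hashed as they pass through, and only decompressed bytes reach the destination writer.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// decompressAndHash (hypothetical helper) reads gzip data from src, writes
// only the decompressed bytes to dst, and returns the SHA-256 of the
// compressed input and of the decompressed output, in that order.
func decompressAndHash(src io.Reader, dst io.Writer) (string, string, error) {
	compressed := sha256.New()
	uncompressed := sha256.New()

	// Every byte the gzip reader pulls from src is also fed to the hasher.
	tee := io.TeeReader(src, compressed)
	zr, err := gzip.NewReader(tee)
	if err != nil {
		return "", "", err
	}
	defer zr.Close()

	if _, err := io.Copy(io.MultiWriter(dst, uncompressed), zr); err != nil {
		return "", "", err
	}
	// Drain anything left unread so the digest covers the whole input.
	if _, err := io.Copy(io.Discard, tee); err != nil {
		return "", "", err
	}
	return hex.EncodeToString(compressed.Sum(nil)),
		hex.EncodeToString(uncompressed.Sum(nil)), nil
}

// gzipBytes builds a small in-memory gzip stream for the demo below.
// Writes to a bytes.Buffer do not fail, so errors are ignored here.
func gzipBytes(data []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(data)
	zw.Close()
	return buf.Bytes()
}

// hashGzipBytes runs decompressAndHash over an in-memory gzip stream and
// discards the decompressed output.
func hashGzipBytes(gz []byte) (string, string, error) {
	return decompressAndHash(bytes.NewReader(gz), io.Discard)
}

func main() {
	compressedSum, uncompressedSum, err := hashGzipBytes(gzipBytes([]byte("hello")))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("compressed:  ", compressedSum)
	fmt.Println("uncompressed:", uncompressedSum)
}
```

In a real cache, dst would be the file on disk. A caller could compare the first digest against the value supplied by the user and hand the second one to downstream consumers, deleting the file if the comparison fails.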

@hardys
Contributor Author

hardys commented Dec 18, 2019

@RobertKrawitz thanks for the additional context, it does seem like the interface would be more intuitive/consistent if we calculated the checksum for the compressed file.

As you say, we could do it on the fly via streaming, e.g. locally I've been doing gunzip -c | sha256sum for similar reasons.

I'm creating a local cache for disconnected installs that includes images used by the installer and other provisioning components deployed via IPI baremetal, and in that case I don't really want to store the uncompressed files, it's better (and faster) to validate the compressed data.
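
As a sketch of what validating the compressed data could look like in Go, without decompressing anything: the helper names below are hypothetical, and the string "hello" stands in for image data.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// sha256Hex streams r through SHA-256 and returns the hex digest.
func sha256Hex(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// verifySHA256 (hypothetical helper) compares the digest of r with the
// expected hex string, ignoring case.
func verifySHA256(r io.Reader, expected string) error {
	got, err := sha256Hex(r)
	if err != nil {
		return err
	}
	if !strings.EqualFold(got, expected) {
		return fmt.Errorf("checksum mismatch: expected=%s found=%s", expected, got)
	}
	return nil
}

// verifyString is a small convenience wrapper used by the demo below.
func verifyString(s, expected string) error {
	return verifySHA256(strings.NewReader(s), expected)
}

func main() {
	// SHA-256 of the literal string "hello", a stand-in for a cached image.
	const want = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
	if err := verifyString("hello", want); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("checksum OK")
}
```

For a file in the cache, the reader would come from os.Open on the .gz path, so the check reads the compressed file once and writes nothing.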

@RobertKrawitz
Contributor

RobertKrawitz commented Dec 18, 2019

Last time I tried it, the machine would not boot the compressed image, so we stored it uncompressed and calculated the checksum on that basis.

If you do make this kind of change, please ensure that libvirt and OpenStack function correctly.

@abhinavdahiya
Contributor

> @RobertKrawitz thanks for the additional context, it does seem like the interface would be more intuitive/consistent if we calculated the checksum for the compressed file.
>
> As you say, we could do it on the fly via streaming, e.g. locally I've been doing gunzip -c | sha256sum for similar reasons.
>
> I'm creating a local cache for disconnected installs that includes images used by the installer and other provisioning components deployed via IPI baremetal, and in that case I don't really want to store the uncompressed files, it's better (and faster) to validate the compressed data.

I don't want to store compressed data, because it's not useful to any of the consumers: libvirt and OpenStack can't use the compressed format, so it's most useful to store and cache the uncompressed files.

As for faster, caching means you don't have to do it every time, so I don't see the benefit there.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 18, 2020
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 17, 2020
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot
Contributor

@openshift-bot: Closing this issue.

In response to this:

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting /reopen.
> Mark the issue as fresh by commenting /remove-lifecycle rotten.
> Exclude this issue from closing again by commenting /lifecycle frozen.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

6 participants