Embed RHCOS buildID in update payload, fetch it for installer #987

Closed
cgwalters opened this issue Jan 3, 2019 · 15 comments
Comments

@cgwalters
Member

cgwalters commented Jan 3, 2019

There was a lot of discussion in #757, which is long since closed, and I don't want it to be lost.

One specific proposal I'd like to discuss here is embedding the RHCOS build ID (e.g. 47.201) in the update payload.

In order to do this, as was discussed in the above PR, the installer would need to at least do the equivalent of `skopeo inspect`. A nice benefit of this would be that we'd validate the pull secret up front instead of on the bootstrap node.

A major benefit of this is that conceptually one only needs to worry about two top-level things when speaking of git master: (installer, update payload).

And released installers wouldn't need the special commit to do the RHCOS pinning - it would happen naturally as part of pinning a release payload.

@cgwalters
Member Author

And we'd solve the "RHCOS CI gating problem" if updating the build ID in the release payload were a PR gated by e2e-aws.

@cgwalters
Member Author

(Clearly, if we do this, the installer should also resolve the release image to a by-digest @sha256 reference and pass that to the bootstrap node.)
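For concreteness, here's a minimal sketch (not the installer's actual code) of the "equivalent of `skopeo inspect`" plus by-digest resolution using the containers/image library. The import paths and the example pull spec are illustrative, and any real implementation would plumb through the pull secret via the SystemContext:

```go
// Sketch: resolve an image pull spec (e.g. the release payload) to a
// by-digest reference, roughly what `skopeo inspect` does under the hood.
// Import paths and the example pull spec are illustrative only.
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/containers/image/v5/docker"
	"github.com/containers/image/v5/manifest"
	"github.com/containers/image/v5/types"
)

func resolveDigest(ctx context.Context, pullSpec string, sys *types.SystemContext) (string, error) {
	// docker.ParseReference wants the "//" transport prefix.
	ref, err := docker.ParseReference("//" + pullSpec)
	if err != nil {
		return "", err
	}
	// Opening the image source talks to the registry, so this also exercises
	// the pull secret up front instead of failing later on the bootstrap node.
	src, err := ref.NewImageSource(ctx, sys)
	if err != nil {
		return "", err
	}
	defer src.Close()

	raw, _, err := src.GetManifest(ctx, nil)
	if err != nil {
		return "", err
	}
	dgst, err := manifest.Digest(raw)
	if err != nil {
		return "", err
	}
	// For simplicity this keeps the tag in the result; a real implementation
	// would strip the tag when building the @sha256 reference.
	return pullSpec + "@" + dgst.String(), nil
}

func main() {
	resolved, err := resolveDigest(context.Background(), "registry.svc.ci.openshift.org/openshift/origin-release:v4.0", nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resolved)
}
```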

@wking
Member

wking commented Jan 3, 2019

> A nice benefit of this would be that we'd validate the pull secret up front instead of on the bootstrap node.

Just for the update payload though. There would be no validation of whether your pull secret was authorized to pull the (indirectly) referenced images. Still, checking the pull secret against the update payload is certainly not hurting validation, even if it is incomplete ;).

> A major benefit of this is that conceptually one only needs to worry about two top-level things when speaking of git master: (installer, update payload).

And not even that, since you can use the installer referenced by the update payload as of openshift/origin#21637.

I'm on board with this approach. How did you plan on getting the RHCOS build ID (and channel) into the update payload? I'd prefer if it was via a repository maintained by the RHCOS team, because we want the installer's master to continue to float with the latest RHCOS, and that would be a noisy stream of machine-generated PRs ;).

@cgwalters
Member Author

OK, openshift/origin#21637 is... confusing me a lot. I understand the Hive connection somehow, but for the ("non-Hive"?) install path there are no plans to switch away from downloading a binary or building git master, right?

> How did you plan on getting the RHCOS build ID (and channel) into the update payload?

Two options I see:

I think my vote is the former; this leaves us in a place where e.g. a broken cri-o will stall release image promotion, but I think that's OK for now. The RHCOS team can invest more in better CI even before things land into git master for us after that.

@cgwalters
Member Author

If we're agreed on this, then the next step is to discuss implementation architecture on the installer side a bit - I'm willing to try helping here.

A dependency on the containers/image library would be OK as far as we know?

@wking
Member

wking commented Jan 3, 2019

> ... for the ("non-Hive"?) install path there are no plans to switch away from downloading a binary or building git master, right?

It's a bit fuzzy for me, but I'm still planning on cutting installer releases. While we're doing that, users can do any of:

  • Use a released installer binary (or build from a release tag with its pinned dependencies) to launch clusters pinned to that baked-in update payload (and RHCOS).
  • Use an installer image referenced from an update payload to launch a cluster from that same update payload (and, once we fix this issue, with the RHCOS it references as well).
  • Use an installer built from master to launch the most recently promoted update payload and the most recently promoted RHCOS. With this approach, it doesn't really matter if the installer pulls "the most recently promoted RHCOS" information from the update payload (as in this issue) or some RHCOS-specific API (as we currently do).

And with all of these approaches, users can go through some hoop-jumping to override the update payload or RHCOS build if they want to test a change (like we do in CI).

> Two options I see:
>
> I think my vote is the former...

We certainly don't want to bury openshift/os in RHCOS-bump PRs. [edit: I'd misread this as openshift/origin. I'm fine burying openshift/os in RHCOS-bump PRs ;).]

> ... this leaves us in a place where e.g. a broken cri-o will stall release image promotion...

A third option would be to have a separate repository (recycling openshift/os?) that builds this into the release payload just like all of the other repositories getting built into the release payload. That has the "one more repo to manage" downside but the "breaking RHCOS-bump just blocks one repo (vs. the whole update payload promotion stream)" upside. [edit: Why would openshift/os PRs stall release-image promotion? The bumping PR should be blocked from merging by CI, just like PRs to other repositories.]

> The RHCOS team can invest more in better CI even before things land into git master for us after that.

Yeah, this would mitigate the promotion-blocking issue.

> A dependency on the containers/image library would be OK as far as we know?

That's fine with me. We already depend on them for Podman/CRI-O, so vendoring them here doesn't increase our overall dependency exposure.

@cgwalters
Member Author

cgwalters commented Jan 3, 2019

> so vendoring them here doesn't increase our overall dependency exposure.

Yeah, the only concern I can think of is that we'd need it to work on OS X at least, which shouldn't be hard - particularly if we had a build tag that entirely disabled the storage backends, since we don't need to unpack the containers.
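For what it's worth, a tiny sketch of the build-tag idea, assuming a hypothetical stub file in whatever package wraps the storage-backend code. The tag name, package, and function are invented for illustration (containers/image also ships stub build tags of its own in a similar spirit):

```go
// storage_stub.go
//
// Hypothetical sketch: when the "include_storage_backend" tag is NOT set,
// this stub is compiled instead of the real containers/storage-backed code,
// so an OS X installer build never pulls in the heavyweight storage
// machinery it doesn't need (we only inspect manifests, we never unpack
// containers).

// +build !include_storage_backend

package imagebackends

import (
	"errors"

	"github.com/containers/image/v5/types"
)

// ParseStorageReference is the stubbed entry point; callers get a clear
// error instead of a dependency on the local container store.
func ParseStorageReference(ref string) (types.ImageReference, error) {
	return nil, errors.New("local container-storage support is disabled in this build")
}
```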

@cgwalters
Member Author

> Use an installer image referenced from an update payload to launch a cluster from that same update payload (and, once we fix this issue, with the RHCOS it references as well).

This is "Use Hive" (and really only Hive) right? We don't have other use cases where people are unpacking the update payload to get an installer, or do we?

@wking
Member

wking commented Jan 3, 2019

> > Use an installer image referenced from an update payload to launch a cluster from that same update payload (and, once we fix this issue, with the RHCOS it references as well).
>
> This is "Use Hive" (and really only Hive), right? We don't have other use cases where people are unpacking the update payload to get an installer, or do we?

I dunno. But does consumer enumeration matter? It's a valid approach with our current and foreseeable future update-payloads, and so are the other two approaches.

@crawford
Contributor

crawford commented Jan 5, 2019

I'm against this idea right now. The long term plan is to move master creation into an actuator (which will mean that all of the machines are created dynamically). Once that happens, I think this will make a lot more sense to tackle.

As things exist today, if the installer creates a master with an old RHCOS image, will the MCO take a pass and get it updated? If not, that needs to be fixed long before we should be discussing this.

@cgwalters
Member Author

> I'm against this idea right now. The long term plan is to move master creation into an actuator (which will mean that all of the machines are created dynamically). Once that happens, I think this will make a lot more sense to tackle.

That seems weird - you're saying "not right now, but it would be good for the future". Why the explicit "not right now" part, then? Complexity? Timing/need to ship what we have?

> As things exist today, if the installer creates a master with an old RHCOS image, will the MCO take a pass and get it updated? If not, that needs to be fixed long before we should be discussing this.

This is openshift/machine-config-operator#183.

I think we need both, though - they're not quite orthogonal, because if we get into a situation where the installer creates a cluster with "bootimage" != "osimage", that would be quite noticeable for both users and CI scenarios: we'd be rebooting all of the nodes (including the master) soon after cluster creation.

@cgwalters
Member Author

cgwalters commented Jan 22, 2019

One thing that's fairly important and would be solved by this: today there's a dependency on the RHCOS release API server, which... isn't really maintained to the level it should be for a production service.

For AWS in particular, all we need is the "AMI JSON"; if we just embedded that in the release payload, the critical path wouldn't have a dependency on the service.

(We still need the service for libvirt, but that's not production. We will also need the service, or something like it, for bare metal/OpenStack, etc.)

Strawman proposal:

`oc adm release new --embed-rhcos=https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/`

Maybe, for maximum convenience for the installer, this data ends up in the container image metadata (not the content payload) so that it doesn't need to be extracted.
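To make the "container metadata" idea concrete, a rough sketch of the installer side, assuming (purely for illustration) that the bootimage data were exposed as a JSON-valued label on the release image's config. The label key, JSON shape, and pull spec are invented; the real format is whatever lands in the release tooling:

```go
// Sketch: read a hypothetical label from the release image's config and
// decode embedded RHCOS bootimage metadata (e.g. per-region AMIs).
// The label key "io.openshift.rhcos-bootimage" and the JSON shape are
// assumptions for illustration only.
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	"github.com/containers/image/v5/docker"
	"github.com/containers/image/v5/types"
)

// rhcosBootimage is a guess at the interesting subset of the "AMI JSON".
type rhcosBootimage struct {
	BuildID string            `json:"buildid"`
	AMIs    map[string]string `json:"amis"` // region -> AMI ID
}

func fetchBootimage(ctx context.Context, releasePullSpec string, sys *types.SystemContext) (*rhcosBootimage, error) {
	ref, err := docker.ParseReference("//" + releasePullSpec)
	if err != nil {
		return nil, err
	}
	img, err := ref.NewImage(ctx, sys)
	if err != nil {
		return nil, err
	}
	defer img.Close()

	// Inspect only needs the manifest and config blobs, not the layers,
	// so nothing has to be extracted from the content payload.
	info, err := img.Inspect(ctx)
	if err != nil {
		return nil, err
	}
	raw, ok := info.Labels["io.openshift.rhcos-bootimage"]
	if !ok {
		return nil, fmt.Errorf("release image %s has no bootimage label", releasePullSpec)
	}
	var bi rhcosBootimage
	if err := json.Unmarshal([]byte(raw), &bi); err != nil {
		return nil, err
	}
	return &bi, nil
}

func main() {
	bi, err := fetchBootimage(context.Background(), "registry.svc.ci.openshift.org/openshift/origin-release:v4.0", nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("RHCOS %s, us-east-1 AMI %s\n", bi.BuildID, bi.AMIs["us-east-1"])
}
```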

cgwalters added a commit to cgwalters/origin that referenced this issue Feb 9, 2019
For RHCOS we have two things:

 - The "bootimage" (AMI, qcow2, PXE env)
 - The "oscontainer", now represented as `machine-os-content` in the payload

For initial OpenShift releases (e.g. of the installer) ideally
these are the same (i.e. we don't upgrade OS on boot).

This PR aims to support injecting both data into the release payload.

More information on the "bootimage" and its consumption by the
installer as well as the Machine API Operator:
openshift/installer#987

More information on `machine-os-content`:
openshift/machine-config-operator#183
@cgwalters
Member Author

openshift/origin#21998 will land metadata in the release payload sufficient for the installer to use to implement this.

@abhinavdahiya
Contributor

The installer binaries shipped to customers have a fixed release-image digest embedded, and the installer binary itself also has the boot images embedded.

So, in terms of users, they only care about one thing and the rest is already handled. I think this issue can be closed as fixed.

/close

If you think that's not the case, please feel free to reopen.

@openshift-ci-robot
Contributor

@abhinavdahiya: Closing this issue.

In response to this:

> The installer binaries shipped to customers have a fixed release-image digest embedded, and the installer binary itself also has the boot images embedded.
>
> So, in terms of users, they only care about one thing and the rest is already handled. I think this issue can be closed as fixed.
>
> /close
>
> If you think that's not the case, please feel free to reopen.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
