MCO-707: Strip registry transport from os image URL for comparison #3857
@ori-amizur: This pull request references MCO-707 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @sinnykumari
/uncc @cdoern
/retest
/test e2e-hypershift
pkg/daemon/daemon.go
Outdated
@@ -2075,6 +2075,10 @@ func (dn *Daemon) validateOnDiskState(currentConfig *mcfgv1.MachineConfig) error
	return nil
}

func canonizeImageURL(url string) string {
This could use a doc comment (and also "canonicalize" is more canonical here) - canonization is a different thing and I don't think this function quite qualifies for sainthood yet 😄 . But even "canonicalize" would make someone wonder "canonicalize how?" I think.
Further this is going to be a bit fragile because in theory we could (and in fact have a TODO to) change it. So how about combining these two:
func canonizeImageURL(url string) string {
// stripRegistryTransport removes the registry transport prefix, see https://docs.rs/ostree-ext/latest/ostree_ext/container/enum.Transport.html
func stripRegistryTransport(url string) string {
	return strings.Replace(strings.Replace(url, "docker://", "", 1), "registry:", "", 1)
or so?
Could also use a unit test.
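For reference, a minimal sketch of what the renamed helper plus a quick check could look like. This is a variant of the suggestion above, not the code in this PR: it only strips the transport when it is a prefix (the nested strings.Replace would also match in the middle of the string), and the example image names are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// stripRegistryTransport removes a leading registry transport
// ("docker://" or "registry:") from an image URL and returns the
// rest unchanged. URLs without such a prefix are returned as-is.
// Sketch only: other ostree-ext transports are deliberately not handled.
func stripRegistryTransport(url string) string {
	for _, prefix := range []string{"docker://", "registry:"} {
		if strings.HasPrefix(url, prefix) {
			return strings.TrimPrefix(url, prefix)
		}
	}
	return url
}

func main() {
	// Hypothetical inputs, only to exercise the helper.
	for _, in := range []string{
		"docker://quay.io/example/foo@sha256:abc",
		"registry:quay.io/example/foo@sha256:abc",
		"quay.io/example/foo@sha256:abc",
	} {
		fmt.Printf("%s -> %s\n", in, stripRegistryTransport(in))
	}
}
```

In the real tree this would be a table-driven test next to the function rather than a main().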
Do we know that it will always be "docker://", or should we also handle the other transports supported by OSTree?
I think in OpenShift today we effectively hardcode docker:// - i.e. we only support fetching from a registry in osImageURL.
So...yes. Note openshift/os#657 would actually switch things so that the initial state is from an oci-archive, but that's OK because it will today always be older anyways.
(A whole other problem here is that today we deploy as OCI, but when we later push the container image it gets converted to d2s2, which changes the digest...)
yeah, makes sense.
One question, other than that I agree with changes already suggested by Colin.
pkg/daemon/update_test.go
Outdated
assert.Nil(t, err)
assert.False(t, diff.isEmpty())
assert.True(t, diff.osUpdate)
Can we also add a test case where oldConfig has an empty OSImageURL and newConfig has a valid OSImageURL? Something like oldConfig.Spec.OSImageURL = "" and newConfig.Spec.OSImageURL = "quay.io/example/foo@sha256:b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c".
This usually is the case today during initial node bootstrap for a cluster.
Doesn't this already exist on line 222?
ah yes, should have looked at existing test cases.
Also, it would be good to reword the commit message based on the latest changes.
Thanks Ori for working on this. |
Verifying using IPI on GCP
The new osImage is applied and the nodes rebooted.
A new MC is rendered and, when it is applied to the nodes, it is considered already applied, so the nodes are neither rebooted nor drained, as expected.
Nevertheless, we need to be aware that "docker://..." is not the right format for osImageURL, so we are accepting a wrong format in osImageURL when the image is the same as the one already present in the cluster. IMHO it's not a big deal, but a customer could wonder why "docker://" is accepted in the MCs' osImageURL only sometimes.
Regarding agent-install, we can't verify this PR using agent-install until openshift/os#657 is merged. Is that right?
/cc @rioliu-rh
/hold |
Holding this PR per Sergio's comment. There is another issue about URL format validation if we create an MC to update the OS image with a new one without a transport prefix. Sergio can provide more details about it.
Hmm yes. I think we should actually continue to reject So this PR started with:
Let's try to identify more precisely the code flows involved here. I'm actually a bit confused here because I thought we were parsing the output from And then we sort of hackily scrape that off... Ohh, wait hmm...I think I may understand now! Yes, the bug is in that code in the MCO that's incorrectly parsing ostree-container image references. When we use
So the core bug is here
Now an immediate practical problem is that the code for parsing these strings today lives in Rust, not Go. (And even for using Rust, we don't really want to vendor all of ostree-rs-ext, which is its own distinct bug). Here's what I'd propose:
I can also look at "canonicalizing" |
This is needed in the MCO, see openshift/machine-config-operator#3857
So next we need #3916 |
#3916 has been merged, so we should be able to use bootedDeployment.RequireContainerImage() to get bootedOSImageURL without worrying about the transport prefix.
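For illustration only, a rough sketch of what the comparison can look like once the ostree container image reference is parsed instead of scraped. parseBootedImage and osImageChanged are hypothetical names, not the rpm-ostree client API in the MCO, and RequireContainerImage() may behave differently. The sketch assumes only two reference forms, ostree-unverified-registry:<image> and ostree-unverified-image:<transport><image>, with a registry transport; anything else is rejected.

```go
package main

import (
	"fmt"
	"strings"
)

// parseBootedImage is a hypothetical helper that extracts the bare image
// (no signature-verification prefix, no transport) from an ostree
// container image reference. Only registry transports are accepted.
func parseBootedImage(ref string) (string, error) {
	switch {
	case strings.HasPrefix(ref, "ostree-unverified-registry:"):
		// The registry transport is implied by this form.
		return strings.TrimPrefix(ref, "ostree-unverified-registry:"), nil
	case strings.HasPrefix(ref, "ostree-unverified-image:"):
		rest := strings.TrimPrefix(ref, "ostree-unverified-image:")
		for _, transport := range []string{"docker://", "registry:"} {
			if strings.HasPrefix(rest, transport) {
				return strings.TrimPrefix(rest, transport), nil
			}
		}
		return "", fmt.Errorf("unsupported transport in %q", ref)
	default:
		return "", fmt.Errorf("unrecognized image reference %q", ref)
	}
}

// osImageChanged reports whether the booted image differs from the desired
// osImageURL, which is expected to carry no transport prefix.
func osImageChanged(bootedRef, desiredOSImageURL string) (bool, error) {
	booted, err := parseBootedImage(bootedRef)
	if err != nil {
		return false, err
	}
	return booted != desiredOSImageURL, nil
}

func main() {
	// Hypothetical values, only to exercise the helpers.
	changed, err := osImageChanged(
		"ostree-unverified-registry:quay.io/example/foo@sha256:abc",
		"quay.io/example/foo@sha256:abc",
	)
	fmt.Println(changed, err)
}
```

Because the desired osImageURL is compared as-is, a value such as "docker://quay.io/..." never matches the parsed booted image, which keeps the wrong format from being silently accepted.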
/retest |
@ori-amizur: This pull request references MCO-707 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set. In response to this:
…erence for booted OS image URL
One of the conditions for rebooting after firstboot is that the os image URLs differ. To compare the URLs correctly, the container image reference and transport have to be stripped from the booted OS image URL. This commit uses a recent change in the rpm-ostree client to do just that.
LGTM |
Thanks for this!
/unhold |
Hello! @sinnykumari this PR depends on openshift/os#657. We need that PR to be merged before we can execute the full verification here. We have verified that the new change does not have the problem reported above, and that it fails properly when we use osImage values with "docker://". The rest of the verification (agent-installer skipping the first boot) will be executed once the PR linked above is merged.
Thanks Sergio for testing it. Agreed, it is difficult to test the complete functionality without openshift/os#657. The current testing should be sufficient for the PR to get merged, as it doesn't break any existing functionality.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cgwalters, ori-amizur, sinnykumari The full list of commands accepted by this bot can be found here. The pull request process is described here
@ori-amizur: all tests passed!
One of the conditions for rebooting after firstboot is that the os image
URLs differ. To compare the URLs correctly, the container image
reference and transport have to be stripped from the booted OS image URL.
This commit uses a recent change in the rpm-ostree client to do just
that.
- What I did
Strip registry transport from os image URLs before checking if reboot is needed.
- How to verify it
- Description for the changelog
Strip registry transport from OS Image URLs for comparison