Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCO-707: Strip registry transport from os image URL for comparison #3857

Merged
merged 1 commit into from Sep 26, 2023

Conversation

ori-amizur
Copy link
Contributor

@ori-amizur ori-amizur commented Aug 15, 2023

One of the conditions to reboot after firstboot is different os image
URLs. In order to correctly compare the URLs, the booted OS image URL
has to be stripped from container image reference and transport.
This commit utilizes a recent change in rpm-ostree client to do just
that.

- What I did
Strip registry transport from os image URLs before checking if reboot is needed.
- How to verify it

  1. Regression
  2. Verify if URLs are considered equal when booted os URL is with a registry transport, and the required is without.
    - Description for the changelog

Strip registry transport from OS Image URLs for comparison

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 15, 2023
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Aug 15, 2023

@ori-amizur: This pull request references MCO-707 which is a valid jira issue.

In response to this:

When comparing os image URLs in or to decide if reboot is needed, sometimes the URL starts with a scheme (i.e docker://), and sometimes without. In order to compare correctly, the scheme is removed before comparison is done.

- What I did
Canonized os image URLs before checking if reboot is needed.
- How to verify it

  1. Regression
  2. Verify if URLs are considered equal when booted os URL is with scheme, and the required is without.
    - Description for the changelog

Canonize OS Image URLs for comparison

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ori-amizur
Copy link
Contributor Author

/assign @sinnykumari

@ori-amizur
Copy link
Contributor Author

/uncc @cdoern

@openshift-ci openshift-ci bot removed the request for review from cdoern August 15, 2023 10:07
@ori-amizur
Copy link
Contributor Author

/retest

@ori-amizur
Copy link
Contributor Author

/test e2e-hypershift

1 similar comment
@ori-amizur
Copy link
Contributor Author

/test e2e-hypershift

@@ -2075,6 +2075,10 @@ func (dn *Daemon) validateOnDiskState(currentConfig *mcfgv1.MachineConfig) error
return nil
}

func canonizeImageURL(url string) string {
Copy link
Member

@cgwalters cgwalters Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a doc comment (and also "canonicalize" is more canonical here) - canonization is a different thing and I don't think this function quite qualifies for sainthood yet 😄 . But even "canonicalize" would make someone wonder "canonicalize how?" I think.

Further this is going to be a bit fragile because in theory we could (and in fact have a TODO to) change it. So how about combining these two:

Suggested change
func canonizeImageURL(url string) string {
// stripRegistryTransport removes the registry transport prefix, see https://docs.rs/ostree-ext/latest/ostree_ext/container/enum.Transport.html
func stripRegistryTransport(url string) string {
return strings.Relpace(strings.Replace(url, "docker://", "", 1), "registry:", "", 1))

or so?

Could also use a unit test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know that it will be always "docker://" or we should handle rest of the transport replacement as well that are supported by OSTree ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in OpenShift today we effectively hardcode docker:// - i.e. we only support fetching from a registry in osImageURL.

So...yes. Note openshift/os#657 would actually switch things so that the initial state is from an oci-archive, but that's OK because it will today always be older anyways.

(A whole other problem here is that today we deploy as OCI, but when we later push the container image it gets converted to d2s2, which changes the digest...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, makes sense.

Copy link
Contributor

@sinnykumari sinnykumari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question, other than that I agree with changes already suggested by Colin.

assert.Nil(t, err)
assert.False(t, diff.isEmpty())
assert.True(t, diff.osUpdate)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add a test case where oldConfig has empty OSImageURL and newConfig has a valid OSImageURL. something like oldConfig.Spec.OSImageURL = " " and newConfig.Spec.OSImageURL = ""quay.io/example/foo@sha256:b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c" .

This usually is the case today during initial node bootstrap for a cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this already exist on line 222?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, should have looked at existing test cases.

@sinnykumari
Copy link
Contributor

Also, will be good to reword commit message based on latest changes that we have.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Aug 28, 2023

@ori-amizur: This pull request references MCO-707 which is a valid jira issue.

In response to this:

When comparing os image URLs in or to decide if reboot is needed,
sometimes the URL starts with a prefix (i.e docker:// or registry:), and sometimes
without. In order to compare correctly, the prefix is removed before
comparison is done.

- What I did
Strip registry transport os image URLs before checking if reboot is needed.
- How to verify it

  1. Regression
  2. Verify if URLs are considered equal when booted os URL is with scheme, and the required is without.
    - Description for the changelog

Canonize OS Image URLs for comparison

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Aug 28, 2023

@ori-amizur: This pull request references MCO-707 which is a valid jira issue.

In response to this:

When comparing os image URLs in or to decide if reboot is needed,
sometimes the URL starts with a prefix (i.e docker:// or registry:), and sometimes
without. In order to compare correctly, the prefix is removed before
comparison is done.

- What I did
Strip registry transport from os image URLs before checking if reboot is needed.
- How to verify it

  1. Regression
  2. Verify if URLs are considered equal when booted os URL is with a registry transport, and the required is without.
    - Description for the changelog

Strip registry transport from OS Image URLs for comparison

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ori-amizur ori-amizur changed the title MCO-707: Canonize os image URL for comparison MCO-707: Strip registry transport from os image URL for comparison Aug 28, 2023
@sinnykumari
Copy link
Contributor

Thanks Ori for working on this.
/lgtm
This should get merged once 4.14 branching has happened and we no longer require valid-bz label.

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 28, 2023
@sergiordlr
Copy link

sergiordlr commented Sep 7, 2023

Verifying using IPI on GCP

  1. Create a custom osImage (we used the internal registry)
    image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/tc-54056-new-os-image@sha256:5ccada2081859a95414cd3f6a96a5f871e9ef8b5463ad28e4e8128bfe503faa3

  2. Deploy a mc to use this custom osImage

kind: MachineConfig
apiVersion: machineconfiguration.openshift.io/v1
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: "test-new-os-image"
spec:
  osImageURL: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/tc-54056-new-os-image@sha256:5ccada2081859a95414cd3f6a96a5f871e9ef8b5463ad28e4e8128bfe503faa3"

The new osImage is applied and the nodes rebooted.

  1. Edit the test-new-os-image MC create in step 2 to prepend "docker://" to the osIMage. Like this:
spec:
  osImageURL: "docker://image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/tc-54056-new-os-image@sha256:5ccada2081859a95414cd3f6a96a5f871e9ef8b5463ad28e4e8128bfe503faa3"

A new MC is rendered and, when it is applied to the nodes, it is considered to be already applied and the nodes are not rebooted nor drained, as expected.

Nevertheless, we need to be aware that "docker://..." is not the right format for osImageURL. So we are accepting a wrong format in osImageURL if the image is the same as the one already present in the cluster. Imho, It's not a big deal, but we need to be aware of it. A customer could wonder why "docker://" is accepted in the MCs' osImageURL only sometimes.

Regarding the agent-install, we can't verify this PR using agent-install until openshift/os#657 is merged. Is it right?

/cc @rioliu-rh

@rioliu-rh
Copy link

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 7, 2023
@rioliu-rh
Copy link

rioliu-rh commented Sep 8, 2023

hold this PR according to Sergio's comment, another issue about url format validation, if we create a mc to update os image with a new one w/o transport prefix docker://, the pool will be degraded with message like docker://... is not the right format

Sergio can provide more details about it.

@cgwalters
Copy link
Member

cgwalters commented Sep 8, 2023

Nevertheless, we need to be aware that "docker://..." is not the right format for osImageURL. So we are accepting a wrong format in osImageURL if the image is the same as the one already present in the cluster. Imho, It's not a big deal, but we need to be aware of it. A customer could wonder why "docker://" is accepted in the MCs' osImageURL only sometimes.

Hmm yes. I think we should actually continue to reject docker:// being present in osImageURL. I don't see what we gain by allowing it there.

So this PR started with:

sometimes the URL starts with a prefix (i.e docker:// or registry:), and sometimes without

Let's try to identify more precisely the code flows involved here.

I'm actually a bit confused here because I thought we were parsing the output from rpm-ostree status --json, which has e.g.:
"container-image-reference" : "ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cd29255bd4e6634404ef8e0f4a7f19b4ae5a57a3afc7d67cbe00934ba998d6ac"

And then we sort of hackily scrape that off...

Ohh, wait hmm...I think I may understand now! Yes, the bug is in that code in the MCO that's incorrectly parsing ostree-container image references. When we use ostree container image deploy, it ends up as e.g.:

$ rpm-ostree status --json | jq -r '.deployments[0]."container-image-reference"'
ostree-unverified-image:docker://quay.io/fedora/fedora-coreos:stable

So the core bug is here

tokens := strings.SplitN(bootedDeployment.ContainerImageReference, ":", 2)

Now an immediate practical problem is that the code for parsing these strings today lives in Rust, not Go. (And even for using Rust, we don't really want to vendor all of ostree-rs-ext, which is its own distinct bug).

Here's what I'd propose:

@cgwalters
Copy link
Member

I can also look at "canonicalizing" ostree-unverified-image:docker:// back to ostree-unverified-registry: on the ostree side, which would also implicitly fix things.

@cgwalters
Copy link
Member

@cgwalters
Copy link
Member

PR in coreos/rpmostree-client-go#21

@cgwalters
Copy link
Member

So next we need #3916
and then we can update this PR to execute on the bottom half of #3857 (comment)

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 19, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 19, 2023
@sinnykumari
Copy link
Contributor

#3916 has been merged, we should be good to utilize bootedDeployment.RequireContainerImage() to get bootedOSImageURL without worrying about transport prefix.

@ori-amizur
Copy link
Contributor Author

/retest

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Sep 22, 2023

@ori-amizur: This pull request references MCO-707 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

In response to this:

One of the conditions to reboot after firstboot is different os image
URLs. In order to correctly compare the URLs, the booted OS image URL
has to be stripped from container image reference and transport.
This commit utilizes a recent change in rpm-ostree client to do just
that.

- What I did
Strip registry transport from os image URLs before checking if reboot is needed.
- How to verify it

  1. Regression
  2. Verify if URLs are considered equal when booted os URL is with a registry transport, and the required is without.
    - Description for the changelog

Strip registry transport from OS Image URLs for comparison

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

…erence for booted OS image URL

One of the conditions to reboot after firstboot is different os image
URLs.  In order to correctly compare the URLs, the booted OS image URL
has to be stripped from container image reference and transport.
This commit utilizes a recent change in rpm-ostree client to do just
that.
@sinnykumari
Copy link
Contributor

LGTM
/approve
@sergiordlr This should be ready for qe testing now
@cgwalters PTAL

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

@ori-amizur
Copy link
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 26, 2023
@sergiordlr
Copy link

sergiordlr commented Sep 26, 2023

Hello! @sinnykumari this PR depends on openshift/os#657

We need that PR to be merged before we can execute the verification here.

We have verified that the new change does not have the problem reported above, and it fails properly when we use osImage values with "docker://".

The rest of the verification (agent-installer skipping the first boot) will be executed once the PR linked above is merged.

@sinnykumari
Copy link
Contributor

Thanks Sergio for testing it. Agree it is difficult to test complete functionality without openshift/os#657 . With current testing, it should be sufficient for the PR to get merged as it doesn't break any existing functionalities.
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 26, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 26, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, ori-amizur, sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,sinnykumari]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 30c28f3 and 2 for PR HEAD b19dc30 in total

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 26, 2023

@ori-amizur: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 119b164 into openshift:master Sep 26, 2023
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants