Bug 1951203: Allow users to set a limit on ICSP file size #818

awgreene · 2021-05-05T13:11:34Z

No description provided.

openshift-ci-robot · 2021-05-05T13:11:55Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: awgreene
To complete the pull request process, please assign ecordell after the PR has been reviewed.
You can assign the PR to them by writing /assign @ecordell in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

pkg/cli/admin/catalog/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pkg/cli/admin/catalog/mirror.go

kevinrizza · 2021-05-07T12:58:00Z

Aside from test failures, this lgtm

awgreene · 2021-05-07T16:19:45Z

Unit tests will keep failing until #821 is merged.

awgreene · 2021-05-07T18:18:23Z

/test e2e-metal-ipi-ovn-ipv6

openshift-ci · 2021-05-08T13:55:52Z

@awgreene: This pull request references Bugzilla bug 1951203, which is invalid:

expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1951203: Allow users to set a limit on ICSP file size

awgreene · 2021-05-08T13:57:48Z

/bugzilla refresh

soltysh · 2021-06-10T16:48:55Z

so that if someone needs it, we have an answer for them instead of "sorry, we didn't consider that and you'll have to wait for a fix". it's an escape hatch.

That's why I'm pushing towards fixing the root cause of generating this many ICSP entries over exposing a flag which is confusing and might introduce a set of other problems.

bparees · 2021-06-10T16:59:08Z

That's why I'm pushing towards fixing the root cause of generating this many ICSP entries over exposing a flag which is confusing and might introduce a set of other problems.

it doesn't matter why it happened. Whether or not there is another bug, it is perfectly valid to fix this command so that it cannot/does not generate ICSPs which are too big to be applied to the cluster.

So we can get back to you on "why did it happen/is there a bug in the generation"(I would assume/hope the OLM team did that investigation before producing this fix, but sure, we can sanity check that) but that is not sufficient reason to block a change that makes the generation logic safer: "we know generated content larger than 1meg is invalid, so break that content up into pieces". We know the limit exists, we should update the command to respect the limit.

And if we agree that it's reasonable to protect ourselves from etcd limits, then the next question is, "should we also make that protection configurable" for which i again say "i can think of no good reason not to give ourselves that escape hatch. The addition of the flag (which has a sane default value) does not meaningfully impact users (oc adm catalog mirror has 14 flags already, many of which are not "normally" needed)"

bparees · 2021-06-10T20:10:19Z

pkg/cli/admin/catalog/mirror.go

 	for key := range registryMapping {
 		icsp.Spec.RepositoryDigestMirrors = append(icsp.Spec.RepositoryDigestMirrors, operatorv1alpha1.RepositoryDigestMirrors{
 			Source:  key,
 			Mirrors: []string{registryMapping[key]},
 		})
+		y, err := yaml.Marshal(icsp)


@awgreene out of curiosity, did this add any significant processing time? marshalling the icsp once for every mapping entry when there are hundreds/thousands?

I can run a performance check but I did not see a significant change when running the tool locally with ~600 entries.
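
For readers weighing the marshalling cost raised above, here is a minimal sketch of an alternative that serializes each entry once and keeps a running byte count, instead of re-marshalling the whole object on every append. It is not the code in this PR: `mirrorEntry`, `splitBySize`, and `serializedSize` are hypothetical names, and `encoding/json` stands in for the YAML marshaller so the example needs only the standard library.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// mirrorEntry is a hypothetical stand-in for one RepositoryDigestMirrors entry.
type mirrorEntry struct {
	Source  string   `json:"source"`
	Mirrors []string `json:"mirrors"`
}

// serializedSize returns the JSON-encoded size of a group in bytes, or -1 on error.
func serializedSize(group []mirrorEntry) int {
	b, err := json.Marshal(group)
	if err != nil {
		return -1
	}
	return len(b)
}

// splitBySize partitions entries into groups whose JSON array encoding is at
// most limit bytes. Each entry is marshalled exactly once; the running size
// accounts for the enclosing brackets and the separating commas.
func splitBySize(entries []mirrorEntry, limit int) ([][]mirrorEntry, error) {
	const brackets = 2 // the "[" and "]" around every group
	var groups [][]mirrorEntry
	var current []mirrorEntry
	size := brackets
	for _, e := range entries {
		b, err := json.Marshal(e)
		if err != nil {
			return nil, err
		}
		if brackets+len(b) > limit {
			return nil, errors.New("single entry exceeds the size limit")
		}
		cost := len(b)
		if len(current) > 0 {
			cost++ // separating comma
		}
		if size+cost > limit {
			// Close the current group and start a new one with this entry.
			groups = append(groups, current)
			current, size, cost = nil, brackets, len(b)
		}
		current = append(current, e)
		size += cost
	}
	if len(current) > 0 {
		groups = append(groups, current)
	}
	return groups, nil
}

func main() {
	var entries []mirrorEntry
	for i := 0; i < 600; i++ {
		entries = append(entries, mirrorEntry{
			Source:  fmt.Sprintf("registry.example.com/repo-%d", i),
			Mirrors: []string{fmt.Sprintf("mirror.example.com/repo-%d", i)},
		})
	}
	groups, err := splitBySize(entries, 4096)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("groups:", len(groups))
}
```

The running count is exact for compact JSON arrays; a YAML encoding would need its own per-entry overhead, so treat the accounting here as illustrative only.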

awgreene · 2021-06-10T21:56:27Z

Thanks for all the conversation @bparees and @soltysh. The crux of the conversation seems to focus primarily around this point:

That's why I'm pushing towards fixing the root cause of generating this many ICSP entries over exposing a flag which is confusing and might introduce a set of other problems.

The oc adm catalog mirror command is generating the correct number of ICSP entries. When generating an ICSP for a catalog with the --icsp-scope flag set to repository, a single operator may account for multiple entries because of the "Related Images" it defines, which typically exist in their own repositories. For context, I've seen a single operator define over 50 Related Images.

I tested the existing oc command generated from the master branch against the 4.8 RH Catalog to better understand the current size of the generated ICSP using the following command:

$ oc adm catalog mirror --icsp-scope=repository registry.redhat.io/redhat/redhat-operator-index:v4.8 quay.io/agreene/new-index --manifests-only

The ICSP is 93857 bytes and contains 579 entries. This file will continue to grow as more operators are added to the catalog.

To @soltysh's point, users will probably hit the 262144 byte annotation limit, which might make 250000 bytes a more sane default. However, if any other annotations exist on the ICSP (possibly by means of a mutating webhook or some other controller), I believe users would still desire to set the ICSP limit using the flag introduced in this PR.
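
As a rough back-of-the-envelope check on the numbers in this comment, the sketch below estimates how many entries would fit under a given byte limit, assuming the average entry size in the 4.8 catalog sample (93857 bytes across 579 entries) holds for future entries. `estimateCapacity` is a hypothetical helper, and the linear extrapolation is an assumption, not a measurement.

```go
package main

import "fmt"

// estimateCapacity returns roughly how many entries fit under limit bytes,
// assuming the bytes-per-entry ratio of the sample file stays constant.
func estimateCapacity(sampleBytes, sampleEntries, limit int) int {
	return limit * sampleEntries / sampleBytes
}

func main() {
	const sampleBytes, sampleEntries = 93857, 579

	avg := float64(sampleBytes) / float64(sampleEntries)
	fmt.Printf("average bytes per entry: %.1f\n", avg)

	// Proposed default limit from the discussion.
	fmt.Println("entries under 250000 bytes:", estimateCapacity(sampleBytes, sampleEntries, 250000))
	// Annotation size limit cited in the discussion.
	fmt.Println("entries under 262144 bytes:", estimateCapacity(sampleBytes, sampleEntries, 262144))
}
```

Under that assumption the current catalog sits at a bit over a third of the proposed 250000-byte default.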

bparees · 2021-06-11T00:25:29Z

To @soltysh's point, users will probably hit the 262144 byte annotation limit, which might make 250000 bytes a more sane default. However, if any other annotations exist on the ICSP (possibly by means of a mutating webhook or some other controller), I believe users would still desire to set the ICSP limit using the flag introduced in this PR.

currently this PR defaults it to 1,000,000 bytes, right? If we know that a 300,000 byte ICSP will break (because the entire ICSP shows up as an annotation, aiui?), then shouldn't we default it to a 250,000 limit?

In my mind the default should be the lowest value we know will prevent breakage in a standard customer environment. The point of making it user configurable is for the case where a customer environment has higher or lower limits (e.g. they tuned their etcd object size limit).

awgreene · 2021-06-11T03:30:12Z

currently this PR defaults it to 1,000,000 bytes, right? If we know that a 300,000 byte ICSP will break (because the entire ICSP shows up as an annotation, aiui?), then shouldn't we default it to a 250,000 limit?

Correct, I did not know that this annotation size limit existed until I saw @soltysh's review.

In my mind the default should be the lowest value we know will prevent breakage in a standard customer environment. The point of making it user configurable is for the case where a customer environment has higher or lower limits (e.g. they tuned their etcd object size limit).

Agreed.

Problem: It is possible for the `oc adm catalog mirror` command to generate ICSPs that are greater than 262144 bytes in size. ICSPs that exceed 262144 bytes are likely to fail when applied to the cluster if the object already existed in an earlier state, as the `kubectl.kubernetes.io/last-applied-configuration` annotation will likely exceed the 262144-byte annotation limit.

Solution: Introduce the `max-icsp-size` flag to the `oc adm catalog mirror` command, allowing users to specify the maximum byte size of ICSP files generated by the command. If an ICSP would exceed this limit, create a new ICSP and begin writing mirrors to it. The default ICSP limit is 250000 bytes.
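
To illustrate why the default leaves headroom below 262144, here is a minimal sketch of the size check as I understand it. It assumes the limit applies to the summed byte length of all annotation keys and values on the object, and it treats the size of the stored last-applied configuration as roughly the manifest size. `fitsAfterApply` is a hypothetical helper, not part of `oc` or Kubernetes.

```go
package main

import "fmt"

// totalAnnotationSizeLimit is the 262144-byte (256 KiB) cap cited in the thread.
const totalAnnotationSizeLimit = 256 * 1024

// lastApplied is the annotation key named in the commit message.
const lastApplied = "kubectl.kubernetes.io/last-applied-configuration"

// fitsAfterApply reports whether the object's annotations stay within the
// limit once a last-applied-configuration value of manifestBytes is recorded.
// Assumption: the limit counts every key and value, summed over all annotations.
func fitsAfterApply(existing map[string]string, manifestBytes int) bool {
	total := len(lastApplied) + manifestBytes
	for k, v := range existing {
		if k == lastApplied {
			continue // the old value is replaced, not added to
		}
		total += len(k) + len(v)
	}
	return total <= totalAnnotationSizeLimit
}

func main() {
	fmt.Println("250000-byte ICSP, no other annotations:", fitsAfterApply(nil, 250000))
	fmt.Println("300000-byte ICSP, no other annotations:", fitsAfterApply(nil, 300000))

	// Another controller or webhook adding a large annotation eats into the headroom.
	extra := map[string]string{"example.com/injected": string(make([]byte, 13000))}
	fmt.Println("250000-byte ICSP, plus injected annotation:", fitsAfterApply(extra, 250000))
}
```

The third case is the situation described earlier in the thread: with other annotations present, even the default can be too high, which is the argument for keeping the limit configurable.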

awgreene · 2021-06-11T03:35:52Z

@bparees I set the default to 250000 in the latest version of the PR.

bparees · 2021-06-11T04:44:04Z

/lgtm

@soltysh if you still have concerns we can discuss on slack tomorrow, i'd like to get this fix merged before 4.8 hits code freeze EOD tomorrow.

soltysh · 2021-06-11T11:12:44Z

And if we agree that it's reasonable to protect ourselves from etcd limits, then the next question is, "should we also make that protection configurable" for which i again say "i can think of no good reason not to give ourselves that escape hatch. The addition of the flag (which has a sane default value) does not meaningfully impact users (oc adm catalog mirror has 14 flags already, many of which are not "normally" needed)"

My default thinking is that we know better than the user how to properly limit these; users rarely need to change that limit. From my experience with oc and kubectl I know that the more flags we add, the more confusion we introduce for users. I'm very reluctant to add more flags just in case, which is how this sounds.

Having said that, I'll give you a free pass on this one 😄

soltysh

/approve
/retest

openshift-ci · 2021-06-11T11:13:29Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awgreene, bparees, kevinrizza, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [soltysh]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

bparees · 2021-06-11T13:51:41Z

/hold cancel
(i'm taking @soltysh's approval as implicit hold cancelation intent)

Thanks @soltysh, i think it's always a worthwhile discussion when adding more flags/configurability as to whether the added complexity is worth it/necessary.

openshift-bot · 2021-06-11T13:59:23Z

/retest