Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1951203: Allow users to set a limit on ICSP file size #818

Merged
merged 1 commit into from Jun 11, 2021

Conversation

awgreene
Copy link
Contributor

@awgreene awgreene commented May 5, 2021

No description provided.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 5, 2021
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: awgreene
To complete the pull request process, please assign ecordell after the PR has been reviewed.
You can assign the PR to them by writing /assign @ecordell in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@awgreene awgreene force-pushed the split-large-icsp branch 4 times, most recently from 303e11e to 6b8f059 Compare May 5, 2021 21:54
pkg/cli/admin/catalog/mirror.go Outdated Show resolved Hide resolved
pkg/cli/admin/catalog/mirror.go Outdated Show resolved Hide resolved
pkg/cli/admin/catalog/mirror.go Outdated Show resolved Hide resolved
@awgreene awgreene force-pushed the split-large-icsp branch 4 times, most recently from 7b6fc2d to da6c904 Compare May 6, 2021 20:23
@awgreene awgreene changed the title WIP: Split large icsp Allow users to set a limit on ICSP file size May 6, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 6, 2021
@kevinrizza
Copy link
Member

Aside from test failures, this lgtm

@awgreene awgreene force-pushed the split-large-icsp branch 5 times, most recently from 0458848 to f5e7b34 Compare May 7, 2021 15:55
@awgreene
Copy link
Contributor Author

awgreene commented May 7, 2021

Unit tests will keep failing until #821 is merged.

@awgreene
Copy link
Contributor Author

awgreene commented May 7, 2021

/test e2e-metal-ipi-ovn-ipv6

@awgreene awgreene changed the title Allow users to set a limit on ICSP file size Bug 1951203: Allow users to set a limit on ICSP file size May 8, 2021
@openshift-ci openshift-ci bot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label May 8, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 8, 2021

@awgreene: This pull request references Bugzilla bug 1951203, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1951203: Allow users to set a limit on ICSP file size

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label May 8, 2021
@awgreene
Copy link
Contributor Author

awgreene commented May 8, 2021

/bugzilla refresh

@openshift-ci openshift-ci bot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label May 8, 2021
@soltysh
Copy link
Member

soltysh commented Jun 10, 2021

so that if someone needs it, we have an answer for them instead of "sorry, we didn't consider that and you'll have to wait for a fix". it's an escape hatch.

That's why I'm pushing towards fixing the root cause of generating this many ICSP entries over exposing a flag which is confusing and might introduce a set of other problems.

@bparees
Copy link
Contributor

bparees commented Jun 10, 2021

That's why I'm pushing towards fixing the root cause of generating this many ICSP entries over exposing a flag which is confusing and might introduce a set of other problems.

it doesn't matter why it happened. Whether or not there is another bug, it is perfectly valid to fix this command so that it cannot/does not generate ICSPs which are too big to be applied to the cluster.

So we can get back to you on "why did it happen/is there a bug in the generation"(I would assume/hope the OLM team did that investigation before producing this fix, but sure, we can sanity check that) but that is not sufficient reason to block a change that makes the generation logic safer: "we know generated content larger than 1meg is invalid, so break that content up into pieces". We know the limit exists, we should update the command to respect the limit.

And if we agree that it it's reasonable to protect ourselves from etcd limits, then the next question is, "should we also make that protection configurable" for which i again say "i can think of no good reason not to give ourselves that escape hatch. The addition of the flag(which has a sane default value) does not meaningfully impact users (oc adm catalog mirror has 14 flags already, many of which are not "normally" needed)"

for key := range registryMapping {
icsp.Spec.RepositoryDigestMirrors = append(icsp.Spec.RepositoryDigestMirrors, operatorv1alpha1.RepositoryDigestMirrors{
Source: key,
Mirrors: []string{registryMapping[key]},
})
y, err := yaml.Marshal(icsp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@awgreene out of curiousity, did this add any significant processing time? marshalling the icsp once for every mapping entry when there are hundreds/thousands?

Copy link
Contributor Author

@awgreene awgreene Jun 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can run a performance check but I did not see a significant change when running the tool locally with ~600 entries.

@awgreene
Copy link
Contributor Author

Thanks for all the conversation @bparees and @soltysh. The crux of the conversation seems to focus primarily around this point:

That's why I'm pushing towards fixing the root cause of generating this many ICSP entries over exposing a flag which is confusing and might introduce a set of other problems.

The oc adm mirror catalog is generating the correct number of ICSPs. When generating an ICSP for a catalog with the --icsp-scope flag set to repository a single operator may account for multiple
entries because of the "Related Images" they define which typically exist in their own repository. For context, I've seen a single operator define over 50 Related Images.

I tested the existing oc command generated from the master branch against the 4.8 RH Catalog to better understand the current size of the generated ICSP using the following command:

$ oc adm catalog mirror --icsp-scope=repository registry.redhat.io/redhat/redhat-operator-index:v4.8 quay.io/agreene/new-index --manifests-only`

The ICSP is 93857 bytes large and contains 579 entries, this file will continue to grow as more operators are added to the catalog.

To @soltysh's point, users will probably hit the 262144 byte annotation limit, which might make 250000 bytes a more sane default. However, if any other annotations exist on the ICSP (possibly by means
of a mutating webhook or some other controller), I believe users would still desire to set the ICSP limit using the flag introduced in this PR.

@bparees
Copy link
Contributor

bparees commented Jun 11, 2021

To @soltysh's point, users will probably hit the 262144 byte annotation limit, which might make 250000 bytes a more sane default. However, if any other annotations exist on the ICSP (possibly by means
of a mutating webhook or some other controller), I believe users would still desire to set the ICSP limit using the flag introduced in this PR.

currently this PR defaults it to 1,000,000 bytes, right? If we know that a 300,000 byte ICSP will break (because the entire ICSP shows up as an annotation, aiui?), then shouldn't we default it to a 250,000 limit?

In my mind the default should be the lowest value we know will prevent breakage in a standard customer environment. The point of making it user configurable is for the case where a customer environment has higher or lower limits (e.g. they tuned their etcd object size limit).

@awgreene
Copy link
Contributor Author

currently this PR defaults it to 1,000,000 bytes, right? If we know that a 300,000 byte ICSP will break (because the entire ICSP shows up as an annotation, aiui?), then shouldn't we default it to a 250,000 limit?

Correct, I did not know that this annotation size limit existed until I saw @soltysh's review.

In my mind the default should be the lowest value we know will prevent breakage in a standard customer environment. The point of making it user configurable is for the case where a customer environment has higher or lower limits (e.g. they tuned their etcd object size limit).

Agreed.

Problem: It is possible for the `oc adm catalog mirror` command
to generate ICSPs that are greater than 262144 bytes in size.
ICSPs that exceed 262144 bytes are likely to fail when applied
to the cluster if the objec already existed in an early state as
the `kubectl.kubernetes.io/last-applied-configuration` annotation
will likely exceed the 262144 annotation byte limit.

Solution: Introduce a the max-icsp-size flag to the
`oc adm catalog mirror` command, allowing users to specify the
maximum byte size of ICSP files generated by the command. If
an ICSP would exceed this limit, create and begin writting mirrors
to a new ICSP.

The default ICSP limit is 250000 bytes.
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2021
@awgreene
Copy link
Contributor Author

@bparees I set the default to 250000 in the latest version of the PR.

@bparees
Copy link
Contributor

bparees commented Jun 11, 2021

/lgtm

@soltysh if you still have concerns we can discuss on slack tomorrow, i'd like to get this fix merged before 4.8 hits code freeze EOD tomorrow.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2021
@soltysh
Copy link
Member

soltysh commented Jun 11, 2021

And if we agree that it it's reasonable to protect ourselves from etcd limits, then the next question is, "should we also make that protection configurable" for which i again say "i can think of no good reason not to give ourselves that escape hatch. The addition of the flag(which has a sane default value) does not meaningfully impact users (oc adm catalog mirror has 14 flags already, many of which are not "normally" needed)"

My default thinking is we know better than user how to properly limit these, rarely do users need to change that limit. From my experience with oc and kubectl I know that the more flags the more confusion we introduce to the users. I'm very reluctant adding more flags just in case, which is how this sounds.

Having said that, I'll give you a free pass on this one 😄

Copy link
Member

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/retest

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 11, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awgreene, bparees, kevinrizza, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2021
@bparees
Copy link
Contributor

bparees commented Jun 11, 2021

/hold cancel
(i'm taking @soltysh's approval as implicit hold cancelation intent)

Thanks @soltysh, i think it's always a worthwhile discussion when adding more flags/configurability as to whether the added complexity is worth it/necessary.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 11, 2021
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

6 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 9c9a128 into openshift:master Jun 11, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 11, 2021

@awgreene: All pull requests linked via external trackers have merged:

Bugzilla bug 1951203 has been moved to the MODIFIED state.

In response to this:

Bug 1951203: Allow users to set a limit on ICSP file size

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@soltysh
Copy link
Member

soltysh commented Jun 14, 2021

(i'm taking @soltysh's approval as implicit hold cancelation intent)

sorry I forgot 😅 thx

Thanks @soltysh, i think it's always a worthwhile discussion when adding more flags/configurability as to whether the added complexity is worth it/necessary.

💯

icsps := [][]byte{}

for i := 0; len(registryMapping) != 0; i++ {
icsp, err := generateICSP(out, source+"-"+strconv.Itoa(i), maxICSPSize, registryMapping)
Copy link

@orenc1 orenc1 Jul 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@awgreene , is there a reason to add a suffix for the name of the ICSP object we are creating?
That way, the generated object name is different than the name the user requested.

/cc @tiraboschi

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may have another unintended consequence.
If we create an ICSP with name "myname" it will be created with "myname-0"
if we check for an ICSP with name "myname" it will not be there. So if we then create another ICSP with the name "myname" it will clash on "myname-0" already existing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@awgreene , is there a reason to add a suffix for the name of the ICSP object we are creating?
That way, the generated object name is different than the name the user requested.

Good morning @orenc1, thanks for taking a look at this PR. This is an intentional change. The ICSPs generated when mirroring each repository can easily exceed the:

  • Maximum annotation key size when using the oc apply command
  • Maximum ETCD object store size

Using the user provided name would fail to handle either of the situations described above as the generated ICSP would be rejected when applied to the cluster.

this may have another unintended consequence.
If we create an ICSP with name "myname" it will be created with "myname-0"
if we check for an ICSP with name "myname" it will not be there. So if we then create another ICSP with the name "myname" it will clash on "myname-0" already existing.

This is true but I fail to understand how this is an issue. The second attempt to create the ICSP should notify the user that the ICSP already exists and that the ICSP has already been applied to the cluster. Furthermore, if you are concerned about removing the ICSPs, the oc adm catalog mirror --help command provides an example command for cleaning up the ICSPs it has previously generated..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @InbarRose ^^

@openshift-ci openshift-ci bot requested a review from tiraboschi July 20, 2021 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants