New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add image soft prune (api object yes, registry no) #17480
add image soft prune (api object yes, registry no) #17480
Conversation
results from will squash once all feedback on code changes is addressed |
I'm dubious about the flag name |
On Mon, Nov 27, 2017 at 3:07 PM, Jim Minter ***@***.***> wrote:
I'm dubious about the flag name --leave-registry-alone, otherwise looking
good. Does @openshift/cli-review
<https://github.com/orgs/openshift/teams/cli-review> need to sign off?
I did struggle with the name. I'll think about it some more, but certainly
welcome suggestions.
And yes, we need cli-review signoff. I was going to wait to complete the
devexp review first, especially feedback from @bparees since
he is out today. But either now or later works.
… —
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#17480 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ADbadHdNIYotuq0jvCmnz0qr_3uIQKc4ks5s6xZwgaJpZM4QsHRC>
.
|
unit test failure was the recently discovered flake /race condition #17472 |
test/extended/images/prune.go
Outdated
} | ||
}) | ||
|
||
g.It("should prune old image with config but leave registry alone", func() { testSoftPruneImages(oc) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ with config//
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
part of next push
pkg/oc/admin/prune/images.go
Outdated
@@ -162,6 +165,7 @@ func NewCmdPruneImages(f *clientcmd.Factory, parentName, name string, out io.Wri | |||
cmd.Flags().StringVar(&opts.CABundle, "certificate-authority", opts.CABundle, "The path to a certificate authority bundle to use when communicating with the managed Docker registries. Defaults to the certificate authority data from the current user's config file. It cannot be used together with --force-insecure.") | |||
cmd.Flags().StringVar(&opts.RegistryUrlOverride, "registry-url", opts.RegistryUrlOverride, "The address to use when contacting the registry, instead of using the default value. This is useful if you can't resolve or reach the registry (e.g.; the default is a cluster-internal URL) but you do have an alternative route that works. Particular transport protocol can be enforced using '<scheme>://' prefix.") | |||
cmd.Flags().BoolVar(&opts.ForceInsecure, "force-insecure", opts.ForceInsecure, "If true, allow an insecure connection to the docker registry that is hosted via HTTP or has an invalid HTTPS certificate. Whenever possible, use --certificate-authority instead of this dangerous option.") | |||
cmd.Flags().BoolVar(opts.LeaveRegistryAlone, "leave-registry-alone", *opts.LeaveRegistryAlone, "If true, the prune operation will clean up image API objects in master and etcd, but none of the associated content in the registry is removed.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about --metadata-only
?
s/in master and etcd/in etcd/
I'd add a recommendation to run this only in case of enormously large etcd/too many images and errors from pruning blobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recommendation / string change part of next push
yeah, to elaborate on my thought process here, I considered things like --soft-prune
and --metadata-only
but felt like they were terminology that speaks to use and developers/owners of the code, but perhaps wouldn't to users with less familiarity / background
where as --leave-registry-alone
, klunky as it might be speaks of the registry concept which the user should be familiar with.... I also considered --do-not-prune-registry
. At the time, I thought that was not as good, but in hindsight, perhaps it is better than --leave-registry-alone
.
@miminar @jim-minter @bparees thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"If true, the prune operation will clean up image API objects in master and etcd, but none of the associated content in the registry is removed."
->
"If true, the prune operation will clean up image API objects, but the none of the associated content in the registry is removed."
As for the flag name, "--metadata-only", "--skip-registry", "--do-not-prune-registry", or "--skip-blobs" would all be preferred for me, in the end this flag is going to come down to how good the help+doc are(the flag name is never going to convey enough information), so i'd rather have a "clean" name than a clunky one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll go with --skip-registry
Change of first sentence of help to @bparees recommendation part of next push
re: the recommendation from @miminar on when to run, my next push currently has:
This option make most sense when there is either a large amount of data in etcd and/or a large number of images or blobs. You would then use this option to clean up etcd, and use the 'hard prune' administrative task (https://docs.openshift.org/latest/admin_guide/pruning_resources.html#hard-pruning-registry) to clean up the registry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure where you're intending for that text to appear, but i'd suggest it should really only be part of the documentation. It's too wordy to convey in CLI help. (Also direct links to the doc in CLI output are problematic because the OCP version of the CLI should send users to the OCP docs, not the origin docs (and in particular, to the specific OCP docs for their version).
We need a general image pruning recommendation/best-practices guide (should also cover what pruning does (what will be pruned under what conditions, what references are respected, etc) and this would belong there imho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I was proposing putting that text in the CLI help in response to @miminar 's suggestion
I'll remove and just stick to the first sentence.
Of course the best practice guide you mention sounds useful. Based on the exchanges in the card, I had only planned on making the necessary (and smaller in scope) changes specific to the new option in the existing https://docs.openshift.org/latest/admin_guide/pruning_resources.html#pruning-images where a broader best practice guide would be beyond that scope. That said, perhaps this new sentence would be part of that update.
Thoughts?
pkg/oc/admin/prune/images.go
Outdated
@@ -412,6 +417,10 @@ func (o PruneImagesOptions) Run() error { | |||
fmt.Fprintln(o.ErrOut, "Dry run enabled - no modifications will be made. Add --confirm to remove images") | |||
} | |||
|
|||
if o.LeaveRegistryAlone != nil && *o.LeaveRegistryAlone { | |||
fmt.Fprintln(o.Out, "Soft prune enabled - only API objects will be removed. No modifications to the image registry will be made.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, don't call it soft prune. It doesn't make any sense.
LeaveRegistryAlone makes the kube objects pruner more robust, but it still may remove too much, it doesn't make the pruner safer.
We have pruners for kubernetes objects (and I think we should split it in two smaller parts: for image streams and for images) and for a registry storage. Both of them are pretty dangerous, but both of them must be used by customers in some cases. None of them should be emphasised as safer and more delicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on the "it is not safer" point @dmage - while that was in the card's description originally, that discussion point came up during planning as the comment was removed from the card description.
I'm fine for not using "soft prune" ... this does IMO tie in with the discussion / struggle with what the name of the parameter should be (see my comments above where I explicitly did not using --soft-prune
).
I'm open to suggestions on the message wording .... the pattern does not mimic the Dry run enabled - ..
message above, but how ab out simply removing Soft prune enabled -
and start with Only API objects will be...
?
On the splitting things up to an option for only deleting image streams and an option for only deleting images, I don't think I can assess the strength of the potentially decreased usability (i.e. more options) vs. the increased safety of further dissecting things. I would defer to you and the other SMEs here, though think I should wait on @bparees take before committing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't mind keeping the soft prune naming when accompanied by a strong warning that the registry will be left with unprunable objects. Prunable only via hard-prune. I like the idea of hard prune being a complement of soft prune. What should we call the regular prune 😄?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: pruning imagestreams and images separately:
removing images w/o updating imagestreams just creates a bunch of broken content.
removing imagestreams w/o removing images just creates a bunch of hidden/mostly-inaccessible content
so while I appreciate the idea from a perspective of reducing the granularity and potentially making each operation cleaner/safer, I don't think there's a compelling motivation for it right now. Either way it would be a follow on to this effort if we did it at all, so i think we can table it for now.
For the text, i'm ok w/ just removing the soft prune part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10-4 part of next push thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmage when i said "image" i meant the image etcd object, not the registry data. So given that understanding:
removing images w/o updating imagestreams just creates a bunch of broken content.
removing imagestreams w/o removing images just creates a bunch of hidden/mostly-inaccessible content
I didn't suggest this. This breaks consistency ans it's a wrong way.
isn't that exactly what you're suggestring here (remove image etcd objects and imagestream etcd objects independently):
We have pruners for kubernetes objects (and I think we should split it in two smaller parts: for image streams and for images)
and:
removing imagestreams w/o removing images just creates a bunch of hidden/mostly-inaccessible content
Yep, exactly what we are doing in this PR. Though, these hidden content can be reused later.
no, what we're doing in this PR is removing the image+imagestreamtag etcd data but not removing the registry blob data.
So to summarize, there are 3 types of things that get pruned:
- registry blob data
- image etcd definitions
- imagestreamtag references to image etcd definitions
This PR is going to allow 1 and 2/3 to be done independently.
It sounds like you were suggesting 2+3 should also be possible to perform independently, and I am suggesting that doing 2+3 independently doesn't add value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you remove image streams and images from etcd, you create a bunch of hidden/mostly-inaccessible content (I mean blobs on the storage).
Removing images from etcd (2) is just a garbage collection routine (remove images that are not referenced in any image stream), in theory it should be absolutely safe and it can be performed autonomously. It doesn't visible for a regular user, only cluster admins care about it.
On the other side, removing tags from image streams and removing image streams from etcd (3) may affect users: it makes some data inaccessible. And, as the integrated registry may be exposed, only the user may remove items from his image streams in a safe way. We cannot guarantee safety of this prune in general, but it still useful for a lot of users.
Therefore, 2 and 3 provide different safety levels. That's why I suggest to split this functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i can see the value in being able to prune just the un-referenced image etcd objects. we should consider it for the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue that 2 is completely:
- pruner decides to prune image I since it's not referenced
- user re-pushes I to his repository
- pruner deletes I
- user has imagestreamtag pointing to the deleted I
I don't think we can put the burden of image stream pruning on all the cluster users - even if we do, there will always be irresponsible users building many images without pruning (e.g. online). Admins will want to prune their namespaces as well saying "each user can have up to 3 image revisions per istag, up to 5 is, etc...". With istag pruning immunity in place, users can request it for their imagestream(tag)s and be safe from the dangerous pruning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miminar that could only happen if the user re-pushes image I w/ the exact same imageid and does so at the same time that pruning is running and selected image I for pruning because it was not referenced by an imagestream. So a risk, but a pretty small one.
Anyway again, these are all discussion points for the larger "how do we rethink pruning" conversation, but independent of the incremental improvement in this PR.
latest responses to comments pushed in new commit |
contrib/completions/bash/oc
Outdated
@@ -5285,6 +5285,8 @@ _oc_adm_prune_images() | |||
local_nonpersistent_flags+=("--keep-tag-revisions=") | |||
flags+=("--keep-younger-than=") | |||
local_nonpersistent_flags+=("--keep-younger-than=") | |||
flags+=("--leave-registry-alone") | |||
local_nonpersistent_flags+=("--leave-registry-alone") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to regen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep - thanks - will push when make update finishes
unit test failure was #17472 again |
4bd003b
to
bedd992
Compare
/retest |
the gce conformance test failure was because it could not provision the test cluster given the certificate issue Michalis recently noted on the mailing list, I'm going to hold until I see an announcement there before kicking off a redo of the test |
/test extended_conformance_gce |
/cc @openshift/cli-review |
pkg/image/prune/prune.go
Outdated
@@ -118,6 +119,9 @@ type PrunerOptions struct { | |||
PruneOverSizeLimit *bool | |||
// AllImages considers all images for pruning, not just those pushed directly to the registry. | |||
AllImages *bool | |||
// LeaveRegistryAlone allows for pruning of the API Objects in etcd, but bypass any associated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix godoc
/unassign @dcbw |
opened flake #17518 for the extended networking minimal failure |
opened flake #17519 for first conformance install flake |
there was a yum mirror failure on the conformance gce |
/retest |
@openshift/api-review @openshift/cli-review please approve the new argument being added to pruning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
pkg/oc/admin/prune/images.go
Outdated
@@ -162,6 +165,7 @@ func NewCmdPruneImages(f *clientcmd.Factory, parentName, name string, out io.Wri | |||
cmd.Flags().StringVar(&opts.CABundle, "certificate-authority", opts.CABundle, "The path to a certificate authority bundle to use when communicating with the managed Docker registries. Defaults to the certificate authority data from the current user's config file. It cannot be used together with --force-insecure.") | |||
cmd.Flags().StringVar(&opts.RegistryUrlOverride, "registry-url", opts.RegistryUrlOverride, "The address to use when contacting the registry, instead of using the default value. This is useful if you can't resolve or reach the registry (e.g.; the default is a cluster-internal URL) but you do have an alternative route that works. Particular transport protocol can be enforced using '<scheme>://' prefix.") | |||
cmd.Flags().BoolVar(&opts.ForceInsecure, "force-insecure", opts.ForceInsecure, "If true, allow an insecure connection to the docker registry that is hosted via HTTP or has an invalid HTTPS certificate. Whenever possible, use --certificate-authority instead of this dangerous option.") | |||
cmd.Flags().BoolVar(opts.SkipRegistry, "skip-registry", *opts.SkipRegistry, "If true, the prune operation will clean up image API objects, but the none of the associated content in the registry is removed.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My ingliš is not perfect, so I may be easily wrong, but the the between but and none seems superfluous to me.
The current name LGTM from a cli perspective. |
/retest |
/retest Please review the full test history for this PR and help us cut down flakes. |
image registry, conformance install - prob building openshift-ansible /retest |
@@ -1006,9 +1015,11 @@ func (p *pruner) Prune( | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prunableComponents := getPrunableComponents(p.g, prunableImageIDs)
What is the reason in calculating components if they are not used afterwards in case of algorithm.pruneRegistry == true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed / good catch / part of next push
pkg/image/prune/prune.go
Outdated
@@ -1006,9 +1015,11 @@ func (p *pruner) Prune( | |||
|
|||
var errs []error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move errs
after algorithm.pruneRegistry == true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no that causes a compile error since errs
is referenced both within that algorithm.pruneRegistry == true
block as well as after that block
@@ -412,6 +417,10 @@ func (o PruneImagesOptions) Run() error { | |||
fmt.Fprintln(o.ErrOut, "Dry run enabled - no modifications will be made. Add --confirm to remove images") | |||
} | |||
|
|||
if o.PruneRegistry != nil && !*o.PruneRegistry { | |||
fmt.Fprintln(o.Out, "Only API objects will be removed. No modifications to the image registry will be made.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a warning that re-doing the pruning without this option (or with --prune-registry=true
) will not lead to pruning the storage of image registry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/hold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point it's too late to show a warning for the first time, the user won't be asked for any additional confirmations. Should we add this warning to help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were previous comments around keeping the help text concise, so that gives me pause, but yeah if we are going to warn, I see no other place to put it where the warning would do some good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I previously had a reference to hard prune in the help text before and @bparees had me remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that was prior to my understanding that once you run this, hard prune is the only way to finish cleaning up the registry. I'm ok w/ a concise warning in the help text that references the need to run hard prune (not regular prune) to clean up the registry after doing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning to help added and pushed
@@ -162,6 +165,7 @@ func NewCmdPruneImages(f *clientcmd.Factory, parentName, name string, out io.Wri | |||
cmd.Flags().StringVar(&opts.CABundle, "certificate-authority", opts.CABundle, "The path to a certificate authority bundle to use when communicating with the managed Docker registries. Defaults to the certificate authority data from the current user's config file. It cannot be used together with --force-insecure.") | |||
cmd.Flags().StringVar(&opts.RegistryUrlOverride, "registry-url", opts.RegistryUrlOverride, "The address to use when contacting the registry, instead of using the default value. This is useful if you can't resolve or reach the registry (e.g.; the default is a cluster-internal URL) but you do have an alternative route that works. Particular transport protocol can be enforced using '<scheme>://' prefix.") | |||
cmd.Flags().BoolVar(&opts.ForceInsecure, "force-insecure", opts.ForceInsecure, "If true, allow an insecure connection to the docker registry that is hosted via HTTP or has an invalid HTTPS certificate. Whenever possible, use --certificate-authority instead of this dangerous option.") | |||
cmd.Flags().BoolVar(opts.PruneRegistry, "prune-registry", *opts.PruneRegistry, "If false, the prune operation will clean up image API objects, but the none of the associated content in the registry is removed. Note, if only image API objects are cleaned up through use of this flag, the only means for subsequently cleaning up registry data corresponding to those image API objects is to employ the 'hard prune' administrative task.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gabemontero this warning looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool - thanks @legionus
conformance install failure was flake #17595 |
pkg/image/prune/prune.go
Outdated
errs = append(errs, pruneImageComponents(p.g, p.registryClient, p.registryURL, prunableComponents, layerLinkPruner)...) | ||
errs = append(errs, pruneBlobs(p.g, p.registryClient, p.registryURL, prunableComponents, blobPruner)...) | ||
errs = append(errs, pruneManifests(p.g, p.registryClient, p.registryURL, prunableImageNodes, manifestPruner)...) | ||
} | ||
|
||
if len(errs) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gabemontero how about this condition ? please move it too inside if p.algorithm.pruneRegistry
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep part of next push
lgtm and thanks for doing the revisions in separate commits! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
cd91deb
to
f74100c
Compare
commits squashed |
last conformance install failure at https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/17480/test_pull_request_origin_extended_conformance_install/3530/ looks like flake #17605 but minus the downward.api piece ... presuming same root cause, just a slightly different flavor |
/hold cancel |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bparees, gabemontero The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
cmd test hit flake #17574 |
conformance install hit flake #17605 again |
/retest |
Automatic merge from submit-queue. |
https://trello.com/c/V1anqimn/1418-5-image-soft-prune-imagestreams
@openshift/sig-developer-experience ptal