
Registry pruning improvements #3333

Closed

ncdc opened this issue Jun 19, 2015 · 44 comments

Comments

@ncdc
Contributor

ncdc commented Jun 19, 2015

When pruning images and layers, the oadm prune images command:

  1. determines which images and layers can be pruned
  2. removes images and updates image streams in etcd
  3. makes multiple http requests to the registry to prune layers and signatures

Instead of sending multiple http requests to the registry, it would be better to store information about which layers and signatures the registry needs to prune. The registry could periodically query OpenShift for that data and act on it.

This has multiple benefits:

  1. a potential significant reduction in the number of http requests needed to prune content from the registry
  2. less divergence from the upstream registry code base - we had to modify its internals pretty heavily to support deleting signatures and deleting layers via http requests

HA note - if there are multiple replicas of the registry pod, it is probably a good idea to have the registry's pruning task scheduled a bit randomly. For example, replica 1 might run the job at 13 minutes after the hour, while replica 2 might run it at 27 minutes after the hour.
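As a minimal sketch of what that jittered, periodic "query OpenShift and act" loop could look like inside the registry process - the package, function names, and the prunePendingDeletions step below are hypothetical placeholders, not existing code:

```go
package registryprune

import (
	"context"
	"log"
	"math/rand"
	"time"
)

// prunePendingDeletions stands in for whatever step asks OpenShift which
// layers and signatures are pending deletion and then removes them from
// the registry's storage.
func prunePendingDeletions(ctx context.Context) error {
	// ... query the API server, delete blobs/links ...
	return nil
}

// runPruneLoop runs the prune pass on a fixed interval, with a random
// initial delay so multiple registry replicas don't all prune at the
// same minute.
func runPruneLoop(ctx context.Context, interval time.Duration) {
	jitter := time.Duration(rand.Int63n(int64(interval)))
	select {
	case <-time.After(jitter):
	case <-ctx.Done():
		return
	}

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if err := prunePendingDeletions(ctx); err != nil {
			log.Printf("prune pass failed: %v", err)
		}
		select {
		case <-ticker.C:
		case <-ctx.Done():
			return
		}
	}
}
```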

@ncdc
Contributor Author

ncdc commented Jun 30, 2015

QE has uncovered at least 1 issue with image pruning: if you delete an image stream, we lose the path from stream -> image -> layer, so the code currently doesn't prune a layer in this case (https://bugzilla.redhat.com/show_bug.cgi?id=1235148).

In the short term, we could modify the pruning code to fix this bug to at least delete the blobs (layer links would still exist, however). But there are still some gaps. For example, you can delete an image stream and then recreate it, and the registry's data about the deleted image stream (manifest signatures and layer links) are still there.

Ultimately, what I would like to do is this:

  1. When an image stream is deleted, if it uses the integrated registry, add a record to a new etcd list (I'll call it imagestreamdeletions for now).
  2. When an image is deleted, if it's stored in the integrated registry, add a record to a new etcd list (imagedeletions) for each image stream that references it
  3. Modify oadm prune images to:
    1. Calculate prunable images and delete them from etcd
    2. Calculate prunable layers and the streams that reference them and add a record to a new etcd list (imagelayerdeletions)
    3. Remove image references from all relevant image streams' status.tags history
  4. Add goroutine(s) to the registry to:
    1. get data from imagestreamdeletions and delete the corresponding repo directory in storage
    2. get data from imagedeletions and delete the corresponding manifest directory in storage
    3. get data from imagelayerdeletions and delete the corresponding blobs and repo layer links in storage

@smarterclayton @deads2k @jwhonce @brenton @sdodson what do you think?

@smarterclayton
Contributor


As a counter argument - why not just periodically read all layers not
referenced by an image manifest or link as a separate call?



@ncdc
Contributor Author

ncdc commented Jun 30, 2015

As a counter argument - why not just periodically read all layers not referenced by an image manifest or link as a separate call?

In the registry's code? I'd like to avoid walking the registry's storage if possible, since it may not be an efficient or cheap operation for storage backends like S3.

I am working on a patch to at least be able to delete the blobs, relieving the storage burden. But it won't handle deleting the layer links.

@smarterclayton
Contributor


But don't we eventually have to have this check? What happens when a
customer runs into this? We're not going to write perfect storage code, so
why not write the fundamental "make this clean". And with S3, don't we
have at least basic sweep functionality?



@deads2k
Contributor

deads2k commented Jun 30, 2015

But don't we eventually have to have this check? What happens when a
customer runs into this? We're not going to write perfect storage code, so
why not write the fundamental "make this clean". And with S3, don't we
have at least basic sweep functionality?

Do we care if we miss a few? I could sort of see wanting to prune dead links, but without knowing how good the backing storage is, using it as a primary mechanism seems to open us up to severe performance issues.

Also, lack of quoting makes it really hard to follow what you're saying.

@deads2k
Contributor

deads2k commented Jun 30, 2015

@smarterclayton If you must reply without quoting, can you top-post the comment to make it a little easier?

@ncdc
Contributor Author

ncdc commented Jun 30, 2015

It would be a good thing to have some code in the registry that deletes unreferenced data. The problem is that all the references to layers are stored in etcd in /images. The registry would need to get the list of all images (which contains the referenced layers), then walk each blob and/or repo layer link and determine if it's referenced or not.

But I do also think that the proposed refactoring gives us a better implementation overall.

@ncdc
Contributor Author

ncdc commented Jun 30, 2015

I guess we could do this periodically (all code running in the registry):

  1. Construct a graph of all OpenShift-managed images and layers
  2. Walk the list of repos in storage
    1. For each repo, try to get the corresponding ImageStream from OpenShift
    2. If 404, delete the repo dir from storage
    3. Otherwise:
      1. For each repo manifest dir, if there's no corresponding image in the graph, delete the manifest dir
      2. For each layer link, if there's no corresponding layer in the graph, delete the link

Walking the blobs will be slightly more difficult, as the blobs directory contains signatures and layers without any indication of file type. I guess we'd need to add all the manifests' signatures from all repos to the graph, and then at the end we'd be able to determine if there are any unreferenced blobs lying around that we can purge.

I am concerned about how the performance would be, as it's directly related to the number of files (blobs, repos, signatures, layer links) in storage.
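A rough sketch of what that periodic walk might look like, purely to illustrate the shape of it - the graph type and the storage/API helpers here are hypothetical placeholders, not actual OpenShift or docker/distribution APIs:

```go
package registryprune

import (
	"context"
	"strings"
)

// graph is a hypothetical, pre-built view of all OpenShift-managed images
// (manifest digests) and the layers they reference (step 1 above).
type graph struct {
	images map[string]bool
	layers map[string]bool
}

// storageOps bundles hypothetical helpers over the registry's storage
// driver and the OpenShift API - not real interfaces from either code base.
type storageOps struct {
	listRepos         func(ctx context.Context) ([]string, error) // "namespace/name"
	repoManifests     func(ctx context.Context, repo string) ([]string, error)
	repoLayerLinks    func(ctx context.Context, repo string) ([]string, error)
	deleteRepoDir     func(ctx context.Context, repo string) error
	deleteManifestDir func(ctx context.Context, repo, digest string) error
	deleteLayerLink   func(ctx context.Context, repo, digest string) error
	streamExists      func(ctx context.Context, namespace, name string) (bool, error)
}

func pruneRepos(ctx context.Context, g graph, s storageOps) error {
	repos, err := s.listRepos(ctx)
	if err != nil {
		return err
	}
	for _, repo := range repos {
		ns, name := splitRepo(repo)
		exists, err := s.streamExists(ctx, ns, name)
		if err != nil {
			return err
		}
		if !exists {
			// The 404 case: the whole repo directory is orphaned.
			if err := s.deleteRepoDir(ctx, repo); err != nil {
				return err
			}
			continue
		}
		manifests, err := s.repoManifests(ctx, repo)
		if err != nil {
			return err
		}
		for _, dgst := range manifests {
			if !g.images[dgst] {
				if err := s.deleteManifestDir(ctx, repo, dgst); err != nil {
					return err
				}
			}
		}
		links, err := s.repoLayerLinks(ctx, repo)
		if err != nil {
			return err
		}
		for _, dgst := range links {
			if !g.layers[dgst] {
				if err := s.deleteLayerLink(ctx, repo, dgst); err != nil {
					return err
				}
			}
		}
	}
	return nil
}

func splitRepo(repo string) (namespace, name string) {
	if i := strings.Index(repo, "/"); i >= 0 {
		return repo[:i], repo[i+1:]
	}
	return "", repo
}
```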

@ncdc
Contributor Author

ncdc commented Jun 30, 2015

And if we do ^^^, I don't think we'd need to store any additional information in etcd about deleted streams, images, or layers.

@smarterclayton
Contributor



That could easily be a part of the prune command - I don't see that is a
significant objection






I think my point is that it is more important that we have a solid way to
catch unreferenced images and links in the store, and have that solution
for filesystems and s3. I'd rather not get a P0 "My s3 bucket has 20TB of
data and I have 100k images" and have to hack something together.

Why is unreferenced data in real storage not a problem? Are we positive
Docker doesn't leak images in crash scenarios?

@smarterclayton
Contributor

I believe you can generate 1 during the prune, and then, if necessary, supply that as an input to the later steps.

Performance can be solved. Ultimately this is a two-phase commit (2PC) problem and you are going to have leftovers in the store. You'll have to handle tombstoning somehow, but I don't know that we have to start in etcd.



@ncdc
Contributor Author

ncdc commented Jun 30, 2015

That could easily be a part of the prune command - I don't see that is a significant objection

Not easily. The prune command could get a list of all images in etcd, but there's no API in the registry for a client to drive walking blobs and layer links to determine if they're unreferenced. I really think this has to be initiated by the registry.

I think my point is that it is more important that we have a solid way to
catch unreferenced images and links in the store, and have that solution
for filesystems and s3. I'd rather not get a P0 "My s3 bucket has 20TB of
data and I have 100k images" and have to hack something together.

Why is unreferenced data in real storage not a problem? Are we positive
Docker doesn't leak images in crash scenarios?

If you want to catch unreferenced images/repos/links, then I really think we need to write code in the registry to do this.

@ncdc
Contributor Author

ncdc commented Jun 30, 2015

I believe you can generate 1 during the prune, and then, if necessary, supply that as an input to the later steps.

Generate 1 what? Graph?

@smarterclayton
Contributor

Item 1 - the list of all referenced images and the layers they reference.
We don't have to do an expensive prune every hour.


@ncdc
Contributor Author

ncdc commented Jun 30, 2015

Item 1 - the list of all referenced images and the layers they reference.
We don't have to do an expensive prune every hour.

Ok... but right now oadm prune images is run on the command line, and in order to walk the registry's storage you need to have that code run in the registry.

I guess we could create another new customized route in the registry that takes graph data as input and uses it to determine what's unreferenced. I'd really prefer not to do that, however.

Or, run the registry's internal pruning code less frequently - say once or twice a day instead of every hour.

@ncdc
Contributor Author

ncdc commented Jun 30, 2015

cc @liggitt too

@ncdc
Contributor Author

ncdc commented Jun 30, 2015

I'd like to come up with a list of action items and all agree on the path forward. To recap, here are the moving parts:

OpenShift owns:

  • images
  • image streams, including tags for each stream

Registry owns:

  • blobs (global for the registry)
    • signatures
    • layers
  • signatures per repo+manifest
  • layer links per repo

A user runs oadm prune images, which determines all the images that can be pruned, all the image streams referencing these images, all the layers that can be pruned, and all the image streams referencing these layers. The call to oadm prune images is the single thing that coordinates deleting content from OpenShift and the registry right now.

If an image stream is deleted and prune is then called, we have no way of knowing that the deleted stream referenced images and/or layers that need to be pruned. #3521 is 1 way to partially address this, but we probably still want a full resync to take place in the registry.

Here's my thinking about responsibilities for oadm prune images vs the registry:

oadm prune images

  • determines which images can be pruned
  • deletes images from OpenShift
  • removes references to pruned images from image streams status.tags

registry

  • removes orphaned blobs
  • removes orphaned repo dirs
  • removes orphaned repo manifest dirs (signatures)
  • removes orphaned repo layer links

The registry removal task can run anywhere that has access to the registry's storage. This could be code in the registry itself or a sidecar container perhaps. To determine what is orphaned, the registry will need a graph just like oadm prune images uses, but it can be smaller, since only image streams and images are needed as inputs. Pruning orphaned blobs might pose a challenge, as the blobs dir contains signatures and layers without any type information about each blob. We'd potentially need to walk the entire registry storage tree, adding all signature links and layer links. After we have that, we'd be able to remove any blob not in that list.
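For the orphaned-blob part specifically, the shape would be a mark-and-sweep over the blob store: collect every digest that some repo still links to (layer links plus signature links), then delete everything else. A minimal sketch, with hypothetical walk helpers rather than real storage-driver APIs:

```go
package registryprune

import "context"

// blobWalkers are hypothetical helpers over the registry storage tree; the
// real thing would come from the storage driver's list/walk primitives.
type blobWalkers struct {
	allBlobDigests      func(ctx context.Context) ([]string, error) // global blobs dir
	allLayerLinkDigests func(ctx context.Context) ([]string, error) // layer links across all repos
	allSignatureDigests func(ctx context.Context) ([]string, error) // signature links across all repos
	deleteBlob          func(ctx context.Context, digest string) error
}

// sweepOrphanedBlobs deletes any blob that no repository links to, either as
// a layer or as a manifest signature - the mark-and-sweep described above.
func sweepOrphanedBlobs(ctx context.Context, w blobWalkers) error {
	referenced := map[string]bool{}
	for _, list := range []func(context.Context) ([]string, error){
		w.allLayerLinkDigests,
		w.allSignatureDigests,
	} {
		digests, err := list(ctx)
		if err != nil {
			return err
		}
		for _, d := range digests {
			referenced[d] = true
		}
	}

	blobs, err := w.allBlobDigests(ctx)
	if err != nil {
		return err
	}
	for _, d := range blobs {
		if !referenced[d] {
			if err := w.deleteBlob(ctx, d); err != nil {
				return err
			}
		}
	}
	return nil
}
```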

What do you all think about this separation of concerns?

@deads2k
Contributor

deads2k commented Jul 1, 2015

What do you all think about this separation of concerns?

I like the idea. Given that we run our plugins in the registry, could we simply build the reverse links to the blobs and/or the type information about the blobs in a separate location? That would alleviate performance concerns.

@ncdc
Contributor Author

ncdc commented Jul 1, 2015

I'd have to look into that. I'm not sure that it would be easy to do without some breaking changes to the upstream registry code, which we'd have to carry and cherry pick each time we rebase.

Ah... doesn't look like upstream is writing the media types to disk yet - that's what we'd need (https://github.com/docker/distribution/blob/master/registry/storage/blobstore.go#L81).

@pweil-
Contributor

pweil- commented Jul 20, 2015

marked as p1 after convo with @ncdc - we are carrying upstream hacks to docker/distribution that make it difficult to bump the godep

@miminar

miminar commented Oct 7, 2015

The easiest, but hackish, solution to option 2 would be to re-verify that the layers exist after the manifest is written. If not, delete the manifest and fail; the client will retry, and any left-overs will be collected upon the next GC run.
Nah, scratch that. There are too many races possible. For example:

  1. upload of image started
  2. GC started (reading blobs, layers and manifests)
  3. layers uploaded, starting manifest upload
  4. GC collected all the data, initiating delete
  5. manifest upload finished, linking to repository, returning OK
  6. GC finished deletion

The result is a "successfully" uploaded image without its blobs stored in the registry. I can't think of any viable way of running a stand-alone GC in the registry without writes being disabled.

@ncdc
Contributor Author

ncdc commented Oct 7, 2015

@miminar as I've mentioned previously, we must continue to use the graph we build from the Kubernetes and OpenShift resources (pods, RCs, DCs, BCs, builds, ISes, etc) to have an accurate picture of what images and layers are or are not in use.

While there is a race possible, it would look something like this:

  1. Push a bunch of images and layers over time.
  2. Time passes
  3. You get an image stream that has this history for a tag:
    1. dockerImageReference = 172.x.x.x:5000/project/stream@revision4
      1. various layers, but not layer123
    2. dockerImageReference = 172.x.x.x:5000/project/stream@revision3
      1. various layers, but not layer123
    3. dockerImageReference = 172.x.x.x:5000/project/stream@revision2
      1. various layers, but not layer123
    4. dockerImageReference = 172.x.x.x:5000/project/stream@revision1
      1. layer123 and others
  4. Run oadm prune images, with keep-tag-revisions=3

At this point, assuming nothing referenced the image known as 172.x.x.x:5000/project/stream@revision1, it would be a candidate for pruning. Similarly, layer123 would be a candidate for pruning if no in-use images reference it. The assumption here is that revision1 is an old image (e.g. > 30, 60, 90 days, whatever the prune setting is). And given that it's an old image, it's not in use, and nothing else references layer123, the only way we'd have a race is if someone was pushing a new image that happened to reference layer123 at the same time we were in the process of pruning it. It's certainly possible, and we probably should address it, eventually.
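To make the keep-tag-revisions selection concrete, here's a small hedged sketch of the rule as described above (the names are made up; the real prune command also builds the full resource graph and checks pods/RCs/DCs/builds before anything is actually pruned):

```go
package registryprune

import "time"

// tagEvent is a stand-in for one entry in a tag's status.tags history.
type tagEvent struct {
	Image   string    // e.g. "172.x.x.x:5000/project/stream@revisionN"
	Created time.Time
}

// pruneCandidates returns images in a tag's history (newest first) that fall
// outside the newest keepRevisions entries and are older than keepAge. A real
// implementation must still verify that nothing else references the image or
// its layers before deleting anything.
func pruneCandidates(history []tagEvent, keepRevisions int, keepAge time.Duration) []string {
	cutoff := time.Now().Add(-keepAge)
	var candidates []string
	for i, ev := range history {
		if i < keepRevisions {
			continue // always keep the most recent revisions
		}
		if ev.Created.Before(cutoff) {
			candidates = append(candidates, ev.Image)
		}
	}
	return candidates
}
```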

@smarterclayton @pweil- your thoughts?

@pweil-
Contributor

pweil- commented Oct 7, 2015

It seems like a fairly unlikely occurrence that has a workaround (re-push the image). I imagine that (for now) pruning would be a regularly scheduled administrative task that could largely avoid this by scheduling it during low-use times. Obviously that doesn't work for every scenario.

@miminar

miminar commented Oct 7, 2015

@ncdc What you say is true in the current scenario, where OpenShift instructs the registry which repositories, manifests and layers can be removed.
If, however, the registry is given just manifests (and optionally repositories) to remove plus the responsibility to delete anything orphaned (which is suggested by the above comment as the way to go), then every push uploading any new layer at the same time will be affected as well.

If the severity of this is indeed low, I have no other objection. I'd just like to ensure we're on the same page and aware of the consequences.

@smarterclayton
Contributor

We should make a note of the behavior in the pruning definition and call it
out - ultimately we can't prevent it, and the user can just retry the push.


@ncdc
Contributor Author

ncdc commented Oct 7, 2015

@miminar brings up a good point about my suggestion from above about how the registry prunes. I'll quote it here for convenience:

registry

  • removes orphaned blobs
  • removes orphaned repo dirs
  • removes orphaned repo manifest dirs (signatures)
  • removes orphaned repo layer links

The registry removal task can run anywhere that has access to the registry's storage. This could be code in the registry itself or a sidecar container perhaps. To determine what is orphaned, the registry will need a graph just like oadm prune images uses, but it can be smaller, since only image streams and images are needed as inputs.

If, as I proposed above, running oadm prune images results in images being hard deleted from etcd and references to those images being removed from the image streams' status.tags, then I think it will be difficult to determine which layers can be deleted. It would probably make more sense to have oadm prune images mark images as deleted instead of hard deleting them (it would be ok, however, to hard delete entries from status.tags). The registry's pruning code could then determine which layers can be safely deleted by retrieving the list of all images from OpenShift, building a graph of images and layers, and then finding all the layers whose only predecessors are images marked as deleted.
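A sketch of that last step - finding layers whose only predecessors are images marked as deleted - could look roughly like this (imageRecord is a stand-in for whatever the registry gets back when it lists images, not the actual API type):

```go
package registryprune

// imageRecord is a stand-in for a listed image: its manifest digest, its
// layer digests, and whether prune has marked it as deleted.
type imageRecord struct {
	digest        string
	layers        []string
	markedDeleted bool
}

// deletableLayers returns layers referenced only by images marked as
// deleted; a single reference from a live image keeps the layer.
func deletableLayers(images []imageRecord) []string {
	keep := map[string]bool{}      // referenced by at least one live image
	deletable := map[string]bool{} // referenced only by deleted images so far

	for _, img := range images {
		for _, layer := range img.layers {
			if img.markedDeleted {
				if !keep[layer] {
					deletable[layer] = true
				}
			} else {
				keep[layer] = true
				delete(deletable, layer)
			}
		}
	}

	result := make([]string, 0, len(deletable))
	for layer := range deletable {
		result = append(result, layer)
	}
	return result
}
```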

@smarterclayton @pweil- @miminar how does this sound?

@miminar

miminar commented Oct 7, 2015

@ncdc That's a good idea. The same could be achieved without the need to query OpenShift: oadm prune images would just issue a regular manifest.Delete(), which causes the registry to do an unlink while still keeping the manifest blob untouched (upstream behaviour). The registry could just build the graph of manifests/layers/blobs and remove those referenced only by unlinked manifests.

@ncdc
Contributor Author

ncdc commented Oct 7, 2015

@miminar the manifest is not stored in the registry. It's stored in etcd in /images.

@miminar

miminar commented Oct 7, 2015

@ncdc I thought the manifest stored in the registry and OpenShift's one were the same. I can load a manifest from storage from inside the registry just fine. Is it of no use to us?

@ncdc
Contributor Author

ncdc commented Oct 7, 2015

@miminar they are, but we use a custom middleware in the registry to take over handling tags, manifests, etc: https://github.com/openshift/origin/blob/master/pkg/dockerregistry/server/repositorymiddleware.go#L101-L139. It's set up here: https://github.com/openshift/origin/blob/master/images/dockerregistry/config.yml#L16.

Manifest data is only stored in OpenShift's etcd. When you load a manifest from inside the registry, you're actually getting it from OpenShift.

@miminar

miminar commented Oct 8, 2015

Aha! I didn't know that. Thanks for pointing this out.

@miminar

miminar commented Oct 8, 2015

The main issue I'm facing now is how to mark images as deleted without removing them physically during oadm prune images. @ncdc and @derekwaynecarr came up with an idea to add a new field to the image stream recording an agent that must perform a finalize action to release the lock blocking the deletion. The deletion from etcd would be done once the registry prunes the orphaned objects.
I haven't yet had a chance to get a grasp of the needed changes, and considering my knowledge of OSO's internals, I'd appreciate it if you could review this approach before I try to implement it. Thanks in advance for any insight.

@ncdc
Contributor Author

ncdc commented Oct 8, 2015

@miminar we should lay out the proposed design, with all the moving parts, before we do any coding

@pweil-
Contributor

pweil- commented Oct 8, 2015

lay out the proposed design

👍

@smarterclayton
Contributor

Sounds good. We also need to queue up the work to automate pruning.


@miminar

miminar commented Oct 9, 2015

Let me give it a shot:

Revised design for image and imagestream disposal

Deleting an image stream marks it for deletion. A new controller watching for streams being deleted will kick in. The controller will mark all of the stream's images for deletion and store the stream's name in /openshift.io/imagestreamdeletions (let's call it isTrash) as a new type, ImageStreamDeletion.

By Delete I mean a deletion from the command line, like oc delete is -n project streamName or oc delete namespace nsName.

Deleting an image just marks it for deletion.

Registry pruning

Is a go-routine in dockerregistry triggered by a timer with a configurable interval. It does the following:

  1. loads manifest revisions (OSO's images) from etcd -- both marked and not marked for deletion
  2. builds a list of references to layers and signatures whose only predecessors are manifest revisions marked for deletion
  3. removes such references and deletes associated blobs in registry's storage
  4. deletes manifest revision directory for each image marked for deletion
  5. calls the Finalize() function on the image in etcd, which removes the finalizer from image.Finalizers and deletes the image
  6. obtains all imagestreamdeletions from isTrash
  7. deletes one repository for each such item
  8. deletes each processed item object from isTrash

Image type

Will have a two-phase deletion. It will get a new Finalizers []FinalizerName attribute, which will be given an initial value upon the image's creation. This will prevent the associated Delete() method from actually deleting it (similar to the same method for the namespace object). With Spec.Finalizers emptied by a call to the Finalize() function, the image can be deleted from the etcd store from inside the registry.

Marking for deletion

Means to set DeletionTimestamp on an image.

ImageStreamDeletion

Is a new type stored in isTrash, not accessible from the CLI as a resource. It has no extra attributes. Its Namespace and Name identify a repository in the dockerregistry.

ImageStream

Again a two-phase deletion. The type will be extended with .Spec.Finalizers and .Status.Phase, which will initially be available.

ImageStream controller

Is a new type that takes care of an image stream's termination. Once the termination is complete, it finalizes the stream and removes it. A finalized image has empty image.Finalizers. It works similarly to the NamespaceController. A stream's termination comprises storing a new image stream deletion record into etcd, marking the contained images as deleted, and finalizing the stream.
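To make the proposed shapes a bit more concrete, here is a rough sketch of the new/extended types as I read the design - field names and placement are guesses at what the proposal implies, not actual OpenShift API objects:

```go
package imageapi

import "time"

// FinalizerName identifies an agent that must act before deletion completes.
type FinalizerName string

// ImageStreamDeletion is the record written to the isTrash list
// (/openshift.io/imagestreamdeletions); Namespace and Name identify the
// registry repository to remove. It has no extra attributes.
type ImageStreamDeletion struct {
	Namespace string
	Name      string
}

// Image sketches the two-phase deletion fields the proposal adds.
type Image struct {
	Name string
	// Marking for deletion means setting this timestamp.
	DeletionTimestamp *time.Time
	// Given an initial value on creation; the registry's Finalize() step
	// empties it, after which the object can actually be removed from etcd.
	Finalizers []FinalizerName
}

// ImageStream gains .Spec.Finalizers and .Status.Phase for its own
// two-phase deletion.
type ImageStreamSpec struct {
	Finalizers []FinalizerName
}

type ImageStreamStatus struct {
	Phase string // initially "available"
}

type ImageStream struct {
	Namespace string
	Name      string
	Spec      ImageStreamSpec
	Status    ImageStreamStatus
}
```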

@ncdc
Contributor Author

ncdc commented Oct 9, 2015

  1. builds a list of layers and signatures belonging to manifest revisions (OSO's images) marked for deletion (read this from etcd)
  2. remove each such layer and signature link and deletes associated blobs in registry's storage

You have to retrieve the complete list of images - both preserved and marked for deletion - to be able to determine which layers can be pruned. A layer can be pruned if and only if the only predecessors it has are images marked for deletion. If a layer has a predecessor that is a preserved image, you can't delete that layer.

@ncdc
Contributor Author

ncdc commented Oct 9, 2015

Registry pruning is a go-routine in dockerregistry triggered by a timer with a configurable interval

This needs to be a function that's easy to invoke from both an internally-scheduled timer in the registry process and from a separate executable. We want to have the flexibility to run this automatically inside the registry container or in a separate sidecar container in the registry pod.
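In other words, something like the following shape, where the prune pass is just a plain function (names hypothetical) that both the registry's internal timer and a standalone sidecar/CLI binary can call:

```go
package registryprune

import (
	"context"
	"time"
)

// Prune is the single entry point for a prune pass: build the graph from
// OpenShift, then clean up registry storage.
func Prune(ctx context.Context) error {
	// ... actual pruning work ...
	return nil
}

// RunPeriodically is what the registry process itself would wire to a timer.
func RunPeriodically(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			_ = Prune(ctx) // errors would be logged in a real implementation
		case <-ctx.Done():
			return
		}
	}
}

// RunOnce is what a separate executable (e.g. a sidecar container) would call.
func RunOnce(ctx context.Context) error {
	return Prune(ctx)
}
```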

@ncdc
Contributor Author

ncdc commented Oct 9, 2015

  • remove each such layer and signature link and deletes associated blobs in registry's storage
  • deletes manifest revision directory for each image marked for deletion

Based on conversations with Stephen Day from Docker, he expects us to use a repository-scoped BlobDeleter to "delete a repository layer link", rather than exposing the internal storage (i.e. the fact that there is a repository layer link) to the API. I would assume that we would use a global BlobDeleter to delete the actual blob.

I think calling "delete manifest" for a given repo/manifest is what should end up deleting the signature links. We'll need to be careful here, since I believe that call will initially route to our middleware. So we'll just need to have our middleware delegate to the Repository we're wrapping.

@ncdc
Contributor Author

ncdc commented Oct 9, 2015

Also please review #3521 as it relates to deleting and recreating image streams with the same name. I don't think we'll be able to include the repo uid in the repo storage path given that upstream doesn't want to expose the path mapper, but hopefully we can figure something out...

@miminar

miminar commented Oct 12, 2015

@ncdc Design updated.

Based on conversations with Stephen Day from Docker, he expects us to use a repository-scoped BlobDeleter to "delete a repository layer link", rather than exposing the internal storage (i.e. the fact that there is a repository layer link) to the API.

Yes, that's what I'm using now: BlobStore.Delete() to delete a layer link, ManifestService.Delete() to delete a manifest revision's directory, vacuum.RemoveRepository() to delete a repository folder, and BlobKeeper.Delete() to delete blob data. Anything else operates on the etcd store.

@soltysh
Member

soltysh commented Sep 6, 2016

Adding #7564 for cross reference.

@bparees
Contributor

bparees commented Feb 6, 2018

/close

Pretty sure hard-prune implemented some of the suggestions here, and prune itself looks a lot like the other discussions I read, with the exception of the proposed "mark+sweep" technique - though I think even that is basically what oc adm prune does (it doesn't update the etcd image objects to mark them as being deleted, but it builds the full graph in cache and then goes through and deletes everything it should, including registry storage content).
