Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an endpoint for removing images recursively by digests #247

Merged
merged 1 commit into from
Feb 23, 2021

Conversation

lubosmj
Copy link
Member

@lubosmj lubosmj commented Feb 19, 2021

closes #8105

@pulpbot
Copy link
Member

pulpbot commented Feb 19, 2021

Attached issue: https://pulp.plan.io/issues/8105

@lubosmj lubosmj force-pushed the remove-by-digest-man-push-repo-8105 branch from 1379aaa to 017324d Compare February 19, 2021 23:52
@lubosmj
Copy link
Member Author

lubosmj commented Feb 20, 2021

@ipanova, @mdellweg, is it sufficient enough to lock just a repository while performing the removal task? Is it always guaranteed that all objects within the repository (i.e., manifests, blobs, tags) are going to be locked as well? If so, has the locking of all the objects any negative impact on the performance?

@ipanova
Copy link
Member

ipanova commented Feb 20, 2021

@ipanova, @mdellweg, is it sufficient enough to lock just a repository while performing the removal task? Is it always guaranteed that all objects within the repository (i.e., manifests, blobs, tags) are going to be locked as well? If so, has the locking of all the objects any negative impact on the performance?

we can't lock on the content for the performance reasons. lock on the repo means that no other action can be performed on it until you finish your current task. So if you have locked the repo for a removal operation, another removal/add operation won't be executed in parallel, it will wait until the lock will be released, so tldr no content will disappear from the repo during the removal rask in course.

@@ -0,0 +1 @@
Added a new API endpoint that allows users to remove an imange by a digest within a push repository.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/imange/image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/within a push repository/from a push repository

Manifest,
MEDIA_TYPE,
Tag,
)


def recursive_remove_content_find_tags(repository_pk, manifest_pk):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to add blobs here as well. And manifests in case what you are trying to remove by digest happens to be a manifest_list. Now the trick is to remove only those blobs that are not referenced by any other image within that repo; remove only those image manifests that are not referenced in other manifest lists within that repo and those that are not tagged. All this logic is respected in the task recursive_remove_content.
I suggest using it, just in the content_unit already provide pks of the manifest/manifest-list and pks of all tags you will find referenced by that object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can find all the pks i was talking about in the viewet and pass it to the task from the viewset.

@lubosmj lubosmj force-pushed the remove-by-digest-man-push-repo-8105 branch 5 times, most recently from d0b0955 to 9d16554 Compare February 22, 2021 18:38
@lubosmj lubosmj marked this pull request as ready for review February 22, 2021 18:51
@lubosmj lubosmj requested a review from ipanova February 22, 2021 18:51
new_data["digest"], new_data["latest_version"]
)

def get_manifest(data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extracting the repeated code into a separate function and leaving it like this in between all other serializers' code makes it hard to read because it gets 'lost', you need to logically place it.
you can define a mixin that would validate repo_version and manifest or you can just let it be and leave this try/excpet repeated twice. Sometimes it is more readable to have the code repeated rather then adding mixins and have multi-inheritance for the sake of saving several lines of code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if this Manifest lookup can be rephrased with a HiddenField, as well as the latest_version [0]. But that would be refactoring and we should not do that along with a feature PR. I agree with @ipanova that sometimes keeping the things where they happen to not have to scoll up and down to understand makes reading easier.
At the very least i am quite unhappy with the function name, because in django all the "get_" functions raise DoesNotExist and this one will raise ValidationError.

[0] https://www.django-rest-framework.org/api-guide/fields/#hiddenfield

def tearDownClass(cls):
"""Delete the created distribution."""
cls.distributions_api.delete(cls.distribution.pulp_href)
cls.namespaces_api.delete(cls.namespace.pulp_href)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no need to delete distribution, it gets removed automatically with the namespace.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last force-push is resolving this comment.

Validate and extract the latest repository version, manifest, and tags from the passed data.
"""
new_data = {}
new_data.update(self.initial_data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is happening here?
Should this not be new_data = super().validate(data)?
Does this circumvent the field validations?
Why do we refer to a repository here, and there is no repository field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to call the base validate() method because it does literally nothing. The field for a repository is initialized automatically. We already use such a technique here:

def validate(self, data):
"""
Validate data passed through a request call.
Check if a repository has got a reference to a latest repository version. A
new dictionary object is initialized by the passed data and altered by a latest
repository version.
"""
new_data = {}
new_data.update(self.initial_data)
latest_version = new_data["repository"].latest_version()

It does not circumvent the validation because of this:
https://github.com/pulp/pulp_container/pull/247/files#diff-31c44b2812e50f3171248db027f439e8e64248cb84067fc7c305de74cdb9a86fR875-R880

latest_version_href = self.repositories_api.read(self.repo.pulp_href).latest_version_href
content_summary = self.versions_api.read(latest_version_href).content_summary

self.assertEqual(content_summary.present, {})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! 🌞

@lubosmj lubosmj force-pushed the remove-by-digest-man-push-repo-8105 branch 2 times, most recently from 767942c to 1258117 Compare February 23, 2021 13:54
@lubosmj lubosmj force-pushed the remove-by-digest-man-push-repo-8105 branch from 1258117 to 5fad458 Compare February 23, 2021 14:28
@ipanova ipanova merged commit b797b5f into pulp:master Feb 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants