Implement registry and registry cache cleaning #253

Merged
ianpittwood merged 27 commits into main from clean-command
Dec 8, 2025

Conversation

@ianpittwood
Contributor

Closes #247

@github-actions

github-actions bot commented Oct 28, 2025

Test Results

797 tests   797 ✅  8m 59s ⏱️
  1 suites    0 💤
  1 files      0 ❌

Results for commit fdf2add.

♻️ This comment has been updated with latest results.

@ianpittwood ianpittwood marked this pull request as ready for review November 7, 2025 20:49
Contributor

@bschwedler bschwedler left a comment

This all makes sense to me.

Comment on lines +100 to +102
clean-caches:
  name: Clean Caches
  permissions:
Contributor

Do you anticipate consuming workflows will invoke this as a separate job as is done here?

Contributor Author

Could be a separate job or a separate workflow? Not sure. I guess I'd lean toward making it a job that runs at the end of builds to clean up. Which way do you lean?

Contributor

For repos that trigger CI at least daily, running it there makes a lot of sense.

This job declares no needs dependencies, so it will keep cleaning up old images even if the builds fail. Yay!

Comment on lines 77 to 78
untagged_versions = package_versions.filter_untagged()
untagged_old_versions = untagged_versions.filter_older_than(remove_untagged_older_than)
Contributor

Would it make sense to chain the methods here?
The example below uses the method names without the filter_ prefix.

Suggested change
- untagged_versions = package_versions.filter_untagged()
- untagged_old_versions = untagged_versions.filter_older_than(remove_untagged_older_than)
+ untagged_old_versions = package_versions.untagged().older_than(remove_untagged_older_than)

Contributor Author

That would select items that are untagged and older than the given time. Don't we want to always remove images that are older than that date regardless of their tagging?

Contributor

Good point. I was thinking about this in terms of a subfilter.

For the cache, it makes sense to clean up anything older than a certain age.

For production repos we probably want to support this type of subselection. The logic of the larger function should help accomplish this though.
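
A minimal sketch of the distinction discussed in this thread, not the PR's implementation. PackageVersion, PackageVersions, and the sample data are hypothetical; only the method names untagged() and older_than() come from the suggestion above. It shows that chaining the two filters selects versions that are both untagged and old, while the age filter alone also catches old tagged versions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class PackageVersion:
    """Hypothetical stand-in for one registry package version."""
    name: str
    created_at: datetime
    tags: tuple = ()


class PackageVersions(list):
    """Hypothetical collection whose filters return the same type, so they chain."""

    def untagged(self) -> "PackageVersions":
        # Keep only versions that carry no tags.
        return PackageVersions(v for v in self if not v.tags)

    def older_than(self, age: timedelta, now: datetime) -> "PackageVersions":
        # Keep only versions created more than `age` before `now`.
        return PackageVersions(v for v in self if now - v.created_at > age)


now = datetime(2025, 12, 8, tzinfo=timezone.utc)
versions = PackageVersions([
    PackageVersion("a", now - timedelta(days=30), ("latest",)),  # old, tagged
    PackageVersion("b", now - timedelta(days=30)),               # old, untagged
    PackageVersion("c", now - timedelta(days=1)),                # new, untagged
])
max_age = timedelta(days=14)

# Chained subfilter: untagged AND older than max_age.
chained = versions.untagged().older_than(max_age, now)

# Age-only filter: anything older than max_age, regardless of tags.
age_only = versions.older_than(max_age, now)
```

Under these assumptions, the chained form fits the production-repo subselection mentioned above, and the age-only form fits cache cleanup.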

Contributor

@bschwedler bschwedler left a comment

This looks good to me. A few small thoughts, but take 'em or leave 'em!

def get_repositories(self, namespace: str = None) -> list[dict]:
    if namespace is None:
        namespace = self.identifier
    target = urljoin(self.BASE_URL, self.ENDPOINTS["repositories"].format(namespace=namespace))
Contributor

Clever!

Contributor

What do you think about making this a class method?

Suggested change
- target = urljoin(self.BASE_URL, self.ENDPOINTS["repositories"].format(namespace=namespace))
+ target = urljoin(self.BASE_URL, self.endpoints("repositories", namespace=namespace))
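
One way the suggested endpoints helper could look, as a sketch under stated assumptions rather than the PR's code. RegistryClient, the BASE_URL value, the "repositories" template, and repositories_url are hypothetical; the "tag" template is the one quoted elsewhere in this review. Quoting every parameter inside the helper is an illustrative choice.

```python
from urllib.parse import quote, urljoin


class RegistryClient:
    """Hypothetical client; only the URL-building pattern matters here."""

    # Assumed base URL, not taken from the PR.
    BASE_URL = "https://registry.example.com"
    ENDPOINTS = {
        # Assumed template, shaped like the "tag" template below.
        "repositories": "/namespaces/{namespace}/repositories",
        "tag": "/namespaces/{namespace}/repositories/{repository}/tags/{tag}",
    }

    @classmethod
    def endpoints(cls, name: str, **params: str) -> str:
        """Format the named endpoint template, URL-quoting each parameter."""
        quoted = {key: quote(value, safe="") for key, value in params.items()}
        return cls.ENDPOINTS[name].format(**quoted)

    def repositories_url(self, namespace: str) -> str:
        # A classmethod can still be called through an instance.
        return urljoin(self.BASE_URL, self.endpoints("repositories", namespace=namespace))


client = RegistryClient()
repositories_url = client.repositories_url("acme")
tag_path = RegistryClient.endpoints("tag", namespace="acme", repository="tools/img", tag="1.0")
```

Because the templates start with a slash, urljoin replaces any path on the base URL, so this sketch assumes a base URL without a path component.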

"""Get details on a package."""
target_url = self.ENDPOINTS["package"].format(organization=organization, package=quote(package, safe=""))
log.debug(f"GET {target_url}")
headers, response = self.client.requester.requestJsonAndCheck(
Contributor

nit: unused variable

Suggested change
- headers, response = self.client.requester.requestJsonAndCheck(
+ _, response = self.client.requester.requestJsonAndCheck(

organization=organization, package=quote(package, safe="")
)
log.debug(f"GET {target_url} (page {page}/{page_count})")
headers, response = self.client.requester.requestJsonAndCheck(
Contributor

nit:

Suggested change
- headers, response = self.client.requester.requestJsonAndCheck(
+ _, response = self.client.requester.requestJsonAndCheck(

"tag": "/namespaces/{namespace}/repositories/{repository}/tags/{tag}",
}

def __init__(self, identifier: str = None, secret: str = None):
Contributor

There are several places in this PR where we should use the union type for nullable params.

Suggested change
- def __init__(self, identifier: str = None, secret: str = None):
+ def __init__(self, identifier: str | None = None, secret: str | None = None):

@ianpittwood ianpittwood merged commit a9dd4ab into main Dec 8, 2025
6 of 7 checks passed
@ianpittwood ianpittwood deleted the clean-command branch December 8, 2025 21:01

Development

Successfully merging this pull request may close these issues.

Automation to clean up image repositories

2 participants