Skip to content

Optional automatic image cleanup#171

Merged
sjmiller609 merged 6 commits into
mainfrom
codex/image-retention-auto-delete
Mar 27, 2026
Merged

Optional automatic image cleanup#171
sjmiller609 merged 6 commits into
mainfrom
codex/image-retention-auto-delete

Conversation

@sjmiller609

@sjmiller609 sjmiller609 commented Mar 26, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • add a server-wide image auto-delete feature for cached converted images under data_dir/images
  • track unused image digests in persisted retention state and delete by digest once the unused window expires
  • protect deletion with an explicit allow list of normalized repository patterns, with "*" as the global wildcard and an empty allow list deleting nothing
  • clear stale retention state during instance creation, keep OCI cache untouched, and expose retention metrics/logging

Config

images:
  auto_delete:
    enabled: false
    unused_for: 720h
    allowed: []

Examples:

  • allowed: ["docker.io/library/*", "ghcr.io/kernel/*"]
  • allowed: ["*"]

Testing

Local:

  • make build-embedded
  • make build-vz-shim
  • go test ./cmd/api/config
  • go test ./lib/imageretention
  • go test ./cmd/api
  • go test ./lib/instances -run 'Test(CreateInstanceClearsRetentionStateBeforeMetadataSave|StorageOperations|Filter|Image)'
  • go test ./cmd/api/config ./lib/imageretention ./cmd/api ./lib/instances -run 'Test(DefaultConfigIncludesMetricsSettings|LoadUsesDefaultImageAutoDeleteRetentionWindow|ValidateRejectsInvalidImageAutoDeleteUnusedFor|ValidateTrimsImageAutoDeleteAllowedPatterns|Sweep|AllowPatternMatches|MarkUsed|RunPerformsImmediateSweep|StartImageRetentionController|CreateInstanceClearsRetentionStateBeforeMetadataSave)'

Deft (deft-kernel-dev, fresh short-path workspace ~/hmret):

  • bootstrap: make ensure-ch-binaries ensure-firecracker-binaries ensure-caddy-binaries build-embedded
  • go test ./lib/imageretention ./cmd/api/config ./cmd/api
  • TMPDIR=/mnt/data/home/sjmiller609/tmp go test ./lib/instances -run "TestCreateInstanceClearsRetentionStateBeforeMetadataSave" -count=1
  • manual QA harness on Deft verified:
    • empty allow list blocks retention tracking and deletion
    • specific allow list preserves non-matching repositories and clears stale retention state
    • allowed repositories get retention state first and are deleted only after the retention window
    • digest directory and tag symlink are removed together
    • OCI cache remains untouched
    • global wildcard allow list and MarkUsed both work

Notes

  • no API schema changes
  • auto-delete remains disabled by default
  • an empty images.auto_delete.allowed list is a deliberate safety default: nothing is eligible for deletion until explicitly allow-listed

Note

Medium Risk
Adds a background controller that can automatically delete cached image digests based on persisted retention state, which could cause unexpected data loss if misconfigured despite being disabled by default and gated by an allow-list.

Overview
Adds an optional server-wide image auto-delete feature for cached converted images under data_dir/images, controlled via new config images.auto_delete (enabled flag, unused_for retention window, and an explicit allow-list of repository patterns).

Introduces a new lib/imageretention controller that persists per-digest unused_since state under data_dir/system/image-retention, sweeps on startup and every minute, skips images referenced by instance metadata or snapshots, and deletes eligible digests (with metrics and logging). The API startup wires this controller behind the config flag, and instance creation now calls an optional ImageUsageRecorder hook to clear stale retention state before persisting new metadata.

Written by Cursor Bugbot for commit 99a7b37. This will update automatically on new commits. Configure here.

Comment thread cmd/api/main.go
@sjmiller609 sjmiller609 changed the title Add automatic image retention cleanup Optional automatic image cleanup Mar 26, 2026
@sjmiller609 sjmiller609 marked this pull request as ready for review March 26, 2026 19:36
@sjmiller609 sjmiller609 requested a review from hiroTamada March 26, 2026 19:37
Comment thread lib/imageretention/controller.go
Comment thread lib/imageretention/metrics.go
Comment thread lib/instances/manager.go Outdated

@hiroTamada hiroTamada left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be fine for the review, adding as a note for discussion

Comment thread lib/imageretention/controller.go
@sjmiller609 sjmiller609 requested a review from hiroTamada March 27, 2026 17:16

@hiroTamada hiroTamada left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. the allow-list addition addresses the built-image concern nicely — deny-by-default with explicit opt-in per repository is the right approach. clean implementation overall: good separation of concerns, solid test coverage, proper race protection via MarkUsed, and conservative defaults (disabled + empty allow list + 30-day window).

…-auto-delete

# Conflicts:
#	lib/instances/manager.go
@sjmiller609 sjmiller609 merged commit 8321f96 into main Mar 27, 2026
6 checks passed
@sjmiller609 sjmiller609 deleted the codex/image-retention-auto-delete branch March 27, 2026 17:50

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

matcher = strings.ReplaceAll(matcher, `\*`, ".*")
matcher = strings.ReplaceAll(matcher, `\?`, ".")
return regexp.MustCompile(matcher).MatchString(repository)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex recompiled on every pattern match invocation

Low Severity

allowPatternMatches calls regexp.MustCompile on every invocation. Since isAllowed calls this for each ready image against each allowed pattern, and sweep runs every minute, the total regex compilations per sweep is O(images × patterns). The allowed patterns are static after controller construction, so the compiled regexes could be computed once in NewController or normalizeAllowedPatterns and reused.

Fix in Cursor Fix in Web

Comment thread cmd/api/config/config.go
}
if err := validateDuration("images.auto_delete.unused_for", c.Images.AutoDelete.UnusedFor); err != nil {
return err
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing positive duration validation for unused_for config

Low Severity

The images.auto_delete.unused_for field is validated only by validateDuration, which accepts any parseable Go duration including zero and negative values. A zero or negative value would cause images to be eligible for deletion on the very next sweep after being marked unused (~1 minute), rather than observing a meaningful retention window, contradicting the documented 720h default intent.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants