
🐛 OCPBUGS-78787: fix(operator-controller): clean up orphaned temp dirs in catalog cache#2574

Merged
openshift-merge-bot[bot] merged 1 commit into operator-framework:main from tmshort:fix-OCPBUGS-78787-operator-controller on Mar 20, 2026

@tmshort commented Mar 18, 2026

Related to #2537, which fixed catalogd issues.

filesystemCache.writeFS creates a temp dir (.{catalog}-{random}) and renames it into place atomically. If the process is interrupted before the rename, the temp dir persists. Each restart adds another, eventually filling the disk.

Additionally, writeFS had no defer os.RemoveAll(tmpDir), so any error during WalkMetasReader or the rename step also left the temp dir behind — no process kill required.

Two fixes:

  • Add defer os.RemoveAll(tmpDir) so errors during normal operation clean up.
  • Add removeOrphanedTempDirs, called at the start of writeFS (under the write mutex), to clean up dirs orphaned by a previous process run. This bounds worst-case accumulation to one orphaned dir per catalog regardless of restart rate.

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

…in catalog cache

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Todd Short <tshort@redhat.com>
Copilot AI review requested due to automatic review settings March 18, 2026 14:12
openshift-ci bot requested review from joelanford and oceanc80 March 18, 2026 14:12
netlify bot commented Mar 18, 2026

Deploy Preview for olmv1 ready!

🔨 Latest commit: 06896bb
🔍 Latest deploy log: https://app.netlify.com/projects/olmv1/deploys/69bab2c8997dd400081adb5b
😎 Deploy Preview: https://deploy-preview-2574--olmv1.netlify.app

Copilot AI left a comment

Pull request overview

This pull request adds cleanup for orphaned temporary directories in the filesystem cache implementation. These orphaned directories can be left behind if a write operation is interrupted (e.g., pod eviction or crash) before the temporary staging directory is renamed to the final cache location. The changes improve reliability by automatically cleaning up these dangling directories when a new write operation begins.

Changes:

  • Added removeOrphanedTempDirs() method to scan and remove temporary directories with the catalog-specific prefix pattern that were left behind by interrupted writes
  • Integrated orphaned directory cleanup into the writeFS() method to run before creating a new temporary directory
  • Added a defer statement to ensure temporary directories are cleaned up if the write operation fails
  • Added comprehensive test coverage for the orphaned directory cleanup functionality

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File | Description
internal/operator-controller/catalogmetadata/cache/cache.go | Added orphaned temp directory cleanup with removeOrphanedTempDirs() method and integrated it into writeFS() flow
internal/operator-controller/catalogmetadata/cache/cache_test.go | Added TestFilesystemCachePutCleansOrphanedTempDirs() test to verify orphaned directories are cleaned up while preserving directories for other catalogs

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 46.66667% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.58%. Comparing base (f7a8220) to head (06896bb).
⚠️ Report is 7 commits behind head on main.

Files with missing lines | Patch % | Lines
...operator-controller/catalogmetadata/cache/cache.go | 46.66% | 4 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2574      +/-   ##
==========================================
+ Coverage   63.42%   68.58%   +5.16%     
==========================================
  Files         131      131              
  Lines        9333     9348      +15     
==========================================
+ Hits         5919     6411     +492     
+ Misses       2939     2442     -497     
- Partials      475      495      +20     
Flag | Coverage Δ
e2e | 39.02% <40.00%> (+<0.01%) ⬆️
experimental-e2e | 51.57% <40.00%> (?)
unit | 53.82% <46.66%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.
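A quick check of the headline numbers, assuming Codecov's usual definition of coverage as hits / (hits + misses + partials), under which partially covered lines count against coverage. The patch figure is consistent with 15 tracked patch lines, 4 missed and 4 partial; the project figure uses the head-side Hits, Misses and Partials from the coverage diff:

$$
\text{patch} = \frac{15 - 4 - 4}{15} = \frac{7}{15} \approx 46.67\%,
\qquad
\text{project (head)} = \frac{6411}{6411 + 2442 + 495} = \frac{6411}{9348} \approx 68.58\%
$$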

@camilamacedo86 left a comment

It seems fine for me
/approved

@camilamacedo86 commented

/approve

openshift-ci bot commented Mar 19, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: camilamacedo86

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the "approved" label Mar 19, 2026 (Indicates a PR has been approved by an approver from all required OWNERS files.)
@kuiwang02 commented

/lgtm

openshift-ci bot added the "lgtm" label Mar 20, 2026 (Indicates that a PR is ready to be merged.)
@camilamacedo86 commented

/override codecov/patch

openshift-ci bot commented Mar 20, 2026

@camilamacedo86: Overrode contexts on behalf of camilamacedo86: codecov/patch

In response to this:

/override codecov/patch

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-merge-bot[bot] merged commit 0db26d7 into operator-framework:main Mar 20, 2026
31 of 32 checks passed
@tmshort deleted the fix-OCPBUGS-78787-operator-controller branch April 2, 2026 13:50

4 participants