
fix(runtime): Add cleanup for orphaned temp directories #96

Merged
dpup merged 3 commits into main from fix/temp-dir-cleanup on Jan 30, 2026

@dpup (Collaborator) commented Jan 30, 2026

Fixes #81

Summary

  • Adds moat system clean-temp command to clean up orphaned temporary directories
  • Scans /tmp for moat temp directories older than specified age (default: 1h)
  • Supports --dry-run, --force, and --min-age flags
  • Updates CLI reference documentation

Problem

Moat creates temporary directories for AWS credential helpers, Claude config, and Codex config. These are normally cleaned up when a run completes, but accumulate if moat crashes.

Solution

Added a new moat system clean-temp command that scans for orphaned directories and safely removes them after confirmation.

Testing

  • ✅ Built and tested command locally
  • ✅ Verified dry-run, force, and min-age flags work correctly
  • ✅ CLI tests pass

🤖 Generated with Claude Code

Addresses issue #81 by adding a new `moat system clean-temp` command
that scans for and removes orphaned temporary directories in /tmp.

Moat creates temporary directories for AWS credentials, Claude config,
and Codex config. These are normally cleaned up when a run completes,
but accumulate if moat crashes or is killed.

Changes:
- Add internal/system/tempclean.go with directory scanning logic
- Add cmd/moat/cli/system_cleantemp.go command implementation
- Update CLI reference documentation
- Support --dry-run, --force, and --min-age flags
- Only clean directories older than specified age (default: 1h)

The command scans for these patterns:
- agentops-aws-* (AWS credential helpers)
- moat-claude-staging-* (Claude configuration)
- moat-codex-staging-* (Codex configuration)

@claude (bot) commented Jan 30, 2026

📦 Previous Review (superseded by newer review)

Pull Request Review: Add cleanup for orphaned temp directories

This PR adds a useful maintenance command to clean up temporary directories that may accumulate after crashes. The implementation is solid overall with good user experience. Here are my findings:

✅ Strengths

  1. Clear documentation - The CLI help text and reference docs clearly explain what the command does and when it's needed
  2. Good UX patterns - Implements --dry-run, --force, and --min-age flags following standard conventions
  3. Safe defaults - 1-hour minimum age and confirmation prompt prevent accidental deletion
  4. Helpful output - Shows size, age, and pattern information before prompting for confirmation
  5. Consistent patterns - Matches the temp directory patterns actually used in internal/run/manager.go

🔍 Potential Issues

1. TOCTOU Race Condition (Medium Severity)

```go
// internal/system/tempclean.go:46-54
info, err := os.Stat(match)
if err != nil {
    continue
}

// Skip if directory was modified recently (still in use)
if info.ModTime().After(cutoff) {
    continue
}
```

Issue: There's a time-of-check-time-of-use (TOCTOU) vulnerability between:

  1. Checking if a directory is old enough (FindOrphanedTempDirs)
  2. User confirming deletion (interactive prompt)
  3. Actually deleting (CleanOrphanedTempDirs)

Scenario: A new moat run could start and create a temp directory between the scan and deletion, potentially removing an active directory if it happens to reuse a previously scanned path.

Recommendation: Re-verify directory age immediately before deletion:

```go
func CleanOrphanedTempDirs(dirs []OrphanedTempDir, minAge time.Duration) error {
    var errs []string
    cutoff := time.Now().Add(-minAge)

    for _, dir := range dirs {
        // Re-check age before deletion to avoid race condition
        if info, err := os.Stat(dir.Path); err == nil {
            if info.ModTime().After(cutoff) {
                // Directory was modified since scan - skip it
                continue
            }
        }

        if err := os.RemoveAll(dir.Path); err != nil {
            errs = append(errs, fmt.Sprintf("%s: %v", dir.Path, err))
        }
    }
    // ... rest of function
}
```

2. Missing Test Coverage (Medium Severity)

Issue: No unit tests for the new functionality. The PR description mentions "CLI tests pass" but there are no test files for:

  • internal/system/tempclean.go
  • cmd/moat/cli/system_cleantemp.go

Recommendation: Add tests for:

  • FindOrphanedTempDirs with various age scenarios
  • CleanOrphanedTempDirs error handling
  • FormatSize edge cases (0 bytes, large values)
  • formatDuration edge cases

Example test structure:

```go
func TestFindOrphanedTempDirs(t *testing.T) {
    // Create temp directories with known ages
    // Verify correct filtering by minAge
    // Verify proper handling of permission errors
}
```

3. Silent Error Handling in dirSize (Low Severity)

```go
// internal/system/tempclean.go:90-101
func dirSize(path string) (int64, error) {
    var size int64
    err := filepath.Walk(path, func(_ string, info os.FileInfo, err error) error {
        if err != nil {
            return nil // Skip files we can't access
        }
        // ...
    })
    return size, err
}
```

Issue: Silently skipping inaccessible files means reported sizes may be inaccurate without user awareness.

Recommendation: Consider logging or counting skipped files, especially since this is a cleanup tool where accuracy matters for user decision-making.

4. No Active Run Cross-Check (Low-Medium Severity)

Issue: The help text claims directories are "not associated with active runs" but there's no actual verification. The implementation only checks modification time, which could give false positives if a long-running container hasn't modified its temp directory recently.

Recommendation: Consider adding a check against active containers via the container runtime (similar to how system_containers.go uses rt.ListContainers()). This would make the "not associated with active runs" claim accurate.
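
The filtering half of such a cross-check could look like the sketch below. It assumes the caller has already built a set of cleaned paths that active runs are using; how to obtain that from the container runtime is not shown in this PR, so `excludeActive` and the `activePaths` set are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// OrphanedTempDir is reduced here to the one field this sketch needs.
type OrphanedTempDir struct {
	Path string
}

// excludeActive drops candidates whose cleaned path is in activePaths.
// activePaths is assumed to hold cleaned paths used by running containers.
func excludeActive(dirs []OrphanedTempDir, activePaths map[string]bool) []OrphanedTempDir {
	var out []OrphanedTempDir
	for _, d := range dirs {
		if activePaths[filepath.Clean(d.Path)] {
			continue // still referenced by a running container
		}
		out = append(out, d)
	}
	return out
}

func main() {
	dirs := []OrphanedTempDir{
		{Path: "/tmp/moat-claude-staging-1"},
		{Path: "/tmp/moat-codex-staging-2/"},
	}
	active := map[string]bool{"/tmp/moat-codex-staging-2": true}
	for _, d := range excludeActive(dirs, active) {
		fmt.Println("safe to remove:", d.Path)
	}
}
```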

🎯 Code Quality

  • Style: Follows Go conventions and project patterns ✓
  • Error messages: Clear and actionable ✓
  • Documentation: Comprehensive and matches STYLE-GUIDE.md ✓
  • Consistency: Matches existing system command patterns ✓

📊 Performance Considerations

  • dirSize using filepath.Walk is fine for temp directories (typically small)
  • filepath.Glob per pattern is reasonable (only 3 patterns)
  • No concerns for typical usage

🔒 Security Considerations

📝 Minor Suggestions

  1. Line 88: Error from reader.ReadString is silently ignored - consider at least logging it
  2. Naming: plural function is generic - could be moved to a shared helpers file if used elsewhere
  3. Documentation: The long description mentions patterns twice (lines 28-31 in CLI and in docs) - this is fine but ensure they stay in sync

✅ Recommendation

APPROVE with changes requested

The feature is valuable and well-implemented. The TOCTOU race condition should be fixed before merging, and test coverage should be added. The other issues are lower priority but would improve robustness.

Priority fixes:

  1. Required: Add age re-verification before deletion (TOCTOU fix)
  2. 📝 Recommended: Add unit tests
  3. 💡 Optional: Add active run cross-check or update docs to be accurate

Great work on the UX and documentation! 🎉

Address review feedback from PR #96:

1. Fix TOCTOU race condition
   - Re-verify directory age before deletion
   - Skip directories modified since initial scan
   - Prevents deleting newly created temp dirs

2. Add comprehensive unit tests
   - TestFindOrphanedTempDirs with various age scenarios
   - TestCleanOrphanedTempDirs for successful cleanup
   - TestCleanOrphanedTempDirs_SkipsRecentlyModified for race protection
   - TestFormatSize edge cases
   - TestDirSize directory size calculation
   - TestPluralSuffix helper function

3. Update documentation
   - Remove inaccurate "not associated with active runs" claim
   - Documentation now accurately reflects what the command checks

All tests pass and the TOCTOU race condition is now prevented.

@claude (bot) commented Jan 30, 2026

📦 Previous Review (superseded by newer review)

Pull Request Review: Add cleanup for orphaned temp directories

This PR adds a useful maintenance command to clean up temporary directories that may accumulate when moat crashes. The implementation is well-structured with good test coverage and clear documentation. Below are my findings:

Code Quality ✅

Strengths:

  • Clean separation of concerns: CLI command logic in cmd/moat/cli/system_cleantemp.go and core functionality in internal/system/tempclean.go
  • Comprehensive test coverage (253 lines) with multiple test scenarios
  • Follows existing patterns from system_containers.go and system_images.go
  • Good use of Go idioms and error handling

Suggestions:

  1. Race condition protection is excellent - The re-verification of directory age before deletion (internal/system/tempclean.go:82-85) is a good security measure. However, consider adding a comment explaining the TOCTOU (Time-Of-Check-Time-Of-Use) protection:
```go
// Re-check age before deletion to avoid TOCTOU race condition
// A new moat run could have started between scan and deletion
if info, err := os.Stat(dir.Path); err == nil {
    if info.ModTime().After(cutoff) {
        skipped = append(skipped, dir.Path)
        continue
    }
}
```

The comment is there, but it could be more prominent as this is a critical security feature.

  2. Inconsistent plural handling - There are two plural helpers:
    • internal/system/tempclean.go:108-113 (unexported, named pluralSuffix)
    • cmd/moat/cli/system_cleantemp.go:106-111 (unexported, named plural)

Consider extracting this to a shared utility or keeping only one implementation.
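
If the helpers are consolidated, a single implementation could be as small as the sketch below. The bodies of the existing helpers are not shown in the review, so the behavior (no suffix only when n is exactly 1) and a shared `pluralSuffix` are assumptions.

```go
package main

import "fmt"

// pluralSuffix returns "" when n is exactly 1 and "s" otherwise,
// including for zero ("0 dirs").
func pluralSuffix(n int) string {
	if n == 1 {
		return ""
	}
	return "s"
}

func main() {
	for _, n := range []int{0, 1, 2} {
		fmt.Printf("Found %d orphaned temp dir%s\n", n, pluralSuffix(n))
	}
}
```

This prints "Found 0 orphaned temp dirs", "Found 1 orphaned temp dir", and "Found 2 orphaned temp dirs".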

Potential Issues 🔍

  1. Missing verification that patterns match actual usage - The patterns list in MoatTempPatterns should be verified against the actual MkdirTemp calls:

    • agentops-aws-* matches internal/run/manager.go:575
    • moat-claude-staging-* matches internal/run/manager.go:1190
    • moat-codex-staging-* matches internal/run/manager.go:1291

    Good! The patterns are accurate.

  2. Error handling in dirSize - In internal/system/tempclean.go:118, the walk function silently skips files it can't access:

    ```go
    err := filepath.Walk(path, func(_ string, info os.FileInfo, err error) error {
        if err != nil {
            return nil // Skip files we can't access
        }
    ```

    This is reasonable for a size calculation, but consider logging these errors at debug level or collecting them to warn users about permission issues.

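  3. Potential edge case: Empty TMPDIR - In the test at internal/system/tempclean_test.go:14, setting TMPDIR to t.TempDir() is good. On Unix, os.TempDir() never returns an empty string (it falls back to /tmp when $TMPDIR is empty), but the directory it returns is not guaranteed to exist or be accessible. Since filepath.Glob ignores I/O errors, the scan would simply report no matches in that case.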

Performance Considerations ⚡

  1. Glob performance - Using filepath.Glob for each pattern sequentially (internal/system/tempclean.go:39-42) is fine for /tmp, but on systems with many files, this could be slow. Consider:

    • The current implementation is acceptable since /tmp cleanup is an infrequent operation
    • If performance becomes an issue, could switch to a single directory walk
  2. Directory size calculation - The dirSize function walks entire directory trees (internal/system/tempclean.go:116-127). For large temp directories, this could be slow. Consider:

    • Adding a timeout or size limit
    • Making size calculation optional with a flag like --show-sizes

Currently, this is fine since these are temporary directories, not user workspaces.

Security Concerns 🔒

Excellent security posture:

  1. TOCTOU protection - Re-verifies directory age before deletion
  2. Default minimum age - 1 hour default prevents accidental deletion of active directories
  3. Confirmation prompt - Requires user confirmation unless --force is used
  4. Dry-run support - Allows safe preview before deletion
  5. Limited scope - Only targets specific moat-created patterns, not arbitrary temp files

One consideration:

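  • Symlink handling - os.RemoveAll does not follow symlinks: a symlink inside a moat temp directory is removed as a link, and the data it points to is left in place. The age check is different: os.Stat follows symlinks, so if a matched path were itself a symlink, the target's modification time would be compared. Consider using os.Lstat and skipping non-directories, or documenting this behavior.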
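
A small self-contained check of the os.RemoveAll behavior described above. It assumes a Unix system (creating symlinks on Windows may need extra privileges), and `removeAllKeepsSymlinkTarget` is a hypothetical demo function.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeAllKeepsSymlinkTarget creates a staging directory holding a symlink
// to a directory outside it, deletes the staging directory with
// os.RemoveAll, and reports whether the file behind the link survived.
func removeAllKeepsSymlinkTarget() (bool, error) {
	base, err := os.MkdirTemp("", "symlink-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(base)

	// Data that lives outside the directory being cleaned.
	dataDir := filepath.Join(base, "data")
	important := filepath.Join(dataDir, "important.txt")
	if err := os.Mkdir(dataDir, 0o755); err != nil {
		return false, err
	}
	if err := os.WriteFile(important, []byte("keep me"), 0o644); err != nil {
		return false, err
	}

	// A staging directory containing a symlink that points at dataDir.
	staging := filepath.Join(base, "moat-claude-staging-demo")
	if err := os.Mkdir(staging, 0o755); err != nil {
		return false, err
	}
	if err := os.Symlink(dataDir, filepath.Join(staging, "link")); err != nil {
		return false, err
	}

	if err := os.RemoveAll(staging); err != nil {
		return false, err
	}
	_, err = os.Stat(important)
	return err == nil, nil
}

func main() {
	kept, err := removeAllKeepsSymlinkTarget()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("symlink target survived:", kept)
}
```

It should report true: only the link inside the staging directory is removed.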

Test Coverage ✅

Excellent test coverage:

  • TestFindOrphanedTempDirs - Tests age-based filtering
  • TestCleanOrphanedTempDirs - Tests successful cleanup
  • TestCleanOrphanedTempDirs_SkipsRecentlyModified - Tests race condition protection
  • TestFormatSize - Tests human-readable size formatting
  • TestDirSize - Tests recursive size calculation
  • TestPluralSuffix - Tests plural handling

Missing tests:

  1. Error case: What happens when a directory can't be deleted (permission denied)?
  2. Edge case: Pattern with no matches
  3. Edge case: Very large directories (though this may be impractical for unit tests)

Consider adding a test for permission errors:

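```go
func TestCleanOrphanedTempDirs_PermissionError(t *testing.T) {
    if os.Geteuid() == 0 {
        t.Skip("root bypasses permission checks")
    }
    testDir := filepath.Join(t.TempDir(), "test-readonly")
    if err := os.Mkdir(testDir, 0o755); err != nil {
        t.Fatal(err)
    }
    // An empty dir can be removed even at mode 0000; a file inside a
    // read-only dir cannot be unlinked, so RemoveAll must fail.
    if err := os.WriteFile(filepath.Join(testDir, "f"), nil, 0o644); err != nil {
        t.Fatal(err)
    }
    // Backdate so the pre-deletion age re-check does not skip the dir.
    old := time.Now().Add(-2 * time.Hour)
    if err := os.Chtimes(testDir, old, old); err != nil {
        t.Fatal(err)
    }
    if err := os.Chmod(testDir, 0o555); err != nil {
        t.Fatal(err)
    }
    t.Cleanup(func() { os.Chmod(testDir, 0o755) }) // lets t.TempDir cleanup remove it
    orphaned := []OrphanedTempDir{{Path: testDir}}
    if err := CleanOrphanedTempDirs(orphaned, 1*time.Hour); err == nil {
        t.Error("expected error cleaning read-only directory")
    }
}
```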

Documentation 📚

Strengths:

  • Clear command description and examples in docs/content/reference/01-cli.md
  • Helpful long description in the CLI command
  • Good examples covering dry-run, force, and min-age flags

Minor suggestions:

  1. Documentation accuracy - Good! The patterns listed in the docs match the code exactly.

  2. Style guide compliance - The documentation follows the STYLE-GUIDE.md well:

    • ✅ Objective (states facts about what the command does)
    • ✅ Direct (uses active voice)
    • ✅ Factual (specific about what gets cleaned)
  3. Consider adding to CLAUDE.md - Since this is a new system maintenance command, consider documenting it in the "Development Commands" or creating a "Maintenance" section.

Git Conventions 🔧

Issue with commit format:

CLAUDE.md states "do not include Co-Authored-By lines for Claude", but the PR body includes:

🤖 Generated with Claude Code

This is fine for PR descriptions, but ensure the actual commit messages follow the Conventional Commits format without Co-Authored-By lines.

The commit title fix(runtime): Add cleanup for orphaned temp directories is good, but consider:

  • Type could be feat instead of fix since it's adding new functionality, not fixing a bug
  • Scope could be system or cli instead of runtime since it's a system maintenance command

Suggested: feat(system): add cleanup command for orphaned temp directories

Overall Assessment ⭐

This is a solid PR that adds valuable functionality. The implementation is clean, well-tested, and follows project conventions. The security considerations (TOCTOU protection, confirmation prompts, dry-run mode) demonstrate thoughtful design.

Recommended actions before merge:

  1. Critical: None - the PR is ready to merge from a functionality standpoint
  2. Nice to have:
    • Consider adding a test for permission errors
    • Consolidate the two plural/pluralSuffix functions
    • Consider whether feat is more appropriate than fix for the commit type

Rating: 8.5/10 - High-quality implementation with comprehensive testing and good security practices. The minor suggestions are refinements rather than blockers.

@claude (bot) commented Jan 30, 2026

Pull Request Review: Add cleanup for orphaned temp directories

This PR successfully addresses issue #81 by implementing a manual cleanup command for orphaned temporary directories. The implementation is well-structured with good test coverage and clear documentation.

Code Quality ✅

Strengths:

  • Clean separation of concerns: CLI command (cmd), core logic (internal/system), and tests
  • Follows conventional commits format with proper type and scope
  • Comprehensive test coverage including edge cases (TOCTOU race condition)
  • Code style is consistent with existing codebase conventions
  • Good use of defer patterns for cleanup responsibility tracking

Minor suggestions:

  1. internal/system/tempclean.go:41 - The filepath.Glob error is silently ignored with continue. filepath.Glob only returns an error (ErrBadPattern) for a malformed pattern, which with hardcoded patterns can only come from glob metacharacters in the temp directory path itself; still, consider logging or collecting these errors:

    ```go
    matches, err := filepath.Glob(filepath.Join(tmpDir, pattern.Pattern))
    if err != nil {
        // Consider: log.Warnf("failed to glob pattern %s: %v", pattern.Pattern, err)
        continue
    }
    ```

  2. internal/system/tempclean.go:46 - Similarly, os.Stat errors are silently ignored. This is likely intentional (directory deleted between glob and stat), but a comment would clarify:

    ```go
    info, err := os.Stat(match)
    if err != nil {
        continue // Directory may have been cleaned up between glob and stat
    }
    ```

Potential Issues 🔍

No critical issues found. The implementation correctly handles the TOCTOU race condition by re-verifying directory age before deletion.

Observation: The dirSize calculation (line 118) uses filepath.Walk which silently ignores errors for individual files. This is reasonable for a cleanup tool, but the function signature (int64, error) suggests it might return an error - consider documenting that the error is always nil or removing it:

```go
// dirSize calculates the total size of a directory recursively.
// Inaccessible files are silently skipped.
func dirSize(path string) int64 {
    var size int64
    filepath.Walk(path, func(_ string, info os.FileInfo, err error) error {
        if err == nil && !info.IsDir() {
            size += info.Size()
        }
        return nil
    })
    return size
}
```

Performance Considerations ⚡

Good: The implementation is efficient for the expected use case (small number of temp directories). The re-verification before deletion adds minimal overhead while preventing dangerous race conditions.

Note: dirSize walks entire directory trees to calculate sizes. For very large temp directories, this could be slow. However, this is acceptable for:

  • An explicitly-run cleanup command (not a background task)
  • Display purposes (user wants to see what they're deleting)
  • Typical moat temp directories are small (configs, credential helpers)

Security Concerns 🔒

Excellent security posture:

  1. TOCTOU protection: Re-verifies directory age at deletion time (lines 80-85) to prevent accidentally deleting recently-created directories
  2. Pattern matching: Only scans specific patterns, won't touch arbitrary /tmp files
  3. Confirmation prompt: Requires user confirmation unless --force is used
  4. Dry-run mode: Allows safe preview of what would be deleted
  5. Age-based filtering: Default 1-hour minimum age prevents deleting active temp directories

Consideration: The patterns are hardcoded in MoatTempPatterns. If future code adds new temp directory patterns, they must be added here. This is acceptable since:

  • The patterns are centrally documented
  • The command is manually invoked (users can check what it finds)
  • Adding new patterns is a deliberate decision

Test Coverage 🧪

Excellent test coverage:

  • ✅ Age filtering with various thresholds
  • ✅ Pattern matching verification
  • ✅ Successful cleanup
  • ✅ TOCTOU race condition handling (recently modified directories are skipped)
  • ✅ Size formatting edge cases
  • ✅ Directory size calculation
  • ✅ Plural suffix helper

All key scenarios are tested. The TOCTOU test is particularly valuable as it validates the security-critical re-verification logic.

Documentation 📚

Strong documentation:

  • ✅ CLI reference updated with command syntax, flags, and examples
  • ✅ Inline code comments explain the patterns and behavior
  • ✅ Long description in cobra command explains when directories accumulate
  • ✅ Examples show common use cases (dry-run, different ages, force mode)

Matches CLAUDE.md style guide: Objective tone, factual claims, practical examples first.

Alignment with Issue #81

The PR fully addresses the issue requirements:

  • ✅ Implements moat system clean-temp command as suggested
  • ✅ Handles all three temp directory types mentioned (AWS, Claude, Codex)
  • ✅ Provides both dry-run and force modes for safe operation
  • ✅ Configurable minimum age to avoid deleting active directories

Note: The issue mentioned considering "finalizers or background cleanup" as alternatives. This PR implements the manual cleanup approach, which is more conservative and gives users control. Background cleanup could be a future enhancement if needed.

Overall Assessment ⭐

Recommend: APPROVE with minor suggestions

This is a well-crafted PR that solves a real problem (temp directory accumulation on crashes) with a safe, tested implementation. The security considerations (TOCTOU protection, confirmation prompts, pattern restrictions) demonstrate thoughtful design. The only suggestions are minor code clarity improvements.

The implementation follows Go best practices, aligns with the project's style guide, and includes comprehensive tests and documentation.

@dpup merged commit dcd17b8 into main on Jan 30, 2026
6 of 7 checks passed
@dpup deleted the fix/temp-dir-cleanup branch on January 30, 2026 at 05:12

Linked issue: fix(runtime): Temporary directories not cleaned up when moat crashes