# feat(snapshot): add workspace snapshots and process tracing (#36)
## Conversation

### Pull request overview
This PR adds comprehensive workspace snapshot capabilities and process execution tracing to the moat agent system. The implementation provides automatic pre-run snapshots, event-based snapshot triggers (git commits, builds, idle detection), and platform-specific execution tracing via Linux proc connector and macOS sysctl polling.
Changes:
- Workspace snapshot system with APFS (macOS) and archive (tar.gz) backends
- Process execution tracing with platform-specific implementations (Linux/macOS/stub)
- Configuration support for snapshot triggers, exclusions, and retention policies
- Storage integration for execution events
- CLI commands for snapshot management (`moat snapshots`, `moat snapshot`, `moat rollback`)
### Reviewed changes

Copilot reviewed 37 out of 38 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| internal/snapshot/snapshot.go | Core snapshot types, metadata, and backend interface |
| internal/snapshot/archive.go | Archive backend with tar.gz and gitignore support |
| internal/snapshot/apfs_darwin.go | APFS backend using macOS tmutil commands |
| internal/snapshot/engine.go | Snapshot engine with backend detection and management |
| internal/trace/event.go | Execution event types with git/build detection helpers |
| internal/trace/tracer.go | Tracer interface definition |
| internal/trace/tracer_stub.go | No-op tracer for testing/unsupported platforms |
| internal/trace/tracer_linux.go | Linux proc connector tracer via netlink |
| internal/trace/tracer_darwin.go | macOS sysctl polling tracer |
| internal/config/config.go | Snapshot and tracing configuration schema |
| internal/storage/storage.go | Execution event storage (exec.jsonl) |
| internal/run/run.go | Added SnapEngine field to Run struct |
| internal/run/manager.go | Integrated snapshot engine into run lifecycle |
| go.mod/go.sum | Added go-git dependency for gitignore parsing |
**Copilot comment 1**

Flagged code:

```go
	f.Close()
	return fmt.Errorf("write file %s: %w", header.Name, err)
}
f.Close()
```

> File handle may be writable as a result of data flow from a call to OpenFile and closing it may result in data loss upon failure, which is not handled explicitly.

Suggested change:

```go
	// Best-effort close; preserve the original write error
	_ = f.Close()
	return fmt.Errorf("write file %s: %w", header.Name, err)
}
if err := f.Close(); err != nil {
	return fmt.Errorf("close file %s: %w", header.Name, err)
}
```
**Copilot comment 2**

Flagged code:

```go
	f.Close()
	return fmt.Errorf("write file %s: %w", header.Name, err)
}
f.Close()
```

> File handle may be writable as a result of data flow from a call to OpenFile and closing it may result in data loss upon failure, which is not handled explicitly.

Suggested change:

```go
	if cerr := f.Close(); cerr != nil {
		return fmt.Errorf("write file %s: %v (also failed to close: %w)", header.Name, err, cerr)
	}
	return fmt.Errorf("write file %s: %w", header.Name, err)
}
if err := f.Close(); err != nil {
	return fmt.Errorf("close file %s: %w", header.Name, err)
}
```
**Copilot comment 3**

Flagged code:

```go
defer f.Close()

data, err := json.Marshal(event)
if err != nil {
	return fmt.Errorf("marshaling exec event: %w", err)
}
if _, writeErr := f.Write(data); writeErr != nil {
	return fmt.Errorf("writing exec event: %w", writeErr)
}
_, err = f.Write([]byte("\n"))
return err
```

> File handle may be writable as a result of data flow from a call to OpenFile and closing it may result in data loss upon failure, which is not handled explicitly.

Suggested change (note that it also drops the `defer f.Close()`):

```go
data, err := json.Marshal(event)
if err != nil {
	_ = f.Close()
	return fmt.Errorf("marshaling exec event: %w", err)
}
if _, writeErr := f.Write(data); writeErr != nil {
	_ = f.Close()
	return fmt.Errorf("writing exec event: %w", writeErr)
}
if _, writeErr := f.Write([]byte("\n")); writeErr != nil {
	_ = f.Close()
	return fmt.Errorf("writing exec event newline: %w", writeErr)
}
if closeErr := f.Close(); closeErr != nil {
	return fmt.Errorf("closing exec file: %w", closeErr)
}
return nil
```
## Code Review - PR #36: Workspace Snapshots & Process Tracing

Excellent work on this substantial feature! The architecture is solid with good separation of concerns, strong security practices, and comprehensive test coverage. However, I've identified several issues that should be addressed before merging.

### Critical Issues (Must Fix)

**1. Race Condition in .git Preservation**
## Code Review: PR #36 - Workspace Snapshots and Process Tracing

### Overview

This PR adds significant new functionality with 9,098 additions across 37 files, introducing workspace snapshots with APFS/archive backends and platform-specific process tracing. The implementation is generally solid with good security practices and comprehensive testing.

### Summary Assessment

Overall quality: ✅ Strong - well-architected with good security practices.

### Critical Findings

🟢 **No critical issues found.** The code is production-ready with no blocking security vulnerabilities or critical bugs.

### High-Priority Recommendations

**1. Archive Restore Atomicity (Medium Priority)**

Issue: If extraction fails midway, the workspace may contain partial files in an inconsistent state.

Current code:

```go
if err := b.RestoreTo(nativeRef, workspacePath); err != nil {
	// Restore .git on error (best effort)
	if gitBackup != "" {
		_ = os.Rename(gitBackup, gitDir)
	}
	return fmt.Errorf("extract archive: %w", err)
}
```

Recommendation: Extract to a temporary directory first, then swap atomically:

```go
tmpDir := workspacePath + ".restore-tmp"
if err := b.RestoreTo(nativeRef, tmpDir); err != nil {
	os.RemoveAll(tmpDir)
	return fmt.Errorf("extract archive: %w", err)
}
// Atomic swap with proper cleanup
```

**2. Insufficient Concurrency Test Coverage (Medium Priority)**

Recommendation: Add these test cases:

```go
func TestTracerConcurrentStartStop(t *testing.T)
func TestTracerChannelOverflow(t *testing.T)
func TestTracerCallbackPanic(t *testing.T)
func TestTracerGrandchildTracking(t *testing.T)
func TestLinuxTracerConsecutiveErrors(t *testing.T)
func TestTracerGoroutineCleanup(t *testing.T)
```

### Security Review

✅ Strengths
## Comprehensive Code Review: PR #36 - Workspace Snapshots and Process Tracing

### Overview

This PR adds significant new functionality (~9,130 additions across 38 files) for workspace snapshots and process tracing. Overall, this is a solid implementation with excellent security practices, though there are areas for improvement.

### ✅ Strengths

#### Security Implementation (Excellent)

The path traversal protection is thorough.

File path validation (lines 241-246):

```go
relToDestPath, err := filepath.Rel(destPath, targetPath)
if err != nil || strings.HasPrefix(relToDestPath, "..") {
	return fmt.Errorf("invalid path in archive: %s", header.Name)
}
```

Symlink validation (lines 280-296) rejects absolute targets and verifies that relative targets stay within the destination, with additional protections (decompression and file-count limits).

This is security done right! 🎯

#### Concurrency Safety

Excellent use of atomic operations:

```go
var storeRef atomic.Value // holds *storage.RunStore
```

The atomic reference pattern cleverly avoids a race condition where the proxy logger might try to write before the store is initialized. Proper mutex usage throughout.

#### Resource Management

Smart file handle management:

```go
f.Close() // Close immediately, not deferred, to avoid accumulating file handles
```

Excellent comment explaining why immediate close is used in loops instead of defer!

#### Architecture
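The `filepath.Rel` escape check quoted above can be exercised in isolation. A runnable sketch, slightly stricter than the quoted code in that it matches only a leading `..` path component (so legitimate entry names like `..foo` are not rejected):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin joins an archive entry name onto dest, then verifies the
// result cannot climb out of dest via ".." components.
func safeJoin(dest, name string) (string, error) {
	target := filepath.Join(dest, name) // Join also cleans the path
	rel, err := filepath.Rel(dest, target)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("invalid path in archive: %s", name)
	}
	return target, nil
}

func main() {
	for _, name := range []string{"src/main.go", "../../etc/passwd", "a/../../b"} {
		if p, err := safeJoin("/workspace", name); err != nil {
			fmt.Println("rejected:", name)
		} else {
			fmt.Println("accepted:", p)
		}
	}
}
```

Because `filepath.Join` cleans the path before `filepath.Rel` runs, tricks like `a/../../b` resolve to a location outside `dest` and are caught by the same check.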
## Pull Request Review: Workspace Snapshots and Process Tracing

### Overall Assessment

This is a well-architected and thoroughly implemented PR that adds two major features: workspace snapshots and process tracing. The code demonstrates strong engineering practices with comprehensive test coverage (39 snapshot tests, 19 trace tests), proper security considerations, and platform-specific optimizations.

Recommendation: ✅ Approve with minor suggestions

### Code Quality & Best Practices

#### Strengths

### Security Review

#### ✅ Security Strengths
## Code Review: Workspace Snapshots and Process Tracing

### Overview

This PR adds significant new functionality: workspace snapshots with APFS and archive backends, process tracing for Linux and macOS, and rollback capabilities. The implementation is generally well-designed with good test coverage, but there are several issues that need attention.

### Critical Issues

**1. Security: Command Injection Risk in APFS Backend**

While the current implementation uses separate arguments (preventing shell injection), paths come from user-controlled input. If an attacker can control the workspace path or snapshot ID, they could potentially pass malicious arguments starting with `-`.

Recommendation: Add path validation:

```go
// Before executing commands, validate paths
if strings.HasPrefix(filepath.Base(workspacePath), "-") {
	return "", fmt.Errorf("invalid workspace path: cannot start with -")
}
```

**2. Performance: Potential Resource Leak in Archive Creation**

Deferred closes happen in reverse order, but their errors are not checked.

Recommendation: Implement proper cleanup with error checking:

```go
var writeErr error
defer func() {
	if err := tw.Close(); err != nil && writeErr == nil {
		writeErr = err
	}
	if err := gw.Close(); err != nil && writeErr == nil {
		writeErr = err
	}
	if err := file.Close(); err != nil && writeErr == nil {
		writeErr = err
	}
}()
// Return writeErr at the end of the function
```

**3. Security: Add Total Archive Size Limit**

The 1GB per-file limit is excellent, but a malicious archive could still contain thousands of 1GB files.

Recommendation: Add a total archive size limit or file count limit in the restore loop.

### Major Issues

**4. Race Condition: Engine Metadata Concurrency**

The mutex is held during potentially long-running I/O operations (creating snapshots), which could block other goroutines unnecessarily.

Recommendation: Narrow the critical section:

```go
id := NewID()
nativeRef, err := e.backend.Create(e.workspace, id) // Outside lock
if err != nil {
	return Metadata{}, fmt.Errorf("backend create: %w", err)
}
e.mu.Lock()
defer e.mu.Unlock()
// Only metadata operations under lock
```

**5. Process Tracing: Missing Privilege Requirements Documentation**

There's no package-level documentation about privilege requirements.

Recommendation: Add godoc comments:

```go
// NewProcConnectorTracer creates a new proc connector tracer.
// Requires CAP_NET_ADMIN capability or root privileges on Linux.
// Returns an error if the netlink socket cannot be created.
```

**6. Darwin Tracer: Hardcoded Structure Offsets**

These offsets are fragile and could break across macOS versions.

### Minor Issues

**7. Error Handling: Quote Path in Recovery Message**

Recommendation:

```go
return fmt.Errorf("corrupted snapshot metadata at %s: %w\nTo reset, delete the file: rm %q", path, err, path)
```

**8. Configuration: Add Validation**

There's no validation that MaxCount is positive or that IdleThresholdSeconds is reasonable.

Recommendation: Add validation:

```go
if cfg.Snapshots.Retention.MaxCount < 0 {
	return nil, fmt.Errorf("snapshots.retention.max_count must be non-negative")
}
```

**9. Test Coverage: Missing Edge Cases**

Excellent coverage for symlink attacks and basic operations, but some edge cases remain untested.

**10. Performance: Darwin Tracer Polling Not Configurable**

100ms polling might be too coarse for short-lived processes, but there's no way to configure it.

Recommendation: Make the polling interval configurable.

### Positive Highlights

✅ Excellent test coverage with regression tests for race conditions

### Summary

Recommendation: Approve with changes

The code is well-designed and follows good practices. The symlink path traversal protection is excellent. The PR adds valuable functionality with good attention to security and testing; the identified issues are addressable and don't represent fundamental design flaws.
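Several of the APFS items above come down to keeping attacker-influenced paths from being parsed as flags by external tools. A hedged sketch combining the two defenses discussed here and elsewhere in the thread (per-component validation plus the `--` end-of-options separator; `validatePathArg` is an illustrative name, not the PR's code):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// validatePathArg rejects paths with any component beginning with "-",
// which external tools could misread as flags. Pairing this with a
// literal "--" separator before positional arguments, e.g.
//
//	exec.Command("cp", "-c", "-R", "-p", "--", src, dst)
//
// gives defense in depth.
func validatePathArg(path string) error {
	abs, err := filepath.Abs(path)
	if err != nil {
		return fmt.Errorf("resolve path: %w", err)
	}
	for _, part := range strings.Split(abs, string(filepath.Separator)) {
		if strings.HasPrefix(part, "-") {
			return fmt.Errorf("invalid path %q: component %q starts with -", path, part)
		}
	}
	return nil
}

func main() {
	fmt.Println(validatePathArg("/tmp/workspace"))     // <nil>
	fmt.Println(validatePathArg("/tmp/-rf/workspace")) // error
}
```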
## Security and Code Quality Review - PR #36

I've completed a thorough security and code quality review of the workspace snapshots and process tracing implementation.

### Executive Summary

Overall Assessment: EXCELLENT ✅

This PR demonstrates strong security practices and solid engineering. The implementation shows careful attention to security-critical details like symlink path traversal prevention, race condition mitigation, and proper error handling.

### 1. Security Analysis

#### ✅ Symlink Path Traversal Prevention (EXCELLENT)

The symlink handling demonstrates excellent security practice:

```go
// Reject absolute symlink targets - they could point anywhere on the filesystem
if filepath.IsAbs(header.Linkname) {
	return fmt.Errorf("invalid symlink in archive: absolute path not allowed: %s -> %s",
		header.Name, header.Linkname)
}

// Resolve the symlink target relative to its location within destPath
symlinkDir := filepath.Dir(targetPath)
resolvedTarget := filepath.Join(symlinkDir, header.Linkname)
resolvedTarget = filepath.Clean(resolvedTarget)

// Verify the resolved target stays within destPath
relToDestPath, err := filepath.Rel(destPath, resolvedTarget)
if err != nil || strings.HasPrefix(relToDestPath, "..") {
	return fmt.Errorf("invalid symlink in archive: target escapes destination: %s -> %s",
		header.Name, header.Linkname)
}
```

Validated against the path traversal attack vectors exercised in the tests.
#### ✅ Regular File Path Traversal Prevention (EXCELLENT)

Every extracted file path is validated before filesystem operations:

```go
targetPath := filepath.Join(destPath, header.Name) //nolint:gosec // G305: validated below
relToDestPath, err := filepath.Rel(destPath, targetPath)
if err != nil || strings.HasPrefix(relToDestPath, "..") {
	return fmt.Errorf("invalid path in archive: %s", header.Name)
}
```

#### ✅ Decompression Bomb Protection (GOOD)

```go
// Limit copy size to prevent decompression bombs (1GB max per file)
if _, err := io.Copy(f, io.LimitReader(tr, 1<<30)); err != nil {
```

The 1GB per-file limit is reasonable for workspace snapshots.

#### ✅ Command Execution Safety (GOOD)

All command executions use explicit argument lists:

```go
cmd := exec.Command("cp", "-c", "-R", "-p", workspacePath, clonePath)
cmd := exec.Command("diskutil", "info", mountPoint)
```

No injection risk as arguments aren't shell-interpreted.

### 2. Race Conditions & Concurrency

#### ✅ ProcConnectorTracer Synchronization (EXCELLENT)

The synchronization design is careful and correct.
Key pattern - event emission:

```go
func (t *ProcConnectorTracer) emitEvent(event ExecEvent) {
	t.mu.Lock()
	if t.stopped {
		t.mu.Unlock()
		return
	}
	// Copy callbacks under lock
	cbs := make([]func(ExecEvent), len(t.callbacks))
	copy(cbs, t.callbacks)
	// Non-blocking channel send
	select {
	case t.events <- event:
	default:
		t.droppedEvents++
	}
	t.mu.Unlock()
	// Invoke callbacks OUTSIDE lock to prevent deadlock
	for _, cb := range cbs {
		cb(event)
	}
}
```

Callbacks are copied under the lock and invoked outside it, the channel send never blocks, and dropped events are counted for observability. Regression tests cover concurrent start/stop and emission.
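The non-blocking send at the heart of `emitEvent` is easy to demonstrate in isolation. A toy version with an explicit drop counter (names are mine, not the PR's):

```go
package main

import "fmt"

// dropSend performs the non-blocking send used by emitEvent: if the
// buffered channel is full, the event is counted as dropped rather than
// blocking the producer.
func dropSend(ch chan int, v int, dropped *int) {
	select {
	case ch <- v:
	default:
		*dropped++
	}
}

func main() {
	events := make(chan int, 2)
	dropped := 0
	for i := 0; i < 5; i++ {
		dropSend(events, i, &dropped)
	}
	fmt.Println(len(events), dropped) // 2 3
}
```

The trade-off is intentional: a slow consumer costs events (which are counted), never a stalled tracer goroutine.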
### 3. Error Handling & Edge Cases

#### ✅ File Descriptor Management (EXCELLENT)

```go
if info.Mode().IsRegular() {
	f, err := os.Open(path)
	if err != nil {
		return fmt.Errorf("open file %s: %w", relPath, err)
	}
	_, copyErr := io.Copy(tw, f)
	f.Close() // Close immediately, not deferred, to avoid accumulating file handles
	if copyErr != nil {
		return fmt.Errorf("copy file content %s: %w", relPath, copyErr)
	}
}
```

Closes immediately instead of deferred to avoid fd leaks in loops - subtle but important.

#### ✅ Archive Restore Error Recovery (EXCELLENT)

The restore operation preserves `.git` and recovers it if extraction fails:

```go
// Backup .git before cleaning workspace
gitBackup := gitDir + ".backup"
if err := os.Rename(gitDir, gitBackup); err != nil {
	return fmt.Errorf("backup .git directory: %w", err)
}

// Extract archive
if err := b.RestoreTo(nativeRef, workspacePath); err != nil {
	if gitBackup != "" {
		_ = os.Rename(gitBackup, gitDir) // Best-effort recovery
	}
	return fmt.Errorf("extract archive: %w", err)
}
```

#### ✅ Engine Metadata Consistency (EXCELLENT)

Snapshot creation is atomic - if the metadata save fails, the snapshot is cleaned up:

```go
nativeRef, err := e.backend.Create(e.workspace, id)
if err != nil {
	return Metadata{}, fmt.Errorf("backend create: %w", err)
}
meta := Metadata{...}
e.snapshots[id] = meta
if err := e.saveMetadata(); err != nil {
	// Clean up the snapshot if we can't save metadata
	_ = e.backend.Delete(nativeRef)
	delete(e.snapshots, id)
	return Metadata{}, fmt.Errorf("save metadata: %w", err)
}
```

Prevents orphaned snapshots.

### 4. Platform-Specific Code Correctness

#### ✅ Build Tags (EXCELLENT)

Proper platform separation.
#### ✅ macOS Architecture Handling (EXCELLENT)

Correctly handles ARM64 and x86_64 with different kinfo_proc offsets:

```go
func init() {
	switch runtime.GOARCH {
	case "arm64":
		procSize = 648
		pidOffset = 72
		ppidOffset = 76
		commOffset = 243
		startOffset = 128
	default: // amd64/x86_64
		procSize = 492
		pidOffset = 68
		ppidOffset = 72
		commOffset = 163
		startOffset = 120
	}
}
```

### 5. Test Coverage Quality

#### ✅ Archive Backend Tests (EXCELLENT)

11 test functions cover creation, restore, and attack vectors.

#### ✅ Tracer Regression Tests (EXCELLENT)

8 tests specifically target concurrency issues.
### 6. Code Quality & Best Practices

#### ✅ Error Messages (EXCELLENT - per CLAUDE.md)

```go
return fmt.Errorf("create netlink socket: %w (requires CAP_NET_ADMIN or root)", err)

return fmt.Errorf("invalid symlink in archive: target escapes destination: %s -> %s",
	header.Name, header.Linkname)

return fmt.Errorf("corrupted snapshot metadata at %s: %w\nTo reset, delete the file: rm %s",
	path, err, path)
```

#### ✅ Code Style (EXCELLENT)

#### ✅ Documentation (EXCELLENT)

### Summary

Ready for merge ✅

This PR demonstrates strong security engineering with no critical vulnerabilities found.

Optional improvements (non-blocking):

Great work on this implementation! The attention to security details, comprehensive testing, and careful concurrency handling make this a high-quality addition to the codebase.
## Code Review: PR #36 - Workspace Snapshots and Process Tracing

I've completed a comprehensive review of this PR, examining security-critical components, race condition fixes, error handling, and test coverage. Overall, this is high-quality, security-conscious code that delivers well on its promises.

### 🟢 Strengths

**Security (Excellent)**

**Race Condition Handling (Excellent)**

The tracer event emission code is textbook-perfect, with comprehensive regression tests.

**Error Messages (Excellent)**

Perfect adherence to CLAUDE.md guidance: "Good error messages are documentation." Example:

```go
return nil, fmt.Errorf("corrupted snapshot metadata at %s: %w\nTo reset, delete the file: rm %q", path, err, path)
```

Tells users exactly what's wrong AND how to fix it.

**Test Coverage (Excellent)**

**Go Best Practices (Excellent)**

### 🟡 Issues Found

**🔴 Medium Severity: Rollback Safety Snapshot Accumulation**

If creating the safety snapshot fails (e.g., disk full), rollback aborts. If the restore itself fails on a second attempt, another safety snapshot is created, potentially filling disk space with accumulating safety snapshots.

Recommendation: Check for existing safety snapshots and reuse them.

**🟡 Low Severity: CLI Prune Returns Success on Partial Failure**

The prune command continues after deletion failures (reasonable for bulk operations) but reports success. Fix:

```go
if failed > 0 {
	return fmt.Errorf("failed to delete %d of %d snapshots", failed, len(toDelete))
}
return nil
```

**🟡 Low Severity: Darwin kinfo_proc Structure Offsets**

Hard-coded structure offsets are fragile across macOS versions and architectures.

**🟢 Minor: APFS Path Validation Only Checks Base Name**

```go
if strings.HasPrefix(filepath.Base(workspacePath), "-") {
	return "", fmt.Errorf("invalid workspace path: name cannot start with -")
}
```

Only validates the final path component, not intermediate ones.

### 🔵 Enhancement Suggestions (Non-Blocking)

**1. Atomic Metadata Writes**

Use the write-to-temp-then-rename pattern to prevent metadata corruption if the process is killed mid-write:

```go
tmpPath := path + ".tmp"
if err := os.WriteFile(tmpPath, data, 0644); err != nil {
	return err
}
return os.Rename(tmpPath, path)
```

**2. Total Snapshot Size Limit**

The archive backend has file count (100k) and per-file size (1GB) limits, but no total snapshot size limit. Consider adding:

```go
const maxSnapshotSize = 10 << 30 // 10GB
```

**3. Tracer Metrics Exposure**

The tracer tracks dropped events internally; consider exposing the counter as a metric.

**4. Configurable Darwin Polling Interval**

100ms polling = 10 syscalls/sec. Consider making this configurable or using a longer interval (200-500ms) for low-priority tracing.

### 📊 Performance

✅ APFS copy-on-write cloning keeps snapshot creation fast and space-efficient.

### 📝 Documentation

✅ Excellent inline documentation for privilege requirements, security model, and platform differences.

### ✅ Recommendation: Approve with Minor Fixes

This PR maintains the high code quality standards of the moat codebase. The snapshot implementation is well-architected, security-conscious, and thoroughly tested. No critical bugs or security vulnerabilities found; the symlink path traversal fixes are solid, race conditions are properly handled, and test coverage is excellent. Great work! 🎉
## PR Review: Workspace Snapshots and Process Tracing

This is a substantial and well-implemented feature addition. Overall, the code quality is excellent with strong attention to security, error handling, and testing.

### Summary

This PR adds two major features: workspace snapshots and process tracing.

### Issues & Recommendations

**1. Security: Archive File Count Limit (Low)**

Location: internal/snapshot/archive.go:215

maxArchiveFiles is 100,000, which seems high; most projects have fewer files. Consider 25-50k for better zip bomb protection.

**2. Prune Command Error Handling (Medium)**

Location: cmd/moat/cli/snapshots.go:244

Returns an error after partial success. Document this behavior in the help text or make it a warning instead.

**3. APFS Detection Robustness (Low)**

Location: internal/snapshot/apfs_darwin.go:219-242

IsAPFS uses diskutil, which could fail in restricted environments. The current fallback is safe but could have better error handling.

**4. Documentation: APFS Evolution (Low)**

Add a comment explaining why cp -c is used instead of tmutil (directory-level vs volume-level snapshots).

**5. Cleanup Interval Configuration (Low)**

Location: internal/trace/tracer_linux.go:69

The 60s cleanup interval is hardcoded. Consider making it configurable for short-lived process tracking.

### Test Coverage

Excellent coverage with unit, integration, regression, security, and edge case tests.

### Final Verdict

Approve with minor suggestions. High-quality, production-ready code with strong security awareness, excellent error handling, clean architecture, and comprehensive testing. Issues are low-medium priority and don't block merging.

Great work! The attention to security and edge cases is commendable.
## Code Review: PR #36 - Workspace Snapshots and Process Tracing

### Overview

This PR adds comprehensive workspace snapshot functionality with APFS and archive backends, plus platform-specific process tracing. The implementation is substantial (9,364 lines added) with good test coverage. Overall, this is high-quality code with strong security practices, but there are several critical issues that should be addressed before merge.

### 🔴 Critical Issues

**1. Command Injection Vulnerability in APFS Backend**

The argument injection protection only checks the base name of the path.

Recommendation: Use the `--` end-of-options separator:

```go
cmd := exec.Command("cp", "-c", "-R", "-p", "--", workspacePath, clonePath)
```

Or validate the full resolved path:

```go
absPath, err := filepath.Abs(workspacePath)
if err != nil {
	return "", fmt.Errorf("invalid workspace path: %w", err)
}
if strings.Contains(absPath, "/-") {
	return "", fmt.Errorf("invalid workspace path: path components cannot start with -")
}
```

**2. Potential DoS via Compression Bomb**

While there's a 1GB per-file limit and a 100,000 file count limit, there's no check on overall decompressed size. An attacker could create a tar.gz with 100,000 files × 1GB each = 100TB extraction.

Recommendation: Add cumulative size tracking:

```go
const maxTotalSize = 10 << 30 // 10GB total

var totalWritten int64
// In extraction loop:
if totalWritten > maxTotalSize {
	return fmt.Errorf("archive exceeds maximum size limit (%d bytes)", maxTotalSize)
}
written, err := io.Copy(f, io.LimitReader(tr, 1<<30))
totalWritten += written
```

**3. Integer Overflow Risk in Darwin Tracer**

Recommendation: Add bounds checking:

```go
pidRaw := binary.LittleEndian.Uint32(buf[pidOffset:])
if pidRaw > math.MaxInt32 {
	return processInfo{} // Invalid PID
}
info.pid = int(int32(pidRaw))
```

### 🟡 High Priority Issues

**4. Snapshot ID Collision Risk**

4 bytes (32 bits) provides only ~4B unique IDs, giving 50% collision probability after ~65,000 snapshots (birthday paradox). The timestamp fallback is also non-deterministic.

Recommendation: Use 8 bytes for IDs:

```go
func NewID() string {
	b := make([]byte, 8) // 64 bits = better collision resistance
	if _, err := rand.Read(b); err != nil {
		return fmt.Sprintf("snap_%016x", time.Now().UnixNano())
	}
	return "snap_" + hex.EncodeToString(b)
}
```

**5. Missing Metadata Corruption Recovery**

If metadata is corrupted, the entire snapshot system fails, even though the actual snapshot data still exists on disk.

Recommendation: Add automatic recovery that rescans the snapshot directory and rebuilds metadata from discovered snapshots.

**6. Linux Tracer Memory Leak**

Fix:

```go
if _, err := os.Stat(procPath); err != nil {
	// Remove PID on any error - if we can't check, assume it's gone
	delete(t.trackedPIDs, pid)
}
```

**7. Config Defaults Applied When Snapshots Disabled**

Snapshot defaults are applied even when snapshots are disabled. Fix:

```go
if !cfg.Snapshots.Disabled {
	if cfg.Snapshots.Triggers.IdleThresholdSeconds == 0 {
		cfg.Snapshots.Triggers.IdleThresholdSeconds = 30
	}
	// ... other defaults
}
```

### 🔵 Code Quality Improvements

**8. Error Messages Leak Path Information**

Per CLAUDE.md, error messages should be actionable without exposing unnecessary internals:

```go
// Instead of: "corrupted snapshot metadata at /full/path: rm /full/path"
// Use: "corrupted snapshot metadata: moat snapshots prune <run-id>"
```

**9. Inconsistent CLI Error Handling**

CLI commands duplicate existence checks that the storage layer already performs. Delegate to storage:

```go
store, err := storage.NewRunStore(baseDir, runID)
if err != nil {
	return fmt.Errorf("run %s not found or inaccessible: %w", runID, err)
}
```

**10. Hardcoded Architecture Offsets**

Document which macOS versions the offsets were tested against:

```go
// kinfo_proc offsets for macOS 13+ (Ventura)
// Validated on: macOS 13.0, 14.0 (Sonoma), 15.0 (Sequoia)
```

### ✅ Positive Observations

### 📊 Test Coverage Gaps

Missing edge case tests for:

### ⚡ Performance Considerations

### 📝 Conventions (CLAUDE.md)

### Summary

Overall assessment: High-quality implementation with strong security practices and comprehensive testing. The critical issues (#1-3) represent security/reliability risks that should be addressed before merge. Medium-priority issues are mostly robustness and UX improvements.

Recommendation: Request changes for critical issues, then approve with optional follow-up items.
Add ExecEvent struct for capturing command execution events, including:

- Process info (PID, PPID, timestamp, working directory)
- Command execution details (command, args, exit code, duration)
- IsGitCommit() method to detect git commit operations
- IsBuildCommand() method to detect common build commands

Build detection supports npm, yarn, go, make, cargo, mvn, gradle with proper word boundary matching to avoid false positives (e.g., `npm run build-docker` is correctly not detected as a build command).
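The word-boundary matching this commit describes can be sketched with a whitespace split, which is what makes `npm run build-docker` a non-match. The tool and verb lists below are illustrative, not the PR's exact `IsBuildCommand` logic:

```go
package main

import (
	"fmt"
	"strings"
)

// isBuildCommand splits the command line into whitespace-separated
// words and looks for a known build tool followed by a build-like verb,
// so "npm run build" matches while "npm run build-docker" does not.
func isBuildCommand(cmdline string) bool {
	words := strings.Fields(cmdline)
	for i, w := range words {
		switch w {
		case "make", "cargo", "go", "npm", "yarn", "mvn", "gradle":
			// Look for a whole-word "build"-like verb after the tool name.
			for _, arg := range words[i+1:] {
				if arg == "build" || arg == "install" || arg == "compile" {
					return true
				}
			}
		}
	}
	return false
}

func main() {
	fmt.Println(isBuildCommand("npm run build"))        // true
	fmt.Println(isBuildCommand("npm run build-docker")) // false
	fmt.Println(isBuildCommand("go build ./..."))       // true
}
```

Exact word comparison is the point: a substring match on "build" would incorrectly flag `build-docker`, `rebuild`, and similar names.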
Implements ArchiveBackend that creates tar.gz snapshots with:

- Gitignore pattern support via go-git library
- Additional exclude patterns configuration
- .git directory preservation during restore
- Symlink and file permission preservation
- Path traversal protection on extract

Add Engine type that provides high-level snapshot management:

- Auto-detects APFS on macOS, falls back to archive backend
- Persists metadata to snapshots.json for durability across restarts
- Thread-safe Create/Restore/RestoreTo/Delete/List/Get operations
- Supports ForceBackend option for testing specific backends
- Passes through UseGitignore and Additional exclude patterns

- Reject absolute symlink targets (could point anywhere on filesystem)
- Validate relative symlinks don't escape destination directory
- Add comprehensive test cases for path traversal attack vectors
- Use filepath.Rel to properly detect escape attempts via ../

…handling

- Add stopped flag to all tracers to prevent double-close panics
- Fix emitEvent race condition by holding lock during channel send
- Add droppedEvents counter and logging for observability
- Fix Darwin grandchild tracking to properly track all descendants
- Improve Linux readLoop error handling with max consecutive errors limit
- Add comprehensive regression tests for concurrent operations

- Add cleanupStalePIDs() to Linux tracer to handle missed EXIT events
- Add started flag to StubTracer for consistency with other tracers
- Fix misleading test comment that referenced old behavior

- Add TestStubTracerDoubleStart to verify Start() returns error when called twice
- Add Linux-specific tests for cleanupStalePIDs function

- Fix errcheck: explicitly ignore os.Rename errors in recovery paths
- Fix gosec G115: add nolint for safe integer conversions
- Fix gosec G305/G110: add nolint with validation comments, limit copy size
- Fix govet shadow: rename shadowed err variables
- Fix prealloc: pre-allocate slices with known capacity
- Fix staticcheck: remove unused metas slice in test

- Fix file close error handling in archive.go (check close errors)
- Fix file close error handling in storage.go WriteExecEvent
- APFS stub now returns ErrAPFSNotAvailable instead of silently succeeding
- Add recovery guidance for corrupted snapshot metadata
- Add consecutive error tracking to Darwin tracer (like Linux tracer)

…fault

The APFS backend incorrectly uses tmutil (Time Machine), which creates volume-level snapshots, not directory-level snapshots. This means:

- Snapshots are of the entire disk, not the workspace
- Restore requires root privileges
- Restore affects the entire volume, not just the workspace

The correct approach for APFS workspace snapshots would be to use cp -c for copy-on-write directory cloning. Until that's implemented, default to the archive backend, which correctly handles directory-level snapshots. The APFS backend is still available via the ForceBackend option, but with a warning comment.

Replace tmutil-based Time Machine snapshots with cp -c copy-on-write directory cloning. This is the correct approach for workspace snapshots:

- tmutil creates volume-level snapshots (entire disk)
- cp -c creates directory-level clones (exactly what we need)

Benefits of cp -c on APFS:

- Instant cloning (metadata-only operation)
- Space-efficient (shared blocks until modified)
- No Time Machine entitlements required
- Works at directory granularity

Changes:

- APFSBackend.Create() now uses cp -c -R -p for COW cloning
- APFSBackend.Restore() copies files back using cp -c
- Engine auto-detects APFS and uses the APFS backend when available
- Updated tests for new cp -c based implementation
- NewAPFSBackend() now takes a snapshotDir parameter

Security improvements:

- Add path validation in APFS backend to prevent argument injection (paths starting with "-" could be misinterpreted as flags)
- Add file count limit (100k) in archive restore to prevent zip bombs

Documentation:

- Add privilege requirements documentation for the Linux proc connector tracer
- Quote path in metadata corruption error message for safe copy/paste

Addresses feedback from Claude code review.

CLI:

- Return error from prune command on partial deletion failure instead of silently succeeding

Darwin tracer:

- Add runtime validation of kinfo_proc structure offsets
- Validates by checking PID 1 (launchd) has expected properties
- Logs warning if validation fails (offsets may be wrong for the macOS version)

- Rename ALL_CAPS constants to camelCase (revive var-naming)
- Add nolint comments for syscall.Getpid() uint32 conversion (gosec G115)

Also installed pre-commit hook to catch these issues locally.

- APFS backend: add "--" separator to all cp commands to prevent command injection via filenames starting with "-"
- Archive backend: add total extracted size limit (10GB) to prevent compression bomb attacks that could exhaust disk space

Add tests verifying:

- File count limit (100k files) triggers extraction failure
- Archive limit constants exist with reasonable values

These tests ensure the compression bomb protections are in place and catch any accidental removal of security limits.
Force-pushed from da690d3 to 1615725.
## Code Review: PR #36 - Workspace Snapshots and Process Tracing

### Executive Summary

I've completed a comprehensive review of this PR. Overall, this is high-quality code with excellent security considerations, thorough testing, and solid engineering practices.

Recommendation: ✅ APPROVE with minor suggestions

The code is production-ready. All critical security issues have been addressed, comprehensive tests are in place, and the implementation follows Go best practices.

### Key Strengths 🎉

**1. Security - EXCELLENT ⭐**

**2. Concurrency - EXCELLENT ⭐**

The event emission pattern across all tracers is perfect - non-blocking channel sends, callbacks invoked outside locks, double-stop protection, and comprehensive regression tests.

**3. Platform-Specific Code - EXCELLENT ⭐**

**4. User Experience - EXCELLENT ⭐**

### Minor Suggestions 💡

**1. Timing Issue in Total Size Check (archive.go:282-285)**

The check happens AFTER potentially exceeding the limit; suggest checking before the copy.

**2. Benign Race in Cleanup Timing (tracer_linux.go:211-214)**

**3. Type Conversion Clarity (tracer_darwin.go:308-311)**

Add a comment explaining why the int32 intermediate conversion is necessary.

### Security Analysis

All critical attack vectors mitigated ✅

Security score: 9.5/10

### Final Recommendation

✅ APPROVE

This PR is production-ready as-is. The minor suggestions are truly optional - the code already meets high standards for security, correctness, and maintainability. Required before merge: NONE.

Excellent work on this PR! The attention to security details, thorough testing, and careful handling of platform-specific code demonstrates strong engineering expertise. 🎉
## Summary

- `moat rollback` command to restore the workspace to a previous snapshot

### Key Components

- Snapshots (`internal/snapshot/`)
- Process Tracing (`internal/trace/`)

### CLI Commands

- `moat snapshot` - Manual workspace snapshot
- `moat snapshots` - List/manage snapshots
- `moat rollback` - Restore to previous snapshot

### Configuration

- `snapshot` and `tracing` fields in `agent.yaml`

### Test Plan

- `go test -race ./...` passes

🤖 Generated with Claude Code