
fix(scan): enforce path and size boundaries under partial failures #91

@danielewood

Summary

scan currently has multiple boundary/resilience gaps that can cause unexpected file access, skipped data, or memory pressure.

Underlying problems covered by this issue:

  1. Symlinks are followed to targets outside the requested scan tree.
  2. Archive ingestion can bypass --max-file-size through a symlink/target mismatch, after which os.ReadFile reads the large target in full.
  3. A single WalkDir callback error returns filepath.SkipDir, which can prune an entire subtree.
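
To illustrate item 3: per the filepath.WalkDir documentation, returning filepath.SkipDir for a non-directory entry skips the remaining entries in that entry's parent directory. The sketch below is not the certkit code; walkFiles, demoWalk, and the simulated failure are hypothetical names used only to contrast the pruning return with a per-entry skip.

```go
package main

import (
	"errors"
	"io/fs"
	"os"
	"path/filepath"
)

// walkFiles visits every non-directory entry under root and returns the base
// names that process accepted. With pruneOnError set, a failing entry returns
// filepath.SkipDir; otherwise only that entry is skipped.
func walkFiles(root string, process func(string) error, pruneOnError bool) ([]string, error) {
	var accepted []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			if d == nil {
				return walkErr // the root itself could not be inspected
			}
			return nil // unreadable directory: keep walking elsewhere
		}
		if d.IsDir() {
			return nil
		}
		if err := process(path); err != nil {
			if pruneOnError {
				// For a file, SkipDir skips the rest of the parent directory.
				return filepath.SkipDir
			}
			return nil // per-entry skip: siblings are still visited
		}
		accepted = append(accepted, d.Name())
		return nil
	})
	return accepted, err
}

// demoWalk builds a temp directory with three files and fails on the first.
func demoWalk() ([]string, []string, error) {
	root, err := os.MkdirTemp("", "walkdemo")
	if err != nil {
		return nil, nil, err
	}
	defer os.RemoveAll(root)
	for _, name := range []string{"a.pem", "b.pem", "c.pem"} {
		if err := os.WriteFile(filepath.Join(root, name), []byte("x"), 0o600); err != nil {
			return nil, nil, err
		}
	}
	failOnA := func(path string) error {
		if filepath.Base(path) == "a.pem" {
			return errors.New("simulated per-entry failure")
		}
		return nil
	}
	pruned, err := walkFiles(root, failOnA, true)
	if err != nil {
		return nil, nil, err
	}
	kept, err := walkFiles(root, failOnA, false)
	return pruned, kept, err
}
```

In this sketch the pruning variant accepts nothing, because the failure on a.pem stops the walk of its directory, while the per-entry variant still accepts b.pem and c.pem.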

Why this matters

Users expect scan <path> to stay inside scope and degrade gracefully. Current behavior can miss certificates, process unintended files, or consume excessive memory.

Evidence

  • Symlink follow via os.Stat and processing resolved path: cmd/certkit/scan.go:121, cmd/certkit/scan.go:123, cmd/certkit/scan.go:162
  • Archive read uses os.ReadFile after a size gate whose path handling can be symlink-sensitive: cmd/certkit/scan.go:122, cmd/certkit/scan.go:133, cmd/certkit/scan.go:141
  • Directory walk error handling prunes subtree: cmd/certkit/scan.go:109, cmd/certkit/scan.go:112

Acceptance criteria

  • Scans are constrained to the requested root by default (or require explicit opt-in for following external symlink targets).
  • Size checks are enforced against the actual file content source for archive ingestion.
  • Walk errors skip only the affected entry where possible, and continue scanning siblings/subtrees safely.
  • Tests cover: external symlink target, symlink-to-large-archive, and partial walk permission error behavior.

Suggested approach

  • Canonicalize and validate resolved paths against scan root.
  • Use open/stat strategy that validates target size before full read for archives.
  • Replace subtree-pruning error behavior with per-entry skip + structured debug logging.
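
The first bullet could be sketched roughly as follows. resolveWithin and demoContainment are hypothetical names, not existing certkit functions, and the demo assumes a platform where os.Symlink is permitted.

```go
package main

import (
	"os"
	"path/filepath"
	"strings"
)

// resolveWithin resolves symlinks in both root and path and reports whether
// the resolved path is root itself or lies beneath it.
func resolveWithin(root, path string) (string, bool, error) {
	realRoot, err := filepath.EvalSymlinks(root)
	if err != nil {
		return "", false, err
	}
	if realRoot, err = filepath.Abs(realRoot); err != nil {
		return "", false, err
	}
	realPath, err := filepath.EvalSymlinks(path)
	if err != nil {
		return "", false, err
	}
	if realPath, err = filepath.Abs(realPath); err != nil {
		return "", false, err
	}
	rel, err := filepath.Rel(realRoot, realPath)
	if err != nil {
		return "", false, err
	}
	// A relative path that is ".." or starts with "../" escapes the root.
	escapes := rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
	return realPath, !escapes, nil
}

// demoContainment checks a file inside the scan root and a symlink in the
// root that points outside it.
func demoContainment() (bool, bool, error) {
	base, err := os.MkdirTemp("", "rootdemo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(base)
	root := filepath.Join(base, "scan")
	outside := filepath.Join(base, "outside")
	for _, dir := range []string{root, outside} {
		if err := os.Mkdir(dir, 0o700); err != nil {
			return false, false, err
		}
	}
	inner := filepath.Join(root, "inner.pem")
	secret := filepath.Join(outside, "secret.pem")
	for _, file := range []string{inner, secret} {
		if err := os.WriteFile(file, []byte("x"), 0o600); err != nil {
			return false, false, err
		}
	}
	link := filepath.Join(root, "link.pem")
	if err := os.Symlink(secret, link); err != nil {
		return false, false, err
	}
	_, innerOK, err := resolveWithin(root, inner)
	if err != nil {
		return false, false, err
	}
	_, linkOK, err := resolveWithin(root, link)
	return innerOK, linkOK, err
}
```

Resolving the root as well as the candidate path matters when the root itself sits behind a symlink, since a comparison against the unresolved root would otherwise reject files that are legitimately inside it.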

Dedupe notes

Checked existing issues before creating:

  • gh issue list --state open --limit 200 --json number,title,url,labels -> no open issues
  • gh search issues "is:open repo:sensiblebit/certkit" --limit 100 --json number,title,url,state -> no matches
    Classified as: new.

Metadata

Labels

  • bug (Something isn't working)
  • go (Pull requests that update go code)
