perf(policy): Incremental content scanning and batched stat() for monorepo-scale CI #377

@danielmeppiel

Description

Context

Follow-up from PR #365 review (EPAM Phase D recommendation). Addresses I/O-heavy checks at monorepo scale (200+ deps, 1,500+ deployed files).

Problem

Three CI checks are disk I/O bound at scale:

| Check | Current complexity | At scale (1,500 files) | Bottleneck |
|---|---|---|---|
| content-integrity | O(F × C) | 1,500 file reads + char scans | Disk I/O |
| deployed-files-present | O(F_total) | 1,500 `stat()` calls | Disk I/O |
| unmanaged-files | O(G) | 500 `stat()` + path ops | Disk I/O |

Proposed Optimizations

1. Incremental content scanning

Only scan deployed files whose mtime is newer than lockfile's generated_at timestamp. For PR-triggered CI, this typically limits the scan to 2-5 changed files instead of 1,500.

Estimated latency reduction: ~90% for typical CI runs.
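A minimal sketch of the selection step, assuming the lockfile's `generated_at` is an ISO-8601 string (function and parameter names here are illustrative, not the project's actual API). Files whose mtime predates the baseline are skipped; a missing timestamp forces a full scan, matching the acceptance criteria below.

```python
import os
from datetime import datetime

def files_to_scan(deployed_files, generated_at):
    """Select only files modified after the lockfile baseline.

    deployed_files: iterable of file paths (hypothetical input shape).
    generated_at: ISO-8601 timestamp from the lockfile, or None when
    the field is absent -- in which case we fall back to a full scan.
    """
    if generated_at is None:
        return list(deployed_files)  # no baseline: scan everything
    # Accept a trailing "Z" on older Pythons where fromisoformat() rejects it.
    baseline = datetime.fromisoformat(
        generated_at.replace("Z", "+00:00")
    ).timestamp()
    changed = []
    for path in deployed_files:
        try:
            if os.stat(path).st_mtime > baseline:
                changed.append(path)
        except FileNotFoundError:
            # Missing file: include it so the presence check can report it.
            changed.append(path)
    return changed
```

Note this still pays one `stat()` per file to read the mtime; the win is skipping the file *reads* and character scans, which dominate the content-integrity check.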

2. Batched stat() calls in _check_deployed_files_present

Walk unique parent directories once via os.scandir() to build a set of existing files. Replace 1,500 individual stat() calls with ~20 scandir() calls + O(1) set lookups.

Estimated improvement: ~10x on network-mounted CI filesystems.
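The batching idea can be sketched as follows (a hypothetical standalone helper, not the actual `_check_deployed_files_present` implementation): group paths by parent directory, list each directory once with `os.scandir()`, then answer every presence query with an O(1) set lookup.

```python
import os

def existing_files(paths):
    """Batched presence check: one scandir() per unique parent
    directory instead of one stat() call per file.
    """
    # Group expected filenames by their parent directory.
    by_dir = {}
    for p in paths:
        by_dir.setdefault(os.path.dirname(p) or ".", set()).add(
            os.path.basename(p)
        )
    present = set()
    for directory, names in by_dir.items():
        try:
            with os.scandir(directory) as entries:
                on_disk = {entry.name for entry in entries}
        except FileNotFoundError:
            continue  # whole directory missing: all its files are absent
        for name in names & on_disk:
            present.add(os.path.join(directory, name))
    return present

# Missing files are then simply: set(paths) - existing_files(paths)
```

For 1,500 files spread across ~20 directories this replaces 1,500 syscalls with ~20 directory listings, which is where the estimated ~10x gain on high-latency network filesystems comes from.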

Acceptance Criteria

  • Incremental scanning uses lockfile generated_at as baseline
  • os.scandir() batch approach for file presence checks
  • Benchmarks comparing before/after at 1K+ file scale
  • Falls back to full scan when lockfile has no generated_at
