Closed as not planned
Context
Follow-up from PR #365 review (EPAM Phase D recommendation). Addresses I/O-heavy checks at monorepo scale (200+ deps, 1,500+ deployed files).
Problem
Three CI checks are disk I/O bound at scale:
| Check | Current complexity | At scale (1,500 files) | Bottleneck |
|---|---|---|---|
| content-integrity | O(F × C) | 1,500 file reads + char scans | Disk I/O |
| deployed-files-present | O(F_total) | 1,500 stat() calls | Disk I/O |
| unmanaged-files | O(G) | 500 stat + path ops | Disk I/O |
Proposed Optimizations
1. Incremental content scanning
Scan only those deployed files whose mtime is newer than the lockfile's generated_at timestamp. For PR-triggered CI, this typically limits the scan to 2-5 changed files instead of 1,500.
Estimated latency reduction: ~90% for typical CI runs.
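A minimal sketch of the mtime filter, to make the proposal concrete. The function name `files_to_scan` is hypothetical, and the sketch assumes `generated_at` is stored as an ISO 8601 string (treated as UTC when it carries no offset); the real lockfile format may differ. It also covers the fallback to a full scan when `generated_at` is absent.

```python
from __future__ import annotations

import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, Optional


def files_to_scan(deployed: Iterable[Path], generated_at: Optional[str]) -> list[Path]:
    """Return the deployed files modified after the lockfile baseline.

    Hypothetical helper: assumes generated_at is an ISO 8601 string.
    """
    deployed = list(deployed)
    if not generated_at:
        # No baseline recorded in the lockfile: fall back to a full scan.
        return deployed

    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    baseline = datetime.fromisoformat(generated_at.replace("Z", "+00:00"))
    if baseline.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        baseline = baseline.replace(tzinfo=timezone.utc)
    baseline_ts = baseline.timestamp()

    changed = []
    for path in deployed:
        try:
            if os.stat(path).st_mtime > baseline_ts:
                changed.append(path)
        except FileNotFoundError:
            # Missing files are left to the file presence check.
            continue
    return changed
```

This still issues one stat() per deployed file, so the saving comes from skipping the file reads and character scans, not from fewer metadata calls.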
2. Batched stat() calls in _check_deployed_files_present
Walk unique parent directories once via os.scandir() to build a set of existing files. Replace 1,500 individual stat() calls with ~20 scandir() calls + O(1) set lookups.
Estimated improvement: ~10x on network-mounted CI filesystems.
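The batching idea could look roughly like the sketch below. The function name `find_missing_files` is hypothetical and is not the body of `_check_deployed_files_present`; it assumes that "present" means a regular file (or a symlink to one) with an exactly matching name.

```python
from __future__ import annotations

import os
from collections import defaultdict
from pathlib import Path
from typing import Iterable


def find_missing_files(expected: Iterable[Path]) -> list[Path]:
    """Return expected files that are absent, using one scandir() per directory."""
    # Group expected paths by parent so each directory is listed only once.
    by_parent: dict[Path, list[Path]] = defaultdict(list)
    for path in expected:
        by_parent[path.parent].append(path)

    missing: list[Path] = []
    for parent, paths in by_parent.items():
        try:
            with os.scandir(parent) as entries:
                # is_file() can usually be answered from the directory entry
                # itself, without an extra stat() call per file.
                present = {entry.name for entry in entries if entry.is_file()}
        except (FileNotFoundError, NotADirectoryError):
            # The parent directory itself is missing: every child is missing.
            present = set()
        # O(1) set lookup per expected file.
        missing.extend(p for p in paths if p.name not in present)
    return missing
```

Name comparison here is exact, so on case-insensitive filesystems the result can differ from what a per-file stat() would report.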
Acceptance Criteria
- Incremental scanning uses lockfile generated_at as baseline
- os.scandir() batch approach for file presence checks
- Benchmarks comparing before/after at 1K+ file scale
- Falls back to full scan when lockfile has no generated_at