
@leodido (Contributor) commented Nov 17, 2025

Summary

Optimize remote cache downloads by sorting packages by dependency depth, ensuring critical path packages are downloaded first. This reduces overall build time by allowing dependent builds to start earlier.

Fixes https://linear.app/ona-team/issue/CLC-2093/implement-dependency-aware-download-scheduling-for-s3-cache
Part of https://linear.app/ona-team/issue/CLC-2086/optimize-leeway-s3-cache-performance

Performance Impact

Measured improvement: 7.2% faster for a production build with 11 packages.

  • Baseline: 4.306 seconds
  • Optimized: 3.997 seconds
  • Saved: 0.309 seconds (7.2%)

Projected improvements for larger builds (estimates; only the 11-package build above was timed):

  • 20-50 packages: 10-15% faster
  • 50-100 packages: 15-20% faster
  • 100+ packages: 20-25% faster

How It Works

Algorithm

  1. Calculate dependency depth for each package: the length of its longest dependency chain down to a leaf package (one with no dependencies, which has depth 0)
  2. Sort packages by depth in descending order (deepest first)
  3. Download in sorted order using the existing worker pool (30 workers)

Example Download Order

Depth 3 (Critical Path - Downloaded First):
  └─ component-a:app

Depth 2 (High Priority):
  └─ component-b:dist

Depth 1 (Medium Priority):
  └─ component-c:lib
  └─ component-d:lib
  └─ component-e:app
  └─ component-f:lib

Depth 0 (Leaf Packages - Downloaded Last):
  └─ component-g:lib
  └─ component-h:lib
  └─ component-i:lib
  └─ component-j:lib
  └─ component-k:lib

Why This Helps

  • Critical path packages download first
  • Dependent builds can start as soon as their dependencies complete
  • Parallel workers (30) pick up packages in dependency-depth order
  • Reduces wall-clock time by shortening the time builds spend waiting on their dependencies

Implementation Details

Core Functions

  • sortPackagesByDependencyDepth(): Main sorting function using sort.SliceStable
  • calculateDependencyDepth(): Recursive depth calculation with memoization
  • Integrated into build.go before RemoteCache.Download() call

Design Decisions

  • No interface changes: Sorting happens at caller level (build.go)
  • Stable sort: Uses sort.SliceStable, which makes O(n log n) comparisons and keeps equal-depth packages in their input order, so the result is deterministic
  • Memoization: Caches depth calculations to avoid redundant work
  • Minimal overhead: <1ms even for 200 packages
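To illustrate why memoization matters when packages share dependencies, here is a hypothetical sketch (the `node` type, `depth`, and `diamondLadder` are illustrative, not from leeway) that counts recursive calls on a stack of diamond-shaped dependencies. Without a cache, shared sub-graphs are re-walked and the call count grows exponentially with the number of diamonds; with a cache it grows linearly.

```go
package main

import "fmt"

// node is a hypothetical package with direct dependencies.
type node struct {
	name string
	deps []*node
}

// depth returns the longest dependency chain below n. With a nil cache it
// re-walks shared dependencies every time; calls counts invocations.
func depth(n *node, cache map[string]int, calls *int) int {
	*calls++
	if cache != nil {
		if d, ok := cache[n.name]; ok {
			return d
		}
	}
	d := 0
	for _, dep := range n.deps {
		if x := depth(dep, cache, calls) + 1; x > d {
			d = x
		}
	}
	if cache != nil {
		cache[n.name] = d
	}
	return d
}

// diamondLadder stacks k diamonds: at each level two packages depend on the
// package below, and a new top package depends on both of them.
func diamondLadder(k int) *node {
	bottom := &node{name: "leaf"}
	for i := 0; i < k; i++ {
		left := &node{name: fmt.Sprintf("l%d", i), deps: []*node{bottom}}
		right := &node{name: fmt.Sprintf("r%d", i), deps: []*node{bottom}}
		bottom = &node{name: fmt.Sprintf("top%d", i), deps: []*node{left, right}}
	}
	return bottom
}

func main() {
	root := diamondLadder(10)
	naive, memo := 0, 0
	d1 := depth(root, nil, &naive)
	d2 := depth(root, map[string]int{}, &memo)
	if d1 != d2 {
		panic("memoization changed the result")
	}
	fmt.Printf("depth=%d naive calls=%d memoized calls=%d\n", d1, naive, memo)
	// depth=20 naive calls=4093 memoized calls=41
}
```

For k diamonds the uncached walk makes 2^(k+2) − 3 calls, while the cached one makes 1 + 4k (one call per dependency edge plus the root).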

Complexity

  • Time: O(N log N) comparisons for sorting + O(N×M) for depth calculation, where M = average number of direct dependencies per package
  • Space: O(N) for depth cache
  • Overhead: Negligible (<500µs for 200 packages)
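One way to see where the O(N×M) term comes from, assuming an acyclic graph with E direct-dependency edges: memoization means each package's dependency list is scanned once, so the depth pass is linear in N + E, and E ≈ N·M.

```latex
T(N) = \underbrace{O(N + E)}_{\text{memoized depth pass}}
     + \underbrace{O(N \log N)}_{\text{sort comparisons}},
\qquad E \approx N \cdot M
\;\Rightarrow\; T(N) = O\big(N\,(M + \log N)\big)
```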

Testing

Unit Tests

Comprehensive tests for various dependency structures:

  • ✅ Empty list, single package
  • ✅ Linear dependency chains
  • ✅ Diamond dependencies
  • ✅ Multiple independent trees
  • ✅ Depth calculation validation
  • ✅ Stability tests (verifies relative order preserved for equal depths)
  • ✅ Performance tests (100-package chains)
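The stability check described in the commits (several input orderings, relative order compared within each depth group) could look roughly like the sketch below. It assumes depths are already computed; `pkg`, `sortByDepth`, and `stableWithinGroups` are hypothetical helpers, not the contents of `build_sort_test.go`.

```go
package main

import (
	"fmt"
	"sort"
)

// pkg is a hypothetical package whose dependency depth is already known.
type pkg struct {
	name  string
	depth int
}

// sortByDepth returns a copy ordered deepest first; sort.SliceStable keeps
// packages of equal depth in their input order.
func sortByDepth(in []pkg) []pkg {
	out := append([]pkg(nil), in...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].depth > out[j].depth })
	return out
}

// stableWithinGroups reports whether every depth group appears in out in the
// same relative order as in in.
func stableWithinGroups(in, out []pkg) bool {
	group := func(ps []pkg) map[int][]string {
		g := map[int][]string{}
		for _, p := range ps {
			g[p.depth] = append(g[p.depth], p.name)
		}
		return g
	}
	gin, gout := group(in), group(out)
	for d, names := range gin {
		if fmt.Sprint(names) != fmt.Sprint(gout[d]) {
			return false
		}
	}
	return true
}

func main() {
	// The same packages in different input orders: an input that is already
	// sorted cannot reveal an unstable sort, so several orderings are checked.
	orderings := [][]pkg{
		{{"c", 1}, {"g", 0}, {"a", 3}, {"d", 1}, {"h", 0}},
		{{"h", 0}, {"d", 1}, {"g", 0}, {"a", 3}, {"c", 1}},
	}
	for _, in := range orderings {
		out := sortByDepth(in)
		fmt.Println(out, stableWithinGroups(in, out))
	}
}
```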

Benchmarks

BenchmarkSortPackagesByDependencyDepth/10-packages    ~2µs
BenchmarkSortPackagesByDependencyDepth/50-packages    ~30µs
BenchmarkSortPackagesByDependencyDepth/100-packages   ~116µs
BenchmarkSortPackagesByDependencyDepth/200-packages   ~439µs

Production Verification

Tested in production environment with:

  • ✅ 11 packages: 7.2% improvement
  • ✅ 21 packages: Sorting verified, correct order
  • ✅ S3 remote cache: Downloads successful
  • ✅ No regressions: All builds succeeded

Backward Compatibility

Fully backward compatible:

  • No API changes
  • No interface changes
  • No configuration changes
  • Works with existing remote cache setup

Files Changed

  • pkg/leeway/build.go: Sorting logic + integration
  • pkg/leeway/build_sort_test.go: Tests + benchmarks

Related

This optimization complements PR #278 (S3 cache batch operations), which improved cache checks and downloads. Together, these optimizations reduce build times for projects using remote cache.

@leodido changed the title from "feat(build): implement dependency-aware download scheduling" to "feat: implement dependency-aware download scheduling" Nov 17, 2025
@leodido requested a review from kylos101 November 17, 2025 23:34
@leodido self-assigned this Nov 17, 2025
@corneliusludmann (Contributor)

Review Round 2

All previous feedback has been addressed:

Feedback status:

  • Use sort.SliceStable instead of bubble sort: ✅ Fixed
  • Stability concern with original algorithm: ✅ Fixed (SliceStable guarantees stability)
  • Log levels should be Debug, not Info: ✅ Fixed (all 4 log statements)

Minor note: The PR description still mentions "Simple bubble sort: Good enough for typical package counts (<200)" under Design Decisions, but the implementation now correctly uses sort.SliceStable. Consider updating the description for consistency, though this doesn't block the PR.

The implementation looks good - clean use of the standard library, proper memoization, and comprehensive tests including stability verification.

LGTM 👍

leodido and others added 5 commits December 3, 2025 16:31
Optimize remote cache downloads by sorting packages by dependency depth,
ensuring critical path packages are downloaded first. This reduces overall
build time by allowing dependent builds to start earlier.

Algorithm:
- Calculate dependency depth for each package (max distance from leaf nodes)
- Sort packages by depth in descending order (deepest first)
- Download in sorted order using existing worker pool (30 workers)

Performance Impact:
- Tested with 21 packages in production (gitpod-next repository)
- Packages correctly sorted: depth 3 → 2 → 1 → 0
- Expected improvement: 15-25% faster builds (when cache hit rate is high)
- Negligible overhead: <1ms for 200 packages

Implementation:
- sortPackagesByDependencyDepth(): Main sorting function
- calculateDependencyDepth(): Recursive depth calculation with memoization
- Integrated into build.go before RemoteCache.Download() call
- No interface changes required (sorting at caller level)

Testing:
- Comprehensive unit tests for various dependency structures
- Performance benchmarks showing <500µs for 200 packages
- Verified in production with real remote cache downloads

Co-authored-by: Ona <no-reply@ona.com>
Co-authored-by: Cornelius A. Ludmann <cornelius@gitpod.io>

…algorithm from Go stdlib

Co-authored-by: Cornelius A. Ludmann <cornelius@gitpod.io>
Co-authored-by: Ona <no-reply@ona.com>

The previous test passed by coincidence because input was already in
expected order. New test verifies stability by using multiple input
orderings and checking that relative order within each depth group
is preserved.

Also adds missing 'sort' import required by sort.SliceStable.

Co-authored-by: Ona <no-reply@ona.com>
@leodido force-pushed the dependency-aware-download-scheduling branch from 426fb0f to 03cf377 December 3, 2025 16:32
@leodido merged commit be3e577 into main Dec 3, 2025
7 checks passed