🐛 Fix OOM: shard coverage into 4 parallel jobs #4181
Single-process coverage OOMs at 5.5GB on 7GB runners even with maxWorkers=1. This PR shards the tests into 4 parallel jobs (~2000 tests each, ~4GB heap), merges coverage with nyc, then updates the badge.

Signed-off-by: Andrew Anderson <andy@clubanderson.com>
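As a rough illustration of the sharding described above, a matrix job along the following lines would run one quarter of the suite per runner. This is a minimal sketch, not the workflow from this PR: the job name, the checkout/setup steps, `fail-fast: false`, and the exact Vitest flags are assumptions. Only the shard count, Node version, ~4GB heap figure, artifact name, and artifact path are taken from the conversation.

```yaml
# Sketch only: job name, step layout, and Vitest flags are assumed, not copied from the PR.
jobs:
  coverage-shard:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false            # assumption: let the other shards finish if one fails
      matrix:
        shard: [1, 2, 3, 4]
    defaults:
      run:
        working-directory: web
    env:
      # Caps the V8 heap at about 4GB per shard, matching the figure in the description
      NODE_OPTIONS: --max-old-space-size=4096
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '22'
      - run: npm ci
      - name: Run one shard with coverage
        # --shard=<index>/<count> selects a slice of the test files;
        # the json coverage reporter writes coverage/coverage-final.json
        run: npx vitest run --coverage --coverage.reporter=json --shard=${{ matrix.shard }}/4
      - uses: actions/upload-artifact@v4
        with:
          name: coverage-shard-${{ matrix.shard }}
          path: web/coverage/coverage-final.json
```

Each runner only loads its own slice of the test files, which is what keeps per-job memory below the runner limit.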
[APPROVALNOTIFIER] This PR is NOT APPROVED
✅ Deploy Preview for kubestellarconsole canceled.
👋 Hey @clubanderson — thanks for opening this PR!
This is an automated message.
Thank you for your contribution! Your PR has been merged.
Pull request overview
This PR updates the repository’s scheduled coverage workflow to avoid GitHub Actions runner OOMs by splitting the Vitest coverage run into multiple parallel shards and then merging the resulting coverage outputs into a single report used to update the coverage badge gist.
Changes:
- Shard the Vitest coverage run into 4 parallel jobs (matrix) with lower per-job memory limits.
- Upload per-shard coverage artifacts and add a follow-up job to download and merge them into a single coverage summary via nyc.
- Update artifact naming/output to reflect merged coverage reporting.
    # Use nyc to merge and report
    npx nyc report \
      --temp-dir .nyc_output \
      --report-dir coverage \
      --reporter=json-summary \
      --reporter=text
npx nyc report relies on nyc being available, but web/package.json doesn’t include nyc as a dependency. In GitHub Actions this can cause npx to fetch an arbitrary latest nyc from the registry (or fail if network is flaky), making coverage merges non-reproducible. Add nyc to web devDependencies (pinned) and run the local binary so merges are deterministic.
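One way to act on this comment is sketched below. It assumes nyc has been added to `web/package.json` devDependencies at an exact version (the comment does not say which), so that `npm ci` installs it from the lockfile. The step names and the `working-directory` are hypothetical. The nyc flags are the ones already shown in the reviewed snippet.

```yaml
# Sketch only: assumes nyc is pinned in web/package.json devDependencies.
- name: Install pinned dependencies
  working-directory: web          # assumption: the merge runs from the web directory
  run: npm ci
- name: Merge shard coverage with the local nyc binary
  working-directory: web
  run: |
    # Calling the binary from node_modules avoids any registry fetch by npx
    ./node_modules/.bin/nyc report \
      --temp-dir .nyc_output \
      --report-dir coverage \
      --reporter=json-summary \
      --reporter=text
```

Invoking the binary by path makes the step fail immediately if nyc is not installed, rather than silently downloading whatever version is current.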
    NODE_VERSION: '22'
    GIST_ID: 'b9a9ae8469f1897a22d5a40629bc1e82'
    TOTAL_SHARDS: 4
TOTAL_SHARDS is defined in env, but the shard matrix is hard-coded to [1, 2, 3, 4]. This duplication can drift (e.g., updating TOTAL_SHARDS but forgetting to update the matrix), breaking sharding/merge assumptions. Consider defining the shard count once (e.g., YAML anchor or generating the matrix from a single JSON string) so these stay in sync.
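The "single JSON string" option mentioned in this comment could look roughly like the sketch below. The `env` context is not available inside `strategy.matrix`, so a small planning job turns `TOTAL_SHARDS` into a JSON array and exposes it as a job output. The job names `plan` and `coverage-shard` are hypothetical, and `jq` is assumed to be present on the runner image.

```yaml
# Sketch only: job names are hypothetical; TOTAL_SHARDS is the env value from this PR.
env:
  TOTAL_SHARDS: 4

jobs:
  plan:
    runs-on: ubuntu-latest
    outputs:
      shards: ${{ steps.gen.outputs.shards }}
    steps:
      - id: gen
        # Builds a compact JSON array of shard indices from 1 up to TOTAL_SHARDS
        run: |
          shards=$(jq -cn --argjson n "$TOTAL_SHARDS" '[range(1; $n + 1)]')
          echo "shards=$shards" >> "$GITHUB_OUTPUT"

  coverage-shard:
    needs: plan
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # The matrix is derived from the plan job, so the count lives in one place
        shard: ${{ fromJSON(needs.plan.outputs.shards) }}
    steps:
      - run: echo "Running shard ${{ matrix.shard }}/${{ env.TOTAL_SHARDS }}"
```

With this layout, changing the shard count means editing only `TOTAL_SHARDS`, at the cost of one extra short job per run.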
    uses: actions/upload-artifact@v4
    with:
      name: coverage-shard-${{ matrix.shard }}
      path: web/coverage/coverage-final.json
The merge job will proceed and update the badge even if one or more shard coverage artifacts are missing (e.g., shard timeout/OOM leading to no coverage-final.json), because it merges whatever files it finds. If the badge should reflect full-suite coverage, consider setting if-no-files-found: error on the shard artifact upload and/or failing/skipping the merge when the number of found shard files doesn’t equal TOTAL_SHARDS.
    -  path: web/coverage/coverage-final.json
    +  path: web/coverage/coverage-final.json
    +  if-no-files-found: error
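The second half of the comment above, failing the merge when the number of shard files does not match `TOTAL_SHARDS`, could be sketched as a guard step in the merge job. This assumes the downloaded shard reports have been copied into `.nyc_output` as one `.json` file per shard (the directory name comes from the `--temp-dir` flag in the reviewed snippet; the one-file-per-shard layout is an assumption) and that `TOTAL_SHARDS` is set in the workflow-level `env`.

```yaml
# Sketch only: assumes one .json coverage file per shard under .nyc_output.
- name: Verify all shard reports are present
  run: |
    found=$(find .nyc_output -type f -name '*.json' | wc -l)
    if [ "$found" -ne "$TOTAL_SHARDS" ]; then
      # ::error:: surfaces the message as an annotation on the workflow run
      echo "::error::Expected $TOTAL_SHARDS shard reports, found $found"
      exit 1
    fi
```

Placed before the nyc step, this stops the badge from being updated with coverage computed from a partial set of shards.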
🔄 Auto-Applying Copilot Code Review
Copilot code review found 1 code suggestion(s) and 2 general comment(s). @copilot Please apply all of the following code review suggestions:
Also address these general comments:
Push all fixes in a single commit. Run
Auto-generated by copilot-review-apply workflow.
Single-process OOMs at 5.5GB. Shards into 4 parallel runners (~2K tests each), merges with nyc, updates badge.