Commit bc590bd

ci: use artifacts for e2e prep so job retries don't fail on cache eviction (#16310)
Switches the `e2e-prep` handoff from `actions/cache` to `actions/upload-artifact` + `actions/download-artifact`, so that re-running an individual failed E2E shard works without having to re-run the entire workflow (or push an empty commit to re-trigger CI, which is what we've been doing).

This is also the pattern [recommended by GitHub](https://docs.github.com/en/actions/tutorials/store-and-share-data): cache is for dependencies reused across runs, artifacts are for passing data between jobs within a run. `e2e-prep` falls squarely into the second category.

### Why

The Actions cache for this repo is permanently at its 10 GB limit, so the `e2e-prep-<sha>` entry gets evicted by LRU within a few hours of being saved. When that happens, any retry of a failed shard fails at the "Restore prepared test environment" step with `Failed to restore cache entry`, and the only recovery path is re-running the full workflow. Artifacts are scoped to the workflow run and not subject to the 10 GB cache budget, so they survive across attempts.

Real example that motivated this: https://github.com/payloadcms/payload/actions/runs/24524398342/job/71809444756

### Notes for reviewers

- `include-hidden-files: true` is required because `test/node_modules` contains `.bin`, `.pnpm`, etc. Missing that flag silently produces a broken artifact.
- Added an explicit "Verify prepared test environment" step with an actionable error, since `download-artifact` has no `fail-on-cache-miss` equivalent.
- Left the default 90 day retention. This repo is public, so artifact storage is free and there's no reason to be aggressive.
- Upload/download is slightly slower than cache (zip vs zstd), but the difference is in the order of tens of seconds on a job that takes minutes. Worth it for the reliability win.
- `e2e-prep-<sha>` cache key is left behind on old runs; it will naturally age out on its own, no cleanup needed.

This does not touch the separate `restore-build` cache used by `.github/actions/setup`. That one still has its own polling + full-build fallback, which is orthogonal to this change.

Co-authored-by: German Jablonski <GermanJablo@users.noreply.github.com>
1 parent 1c1ed97 commit bc590bd

1 file changed

Lines changed: 32 additions & 19 deletions

.github/workflows/main.yml

@@ -241,15 +241,17 @@ jobs:
       - name: Prepare prod test environment
         run: pnpm prepare-run-test-against-prod:ci

-      - name: Cache prepared test environment
-        uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+      # Tar first: upload-artifact is extremely slow with pnpm node_modules (many small files + symlinks).
+      - name: Archive prepared test environment
+        run: tar --zstd -cf e2e-prep.tar.zst test/packed test/node_modules test/package.json test/pnpm-lock.yaml
+
+      # Artifact (not cache) so that re-runs of individual E2E shards survive the 10 GB cache LRU eviction.
+      - name: Upload prepared test environment
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
         with:
-          path: |
-            test/packed
-            test/node_modules
-            test/package.json
-            test/pnpm-lock.yaml
-          key: e2e-prep-${{ github.sha }}
+          name: e2e-prep-${{ github.sha }}
+          path: e2e-prep.tar.zst
+          if-no-files-found: error

   tests-e2e:
     runs-on: ubuntu-24.04
@@ -270,16 +272,27 @@ jobs:
         with:
           restore-build: true

-      - name: Restore prepared test environment
-        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
+      - name: Download prepared test environment
+        uses: actions/download-artifact@37930b1c2abaa49bbe596cd826c3c89aef350131 # v7.0.0
         with:
-          path: |
-            test/packed
-            test/node_modules
-            test/package.json
-            test/pnpm-lock.yaml
-          key: e2e-prep-${{ github.sha }}
-          fail-on-cache-miss: true
+          name: e2e-prep-${{ github.sha }}
+          path: .
+
+      - name: Extract prepared test environment
+        shell: bash
+        run: |
+          if [ ! -f e2e-prep.tar.zst ]; then
+            echo "::error::The 'e2e-prep-${{ github.sha }}' artifact did not contain e2e-prep.tar.zst."
+            echo "::error::This usually means the 'E2E Prep' job did not run or failed to upload."
+            echo "::error::Re-run the entire workflow so 'E2E Prep' regenerates the artifact."
+            exit 1
+          fi
+          tar --zstd -xf e2e-prep.tar.zst
+          rm e2e-prep.tar.zst
+          if [ ! -d test/packed ] || [ ! -d test/node_modules ]; then
+            echo "::error::Extracted artifact is missing expected directories (test/packed or test/node_modules)."
+            exit 1
+          fi

       - name: Start services
         id: db
@@ -335,8 +348,8 @@ jobs:
         # To maintain serial execution for most suites, we pass --workers=1 when parallel=false.
         # Suites with parallel=true (e.g. LexicalFullyFeatured) use the default 16 workers.
         #
-        # Note: The e2e-prep job runs prepare-run-test-against-prod:ci once and caches the result.
-        # This job restores that cache, so we use test:e2e:prod:run which skips preparation.
+        # Note: The e2e-prep job runs prepare-run-test-against-prod:ci once and uploads the result
+        # as an artifact. This job downloads that artifact, so we use test:e2e:prod:run which skips preparation.
         run: PLAYWRIGHT_JSON_OUTPUT_NAME=results_${{ matrix.suite }}_${{ matrix.shard }}.json pnpm test:e2e:prod:run ${{ matrix.suite }} --shard=${{ matrix.shard }}/${{ matrix.total-shards }} --fully-parallel${{ matrix.parallel == false && ' --workers=1' || '' }}
         env:
           PLAYWRIGHT_JSON_OUTPUT_NAME: results_${{ matrix.suite }}_${{ matrix.shard }}.json
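
The archive and extract steps in the diff can be tried outside CI. What follows is a minimal local sketch, not part of the commit: it builds a small stand-in for the prepared test environment, packs it with the same `tar` invocation, unpacks it into a clean directory, and repeats the workflow's directory check. The function name `e2e_prep_roundtrip`, the fixture files, and the gzip fallback for machines without `zstd` are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Local sketch of the archive -> extract -> verify round trip from the workflow.
# The function name, the fixture files, and the gzip fallback are illustrative
# assumptions; they are not part of the commit.

e2e_prep_roundtrip() (
  set -euo pipefail
  work="$(mktemp -d)"
  trap 'rm -rf "$work"' EXIT

  # Stand-in for the prepared environment, with hidden directories and a
  # relative symlink like those found in a pnpm node_modules tree.
  mkdir -p "$work/src/test/packed" \
           "$work/src/test/node_modules/.bin" \
           "$work/src/test/node_modules/.pnpm/pkg"
  echo '{}' > "$work/src/test/package.json"
  echo 'placeholder lockfile' > "$work/src/test/pnpm-lock.yaml"
  echo 'module.exports = 1' > "$work/src/test/node_modules/.pnpm/pkg/index.js"
  ln -s .pnpm/pkg "$work/src/test/node_modules/pkg"

  # The workflow uses zstd; fall back to gzip where zstd is not installed.
  if command -v zstd >/dev/null 2>&1; then comp=--zstd; else comp=--gzip; fi

  # Mirrors the "Archive prepared test environment" step.
  cd "$work/src"
  tar "$comp" -cf "$work/e2e-prep.tar.zst" \
    test/packed test/node_modules test/package.json test/pnpm-lock.yaml

  # Mirrors the "Extract prepared test environment" step, in a clean directory.
  mkdir "$work/dst"
  cd "$work/dst"
  tar "$comp" -xf "$work/e2e-prep.tar.zst"

  # Same directory check as the workflow.
  if [ ! -d test/packed ] || [ ! -d test/node_modules ]; then
    echo "extracted archive is missing test/packed or test/node_modules" >&2
    exit 1
  fi

  # Extra check: hidden directories and the symlink survived the round trip.
  if [ ! -d test/node_modules/.bin ] || [ ! -L test/node_modules/pkg ] \
     || [ ! -f test/node_modules/pkg/index.js ]; then
    echo "hidden directory or symlink was lost" >&2
    exit 1
  fi

  echo "round trip ok"
)

e2e_prep_roundtrip
```

The function body runs in a subshell, so the `cd` calls, `set -euo pipefail`, and the cleanup trap do not leak into the calling shell. Because the tarball is a single regular file, the upload step has no hidden files or symlinks of its own to handle.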
