fix: replace hardcoded GCS cache keys with monthly-rotating dynamic keys #1297
kunal-10-cloud wants to merge 3 commits into malariagen:master
…eys (malariagen#1296)

Replace the stale hardcoded cache key `gcs_cache_integration_tests_20240922` (unchanged since September 2024) with a dynamic key based on the current year-month. This ensures the GCS cache is refreshed monthly so that integration tests and notebook CI runs validate against current data.

Changes:
- `integration_tests.yml`: generate cache key via `date +%Y%m`, add `restore-keys` prefix fallback for partial hits
- `notebooks.yml`: same dynamic key and `restore-keys` pattern
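The key generation described in the commit message can be sketched as a small function that takes the date explicitly, which makes the month boundary easy to check. This is an illustration, not the workflow's code: `make_cache_key` is a hypothetical helper, it assumes GNU `date` (for `-d`), and the `-u` flag is an addition here so the month is evaluated in UTC regardless of the machine's `TZ` setting.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a monthly cache key from a prefix and a date.
# Assumes GNU date for -d; -u evaluates and formats the date in UTC.
make_cache_key() {
  local prefix="$1"
  local when="${2:-now}"
  printf '%s_%s\n' "$prefix" "$(date -u -d "$when" +%Y%m)"
}

# Runs within the same month share a key; from May 1 (UTC) a new key is used.
make_cache_key gcs_cache_integration_tests 2026-04-01  # gcs_cache_integration_tests_202604
make_cache_key gcs_cache_integration_tests 2026-04-30  # gcs_cache_integration_tests_202604
make_cache_key gcs_cache_integration_tests 2026-05-01  # gcs_cache_integration_tests_202605
```

Because a new key appears only when the month changes, the `actions/cache/save` step has a key it can actually write to once per month, instead of colliding with an immutable entry on every run.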
Pull request overview
This PR updates the CI workflows to avoid immutable, long-lived GitHub Actions caches by generating monthly-rotating cache keys for the `gcs_cache` directory used during integration tests and notebook execution.
Changes:
- Add a step to generate a monthly cache key (`gcs_cache_*_YYYYMM`) using `date +%Y%m`.
- Update the `actions/cache/restore` and `actions/cache/save` steps to use the generated key.
- Add a `restore-keys` prefix fallback to allow restoring an older cache when the exact monthly key doesn’t exist yet.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `.github/workflows/integration_tests.yml` | Generates a monthly cache key and uses it for restore/save of the `gcs_cache` directory (adds restore prefix fallback). |
| `.github/workflows/notebooks.yml` | Mirrors the integration test workflow change: monthly cache key generation + restore/save updates (adds restore prefix fallback). |
```yaml
- name: Generate cache key
  id: cache-key
  run: echo "key=gcs_cache_integration_tests_$(date +%Y%m)" >> "$GITHUB_OUTPUT"
```
The step id `cache-key` contains a hyphen, which cannot be referenced via dot notation in expressions. `${{ steps.cache-key.outputs.key }}` will be parsed incorrectly and the workflow may fail to evaluate the expression. Rename the step id to something like `cache_key` (and update references) or use bracket notation (`steps['cache-key'].outputs.key`).
```yaml
- name: Generate cache key
  id: cache-key
  run: echo "key=gcs_cache_notebooks_$(date +%Y%m)" >> "$GITHUB_OUTPUT"
```
The step id `cache-key` contains a hyphen, which cannot be referenced via dot notation in expressions. `${{ steps.cache-key.outputs.key }}` will be parsed incorrectly and the workflow may fail to evaluate the expression. Rename the step id to something like `cache_key` (and update references) or use bracket notation (`steps['cache-key'].outputs.key`).
```yaml
restore-keys: |
  gcs_cache_integration_tests_
```
`restore-keys` will cause the first run after this change to restore the old September 2024 cache (it matches the `gcs_cache_integration_tests_` prefix). Because `gcs_cache` is used by fsspec `simplecache` (which doesn’t revalidate cached files by default), this can result in re-saving the same stale data under the new monthly key, so the intended cache refresh may never actually happen. Consider either removing `restore-keys` (force a cold start once per month) or scoping the fallback to the current year/month (e.g., generate a `gcs_cache_integration_tests_YYYY` prefix output) so you don’t pull in pre-rotation caches.
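The concern can be illustrated with a small model of key resolution in `actions/cache`: an exact match on `key` wins; otherwise each `restore-keys` entry is tried in order as a prefix, and among several prefix matches the most recently created cache is returned. This is a simplified sketch, not the action's implementation: `resolve_cache` is a hypothetical function, the list of existing caches is made up for illustration, and branch scoping, cache versions, and the prefix search on `key` itself are ignored.

```bash
#!/usr/bin/env bash
# Hypothetical, simplified model of actions/cache key resolution. It ignores
# branch scoping, cache versions, and the prefix search on the primary key.
# Existing cache keys are read from stdin, one per line, oldest first.
resolve_cache() {
  local key="$1"
  shift
  local -a existing=()
  mapfile -t existing

  # 1. An exact match on the primary key wins.
  local k
  for k in "${existing[@]}"; do
    if [[ "$k" == "$key" ]]; then
      echo "$k"
      return 0
    fi
  done

  # 2. Otherwise try each restore key in order as a prefix;
  #    the most recently created matching cache wins.
  local prefix i
  for prefix in "$@"; do
    for (( i = ${#existing[@]} - 1; i >= 0; i-- )); do
      if [[ "${existing[i]}" == "$prefix"* ]]; then
        echo "${existing[i]}"
        return 0
      fi
    done
  done

  return 1  # no match: cold start
}

# Unscoped prefix, as in this PR: prints gcs_cache_integration_tests_20240922
printf '%s\n' gcs_cache_integration_tests_20240922 |
  resolve_cache gcs_cache_integration_tests_202604 gcs_cache_integration_tests_

# Year-scoped prefix: nothing matches, so this prints "cold start"
printf '%s\n' gcs_cache_integration_tests_20240922 |
  resolve_cache gcs_cache_integration_tests_202604 gcs_cache_integration_tests_2026 ||
  echo "cold start"
```

With the unscoped prefix, the only candidate on the first run after the change is the September 2024 entry, so it is restored and can then be re-saved under the new monthly key.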
```yaml
restore-keys: |
  gcs_cache_notebooks_
```
`restore-keys` will cause the first run after this change to restore the old September 2024 cache (it matches the `gcs_cache_notebooks_` prefix). Because `gcs_cache` is used by fsspec `simplecache` (which doesn’t revalidate cached files by default), this can result in re-saving the same stale data under the new monthly key, so the intended cache refresh may never actually happen. Consider either removing `restore-keys` (force a cold start once per month) or scoping the fallback to the current year/month (e.g., generate a `gcs_cache_notebooks_YYYY` prefix output) so you don’t pull in pre-rotation caches.
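One way to implement the year-scoped fallback suggested above is to have the key-generation step emit two outputs. A minimal sketch, assuming a second output named `prefix` (a hypothetical name not in the PR); like the existing step, it appends `name=value` lines to the file named by `GITHUB_OUTPUT`, and it falls back to a temporary file so it can also be run locally.

```bash
#!/usr/bin/env bash
# Sketch of a "Generate cache key" run step that also emits a year-scoped
# restore prefix. The output name "prefix" is hypothetical.
# GITHUB_OUTPUT is set by the Actions runner; default to a temp file otherwise.
: "${GITHUB_OUTPUT:=$(mktemp)}"

month="$(date -u +%Y%m)"  # e.g. 202604
year="${month:0:4}"       # e.g. 2026

{
  echo "key=gcs_cache_notebooks_${month}"
  echo "prefix=gcs_cache_notebooks_${year}"
} >> "$GITHUB_OUTPUT"
```

The restore step would then reference this `prefix` output under `restore-keys`. The trade-off is that the first run each January starts cold, but caches created before the rotation, such as the September 2024 one, can no longer be matched.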
Summary
Closes #1296
Both `integration_tests.yml` and `notebooks.yml` used a hardcoded GCS cache key (`gcs_cache_integration_tests_20240922` / `gcs_cache_notebooks_20240922`) that has not been updated since September 22, 2024. Since GitHub Actions caches are immutable once created, the `actions/cache/save` step was effectively a no-op on every run, and integration tests have been validating against data that is ~18 months stale.

This PR replaces the static keys with dynamically generated keys based on the current year and month (`date +%Y%m`), ensuring the cache rotates automatically on the first CI run of each month.

Changes
- `.github/workflows/integration_tests.yml`
  - Add a `Generate cache key` step that outputs `gcs_cache_integration_tests_YYYYMM`
  - Update `actions/cache/restore` and `actions/cache/save` to use the dynamic key
  - Add a `restore-keys` prefix fallback (`gcs_cache_integration_tests_`) so that a previous month's cache can still be used as a warm start when the new month's cache hasn't been saved yet
- `.github/workflows/notebooks.yml`
  - Same dynamic key and `restore-keys` prefix fallback

How it works
- `restore-keys` falls back to the most recent prior cache (warm start).
- `actions/cache/save` writes a new cache entry under the new month's key.

Why monthly rotation
Test plan
- Confirm on a `master` push that the cache key shows `gcs_cache_integration_tests_202604` (or the current month)
- Confirm the `restore-keys` fallback works when no exact match exists (first run of the month)
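For the first test-plan item, a quick check on the key printed in the job log could look like this. It is a sketch: `check_cache_key` is a hypothetical helper, and it compares against the current UTC month, so it should be run in the same month as the CI job being checked.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if KEY equals PREFIX_YYYYMM for the
# current UTC month; a stale hardcoded key such as ..._20240922 fails.
check_cache_key() {
  local key="$1" prefix="$2"
  local expected="${prefix}_$(date -u +%Y%m)"
  if [[ "$key" == "$expected" ]]; then
    echo "ok: $key"
  else
    echo "unexpected cache key: $key (expected $expected)" >&2
    return 1
  fi
}

check_cache_key "gcs_cache_integration_tests_$(date -u +%Y%m)" gcs_cache_integration_tests
```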