fix: replace hardcoded GCS cache keys with monthly-rotating dynamic keys #1297

Open
kunal-10-cloud wants to merge 3 commits into malariagen:master from kunal-10-cloud:fix/issue-1296-dynamic-gcs-cache-key

@kunal-10-cloud
Contributor

Summary

Closes #1296

Both integration_tests.yml and notebooks.yml used a hardcoded GCS cache key (gcs_cache_integration_tests_20240922 and gcs_cache_notebooks_20240922, respectively) that has not been updated since September 22, 2024. Because a GitHub Actions cache entry is immutable once its key exists, the actions/cache/save step was effectively a no-op on every run, and integration tests have been validating against data that is ~18 months stale.

This PR replaces the static keys with dynamically generated keys based on the current year and month (date +%Y%m), ensuring the cache rotates automatically on the first CI run of each month.
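A minimal shell sketch of the key generation described above. The helper name `monthly_cache_key`, its optional stamp argument, and the use of `date -u` (to pin the month boundary to UTC) are assumptions made for illustration; the PR itself inlines `date +%Y%m` in a workflow `run` step.

```shell
# Hypothetical helper (not part of the PR): build "<prefix>_<YYYYMM>".
# The optional second argument overrides the stamp so the result is reproducible.
monthly_cache_key() {
  prefix="$1"
  stamp="${2:-$(date -u +%Y%m)}"   # assumption: -u pins the month boundary to UTC
  printf '%s_%s\n' "$prefix" "$stamp"
}

key="$(monthly_cache_key gcs_cache_integration_tests)"

# Inside a workflow step, GITHUB_OUTPUT is a file path that later steps read from.
# Outside of GitHub Actions the variable is unset, so just print the line instead.
if [ -n "${GITHUB_OUTPUT:-}" ]; then
  echo "key=$key" >> "$GITHUB_OUTPUT"
else
  echo "key=$key"
fi
```

Later steps would then read the value through the step's `key` output, as the PR does.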

Changes

  • .github/workflows/integration_tests.yml

    • Added a Generate cache key step that outputs gcs_cache_integration_tests_YYYYMM
    • Updated both actions/cache/restore and actions/cache/save to use the dynamic key
    • Added restore-keys prefix fallback (gcs_cache_integration_tests_) so that a previous month's cache can still be used as a warm start when the new month's cache hasn't been saved yet
  • .github/workflows/notebooks.yml

    • Same changes: dynamic key generation, updated cache steps, and restore-keys prefix fallback

How it works

  1. On the first run of a new month, restore-keys falls back to the most recent prior cache (warm start).
  2. After the run completes, actions/cache/save writes a new cache entry under the new month's key.
  3. Subsequent runs in the same month hit the exact key (fast restore, save is a no-op as expected).
  4. The cycle repeats automatically each month — no manual key bumps required.
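The lookup order in the steps above can be simulated locally. This is only a sketch: `resolve_cache` is a hypothetical helper, and it approximates "most recent prior cache" by picking the lexicographically greatest matching key, whereas the real cache service selects among prefix matches by creation time.

```shell
# Hypothetical simulation of the restore lookup (not the actions/cache implementation).
# Usage: resolve_cache <exact-key> <restore-prefix> [existing-key ...]
# Prints "exact:<key>", "fallback:<key>", or "miss".
resolve_cache() (
  key="$1"; prefix="$2"; shift 2
  best=""
  for k in "$@"; do
    # Step 3: an exact key match wins immediately.
    if [ "$k" = "$key" ]; then
      echo "exact:$k"
      return 0
    fi
    # Step 1: otherwise remember the "newest" key that starts with the prefix.
    case "$k" in
      "$prefix"*)
        # Assumption: the greatest key in byte order stands in for the most recent cache.
        best="$(printf '%s\n%s\n' "$best" "$k" | LC_ALL=C sort | tail -n 1)"
        ;;
    esac
  done
  if [ -n "$best" ]; then echo "fallback:$best"; else echo "miss"; fi
)

old="gcs_cache_integration_tests_20240922"
new="gcs_cache_integration_tests_202604"

# First run of the month: only the old cache exists, so the prefix fallback is used.
resolve_cache "$new" "gcs_cache_integration_tests_" "$old"
# prints: fallback:gcs_cache_integration_tests_20240922

# Later runs in the same month: the new key exists, so it is an exact hit.
resolve_cache "$new" "gcs_cache_integration_tests_" "$old" "$new"
# prints: exact:gcs_cache_integration_tests_202604
```

Note that the first call restores the September 2024 cache, so whatever is saved under the new key afterwards starts from that content.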

Why monthly rotation

  • Frequent enough to pick up new sample sets, metadata changes, and schema updates (e.g., Pf9 added in March 2026)
  • Infrequent enough to keep cache hit rates high for the majority of runs within a month
  • Zero maintenance — no manual date bumps or PRs needed to rotate the cache

Test plan

  • All 1159 unit tests pass locally (0 failures, 8 skipped)
  • Ruff lint clean
  • Verify on next master push that the cache key shows gcs_cache_integration_tests_202604 (or current month)
  • Verify restore-keys fallback works when no exact match exists (first run of the month)

…eys (malariagen#1296)

Replace the stale hardcoded cache key `gcs_cache_integration_tests_20240922`
(unchanged since September 2024) with a dynamic key based on the current
year-month. This ensures the GCS cache is refreshed monthly so that
integration tests and notebook CI runs validate against current data.

Changes:
- integration_tests.yml: generate cache key via `date +%Y%m`, add
  restore-keys prefix fallback for partial hits
- notebooks.yml: same dynamic key and restore-keys pattern
Copilot AI review requested due to automatic review settings April 14, 2026 15:26
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the CI workflows to avoid immutable, long-lived GitHub Actions caches by generating monthly-rotating cache keys for the gcs_cache directory used during integration tests and notebook execution.

Changes:

  • Add a step to generate a monthly cache key (gcs_cache_*_YYYYMM) using date +%Y%m.
  • Update actions/cache/restore and actions/cache/save steps to use the generated key.
  • Add restore-keys prefix fallback to allow restoring an older cache when the exact monthly key doesn’t exist yet.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| .github/workflows/integration_tests.yml | Generates a monthly cache key and uses it for restore/save of the gcs_cache directory (adds restore prefix fallback). |
| .github/workflows/notebooks.yml | Mirrors the integration test workflow change: monthly cache key generation + restore/save updates (adds restore prefix fallback). |

Comment on lines +42 to +45
- name: Generate cache key
  id: cache-key
  run: echo "key=gcs_cache_integration_tests_$(date +%Y%m)" >> "$GITHUB_OUTPUT"

Copilot AI Apr 14, 2026

The step id cache-key contains a hyphen, which cannot be referenced via dot-notation in expressions. ${{ steps.cache-key.outputs.key }} will be parsed incorrectly and the workflow may fail to evaluate the expression. Rename the step id to something like cache_key (and update references) or use bracket notation (steps['cache-key'].outputs.key).

Comment on lines +38 to +41
- name: Generate cache key
  id: cache-key
  run: echo "key=gcs_cache_notebooks_$(date +%Y%m)" >> "$GITHUB_OUTPUT"

Copilot AI Apr 14, 2026

The step id cache-key contains a hyphen, which cannot be referenced via dot-notation in expressions. ${{ steps.cache-key.outputs.key }} will be parsed incorrectly and the workflow may fail to evaluate the expression. Rename the step id to something like cache_key (and update references) or use bracket notation (steps['cache-key'].outputs.key).

Comment on lines +51 to +52
restore-keys: |
  gcs_cache_integration_tests_
Copilot AI Apr 14, 2026

restore-keys will cause the first run after this change to restore the old September 2024 cache (it matches the gcs_cache_integration_tests_ prefix). Because gcs_cache is used by fsspec simplecache (which doesn’t revalidate cached files by default), this can result in re-saving the same stale data under the new monthly key, so the intended cache refresh may never actually happen. Consider either removing restore-keys (force a cold start once per month) or scoping the fallback to the current year/month (e.g., generate a gcs_cache_integration_tests_YYYY prefix output) so you don’t pull in pre-rotation caches.

Suggested change
restore-keys: |
  gcs_cache_integration_tests_
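The scoping option mentioned in this comment can be sketched in shell. The helpers `year_scoped_prefix` and `matches_prefix` are hypothetical names used only for illustration. The sketch assumes restore-keys entries are matched as plain key prefixes and shows that a year-scoped prefix would no longer match the September 2024 key.

```shell
# Hypothetical helper: build "<base>_<YYYY>"; the optional second argument overrides the year.
year_scoped_prefix() {
  printf '%s_%s\n' "$1" "${2:-$(date -u +%Y)}"
}

# Hypothetical helper: succeeds when cache key $1 starts with restore prefix $2.
matches_prefix() {
  case "$1" in
    "$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

base="gcs_cache_integration_tests"
old_key="${base}_20240922"
scoped="$(year_scoped_prefix "$base")"

# The broad prefix from the PR matches the pre-rotation cache.
if matches_prefix "$old_key" "${base}_"; then
  echo "broad prefix ${base}_ matches $old_key"
fi

# A prefix scoped to a later year does not.
if ! matches_prefix "$old_key" "$scoped"; then
  echo "scoped prefix $scoped does not match $old_key"
fi
```

With a year-scoped prefix the first run of each January would start cold, which is the trade-off against always having a warm start.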

Comment on lines +47 to +48
restore-keys: |
  gcs_cache_notebooks_
Copilot AI Apr 14, 2026

restore-keys will cause the first run after this change to restore the old September 2024 cache (it matches the gcs_cache_notebooks_ prefix). Because gcs_cache is used by fsspec simplecache (which doesn’t revalidate cached files by default), this can result in re-saving the same stale data under the new monthly key, so the intended cache refresh may never actually happen. Consider either removing restore-keys (force a cold start once per month) or scoping the fallback to the current year/month (e.g., generate a gcs_cache_notebooks_YYYY prefix output) so you don’t pull in pre-rotation caches.

Suggested change
restore-keys: |
  gcs_cache_notebooks_


Development

Successfully merging this pull request may close these issues.

Integration Test and Notebook CI Workflows Use Hardcoded GCS Cache Key From September 2024

2 participants