Problem
When a reusable workflow that imports skills via shared/apm.md is called from a downstream repo (i.e., the lock file lives in org/library, but the workflow runs in org/caller), the APM cache key collapses to a constant value across all APM-importing workflows in the library. As soon as the library has more than one such workflow with different packages:, they overwrite each other's cached bundles, and the consumer that lost the race ends up with Skill not found at runtime.
What the lock file generates
- id: apm_cache
  uses: actions/cache/save@…
  with:
    key: apm-${{ needs.activation.outputs.engine_id }}-${{ hashFiles('.github/workflows/*.lock.yml') }}
    path: /tmp/gh-aw/apm-workspace
hashFiles('.github/workflows/*.lock.yml') is evaluated against the caller's workspace — and most callers don't carry the library's lock files. They simply do:
jobs:
  review:
    uses: org/library/.github/workflows/foo.lock.yml@v1
so hashFiles(...) returns "" and the key resolves to literally apm-<engine>- with an empty trailing segment. Every APM-importing workflow in the library gets that same key.
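To make the collapse concrete, here is a minimal Python sketch of how the key resolves. It is a model, not gh-aw or runner code: hash_files is a rough stand-in for the hashFiles expression (only the "empty string when nothing matches" behavior matters here), and the caller workspace contents are hypothetical.

```python
import hashlib
from fnmatch import fnmatch


def hash_files(workspace: dict[str, bytes], pattern: str) -> str:
    """Rough stand-in for hashFiles(): digest of matching files, "" if none match."""
    matched = sorted(path for path in workspace if fnmatch(path, pattern))
    if not matched:
        return ""
    combined = hashlib.sha256()
    for path in matched:
        combined.update(hashlib.sha256(workspace[path]).digest())
    return combined.hexdigest()


def apm_cache_key(engine_id: str, workspace: dict[str, bytes]) -> str:
    """Mirror the generated key formula: apm-<engine>-<hash of lock files>."""
    lock_hash = hash_files(workspace, ".github/workflows/*.lock.yml")
    return f"apm-{engine_id}-{lock_hash}"


# Hypothetical caller checkout: it has its own workflow file but no *.lock.yml.
caller_workspace = {".github/workflows/docs-ai.yml": b"jobs: {}"}

# The key never looks at which library workflow was called, so every
# APM-importing workflow invoked from this caller computes the same value.
key_for_docs_review = apm_cache_key("copilot", caller_workspace)
key_for_docs_style_sweep = apm_cache_key("copilot", caller_workspace)

print(key_for_docs_review)  # apm-copilot-
```

Nothing in the inputs to apm_cache_key identifies the called workflow or its package list, which is why the two keys cannot differ.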
Reproduction (real)
In elastic/docs-actions we have several reusable workflows that import skills via APM (docs-review, docs-frontmatter-sweep, docs-applies-to-sweep, docs-openings-sweep, docs-style-sweep). Each has a distinct packages: list. They are all invoked from elastic/docs-content, which has no *.lock.yml files in its own .github/workflows/.
In every run, both the apm job and the agent job log the same key:
key: apm-copilot-
Cache hit for: apm-copilot-
Cache restored from key: apm-copilot-
While only one APM-importing workflow existed, this was benign: same producer, same consumer, same bundle. After we added several more, the bundle cached under apm-copilot- is the one from whichever workflow saved last. When the next workflow's agent job extracts that bundle, the skills it actually needs aren't there:
✗ skill(docs-check-style) Skill not found: docs-check-style
✗ skill(docs-flag-jargon-skill) Skill not found: docs-flag-jargon-skill
✗ skill(docs-frontmatter-audit) Skill not found: docs-frontmatter-audit
✗ skill(docs-content-type-checker) Skill not found: docs-content-type-checker
✗ skill(docs-applies-to-tagging) Skill not found: docs-applies-to-tagging
(Run: https://github.com/elastic/docs-content/actions/runs/25379248158, Docs AI / docs review / agent job — but you can see the same apm-copilot- key in any of our workflow runs.)
A docs-content PR run from late April with only one APM-importing workflow (docs-review, the same workflow currently failing) used the identical apm-copilot- key and worked, because no other workflow was overwriting that cache entry. So the regression is purely additive — adding a second APM-importing reusable workflow with a different package list silently breaks the first.
Suggested fix
Make the cache key reflect the bundle contents, not the caller's filesystem. Hashing AW_APM_PACKAGES (the inlined package list in the lock file) would be both correct and sufficient. In pseudocode (hashFiles_or_hash is a placeholder, not an existing expression function):
key: apm-${{ needs.activation.outputs.engine_id }}-${{ hashFiles('.github/workflows/*.lock.yml') || '' }}-${{ hashFiles_or_hash(AW_APM_PACKAGES) }}
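Or simpler: include ${{ github.workflow }} (or a stable workflow_id if available) as a discriminator. One caveat: inside a reusable workflow the github context belongs to the caller, so github.workflow only helps when each library workflow is invoked from a different caller workflow; an identifier for the called workflow would be safer. Either way, the goal is that two workflows with different package lists never share a cache slot.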
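A content-derived key could look roughly like the following Python sketch. It assumes the compiler can see the package list when it generates the lock file and can inline a digest of it. The function names and the example package identifiers are hypothetical, not part of gh-aw.

```python
import hashlib


def packages_digest(packages: list[str]) -> str:
    """Order-insensitive SHA-256 digest of a package list."""
    canonical = "\n".join(sorted(set(packages)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def apm_cache_key(engine_id: str, packages: list[str]) -> str:
    """Hypothetical key formula: apm-<engine>-<digest of the package list>."""
    return f"apm-{engine_id}-{packages_digest(packages)}"


# Hypothetical package lists for two library workflows.
review_packages = ["example-org/skills/review", "example-org/skills/style"]
sweep_packages = ["example-org/skills/frontmatter"]

review_key = apm_cache_key("copilot", review_packages)
sweep_key = apm_cache_key("copilot", sweep_packages)
```

Sorting and de-duplicating before hashing means two workflows that list the same packages in a different order still share one cache entry, while any difference in the list yields a different key.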
Workaround we are applying meanwhile
We are aligning every APM-importing workflow's packages: to the union of all skills, so that all five workflows pack the same bundle and cache collisions become benign. This works but is a maintenance tax — every new skill anywhere has to be added to every workflow.
Versions
- gh-aw: v0.71.1 (and v0.71.0, v0.71.4 — same key formula, all affected)
- caller: elastic/docs-content
- library: elastic/docs-actions