feat(setup): add toolcache resolver for Copilot CLI behind feature flag #34918

Draft
salmanmkc wants to merge 6 commits into main from salmanmkc/setup-copilot-resolver

Conversation

@salmanmkc
Collaborator

@salmanmkc salmanmkc commented May 26, 2026

Summary

Adds a Copilot CLI toolcache resolver to actions/setup so workflows running the Copilot engine pick up a compatible CLI baked into the runner image instead of running npm install -g @github/copilot at job start.

Why

Runtime npm install -g @github/copilot adds ~10s to every workflow run and depends on the npm registry being reachable. Hosted runners now bake @github/copilot into the tool cache ($RUNNER_TOOL_CACHE/copilot-cli/<version>/<arch>/). This change lets every Copilot-engine workflow pick up the cached binary, with a safe fallback when no compatible cached version is present.

How it works

For any workflow whose engine is copilot:

  1. The compiler emits one extra env var on the setup step: INPUT_GH_AW_VERSION=<compiler version>.
  2. setup.sh sees INPUT_GH_AW_VERSION is set and invokes the resolver, actions/setup/js/install_copilot_cli.cjs.
  3. The resolver:
    • Fetches the live compat matrix from gh-aw-actions/.github/aw/compat.json (5s timeout via AbortSignal.timeout).
    • Falls back to the bundled actions/setup/compat.json on any network/parse error.
    • Picks the first matrix row whose max-gh-aw covers the current compiler version (* always matches).
    • Selects the highest cached version in [min-agent, max-agent] under $RUNNER_TOOL_CACHE/copilot-cli/.
    • On a hit: appends <dir>/bin to $GITHUB_PATH, sets step outputs copilot-cached=true and copilot-path=<dir>.
    • On any miss / error / unparseable input: sets copilot-cached=false and exits 0.
  4. The compiler-emitted bash installer step is gated with if: steps.setup.outputs.copilot-cached != 'true', so it skips on a cache hit and runs as before on a miss.
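The row-matching and version-selection steps above can be sketched roughly as follows. This is an illustration, not the PR's code: the function names, the array-of-rows matrix shape, the reading of "covers" as "compiler version <= max-gh-aw", and the sample versions are all assumptions. Only the field names max-gh-aw, min-agent, and max-agent come from the description.

```javascript
// Illustrative sketch only; names and matrix shape are assumptions, not the PR's code.

/** Parse "1.2.3" (optionally "v1.2.3") into [major, minor, patch], or null. */
function parseSemver(text) {
  const m = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(String(text).trim());
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

/** Three-way compare of two parsed versions: negative, zero, or positive. */
function compareSemver(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

/** First row whose max-gh-aw is "*" or >= the compiler version (assumed meaning of "covers"). */
function pickRow(rows, compilerVersion) {
  const current = parseSemver(compilerVersion);
  if (!current) return null;
  for (const row of rows) {
    if (row["max-gh-aw"] === "*") return row;
    const max = parseSemver(row["max-gh-aw"]);
    if (max && compareSemver(current, max) <= 0) return row;
  }
  return null;
}

/** Highest cached version inside [min-agent, max-agent], or null on a miss. */
function selectCached(row, cachedVersions) {
  const min = parseSemver(row["min-agent"]);
  const max = parseSemver(row["max-agent"]);
  if (!min || !max) return null;
  let best = null;
  for (const name of cachedVersions) {
    const v = parseSemver(name);
    if (!v) continue; // skip non-semver directory names
    if (compareSemver(v, min) < 0 || compareSemver(v, max) > 0) continue;
    if (!best || compareSemver(v, parseSemver(best)) > 0) best = name;
  }
  return best;
}

// Hypothetical matrix and cache contents, for illustration only.
const matrix = [
  { "max-gh-aw": "0.50.0", "min-agent": "1.0.0", "max-agent": "1.4.9" },
  { "max-gh-aw": "*", "min-agent": "1.5.0", "max-agent": "2.0.0" },
];
const row = pickRow(matrix, "0.61.2");
console.log(selectCached(row, ["1.4.2", "1.6.0", "1.10.3", "tmp", "2.1.0"])); // prints "1.10.3"
```

Comparing parsed components numerically matters here: a plain string sort would rank 1.6.0 above 1.10.3.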

The resolver uses Node 24 native fetch plus fs and path only — no new dependencies, no npm install.
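Reporting a hit or a miss also needs nothing beyond fs and path, since GitHub Actions reads step outputs and PATH additions from the files named by $GITHUB_OUTPUT and $GITHUB_PATH. A minimal sketch, assuming a hypothetical reportResult helper (the name and signature are not from the PR):

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Sketch only: function name and signature are hypothetical.
// Appends to the files named by GITHUB_PATH and GITHUB_OUTPUT, as Actions expects.
function reportResult(env, cachedDir) {
  const hit = typeof cachedDir === "string" && cachedDir.length > 0;
  if (hit && env.GITHUB_PATH) {
    // On a hit, expose <dir>/bin to later steps.
    fs.appendFileSync(env.GITHUB_PATH, path.join(cachedDir, "bin") + "\n");
  }
  if (env.GITHUB_OUTPUT) {
    const lines = [`copilot-cached=${hit}`];
    if (hit) lines.push(`copilot-path=${cachedDir}`);
    fs.appendFileSync(env.GITHUB_OUTPUT, lines.join("\n") + "\n");
  }
  return hit;
}

// Possible use at the resolver's entry point; a miss still leaves the exit code at 0:
// reportResult(process.env, resolvedDirOrNull);
```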

Diff shape

  • actions/setup/action.yml — declare two new outputs: copilot-cached, copilot-path. No new inputs.
  • actions/setup/setup.sh — invoke the resolver when INPUT_GH_AW_VERSION is set. Wrapped in command -v node and || true so it cannot fail the setup step.
  • actions/setup/compat.json (new) — bundled fallback matrix.
  • actions/setup/js/install_copilot_cli.cjs (new) — the resolver (~270 LOC, zero deps).
  • actions/setup/js/install_copilot_cli.test.cjs (new) — 23 vitest cases.
  • pkg/workflow/copilot_engine_installation.go: gateStepsOnCopilotCached helper wraps the Copilot install step with the if: gate.
  • pkg/workflow/compiler_yaml_step_generation.go — when engineID == "copilot", emit INPUT_GH_AW_VERSION on the setup step.

Backward compatibility

  • Non-Copilot workflows: compiler output is byte-identical. No env var emitted, resolver never invoked, no golden diff.
  • Copilot workflow, no cached version: resolver writes copilot-cached=false, bash installer runs as today.
  • Copilot workflow, node unavailable on a self-hosted runner: resolver is skipped (command -v node guard), bash installer runs as today.

Tests

  • Go: go test ./pkg/workflow/ -count=1 — passes after make update-golden and make update-wasm-golden.
  • Golden diff is confined to Copilot fixtures only: each setup step picks up INPUT_GH_AW_VERSION, each Copilot installer step picks up the if: gate.
  • JS: npx vitest run install_copilot_cli.test.cjs — 23 cases pass; covers semver parse/compare, matrix row matching, range selection, arch detection, and toolcache scanning (hit, miss, missing marker, missing binary, non-semver dir name, empty cache).

Adds an opt-in path that skips the runtime `npm install -g @github/copilot`
when a compatible build is already present in the runner tool cache.

Behavior changes:
- New `setup-copilot-resolver` feature flag (default off). When enabled in a
  workflow's frontmatter, the compiler emits `INPUT_INSTALL_COPILOT: 'true'`
  on the setup step and adds `if: steps.setup.outputs.copilot-cached != 'true'`
  to the compiler-emitted install step. With the flag off, the compiler emits
  identical YAML to before (verified: golden fixtures unchanged).
- New `actions/setup/js/install_copilot_cli.cjs` resolver runs from setup.sh
  when `INPUT_INSTALL_COPILOT=true`. On a hit, it appends the toolcache bin
  dir to $GITHUB_PATH and writes `copilot-cached=true`. On any miss or error
  (no toolcache entry, version out of range, network failure, malformed
  matrix, etc.) it writes `copilot-cached=false` and exits 0 so the existing
  bash installer runs as before.
- Resolver has zero npm dependencies (uses only fs/path/https) so it cannot
  introduce a new install step itself.
- Two new step outputs on the setup action: `copilot-cached` and
  `copilot-path`.

Resolution logic:
- Fetches compat matrix from gh-aw-actions main, falls back to bundled
  `actions/setup/compat.json` on any error (5s timeout).
- Picks the first matrix row whose `max-gh-aw` covers the current compiler
  version, then selects the highest cached version in
  [min-agent, max-agent].
- Toolcache layout matches runner-images convention:
  $RUNNER_TOOL_CACHE/copilot-cli/<version>/<arch>/{bin/copilot, ..}.complete
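A toolcache scan along these lines could look like the sketch below. It assumes the completion marker is a sibling file named <arch>.complete next to the <arch> directory and that the binary lives at <arch>/bin/copilot; the exact marker location is not fully clear from the layout line above, and the function name is hypothetical.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Sketch only. Assumes the marker is a sibling file named "<arch>.complete"
// and the binary is "<arch>/bin/copilot"; the real layout may differ.
function listUsableVersions(toolCacheRoot, arch) {
  const toolDir = path.join(toolCacheRoot, "copilot-cli");
  let names;
  try {
    names = fs.readdirSync(toolDir);
  } catch {
    return []; // missing or unreadable cache counts as a miss
  }
  return names.filter(name => {
    if (!/^\d+\.\d+\.\d+$/.test(name)) return false; // non-semver dir name
    const archDir = path.join(toolDir, name, arch);
    const hasMarker = fs.existsSync(archDir + ".complete");
    const hasBinary = fs.existsSync(path.join(archDir, "bin", "copilot"));
    return hasMarker && hasBinary;
  });
}
```

Entries that lack either the marker or the binary are dropped rather than reported as errors, which keeps every failure mode on the "miss" path.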

Tests: 23 vitest cases for the resolver covering semver parsing/comparison,
matrix row matching, range selection, arch detection, and toolcache scanning
(hit, miss, missing marker, missing binary, non-semver dir, empty cache).
All existing Go workflow tests pass unchanged.

};
let req;
try {
req = https.get(COMPAT_URL, { timeout: FETCH_TIMEOUT_MS }, res => {
Collaborator

Use fetch

Collaborator Author

Done in 3026304 — swapped https.get + buffer for fetch with AbortSignal.timeout(5000). Matches check_version_updates.cjs / send_otlp_span.cjs. Same contract (returns parsed JSON or null on any error, never throws); 23/23 unit tests still pass.

@pelikhan
Collaborator

Just have it always on. It is more complicated to add a feature flag for this than anything.

Node 24 ships fetch globally, so the manual https.get + chunk buffer
+ timeout wiring isn't needed. Replace it with fetch + AbortSignal.timeout,
matching the convention in check_version_updates.cjs and send_otlp_span.cjs.

Same contract: returns parsed JSON on 2xx + valid JSON, returns null on
any error (network, timeout, non-200, parse failure). Never throws.
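The contract described in this commit message can be sketched as below. The function names and the readBundled callback are hypothetical; only the use of fetch with AbortSignal.timeout, the 5s timeout, and the "parsed JSON or null, never throws" behavior come from the thread.

```javascript
// Sketch only: function names and the readBundled callback are illustrative.
async function fetchJsonOrNull(url, timeoutMs = 5000) {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    if (!res.ok) return null; // non-2xx response
    return await res.json(); // rejects on invalid JSON, caught below
  } catch {
    return null; // network error, timeout, bad URL, or parse failure
  }
}

// The caller falls back to the bundled matrix when the live fetch yields null.
async function loadMatrix(url, readBundled) {
  return (await fetchJsonOrNull(url)) ?? readBundled();
}
```

Because every failure collapses to null, the caller needs a single fallback branch instead of separate handling for timeouts, HTTP errors, and malformed JSON.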

@github-actions
Contributor

Hey @salmanmkc 👋, thanks for putting together this toolcache resolver for the Copilot CLI setup step! The idea of skipping the npm install -g @github/copilot step when a compatible build is already baked into the runner image is a solid performance win, and the implementation is clearly well thought-out.

However, this repository has a specific contribution process for community members that this PR doesn't follow:

  • PRs from non-core-team members are not accepted directly. The core team (@dsyme, @eaftan, @pelikhan, @krzysztof-cieslak) uses agentic development to implement all changes. Community members are asked to open a detailed issue with an agentic plan describing what they want built — a core team member will then pick it up and implement it through an agent.

To get this idea into the codebase, please close this PR and open a GitHub Issue instead, describing the feature in detail (the why, the how, the constraints). Your write-up here is already excellent and would make a great agentic plan!

If you'd like a head start on drafting that issue, you can use this prompt with your AI assistant:

Create a detailed GitHub issue (agentic plan) for the gh-aw repository proposing an opt-in toolcache resolver for the Copilot CLI setup step.

The plan should cover:
1. Problem: `npm install -g @github/copilot` adds ~10s per workflow run and requires npm registry access. Hosted runners now bake the CLI into the tool cache.
2. Proposed solution: A new `setup-copilot-resolver` feature flag (default off). When enabled, the compiler emits extra env vars on the setup step; setup.sh invokes a zero-dependency Node.js CJS resolver that checks `$RUNNER_TOOL_CACHE/copilot-cli/<version>/<arch>/` against a compat matrix (fetched live with local fallback). On a cache hit the install step is gated with `if: steps.setup.outputs.copilot-cached != 'true'`; on any miss/error it falls back silently.
3. Files expected to change: actions/setup/action.yml, actions/setup/setup.sh, a new actions/setup/js/install_copilot_cli.cjs resolver, actions/setup/compat.json fallback matrix, pkg/constants/feature_constants.go, pkg/workflow/copilot_engine_installation.go, pkg/workflow/compiler_yaml_step_generation.go, and related compiler job builders.
4. Testing: unit tests for the resolver (happy path, cache miss, network error, version boundary) and Go tests for the feature-flag gating logic.
5. Rollout: flag off by default; golden fixtures must remain unchanged when flag is off.

Generated by ✅ Contribution Check · sonnet46 2.6M ·

Collaborator

@pelikhan pelikhan left a comment

Add type annotations and typecheck

const FETCH_TIMEOUT_MS = 5000;

function log(msg) {
console.log(`[install_copilot_cli] ${msg}`);
Collaborator

You can require shim.cjs and use core.debug/info... for logging. Makes things more consistent for agents.

Collaborator Author

Done in 8a55a3a — require shim.cjs at the top, swapped console.log/console.error for core.info/core.warning. Same [install_copilot_cli] prefix preserved for grep-ability.
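A prefixed logger of the kind described in this reply might look like the sketch below. The exact interface of shim.cjs is not shown in this thread, so the code assumes it exposes a global core object with info and warning methods, and it uses a console-based stand-in when that global is absent. makeLogger is a hypothetical name.

```javascript
// Sketch only: a stand-in for the global `core` that shim.cjs is said to provide.
const fallbackCore = {
  info: msg => console.log(msg),
  warning: msg => console.warn(msg),
};

// Wraps a core-like object so every message keeps the grep-friendly prefix.
function makeLogger(core = globalThis.core ?? fallbackCore) {
  const prefix = "[install_copilot_cli]";
  return {
    info: msg => core.info(`${prefix} ${msg}`),
    warn: msg => core.warning(`${prefix} ${msg}`),
  };
}
```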

The setup-copilot-resolver feature flag added a layer of plumbing more
complex than the value it provided. Drop the flag, the boolean parameter
on generateSetupStep, and the dedicated INPUT_INSTALL_COPILOT env. The
compiler now emits INPUT_GH_AW_VERSION for any Copilot-engine workflow,
which setup.sh treats as the signal to invoke the resolver. The bash
installer step gate (skip on copilot-cached=true) is now always applied
for Copilot workflows, which is a no-op when the resolver reports a miss.

Non-Copilot workflows are unchanged: no env var emitted, no resolver run,
no golden diff.
@salmanmkc
Collaborator Author

Done — dropped the feature flag entirely in fb6b0ef. The resolver now runs unconditionally for any Copilot-engine workflow (signal is the compiler-emitted INPUT_GH_AW_VERSION env). Non-Copilot workflows are unchanged (no env var, no resolver invocation, no golden diff). PR description updated to reflect the new shape.

@pelikhan
Collaborator

Actually... my bad, this feature only needs to run in agent/detection jobs when the Copilot engine is selected.

- Require shim.cjs for global core/context so logging routes through
  core.info / core.warning instead of console.log / console.error,
  matching the convention used across the other setup/js modules.
- Add // @ts-check directive (now passes tsc --noEmit with the existing
  strictNullChecks config) and JSDoc type annotations on every exported
  function plus the ParsedSemver and CompatRow shared typedefs.
- Tighten error narrowing in catch blocks (Error instance guards) and
  add explicit casts where the matrix payload is treated as unknown.

Resolver tests (23/23) and full typecheck still pass.

@salmanmkc
Collaborator Author

Type annotations + typecheck addressed in 8a55a3a — added // @ts-check directive, JSDoc on every exported function, shared ParsedSemver and CompatRow typedefs, narrowed catch unknown to Error before reading .message. npm run typecheck is clean (the file was already in the tsconfig include glob; this just makes the contract explicit at the file level).
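For readers unfamiliar with this style, the pattern described here (a // @ts-check directive, JSDoc typedefs, and narrowing an unknown catch value before reading .message) can be sketched as follows. The CompatRow fields are inferred from the names used elsewhere in this PR, and the array-of-rows payload shape and both function names are assumptions.

```javascript
// @ts-check
// Sketch only: the typedef fields and payload shape are inferred, not taken from the PR's code.

/** @typedef {{ "max-gh-aw": string, "min-agent": string, "max-agent": string }} CompatRow */

/**
 * Narrow an unknown thrown value before reading .message.
 * @param {unknown} err
 * @returns {string}
 */
function errorMessage(err) {
  return err instanceof Error ? err.message : String(err);
}

/**
 * Validate an untyped JSON payload as a list of compat rows.
 * @param {unknown} payload
 * @returns {CompatRow[] | null}
 */
function asCompatRows(payload) {
  if (!Array.isArray(payload)) return null;
  const ok = payload.every(
    r =>
      r &&
      typeof r === "object" &&
      ["max-gh-aw", "min-agent", "max-agent"].every(k => typeof r[k] === "string")
  );
  // Explicit cast once the runtime check has passed.
  return ok ? /** @type {CompatRow[]} */ (payload) : null;
}
```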

@salmanmkc
Collaborator Author

On it — bringing the flag back as compiler-emitted INPUT_INSTALL_COPILOT: 'true' on the setup step, gated in setup.sh on that env. Compiler only emits it for jobs that actually run Copilot CLI (the main agent job and the threat-detection job, when the Copilot engine is selected). Other consumers of the setup action (cache, activation, pre_activation, safe-outputs, unlock, publish-assets, repo-memory, notify-comment, experiments) will not trigger the resolver. Pushing in a moment.

Re-introduce INPUT_INSTALL_COPILOT as the explicit opt-in for the
toolcache resolver in actions/setup/setup.sh. The compiler emits this
flag (alongside INPUT_GH_AW_VERSION) on the setup step env block only
for jobs that actually invoke the Copilot CLI: the main agent job and
the threat-detection job.

Other jobs that share the setup action (activation, pre-activation,
cache, unlock, safe-outputs, notify-comment, publish-assets,
repo-memory, experiments) opt out, so the resolver stays a no-op there.

Adds TestGenerateSetupStepEmitsInstallCopilotGate covering the three
cases: copilot engine + opt-in emits both env vars; copilot engine
without opt-in suppresses them; non-copilot engine ignores the opt-in.
@salmanmkc
Collaborator Author

Pushed c806598:

  • Added installCopilot bool parameter to generateSetupStep; only compiler_main_job.go (agent) and threat_detection.go (detection) pass true. The other 9 callers pass false.
  • When installCopilot && setupEngineID == "copilot", the compiler emits both INPUT_INSTALL_COPILOT: 'true' and INPUT_GH_AW_VERSION: "..." on the setup step env block (script + dev/release modes).
  • setup.sh resolver gate restored to [ "${INPUT_INSTALL_COPILOT:-false}" = "true" ].
  • Goldens regenerated: both env vars now appear only on agent jobs.
  • Added TestGenerateSetupStepEmitsInstallCopilotGate (3 cases: copilot+opt-in, copilot+no-opt-in, non-copilot+opt-in). Full pkg/workflow test suite passes (260s).

Verified across 6 parallel review passes — no blockers; one acknowledged note that custom-command Copilot workflows (workflows that set engine.command:) will still receive the resolver opt-in but the resolver is a harmless no-op in that path since the custom script doesn't invoke the copilot binary. Happy to tighten further if you want.

@github-actions
Contributor

✅ smoke-ci: safeoutputs CLI comment + comment-memory run (26456672392)

Generated by 🧪 Smoke CI for issue #34918 ·
