Skip to content

Fix Copilot runtime instrumentation during action extraction#3513

Merged
plengauer merged 4 commits into
mainfrom
copilot/fix-instrumentation-of-copilot-cloud-agent
May 29, 2026
Merged

Fix Copilot runtime instrumentation during action extraction#3513
plengauer merged 4 commits into
mainfrom
copilot/fix-instrumentation-of-copilot-cloud-agent

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 25, 2026

Copilot cloud-agent instrumentation could be skipped because the Copilot tar hook only patched scripts under ${RUNNER_TEMP}, while extracted *-action-* scripts are first present in the current working directory and only moved to ${RUNNER_TEMP} afterward.

  • Copilot script patching fixed for actual extraction flow

    • Updated agent.instrumentation.github.copilot.sh to patch both:
      • ./*-action-*/*/*.sh
      • ${RUNNER_TEMP}/*-action-*/*/*.sh
    • Added a file-exists guard so unmatched globs are safely ignored.
  • Alias registration behavior restored

    • Reverted to direct hook registration:
      _otel_alias_prepend tar _otel_inject_copilot

This keeps the fix focused on the real runtime path where Copilot setup scripts are extracted and executed.

@plengauer
Copy link
Copy Markdown
Owner

Hello, thanks for contributing for the first time!

Copilot AI changed the title [WIP] Fix instrumentation of copilot cloud agent Fix Copilot runtime instrumentation on cached startup path May 25, 2026
Copilot AI requested a review from plengauer May 25, 2026 15:21
Copy link
Copy Markdown
Owner

@plengauer plengauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot no, thats not it. first of all, drop the simulated test. second, if by cache u mean the caching behavior of this very project, then i dont see the reason it inteferes. if im missing it, expand you explanation (make it really for dummies) or start from scratch searching for the root cause. my thinking is it cant be it, because thats literally how all other special instrumentations work. i dont see why this one would be special / broken

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 25, 2026

@copilot no, thats not it. first of all, drop the simulated test. second, if by cache u mean the caching behavior of this very project, then i dont see the reason it inteferes. if im missing it, expan...

Addressed in 57b767a: I removed the simulated test.

Why this one is different: on cache hits, agent.sh reloads aliases from the cache file, and that cache is filtered by the startup hint tokens. Most special instrumentations still work because their command names are in that hint and get cached. Copilot’s hook targets tar used later in the downloaded action script, so tar is typically not in the hint-derived cache; then the cache-hit loader also strips literal _otel_alias_prepend ... lines from special instrumentation files, so that alias registration gets skipped unless called indirectly.

Copilot AI requested a review from plengauer May 25, 2026 15:44
Copy link
Copy Markdown
Owner

@plengauer plengauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot thanks for the explanation, i see what you mean now. but for this to be the root cause, there would have to be a cache in the first place. which would mean exact same executables, same path, .... and all the same for a cache to be actually active as per the cache definition. is that likely. look at the logs again.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 25, 2026

@copilot thanks for the explanation, i see what you mean now. but for this to be the root cause, there would have to be a cache in the first place. which would mean exact same executables, same path, ...

I re-checked the logs (including 26404485349 / 77724443974).

The cache conditions are likely met within the same job: the same instrumented shell path is reused many times (.../actions/instrument/job/bin/bash ...) and PATH is stable with the same injected .../actions/instrument/job/bin prefix. So this does not depend on cross-run reuse; it can hit during repeated startup in one run.

You’re right that the logs don’t print an explicit opentelemetry_shell_*_instrumentation_cache_* hit/miss line, so this is inferred from execution context rather than a direct cache-debug line.

Copilot AI requested a review from plengauer May 25, 2026 16:22
Copy link
Copy Markdown
Owner

@plengauer plengauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot i think you are on the wrong track. the path that is used for the cache key calculation is the path to the script, not to the shell executable. which means it will be different. compare the log and the code that is run (Its in the log) to the if checks here and figure out if it actually triggers

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 27, 2026

@copilot i think you are on the wrong track. the path that is used for the cache key calculation is the path to the script, not to the shell executable. which means it will be different. compare the l...

You’re right — on this run the cache-key path/hit is not the thing to rely on.

I checked run 26404485349 / job 77724443974 against the executed script: the Copilot step did run with the expected Copilot env (GITHUB_EVENT_NAME=dynamic, COPILOT_AGENT_RUNTIME_VERSION set, GITHUB_COPILOT_ACTION_DOWNLOAD_URL set, job=copilot), so the Copilot instrumentation condition itself is satisfiable.

The more direct drop-point is in the step wrapper: actions/instrument/job/decorate_action.sh loads instrumentation files via grep -v '_otel_alias_prepend ', so a literal _otel_alias_prepend ... registration line is stripped there before execution. That explains why this hook can be skipped even without needing a cache-hit explanation.

Copilot AI requested a review from plengauer May 27, 2026 16:12
Copy link
Copy Markdown
Owner

@plengauer plengauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot again, you are wrong. the decorate action does not instrument directly, it just calls a bash script where normal instrumentation is triggered. the fact that the decorate_action files only load a subset of insturmentation has nothing to do with it

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 27, 2026

@copilot again, you are wrong. the decorate action does not instrument directly, it just calls a bash script where normal instrumentation is triggered. the fact that the decorate_action files only loa...

You’re right. I rechecked the run path and fixed the actual drop point in 7f4f700.

The tar hook was only scanning ${RUNNER_TEMP} immediately after extraction, but at that moment the extracted *-action-* scripts can still be in the current working directory (before being moved). I updated the hook to patch both ./*-action-*/*/*.sh and ${RUNNER_TEMP}/*-action-*/*/*.sh, and added a file-exists guard for unmatched globs. I also reverted the indirect alias call back to direct _otel_alias_prepend tar _otel_inject_copilot.

Copilot AI changed the title Fix Copilot runtime instrumentation on cached startup path Fix Copilot runtime instrumentation during action extraction May 27, 2026
Copilot AI requested a review from plengauer May 27, 2026 20:02
@plengauer plengauer marked this pull request as ready for review May 27, 2026 20:03
@plengauer plengauer requested a review from moflwi as a code owner May 27, 2026 20:03
Copilot AI review requested due to automatic review settings May 27, 2026 20:03
@plengauer plengauer enabled auto-merge (squash) May 27, 2026 20:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Copilot runtime instrumentation by expanding the tar-extraction patching loop to scan both the current working directory and ${RUNNER_TEMP} for extracted action scripts, since Copilot scripts initially appear in ./ before being moved.

Changes:

  • Glob now covers ./*-action-*/*/*.sh in addition to ${RUNNER_TEMP}/*-action-*/*/*.sh.
  • Added [ -f "$script_file" ] || continue guard so unmatched globs don't fail.

@plengauer plengauer disabled auto-merge May 29, 2026 08:06
@plengauer plengauer merged commit 3020614 into main May 29, 2026
576 checks passed
@plengauer plengauer deleted the copilot/fix-instrumentation-of-copilot-cloud-agent branch May 29, 2026 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Instrumentaiton of copilot cloud agent is broken

3 participants