Skip to content

feat(otel): capture true Execute-Agent-CLI start time so gh-aw.agent span excludes pre-agent overhead#31808

Merged
pelikhan merged 7 commits into
mainfrom
copilot/otel-advisor-capture-agent-cli-start-time
May 13, 2026
Merged

feat(otel): capture true Execute-Agent-CLI start time so gh-aw.agent span excludes pre-agent overhead#31808
pelikhan merged 7 commits into
mainfrom
copilot/otel-advisor-capture-agent-cli-start-time

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 12, 2026

✨ Enhancement

What does this improve?

The gh-aw.<job>.agent OTel span started from GITHUB_AW_OTEL_JOB_START_MS (end of actions/setup), silently including workspace audit + CLI proxy startup (often 10–60 s of Docker pull) in the "LLM execution" duration. A DevOps engineer had no way to isolate true LLM time.

Why is this valuable?

After this change, the gh-aw.<job>.agent span starts precisely when the Execute Agent CLI step begins on the runner host — before the AWF container launches but after all pre-agent steps complete.

Implementation approach:

  • New constant AgentCLIStartMsPath = "/tmp/gh-aw/agent_cli_start_ms.txt" (engine_output.go)

  • BuildAWFCommand (awf_helpers.go) — prepends a timestamp write as the very first host command in every generated run: block, before PathSetup, before the AWF container launches:

    printf '%s' "$(date +%s%3N)" > /tmp/gh-aw/agent_cli_start_ms.txt
  • Non-AWF paths — same printf line added as first command in the else-branch of all six engine files: claude_engine.go, codex_engine.go, gemini_engine.go, pi_engine.go, crush_engine.go, copilot_engine_execution.go

  • sendJobConclusionSpan (send_otlp_span.cjs) — replaces the hard-coded agentStartMs = options.startMs with a read of agent_cli_start_ms.txt, falling back to options.startMs when the file is absent or invalid (backward-compatible with older compiled workflows):

    const agentStartMs = (() => {
      try {
        const raw = fs.readFileSync("/tmp/gh-aw/agent_cli_start_ms.txt", "utf8").trim();
        const ms = Number(raw);
        return Number.isFinite(ms) && ms > 0 ? ms : options.startMs;
      } catch {
        return options.startMs;
      }
    })();
  • Golden files updated for TestWasmGolden_CompileFixtures; new TestBuildAWFCommand_WritesAgentCLIStartTimestamp Go test verifies ordering (timestamp → PathSetup → AWF); three new JS tests cover the happy path, absent-file fallback, and invalid-content fallback.

Copilot AI and others added 2 commits May 12, 2026 23:08
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Write epoch-ms timestamp at the very beginning of the Execute Agent
CLI step (on the host, before the AWF container launches) so the
dedicated gh-aw.<job>.agent span excludes pre-agent overhead such as
workspace audit and CLI proxy startup.

Changes:
- Add AgentCLIStartMsPath constant (/tmp/gh-aw/agent_cli_start_ms.txt)
- BuildAWFCommand: prepend printf timestamp write as first host command
- Non-AWF paths (claude/codex/gemini/pi/crush/copilot engines): same
- sendJobConclusionSpan: read agent_cli_start_ms.txt as agentStartMs,
  fall back to options.startMs when absent or invalid
- Tests: 3 new JS tests + 1 new Go test (with 2 sub-tests)
- Update golden files for TestWasmGolden_CompileFixtures

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Improve OTel span to capture true Execute-Agent-CLI start time feat(otel): capture true Execute-Agent-CLI start time so gh-aw.agent span excludes pre-agent overhead May 12, 2026
Copilot AI requested a review from pelikhan May 12, 2026 23:18
@pelikhan pelikhan marked this pull request as ready for review May 12, 2026 23:20
Copilot AI review requested due to automatic review settings May 12, 2026 23:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts OpenTelemetry agent-span timing so gh-aw.<job>.agent starts at the true beginning of the “Execute Agent CLI” step (excluding pre-agent overhead like audits / proxy startup).

Changes:

  • Add AgentCLIStartMsPath and write an epoch-ms timestamp as the first command in generated Execute-Agent run blocks (AWF and non-AWF paths).
  • Update sendJobConclusionSpan to prefer /tmp/gh-aw/agent_cli_start_ms.txt for the dedicated agent span start time, with fallback to options.startMs.
  • Update workflow golden fixtures and add Go + JS tests covering ordering and fallback behaviors.
Show a summary per file
File Description
pkg/workflow/engine_output.go Introduces AgentCLIStartMsPath constant for host-written CLI start timestamp.
pkg/workflow/awf_helpers.go Prepends timestamp write as the first host command in AWF run: blocks.
pkg/workflow/claude_engine.go Adds timestamp write at start of non-AWF execution command.
pkg/workflow/codex_engine.go Adds timestamp write at start of non-AWF execution command (both harness and non-harness branches).
pkg/workflow/gemini_engine.go Adds timestamp write at start of non-AWF execution command.
pkg/workflow/pi_engine.go Adds timestamp write at start of non-AWF execution command.
pkg/workflow/crush_engine.go Adds timestamp write at start of non-AWF execution command.
pkg/workflow/copilot_engine_execution.go Adds timestamp write at start of non-AWF execution command.
actions/setup/js/send_otlp_span.cjs Reads CLI-start timestamp file to set dedicated agent span start time (with fallback).
actions/setup/js/send_otlp_span.test.cjs Adds tests for file-present and fallback behaviors for agent span start time.
pkg/workflow/awf_config_test.go Adds test ensuring timestamp write precedes PathSetup/config/AWF invocation in AWF command.
pkg/workflow/testdata/TestWasmGolden_CompileFixtures/with-imports.golden Updates golden output to include timestamp write line in run block.
pkg/workflow/testdata/TestWasmGolden_CompileFixtures/smoke-copilot.golden Updates golden output to include timestamp write line in run block.
pkg/workflow/testdata/TestWasmGolden_CompileFixtures/playwright-cli-mode.golden Updates golden output to include timestamp write line in run block.
pkg/workflow/testdata/TestWasmGolden_CompileFixtures/basic-copilot.golden Updates golden output to include timestamp write line in run block.
.github/workflows/technical-doc-writer.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/stale-repo-identifier.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/scout.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/safe-output-health.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/prompt-clustering-analysis.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/org-health-report.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/issue-arborist.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/glossary-maintainer.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/discussion-task-miner.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/deep-report.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/daily-safe-output-optimizer.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/daily-news.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/daily-issues-report.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/copilot-opt.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/cloclo.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/cli-version-checker.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/changeset.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/audit-workflows.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/api-consumption-report.lock.yml Regenerated lock workflow output (includes updated compiled content).
.github/workflows/agentic_commands.yml Updates embedded compiler version metadata for agentic commands payload.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 35/35 changed files
  • Comments generated: 1

process.env.GITHUB_AW_OTEL_TRACE_ID = "f".repeat(32);
process.env.GITHUB_AW_OTEL_PARENT_SPAN_ID = "abcdef1234567890";

// The CLI start time is earlier than the job start time passed as options.startMs,
@pelikhan
Copy link
Copy Markdown
Collaborator

@copilot review all comments

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 12, 2026

@copilot review all comments

Fixed in 5b0d218 — updated the comment from "earlier" to "later" to correctly reflect that cliStartMs (1_700_000_030_000) is 30 s after jobStartMs (1_700_000_000_000).

@pelikhan
Copy link
Copy Markdown
Collaborator

@copilot merge main and recompile.

Copilot AI and others added 3 commits May 12, 2026 23:52
…capture-agent-cli-start-time

# Conflicts:
#	.github/workflows/agentic_commands.yml
#	.github/workflows/api-consumption-report.lock.yml
#	.github/workflows/audit-workflows.lock.yml
#	.github/workflows/changeset.lock.yml
#	.github/workflows/cli-version-checker.lock.yml
#	.github/workflows/cloclo.lock.yml
#	.github/workflows/copilot-opt.lock.yml
#	.github/workflows/daily-cache-strategy-analyzer.lock.yml
#	.github/workflows/daily-fact.lock.yml
#	.github/workflows/daily-issues-report.lock.yml
#	.github/workflows/daily-news.lock.yml
#	.github/workflows/daily-observability-report.lock.yml
#	.github/workflows/daily-safe-output-optimizer.lock.yml
#	.github/workflows/deep-report.lock.yml
#	.github/workflows/discussion-task-miner.lock.yml
#	.github/workflows/duplicate-code-detector.lock.yml
#	.github/workflows/glossary-maintainer.lock.yml
#	.github/workflows/grumpy-reviewer.lock.yml
#	.github/workflows/issue-arborist.lock.yml
#	.github/workflows/necromancer.lock.yml
#	.github/workflows/org-health-report.lock.yml
#	.github/workflows/prompt-clustering-analysis.lock.yml
#	.github/workflows/safe-output-health.lock.yml
#	.github/workflows/schema-feature-coverage.lock.yml
#	.github/workflows/scout.lock.yml
#	.github/workflows/smoke-call-workflow.lock.yml
#	.github/workflows/smoke-codex.lock.yml
#	.github/workflows/stale-repo-identifier.lock.yml
#	.github/workflows/technical-doc-writer.lock.yml

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 12, 2026

@copilot merge main and recompile.

Done — merged origin/main, resolved conflicts in the generated lock files, then recompiled all 223 workflows with a clean binary. The updated lock files and agentic_commands.yml are in the latest commits.

@pelikhan pelikhan merged commit 6fcd6e2 into main May 13, 2026
@pelikhan pelikhan deleted the copilot/otel-advisor-capture-agent-cli-start-time branch May 13, 2026 00:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[otel-advisor] OTel improvement: capture true Execute-Agent-CLI start time so gh-aw.(job).agent span excludes pre-agent overhead

3 participants