feat: add OpenTelemetry distributed tracing to api-proxy sidecar #3470

Merged
lpcox merged 3 commits into main from copilot/feat-add-opentelemetry-tracing
May 20, 2026
Contributor

Copilot AI commented May 20, 2026

The api-proxy sidecar captures rich token-usage telemetry but none of it participates in the distributed trace that gh-aw establishes per workflow run. This adds OTEL tracing so every proxied LLM request emits a CLIENT span as a child of the parent workflow trace, enabling end-to-end observability from GitHub Actions → api-proxy → LLM provider.

New module: containers/api-proxy/otel.js

  • ProxyAwareOtlpExporter — custom OTLP/HTTP+JSON exporter that routes through HttpsProxyAgent (Squid). The standard OTLPTraceExporter cannot accept a proxy agent, so this serializes spans manually to OTLP proto/JSON.
  • FileSpanExporter — NDJSON fallback to /var/log/api-proxy/otel.jsonl when no endpoint is configured; mirrors the MCPG otel.jsonl pattern.
  • Activation is opt-in: tracing is a no-op until OTEL_EXPORTER_OTLP_ENDPOINT is set.
  • Parent context constructed from GITHUB_AW_OTEL_TRACE_ID + GITHUB_AW_OTEL_PARENT_SPAN_ID (W3C TraceContext).
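The parent-context step in the last bullet can be sketched as follows. This is a minimal illustration, not the code from otel.js: the two environment variable names come from the description above, while `buildParentSpanContext`, `toTraceparent`, and the returned object shape are assumptions chosen to resemble a W3C TraceContext parent.

```javascript
// Sketch only: builds a W3C-style parent span context from two env vars.
// Function names and the return shape are hypothetical.
const TRACE_ID_RE = /^[0-9a-f]{32}$/;
const SPAN_ID_RE = /^[0-9a-f]{16}$/;

function buildParentSpanContext(env) {
  const traceId = String(env.GITHUB_AW_OTEL_TRACE_ID || '').toLowerCase();
  const spanId = String(env.GITHUB_AW_OTEL_PARENT_SPAN_ID || '').toLowerCase();

  // W3C Trace Context: IDs are fixed-width hex and must not be all zeros.
  const validTrace = TRACE_ID_RE.test(traceId) && !/^0+$/.test(traceId);
  const validSpan = SPAN_ID_RE.test(spanId) && !/^0+$/.test(spanId);
  if (!validTrace || !validSpan) return null; // caller starts a new root trace

  return {
    traceId,
    spanId,
    traceFlags: 1, // 0x01 = sampled
    isRemote: true, // the parent span lives in another process
  };
}

// Renders the context in the `traceparent` header format (version 00).
function toTraceparent(ctx) {
  const flags = ctx.traceFlags.toString(16).padStart(2, '0');
  return `00-${ctx.traceId}-${ctx.spanId}-${flags}`;
}
```

Returning `null` for missing or malformed IDs lets the caller fall back to a root span instead of attaching to a bogus parent. Lower-casing the input before validation is a leniency chosen for this sketch; a stricter variant could reject upper-case hex outright.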

Span design (GenAI semantic conventions)

Each proxied request → one api_proxy.{provider}.request CLIENT span:

gen_ai.system, gen_ai.request.model, gen_ai.response.model
gen_ai.usage.input_tokens, gen_ai.usage.output_tokens
awf.cache_read_tokens, awf.cache_write_tokens
http.request.method, http.response.status_code, url.path
awf.request_id, awf.streaming, awf.provider

gen_ai.usage event emitted at span end with full token breakdown.
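As a rough illustration of the span shape listed above, the attribute set could be assembled like this. The attribute keys and the span name pattern are taken from the description; `buildSpanAttributes`, `spanName`, the field names on `info`, and the choice to reuse the provider for `gen_ai.system` are assumptions made for the sketch.

```javascript
// Sketch only: attribute keys come from the PR description; the function
// names and the shape of `info` are hypothetical.
function buildSpanAttributes(info) {
  const attrs = {
    'gen_ai.system': info.provider, // assumption: same value as awf.provider
    'gen_ai.request.model': info.requestModel,
    'gen_ai.response.model': info.responseModel,
    'gen_ai.usage.input_tokens': info.inputTokens,
    'gen_ai.usage.output_tokens': info.outputTokens,
    'awf.cache_read_tokens': info.cacheReadTokens,
    'awf.cache_write_tokens': info.cacheWriteTokens,
    'http.request.method': info.method,
    'http.response.status_code': info.statusCode,
    'url.path': info.path,
    'awf.request_id': info.requestId,
    'awf.streaming': info.streaming,
    'awf.provider': info.provider,
  };
  // Drop unknown values so the span never carries undefined or null attributes.
  return Object.fromEntries(
    Object.entries(attrs).filter(([, v]) => v !== undefined && v !== null)
  );
}

// One span per proxied request, named after the provider.
function spanName(provider) {
  return `api_proxy.${provider}.request`;
}
```

Filtering out missing values keeps partially known requests (for example, a failed call with no token usage) from emitting empty attributes.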

Integration points

  • proxy-request.js — starts/ends span around the proxy lifecycle; wires onUsage and onSpanEnd callbacks.
  • token-tracker-http.js — new onSpanEnd callback fires at the absolute end of finalizeTracking(). This is necessary because for gzip-compressed responses the decompressor's end event fires after proxyRes.on('end'), so naïvely ending the span in the response end handler would close it before token attributes are set.
  • server.js — calls otel.shutdown() in graceful shutdown to flush the batch processor.
  • src/services/api-proxy-service.ts — forwards OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, OTEL_SERVICE_NAME (default awf-api-proxy), GITHUB_AW_OTEL_TRACE_ID, and GITHUB_AW_OTEL_PARENT_SPAN_ID into the container.
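The ordering constraint described for token-tracker-http.js can be modelled in a few lines. This is a simplified, synchronous sketch rather than the real tracker: `createTracker`, `createRecordingSpan`, and the `usage` field names are hypothetical, and only the `onUsage` / `onSpanEnd` callback names come from the description.

```javascript
// Sketch only: a simplified model of the callback ordering described above.
function createRecordingSpan() {
  const span = { attributes: {}, ended: false, late: [] };
  span.setAttribute = (key, value) => {
    if (span.ended) span.late.push(key); // a real span would drop this write
    else span.attributes[key] = value;
  };
  span.end = () => { span.ended = true; };
  return span;
}

function createTracker({ onUsage, onSpanEnd }) {
  let finalized = false;
  return {
    // Called once the (possibly decompressed) body has been fully read.
    finalizeTracking(usage) {
      if (finalized) return; // guard against double finalization
      finalized = true;
      try {
        if (usage && onUsage) onUsage(usage);
      } finally {
        // Last step: only now is it safe to close the span.
        if (onSpanEnd) onSpanEnd();
      }
    },
  };
}

const span = createRecordingSpan();
const tracker = createTracker({
  onUsage: (u) => {
    span.setAttribute('gen_ai.usage.input_tokens', u.inputTokens);
    span.setAttribute('gen_ai.usage.output_tokens', u.outputTokens);
  },
  onSpanEnd: () => span.end(),
});
tracker.finalizeTracking({ inputTokens: 12, outputTokens: 34 });
```

Because the span is closed from inside finalization rather than from the response `end` handler, attribute writes always land before `span.end()`, whichever stream finishes last.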

Docs

  • docs/api-proxy-sidecar.md — new "OpenTelemetry distributed tracing" section: activation modes, env vars, full span/attribute/event tables, Squid allowlist note, gh-aw workflow integration.
  • docs/awf-config-spec.md — normative §9.2 item 6: implementations MUST forward the five OTEL variables into the api-proxy sidecar container.

Tests

29 new unit tests in otel.test.js cover header parsing, span attributes/events/status, parent trace propagation (valid and invalid hex), OTLP serialization shape, ProxyAwareOtlpExporter URL normalization, FileSpanExporter error resilience, and graceful shutdown.

Note on _provider.getTracer() vs trace.getTracer(): The module uses _provider.getTracer() directly rather than the global trace API. The OTEL global registry rejects duplicate provider registration (second load silently returns false), which would cause trace.getTracer() to return a tracer bound to a stale provider. Bypassing the global API makes the tracer reliably bound to _provider regardless of registration state.

Copilot AI changed the title [WIP] Add OpenTelemetry tracing support to api-proxy sidecar feat: add OpenTelemetry distributed tracing to api-proxy sidecar May 20, 2026
Copilot AI requested a review from lpcox May 20, 2026 22:22
Copilot finished work on behalf of lpcox May 20, 2026 22:22
@lpcox lpcox marked this pull request as ready for review May 20, 2026 22:24
Copilot AI review requested due to automatic review settings May 20, 2026 22:24
Contributor

github-actions Bot commented May 20, 2026

Documentation Preview

Documentation build failed for this PR. View logs.

Built from commit 077f433

@github-actions
Contributor

⚠️ Coverage Regression Detected

This PR decreases test coverage. Please add tests to maintain coverage levels.

Overall Coverage

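| Metric | Base | PR | Delta |
|--------|------|----|-------|
| Lines | 95.73% | 95.80% | 📈 +0.07% |
| Statements | 95.56% | 95.63% | 📈 +0.07% |
| Functions | 96.86% | 96.86% | ➡️ +0.00% |
| Branches | 89.42% | 89.34% | 📉 -0.08% |

📁 Per-file Coverage Changes (1 files)

| File | Lines (Before → After) | Statements (Before → After) |
|------|------------------------|-----------------------------|
| src/config-writer.ts | 83.0% → 85.6% (+2.54%) | 83.0% → 85.6% (+2.54%) |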

Coverage comparison generated by scripts/ci/compare-coverage.ts

Contributor

Copilot AI left a comment

Pull request overview

Adds OpenTelemetry distributed tracing to the containers/api-proxy sidecar so proxied LLM requests can be correlated as child spans of the parent workflow trace (GitHub Actions → api-proxy → provider).

Changes:

  • Introduces a new containers/api-proxy/otel.js module (custom OTLP/HTTP JSON export + file fallback) and accompanying unit tests.
  • Wires span lifecycle into the proxy request path and ensures spans end after token tracking finalization (including gzip decompression).
  • Forwards required OTEL env vars into the api-proxy sidecar and documents the tracing contract in config/docs.
Show a summary per file

| File | Description |
|------|-------------|
| src/services/api-proxy-service.ts | Forwards OTEL env vars (endpoint/headers/service name + parent context) into the api-proxy container. |
| docs/awf-config-spec.md | Specifies the OTEL env vars that implementations must forward to the api-proxy sidecar. |
| docs/api-proxy-sidecar.md | Documents activation, env vars, and span attributes/events for api-proxy tracing. |
| containers/api-proxy/token-tracker-http.js | Adds onSpanEnd callback to end spans only after token usage extraction completes. |
| containers/api-proxy/server.js | Flushes OTEL provider on graceful shutdown. |
| containers/api-proxy/proxy-request.js | Starts/ends OTEL spans around proxied requests and attaches token usage attributes. |
| containers/api-proxy/package.json | Adds OpenTelemetry SDK dependencies for tracing support. |
| containers/api-proxy/package-lock.json | Locks new OpenTelemetry dependency graph. |
| containers/api-proxy/otel.js | New tracing module: span creation, OTLP/JSON serialization, proxy-aware exporter, file exporter. |
| containers/api-proxy/otel.test.js | New unit tests covering header parsing, parent context, attributes/events/status, exporter behaviors. |

Copilot's findings

Files not reviewed (1)
  • containers/api-proxy/package-lock.json: Language not supported
  • Files reviewed: 9/10 changed files
  • Comments generated: 10

Comment thread docs/api-proxy-sidecar.md Outdated
Comment on lines +929 to +936
Tracing is **opt-in** and activated solely by the presence of `OTEL_EXPORTER_OTLP_ENDPOINT`.
When the variable is absent the api-proxy still records spans to a local NDJSON fallback file
(`/var/log/api-proxy/otel.jsonl`) with zero network overhead.

| Mode | When | Behaviour |
|------|------|-----------|
| **OTLP/HTTP export** | `OTEL_EXPORTER_OTLP_ENDPOINT` is set | Spans exported via HTTP POST routed through the Squid proxy (so the domain whitelist is respected). |
| **File fallback** | Endpoint not set | Spans appended as NDJSON to `/var/log/api-proxy/otel.jsonl`. |
Comment thread docs/api-proxy-sidecar.md Outdated
| Variable | Required | Description |
|----------|----------|-------------|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | No | OTLP/HTTP collector URL (e.g. `https://otel.example.com:4318`). Activates network export. Must be in the Squid allowlist. |
| `OTEL_EXPORTER_OTLP_HEADERS` | No | Comma-separated `key=value` auth headers (e.g. `Authorization=****** |
Comment thread docs/api-proxy-sidecar.md Outdated
Comment on lines +966 to +967
| `gen_ai.request.model` | Model from request body |
| `gen_ai.response.model` | Model from response |
Comment on lines +216 to +222
// Start OTEL span (no-op when OTEL is not configured).
const span = otel.startRequestSpan({
  provider,
  method: req.method,
  path: sanitizeForLog(req.url),
  requestId,
});
Comment thread containers/api-proxy/otel.js Outdated
Comment on lines +16 to +19
* - Otherwise: writes span NDJSON to /var/log/api-proxy/otel.jsonl as a
* local fallback (mirrors the MCPG /tmp/gh-aw/otel.jsonl pattern).
* - Completely no-op (zero overhead) when OTEL_EXPORTER_OTLP_ENDPOINT is
* unset AND file logging is unavailable.
Comment on lines +192 to +199
constructor({ url, headers, httpsProxy, resource }) {
  const normalised = url.endsWith('/v1/traces')
    ? url
    : `${url.replace(/\/$/, '')}/v1/traces`;
  this._parsedUrl = new URL(normalised);
  this._headers = headers || {};
  this._agent = httpsProxy ? new HttpsProxyAgent(httpsProxy) : undefined;
  this._resource = resource;
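The string-suffix check in the excerpt above does not recognise an endpoint that ends in `/v1/traces/` or carries a query string. One way to make the normalisation path-aware is sketched below; `normaliseTracesUrl` is a hypothetical helper for illustration, not the fix that landed in the PR.

```javascript
// Sketch only: normaliseTracesUrl is a hypothetical helper, not the PR's code.
function normaliseTracesUrl(endpoint) {
  const url = new URL(endpoint); // throws on malformed input
  // Strip trailing slashes from the path before checking the suffix.
  const path = url.pathname.replace(/\/+$/, '');
  url.pathname = path.endsWith('/v1/traces') ? path : `${path}/v1/traces`;
  return url.toString();
}

console.log(normaliseTracesUrl('https://otel.example.com:4318'));
// → https://otel.example.com:4318/v1/traces
console.log(normaliseTracesUrl('https://otel.example.com:4318/v1/traces/'));
// → https://otel.example.com:4318/v1/traces
console.log(normaliseTracesUrl('https://otel.example.com/otlp?token=abc'));
// → https://otel.example.com/otlp/v1/traces?token=abc
```

Working on `pathname` rather than the raw string keeps any query string intact and makes the trailing-slash case idempotent.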
Comment on lines +225 to +235
const reqOptions = {
  hostname: this._parsedUrl.hostname,
  port,
  path: this._parsedUrl.pathname,
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': bodyBuf.length,
    ...this._headers,
  },
};
Comment thread containers/api-proxy/otel.js Outdated
Comment on lines +238 to +247
const req = Transport.request(reqOptions, (res) => {
  res.on('data', () => {});
  res.on('end', () => {
    const ok = res.statusCode >= 200 && res.statusCode < 300;
    resultCallback({ code: ok ? 0 : 1 });
  });
});
req.on('error', (err) => { resultCallback({ code: 1, error: err }); });
req.write(bodyBuf);
req.end();
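The excerpt above has no timeout and no guard against `request()` throwing synchronously. A hedged sketch of how both could be added follows; `sendSpans`, its parameters, and the 10-second default are assumptions for illustration, and the actual change made in the follow-up commit may differ.

```javascript
// Sketch only: sendSpans and its parameters are hypothetical. `transport` is
// anything with a request(options, cb) method, such as Node's http or https.
function sendSpans(transport, reqOptions, bodyBuf, resultCallback, timeoutMs = 10000) {
  let done = false;
  const finish = (result) => {
    if (done) return; // report exactly once
    done = true;
    resultCallback(result);
  };

  try {
    const req = transport.request(reqOptions, (res) => {
      res.on('data', () => {}); // drain the body so 'end' fires
      res.on('end', () => {
        const ok = res.statusCode >= 200 && res.statusCode < 300;
        finish({ code: ok ? 0 : 1 });
      });
    });
    req.on('error', (err) => finish({ code: 1, error: err }));
    // Abort a hung collector instead of stalling the export queue.
    req.setTimeout(timeoutMs, () => {
      req.destroy(new Error(`OTLP export timed out after ${timeoutMs} ms`));
    });
    req.write(bodyBuf);
    req.end();
  } catch (err) {
    // request() can throw synchronously, e.g. on invalid header values.
    finish({ code: 1, error: err });
  }
}
```

The `finish` wrapper matters because a destroyed request emits `error`, and without the once-only guard the result callback could fire twice for a single export.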
Comment on lines 10 to 16
"dependencies": {
  "@opentelemetry/api": "^1.9.1",
  "@opentelemetry/exporter-trace-otlp-http": "^0.57.2",
  "@opentelemetry/resources": "^1.30.1",
  "@opentelemetry/sdk-trace-node": "^1.30.1",
  "@opentelemetry/semantic-conventions": "^1.41.1",
  "https-proxy-agent": "^7.0.6"
Comment thread containers/api-proxy/otel.test.js Outdated
Comment on lines +110 to +111
expect(otel._parseOtlpHeaders('Authorization=******')).toEqual({
  Authorization: '******',
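The test excerpt above exercises parsing of `OTEL_EXPORTER_OTLP_HEADERS`, which the docs describe as comma-separated `key=value` pairs. A minimal sketch of such a parser follows; `parseOtlpHeaders` is a hypothetical stand-in for the module's private `_parseOtlpHeaders`, whose exact behaviour is not shown in this thread.

```javascript
// Sketch only: parseOtlpHeaders is a hypothetical stand-in, not the PR's code.
function parseOtlpHeaders(raw) {
  const headers = {};
  if (!raw) return headers;
  for (const pair of raw.split(',')) {
    const eq = pair.indexOf('=');
    if (eq <= 0) continue; // skip entries with no key or no '='
    const key = pair.slice(0, eq).trim();
    let value = pair.slice(eq + 1).trim();
    try {
      value = decodeURIComponent(value); // values may be percent-encoded
    } catch {
      // keep the raw value if it is not valid percent-encoding
    }
    if (key) headers[key] = value;
  }
  return headers;
}
```

Splitting on the first `=` only keeps values that themselves contain `=` (such as base64 padding) intact. A value containing a literal comma would need to be percent-encoded as `%2C` for this approach to work.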
@lpcox
Collaborator

lpcox commented May 20, 2026

@copilot address the review feedback

Contributor Author

Copilot AI commented May 20, 2026

@copilot address the review feedback

Addressed in c38b7d3.

Updated the OTEL exporter normalization/request handling, added timeout + sync-error protection, ensured url.path uses pathname-only in proxy-request, fixed and expanded OTEL tests, removed the unused @opentelemetry/exporter-trace-otlp-http dependency, and corrected the docs inconsistencies/table formatting.

Copilot finished work on behalf of lpcox May 20, 2026 22:41
@github-actions
Contributor

Smoke Test Results

GitHub API: 2 PR entries confirmed in recent-prs.json
Playwright: GitHub.com title contains "GitHub"
File verify: smoke-test-claude-26194143139.txt exists

Overall: PASS

💥 [THE END] — Illustrated by Smoke Claude

@github-actions
Contributor

🧪 Smoke Test: Copilot BYOK (Offline) Mode

Test Result
1. GitHub MCP (list PRs)
2. GitHub.com connectivity ⚠️ pre-step data unavailable (unexpanded template)
3. File write/read ⚠️ pre-step data unavailable (unexpanded template)
4. BYOK inference (this response)

Running in BYOK offline mode (COPILOT_OFFLINE=true) via api-proxy → api.githubcopilot.com

Author: @Copilot | Assignees: @lpcox, @Copilot

Overall: PASS (core BYOK inference path confirmed working)

🔑 BYOK report filed by Smoke Copilot BYOK

@github-actions
Contributor

🔬 Smoke Test Results

| Test | Result |
|------|--------|
| GitHub MCP connectivity | ✅ PR listed successfully |
| GitHub.com HTTP connectivity | ⚠️ Pre-step vars not expanded (template unresolved) |
| File write/read | ⚠️ Pre-step vars not expanded (template unresolved) |

PR: feat: add OpenTelemetry distributed tracing to api-proxy sidecar
Author: @Copilot | Assignees: @lpcox, @Copilot

Overall: PARTIAL — MCP connectivity confirmed; pre-computed test data was not injected (workflow template variables unexpanded).

📰 BREAKING: Report filed by Smoke Copilot

@github-actions
Contributor

Codex Smoke Test

✅ Merged PRs: [Test Coverage] Add test coverage for build-config and predownload commands; Align log discovery with canonical Squid container constant
❌ Safe Inputs GH CLI: safeinputs-gh unavailable; queried PR titles via gh: feat: add OpenTelemetry distributed tracing to api-proxy sidecar; Optimize Security Guard workflow cost profile (Sonnet 4.5, 3-turn cap, early noop gate)
✅ Playwright, file write/read, discussion comment, build
❌ Tavily Web Search: no callable tools exposed
Overall status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

@github-actions
Contributor

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color 1/1 passed ✅ PASS
Go env 1/1 passed ✅ PASS
Go uuid 1/1 passed ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx passed ✅ PASS
Node.js execa passed ✅ PASS
Node.js p-limit passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — PASS 🎉

Generated by Build Test Suite for issue #3470 · ● 5M ·

@github-actions
Contributor

Smoke test complete.

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

@github-actions
Contributor

Chroot Version Comparison Results

| Runtime | Host Version | Chroot Version | Match? |
|---------|--------------|----------------|--------|
| Python | Python 3.12.13 | Python 3.12.3 | ❌ |
| Node.js | v24.15.0 | v22.22.3 | ❌ |
| Go | go1.22.12 | go1.22.12 | ✅ |

Result: ⚠️ 1/3 tests passed — Python and Node.js versions differ between host and chroot.

Tested by Smoke Chroot

@github-actions
Contributor

Smoke Test Results

  • Redis PING: ❌ (timeout — no response on host.docker.internal:6379)
  • PostgreSQL pg_isready: ❌ (no response on host.docker.internal:5432)
  • PostgreSQL SELECT 1: ❌ (timeout)

host.docker.internal resolves to 172.17.0.1 but both ports are unreachable (connection timeout). The service containers do not appear to be running or are not accessible from this environment.

Overall: FAIL

🔌 Service connectivity validated by Smoke Services

@lpcox lpcox merged commit 78380ff into main May 20, 2026
66 of 70 checks passed
@lpcox lpcox deleted the copilot/feat-add-opentelemetry-tracing branch May 20, 2026 23:02
Development

Successfully merging this pull request may close these issues.

feat: Add OpenTelemetry tracing support to api-proxy sidecar

3 participants