feat: add OpenTelemetry distributed tracing to api-proxy sidecar #3470
|
| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 95.73% | 95.80% | 📈 +0.07% |
| Statements | 95.56% | 95.63% | 📈 +0.07% |
| Functions | 96.86% | 96.86% | ➡️ +0.00% |
| Branches | 89.42% | 89.34% | 📉 -0.08% |
📁 Per-file Coverage Changes (1 files)
| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
| src/config-writer.ts | 83.0% → 85.6% (+2.54%) | 83.0% → 85.6% (+2.54%) |
Coverage comparison generated by scripts/ci/compare-coverage.ts
Pull request overview
Adds OpenTelemetry distributed tracing to the containers/api-proxy sidecar so proxied LLM requests can be correlated as child spans of the parent workflow trace (GitHub Actions → api-proxy → provider).
Changes:
- Introduces a new `containers/api-proxy/otel.js` module (custom OTLP/HTTP JSON export + file fallback) and accompanying unit tests.
- Wires span lifecycle into the proxy request path and ensures spans end after token tracking finalization (including gzip decompression).
- Forwards required OTEL env vars into the api-proxy sidecar and documents the tracing contract in config/docs.
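The ordering concern behind "spans end after token tracking finalization" can be illustrated with a small sketch. Everything here is hypothetical scaffolding (the `makeFakeSpan`, `trackUsage`, and `runOrderingDemo` names, the fake span, and the `gen_ai.usage.input_tokens` attribute are assumptions, not code from the PR); it only shows why the span is closed from the tracker's final callback rather than from the upstream response's `end` handler.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical stand-in for a span: records the order of operations.
function makeFakeSpan(log) {
  return {
    setAttribute: (key, value) => log.push(`attr:${key}=${value}`),
    end: () => log.push('end'),
  };
}

// Hypothetical tracker: usage is only known once the (possibly decompressed)
// body stream has ended, so onUsage and onSpanEnd both fire from that handler.
// Assumes the stream delivers text chunks.
function trackUsage(bodyStream, { onUsage, onSpanEnd }) {
  let body = '';
  bodyStream.on('data', (chunk) => { body += chunk; });
  bodyStream.on('end', () => {
    try {
      const usage = JSON.parse(body).usage;
      if (usage) onUsage(usage);
    } catch {
      // Non-JSON body or a failing onUsage callback: skip token attributes.
    } finally {
      onSpanEnd(); // always the last step, so attributes are set before end()
    }
  });
}

// Simulates a gzip response: the upstream 'end' fires before the
// decompressor has delivered its data and its own 'end'.
function runOrderingDemo() {
  const log = [];
  const span = makeFakeSpan(log);
  const proxyRes = new EventEmitter();
  const decompressed = new EventEmitter();

  proxyRes.on('end', () => log.push('proxyRes:end')); // span is NOT ended here
  trackUsage(decompressed, {
    onUsage: (u) => span.setAttribute('gen_ai.usage.input_tokens', u.input_tokens),
    onSpanEnd: () => span.end(),
  });

  proxyRes.emit('end');
  decompressed.emit('data', '{"usage":{"input_tokens":12}}');
  decompressed.emit('end');
  return log;
}
```

In this sketch the recorded order is the upstream `end`, then the attribute, then the span end; ending the span inside the `proxyRes` handler would have closed it before the attribute was written.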
Show a summary per file
| File | Description |
|---|---|
| src/services/api-proxy-service.ts | Forwards OTEL env vars (endpoint/headers/service name + parent context) into the api-proxy container. |
| docs/awf-config-spec.md | Specifies the OTEL env vars that implementations must forward to the api-proxy sidecar. |
| docs/api-proxy-sidecar.md | Documents activation, env vars, and span attributes/events for api-proxy tracing. |
| containers/api-proxy/token-tracker-http.js | Adds onSpanEnd callback to end spans only after token usage extraction completes. |
| containers/api-proxy/server.js | Flushes OTEL provider on graceful shutdown. |
| containers/api-proxy/proxy-request.js | Starts/ends OTEL spans around proxied requests and attaches token usage attributes. |
| containers/api-proxy/package.json | Adds OpenTelemetry SDK dependencies for tracing support. |
| containers/api-proxy/package-lock.json | Locks new OpenTelemetry dependency graph. |
| containers/api-proxy/otel.js | New tracing module: span creation, OTLP/JSON serialization, proxy-aware exporter, file exporter. |
| containers/api-proxy/otel.test.js | New unit tests covering header parsing, parent context, attributes/events/status, exporter behaviors. |
Copilot's findings
Files not reviewed (1)
- containers/api-proxy/package-lock.json: Language not supported
- Files reviewed: 9/10 changed files
- Comments generated: 10
Tracing is **opt-in** and activated solely by the presence of `OTEL_EXPORTER_OTLP_ENDPOINT`.
When the variable is absent the api-proxy still records spans to a local NDJSON fallback file
(`/var/log/api-proxy/otel.jsonl`) with zero network overhead.

| Mode | When | Behaviour |
|------|------|-----------|
| **OTLP/HTTP export** | `OTEL_EXPORTER_OTLP_ENDPOINT` is set | Spans exported via HTTP POST routed through the Squid proxy (so the domain whitelist is respected). |
| **File fallback** | Endpoint not set | Spans appended as NDJSON to `/var/log/api-proxy/otel.jsonl`. |
| Variable | Required | Description |
|----------|----------|-------------|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | No | OTLP/HTTP collector URL (e.g. `https://otel.example.com:4318`). Activates network export. Must be in the Squid allowlist. |
| `OTEL_EXPORTER_OTLP_HEADERS` | No | Comma-separated `key=value` auth headers (e.g. `Authorization=****** |
| `gen_ai.request.model` | Model from request body |
| `gen_ai.response.model` | Model from response |
// Start OTEL span (no-op when OTEL is not configured).
const span = otel.startRequestSpan({
  provider,
  method: req.method,
  path: sanitizeForLog(req.url),
  requestId,
});
 * - Otherwise: writes span NDJSON to /var/log/api-proxy/otel.jsonl as a
 *   local fallback (mirrors the MCPG /tmp/gh-aw/otel.jsonl pattern).
 * - Completely no-op (zero overhead) when OTEL_EXPORTER_OTLP_ENDPOINT is
 *   unset AND file logging is unavailable.
constructor({ url, headers, httpsProxy, resource }) {
  const normalised = url.endsWith('/v1/traces')
    ? url
    : `${url.replace(/\/$/, '')}/v1/traces`;
  this._parsedUrl = new URL(normalised);
  this._headers = headers || {};
  this._agent = httpsProxy ? new HttpsProxyAgent(httpsProxy) : undefined;
  this._resource = resource;
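One possible way to make this normalisation more tolerant is to parse the endpoint first and work on the pathname alone. The sketch below is an illustration, not the change made in the PR; `normaliseTracesUrl` is a hypothetical name. It treats `/v1/traces/` and `/v1/traces` alike and keeps any query string intact.

```javascript
// Hypothetical helper (not from the PR): derive the OTLP traces URL from a
// configured endpoint. Parsing with URL first keeps the query string separate
// from the path, so only the pathname is rewritten.
function normaliseTracesUrl(endpoint) {
  const parsed = new URL(endpoint); // throws a TypeError on malformed input
  // Drop trailing slashes so "/v1/traces/" and "/v1/traces" are equivalent.
  const basePath = parsed.pathname.replace(/\/+$/, '');
  parsed.pathname = basePath.endsWith('/v1/traces')
    ? basePath
    : `${basePath}/v1/traces`;
  return parsed;
}
```

A caller that wants to preserve the query string would presumably pass `parsed.pathname + parsed.search` as the request path rather than `pathname` alone.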
const reqOptions = {
  hostname: this._parsedUrl.hostname,
  port,
  path: this._parsedUrl.pathname,
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': bodyBuf.length,
    ...this._headers,
  },
};
const req = Transport.request(reqOptions, (res) => {
  res.on('data', () => {});
  res.on('end', () => {
    const ok = res.statusCode >= 200 && res.statusCode < 300;
    resultCallback({ code: ok ? 0 : 1 });
  });
});
req.on('error', (err) => { resultCallback({ code: 1, error: err }); });
req.write(bodyBuf);
req.end();
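The follow-up commit mentions adding a timeout and protection against synchronous errors. A minimal sketch of what such hardening could look like is shown below; it is not the code from that commit. The `once` and `postSpans` helpers and the 10-second default are assumptions, and `transport` stands for `node:http` or `node:https`.

```javascript
// Wrap a callback so only the first invocation gets through. An export can
// fail in several ways (socket error, timeout, bad status), and the result
// callback should fire exactly once.
function once(fn) {
  let called = false;
  return (...args) => {
    if (called) return;
    called = true;
    fn(...args);
  };
}

// Hypothetical helper: POST a body and report { code: 0 } on a 2xx response,
// { code: 1 } otherwise.
function postSpans(transport, reqOptions, bodyBuf, resultCallback, timeoutMs = 10000) {
  const done = once(resultCallback);
  let req;
  try {
    // request() can throw synchronously, for example on an invalid header value.
    req = transport.request(reqOptions, (res) => {
      res.resume(); // drain the body so the socket is released
      res.on('end', () => {
        const ok = res.statusCode >= 200 && res.statusCode < 300;
        done({ code: ok ? 0 : 1 });
      });
      res.on('error', (err) => done({ code: 1, error: err }));
    });
  } catch (err) {
    done({ code: 1, error: err });
    return;
  }
  // The timeout only signals inactivity; destroying the request surfaces it
  // through the 'error' handler below.
  req.setTimeout(timeoutMs, () => {
    req.destroy(new Error(`OTLP export timed out after ${timeoutMs} ms`));
  });
  req.on('error', (err) => done({ code: 1, error: err }));
  req.end(bodyBuf);
}
```

Passing the transport in as a parameter keeps the helper easy to exercise with a stub in unit tests.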
"dependencies": {
  "@opentelemetry/api": "^1.9.1",
  "@opentelemetry/exporter-trace-otlp-http": "^0.57.2",
  "@opentelemetry/resources": "^1.30.1",
  "@opentelemetry/sdk-trace-node": "^1.30.1",
  "@opentelemetry/semantic-conventions": "^1.41.1",
  "https-proxy-agent": "^7.0.6"
expect(otel._parseOtlpHeaders('Authorization=******')).toEqual({
  Authorization: '******',
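For context, parsing a comma-separated `key=value` header string might look roughly like the sketch below. This is a hypothetical `parseOtlpHeaders`, not the PR's `_parseOtlpHeaders` implementation. It splits each pair on the first `=` only, so values that themselves contain `=` (such as base64 padding) survive, and it does not attempt percent-decoding.

```javascript
// Hypothetical sketch, not the PR's _parseOtlpHeaders implementation.
function parseOtlpHeaders(raw) {
  const headers = {};
  if (!raw) return headers;
  for (const pair of raw.split(',')) {
    const eq = pair.indexOf('=');
    if (eq <= 0) continue; // skip entries with no '=' or nothing before it
    const key = pair.slice(0, eq).trim();
    const value = pair.slice(eq + 1).trim(); // everything after the first '='
    if (key) headers[key] = value;
  }
  return headers;
}
```

Malformed entries are skipped rather than rejected, on the assumption that a partially usable header set is preferable to disabling export altogether.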
|
@copilot address the review feedback
Addressed in c38b7d3. Updated the OTEL exporter normalization/request handling, added timeout + sync-error protection, ensured |
Smoke Test Results
✅ GitHub API: 2 PR entries confirmed in recent-prs.json
Overall: PASS ✅
|
🧪 Smoke Test: Copilot BYOK (Offline) Mode
Running in BYOK offline mode (
Author:
Overall: PASS (core BYOK inference path confirmed working)
|
🔬 Smoke Test Results
PR: feat: add OpenTelemetry distributed tracing to api-proxy sidecar
Overall: PARTIAL; MCP connectivity confirmed, but pre-computed test data was not injected (workflow template variables unexpanded).
|
Codex Smoke Test
✅ Merged PRs: [Test Coverage] Add test coverage for build-config and predownload commands; Align log discovery with canonical Squid container constant
Warning: Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "registry.npmjs.org"
See Network Configuration for more information.
|
🏗️ Build Test Suite Results
Overall: 8/8 ecosystems passed — PASS 🎉
|
|
Smoke test complete.
Warning: Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "localhost"
See Network Configuration for more information.
|
Chroot Version Comparison Results
Result:
|
Smoke Test Results
Overall: FAIL
|
The api-proxy sidecar captures rich token-usage telemetry, but none of it participates in the distributed trace that gh-aw establishes per workflow run. This PR adds OTEL tracing so every proxied LLM request emits a CLIENT span as a child of the parent workflow trace, enabling end-to-end observability from GitHub Actions → api-proxy → LLM provider.

New module: `containers/api-proxy/otel.js`
- `ProxyAwareOtlpExporter`: custom OTLP/HTTP+JSON exporter that routes through `HttpsProxyAgent` (Squid). The standard `OTLPTraceExporter` cannot accept a proxy agent, so this serializes spans manually to OTLP proto/JSON.
- `FileSpanExporter`: NDJSON fallback to `/var/log/api-proxy/otel.jsonl` when no endpoint is configured; mirrors the MCPG `otel.jsonl` pattern.
- Network export is active only when `OTEL_EXPORTER_OTLP_ENDPOINT` is set.
- Parent context comes from `GITHUB_AW_OTEL_TRACE_ID` + `GITHUB_AW_OTEL_PARENT_SPAN_ID` (W3C TraceContext).

Span design (GenAI semantic conventions)
Each proxied request → one `api_proxy.{provider}.request` CLIENT span. A `gen_ai.usage` event is emitted at span end with the full token breakdown.

Integration points
- `proxy-request.js`: starts/ends the span around the proxy lifecycle; wires the `onUsage` and `onSpanEnd` callbacks.
- `token-tracker-http.js`: new `onSpanEnd` callback fires at the absolute end of `finalizeTracking()`. This is necessary because for gzip-compressed responses the decompressor's `end` event fires after `proxyRes.on('end')`, so naïvely ending the span in the response `end` handler would close it before token attributes are set.
- `server.js`: calls `otel.shutdown()` in graceful shutdown to flush the batch processor.
- `src/services/api-proxy-service.ts`: forwards `OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_EXPORTER_OTLP_HEADERS`, `OTEL_SERVICE_NAME` (default `awf-api-proxy`), `GITHUB_AW_OTEL_TRACE_ID`, and `GITHUB_AW_OTEL_PARENT_SPAN_ID` into the container.

Docs
- `docs/api-proxy-sidecar.md`: new "OpenTelemetry distributed tracing" section covering activation modes, env vars, full span/attribute/event tables, a Squid allowlist note, and gh-aw workflow integration.
- `docs/awf-config-spec.md`: normative §9.2 item 6 states that implementations MUST forward the five OTEL variables into the api-proxy sidecar container.

Tests
29 new unit tests in `otel.test.js` cover header parsing, span attributes/events/status, parent trace propagation (valid and invalid hex), OTLP serialization shape, `ProxyAwareOtlpExporter` URL normalization, `FileSpanExporter` error resilience, and graceful shutdown.
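As an illustration of the "valid and invalid hex" cases, the sketch below shows one way the parent context could be validated before use. It is an assumption-laden example rather than the PR's code: `parentSpanContextFromEnv` is a hypothetical name, the returned object merely mirrors the shape of an OpenTelemetry span context, and lower-casing the input is a choice made here. The ID lengths and the all-zero rule follow W3C Trace Context.

```javascript
// W3C Trace Context: trace-id is 32 hex characters, parent-id is 16, and an
// all-zero value is invalid for either.
const TRACE_ID_RE = /^[0-9a-f]{32}$/;
const SPAN_ID_RE = /^[0-9a-f]{16}$/;
const ALL_ZERO_RE = /^0+$/;

// Hypothetical helper: returns a span-context-shaped object, or null when the
// environment does not carry a usable parent (the span then starts a new trace).
function parentSpanContextFromEnv(env = process.env) {
  const traceId = (env.GITHUB_AW_OTEL_TRACE_ID || '').toLowerCase();
  const spanId = (env.GITHUB_AW_OTEL_PARENT_SPAN_ID || '').toLowerCase();
  if (!TRACE_ID_RE.test(traceId) || ALL_ZERO_RE.test(traceId)) return null;
  if (!SPAN_ID_RE.test(spanId) || ALL_ZERO_RE.test(spanId)) return null;
  // traceFlags 1 marks the context as sampled; isRemote because it was
  // propagated from another process.
  return { traceId, spanId, traceFlags: 1, isRemote: true };
}
```

Returning `null` instead of throwing keeps a malformed parent ID from breaking request handling; the proxied request would simply be traced without a parent.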