Capture per-endpoint OTLP export failure details on conclusion spans#32631
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds per-endpoint OTLP export failure detail capture and surfaces those details on conclusion spans.
Changes:
- Records terminal OTLP export failures with endpoint host, status, and reason in a JSONL file.
- Adds
gh-aw.otlp.export_error_detailsto conclusion spans from persisted failure details. - Updates tests for failure detail recording, JSONL mirror coexistence, and conclusion span attributes.
Show a summary per file
| File | Description |
|---|---|
actions/setup/js/send_otlp_span.cjs |
Implements OTLP export error detail persistence and conclusion span enrichment. |
actions/setup/js/send_otlp_span.test.cjs |
Adds and updates tests for export error details and mirror behavior. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 6
| * @returns {entry is OTLPExportErrorDetail} | ||
| */ | ||
| function isValidOTLPExportErrorDetail(entry) { | ||
| return entry !== null && typeof entry === "object" && !Array.isArray(entry) && typeof entry.host === "string" && entry.host.trim() !== "" && typeof entry.reason === "string" && entry.reason.trim() !== ""; |
| return content | ||
| .split("\n") | ||
| .filter(line => line.trim() !== "") | ||
| .map(line => JSON.parse(line)) |
| recordOTLPExportError({ | ||
| endpoint, | ||
| ...(Number.isInteger(response.status) && response.status > 0 ? { status: response.status } : {}), | ||
| reason: response.statusText || msg, |
| console.warn(`OTLP export error after ${maxRetries + 1} attempts: ${msg}`); | ||
| recordOTLPExportError(); | ||
| recordOTLPExportError({ endpoint, reason: msg }); |
| if (details.length === 0) { | ||
| return ""; | ||
| } | ||
| return details.map(formatOTLPExportErrorDetail).join(" | "); |
| /** | ||
| * Path to the persisted OTLP export failure detail log. | ||
| * @type {string} | ||
| */ | ||
| const OTLP_EXPORT_ERROR_DETAILS_PATH = "/tmp/gh-aw/otlp-export-errors.jsonl"; |
🧪 Test Quality Sentinel ReportTest Quality Score: 92/100✅ Excellent test quality
Test Classification DetailsView test classification table
Scoring Breakdown
Final score: 92/100 Flagged Tests — Requires ReviewNo tests flagged. Both new tests are strong behavioral tests.
Language SupportTests analyzed:
Verdict
📖 Understanding Test ClassificationsDesign Tests (High Value) verify what the system does:
Implementation Tests (Low Value) verify how the system does it:
Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators. References: §25964635944
|
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd (feature addition with tests) and a light /zoom-out pass (observability pipeline changes).
Key Themes
- Test coverage gap on the exception path: The new test exercises the HTTP-error branch of
sendOTLPSpanbut not thecatchbranch (network exception), which also callsrecordOTLPExportErrorwith different arguments. One more test would give full behavioral coverage of both paths. See inline comment. - Silent truncation of aggregate attribute:
formatOTLPExportErrorDetailshas no explicit length cap; the sanitization layer will silently truncate mid-entry when there are many failing endpoints. Adding an explicit.slice(0, MAX_ATTR_VALUE_LENGTH)(or capping entry count) makes the boundary intentional and testable. See inline comment. - Minor normalization inconsistency:
reasonis sliced but not trimmed, whilehostis trimmed. Easily fixed and worth a small test for the whitespace edge case.
Positive Highlights
- ✅ Clean decomposition into small, single-purpose helpers (
getOTLPExportErrorHost,isValidOTLPExportErrorStatus,normalizeOTLPExportErrorDetail,formatOTLPExportErrorDetail) — each is testable in isolation. - ✅ The fix to the existing JSONL-mirror retry test (using
findinstead oftoHaveBeenCalledOnce) is a nice improvement that makes that test correctly scoped to its one concern. - ✅ Graceful degradation: all recording is best-effort (
catch {}) with no change to export behavior. - ✅ The
isValidOTLPExportErrorDetailguard onreadOTLPExportErrorDetailsmeans a corrupted JSONL file can't cause a conclusion-span failure.
Verdict
No blocking issues; the two substantive observations (exception-path test, truncation boundary) are worth addressing but the core implementation is solid.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 5.3M
| expect(appendSpy).toHaveBeenCalledWith("/tmp/gh-aw/otlp-export-errors.jsonl", `${JSON.stringify({ host: "collector.example.com:4318", status: 401, reason: "Unauthorized" })}\n`); | ||
|
|
||
| appendSpy.mockRestore(); | ||
| }); |
There was a problem hiding this comment.
[/tdd] The new test covers the HTTP-error response path but not the network exception path (the catch block in sendOTLPSpan). Both branches now call recordOTLPExportError with slightly different arguments — the catch path omits status. A second test exercising a rejected fetch promise would complete the behavioral specification:
it("records OTLP export failure details on network error", async () => {
const mockFetch = vi.fn().mockRejectedValue(new Error("connect ECONNREFUSED"));
vi.stubGlobal("fetch", mockFetch);
const appendSpy = vi.spyOn(fs, "appendFileSync").mockImplementation(() => {});
await sendOTLPSpan("(collector.example.com/redacted)", {}, { maxRetries: 0, skipJSONL: true });
expect(appendSpy).toHaveBeenCalledWith(
"/tmp/gh-aw/otlp-export-errors.jsonl",
expect.stringContaining('"host":"collector.example.com:4318"')
);
appendSpy.mockRestore();
});Without this, a regression where only one branch records details would go undetected.
| if (details.length === 0) { | ||
| return ""; | ||
| } | ||
| return details.map(formatOTLPExportErrorDetail).join(" | "); |
There was a problem hiding this comment.
[/tdd] formatOTLPExportErrorDetails joins all endpoint strings with " | " but applies no explicit length cap before the value is passed to buildAttr. With enough failing endpoints the string can exceed MAX_ATTR_VALUE_LENGTH (1024), and it will be silently truncated mid-entry by the sanitizeOTLPPayload layer — leaving an operator with partial, potentially unparseable output.
Consider capping the aggregate string explicitly so the truncation is predictable:
function formatOTLPExportErrorDetails() {
const details = readOTLPExportErrorDetails();
if (details.length === 0) return "";
return details
.map(formatOTLPExportErrorDetail)
.join(" | ")
.slice(0, MAX_ATTR_VALUE_LENGTH);
}Alternatively, limit the number of entries read (e.g. details.slice(0, 10)) to keep each entry intact.
| return { | ||
| host: detail.host.trim(), | ||
| ...(isValidOTLPExportErrorStatus(detail.status) ? { status: detail.status } : {}), | ||
| reason: detail.reason.slice(0, MAX_ATTR_VALUE_LENGTH), |
There was a problem hiding this comment.
[/tdd] normalizeOTLPExportErrorDetail trims host (detail.host.trim()) but only slices reason — leaving potential leading/trailing whitespace in the stored value. This is a minor inconsistency but could produce unexpected output if a statusText value happens to contain surrounding spaces.
// suggested
reason: detail.reason.trim().slice(0, MAX_ATTR_VALUE_LENGTH),|
@copilot review all comments |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Addressed the actionable review items in |
|
@copilot review all comments and review comments |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
This reverts commit 839423a. Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
@copilot merge main and recompile |
Done in
Addressed in |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
This reverts commit 21a596d. Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…capture-otlp-export-details Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in |
OTLP export failures were only surfaced as a single counter, which made multi-endpoint failures opaque. When one backend rejected exports, operators could not tell which collector failed or why from the run artifacts or conclusion span.
Persist per-endpoint failure details
Surface details on conclusion spans
gh-aw.otlp.export_errorscount.gh-aw.otlp.export_error_detailsso the conclusion span carries endpoint-specific failure context alongside the aggregate count.Preserve current export behavior
Focused coverage