Skip to content

Add exception.type to OTel exception span events; sanitize event attributes#25972

Merged
pelikhan merged 3 commits intomainfrom
copilot/otel-advisor-add-exception-type
Apr 13, 2026
Merged

Add exception.type to OTel exception span events; sanitize event attributes#25972
pelikhan merged 3 commits intomainfrom
copilot/otel-advisor-add-exception-type

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 13, 2026

Exception span events in sendJobConclusionSpan only emit exception.messageexception.type is never set, violating OTel semantic conventions and preventing backends (Tempo, Honeycomb, Datadog) from classifying/grouping errors. Separately, sanitizeOTLPPayload redacts span.attributes but skips span.events[].attributes, so exception messages bypass sensitive-key redaction.

Changes

  • Add exception.type extraction — parse colon-prefixed error messages (push_to_pull_request_branch:Cannot push... → type gh-aw.push_to_pull_request_branch, message Cannot push...). Falls back to gh-aw.AgentError when no valid identifier prefix exists (must match ^[a-z_][a-z0-9_.]*$, length < 64). Mixed-case prefixes are accepted but normalized to lowercase for consistent grouping (e.g., Push_To_PR:...gh-aw.push_to_pr).

  • Sanitize event attributes — extend sanitizeOTLPPayload to apply sanitizeAttrs() to span.events[].attributes, closing the redaction gap.

// Before: anonymous exceptions, no redaction on events
attributes: [buildAttr("exception.message", msg.slice(0, MAX_ATTR_VALUE_LENGTH))]

// After: classified exceptions, events sanitized
attributes: [
  buildAttr("exception.type", exceptionType),
  buildAttr("exception.message", exceptionMessage),
]

8 new tests covering type extraction, lowercase normalization, fallback cases, and event attribute sanitization. All 159 tests pass.

…vent attributes

- Extract colon-prefixed type from error messages (e.g., "push_to_pull_request_branch:...")
  as "gh-aw.push_to_pull_request_branch"; fall back to "gh-aw.AgentError"
- Strip the type prefix from exception.message when extracted
- Extend sanitizeOTLPPayload to sanitize span.events[].attributes (redact
  sensitive keys, truncate long values) — closes the gap where event attributes
  bypassed redaction
- Add 7 new tests covering type extraction, fallback, invalid prefixes, and
  event attribute sanitization

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/71aeea08-33f4-412c-8716-0866e880672e

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Add exception.type to exception span events for error classification Add exception.type to OTel exception span events; sanitize event attributes Apr 13, 2026
Copilot AI requested a review from pelikhan April 13, 2026 02:29
@pelikhan pelikhan marked this pull request as ready for review April 13, 2026 02:34
Copilot AI review requested due to automatic review settings April 13, 2026 02:34
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates OTLP exception span events emitted by sendJobConclusionSpan to include exception.type (per OTel semantic conventions) and closes a redaction gap by sanitizing span.events[].attributes in sanitizeOTLPPayload.

Changes:

  • Emit exception.type alongside exception.message for exception events, with type extraction from colon-prefixed error messages and a fallback type.
  • Extend sanitizeOTLPPayload to apply attribute redaction/truncation to span.events[].attributes.
  • Add/extend tests covering event-attribute sanitization and exception.type extraction/fallback behavior.
Show a summary per file
File Description
actions/setup/js/send_otlp_span.cjs Adds event-attribute sanitization and emits exception.type for exception span events.
actions/setup/js/send_otlp_span.test.cjs Adds tests for event attribute sanitization and exception type extraction/fallback cases.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 2/2 changed files
  • Comments generated: 1

Comment on lines +785 to +788
// Extract colon-prefixed type when available ("push_to_pull_request_branch:...")
const colonIdx = msg.indexOf(":");
const exceptionType = colonIdx > 0 && colonIdx < 64 && /^[a-z_][a-z0-9_.]*$/i.test(msg.slice(0, colonIdx)) ? `gh-aw.${msg.slice(0, colonIdx)}` : "gh-aw.AgentError";
const exceptionMessage = (colonIdx > 0 && exceptionType !== "gh-aw.AgentError" ? msg.slice(colonIdx + 1).trim() : msg).slice(0, MAX_ATTR_VALUE_LENGTH);
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prefix validation regex is case-insensitive (/i), but the PR description/spec says the identifier must match ^[a-z_][a-z0-9_.]*$ (lowercase only). As-is, messages like Push_To_PR:... would be treated as a valid type and emitted as gh-aw.Push_To_PR, which can lead to inconsistent exception.type values and diverges from the documented behavior. Consider removing the i flag (or normalizing the extracted prefix to lowercase before validation/emission) and adding a test to cover uppercase prefixes (either expected rejection or normalization).

Copilot uses AI. Check for mistakes.
@github-actions github-actions bot mentioned this pull request Apr 13, 2026
@github-actions
Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 90/100

Excellent test quality

Metric Value
New/modified tests analyzed 7 (new functions) + 3 modified
✅ Design tests (behavioral contracts) 7 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 7 (100%)
Duplicate test clusters 0
Test inflation detected ✅ Yes (148 test lines / 24 production lines = 6.17:1)
🚨 Coding-guideline violations 0

Test Classification Details

📋 All 7 new test functions (click to expand)
Test File Classification Issues Detected
redacts sensitive keys in span event attributes send_otlp_span.test.cjs ✅ Design None — asserts both redaction and pass-through. Has assertion messages.
truncates long string values in span event attributes send_otlp_span.test.cjs ✅ Design None — tests the 1024-char truncation boundary.
preserves span events without attributes unchanged send_otlp_span.test.cjs ✅ Design Minor: second expect lacks an assertion message.
extracts exception.type from colon-prefixed error messages send_otlp_span.test.cjs ✅ Design Minor: most expect() calls lack assertion messages.
falls back to gh-aw.AgentError when message has no colon prefix send_otlp_span.test.cjs ✅ Design Minor: expect() calls lack assertion messages.
falls back to gh-aw.AgentError when colon prefix contains invalid characters send_otlp_span.test.cjs ✅ Design Minor: expect() calls lack assertion messages.
falls back to gh-aw.AgentError when colon prefix exceeds 64 characters send_otlp_span.test.cjs ✅ Design Minor: only checks exception.type; does not assert the full original message is preserved (see below).

Modified tests (3 existing tests had exception.type assertions added):

  • "handles agent error events" — two events now also assert exception.type === "gh-aw.AgentError"
  • "caps agent errors at 7 events" — loop now also asserts exception.type for all 7 ✅

Flagged Tests — Minor Issues Only

⚠️ falls back to gh-aw.AgentError when colon prefix exceeds 64 characters

Classification: Design test (passes)
Issue: The test only asserts on exception.type but does not assert that exception.message contains the full original message (\$\{longPrefix}:some error). The production code comment says "Full original message kept when type extraction fails," and test 6 checks this for the invalid-characters case — but test 7 doesn't verify the same invariant for the length-exceeded case.
Suggested improvement: Add an assertion that msgAttr.value.stringValue equals the full original message when the prefix is too long.

const msgAttr = span.events[0].attributes.find(a => a.key === "exception.message");
expect(msgAttr.value.stringValue, "full message preserved when prefix exceeds 64 chars")
  .toBe(`\$\{longPrefix}:some error`);

⚠️ Missing assertion messages (tests 4–7, partially 3)

Classification: Guideline note
Issue: Several expect() calls in the new sendJobConclusionSpan tests don't include a description string (e.g., expect(typeAttr.value.stringValue).toBe("gh-aw.AgentError") rather than expect(typeAttr.value.stringValue, "context").toBe(...)). Tests 1–2 model the correct pattern. Not a blocking issue, but adds diagnostic value when failures occur in CI.

⚠️ Test inflation (6.17:1)

Classification: Metric flag — not a quality concern in context
Explanation: 148 test lines were added vs 24 production lines. This triggers the mechanical inflation flag, but it is expected here: the production code adds a new branch (event-attribute sanitization + type extraction) with several distinct fallback conditions that genuinely require separate test cases. The ratio is high because the new tests cover previously-untested code paths rather than inflating coverage of unchanged code.


Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 0 tests (no Go test files changed)
  • 🟨 JavaScript (*.test.cjs): 7 new tests, 3 modified (vitest)

Score Breakdown

Component Points Notes
Behavioral Coverage (40 pts) 40 All 7 new tests are design tests
Error/Edge Case Coverage (30 pts) 30 All 7 include error paths or boundary conditions
Low Duplication (20 pts) 20 No duplicate clusters detected
Proportional Growth (10 pts) 0 Inflation flag triggered (6.17:1); context-appropriate but mechanically penalized
Total 90

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). All 7 new tests verify observable outputs and behavioral contracts of the new exception.type extraction logic and event-attribute sanitization.


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · ● 454K ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 90/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 7 new tests verify behavioral contracts of the exception.type extraction and event-attribute sanitization features. Minor suggestions posted in the comment above.

@pelikhan
Copy link
Copy Markdown
Collaborator

@copilot review all comments

…ping

Addresses review feedback: the regex accepts mixed-case prefixes but the
extracted type is now lowercased before emission, ensuring consistent
exception.type values (e.g., "Push_To_PR:..." → "gh-aw.push_to_pr").
Adds a test covering uppercase prefix normalization.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d1e40e6f-5461-4f60-a911-7dda914b4b87

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 13, 2026

@copilot review all comments

Addressed the review feedback about the case-insensitive regex in 1ed0c1d — the extracted prefix is now normalized to lowercase before emission (e.g., Push_To_PR:...gh-aw.push_to_pr), ensuring consistent exception.type values for backend grouping. Added a test covering uppercase prefix normalization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[otel-advisor] OTel improvement: add exception.type to exception span events to enable error classification

3 participants