fix(otlp): derive gh-aw.run.status and status.code from output errors when conclusion env var is absent#33037

Merged
pelikhan merged 2 commits into main from copilot/grafana-otel-advisor-improve-gh-aw-run-status
May 18, 2026

Copilot AI commented May 18, 2026

gh-aw.run.status and status.code always resolved to "success"/STATUS_CODE_OK on every conclusion span, because GH_AW_AGENT_CONCLUSION is empty in the agent job's own post-step: a job cannot observe its own needs.<job>.result. As a result, every span in Tempo reported success even on genuinely failed runs, which made failure-based alerting and dashboards useless.
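
For context, here is a rough sketch of where these two fields sit on a span in the OTLP/JSON encoding. The numeric status codes (1 for OK, 2 for ERROR) come from the OpenTelemetry trace protocol. The helper name `buildSpanStatusFields` and the example error text are illustrative assumptions, not code from send_otlp_span.cjs.

```javascript
// Status codes as defined by the OpenTelemetry trace protocol.
const STATUS_CODE_OK = 1;
const STATUS_CODE_ERROR = 2;

// Hypothetical helper (not from the PR): builds only the status-related
// portion of an OTLP/JSON span.
function buildSpanStatusFields(runStatus, statusCode, statusMessage) {
  const status = { code: statusCode };
  // A status message is only meaningful alongside an error status.
  if (statusCode === STATUS_CODE_ERROR && statusMessage) {
    status.message = statusMessage;
  }
  return {
    attributes: [{ key: "gh-aw.run.status", value: { stringValue: runStatus } }],
    status,
  };
}

// Example error text is made up for illustration.
const failed = buildSpanStatusFields("failure", STATUS_CODE_ERROR, "errors detected: tool call failed");
console.log(JSON.stringify(failed, null, 2));
```

A backend query that filters on span status or on the `gh-aw.run.status` attribute can only find failed runs if these fields carry the failure, which is what the change below is meant to ensure.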

Changes

  • send_otlp_span.cjs — after reading agent_output.json, if rawRunStatus is empty (both GH_AW_AGENT_CONCLUSION and workflowRunConclusion absent) and outputErrors.length > 0, fall back to observable evidence:

    // Before: runStatus always "success" when env var is absent
    let runStatus = "success";
    const rawRunStatus = agentConclusion || workflowRunConclusion;
    // …only failure when rawRunStatus is explicitly "failure"/"timed_out"
    
    // After: fallback to output errors when no explicit conclusion signal
    if (!rawRunStatus && outputErrors.length > 0) {
      runStatus = "failure";
      statusCode = 2; // STATUS_CODE_ERROR
      statusMessage = `errors detected: ${errorMessages[0]}`.slice(0, 256);
    }

    statusCode changed from const to let to allow the late update.

  • send_otlp_span.test.cjs — four new tests in a "run.status fallback from observable error signals" block: fallback fires correctly; statusMessage uses first error; explicit GH_AW_AGENT_CONCLUSION=success is not overridden; no-error path remains success/OK.
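
The derivation described above can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the code in send_otlp_span.cjs: the function name `deriveRunStatus`, the shape of `outputErrors` entries (objects with a `message` field, or plain strings), and the handling of explicit failure conclusions are all hypothetical. Only the fallback branch mirrors the snippet in the PR description.

```javascript
const STATUS_CODE_OK = 1;
const STATUS_CODE_ERROR = 2;

// Hypothetical standalone version of the status derivation.
function deriveRunStatus({ agentConclusion = "", workflowRunConclusion = "", outputErrors = [] }) {
  let runStatus = "success";
  let statusCode = STATUS_CODE_OK; // `let`, so the fallback below can update it
  let statusMessage = "";

  const rawRunStatus = agentConclusion || workflowRunConclusion;

  // Explicit conclusion signal. Assumption: both values are recorded as
  // "failure"; the PR does not show what the real code stores here.
  if (rawRunStatus === "failure" || rawRunStatus === "timed_out") {
    runStatus = "failure";
    statusCode = STATUS_CODE_ERROR;
  }

  // Fallback from the PR: no explicit signal, but errors were observed.
  if (!rawRunStatus && outputErrors.length > 0) {
    // Assumption: each entry is a string or an object with a `message` field.
    const errorMessages = outputErrors.map(e => String(e?.message ?? e));
    runStatus = "failure";
    statusCode = STATUS_CODE_ERROR;
    statusMessage = `errors detected: ${errorMessages[0]}`.slice(0, 256);
  }

  return { runStatus, statusCode, statusMessage };
}

console.log(deriveRunStatus({ outputErrors: [{ message: "tool call failed" }] }));
console.log(deriveRunStatus({ agentConclusion: "success", outputErrors: ["ignored"] }));
```

Because the fallback is guarded by `!rawRunStatus`, an explicit conclusion always takes precedence over evidence from agent_output.json. The `slice(0, 256)` bounds the whole status message, prefix included.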

…nv var is absent

When GH_AW_AGENT_CONCLUSION and workflowRunConclusion are both empty (e.g.
in the agent job's own post-step where needs.<job>.result is not visible),
fall back to outputErrors from agent_output.json to set gh-aw.run.status,
status.code and statusMessage accurately.

- Change `const statusCode` to `let statusCode` to allow later update
- After the rawRunStatus block, add fallback: if rawRunStatus is empty and
  outputErrors.length > 0, set runStatus="failure", statusCode=2, and
  statusMessage from the first error message
- Add four new tests covering: fallback fires, statusMessage content,
  explicit "success" conclusion not overridden, and no-error path stays OK

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve OTel status reporting from observable failure signals" to "fix(otlp): derive gh-aw.run.status and status.code from output errors when conclusion env var is absent" May 18, 2026
Copilot AI requested a review from pelikhan May 18, 2026 13:22
@pelikhan pelikhan marked this pull request as ready for review May 18, 2026 13:22
Copilot AI review requested due to automatic review settings May 18, 2026 13:22

Copilot AI left a comment

Pull request overview

Fixes a bug where conclusion spans always reported gh-aw.run.status=success and STATUS_CODE_OK even on failed runs, because GH_AW_AGENT_CONCLUSION is empty in the agent job's own post-step (a job can't read its own needs.<job>.result). When no explicit conclusion signal is available, the code now falls back to observable evidence in agent_output.json errors.

Changes:

  • In sendJobConclusionSpan, when both GH_AW_AGENT_CONCLUSION and workflowRunConclusion are absent but outputErrors.length > 0, set runStatus="failure", statusCode=2, and a truncated statusMessage derived from the first error.
  • Promote statusCode from const to let so the fallback branch can update it.
  • Add four unit tests covering: fallback firing, first-error selection for statusMessage, no override when GH_AW_AGENT_CONCLUSION=success, and no-error path staying success/OK.

Summary per file:

| File | Description |
| --- | --- |
| actions/setup/js/send_otlp_span.cjs | Adds error-based fallback for runStatus/statusCode/statusMessage when no conclusion env var is present. |
| actions/setup/js/send_otlp_span.test.cjs | New test block covering the fallback behavior and its boundaries. |
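
The four boundaries covered by the new tests can be written as a small table-driven check. This sketch does not reproduce the repository's test file or its test runner: `resolveStatus` is a compact, hypothetical stand-in for the logic in sendJobConclusionSpan, errors are modelled as plain strings, and the case names are paraphrased from the list above.

```javascript
// Compact, hypothetical stand-in for the fallback logic under test.
function resolveStatus(rawRunStatus, outputErrors) {
  // Assumption: explicit failure conclusions map to "failure" with code 2.
  if (rawRunStatus === "failure" || rawRunStatus === "timed_out") {
    return { runStatus: "failure", statusCode: 2, statusMessage: "" };
  }
  // Fallback: no explicit conclusion, but errors were observed.
  if (!rawRunStatus && outputErrors.length > 0) {
    const statusMessage = `errors detected: ${outputErrors[0]}`.slice(0, 256);
    return { runStatus: "failure", statusCode: 2, statusMessage };
  }
  return { runStatus: "success", statusCode: 1, statusMessage: "" };
}

// One row per boundary; `expected` lists only the fields that case checks.
const cases = [
  { name: "fallback fires", raw: "", errors: ["boom"],
    expected: { runStatus: "failure", statusCode: 2 } },
  { name: "statusMessage uses first error", raw: "", errors: ["first", "second"],
    expected: { statusMessage: "errors detected: first" } },
  { name: "explicit success is not overridden", raw: "success", errors: ["boom"],
    expected: { runStatus: "success", statusCode: 1 } },
  { name: "no-error path stays OK", raw: "", errors: [],
    expected: { runStatus: "success", statusCode: 1 } },
];

const failures = [];
for (const { name, raw, errors, expected } of cases) {
  const actual = resolveStatus(raw, errors);
  for (const [key, value] of Object.entries(expected)) {
    if (actual[key] !== value) {
      failures.push(`${name}: ${key} was ${actual[key]}, expected ${value}`);
    }
  }
}
console.log(failures.length === 0 ? "all cases pass" : failures.join("\n"));
```

Keeping the third case in the table is the important guard: it pins down that error evidence never outranks an explicit conclusion.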

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 0

@github-actions github-actions Bot mentioned this pull request May 18, 2026
@pelikhan pelikhan merged commit 495ee38 into main May 18, 2026
4 checks passed
@pelikhan pelikhan deleted the copilot/grafana-otel-advisor-improve-gh-aw-run-status branch May 18, 2026 13:44

Development

Successfully merging this pull request may close these issues.

[grafana-otel-advisor] OTel improvement: gh-aw.run.status silently reports 'success' on real agent failures

3 participants