Report per-skill turns and cost in PR body + dispatch summary #781

Closed
rdimitrov wants to merge 1 commit into `main` from `skill-token-cost-metrics`

@rdimitrov
Member

Context

`claude-code-action@v1` writes a `claude-execution-output.json` per invocation at `/home/runner/work/_temp/claude-execution-output.json` containing `num_turns`, `total_cost_usd`, and `permission_denials_count`. Today that info is only visible if you drill into the Actions Step Summary for the run — reviewers don't see it.

For reference, a recent `upstream-release-docs` run (24767008554) cost $6.04 total:

  • `skill_gen`: 89 turns, $5.77
  • `skill_review`: 4 turns, $0.27

Change

Two new capture steps parse the execution log right after each skill invocation. The ordering is crucial because `skill_review` would otherwise overwrite `skill_gen`'s log. The outputs are exposed as `steps.skill_gen_stats.outputs.turns` and `steps.skill_gen_stats.outputs.cost_usd`, and likewise for `skill_review_stats`.
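
As a rough illustration of what such a capture step's script could look like, here is a minimal sketch. It is not the code from this PR: the function name `capture_stats` is hypothetical, and because the exact layout of `claude-execution-output.json` is not shown here, the `jq` filter simply searches the whole document for the last object carrying each field. A missing file, or a runner without `jq`, falls back to 0.

```shell
# Hypothetical helper (not from the PR): read turns and cost from the
# execution log and append them to a step-output file.
capture_stats() {
  local log_path="$1" out_file="$2"
  local turns=0 cost=0
  if [ -f "$log_path" ] && command -v jq >/dev/null 2>&1; then
    # Assumption: the fields may sit at any depth, so search recursively
    # and take the last object that carries each one.
    turns=$(jq -r '[.. | objects | select(has("num_turns"))] | last | .num_turns // 0' "$log_path") || turns=0
    cost=$(jq -r '[.. | objects | select(has("total_cost_usd"))] | last | .total_cost_usd // 0' "$log_path") || cost=0
  fi
  {
    echo "turns=$turns"
    echo "cost_usd=$cost"
  } >> "$out_file"
}

# Inside a workflow step this would be called as:
#   capture_stats /home/runner/work/_temp/claude-execution-output.json "$GITHUB_OUTPUT"
```

Running the helper immediately after each skill step, before the next invocation starts, is what keeps the two sessions' numbers separate.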

Surfaced in two places:

1. PR body — new Run cost subsection

Inside the upstream-release-docs marker block. Renders per-session rows plus a Total row when both sessions reported. Applies to both `pull_request` and `workflow_dispatch` runs.

| Session | Turns | Cost (USD) |
| --- | ---: | ---: |
| Generation (`skill_gen`) | 89 | $5.7662 |
| Editorial review (`skill_review`) | 4 | $0.2689 |
| Total | 93 | $6.0351 |
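
The Total row is the sum of the two sessions, emitted only when both reported. A minimal sketch of that arithmetic follows. The helper name `sum_stats` is hypothetical, and `awk` handles the floating-point addition because plain shell arithmetic is integer-only.

```shell
# Hypothetical helper: print "<total_turns> <total_cost>" when all four
# values are present, and print nothing otherwise.
sum_stats() {
  local gen_turns="$1" gen_cost="$2" review_turns="$3" review_cost="$4"
  if [ -n "$gen_turns" ] && [ -n "$gen_cost" ] && [ -n "$review_turns" ] && [ -n "$review_cost" ]; then
    awk -v gt="$gen_turns" -v gc="$gen_cost" -v rt="$review_turns" -v rc="$review_cost" \
      'BEGIN { printf "%d %.4f\n", gt + rt, gc + rc }'
  fi
}

sum_stats 89 5.7662 4 0.2689   # prints: 93 6.0351
```

An empty result lets the caller omit the Total row instead of showing a partial sum.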

2. workflow_dispatch summary comment — extended table

The existing step table gains Turns and Cost columns plus an explicit Total row. The Autofix and Skill commits rows keep their existing format, with `–` placeholders so all rows have 4 cells.

Why this matters

  • Cost visibility per PR — no need to click through to Actions.
  • Regression detection — a release that unexpectedly takes 200 turns (vs. the ~90 baseline) is immediately visible.
  • Monthly spend attribution — grep `total_cost_usd` across closed PRs.

Validation will come from the next `workflow_dispatch` run. The currently running e2e-test v3 run won't include this, since it was dispatched before merge, but the following one will.

Adds two capture steps that parse claude-code-action's execution
log (`/home/runner/work/_temp/claude-execution-output.json`) right
after each skill invocation, BEFORE the next one overwrites the
shared file. Exposes `turns`, `cost_usd`, and `permission_denials`
as step outputs for downstream use.

Surfaces the data in two places:

1. PR body: new "Run cost" subsection inside the upstream-release-
   docs marker block. Per-session rows plus a Total row when both
   sessions reported. Applies to both pull_request and
   workflow_dispatch runs.

2. workflow_dispatch summary comment: adds Turns and Cost columns
   to the existing step table, plus a Total row summing both
   sessions.

Useful for tracking per-release spend ($6 baseline) and catching
regressions -- e.g. a release that suddenly takes 10x the turns is
visible at a glance rather than requiring a drill-down into the
Actions Step Summary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 22, 2026 09:34
@vercel

vercel Bot commented Apr 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs-website | Ready | Preview, Comment | Apr 22, 2026 9:34am |

Contributor

Copilot AI left a comment

Pull request overview

Updates the upstream-release-docs GitHub Actions workflow to capture per-skill Claude execution stats (turns and USD cost) and surface them directly in the PR body and the workflow_dispatch summary comment, improving cost visibility and regression detection for reviewers.

Changes:

  • Add post-skill capture steps that parse claude-execution-output.json into step outputs for skill_gen and skill_review.
  • Extend the PR body marker section to include a “Run cost” table (and totals when both sessions report).
  • Extend the workflow_dispatch summary comment table with Turns/Cost columns and a Total row.

# failed run still emits plausible outputs.
- name: Capture skill_gen stats
  id: skill_gen_stats
  if: always() && steps.skill_gen.conclusion == 'success'

Copilot AI Apr 22, 2026

The comment says missing-file defaults to 0 so a failed run still emits plausible outputs, but this step only runs when steps.skill_gen.conclusion == 'success'. If skill_gen fails (or is cancelled), the step won’t run and downstream GEN_TURNS/GEN_COST will be empty, so no stats will be reported. Consider running this capture step on any non-skipped outcome (and relying on the missing-file fallback), or update the comment/behavior to match the intended semantics.

Suggested change
- if: always() && steps.skill_gen.conclusion == 'success'
+ if: always() && steps.skill_gen.conclusion != 'skipped'

Copilot uses AI. Check for mistakes.
# canonical log path, which skill_review overwrote on exit.
- name: Capture skill_review stats
  id: skill_review_stats
  if: always() && steps.skill_review.conclusion == 'success'

Copilot AI Apr 22, 2026

Same issue as skill_gen_stats: this capture step only runs when steps.skill_review.conclusion == 'success', so it won’t emit outputs for failed/cancelled runs even though the script has a missing-file fallback to 0. If the goal is to report stats best-effort regardless of success, consider allowing this step to run for any non-skipped conclusion (or adjust the comments/expectations accordingly).

Suggested change
- if: always() && steps.skill_review.conclusion == 'success'
+ if: always() && steps.skill_review.conclusion != 'skipped'

Comment on lines +1107 to +1113
| Step | Conclusion | Turns | Cost (USD) |
| --- | --- | ---: | ---: |
| Generation (\`skill_gen\`) | \`${GEN_CONCLUSION:-(not run)}\` | ${GEN_TURNS:-–} | ${GEN_COST:+\$$GEN_COST} |
| Editorial review (\`skill_review\`) | \`${REVIEW_CONCLUSION:-(not run)}\` | ${REVIEW_TURNS:-–} | ${REVIEW_COST:+\$$REVIEW_COST} |
| **Total** | | **${TOTAL_TURNS:-–}** | ${TOTAL_COST:+**\$$TOTAL_COST**} |
| Autofix (prettier/eslint) | \`${AUTOFIX_CONCLUSION:-(not run)}\` | – | – |
| Skill commits produced | \`${SKILL_COMMIT_COUNT:-?}\` | – | – |

Copilot AI Apr 22, 2026

In the workflow_dispatch summary table, the Cost cells use ${GEN_COST:+...} / ${REVIEW_COST:+...} / ${TOTAL_COST:+...}. When the variable is unset/empty this expands to an empty string (not the placeholder), so the markdown table will have blank cells and won’t match the stated “placeholders so all rows have 4 cells” behavior. Consider defaulting to an explicit placeholder (and only adding the $ prefix when a value is present).
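
To make the difference concrete, here is a small sketch. The helper name `fmt_cost` is hypothetical and not part of the workflow. `${VAR:+word}` expands to nothing when `VAR` is unset or empty, so one way to get the placeholder is to branch explicitly:

```shell
# Hypothetical helper: print "$<value>" when a cost is present,
# otherwise an en dash placeholder.
fmt_cost() {
  local cost="${1:-}"
  if [ -n "$cost" ]; then
    printf '$%s' "$cost"
  else
    printf '–'
  fi
}

GEN_COST=""
echo "with :+     -> [${GEN_COST:+\$$GEN_COST}]"     # empty cell: []
echo "with helper -> [$(fmt_cost "$GEN_COST")]"      # placeholder: [–]
```

With a helper of this kind the Cost column would always render either a dollar amount or `–`, matching the Turns column.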

Comment on lines +647 to +648
# per-invocation turns/cost in the PR body and the workflow_
# dispatch summary comment. Missing-file defaults to 0 so a

Copilot AI Apr 22, 2026

Typo in the comment: workflow_ reads like an accidental truncation and is confusing in context. Consider changing it to workflow_dispatch to match the actual event name referenced elsewhere in this workflow.

Suggested change
- # per-invocation turns/cost in the PR body and the workflow_
- # dispatch summary comment. Missing-file defaults to 0 so a
+ # per-invocation turns/cost in the PR body and the workflow_dispatch
+ # summary comment. Missing-file defaults to 0 so a

@rdimitrov
Member Author

Folding into #782: the redesign depends on these changes, and keeping them as separate PRs adds overhead without value.

@rdimitrov rdimitrov closed this Apr 22, 2026
@rdimitrov rdimitrov deleted the skill-token-cost-metrics branch April 22, 2026 10:03