
Revise blog post draft on GitHub Agentic Workflows#2176

Open
idan wants to merge 1 commit into token-efficiency-paper from idan/optimization-post-copyedits

Conversation

@idan (Contributor) commented Apr 23, 2026

Revised the blog post draft to improve clarity and consistency, including formatting changes for workflow names and refining explanations of token efficiency and optimization processes.

Copilot AI review requested due to automatic review settings April 23, 2026 23:20
@idan idan requested a review from Mossaka as a code owner April 23, 2026 23:20
@idan idan requested review from lpcox and removed request for Mossaka and Copilot April 23, 2026 23:20

**The workload is a live repository.** The workflows we optimize do not operate on fixed benchmark data. A workflow that processes a 200-line PR diff one day genuinely uses more tokens than one processing a 5-line fix a few hours later. The difference is correct behavior, not inefficiency. Raw token counts can therefore conflate workload variation with efficiency changes. We try to normalize for this by tracking LLM API call counts alongside token counts: if the number of LLM turns per run stays constant while tokens per call falls, that's a genuine efficiency improvement; if both fall together, it could mean less work is being done.
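To make the normalization concrete, here is a minimal sketch of the comparison (the record fields and token counts are hypothetical illustrations, not our published analysis scripts):

```python
from statistics import mean

# Hypothetical per-run records: total tokens and LLM API calls per run.
runs_before = [{"tokens": 118_000, "llm_calls": 5}, {"tokens": 97_000, "llm_calls": 5}]
runs_after = [{"tokens": 64_000, "llm_calls": 5}, {"tokens": 58_000, "llm_calls": 5}]

def calls_per_run(runs):
    return mean(r["llm_calls"] for r in runs)

def tokens_per_call(runs):
    return mean(r["tokens"] / r["llm_calls"] for r in runs)

# Stable calls/run with falling tokens/call indicates a genuine efficiency
# gain; if both fall together, the workflow may simply be doing less work.
print(f"calls/run:   {calls_per_run(runs_before):.1f} -> {calls_per_run(runs_after):.1f}")
print(f"tokens/call: {tokens_per_call(runs_before):,.0f} -> {tokens_per_call(runs_after):,.0f}")
```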

**Does quality change?** This is the hardest question. A lighter model running a more constrained workflow might produce lower-quality output. To approximate quality, we looked at process-level signals: output tokens per LLM call, turn counts per run, and tool-call completion rates. For our optimized Smoke Copilot workflow, all three remained stable across the optimization period even as token consumption fell; the workflow completes in exactly 5 LLM turns every run, before and after the optimizations. Of course, these are process signals, not outcome signals. We cannot directly observe whether the quality of agent output improved, degraded, or stayed flat, because we have no ground-truth labels for what "correct" output looks like. Measuring goodput (tokens per unit of correct work) requires additional instrumentation and thought.
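As a rough illustration, the three signals reduce to something like the following sketch (hypothetical record fields; real run logs will differ):

```python
from statistics import mean

# Hypothetical per-run records extracted from workflow logs.
runs = [
    {"llm_calls": 5, "output_tokens": 2100, "tool_calls": 12, "tool_calls_completed": 12},
    {"llm_calls": 5, "output_tokens": 1950, "tool_calls": 11, "tool_calls_completed": 11},
]

def process_signals(runs):
    """Process-level proxies for quality; none of these are outcome labels."""
    return {
        "turns_per_run": mean(r["llm_calls"] for r in runs),
        "output_tokens_per_call": mean(r["output_tokens"] / r["llm_calls"] for r in runs),
        "tool_call_completion_rate": sum(r["tool_calls_completed"] for r in runs)
        / sum(r["tool_calls"] for r in runs),
    }

print(process_signals(runs))
```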
@idan (Contributor, Author):
This seemed like a search/replace typo


The tools we use to optimize our workflows (API-level observability, automated auditing workflows, MCP tool pruning, and CLI substitution) are all available today in the GitHub Agentic Workflows framework. The measurement methodology (workload normalization, effective tokens) is documented in the [Effective Tokens specification](https://github.com/github/gh-aw/blob/main/docs/src/content/docs/reference/effective-tokens-specification.md), and the data and analysis scripts for this study are published on the [`token-efficiency-paper`](https://github.com/github/gh-aw-firewall/tree/token-efficiency-paper) branch.

The open questions are genuinely hard: measuring goodput requires outcome instrumentation that doesn't yet exist at scale for agentic CI workflows. We're building toward it. In the meantime, the proxy-level observability and the optimizer workflows have already changed how we develop and deploy new agentic automations—we add token monitoring from day one rather than retrofitting it later.
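If and when that outcome instrumentation exists, the metric itself is simple. A sketch, where the `correct` label is the hypothetical piece we cannot yet produce at scale:

```python
def goodput(runs):
    """Tokens per unit of correct work. The 'correct' flag must come from
    outcome instrumentation (human review or a downstream check), which is
    exactly what doesn't yet exist at scale for agentic CI workflows."""
    correct_runs = [r for r in runs if r["correct"]]
    if not correct_runs:
        return float("inf")  # every token was spent without producing correct work
    return sum(r["tokens"] for r in runs) / len(correct_runs)
```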
@idan (Contributor, Author):

This also seemed like a typo

@github-actions

Smoke Test Results

  • ❌ GitHub MCP: gh CLI failed (API connectivity limitation)
  • ✅ Playwright: github.com page title verified (contains "GitHub")
  • ✅ File Writing: Test file created and verified
  • ✅ Bash Tool: File read and verified successfully

Overall Status: PARTIAL — 3/4 tests passed; gh CLI unavailable

💥 [THE END] — Illustrated by Smoke Claude


@github-actions

Smoke test report
PR titles:

  • feat: add Gemini engine smoke test workflow
  • chore: upgrade gh-aw to v0.69.3 and recompile workflows

T1 ✅ T2 ❌ T3 ✅ T4 ❌
T5 ✅ T6 ✅ T7 ✅ T8 ✅

Overall status: FAIL

Warning

⚠️ Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the `network.allowed` list in your workflow frontmatter:

```yaml
network:
  allowed:
    - defaults
    - "registry.npmjs.org"
```

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex
