Revise blog post draft on GitHub Agentic Workflows#2176
Open
idan wants to merge 1 commit intotoken-efficiency-paperfrom
Open
Revise blog post draft on GitHub Agentic Workflows#2176idan wants to merge 1 commit intotoken-efficiency-paperfrom
idan wants to merge 1 commit intotoken-efficiency-paperfrom
Conversation
Revised the blog post draft to improve clarity and consistency, including formatting changes for workflow names and refining explanations of token efficiency and optimization processes.
idan
commented
Apr 23, 2026
|
|
||
| **The workload is a live repository.** The workflows we optimize are not operating on consistent benchmark data. A workflow that processes a 200-line PR diff one day genuinely uses more tokens than one processing a 5-line fix a few hours later. The difference is correct behavior, not inefficiency. Raw token counts can conflate workload variation with efficiency changes. We try to normalize for this by tracking LLM API call counts alongside token counts; if the number of LLM turns per run stays constant while tokens-per-call falls, that's a genuine efficiency improvement. If both fall together, it could mean less work is being done. | ||
|
|
||
| **Does quality change?** This is the hardest question. A lighter model running a more constrained workflow might produce lower-quality output. We looked at the process-level signals like output tokens per LLM call, turn counts per run, and tool-call completion rates to approximate quality. For our optimized Smoke Copilot workflow all three remained stable across the optimization period even as token consumption fell. The workflow completes in exactly 5 LLM turns every run, before and after the optimizations. Of course, these are process signals, not outcome signals. We cannot directly observe whether the quality of agent output improved, degraded, or stayed flat, because we have no ground-truth labels for what "correct" output looks like. Measuring goodput—tokens per unit of correct work—requires additional instrumentation and thought. |
Contributor
Author
There was a problem hiding this comment.
This seemed like a search/replace typo
idan
commented
Apr 23, 2026
|
|
||
| The tools that we use to optimize our workflows like API-level observability, automated auditing workflows, MCP tool pruning, and CLI substitution are all available today in the Github Agentic Workflows framework. The measurement methodology (workload normalization, effective tokens) is documented in the [Effective Tokens specification](https://github.com/github/gh-aw/blob/main/docs/src/content/docs/reference/effective-tokens-specification.md) and the data and analysis scripts for this study are published on the [`token-efficiency-paper`](https://github.com/github/gh-aw-firewall/tree/token-efficiency-paper) branch. | ||
|
|
||
| The open questions are genuinely hard: measuring goodput requires outcome instrumentation that doesn't yet exist at scale for agentic CI workflows. We're building toward it. In the meantime, the proxy-level observability and the optimizer workflows have already changed how we develop and deploy new agentic automations—we add token monitoring from day one rather than retrofitting it later. |
Contributor
Author
There was a problem hiding this comment.
This also seemed like a typo
Contributor
Smoke Test Results
Overall Status: PARTIAL — 3/4 tests passed; gh CLI unavailable
|
This comment has been minimized.
This comment has been minimized.
Contributor
|
Smoke test report
Warning The following domain was blocked by the firewall during workflow execution:
To allow these domains, add them to the network:
allowed:
- defaults
- "registry.npmjs.org"See Network Configuration for more information.
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Revised the blog post draft to improve clarity and consistency, including formatting changes for workflow names and refining explanations of token efficiency and optimization processes.