Executive Summary
Analysis of 196 lock files in .github/workflows/ as of 2026-04-20. Total corpus size: 14.7 MB. All files share schema version v3 and container image gh-aw-firewall/agent:0.25.25, indicating a unified, up-to-date fleet. Copilot is the dominant engine (65%), with Claude a strong second (28%). Almost all workflows (~97%) configure create_report_incomplete_issue for error surfacing, and about 92% can be dispatched manually via workflow_dispatch, both hallmarks of a mature, observable pipeline.
| Metric | Value |
| --- | --- |
| Total lock files | 196 |
| Total size | 14.7 MB |
| Average file size | ~77 KB |
| Smallest file | codex-github-remote-mcp-test (30 KB) |
| Largest file | smoke-claude (158 KB) |
| Schema version | v3 (100% of files) |
| Container version | gh-aw-firewall/agent:0.25.25 (100% of files) |
| Analysis date | 2026-04-20 |
File Size Distribution
| Size Range | Count | Percentage |
| --- | --- | --- |
| < 10 KB | 0 | 0.0% |
| 10–50 KB | 6 | 3.1% |
| 50–100 KB | 183 | 93.4% |
| > 100 KB | 7 | 3.6% |
The 50–100 KB range dominates overwhelmingly (93.4%), suggesting a relatively stable, template-driven structure across all workflows.
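A minimal sketch of how such a size distribution could be computed. The bucket edges mirror the table above; the half-open interval handling at the edges, the helper names, and the sample list (the ten sizes quoted in the largest/smallest tables) are assumptions for illustration, not the scripts actually used for this report.

```python
from collections import Counter
from pathlib import Path

# (label, inclusive lower bound in KB, exclusive upper bound in KB).
# How a file of exactly 10, 50 or 100 KB is bucketed is an assumption.
BUCKETS = [
    ("< 10 KB", 0, 10),
    ("10–50 KB", 10, 50),
    ("50–100 KB", 50, 100),
    ("> 100 KB", 100, float("inf")),
]


def lock_file_sizes_kb(root=".github/workflows"):
    """Return the size in KB of every *.lock.yml file directly under root."""
    return [p.stat().st_size / 1024 for p in Path(root).glob("*.lock.yml")]


def bucket_for(size_kb):
    """Map a size in KB to the label of the bucket that contains it."""
    for label, low, high in BUCKETS:
        if low <= size_kb < high:
            return label
    raise ValueError(f"negative size: {size_kb}")


def size_distribution(sizes_kb):
    """Return {label: (count, percentage)} for every bucket, in table order."""
    counts = Counter(bucket_for(size) for size in sizes_kb)
    total = len(sizes_kb) or 1  # avoid dividing by zero on an empty corpus
    return {
        label: (counts[label], round(100 * counts[label] / total, 1))
        for label, _, _ in BUCKETS
    }


# Sample input: only the ten file sizes listed in the report, not all 196.
sample = [158, 126, 124, 108, 107, 30, 30, 30, 31, 36]
distribution = size_distribution(sample)
for label, (count, pct) in distribution.items():
    print(f"{label}: {count} ({pct}%)")
```

Against a real checkout, `size_distribution(lock_file_sizes_kb())` would produce the full table.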
Largest and smallest files
Largest 5:

| File | Size |
| --- | --- |
| smoke-claude | 158 KB |
| smoke-copilot | 126 KB |
| smoke-copilot-arm | 124 KB |
| mcp-inspector | 108 KB |
| issue-monster | 107 KB |

Smallest 5:

| File | Size |
| --- | --- |
| codex-github-remote-mcp-test | 30 KB |
| test-workflow | 30 KB |
| example-permissions-warning | 30 KB |
| firewall | 31 KB |
| ace-editor | 36 KB |
Smoke test workflows tend to be largest (multi-agent, multi-step), while minimal test/example workflows are smallest.
Engine / Agent Distribution
| Engine | Count | Percentage |
| --- | --- | --- |
| copilot | 128 | 65.3% |
| claude | 55 | 28.1% |
| codex | 11 | 5.6% |
| gemini | 1 | 0.5% |
| crush | 1 | 0.5% |
Copilot is the dominant engine. Claude is a substantial second. Codex, Gemini, and Crush are minority/experimental engines.
Trigger Analysis
Most Popular Triggers
| Trigger Type | Count | % of Workflows |
| --- | --- | --- |
| workflow_dispatch | 180 | 91.8% |
| schedule | 133 | 67.9% |
| pull_request | 33 | 16.8% |
| issue_comment | 15 | 7.7% |
| issues | 12 | 6.1% |
| pull_request_review_comment | 6 | 3.1% |
| discussion | 4 | 2.0% |
| discussion_comment | 4 | 2.0% |
| workflow_call | 2 | 1.0% |
| push | 2 | 1.0% |
| workflow_run | 1 | 0.5% |
Most Common Trigger Combinations
| Combination | Count | Percentage |
| --- | --- | --- |
| schedule + workflow_dispatch | 128 | 65.3% |
| pull_request + workflow_dispatch | 22 | 11.2% |
| workflow_dispatch only | 19 | 9.7% |
| issue_comment only | 3 | 1.5% |
| issues only | 2 | 1.0% |
| issue_comment + pull_request_review_comment | 2 | 1.0% |
| Multi-event (4+ triggers) | 2 | 1.0% |
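The combination counts above can be derived by normalizing each workflow's trigger set into a label and tallying the labels. The sketch below assumes the trigger names have already been extracted per workflow; the workflow names in `sample_triggers` and the labelling rules (alphabetical join, "only" for single triggers, a catch-all for four or more) are hypothetical and chosen to match the table's wording.

```python
from collections import Counter


def combination_label(triggers):
    """Turn a collection of trigger names into a stable, readable label."""
    names = sorted(set(triggers))
    if len(names) >= 4:
        return "Multi-event (4+ triggers)"
    if len(names) == 1:
        return f"{names[0]} only"
    return " + ".join(names)


def count_combinations(workflow_triggers):
    """Tally trigger combinations across {workflow_name: triggers}."""
    return Counter(
        combination_label(triggers) for triggers in workflow_triggers.values()
    )


# Hypothetical sample input; these are not workflows from the analyzed fleet.
sample_triggers = {
    "nightly-audit": ["workflow_dispatch", "schedule"],
    "weekly-report": ["schedule", "workflow_dispatch"],
    "pr-check": ["pull_request", "workflow_dispatch"],
    "manual-tool": ["workflow_dispatch"],
    "chat-bot": ["issues", "issue_comment", "discussion", "discussion_comment"],
}

combos = count_combinations(sample_triggers)
for label, count in combos.most_common():
    print(f"{label}: {count}")
```

Sorting the names before joining means that `schedule + workflow_dispatch` and `workflow_dispatch + schedule` are counted as the same combination.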
Schedule frequency patterns
| Frequency Pattern | Count |
| --- | --- |
| Daily (any day) | 82 |
| Weekdays only | 49 |
| Sub-hourly | 1 |
| Monthly | 1 |
The majority of scheduled workflows run daily, with a significant chunk restricted to weekdays only. Only one workflow runs sub-hourly, and one runs monthly.
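One way to sort schedules into these frequency patterns is to inspect the five fields of each cron expression. The rules below are assumptions made for illustration (the report does not state how it classified schedules), and they deliberately handle only the simple forms; anything else falls into an "Other" bucket.

```python
def classify_cron(expression):
    """Classify a 5-field cron expression into a coarse frequency pattern.

    The rules are illustrative assumptions, not the report's actual method.
    """
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    minute, hour, day_of_month, month, day_of_week = fields

    # More than one firing per hour: wildcard, step, or list in the minute field.
    if minute == "*" or minute.startswith("*/") or "," in minute:
        return "Sub-hourly"
    # The remaining patterns assume a single fixed hour of the day.
    if not hour.isdigit():
        return "Other"
    if day_of_month.isdigit() and month == "*" and day_of_week == "*":
        return "Monthly"
    if day_of_month == "*" and month == "*" and day_of_week == "1-5":
        return "Weekdays only"
    if day_of_month == "*" and month == "*" and day_of_week == "*":
        return "Daily (any day)"
    return "Other"


examples = ["0 9 * * *", "30 14 * * 1-5", "*/15 * * * *", "0 6 1 * *", "0 * * * *"]
for cron in examples:
    print(f"{cron!r}: {classify_cron(cron)}")
```

An hourly schedule such as `0 * * * *` lands in "Other" under these rules, since it is neither sub-hourly nor tied to a single time of day.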
Safe Outputs Analysis
Safe Output Types Distribution
| Output Type | Workflows | Percentage |
| --- | --- | --- |
| create_issue | 65 | 33.2% |
| create_discussion | 64 | 32.7% |
| add_comment | 48 | 24.5% |
| create_pull_request | 41 | 20.9% |
| push_repo_memory | 27 | 13.8% |
| upload_asset | 22 | 11.2% |
| add_labels | 18 | 9.2% |
| push_to_pull_request_branch | 8 | 4.1% |
| update_issue | 7 | 3.6% |
| link_sub_issue | 4 | 2.0% |
| update_pull_request | 2 | 1.0% |
| hide_comment | 2 | 1.0% |
| create_report_incomplete_issue | 190 | 96.9% |
create_report_incomplete_issue is effectively universal (190 of 196 workflows), a strong reliability pattern. noop with report-as-issue: true is present in 189 workflows (96.4%), which guards against silent failures in nearly all of the fleet.
Discussion Categories Used
| Category | Count |
| --- | --- |
| audits | 46 |
| announcements | 5 |
| reports | 3 |
| research | 2 |
| dev | 2 |
| artifacts | 2 |
| daily-news | 1 |
| agent-research | 1 |
audits is by far the dominant discussion category (72% of create_discussion workflows), indicating a culture of regular automated auditing.
poem-bot stands out as the most feature-rich workflow, exercising 8 distinct safe output types.
Structural Characteristics
Job Complexity
| Metric | Value |
| --- | --- |
| Average jobs per workflow | 6.1 |
| Total job instances analyzed | 1,188 |
| Average steps per job | 15.5 |
| Maximum steps in a single job | 65 (copilot-token-audit) |
| Most common job count | 6 (85 workflows, 43.4%) |
Job Count Distribution
| Jobs per Workflow | Count | Percentage |
| --- | --- | --- |
| 2 | 4 | 2.0% |
| 4 | 2 | 1.0% |
| 5 | 46 | 23.5% |
| 6 | 85 | 43.4% |
| 7 | 44 | 22.4% |
| 8 | 12 | 6.1% |
| 9 | 2 | 1.0% |
| 10 | 1 | 0.5% |
6-job workflows are the clear plurality. The 5–7 job range covers 89.3% of all workflows.
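The headline figures in this section follow directly from the job count distribution. The short check below recomputes them from the table's counts; only the dictionary and function names are introduced here for illustration.

```python
# Jobs per workflow -> number of workflows, copied from the table above.
JOB_COUNT_DISTRIBUTION = {2: 4, 4: 2, 5: 46, 6: 85, 7: 44, 8: 12, 9: 2, 10: 1}


def summarize_job_counts(distribution):
    """Derive the summary metrics from a {jobs_per_workflow: workflows} map."""
    workflows = sum(distribution.values())
    total_jobs = sum(jobs * count for jobs, count in distribution.items())
    most_common = max(distribution, key=distribution.get)
    in_5_to_7 = sum(count for jobs, count in distribution.items() if 5 <= jobs <= 7)
    return {
        "workflows": workflows,
        "total_jobs": total_jobs,
        "average_jobs": round(total_jobs / workflows, 1),
        "most_common_job_count": most_common,
        "share_5_to_7_pct": round(100 * in_5_to_7 / workflows, 1),
    }


summary = summarize_job_counts(JOB_COUNT_DISTRIBUTION)
print(summary)
```

The result agrees with the reported totals: 196 workflows, 1,188 job instances, an average of 6.1 jobs per workflow, and 89.3% of workflows in the 5–7 job range.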
Workflows with most steps in a single job (top 5)
| Workflow | Max Steps |
| --- | --- |
| copilot-token-audit | 65 |
| daily-news | 63 |
| prompt-clustering-analysis | 56 |
| smoke-copilot-arm | 55 |
| smoke-claude | 55 |
Permission Patterns
Most Commonly Requested Permissions (Job Level)
| Permission | Occurrences | Type |
| --- | --- | --- |
| contents:read | 1,028 | Read |
| issues:write | 398 | Write |
| actions:read | 279 | Read |
| discussions:write | 222 | Write |
| pull-requests:write | 194 | Write |
| pull-requests:read | 178 | Read |
| issues:read | 174 | Read |
| contents:write | 143 | Write |
| copilot-requests:write | 99 | Write |
| security-events:read | 11 | Read |
| security-events:write | 9 | Write |
Permission Distribution
| Category | Count | Percentage |
| --- | --- | --- |
| Workflows with any write permission | 189 | 96.4% |
| Read-only workflows | 7 | 3.6% |
contents:read is the single most common permission, averaging about 5.2 job-level occurrences per workflow (1,028 across 196 workflows), followed by issues:write (398), actions:read (279), and discussions:write (222).
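The two permission tables can be produced from the same input: a per-workflow list of job-level permission maps. The sketch below assumes that shape has already been extracted from the lock files; the workflow names and permission sets in `sample_workflows` are hypothetical.

```python
from collections import Counter


def permission_occurrences(workflows):
    """Count job-level 'scope:level' occurrences across all workflows."""
    counts = Counter()
    for jobs in workflows.values():
        for permissions in jobs:
            for scope, level in permissions.items():
                counts[f"{scope}:{level}"] += 1
    return counts


def is_read_only(jobs):
    """A workflow is read-only when none of its jobs requests a write level."""
    return all(
        level != "write" for permissions in jobs for level in permissions.values()
    )


# Hypothetical input: {workflow_name: [permissions map of each job]}.
sample_workflows = {
    "reader": [{"contents": "read"}, {"contents": "read", "actions": "read"}],
    "reporter": [{"contents": "read"}, {"contents": "read", "issues": "write"}],
}

occurrences = permission_occurrences(sample_workflows)
read_only = sorted(name for name, jobs in sample_workflows.items() if is_read_only(jobs))
print(occurrences.most_common())
print("read-only workflows:", read_only)
```

Counting per job rather than per workflow is what lets a single permission such as contents:read occur more often than there are workflows.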
Tool & MCP Patterns
MCP Infrastructure
| Feature | Workflows | Percentage |
| --- | --- | --- |
| GitHub MCP Server | 190 | 96.9% |
| MCP Gateway (gh-aw-mcpg) | 196 | 100% |
| OTEL Observability | 60 | 30.6% |
| Tavily Web Search | 5 | 2.6% |
All workflows run with the MCP Gateway (ghcr.io/github/gh-aw-mcpg:v0.2.25). The GitHub MCP Server is nearly universal (96.9%).
Timeout Configuration
| Timeout (min) | Occurrences | Notes |
| --- | --- | --- |
| 15 | 217 | Most common |
| 20 | 215 | Second most common |
| 10 | 55 | Short tasks |
| 30 | 47 | Longer tasks |
| 45 | 17 | |
| 60+ | 9 | Long-running |
Average timeout: 19.2 minutes
Range: 5–180 minutes
Total timeout configs: 575 across all jobs
Concurrency Groups
| Pattern | Count | Description |
| --- | --- | --- |
| gh-aw-${{ workflow }} | 148 | Standard global lock |
| gh-aw-copilot-${{ workflow }} | 81 | Copilot engine lock |
| gh-aw-claude-${{ workflow }} | 40 | Claude engine lock |
| gh-aw-${{ workflow }}-${{ pr/issue }} | 37 | Per-PR/issue lock |
| push-repo-memory-* | 10 | Memory write serialization |
| gh-aw-codex-${{ workflow }} | 5 | Codex engine lock |
cancel-in-progress: true: 25 groups (12.3% — fast-feedback workflows)
cancel-in-progress: false: 220 groups (87.7% — durability preferred)
Interesting Findings
Total standardization: 100% of lock files use schema_version: v3 and gh-aw-firewall/agent:0.25.25, indicating the entire fleet was compiled/updated to the same spec version.
poem-bot is the most capable workflow: It exercises 8 distinct safe output types — the only workflow to touch nearly every output primitive (discussions, issues, PRs, comments, labels, sub-issues, branch pushes, and issue updates).
Near-universal error reporting: 96.9% of workflows configure create_report_incomplete_issue, and 96.4% configure noop with report-as-issue: true. Silent failures are structurally prevented in all but a handful of workflows.
OTEL observability is a significant minority: 60 workflows (30.6%) instrument with OTEL telemetry — showing a meaningful investment in agentic observability, though not yet universal.
The 6-job archetype dominates: 43.4% of workflows have exactly 6 jobs, with 5-7 jobs covering 89.3% of all workflows — strong evidence of a template-driven scaffold with consistent structure.
Close-older-discussions is widely adopted: 57 of 64 discussion-creating workflows (89%) close older discussions, keeping the discussion board clean from historical runs.
Repo memory is niche but growing: 27 workflows (13.8%) use push_repo_memory for persistent state, clustering in analysis and research workflows that benefit from longitudinal context.
Recommendations
Expand OTEL coverage: Currently 30.6% adoption. Given the observable value in debugging agentic runs, extending to the remaining ~70% of workflows would improve fleet-wide observability.
Review read-only workflows: Only 7 workflows are genuinely read-only. Confirm these are intentional (e.g., firewall, codex-github-remote-mcp-test) and not misconfigured.
Audit high-step jobs: Jobs with 50+ steps (copilot-token-audit at 65, daily-news at 63) may benefit from decomposition into sub-workflows for maintainability.
Consider extending Tavily: Only 5 workflows use the Tavily web search capability. Workflows that do external research or news aggregation could benefit from this tool.
Evaluate crush/gemini adoption: With only 1 workflow each, crush and gemini engines appear experimental. A deliberate evaluation of their capabilities vs. copilot/claude could inform broader adoption or deprecation.
Most common PR/issue labels
automation, cookie, documentation, testing, ai-generated, automated-analysis, code-quality, refactoring, security, safe-outputs, dependencies
Most complex safe output configurations
Workflows with 4+ distinct output types: poem-bot, smoke-claude, smoke-copilot, smoke-copilot-arm, smoke-codex, smoke-project
Methodology
Lock files analyzed: .github/workflows/*.lock.yml
Tools: yq v4.52.5, bash, awk, grep
Parsing: yq, metadata extraction from comment headers
Scripts at /tmp/gh-aw/cache-memory/scripts/, historical data at /tmp/gh-aw/cache-memory/history/2026-04-20.json