📊 Agentic Workflow Lock File Statistics Analysis

2025-12-02T03:36:22Z

github-actions[bot]
Bot Dec 2, 2025

Analysis Date: December 2, 2025
Repository: githubnext/gh-aw
Lockfiles Analyzed: 100

This report provides comprehensive statistical insights into the structure, patterns, and characteristics of agentic workflow lockfiles (.lock.yml files) in the gh-aw repository.

Executive Summary

The gh-aw repository contains 100 lock files representing a diverse ecosystem of agentic workflows. Together they define 903 jobs and 6,159 steps, an average of about 9 jobs and 62 steps per workflow. The most common workflow trigger is pull_request (93 occurrences), followed by issues (84) and workflow_dispatch (79). Among safe outputs, create-discussion is the most widely used (36 workflows), and the GitHub MCP server is the dominant external integration (3,695 references).

Full Statistical Analysis

File Size Distribution

Overview Statistics

Metric	Value
Total Files	100
Total Size	29.3 MB (30,746,131 bytes)
Average File Size	300 KB (307,461 bytes)
Median Size	306 KB (313,471 bytes)
Minimum Size	80 KB (82,160 bytes)
Maximum Size	604 KB (618,615 bytes)

Size Distribution

Size Range	Count	Percentage
<100 KB	4	4%
100-300 KB	42	42%
300-500 KB	53	53%
>500 KB	1	1%
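The bucketing behind this table can be sketched in Python. This is only an illustration, not the report's own script: the function name `bucket_sizes` is hypothetical, 1 KB is assumed to be 1,024 bytes (which matches the byte/KB pairs in the overview table), and the handling of sizes that fall exactly on a boundary is a guess.

```python
KB = 1024  # assumption: 1 KB = 1,024 bytes, consistent with the overview table

def bucket_sizes(sizes_bytes):
    """Count file sizes per range, using the four ranges from the table."""
    counts = {"<100 KB": 0, "100-300 KB": 0, "300-500 KB": 0, ">500 KB": 0}
    for size in sizes_bytes:
        kb = size / KB
        if kb < 100:
            counts["<100 KB"] += 1
        elif kb < 300:
            counts["100-300 KB"] += 1
        elif kb <= 500:
            counts["300-500 KB"] += 1
        else:
            counts[">500 KB"] += 1
    return counts

# Only the minimum and maximum byte counts are stated in the report.
print(bucket_sizes([82_160, 618_615]))
```

Run over all 100 file sizes, a function like this would be expected to reproduce the 4 / 42 / 53 / 1 split, provided the boundary handling matches the original analysis.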

File Size Extremes

Smallest Lock Files:

arxiv.lock.yml - 81 KB (shared MCP configuration)
context7.lock.yml - 81 KB (shared MCP configuration)
test-skip-if-match-object.lock.yml - 86 KB
test-firewall-default.lock.yml - 97 KB
example-permissions-warning.lock.yml - 106 KB

Largest Lock Files:

poem-bot.lock.yml - 604 KB (outlier, about twice the average)
pr-nitpick-reviewer.lock.yml - 398 KB
copilot-session-insights.lock.yml - 396 KB
q.lock.yml - 395 KB
smoke-copilot-no-firewall.lock.yml - 386 KB

Insight: Just over half (53%) of lock files fall in the 300-500 KB range, establishing this as the "standard" size for a fully-featured agentic workflow. poem-bot.lock.yml is a notable outlier at 604 KB, potentially due to extensive prompt engineering or complex multi-step logic.

Workflow Trigger Analysis

Trigger Type Distribution

Trigger	Count	Percentage	Description
pull_request	93	93%	Triggered on PR events
issues	84	84%	Triggered on issue events
workflow_dispatch	79	79%	Manual trigger capability
schedule	61	61%	Cron-scheduled workflows
issue_comment	12	12%	Triggered on issue/PR comments
push	4	4%	Triggered on git push
workflow_run	2	2%	Triggered after another workflow

Note: Percentages sum to more than 100% because each workflow is counted once for every trigger it declares, and most workflows declare several.
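A small Python sketch shows why the column cannot add up to 100%. The three workflows in `sample` are hypothetical and the helper name `trigger_shares` is an assumption. Only the counting method reflects the table.

```python
from collections import Counter

def trigger_shares(workflows):
    """Return {trigger: percent of workflows that declare it}."""
    counts = Counter(t for triggers in workflows.values() for t in set(triggers))
    total = len(workflows)
    return {t: round(100 * n / total) for t, n in counts.items()}

# Hypothetical three-workflow sample, not taken from the repository.
sample = {
    "a": ["pull_request", "issues", "workflow_dispatch"],
    "b": ["schedule", "workflow_dispatch"],
    "c": ["pull_request", "workflow_dispatch"],
}
shares = trigger_shares(sample)
print(shares)
print(sum(shares.values()))  # well above 100, because workflows overlap
```

Each percentage is a share of workflows, not a share of trigger declarations, so the shares overlap.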

Common Trigger Combinations

Most workflows employ multiple trigger types to provide flexibility:

Issues + Pull Requests + Workflow Dispatch - Interactive workflows that handle both issues and PRs
Schedule + Workflow Dispatch - Daily/periodic workflows with manual override capability
Pull Request + Workflow Dispatch - PR-focused workflows with manual testing capability

Schedule Patterns

Scheduled Workflows: 61 workflows run on cron schedules

Common Schedules:

0 9 * * * - Daily at 9:00 AM UTC (1 workflow)
0 10 * * * - Daily at 10:00 AM UTC (1 workflow)
Various other daily/weekly patterns

Insight: The heavy use of multiple triggers (93% with pull_request, 84% with issues) indicates workflows designed for broad applicability. The high manual trigger adoption (79% with workflow_dispatch) demonstrates a preference for testability and on-demand execution.

Safe Outputs Analysis

Safe outputs define how agents communicate results back to GitHub, enabling controlled, secure interactions.

Safe Output Type Distribution

Safe Output Type	Count	Usage
create-discussion	36	Most popular - used for reports, analyses, and summaries
add-comment	17	Interactive workflows that respond to issues/PRs
create-issue	15	Workflows that create actionable issues
create-pull-request	14	Automated code changes and documentation updates

Total Workflows Using Safe Outputs: 82 (82% of all workflows)

Discussion Categories

When workflows create discussions, they use these categories:

Category	Count	Usage Pattern
audits	13	Code analysis, security scans, workflow audits
General	8	General-purpose reports and findings
Audits	4	(Capitalized variant)
dev	2	Development-focused discussions
artifacts	2	Build artifacts and summaries
security	1	Security-related findings
research	1	Research and investigation results
reports	1	Report summaries
daily-news	1	Daily digest
announcements	1	Team announcements

Insight: The "audits" category (17 total including variants) is heavily favored for automated analysis results, reflecting the repository's focus on code quality and observability.
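The combined figure of 17 comes from treating "audits" and "Audits" as one category. A minimal Python sketch of that case-insensitive merge, using the counts from the table above (the helper name `merge_case_variants` is hypothetical):

```python
from collections import Counter

# Counts copied from the Discussion Categories table.
categories = {
    "audits": 13, "General": 8, "Audits": 4, "dev": 2, "artifacts": 2,
    "security": 1, "research": 1, "reports": 1, "daily-news": 1,
    "announcements": 1,
}

def merge_case_variants(counts):
    """Sum counts for category names that differ only by letter case."""
    merged = Counter()
    for name, n in counts.items():
        merged[name.lower()] += n
    return dict(merged)

merged = merge_case_variants(categories)
print(merged["audits"])
```

Whether GitHub itself treats the two spellings as the same discussion category is not stated in this report, so the merge here applies only to the statistics.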

Safe Output Patterns

Report Workflows: Typically use create-discussion to post comprehensive findings
Interactive Assistants: Use add-comment to respond directly to user requests
Quality Enforcers: Use create-issue to track problems that need attention
Automated Maintainers: Use create-pull-request for documentation fixes, code updates

Structural Characteristics

Job and Step Complexity

Metric	Value
Total Jobs	903
Total Steps	6,159
Average Jobs per Workflow	9.03
Average Steps per Workflow	61.59
Average Steps per Job	6.82

Insights:

Workflows are moderately complex, averaging 9 jobs
High step count (avg. 62 steps per workflow) indicates detailed orchestration
Average of ~7 steps per job suggests granular, well-decomposed tasks
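The three averages follow directly from the totals. A short Python check using only the figures reported above:

```python
total_workflows = 100
total_jobs = 903
total_steps = 6159

jobs_per_workflow = round(total_jobs / total_workflows, 2)
steps_per_workflow = round(total_steps / total_workflows, 2)
steps_per_job = round(total_steps / total_jobs, 2)

print(jobs_per_workflow, steps_per_workflow, steps_per_job)
```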

Permission Patterns

GitHub Actions permissions granted across workflows:

Permission	Count	Typical Access Level
contents	89	Read/Write to repository content
pull-requests	81	Create/modify PRs and comments
issues	81	Create/modify issues and comments
actions	40	Access to GitHub Actions resources
discussions	13	Create/modify discussions
security-events	6	Access security scan results
repository-projects	3	Access to Projects

Insight: The high prevalence of contents, pull-requests, and issues permissions (80-90% of workflows) reflects the interactive, code-modifying nature of these agentic workflows.

Timeout Configuration

Timeout Distribution

Timeout (minutes)	Occurrences	Percentage
10	255	53%
20	90	19%
15	19	4%
30	11	2%
5	8	2%
45	3	<1%

Statistics:

Average Timeout: 10.7 minutes
Most Common: 10 minutes (default for quick operations)
Longest: 45 minutes (for comprehensive analyses)

Insight: The 10-minute default timeout is most common, with longer timeouts (20-45 min) reserved for complex analysis workflows like daily-firewall-report and daily-repo-chronicle.

MCP Server Usage

MCP (Model Context Protocol) servers provide external capabilities to agentic workflows.

Most Used MCP Servers

MCP Server	References	Primary Use Case
github	3,695	GitHub API interactions (issues, PRs, repos)
playwright	210	Browser automation and web scraping
deepwiki	6	Repository documentation lookup (DeepWiki)
arxiv	6	Academic paper retrieval
context	4	Context7 library documentation lookup
tavily	2	Web search API
microsoftdocs	2	Microsoft documentation access
markitdown	2	Document-to-Markdown conversion
ast-grep	2	AST-based code search

Insight: The GitHub MCP server dominates with 3,695 references (94% of all MCP usage), which makes sense given these workflows operate within GitHub Actions. Playwright (210 references) is the second most popular, enabling workflows to interact with web UIs and gather data from external sources.
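The 94% figure can be reproduced from the reference counts in the table. A brief Python check (the counts are copied from the table, the variable names are arbitrary):

```python
# Reference counts copied from the Most Used MCP Servers table.
references = {
    "github": 3695, "playwright": 210, "deepwiki": 6, "arxiv": 6,
    "context": 4, "tavily": 2, "microsoftdocs": 2, "markitdown": 2,
    "ast-grep": 2,
}

total = sum(references.values())
github_share = round(100 * references["github"] / total)
playwright_share = round(100 * references["playwright"] / total)

print(total, github_share, playwright_share)
```

These are counts of references in the lock files, not counts of workflows, so a single workflow can contribute many references.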

Concurrency Patterns

Workflows with Concurrency Control: 100 (100%)

All workflows implement concurrency groups to prevent race conditions and resource conflicts. The standard pattern is:

concurrency:
  group: gh-aw-${{ github.workflow }}-${{ github.event.issue.number || github.event.pull_request.number }}
  cancel-in-progress: true

This ensures that only one instance of a workflow runs per issue/PR, with new runs canceling in-progress ones.
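To make the grouping concrete, the following Python sketch mimics how the group name resolves. It is an approximation, not GitHub's evaluator: the function name and the workflow name are hypothetical, and it assumes that when neither number is present the `||` expression yields an empty value that renders as an empty string.

```python
def concurrency_group(workflow, issue_number=None, pr_number=None):
    """Approximate the group expression: first non-empty number, else empty."""
    number = issue_number or pr_number or ""
    return f"gh-aw-{workflow}-{number}"

# Hypothetical workflow name, used for illustration only.
print(concurrency_group("Example Workflow", issue_number=42))
print(concurrency_group("Example Workflow", pr_number=7))
print(concurrency_group("Example Workflow"))  # e.g. a scheduled or manual run
```

Under that assumption, runs with no issue or PR number (such as schedule or workflow_dispatch runs) all share one group per workflow, so a new run would cancel an in-progress one.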

Engine Distribution

While most workflows use the standard gh-aw engine, some explicitly specify alternative engines:

Engine	Workflows	Examples
copilot	3	`glossary-maintainer`, `poem-bot`, `technical-doc-writer`
claude	5	`cloclo`, `daily-multi-device-docs-tester`, `unbloat-docs`, `smoke-claude`
codex	2	`changeset`, `daily-fact`

Insight: While the default engine handles most workflows, specialized engines are used for specific tasks - Copilot for creative/documentation work, Claude for complex analysis, and Codex for code-focused tasks.

Interesting Findings

1. Test Workflows Are Smallest

The smallest lock files are shared MCP configurations, test workflows, and an example workflow (81-106 KB), which makes sense as they validate or provide specific features rather than implementing complex logic.

2. "Poem Bot" is an Outlier

At 604 KB, poem-bot.lock.yml is about twice the size of the average lock file. This suggests either extensive creative prompting, complex multi-turn interactions, or comprehensive example dialogues.

3. High Manual Trigger Adoption

79% of workflows support manual triggering via workflow_dispatch, indicating a strong emphasis on testability and developer control.

4. Discussion-First Philosophy

With 36 workflows using create-discussion (44% of safe-output workflows), there's a clear preference for creating durable, threaded discussions over ephemeral comments.

5. Comprehensive Concurrency Control

100% of workflows implement concurrency groups, demonstrating mature workflow orchestration practices to prevent resource conflicts.

6. GitHub MCP Dominance

The GitHub MCP server appears 3,695 times across workflows, showing how deeply these agentic workflows integrate with GitHub's ecosystem.

Average Lockfile Profile

Based on the averages and most common values reported above, a "typical" agentic workflow lock file has:

Size: ~300 KB
Jobs: 9 jobs
Steps: 62 steps (7 steps per job)
Triggers: pull_request, issues, workflow_dispatch
Timeout: 10 minutes
Safe Output: create-discussion
Permissions: contents (read/write), issues (write), pull-requests (write)
Concurrency: Enabled with cancel-in-progress
MCP Servers: GitHub (primary), possibly Playwright (secondary)

Recommendations

1. Standardize Discussion Categories

Consider consolidating "audits" and "Audits" into a single canonical category for consistency.

2. Investigate Large Lock Files

Review poem-bot, the only lock file above 400 KB, and the cluster of files just under 400 KB to determine if size optimizations are possible without sacrificing functionality.

3. Document Engine Selection Criteria

Create guidelines for when to use Copilot vs. Claude vs. Codex engines based on task characteristics.

4. Leverage Schedule Patterns

With 61 scheduled workflows, ensure cron schedules are distributed across time slots to avoid resource contention.
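One possible way to spread daily schedules is to derive each workflow's minute from a hash of its name, so that workflows sharing an hour do not all start at minute 0. The Python sketch below is only an illustration of that idea. The function name is hypothetical and nothing in this report indicates that gh-aw assigns schedules this way.

```python
import hashlib

def staggered_daily_cron(workflow_name: str, hour: int) -> str:
    """Build a daily cron line whose minute (0-59) is derived from the name."""
    digest = hashlib.sha256(workflow_name.encode("utf-8")).digest()
    minute = int.from_bytes(digest[:4], "big") % 60
    return f"{minute} {hour} * * *"

# Hypothetical workflow names, used for illustration only.
for name in ["daily-report-a", "daily-report-b"]:
    print(name, staggered_daily_cron(name, hour=9))
```

Because the minute depends only on the name, the schedule stays stable between runs. Two names can still collide on the same minute, so this reduces contention rather than eliminating it.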

5. MCP Server Documentation

Document the 9 MCP servers in use and provide examples of when each should be employed.

Methodology

Analysis Tools:

Bash scripts for data extraction and statistical analysis
AWK for numerical aggregation and distribution calculations
Grep for pattern matching across YAML structures

Data Sources:

100 .lock.yml files from .github/workflows/ directory
Cached analysis results from previous runs for historical context
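The original analysis used Bash, AWK, and grep. As a rough equivalent, the file-size statistics could be gathered with a few lines of Python. This is a sketch, not the report's script: the function name `lockfile_size_stats` is hypothetical, and it assumes the lock files sit directly in the given directory.

```python
from pathlib import Path
from statistics import mean, median

def lockfile_size_stats(workflows_dir):
    """Summarize the byte sizes of *.lock.yml files in one directory."""
    sizes = sorted(p.stat().st_size for p in Path(workflows_dir).glob("*.lock.yml"))
    if not sizes:
        return None  # no lock files found
    return {
        "count": len(sizes),
        "total": sum(sizes),
        "mean": mean(sizes),
        "median": median(sizes),
        "min": sizes[0],
        "max": sizes[-1],
    }

if __name__ == "__main__":
    # Run from the repository root; prints None if no lock files are present.
    print(lockfile_size_stats(".github/workflows"))
```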

Cache Memory:

Analysis scripts persisted to /tmp/gh-aw/cache-memory/scripts/
Historical data maintained in /tmp/gh-aw/cache-memory/data/
Pattern library built for reusable analysis components

Lockfile Statistics Analysis Agent | Automated Statistical Analysis of Agentic Workflow Patterns

AI generated by Lockfile Statistics Analysis Agent

2025-12-06T00:20:30Z

github-actions[bot]
Bot Dec 6, 2025
Author

This discussion was automatically closed because it was created by an agentic workflow more than 3 days ago.


📊 Agentic Workflow Lock File Statistics Analysis - December 2025 #5262
