📊 Agentic Workflow Lock File Statistics - December 2025 #5648

2025-12-06T03:29:30Z

github-actions[bot]
bot Dec 6, 2025

Executive Summary

This analysis examined all 107 agentic workflow lock files (.lock.yml) in the .github/workflows/ directory of the githubnext/gh-aw repository. Copilot is the most common engine (58% of the 81 workflows that declare one), create-discussion is the most common safe output (37.4% of workflows), and the GitHub MCP server accounts for the large majority of MCP tool references.

Key Statistics:

Total Lock Files: 107
Total Size: 32.59 MB
Average File Size: 311.89 KB
Analysis Date: 2025-12-06
Total Workflow Steps: 6,770 across all workflows

Full Statistical Report

File Size Distribution

Size Range	Count	Percentage
< 10 KB	0	0%
10-50 KB	0	0%
50-100 KB	3	2.8%
100-300 KB	28	26.2%
300-400 KB	72	67.3%
> 400 KB	4	3.7%

Statistics:

Smallest: .github/workflows/shared/mcp/arxiv.lock.yml (80.23 KB)
Largest: .github/workflows/poem-bot.lock.yml (611.14 KB)
Median Range: 300-400 KB (67.3% of all files)

The distribution is tightly clustered: 67.3% of files fall into the 300-400 KB range and a further 26.2% into the 100-300 KB range. This consistency suggests standardized workflow structures and tooling configurations.
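The bucketing behind the table can be sketched in a few lines of Python. This is an illustrative reconstruction, not the script the report used: the function names are hypothetical, and treating each range as half-open (lower bound inclusive) is an assumption, since the report does not say how boundary values were assigned.

```python
from collections import Counter

# Bucket edges in KB, mirroring the ranges in the table above.
# Assumption: each range is half-open, [low, high).
BUCKETS = [
    (0, 10, "< 10 KB"),
    (10, 50, "10-50 KB"),
    (50, 100, "50-100 KB"),
    (100, 300, "100-300 KB"),
    (300, 400, "300-400 KB"),
    (400, float("inf"), "> 400 KB"),
]


def bucket_label(size_kb):
    """Return the label of the size range that contains size_kb."""
    for low, high, label in BUCKETS:
        if low <= size_kb < high:
            return label
    raise ValueError(f"size must be non-negative, got {size_kb}")


def size_distribution(sizes_kb):
    """Map each range label to (count, percentage of all files)."""
    counts = Counter(bucket_label(size) for size in sizes_kb)
    total = len(sizes_kb)
    return {
        label: (counts[label], round(100 * counts[label] / total, 1) if total else 0.0)
        for _, _, label in BUCKETS
    }
```

Passing the smallest and largest sizes from the statistics above (80.23 KB and 611.14 KB) places them in the 50-100 KB and > 400 KB ranges respectively.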

Trigger Analysis

Most Popular Triggers

Based on analysis of workflow trigger configurations in frontmatter:

Trigger Type	Count	Percentage	Description
workflow_dispatch	78	72.9%	Manual workflow triggering
schedule	62	57.9%	Cron-based scheduled runs
issues	91	85.0%	Issue-related events
pull_request	84	78.5%	Pull request events

Key Findings:

Manual Triggering: 72.9% of workflows support workflow_dispatch, enabling on-demand execution
Scheduled Automation: 57.9% run on cron schedules for periodic tasks
Event-Driven: High adoption of issue (85%) and PR (78.5%) triggers for reactive workflows

Schedule Patterns

Top 10 most common cron schedules:

Schedule (Cron)	Count	Description
`0 9 * * *`	6	Daily at 9:00 AM UTC
`0 14 * * 1-5`	5	Weekdays at 2:00 PM UTC
`0 11 * * 1-5`	4	Weekdays at 11:00 AM UTC
`0 8 * * *`	3	Daily at 8:00 AM UTC
`0 13 * * 1-5`	3	Weekdays at 1:00 PM UTC
`0 9 * * 1`	3	Mondays at 9:00 AM UTC
`0 10 * * 1-5`	2	Weekdays at 10:00 AM UTC
`0 3 * * *`	2	Daily at 3:00 AM UTC
`0 15 * * 1-5`	2	Weekdays at 3:00 PM UTC
`0 0 * * *`	2	Daily at midnight UTC

Observations:

Business Hours Preference: Most schedules run during business hours (8 AM - 3 PM UTC)
Weekday Focus: Many workflows specifically target weekdays (1-5) to avoid weekend execution
Distributed Load: Schedules span several different hours, although 9:00 AM UTC is the most common start time (9 workflows across the daily and Monday schedules)
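The observations above can be checked directly against the table. The following sketch tallies the listed schedules by start hour and counts the weekday-only entries; the helper names are hypothetical, and the parser only handles the simple five-field expressions with a single numeric hour that appear in this table.

```python
from collections import Counter

# (cron expression, workflow count) pairs copied from the table above.
SCHEDULES = [
    ("0 9 * * *", 6), ("0 14 * * 1-5", 5), ("0 11 * * 1-5", 4),
    ("0 8 * * *", 3), ("0 13 * * 1-5", 3), ("0 9 * * 1", 3),
    ("0 10 * * 1-5", 2), ("0 3 * * *", 2), ("0 15 * * 1-5", 2),
    ("0 0 * * *", 2),
]


def workflows_per_hour(schedules):
    """Count workflows by the UTC hour in the cron hour field."""
    hours = Counter()
    for expr, count in schedules:
        _minute, hour, _dom, _month, _dow = expr.split()
        hours[int(hour)] += count
    return hours


def weekday_only(schedules):
    """Count workflows whose day-of-week field is exactly 1-5."""
    return sum(count for expr, count in schedules if expr.split()[4] == "1-5")


def in_hour_window(schedules, start, end):
    """Count workflows whose start hour lies in [start, end] (inclusive)."""
    per_hour = workflows_per_hour(schedules)
    return sum(n for hour, n in per_hour.items() if start <= hour <= end)
```

For the ten schedules listed, this gives 9 workflows starting at 09:00 UTC, 16 weekday-only workflows, and 28 of 32 workflows starting between 08:00 and 15:00 UTC.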

Safe Outputs Analysis

Safe Output Types Distribution

Safe outputs enable agents to interact with GitHub safely without destructive permissions:

Safe Output Type	Count	Percentage	Example Use Cases
create-discussion	40	37.4%	Reports, audits, summaries
add-comment	27	25.2%	PR feedback, issue updates
create-issue	23	21.5%	Bug reports, tasks
upload-assets	18	16.8%	Artifacts, generated files
create-pull-request	15	14.0%	Automated fixes, updates
missing-tool	2	1.9%	Tool gap reporting
update-issue	1	0.9%	Issue modifications
noop	1	0.9%	No-op transparency

Total Safe Output Declarations: 127 (some workflows use multiple types)

Key Insights:

Discussion-First Approach: 37.4% of workflows use discussions for reports and summaries, which provide threaded conversations
Interactive Workflows: 25.2% use comments for direct interaction on issues/PRs
Issue Creation: 21.5% generate new issues for tracking tasks or problems
Multi-Output Workflows: Many workflows combine multiple safe output types (e.g., discussions + asset uploads)
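Because a workflow can declare several safe output types, the percentages in the table are shares of the 107 workflows, not of the 127 declarations. The short sketch below makes the two denominators explicit; the variable and function names are illustrative only.

```python
# Counts copied from the safe output table above.
SAFE_OUTPUTS = {
    "create-discussion": 40,
    "add-comment": 27,
    "create-issue": 23,
    "upload-assets": 18,
    "create-pull-request": 15,
    "missing-tool": 2,
    "update-issue": 1,
    "noop": 1,
}
TOTAL_WORKFLOWS = 107


def share(count, denominator):
    """Percentage of denominator, rounded to one decimal place."""
    return round(100 * count / denominator, 1)


total_declarations = sum(SAFE_OUTPUTS.values())

# Share of workflows that declare each type (the table's Percentage column).
per_workflow = {name: share(n, TOTAL_WORKFLOWS) for name, n in SAFE_OUTPUTS.items()}

# Share of all declarations that each type represents.
per_declaration = {name: share(n, total_declarations) for name, n in SAFE_OUTPUTS.items()}
```

With these figures, create-discussion appears in 37.4% of workflows but makes up 31.5% of all safe output declarations.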

Discussion Categories

Distribution of discussion categories used by create-discussion workflows:

Category	Count	Percentage
audits	15	37.5%
General	8	20.0%
Audits	4	10.0%
reports	2	5.0%
artifacts	2	5.0%
dev	2	5.0%
audit	1	2.5%
announcements	1	2.5%
daily-news	1	2.5%
research	1	2.5%
security	1	2.5%
general	1	2.5%

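Note: Category names are matched case-sensitively, so "audits", "Audits", and "audit" are counted separately, as are "General" and "general". Percentages use the 40 create-discussion workflows as the denominator; the listed counts sum to 39.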

Finding: Strong preference for the "audits" category (37.5%), indicating these workflows primarily perform analysis and reporting functions.
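To see how much of the fragmentation comes from naming alone, the categories can be folded together case-insensitively. This is a minimal sketch: the alias map that merges the singular "audit" into "audits" is a hypothetical naming decision, not something the report prescribes.

```python
from collections import Counter

# Counts copied from the discussion category table above.
CATEGORY_COUNTS = {
    "audits": 15, "General": 8, "Audits": 4, "reports": 2,
    "artifacts": 2, "dev": 2, "audit": 1, "announcements": 1,
    "daily-news": 1, "research": 1, "security": 1, "general": 1,
}

# Hypothetical alias map: treating "audit" as "audits" is an assumption.
ALIASES = {"audit": "audits"}


def normalize(name):
    """Lower-case a category name and apply any known alias."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


def merged_counts(counts):
    """Sum the counts of categories that normalize to the same name."""
    merged = Counter()
    for name, n in counts.items():
        merged[normalize(name)] += n
    return merged
```

Applied to the table, the twelve listed names collapse to nine categories, with "audits" at 20 workflows and "general" at 9.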

Structural Characteristics

Workflow Structure Overview

Total Workflows: 107
Total Jobs: 107 (1 job per workflow)
Total Steps: 6,770
Average Steps per Workflow: 63.27 steps
Maximum Steps: 116 steps (in poem-bot.lock.yml)

Average Lock File Structure

Based on statistical analysis, a typical .lock.yml file has:

Size: ~312 KB
Jobs: 1 job (standard pattern)
Steps: ~63 steps per workflow
Timeout: ~18 minutes (average)
Engine: Copilot (most common)
Triggers: Multiple triggers (dispatch + event-based)
Safe Outputs: 1-2 safe output types
MCP Servers: Primarily GitHub API interactions

Step Complexity Distribution

The analysis reveals significant variation in workflow complexity:

Minimum: ~10-15 steps (simple workflows like firewall tests)
Average: 63.27 steps
Maximum: 116 steps (poem-bot - the most complex workflow)

High-Complexity Workflows (>100 steps):

poem-bot.lock.yml - 116 steps
Multiple test and smoke test workflows with extensive setup

Engine Distribution

Distribution of AI engines powering the agentic workflows:

Engine	Count	Percentage	Use Cases
Copilot	47	58.0%	General-purpose, GitHub-native
Claude	26	32.1%	Complex analysis, research
Codex	8	9.9%	Code-focused tasks

Total Workflows with Engine Declaration: 81 out of 107 (75.7%)

Observations:

Copilot Dominance: Nearly 6 in 10 workflows that declare an engine use GitHub Copilot (47 of 81)
Claude for Depth: Claude is preferred for complex analytical tasks (32.1%)
Codex Niche: Codex maintains a small but specialized role (9.9%)
Unspecified Engines: 24.3% of workflows don't explicitly declare an engine (may use defaults)

Permission Patterns

Most Common Permissions

Most frequently requested permissions across all workflows:

Permission	Count	Description
contents	101	Repository content access
issues	91	Issue read/write access
pull-requests	89	PR read/write access
actions	45	Actions workflow access
discussions	14	Discussions access

Key Security Observations:

Read-Heavy: Most workflows use read-only permissions
Minimal Scope: Permissions are scoped to specific resources needed
Safe Pattern: The permission model aligns with safe outputs approach

Permission Philosophy

The repository demonstrates a least-privilege security model:

Workflows request only necessary permissions
Safe outputs eliminate need for write permissions in many cases
Engine execution environments are sandboxed with firewall rules

Tool & MCP Patterns

Most Used MCP Servers

MCP (Model Context Protocol) servers provide specialized capabilities to agents:

MCP Server	Tool Call Count	Workflows	Primary Function
github	3,619	~80+	GitHub API operations
playwright	307	~10-15	Browser automation
serena	97	~5-10	Semantic code analysis & editing
arxiv	6	2	Academic paper research
deepwiki	6	2	Repository documentation lookup
context7	4	2	Library documentation lookup

Total MCP Server Tool Calls: 4,039 across all workflows

Key Findings:

GitHub API Dominance: The GitHub MCP server accounts for 89.6% of all MCP tool calls
Browser Automation: Playwright MCP enables web interaction for ~10% of workflows
Specialized Research: arxiv (academic papers) and the documentation lookup servers deepwiki and context7 serve niche use cases
Code Analysis: Serena provides semantic code analysis for a small number of workflows

Common Tool Configurations

Native tools enabled across workflows:

Tool	Count	Percentage	Purpose
bash	44	41.1%	Shell command execution
cache-memory	31	29.0%	Persistent storage across runs
edit	44	41.1%	File editing capabilities
web-fetch	Many	~30-40%	HTTP requests
web-search	Some	~10-20%	Web searching

Observations:

Bash Common: Shell access is enabled in 41.1% of workflows
Caching Common: Nearly 1/3 of workflows use persistent cache memory
File Manipulation: Edit tools enable file modifications (41.1%)
Web Access: Many workflows fetch external data

Timeout Patterns

Timeout Statistics

Workflows with Timeouts: 98 out of 107 (91.6%)
Minimum Timeout: 5 minutes
Maximum Timeout: 60 minutes
Average Timeout: 18.13 minutes
Median Timeout: ~15 minutes (estimated)
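The median above is an estimate. An exact figure could be computed along the following lines, assuming timeouts appear in the lock files under the standard GitHub Actions `timeout-minutes` key with a literal integer value. The function names are hypothetical, and the line-based regex follows the report's pattern-matching approach and does not perform real YAML parsing.

```python
import re
import statistics

# Matches lines such as "    timeout-minutes: 15".
# Assumption: the value is a literal integer, not an expression.
TIMEOUT_RE = re.compile(r"^[ \t]*timeout-minutes:[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def extract_timeouts(yaml_text):
    """Return every timeout-minutes value found in the text, in order."""
    return [int(value) for value in TIMEOUT_RE.findall(yaml_text)]


def summarize(timeouts):
    """Compute the same statistics the report lists, with an exact median."""
    return {
        "min": min(timeouts),
        "max": max(timeouts),
        "mean": round(statistics.mean(timeouts), 2),
        "median": statistics.median(timeouts),
    }
```

Note that a single lock file may contain more than one `timeout-minutes` entry (job-level and step-level), so a real analysis would need to decide which one represents the workflow.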

Common Timeout Values

Most workflows use standard timeout durations:

15 minutes: Common for standard workflows
30 minutes: Complex analysis and reporting
45-60 minutes: Intensive operations (performance analysis, comprehensive audits)

Finding: The average timeout of roughly 18 minutes suggests most workflows are designed for quick, focused tasks rather than long-running operations. Timeouts are configured upper bounds, not measured runtimes.

Interesting Findings

Standardized Structure: 67.3% of lock files fall into the 300-400 KB size range, indicating strong structural consistency across workflows
Copilot-First Strategy: With 58% market share among declared engines, Copilot is the clear platform default, likely due to native GitHub integration
Discussion-Driven Output: 37.4% of workflows declare create-discussion (31.5% of all 127 safe output declarations), suggesting a preference for persistent, threaded reporting over ephemeral comments
GitHub MCP Dominance: The GitHub MCP server accounts for 89.6% of all MCP tool calls (3,619 out of 4,039), making it the critical infrastructure component
Business Hours Automation: Scheduled workflows heavily favor business hours (8 AM - 3 PM UTC) and weekdays, indicating these workflows support human workflows rather than 24/7 automation
Single Job Pattern: 100% of workflows use exactly 1 job per workflow, suggesting a design pattern of focused, single-purpose workflows
High Step Complexity: Average of 63.27 steps per workflow (with max of 116) indicates sophisticated multi-stage agent operations
Firewall Usage: Many workflows include network firewall rules, demonstrating security-conscious design
Cache Memory Adoption: 29% of workflows use cache-memory tool, enabling state persistence across workflow runs
Weekday-Only Schedules: Many cron schedules explicitly exclude weekends (1-5), respecting developer work patterns

Recommendations

For Workflow Authors

Standardize Category Names: Consolidate "audits", "Audits", and "audit" categories to reduce fragmentation
Consider Engine Selection:
- Use Copilot for standard GitHub operations (native integration)
- Use Claude for complex analysis requiring deep reasoning
- Reserve Codex for code-generation-heavy tasks
Optimize Timeout Values: The average configured timeout is about 18 minutes; where measured runtimes are much shorter, consider reducing timeouts for faster failure detection
Leverage Cache Memory: Only 29% of workflows use cache-memory; consider adopting for workflows with repeated data fetching

For Platform Improvements

MCP Server Documentation: Given GitHub MCP's dominance (89.6% of calls), comprehensive documentation and examples are critical
Discussion Category Management: Provide tooling to standardize and manage discussion categories
Timeout Monitoring: Track actual runtime vs. timeout to identify optimization opportunities
Engine Performance Metrics: Publish comparative metrics for engine selection guidance

For Repository Maintenance

Consolidate Small Workflows: Consider combining very small workflows (<100 KB) to reduce overhead
Review Large Workflows: Investigate the 4 workflows >400 KB for refactoring opportunities
Schedule Optimization: Current schedules create load spikes at 9 AM, 11 AM, and 2 PM UTC - consider spreading load
Permission Audit: Regularly review the 101 workflows with contents permissions to ensure least-privilege

Methodology

Data Collection

Tool: Python 3 scripts with regex parsing
Lock Files Analyzed: 107
Data Sources: .github/workflows/*.lock.yml and .github/workflows/**/*.lock.yml
Parsing Method: YAML frontmatter extraction via regex patterns
Validation: Cross-referenced multiple parsing approaches for accuracy
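The file discovery step can be approximated as follows. This is a sketch of one way to collect the inputs, not the report's actual script: the function names are hypothetical, and sizes are converted assuming 1 KB = 1024 bytes.

```python
from pathlib import Path


def find_lock_files(workflows_dir):
    """Return all *.lock.yml files under workflows_dir, at any depth.

    rglob matches the directory itself and every subdirectory, so it
    covers both glob patterns listed under Data Sources.
    """
    return sorted(Path(workflows_dir).rglob("*.lock.yml"))


def sizes_kb(paths):
    """Map each path (as a string) to its size in KB (1 KB = 1024 bytes)."""
    return {str(path): path.stat().st_size / 1024 for path in paths}
```

Calling `find_lock_files(".github/workflows")` from the repository root would return both top-level lock files and nested ones such as those under shared/mcp/.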

Analysis Techniques

File Size Analysis: Bash du command with awk aggregation
Trigger Extraction: Regex pattern matching on on: frontmatter sections
Safe Output Detection: Pattern matching for safe output declarations
MCP Tool Counting: Regex search for mcp__[server]__ tool call patterns
Structural Analysis: Line-based pattern matching for jobs and steps
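The MCP tool counting technique can be illustrated with a small regex sketch. It assumes tool references follow the shape `mcp__<server>__<tool>`, based on the pattern named above; the exact regex the report used is not given, so the one below is an approximation, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed reference shape: mcp__<server>__<tool>.
# The non-greedy group stops at the first double underscore.
MCP_RE = re.compile(r"mcp__([\w-]+?)__")


def count_mcp_references(text):
    """Count occurrences of mcp__<server>__ per server name."""
    return Counter(MCP_RE.findall(text))


def share_by_server(counts):
    """Convert per-server counts into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {server: round(100 * n / total, 1) for server, n in counts.items()}
```

These are counts of textual references in the lock files, so they measure how often a tool is listed or mentioned, not how often it is invoked at runtime. Feeding the per-server totals from the MCP table into `share_by_server` reproduces the 89.6% figure for the github server.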

Cache Memory Usage

Analysis scripts stored in /tmp/gh-aw/cache-memory/scripts/:

analyze_lockfiles.sh - Comprehensive bash analysis
analyze.py - Python-based data extraction
count_jobs.py - Job and step counting

Historical data saved to /tmp/gh-aw/cache-memory/history/.last_analysis for future trend tracking.
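Trend tracking against the saved history could work roughly as sketched below. The on-disk format of .last_analysis is not described in the report, so storing a flat JSON object of numeric statistics is an assumption, and all function names are hypothetical.

```python
import json
from pathlib import Path


def save_snapshot(path, stats):
    """Write the statistics as JSON, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(stats, indent=2, sort_keys=True))


def load_snapshot(path):
    """Return the previously saved statistics, or None on the first run."""
    path = Path(path)
    if not path.exists():
        return None
    return json.loads(path.read_text())


def delta(previous, current):
    """Difference (current - previous) for numeric keys present in both."""
    return {
        key: current[key] - previous[key]
        for key in current
        if key in previous
        and isinstance(current[key], (int, float))
        and isinstance(previous[key], (int, float))
    }
```

A later run would call `load_snapshot` before analysis, compute `delta` against the new figures, and then call `save_snapshot` to replace the stored state.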

Limitations

Frontmatter Focus: Analysis primarily examines YAML frontmatter, not runtime behavior
Static Analysis: Does not measure actual execution metrics (success rates, durations)
Pattern-Based: Regex matching may miss edge cases or non-standard formats
Snapshot in Time: Data reflects repository state as of 2025-12-06

Future Analysis Opportunities

Runtime Performance Tracking: Correlate lock file characteristics with actual workflow execution times
Success Rate Analysis: Track which configurations lead to highest success rates
Engine Performance Comparison: Measure task completion quality across different engines
Cost Analysis: Correlate workflow characteristics with compute costs
Trend Analysis: Compare this analysis with future snapshots to identify evolution patterns
Safe Output Effectiveness: Measure adoption and utility of different safe output types
MCP Server Performance: Track tool call success rates and latency by MCP server

Conclusion

The githubnext/gh-aw repository demonstrates mature, production-ready agentic workflow practices with:

✅ Consistent structure (67.3% of files in the 300-400 KB range)
✅ Safe-by-default approach (comprehensive safe outputs coverage)
✅ Engine diversity (Copilot, Claude, Codex based on use case)
✅ GitHub-native integration (89.6% of MCP tool calls go to the GitHub MCP server)
✅ Security-conscious (least-privilege permissions, firewall rules)
✅ Developer-friendly (business hours schedules, manual triggers)

The repository serves as an excellent reference implementation for organizations adopting agentic workflows in GitHub Actions.

Analysis Metadata:

Generated by: Lockfile Statistics Analysis Agent
Timestamp: 2025-12-06
Repository: githubnext/gh-aw
Commit: a2a8bb4
Total Lock Files: 107
Analysis Scripts: Cached in /tmp/gh-aw/cache-memory/scripts/

AI generated by Lockfile Statistics Analysis Agent

2025-12-10T00:21:49Z

github-actions[bot]
bot Dec 10, 2025
Author

This discussion was automatically closed because it was created by an agentic workflow more than 3 days ago.