fix: count caps, trim-by-date, upload-asset limit, & timeout#28916
Conversation
- agentic-optimization-kit: raise timeout 35→50 min
- agentic-optimization-kit: add upload-asset max:5 allowed-exts:[.png] to safe-outputs
- agentic-optimization-kit: Phase 3 logs count 400→1000
- agentic-optimization-kit: replace hardcoded date/workflow in optimization-log example with placeholders
- agentic-optimization-kit: trim optimization-log by 30-day date window instead of 30-entry count
- agentic-optimization-kit: add Phase Timing Checkpoints section
- agentic-optimization-kit: specify Jaccard similarity (not "Jaccard or cosine")
- agentic-optimization-kit: add gh api example for reading workflow source
- copilot-token-optimizer: add `// 0` null guard after jq `add` in pre-aggregation
- copilot-token-optimizer: fetch count -c 50 → -c 200
- copilot-token-optimizer: trim optimization-log by 30-day date window instead of count
- agentic-observability-kit: schedule monday → tuesday to stagger MCP rate-limit load

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/d5b6bfb2-47bf-4aa2-a24c-223cd4459d6a
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Pull request overview
Fixes multiple correctness and operational issues in the Agentic Optimization/Observability kits and the Copilot token optimizer workflows (null-safe jq aggregation, higher run caps, safer examples/log retention guidance, asset upload limits, timeout tuning, and schedule deconfliction).
Changes:
- Increased `gh aw logs` fetch limits and made jq aggregations null-safe (`add // 0`) for the Copilot token optimizer.
- Updated Agentic Optimization Kit guardrails: timeout, upload-asset constraints, larger Phase 3 count, placeholder-based examples, date-window trimming guidance, and added phase timing checkpoints plus a workflow-source fetch example.
- Moved Agentic Observability Kit weekly schedule from Monday to Tuesday to avoid collisions.
Summary per file:
| File | Description |
|---|---|
| .github/workflows/copilot-token-optimizer.md | Raises audit run cap, hardens jq aggregation, updates optimization-log retention guidance |
| .github/workflows/copilot-token-optimizer.lock.yml | Regenerated workflow lock with updated log cap and jq aggregation changes |
| .github/workflows/agentic-optimization-kit.md | Adds upload-asset limit + timeout increase, doc/guardrail refinements, phase timing guidance, placeholders |
| .github/workflows/agentic-optimization-kit.lock.yml | Regenerated lock reflecting timeout and upload-asset restriction changes |
| .github/workflows/agentic-observability-kit.md | Shifts weekly schedule to Tuesday |
| .github/workflows/agentic-observability-kit.lock.yml | Regenerated lock reflecting new Tuesday cron schedule |
Copilot's findings
- Files reviewed: 6/6 changed files
- Comments generated: 4
The reviewed excerpt:

> To fetch a workflow file, run (replacing `WORKFLOW_NAME` with the actual filename without extension):
>
> ```bash
> gh api repos/{owner}/{repo}/contents/.github/workflows/WORKFLOW_NAME.md \
> ```
The `gh api repos/{owner}/{repo}/...` example won’t work when copy/pasted unless `{owner}` and `{repo}` are also replaced. Consider either explicitly calling out that all placeholders must be replaced, or using an example that derives the repo from context (e.g., `repos/$GITHUB_REPOSITORY/...`) so only `WORKFLOW_NAME` needs editing.
Suggested change:

```diff
-To fetch a workflow file, run (replacing `WORKFLOW_NAME` with the actual filename without extension):
-```bash
-gh api repos/{owner}/{repo}/contents/.github/workflows/WORKFLOW_NAME.md \
+To fetch a workflow file, run (replacing `WORKFLOW_NAME` with the actual filename without extension; the repository is derived from `$GITHUB_REPOSITORY`):
+```bash
+gh api "repos/$GITHUB_REPOSITORY/contents/.github/workflows/WORKFLOW_NAME.md" \
```
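As a quick sanity check of the suggested form: with `GITHUB_REPOSITORY` supplied by the Actions runtime as `owner/repo`, only `WORKFLOW_NAME` needs editing. A minimal sketch (the repository and workflow name below are illustrative stubs, not values from this PR):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stubbed here for illustration; on an Actions runner this is set automatically.
GITHUB_REPOSITORY="octo-org/octo-repo"
WORKFLOW_NAME="agentic-optimization-kit"   # the only placeholder a user must edit

# Build the API path exactly as the suggestion does; owner/repo come from the env.
path="repos/$GITHUB_REPOSITORY/contents/.github/workflows/$WORKFLOW_NAME.md"
echo "$path"
```

Passing the fully expanded path to `gh api` then needs no further placeholder substitution.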
The reviewed jq excerpt:

```jq
  ]
  | group_by(.workflow_name)
  | map({
      workflow_name: .[0].workflow_name,
```
jq's `group_by(.workflow_name)` requires the input array to be sorted by the grouping key; otherwise runs with the same workflow name that are non-adjacent will end up in different groups and totals/averages will be wrong. Add a `sort_by(.workflow_name)` (or equivalent) immediately before `group_by(.workflow_name)` in this pipeline (as done in agentic-optimization-kit.md).

See below for a potential fix:

```jq
  ]
  | sort_by(.workflow_name)
```
The corresponding hunk in the lock file (the `// 0` guards are the change):

```diff
 run: "set -euo pipefail\nmkdir -p /tmp/gh-aw/token-audit\n\necho \"📥 Downloading Copilot workflow logs (last 7 days)...\"\n\nLOGS_EXIT=0\ngh aw logs \\\n --engine copilot \\\n --start-date -7d \\\n --json \\\n -c 200 \\\n > /tmp/gh-aw/token-audit/all-runs.json || LOGS_EXIT=$?\n\nif [ -s /tmp/gh-aw/token-audit/all-runs.json ]; then\n TOTAL=$(jq '.runs | length' /tmp/gh-aw/token-audit/all-runs.json)\n echo \"✅ Downloaded $TOTAL Copilot workflow runs (last 7 days)\"\n if [ \"$LOGS_EXIT\" -ne 0 ]; then\n echo \"⚠️ gh aw logs exited with code $LOGS_EXIT (partial results — likely API rate limit)\"\n fi\nelse\n echo \"❌ No log data downloaded (exit code $LOGS_EXIT)\"\n echo '{\"runs\":[],\"summary\":{}}' > /tmp/gh-aw/token-audit/all-runs.json\nfi\n"
 - name: Pre-aggregate top workflows by token usage
-run: "set -euo pipefail\nmkdir -p /tmp/gh-aw/token-audit\n\njq '{\n generated_at: (now | todateiso8601),\n window_days: 7,\n top_workflows: (\n [.runs[]\n | select(.status == \"completed\")\n | {\n workflow_name: .workflow_name,\n tokens: (.token_usage // 0),\n cost: (.estimated_cost // 0),\n turns: (.turns // 0),\n action_minutes: (.action_minutes // 0)\n }\n ]\n | group_by(.workflow_name)\n | map({\n workflow_name: .[0].workflow_name,\n run_count: length,\n total_tokens: (map(.tokens) | add),\n avg_tokens: ((map(.tokens) | add) / length),\n total_cost: (map(.cost) | add),\n total_turns: (map(.turns) | add),\n total_action_minutes: (map(.action_minutes) | add)\n })\n | sort_by(.total_tokens)\n | reverse\n | .[:10]\n )\n}' /tmp/gh-aw/token-audit/all-runs.json > /tmp/gh-aw/token-audit/top-workflows.json\n\necho \"✅ Generated top workflow summary at /tmp/gh-aw/token-audit/top-workflows.json\"\njq '.top_workflows' /tmp/gh-aw/token-audit/top-workflows.json\n"
+run: "set -euo pipefail\nmkdir -p /tmp/gh-aw/token-audit\n\njq '{\n generated_at: (now | todateiso8601),\n window_days: 7,\n top_workflows: (\n [.runs[]\n | select(.status == \"completed\")\n | {\n workflow_name: .workflow_name,\n tokens: (.token_usage // 0),\n cost: (.estimated_cost // 0),\n turns: (.turns // 0),\n action_minutes: (.action_minutes // 0)\n }\n ]\n | group_by(.workflow_name)\n | map({\n workflow_name: .[0].workflow_name,\n run_count: length,\n total_tokens: (map(.tokens) | add // 0),\n avg_tokens: ((map(.tokens) | add // 0) / length),\n total_cost: (map(.cost) | add // 0),\n total_turns: (map(.turns) | add // 0),\n total_action_minutes: (map(.action_minutes) | add // 0)\n })\n | sort_by(.total_tokens)\n | reverse\n | .[:10]\n )\n}' /tmp/gh-aw/token-audit/all-runs.json > /tmp/gh-aw/token-audit/top-workflows.json\n\necho \"✅ Generated top workflow summary at /tmp/gh-aw/token-audit/top-workflows.json\"\njq '.top_workflows' /tmp/gh-aw/token-audit/top-workflows.json\n"
```
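The null guard is easy to demonstrate in isolation: on an empty array jq's `add` yields `null`, and the `// 0` alternative operator degrades that to zero. A minimal sketch with toy input (field name follows the snippet above):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Without the guard: `add` over an empty array is null, which would poison
# later arithmetic such as the avg_tokens division.
unguarded=$(jq -n '[] | add')
echo "unguarded: $unguarded"

# With the guard: `// 0` substitutes zero when the left-hand side is null.
guarded=$(jq -n '[] | add // 0')
echo "guarded: $guarded"

# The same shape as the optimizer's aggregation, fed zero completed runs.
total=$(echo '{"runs":[]}' | jq '[.runs[] | .token_usage // 0] | add // 0')
echo "total_tokens: $total"
```

Expected: `unguarded` prints `null` while both guarded expressions print `0`.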
The pre-aggregation jq snippet uses `group_by(.workflow_name)` without first sorting by `workflow_name`. In jq, `group_by` only groups adjacent items, so totals can be incorrect unless the array is sorted by the same key first. Insert `| sort_by(.workflow_name)` before `| group_by(.workflow_name)` inside this jq program.
Suggested change:

```diff
-run: "set -euo pipefail\nmkdir -p /tmp/gh-aw/token-audit\n\njq '{\n generated_at: (now | todateiso8601),\n window_days: 7,\n top_workflows: (\n [.runs[]\n | select(.status == \"completed\")\n | {\n workflow_name: .workflow_name,\n tokens: (.token_usage // 0),\n cost: (.estimated_cost // 0),\n turns: (.turns // 0),\n action_minutes: (.action_minutes // 0)\n }\n ]\n | group_by(.workflow_name)\n | map({\n workflow_name: .[0].workflow_name,\n run_count: length,\n total_tokens: (map(.tokens) | add // 0),\n avg_tokens: ((map(.tokens) | add // 0) / length),\n total_cost: (map(.cost) | add // 0),\n total_turns: (map(.turns) | add // 0),\n total_action_minutes: (map(.action_minutes) | add // 0)\n })\n | sort_by(.total_tokens)\n | reverse\n | .[:10]\n )\n}' /tmp/gh-aw/token-audit/all-runs.json > /tmp/gh-aw/token-audit/top-workflows.json\n\necho \"✅ Generated top workflow summary at /tmp/gh-aw/token-audit/top-workflows.json\"\njq '.top_workflows' /tmp/gh-aw/token-audit/top-workflows.json\n"
+run: "set -euo pipefail\nmkdir -p /tmp/gh-aw/token-audit\n\njq '{\n generated_at: (now | todateiso8601),\n window_days: 7,\n top_workflows: (\n [.runs[]\n | select(.status == \"completed\")\n | {\n workflow_name: .workflow_name,\n tokens: (.token_usage // 0),\n cost: (.estimated_cost // 0),\n turns: (.turns // 0),\n action_minutes: (.action_minutes // 0)\n }\n ]\n | sort_by(.workflow_name)\n | group_by(.workflow_name)\n | map({\n workflow_name: .[0].workflow_name,\n run_count: length,\n total_tokens: (map(.tokens) | add // 0),\n avg_tokens: ((map(.tokens) | add // 0) / length),\n total_cost: (map(.cost) | add // 0),\n total_turns: (map(.turns) | add // 0),\n total_action_minutes: (map(.action_minutes) | add // 0)\n })\n | sort_by(.total_tokens)\n | reverse\n | .[:10]\n )\n}' /tmp/gh-aw/token-audit/all-runs.json > /tmp/gh-aw/token-audit/top-workflows.json\n\necho \"✅ Generated top workflow summary at /tmp/gh-aw/token-audit/top-workflows.json\"\njq '.top_workflows' /tmp/gh-aw/token-audit/top-workflows.json\n"
```
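The aggregation shape with the suggested `sort_by` in place can be exercised end to end on toy data (workflow names and token counts below are made up; field names follow the snippet above):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Three completed runs across two workflows, deliberately out of name order.
result=$(echo '{"runs":[
  {"workflow_name":"beta","status":"completed","token_usage":10},
  {"workflow_name":"alpha","status":"completed","token_usage":5},
  {"workflow_name":"beta","status":"completed","token_usage":20}
]}' | jq -c '[.runs[]
    | select(.status == "completed")
    | {workflow_name, tokens: (.token_usage // 0)}]
  | sort_by(.workflow_name)
  | group_by(.workflow_name)
  | map({workflow_name: .[0].workflow_name,
         run_count: length,
         total_tokens: (map(.tokens) | add // 0)})')
echo "$result"
```

Both "beta" runs land in one group with `total_tokens` 30, and "alpha" keeps its single run with 5.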
The added section:

## Phase Timing Checkpoints

At the start of every phase, log a timestamp with `echo "⏱️ Phase N start: $(date -u +%H:%M:%SZ)"`. At the end of each phase, log elapsed time with `echo "✅ Phase N complete: $(date -u +%H:%M:%SZ)"`. This helps identify which phases consume the most wall-clock time so future timeouts can be sized accurately.
This section says to "log elapsed time" at phase end, but the suggested command logs only another timestamp (no duration). Either adjust the wording to “log a timestamp” or change the guidance to actually compute and print elapsed time (e.g., capture a start epoch/SECONDS and subtract at phase end).
Suggested change:

```diff
-At the start of every phase, log a timestamp with `echo "⏱️ Phase N start: $(date -u +%H:%M:%SZ)"`. At the end of each phase, log elapsed time with `echo "✅ Phase N complete: $(date -u +%H:%M:%SZ)"`. This helps identify which phases consume the most wall-clock time so future timeouts can be sized accurately.
+At the start of every phase, log a timestamp with `echo "⏱️ Phase N start: $(date -u +%H:%M:%SZ)"`. At the end of each phase, log an end timestamp with `echo "✅ Phase N complete: $(date -u +%H:%M:%SZ)"`. These checkpoints help identify which phases consume the most wall-clock time by comparing start and end times so future timeouts can be sized accurately.
```
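If the guidance should instead print an actual duration, one way is bash's builtin `SECONDS` counter, as the comment hints. A minimal sketch (phase number and the `sleep` workload are illustrative):

```shell
#!/usr/bin/env bash
set -euo pipefail

phase_start=$SECONDS                               # snapshot the builtin counter
echo "⏱️ Phase 1 start: $(date -u +%H:%M:%SZ)"

sleep 2                                            # stand-in for the phase's real work

elapsed=$(( SECONDS - phase_start ))               # whole seconds spent in this phase
echo "✅ Phase 1 complete: $(date -u +%H:%M:%SZ) (elapsed: ${elapsed}s)"
```

This keeps the checkpoint lines greppable while also printing the per-phase duration directly.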
Eight correctness bugs across `agentic-optimization-kit.md`, `copilot-token-optimizer.md`, and `agentic-observability-kit.md` — silent data truncation, a cooldown window that breaks under load, and a Monday schedule collision causing concurrent MCP rate-limit pressure.

**agentic-optimization-kit.md**
- Added upload-asset limits (`max: 5, allowed-exts: [.png]`); Phase 4 uploads 5 charts but the gateway limit was absent — silent drop on the 5th
- Replaced literals with `<TODAY>`/`<WORKFLOW_NAME>`/`<TOTAL_TOKENS>` etc. placeholders — the literal `2026-04-28`/`daily-observability-report` was today's date and copyable verbatim
- Added `date` logging at phase start/end to get empirical wall-clock data before any future timeout tuning
- Specified `workflow_overlap_score` as Jaccard similarity (not "Jaccard or cosine")
- Added a `gh api` example for reading workflow source files:

  ```bash
  gh api repos/{owner}/{repo}/contents/.github/workflows/WORKFLOW_NAME.md \
    --jq '.content' | base64 -d
  ```

**copilot-token-optimizer.md**
- `map(.tokens) | add` returns `null` on empty input; added `// 0` after every `add` in the pre-aggregation script (AOK already had this fix)
- Raised the fetch cap `-c 50` → `-c 200`; 50 runs covers ~3 days, biasing target selection

**agentic-observability-kit.md**
- `weekly on monday` → `weekly on tuesday` — eliminates the concurrent Monday MCP `logs` call collision with `agentic-optimization-kit` at zero structural cost