[q] Optimize close-old-discussions workflow with pre-filtered data #4633

@github-actions

Q Workflow Optimization Report

Issues Found (from live data)

close-old-discussions

  • Log Analysis: Run #19622681948 (2025-11-24T03:46:55Z)
  • Run URL: https://github.com/githubnext/gh-aw/actions/runs/19622681948
  • Issues Identified:
    • Excessive MCP calls: Agent made 2+ paginated calls to list_discussions API
    • Large data transfer: Each call returned up to 100 discussions, sending potentially large amounts of data to LLM
    • Token inefficiency: Discussion data (titles, bodies, metadata) sent to LLM even when most discussions don't match criteria
    • Performance: Multiple round-trips to GitHub API during agent execution

Evidence from logs:

"name": "github-list_discussions",
"arguments": "{\"owner\": \"githubnext\", \"repo\": \"gh-aw\", \"perPage\": 100}"

"name": "github-list_discussions", 
"arguments": "{\"owner\": \"githubnext\", \"repo\": \"gh-aw\", \"perPage\": 100, \"after\": \"Y3Vyc29yOnYyOpK0MjAyNS0xMS0xN1QwOToxNjoxOVrOAIuXUw==\"}"

Changes Made

close-old-discussions.md

Added custom step to pre-download and filter discussions:

  1. GraphQL query: Uses GitHub GraphQL API to fetch discussions in batches of 100
  2. Pre-agent filtering: Filters discussions in the custom step, before the agent runs, by:
    • Author: github-actions[bot] only
    • Age: Created more than 7 days ago
  3. Data reduction with jq: Reduces each discussion to only essential fields:
    • number, title, createdAt
    • Removes large fields like body, comments, reactions
  4. JSONL output: Saves filtered data to /tmp/gh-aw/filtered-discussions.jsonl
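A minimal sketch of steps 3 and 4, assuming the fetched discussions are available as a JSON array. The helper name `reduce_discussions` and the sample data are hypothetical and only illustrate the field reduction; the actual step may be structured differently.

```shell
set -eu

# Hypothetical helper: reads a JSON array of discussions on stdin and
# writes one compact JSON object per line (JSONL), keeping only small fields.
reduce_discussions() {
  jq -c '.[] | {number, title, createdAt}'
}

# Illustrative sample data, not taken from the real repository.
sample='[
  {"number": 1, "title": "Daily report", "createdAt": "2025-11-01T00:00:00Z", "body": "long text"},
  {"number": 2, "title": "Weekly report", "createdAt": "2025-11-02T00:00:00Z", "body": "long text"}
]'

reduced=$(printf '%s' "$sample" | reduce_discussions)
printf '%s\n' "$reduced"
```

Because `body` is never selected, it does not reach the output file, which is where most of the size reduction would come from.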

Updated agent instructions:

  • Agent now reads pre-filtered JSONL file instead of calling GitHub API
  • No filtering logic needed in agent - data is already filtered
  • Simple task: Read file and generate close_discussion outputs
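To illustrate why no filtering logic is left for the agent, here is a rough sketch of consuming the JSONL file. It writes sample data to a temporary file rather than the real `/tmp/gh-aw/filtered-discussions.jsonl`, and the discussion numbers are made up.

```shell
set -eu

# Temporary stand-in for /tmp/gh-aw/filtered-discussions.jsonl (sample data only).
jsonl_file=$(mktemp)
cat > "$jsonl_file" <<'EOF'
{"number":101,"title":"Daily report","createdAt":"2025-11-01T00:00:00Z"}
{"number":102,"title":"Weekly report","createdAt":"2025-11-02T00:00:00Z"}
EOF

# Each line is an independent JSON document, so jq can stream the file.
# Every entry is already a discussion that matched the criteria.
numbers=$(jq -r '.number' "$jsonl_file")
count=$(printf '%s\n' "$numbers" | wc -l | tr -d ' ')

echo "discussions to close: $count"
printf '%s\n' "$numbers"
```

Every number printed corresponds to one close_discussion output; the exact shape of that output is defined by the workflow and is not shown here.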

Key optimization points:

  • Pagination handled upfront: Custom step handles all pagination before agent runs
  • Smart filtering with jq: Only necessary fields included in data sent to LLM
  • Zero GitHub API calls from agent: All data pre-fetched and filtered
  • Reduced token usage: LLM receives only matching discussions with minimal fields

Expected Improvements

Performance Metrics

  • API calls reduced: From 2+ paginated calls during agent execution to 0
  • Token usage reduced: Estimated 60-80% reduction by:
    • Pre-filtering non-matching discussions
    • Reducing data to essential fields only
    • Eliminating API response overhead from LLM context
  • Execution time: Faster agent execution (no API waiting)
  • Reliability: More consistent performance regardless of total discussion count

Scalability

  • Workflow now handles repositories with more than 100 discussions efficiently, up to the 1,000-discussion pagination safety limit
  • LLM receives manageable data size regardless of total discussions
  • Pagination handled once upfront vs multiple times during agent turns

Validation

✅ Workflow compiled successfully using gh aw compile:

✓ .github/workflows/close-old-discussions.md (238.0 KB)
[
  {
    "workflow": "close-old-discussions.md",
    "valid": true,
    "errors": [],
    "warnings": []
  }
]

Note: The .lock.yml file will be generated automatically after merge.

Implementation Details

Custom Step Logic

The custom step uses a bash script that:

  1. Calculates cutoff date (7 days ago) using date command
  2. Iterates through discussion pages using GraphQL pagination
  3. Filters discussions inline using jq:
    jq -r --arg cutoff "$CUTOFF_DATE" '
      .data.repository.discussions.nodes 
      | map(select(
          .author.login == "github-actions[bot]" and 
          .createdAt < $cutoff
        ))
      | map({number, title, createdAt, author: .author.login})
    '
  4. Merges results and removes duplicates
  5. Outputs JSONL format for easy parsing by agent
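The steps above can be sketched end to end on two hypothetical pages of results. The cutoff is fixed here for reproducibility (the real step derives it with `date -u -d '7 days ago'`), the page contents are invented, and `unique_by(.number)` is one possible way to remove duplicates, not necessarily the one used in the patch. Comparing `createdAt` as a string works because all timestamps share the same UTC ISO 8601 format, which sorts lexicographically.

```shell
set -eu

cutoff="2025-11-17T00:00:00Z"  # fixed for the demo

# Two invented GraphQL response pages; discussion 1 appears on both.
page1='{"data":{"repository":{"discussions":{"nodes":[
  {"number":1,"title":"A","createdAt":"2025-11-01T00:00:00Z","author":{"login":"github-actions[bot]"}},
  {"number":2,"title":"B","createdAt":"2025-11-20T00:00:00Z","author":{"login":"github-actions[bot]"}}
]}}}}'
page2='{"data":{"repository":{"discussions":{"nodes":[
  {"number":1,"title":"A","createdAt":"2025-11-01T00:00:00Z","author":{"login":"github-actions[bot]"}},
  {"number":3,"title":"C","createdAt":"2025-11-01T00:00:00Z","author":{"login":"octocat"}},
  {"number":4,"title":"D","createdAt":"2025-11-05T00:00:00Z","author":{"login":"github-actions[bot]"}}
]}}}}'

# Same filter as in step 3: keep old discussions by the bot, drop large fields.
filter_page() {
  jq --arg cutoff "$cutoff" '
    .data.repository.discussions.nodes
    | map(select(.author.login == "github-actions[bot]" and .createdAt < $cutoff))
    | map({number, title, createdAt, author: .author.login})
  '
}

# Slurp the per-page arrays, concatenate, deduplicate, and emit JSONL.
merged=$({ printf '%s' "$page1" | filter_page; printf '%s' "$page2" | filter_page; } \
  | jq -c -s 'add | unique_by(.number) | .[]')

printf '%s\n' "$merged"
```

In this sample, discussion 2 is dropped as too recent, discussion 3 is dropped because of its author, and the duplicate of discussion 1 is collapsed, leaving discussions 1 and 4.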

Security Considerations

  • Uses ${{ github.token }} with existing permissions (discussions: read)
  • No elevated permissions required
  • GraphQL query is safe and read-only
  • Pagination safety limit: 10 pages (1000 discussions max)
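The pagination safety limit could be enforced along these lines. This is only a sketch: `fetch_page` is a hypothetical stub standing in for the real `gh api graphql` call, and it deliberately reports that another page always exists so that the cap is what ends the loop.

```shell
set -eu

MAX_PAGES=10  # safety limit: 10 pages x 100 discussions = 1000 max

# Hypothetical stand-in for the GraphQL request.
# Prints "<end_cursor> <has_next_page>" and always claims there is a next page.
fetch_page() {
  echo "cursor-after-$1 true"
}

cursor=""
has_next="true"
pages=0

# Stop when the API reports no further pages or when the cap is reached.
while [ "$has_next" = "true" ] && [ "$pages" -lt "$MAX_PAGES" ]; do
  pages=$((pages + 1))
  result=$(fetch_page "$pages")
  cursor=${result% *}     # text before the space: the next cursor
  has_next=${result#* }   # text after the space: "true" or "false"
done

echo "fetched $pages pages, last cursor: $cursor"
```

Without the page counter, a response that keeps returning a next page would keep the step running until the workflow timeout.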

Testing Recommendations

After merge, test the workflow with:

  1. Manual trigger via workflow_dispatch
  2. Verify discussions are correctly filtered in custom step logs
  3. Confirm agent successfully reads /tmp/gh-aw/filtered-discussions.jsonl
  4. Check that only matching discussions are closed
  5. Monitor token usage compared to previous runs
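For step 2, one possible sanity check is to scan the filtered file for entries that should not be there. The sketch below uses a temporary file with invented entries and a fixed cutoff; in a real run the file would be `/tmp/gh-aw/filtered-discussions.jsonl` and the cutoff would be the value computed by the custom step.

```shell
set -eu

cutoff="2025-11-17T00:00:00Z"  # fixed for the demo

# Sample file with one deliberately invalid (too recent) entry.
filtered_file=$(mktemp)
cat > "$filtered_file" <<'EOF'
{"number":101,"title":"Old report","createdAt":"2025-11-01T00:00:00Z"}
{"number":102,"title":"Recent report","createdAt":"2025-11-20T00:00:00Z"}
EOF

# Collect the numbers of any entries created at or after the cutoff.
too_new=$(jq -r --arg cutoff "$cutoff" 'select(.createdAt >= $cutoff) | .number' "$filtered_file")

if [ -n "$too_new" ]; then
  echo "entries newer than cutoff: $too_new"
else
  echo "all entries are older than $cutoff"
fi
```

An empty result means the filter behaved as intended for the age criterion; a similar check could be written for the author field.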

AI generated by Q


Note

This was originally intended as a pull request, but the git push operation failed.

Workflow Run: View run details and download patch artifact

The patch file is available as an artifact (aw.patch) in the workflow run linked above.
To apply the patch locally:

# Download the artifact from the workflow run https://github.com/githubnext/gh-aw/actions/runs/19622754502
# (Use GitHub MCP tools if gh CLI is not available)
gh run download 19622754502 -n aw.patch
# Apply the patch
git am aw.patch

Patch preview:

From 7ed7c3a19b456a19688d6b34d9232024192c448a Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <github-actions[bot]@users.noreply.github.com>
Date: Mon, 24 Nov 2025 03:56:00 +0000
Subject: [PATCH] Optimize close-old-discussions workflow with pre-filtered
 data

- Add custom step to pre-download discussions using GraphQL API
- Filter discussions by author (github-actions[bot]) and age (>7 days) before sending to LLM
- Use jq to reduce data size and prevent overwhelming the LLM
- Remove need for agent to make multiple paginated API calls
- Agent now only reads pre-filtered JSONL file

This addresses issue #4630 by reducing token usage and API calls.
---
 .github/workflows/close-old-discussions.md | 146 +++++++++++++++------
 1 file changed, 107 insertions(+), 39 deletions(-)

diff --git a/.github/workflows/close-old-discussions.md b/.github/workflows/close-old-discussions.md
index 3f428a8..c074db8 100644
--- a/.github/workflows/close-old-discussions.md
+++ b/.github/workflows/close-old-discussions.md
@@ -19,63 +19,131 @@ safe-outputs:
     max: 100
 timeout-minutes: 10
 strict: true
+steps:
+  - name: Fetch open discussions
+    id: fetch-discussions
+    run: |
+      # Use GraphQL to fetch all open discussions in one query
+      # Filter to only get discussions created by github-actions[bot]
+      # Calculate cutoff date (7 days ago)
+      CUTOFF_DATE=$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)
+      
+      # Fetch discussions with pagination
+      DISCUSSIONS_FILE="/tmp/gh-aw/discussions.json"
+      echo '[]' > "$DISCUSSIONS_FILE"
+      
+      CURSOR=""
+      HAS_NEXT_PAGE=true
+      
+      while [ "$HAS_NEXT_PAGE" = "true" ]; do
+        if [ -z "$CURSOR" ]; then
+          CURSOR_ARG=""
+        else
+          CURSOR_ARG=", after: \"$CURSOR\""
+        fi
+        
+        RESULT=$(gh api graphql -f query="
+          query {
+            repository(owner: \"${{ github.repository_owner }}\", name: \"${{ github.event.repository.name }}\") {
+ 
... (truncated)
