Refactor changeset workflow to scheduled batch processing with cache-based deduplication #3757
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Pull Request Overview
This PR refactors the changeset workflow from a manual, label-triggered single-PR processor to an automated batch processor that runs every 2 hours to generate changesets for recently merged PRs.
Key Changes:
- Trigger mechanism: Changed from `pull_request` with label filtering to scheduled cron execution (`0 */2 * * *`)
- Processing model: Shifted from single PR context to batch processing of all PRs merged in the last 2 hours
- Deduplication: Introduced cache-based tracking using `cache-memory` to prevent duplicate changeset generation
- Output mechanism: Replaced `push-to-pull-request-branch` with `create-pull-request` for submitting changeset files
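To make the "merged in the last 2 hours" window concrete, here is a minimal Python sketch of the lookback check. The function name and the ISO 8601 timestamp format are assumptions for illustration; the PR does not show how the pre-step computes the window.

```python
from datetime import datetime, timedelta, timezone

def merged_within_window(merged_at, now, hours=2):
    """Return True if an ISO 8601 merge timestamp falls in the lookback window.

    `merged_at` is assumed to look like "2025-01-01T10:30:00Z" (hypothetical format).
    """
    merged = datetime.fromisoformat(merged_at.replace("Z", "+00:00"))
    return now - timedelta(hours=hours) <= merged <= now

# A run started at 12:00 UTC looks back to 10:00 UTC.
run_time = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
inside_window = merged_within_window("2025-01-01T10:30:00Z", run_time)
outside_window = merged_within_window("2025-01-01T09:59:59Z", run_time)
```

Scheduled runs are not guaranteed to start exactly on the hour, so consecutive windows can overlap or leave small gaps. Overlap is one reason the cache-based deduplication is useful.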
Reviewed Changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| `.github/workflows/changeset.md` | Removed PR trigger and label conditions; added schedule trigger; introduced cache-memory for tracking processed PRs; added pre-step to fetch merged PR data; updated agent instructions for batch processing; changed safe-output from push-to-pull-request-branch to create-pull-request; duplicated workflow content exists (lines 74-202 and 203-298) |
| `.github/workflows/changeset.lock.yml` | Compiled workflow reflecting all changes from the markdown source; removed pre_activation job and associated permission checks; removed push_to_pull_request_branch job; added create_pull_request job; added cache-memory setup with restore keys; simplified concurrency group; duplicated prompt content in lines 1288-1415 and 1417-1512 |
| changeset-processed-prs- | ||
| changeset-processed- | ||
| changeset- |
[nitpick] The restore-keys for the cache include keys with incomplete prefixes (`changeset-processed-`, `changeset-`) that don't match the actual cache key pattern `changeset-processed-prs-${{ github.workflow }}`.
The restore keys should follow a hierarchical fallback pattern that matches the actual key structure:

    restore-keys: |
      changeset-processed-prs-${{ github.workflow }}-

The current restore-keys would only be useful if there were other cache entries with those prefixes, but based on the configuration, only keys starting with `changeset-processed-prs-` exist.
| changeset-processed-prs- | |
| changeset-processed- | |
| changeset- |
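The fallback behaviour discussed above can be modelled in a few lines of Python. This is a simplified sketch of prefix-based restore-key resolution, not the cache action's implementation: the in-memory `entries` store, the numeric suffixes on the keys, and the function name are all hypothetical, and branch scoping is ignored.

```python
from typing import Dict, List, Optional

def resolve_cache(key: str, restore_keys: List[str], entries: Dict[str, int]) -> Optional[str]:
    """Pick a cache entry: exact key first, then each restore key as a prefix.

    `entries` maps cache key -> creation time (hypothetical store).
    """
    if key in entries:
        return key
    for prefix in restore_keys:
        matches = [k for k in entries if k.startswith(prefix)]
        if matches:
            # Among prefix matches, prefer the most recently created entry.
            return max(matches, key=lambda k: entries[k])
    return None

entries = {
    "changeset-processed-prs-changeset-100": 100,
    "changeset-processed-prs-changeset-200": 200,
}
primary = "changeset-processed-prs-changeset-300"  # no exact match exists yet
narrow = ["changeset-processed-prs-changeset-"]
broad = narrow + ["changeset-processed-prs-", "changeset-processed-", "changeset-"]

narrow_hit = resolve_cache(primary, narrow, entries)
broad_hit = resolve_cache(primary, broad, entries)
```

In this model both lists resolve to the same entry, which is the reviewer's point: the broader prefixes only matter if entries with other key shapes exist.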
| - Format as a clear table or list
[nitpick] The instruction states "Format as a clear table or list" for the PR description but doesn't provide a specific format example. Given the structured nature of the data (PR numbers, titles, changeset files), providing a template would help ensure consistency.
Recommendation: Add an example PR description format:
## Changesets Generated
| PR | Title | Changeset File |
|----|-------|----------------|
| #123 | Fix rendering bug | `patch-pr-123-fix-rendering-bug.md` |
| #456 | Update documentation | `patch-pr-456-update-docs.md` |
Generated changesets for 2 merged PRs from the last 2 hours.
| - Format as a clear table or list | |
| - Format as a clear table or list | |
| - Example PR description format: | |
| ```markdown | |
| ## Changesets Generated | |
| | PR | Title | Changeset File | | |
| |----|-------|----------------| | |
| | #123 | Fix rendering bug | `patch-pr-123-fix-rendering-bug.md` | | |
| | #456 | Update documentation | `patch-pr-456-update-docs.md` | | |
| Generated changesets for 2 merged PRs from the last 2 hours. | |
| ``` |
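The suggested template is regular enough to generate mechanically. Below is a minimal Python sketch that renders it from a list of `(number, title, filename)` tuples; the function name and input shape are assumptions for illustration, since the workflow leaves this formatting to the agent.

```python
def render_pr_description(rows):
    """Render the PR description table from (number, title, filename) tuples."""
    lines = [
        "## Changesets Generated",
        "",
        "| PR | Title | Changeset File |",
        "|----|-------|----------------|",
    ]
    for number, title, filename in rows:
        safe_title = title.replace("|", "\\|")  # keep table cells intact
        lines.append(f"| #{number} | {safe_title} | `{filename}` |")
    count = len(rows)
    plural = "" if count == 1 else "s"
    lines.append("")
    lines.append(f"Generated changesets for {count} merged PR{plural} from the last 2 hours.")
    return "\n".join(lines)

description = render_pr_description([
    (123, "Fix rendering bug", "patch-pr-123-fix-rendering-bug.md"),
    (456, "Update documentation", "patch-pr-456-update-docs.md"),
])
```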
| # Changeset Generator for Merged PRs
|
| You are the Changeset Generator agent - responsible for automatically creating changeset files for recently merged pull requests.
|
| ## Mission
|
| When pull requests are merged to the default branch, analyze the changes and create properly formatted changeset files that document the changes according to the changeset specification.
|
| ## Current Context
|
| - **Repository**: ${{ github.repository }}
| - **Analysis Period**: Last 2 hours
| - **Cache Location**: `/tmp/gh-aw/cache-memory/` - Used to track which PRs have been processed
| - **PR Data Location**: `/tmp/gh-aw/pr-data/recent-merged-prs.json` - Pre-fetched merged PR data
|
| ## Task Overview
|
| ### Phase 1: Load and Filter PR Data
|
| 1. **Load the pre-fetched PR data** from `/tmp/gh-aw/pr-data/recent-merged-prs.json`
| 2. **Check the cache** in `/tmp/gh-aw/cache-memory/` to identify which PRs have already been processed
| - The cache should contain a file or data structure tracking processed PR numbers
| - Create a simple tracking mechanism (e.g., a JSON file with processed PR numbers)
| 3. **Filter out already-processed PRs** to get the list of PRs that need changeset files
|
| ### Phase 2: Generate Changeset Files
|
| For each unprocessed merged PR:
|
| 1. **Analyze the Pull Request**: Review the PR title and body to understand what has been modified
| 2. **Use the repository name as the package identifier** (gh-aw)
| 3. **Determine the Change Type**:
| - **major**: Major breaking changes (X.0.0) - Very unlikely, probably should be **minor**
| - **minor**: Breaking changes in the CLI (0.X.0) - indicated by "BREAKING CHANGE" or major API changes
| - **patch**: Bug fixes, docs, refactoring, internal changes, tooling, new shared workflows (0.0.X)
|
| **Important**: Internal changes, tooling, and documentation are always "patch" level.
|
| 4. **Generate the Changeset File**:
| - Create file in `.changeset/` directory (already created by pre-step)
| 4. **Generate ONE Changeset File per PR**:
| - Create file in `.changeset/` directory
| - Use format from the changeset format reference above
| - Filename: `<type>-<short-description>.md` (e.g., `patch-fix-bug.md`)
|
| 5. **Commit and Push Changes**:
| - Git is already configured by pre-step
| - Add and commit the changeset file using git commands:
| ```bash
| git add .changeset/<filename> && git commit -m "Add changeset"
| ```
| - **CRITICAL**: You MUST call the `push_to_pull_request_branch` tool to push your changes:
| ```javascript
| push_to_pull_request_branch({
| message: "Add changeset for this pull request"
| })
| ```
| - The `branch` parameter is optional - it will automatically detect the current PR branch
| - This tool call is REQUIRED for your changes to be pushed to the pull request
| - **WARNING**: If you don't call this tool, your changeset file will NOT be pushed and the job will be skipped
| - Filename: `<type>-pr-<pr-number>-<short-description>.md` (e.g., `patch-pr-123-fix-bug.md`)
| - Include PR number in the changeset description for traceability
|
| 5. **Update the cache** to mark this PR as processed:
| - Add the PR number to your tracking file in `/tmp/gh-aw/cache-memory/`
|
| ### Phase 3: Create Pull Request
|
| After generating all changeset files:
|
| 1. **Git operations are already configured** by the pre-step
| 2. **Stage and commit all changeset files**:
| ```bash
| git add .changeset/*.md
| git commit -m "Add changesets for merged PRs"
| ```
|
| 3. **The safe-outputs create-pull-request will automatically**:
| - Create a new branch
| - Push your changes
| - Create a PR with the changeset files
| - Use title: `[changeset] Add changesets for merged PRs`
|
| 4. **Include in the PR description**:
| - List of PRs processed with their numbers and titles
| - Summary of changeset files created
| - Any notes about the changes
|
| ## Guidelines
|
| - **Be Accurate**: Analyze the PR content carefully to determine the correct change type
| - **Be Clear**: The changeset description should clearly explain what changed
| - **Be Accurate**: Analyze each PR content carefully to determine the correct change type
| - **Be Clear**: Each changeset description should clearly explain what changed
| - **Be Concise**: Keep descriptions brief but informative
| - **Follow Conventions**: Use the exact changeset format specified above
| - **Single Package Default**: If unsure about package structure, default to "gh-aw"
| - **Smart Naming**: Use descriptive filenames that indicate the change (e.g., `patch-fix-rendering-bug.md`)
| - **Single Package Default**: Always use "gh-aw" as the package identifier
| - **Track Progress**: Always update the cache after processing each PR
| - **One File Per PR**: Each PR gets exactly one changeset file
| - **Smart Naming**: Include PR number in filename for easy tracking (e.g., `patch-pr-456-update-docs.md`)
|
| ## Example Changeset File
|
| ```markdown
| ---
| "gh-aw": patch
| ---
|
| Fixed rendering bug in console output (PR #456)
| ```
|
| ## Important Notes
|
| - The PR data is already fetched - it's in `/tmp/gh-aw/pr-data/recent-merged-prs.json`
| - Use the cache to avoid processing the same PR twice
| - Process all unprocessed PRs in a single workflow run
| - If there are no unprocessed PRs, simply do nothing (the job won't run due to conditional)
The workflow markdown content contains a complete duplication of the agent instructions. Lines 74-202 are nearly identical to lines 203-298, with only minor variations in wording. This duplication significantly increases the token count sent to the AI agent and could cause confusion.
Recommendation: Remove the duplicate section (either lines 74-202 or 203-298) and keep only one consolidated version of the instructions.
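For reference, the changeset file format shown in the agent instructions (YAML front matter naming the package and bump level, then a one-line summary) can be produced with a small helper. The following Python sketch is illustrative only; the function name and validation are assumptions, and the workflow itself has the agent write these files.

```python
VALID_TYPES = ("major", "minor", "patch")

def render_changeset(package, change_type, summary, pr_number):
    """Build the text of one changeset file, with the PR number for traceability."""
    if change_type not in VALID_TYPES:
        raise ValueError(f"unknown change type: {change_type!r}")
    return (
        "---\n"
        f'"{package}": {change_type}\n'
        "---\n"
        "\n"
        f"{summary} (PR #{pr_number})\n"
    )

example = render_changeset("gh-aw", "patch", "Fixed rendering bug in console output", 456)
```

Rejecting unknown bump levels early keeps a malformed file from reaching the generated pull request.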
The workflow markdown content is duplicated in the compiled workflow. Lines 1288-1415 are nearly identical to lines 1417-1512. This duplication wastes token budget when the prompt is sent to the AI agent and could lead to conflicting instructions.
Recommendation: Remove the duplicate section in the source markdown file to prevent it from appearing twice in the compiled workflow.
See below for a potential fix:
# (Lines 1417-1474 removed; no replacement needed. The workflow continues with the next step.)
| @@ -54,31 +51,15 @@ permissions:
| pull-requests: read
|
| concurrency:
| group: "gh-aw-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}"
| cancel-in-progress: true
| group: "gh-aw-${{ github.workflow }}"
[nitpick] The concurrency group has changed from gh-aw-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }} to gh-aw-${{ github.workflow }}. This means only one instance of the entire workflow can run at a time, rather than allowing concurrent runs for different PRs.
While this may be intentional for a scheduled workflow that processes all PRs in batch, the cancel-in-progress: true setting was also removed. This could lead to multiple scheduled runs queuing up if a previous run takes longer than 2 hours.
Recommendation: Consider adding cancel-in-progress: true to prevent queue buildup, or document why queuing multiple runs is acceptable.
| group: "gh-aw-${{ github.workflow }}" | |
| group: "gh-aw-${{ github.workflow }}" | |
| cancel-in-progress: true |
| - Filename: `<type>-pr-<pr-number>-<short-slug>.md` (e.g., `patch-pr-123-fix-bug.md`) | ||
| - Use lowercase and hyphens for the slug | ||
| - Keep the slug short (2-4 words max) |
[nitpick] The filename pattern includes `<short-slug>`, and the example `patch-pr-123-fix-bug.md` uses "fix-bug" as the slug, but line 130 only says "Use format from the changeset format reference above" and doesn't specify what the slug should be derived from.
Recommendation: Clarify that the slug should be derived from the PR title (as mentioned in line 175: "Use the PR title") and provide clearer guidance:
- Filename: `<type>-pr-<pr-number>-<short-slug>.md`
- Slug: Derived from PR title, lowercase, hyphens, 2-4 words max
- Example: PR #123 "Fix rendering bug" → `patch-pr-123-fix-rendering-bug.md`
| - Filename: `<type>-pr-<pr-number>-<short-slug>.md` (e.g., `patch-pr-123-fix-bug.md`) | |
| - Use lowercase and hyphens for the slug | |
| - Keep the slug short (2-4 words max) | |
| - Filename: `<type>-pr-<pr-number>-<short-slug>.md` | |
| - **Derive `<short-slug>` from the PR title**: | |
| - Take the PR title, convert to lowercase | |
| - Replace spaces and punctuation with hyphens | |
| - Use 2-4 words max for brevity | |
| - Example: PR #123 "Fix rendering bug" → `patch-pr-123-fix-rendering-bug.md` |
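The slug rules above (lowercase, hyphens instead of spaces and punctuation, at most four words) can be sketched in Python as follows. The function name, the `max_words` parameter, and the `"change"` fallback for titles with no alphanumeric characters are assumptions added for illustration.

```python
import re

def changeset_filename(change_type, pr_number, title, max_words=4):
    """Derive `<type>-pr-<pr-number>-<short-slug>.md` from a PR title."""
    # Lowercase, then treat every run of non-alphanumerics as a word break.
    words = re.sub(r"[^a-z0-9]+", " ", title.lower()).split()
    slug = "-".join(words[:max_words]) or "change"  # hypothetical fallback
    return f"{change_type}-pr-{pr_number}-{slug}.md"

name = changeset_filename("patch", 123, "Fix rendering bug")
long_name = changeset_filename("patch", 456, "Update documentation: CLI & workflows (part 2)")
```

Truncating by word count rather than character count keeps slugs readable, at the cost of occasionally long filenames.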
| - The PR data is already fetched - it's in `/tmp/gh-aw/pr-data/recent-merged-prs.json` | ||
| - Use the cache to avoid processing the same PR twice | ||
| - Process all unprocessed PRs in a single workflow run | ||
| - If there are no unprocessed PRs, simply do nothing (the job won't run due to conditional) |
The "Important Notes" section (line 295) states "If there are no unprocessed PRs, simply do nothing (the job won't run due to conditional)" but there is no conditional check in the workflow that would prevent the job from running. The workflow relies on the agent to perform early exits, not on workflow-level conditionals.
This statement is misleading and inconsistent with lines 97-100 and 110-113 which correctly describe early exit behavior within the agent execution.
Recommendation: Update line 298 to match the accurate description: "If there are no unprocessed PRs, just exit without doing anything - this is normal and expected"
| - If there are no unprocessed PRs, simply do nothing (the job won't run due to conditional) | |
| - If there are no unprocessed PRs, just exit without doing anything - this is normal and expected |
| (needs.pre_activation.outputs.activated == 'true') && (((github.event.pull_request.base.ref == github.event.repository.default_branch) && | ||
| ((github.event_name != 'pull_request') || (github.event.pull_request.head.repo.id == github.repository_id))) && | ||
| ((github.event_name != 'pull_request') || ((github.event.action != 'labeled') || (github.event.label.name == 'changeset' || | ||
| github.event.label.name == 'smoke')))) | ||
| runs-on: ubuntu-slim |
The activation job no longer has an if condition to control when it runs. Previously it checked needs.pre_activation.outputs.activated == 'true' and various event conditions. Now the job always runs on schedule, which may be intentional but could lead to unnecessary executions.
Recommendation: Consider adding a condition like if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' to explicitly control when the activation job runs, or document why the unconditional execution is desired.
| runs-on: ubuntu-slim | |
| runs-on: ubuntu-slim | |
| if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' |
| 2. **Stage and commit all changeset files**: | ||
| ```bash | ||
| git add .changeset/*.md | ||
| git add /tmp/gh-aw/cache-memory/processed-prs.json |
The instruction to add /tmp/gh-aw/cache-memory/processed-prs.json to the git commit (line 148) is incorrect. The cache-memory directory is outside the repository working directory and should not be committed to the PR. Only the changeset files in .changeset/ should be committed.
The cache is persisted via GitHub Actions cache mechanism, not via git commits.
Recommendation: Remove line 148 from the instructions. The correct git operations should be:
    git add .changeset/*.md
    git commit -m "Add changesets for merged PRs"

| git add /tmp/gh-aw/cache-memory/processed-prs.json |
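Since the tracking file lives in the cache directory rather than in the repository, updating it is a plain file write with no git involvement. A minimal Python sketch, assuming the file is a JSON array of PR numbers as the PR description states; the function name is hypothetical, and a temporary directory stands in for `/tmp/gh-aw/cache-memory/`.

```python
import json
import tempfile
from pathlib import Path

def mark_processed(cache_file, pr_numbers):
    """Merge pr_numbers into the JSON array stored at cache_file."""
    try:
        processed = set(json.loads(cache_file.read_text()))
    except FileNotFoundError:
        processed = set()  # first run: no tracking file yet
    processed.update(pr_numbers)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    ordered = sorted(processed)
    cache_file.write_text(json.dumps(ordered))
    return ordered

# Demo: a temporary directory stands in for the real cache location.
with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cache-memory" / "processed-prs.json"
    first = mark_processed(cache_file, [456, 123])
    second = mark_processed(cache_file, [123, 789])
```

Storing a sorted, deduplicated array keeps the file stable across runs, so re-marking a PR is harmless.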
| **IMPORTANT - Token Optimization**: The pull request content above is already sanitized and available. DO NOT use `pull_request_read` or similar GitHub API tools to fetch PR details - you already have everything you need in the context above. Using API tools wastes 40k+ tokens per call.
| 4. **Filter out already-processed PRs** to get the list of PRs that need changeset files
| - Use jq to filter: `jq --argjson processed "$(cat /tmp/gh-aw/cache-memory/processed-prs.json)" '[.[] | select([.number] | inside($processed) | not)]' /tmp/gh-aw/pr-data/recent-merged-prs.json`
The jq filter for removing already-processed PRs uses the `inside` function, which is easy to misread. In jq, `a | inside(b)` is equivalent to `b | contains(a)`, so `[.number] | inside($processed)` asks whether the processed array contains the PR number.
For numeric PR numbers this does work as written: with `processed = [123, 456]` and `.number = 123`, `[123] | inside([123, 456])` returns `true`, and the trailing `| not` drops the PR as intended.
A more explicit form is: `.number as $n | $processed | contains([$n]) | not`
Alternatively, use: `select(.number as $n | ($processed | index($n)) == null)`
Note that `contains` matches strings by substring, so the `index` form is also the safer choice if PR numbers are ever stored as strings.
| - Use jq to filter: `jq --argjson processed "$(cat /tmp/gh-aw/cache-memory/processed-prs.json)" '[.[] | select([.number] | inside($processed) | not)]' /tmp/gh-aw/pr-data/recent-merged-prs.json` | |
| - Use jq to filter: `jq --argjson processed "$(cat /tmp/gh-aw/cache-memory/processed-prs.json)" '[.[] | select(.number as $n | ($processed | index($n)) == null)]' /tmp/gh-aw/pr-data/recent-merged-prs.json` |
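The intended behaviour of the filter (keep only PRs whose number is absent from the processed list) is easy to state outside jq. Here is a minimal Python sketch of the same logic; it assumes each PR record is a dict with a `number` field, as the jq expression implies, and the `title` values are made-up sample data.

```python
def filter_unprocessed(prs, processed):
    """Return the PR records whose number is not in the processed list."""
    seen = set(processed)  # set membership avoids rescanning the list per PR
    return [pr for pr in prs if pr["number"] not in seen]

recent = [
    {"number": 123, "title": "Fix rendering bug"},
    {"number": 789, "title": "Update documentation"},
]
todo = filter_unprocessed(recent, [123, 456])
```

With `processed = [123, 456]`, only PR 789 remains, which is the result either suggested jq form is meant to produce.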
Changeset generation previously required manual PR labeling and processed one PR at a time. Refactored to run every 2 hours, batch-process merged PRs, and track processed PRs via cache memory.
Changes
Trigger & Scheduling
- `pull_request` label-based trigger to `schedule: cron: "0 */2 * * *"`
- `workflow_dispatch` for manual execution
PR Query & Filtering
- `gh search prs`
- `/tmp/gh-aw/pr-data/recent-merged-prs.json`
Cache-Based Deduplication
- `cache-memory` with key `changeset-processed-prs-${{ github.workflow }}`
- `/tmp/gh-aw/cache-memory/processed-prs.json` as JSON array of processed PR numbers
Batch Processing
- `<type>-pr-<number>-<slug>.md`
Safe Output
- `push-to-pull-request-branch` to `create-pull-request`
- `[changeset]` and labels `[changeset, automation]`
Example
Workflow now processes multiple PRs automatically:
Agent creates changesets like `patch-pr-456-update-docs.md` and updates cache to prevent reprocessing.