
Refactor changeset workflow to scheduled batch processing with cache-based deduplication #3757

Merged: pelikhan merged 2 commits into main from copilot/refactor-agentic-workflow on Nov 12, 2025

Copilot AI commented Nov 12, 2025

Changeset generation previously required manual PR labeling and processed one PR at a time. Refactored to run every 2 hours, batch-process merged PRs, and track processed PRs via cache memory.

Changes

Trigger & Scheduling

  • Changed from pull_request label-based trigger to schedule: cron: "0 */2 * * *"
  • Retained workflow_dispatch for manual execution
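
To make the schedule concrete, here is a minimal Python sketch of when the cron expression `0 */2 * * *` nominally fires: at minute 0 of every even hour, evaluated in UTC on GitHub Actions. The helper name `next_run` is hypothetical, and scheduled runs can start late under load, so the result is a nominal time rather than a guarantee.

```python
from datetime import datetime, timedelta, timezone


def next_run(now: datetime) -> datetime:
    """Return the next nominal firing time of the cron expression "0 */2 * * *".

    The expression matches minute 0 of hours 0, 2, 4, ..., 22.
    """
    # Move to the top of the next hour, then skip ahead if that hour is odd.
    candidate = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    if candidate.hour % 2 == 1:
        candidate += timedelta(hours=1)
    return candidate


if __name__ == "__main__":
    now = datetime(2025, 11, 12, 19, 19, tzinfo=timezone.utc)
    print(next_run(now).isoformat())
```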

PR Query & Filtering

  • Added step to query merged PRs from last 4 hours using gh search prs
  • Filters to 2-hour window for primary processing (4-hour window catches stragglers)
  • Saves PR data to /tmp/gh-aw/pr-data/recent-merged-prs.json
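
The two windows can be illustrated with a short Python sketch. It assumes each PR record carries an ISO 8601 `mergedAt` string, as in the JSON fields requested by the query; the function names and sample PR numbers are hypothetical and are not taken from the workflow.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2025-11-12T21:59:00Z"."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def merged_since(prs: list[dict], now: datetime, hours: int) -> list[dict]:
    """Keep PRs whose mergedAt falls within the last `hours` hours."""
    cutoff = now - timedelta(hours=hours)
    return [pr for pr in prs if parse_ts(pr["mergedAt"]) >= cutoff]


if __name__ == "__main__":
    now = datetime(2025, 11, 12, 22, 0, tzinfo=timezone.utc)
    prs = [
        {"number": 101, "mergedAt": "2025-11-12T21:30:00Z"},  # inside both windows
        {"number": 102, "mergedAt": "2025-11-12T19:00:00Z"},  # only in the 4-hour window
        {"number": 103, "mergedAt": "2025-11-12T17:00:00Z"},  # outside both windows
    ]
    queried = merged_since(prs, now, hours=4)      # stand-in for the search result
    primary = merged_since(queried, now, hours=2)  # the 2-hour primary window
    print([pr["number"] for pr in queried], [pr["number"] for pr in primary])
```

A PR that lands only in the wider window (102 here) is a straggler: it is still available for processing if an earlier run missed it.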

Cache-Based Deduplication

  • Added cache-memory with key changeset-processed-prs-${{ github.workflow }}
  • Agent maintains /tmp/gh-aw/cache-memory/processed-prs.json as JSON array of processed PR numbers
  • Skips already-processed PRs to prevent duplicate changesets
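
As an illustration of the tracking file, the sketch below reads and updates a JSON array of PR numbers. It is written in Python for clarity only; in the workflow the agent maintains the file itself. The function names are hypothetical, and a temporary directory stands in for `/tmp/gh-aw/cache-memory/`.

```python
import json
import tempfile
from pathlib import Path


def load_processed(cache_file: Path) -> set[int]:
    """Return processed PR numbers; a missing cache file means none yet."""
    if not cache_file.exists():
        return set()
    return set(json.loads(cache_file.read_text(encoding="utf-8")))


def mark_processed(cache_file: Path, pr_number: int) -> set[int]:
    """Add one PR number to the cache file and return the updated set."""
    processed = load_processed(cache_file)
    processed.add(pr_number)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    # Store a sorted JSON array so the file stays stable between runs.
    cache_file.write_text(json.dumps(sorted(processed)), encoding="utf-8")
    return processed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache_file = Path(tmp) / "processed-prs.json"
        mark_processed(cache_file, 456)
        mark_processed(cache_file, 123)
        mark_processed(cache_file, 456)  # a repeated number changes nothing
        print(cache_file.read_text())    # [123, 456]
```

Writing the file after each PR, rather than once at the end, means a run that fails midway still records the PRs it finished.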

Batch Processing

  • Agent processes all unprocessed PRs in single run
  • Generates one changeset file per PR: <type>-pr-<number>-<slug>.md
  • Implements early exit if no PRs need processing

Safe Output

  • Changed from push-to-pull-request-branch to create-pull-request
  • Creates PR with title prefix [changeset] and labels [changeset, automation]

Example

Workflow now processes multiple PRs automatically:

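# Query PRs merged in the last 4 hours
gh search prs --repo ${{ github.repository }} \
  --merged --merged-at ">=${FOUR_HOURS_AGO}" \
  --json number,title,mergedAt,body,labels,url,author

# Filter to the 2-hour window (TWO_HOURS_AGO is assumed to be set the same way as FOUR_HOURS_AGO)
jq --arg two_hours_ago "${TWO_HOURS_AGO}" \
  '[.[] | select(.mergedAt >= $two_hours_ago)]' all-merged-prs.json

# Check the cache and skip already-processed PRs
jq --argjson processed "$(cat /tmp/gh-aw/cache-memory/processed-prs.json)" \
  '[.[] | select([.number] | inside($processed) | not)]' recent-prs.json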

Agent creates changesets like patch-pr-456-update-docs.md and updates cache to prevent reprocessing.
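
For reference, the content of such a file can be sketched as below. The layout follows the "Example Changeset File" shown in the reviewed workflow instructions (front matter naming the `gh-aw` package and change type, then a one-line summary with the PR number). The function `render_changeset` is a hypothetical helper, not part of the workflow.

```python
def render_changeset(change_type: str, pr_number: int, summary: str,
                     package: str = "gh-aw") -> str:
    """Render the text of one changeset file for a single merged PR."""
    if change_type not in {"major", "minor", "patch"}:
        raise ValueError(f"unknown change type: {change_type!r}")
    # Front matter maps the package to its change type; the body is the summary.
    return (
        "---\n"
        f'"{package}": {change_type}\n'
        "---\n"
        "\n"
        f"{summary} (PR #{pr_number})\n"
    )


if __name__ == "__main__":
    print(render_changeset("patch", 456, "Fixed rendering bug in console output"))
```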

Original prompt

Refactor the changeset agentic workflow to run every 2 hours, look at the latest merged pull requests, and generate a changeset for each of them.

  • add a custom job that queries pull requests from the last 2 hours and creates an output indicating whether there is work to do. The agent task "if" must check that output
  • only process at most the last 4h
  • use cache memory to determine if PR was processed
  • 1 changeset file per pull request
  • create pull requests with files if any
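
One way the "is there work to do" output from the first bullet could be produced is sketched below. It relies on the documented GitHub Actions mechanism of appending `name=value` lines to the file named by `GITHUB_OUTPUT`. The output name `has_work` and the function are assumptions for illustration; the merged workflow may not define such an output.

```python
import os
import tempfile


def write_has_work(unprocessed_count: int, output_path: str) -> str:
    """Append a has_work=true|false line to a GitHub Actions output file."""
    value = "true" if unprocessed_count > 0 else "false"
    line = f"has_work={value}"
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line


if __name__ == "__main__":
    # On a runner, GITHUB_OUTPUT points at the step's output file;
    # a temporary file stands in for it when run locally.
    fd, fallback = tempfile.mkstemp()
    os.close(fd)
    path = os.environ.get("GITHUB_OUTPUT", fallback)
    print(write_has_work(3, path))
```

A downstream job could then guard itself with an `if` on that output, provided the step output is also exposed as a job output.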


Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Refactor changeset workflow to run every 2 hours" to "Refactor changeset workflow to scheduled batch processing with cache-based deduplication" Nov 12, 2025
Copilot AI requested a review from pelikhan November 12, 2025 19:19
pelikhan marked this pull request as ready for review November 12, 2025 21:59
Copilot AI review requested due to automatic review settings November 12, 2025 21:59
pelikhan merged commit 1149aa2 into main Nov 12, 2025
96 checks passed
pelikhan deleted the copilot/refactor-agentic-workflow branch November 12, 2025 21:59

Copilot AI left a comment

Pull Request Overview

This PR refactors the changeset workflow from a manual, label-triggered single-PR processor to an automated batch processor that runs every 2 hours to generate changesets for recently merged PRs.

Key Changes:

  • Trigger mechanism: Changed from pull_request with label filtering to scheduled cron execution (0 */2 * * *)
  • Processing model: Shifted from single PR context to batch processing of all PRs merged in the last 2 hours
  • Deduplication: Introduced cache-based tracking using cache-memory to prevent duplicate changeset generation
  • Output mechanism: Replaced push-to-pull-request-branch with create-pull-request for submitting changeset files

Reviewed Changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 12 comments.

| File | Description |
|------|-------------|
| .github/workflows/changeset.md | Removed PR trigger and label conditions; added schedule trigger; introduced cache-memory for tracking processed PRs; added pre-step to fetch merged PR data; updated agent instructions for batch processing; changed safe-output from push-to-pull-request-branch to create-pull-request; duplicated workflow content exists (lines 74-202 and 203-298) |
| .github/workflows/changeset.lock.yml | Compiled workflow reflecting all changes from the markdown source; removed pre_activation job and associated permission checks; removed push_to_pull_request_branch job; added create_pull_request job; added cache-memory setup with restore keys; simplified concurrency group; duplicated prompt content in lines 1288-1415 and 1417-1512 |


Comment on lines +197 to +199
changeset-processed-prs-
changeset-processed-
changeset-

Copilot AI Nov 12, 2025

[nitpick] The restore-keys for the cache include keys with incomplete prefixes (changeset-processed-, changeset-) that don't match the actual cache key pattern changeset-processed-prs-${{ github.workflow }}.

The restore keys should follow a hierarchical fallback pattern that matches the actual key structure:

restore-keys: |
  changeset-processed-prs-${{ github.workflow }}-

The current restore-keys would only be useful if there were other cache entries with those prefixes, but based on the configuration, only keys starting with changeset-processed-prs- exist.
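
To show why broad prefixes matter, here is a simplified Python model of cache lookup: an exact key match wins, otherwise each restore key is tried in order as a prefix and the newest matching entry is returned. This is a sketch of the documented prefix-matching behaviour only; it ignores branch scoping, and the key names and numeric suffixes are made up for the example.

```python
from typing import Optional


def resolve_cache(key: str, restore_keys: list[str],
                  existing: list[str]) -> Optional[str]:
    """Pick a cache entry: exact key first, then the first matching prefix.

    `existing` is ordered oldest to newest; among prefix matches the newest wins.
    """
    if key in existing:
        return key
    for prefix in restore_keys:
        matches = [k for k in existing if k.startswith(prefix)]
        if matches:
            return matches[-1]
    return None


if __name__ == "__main__":
    existing = [
        "changeset-other-1",              # hypothetical unrelated cache
        "changeset-processed-prs-wf-1",
        "changeset-processed-prs-wf-2",
    ]
    # A narrow prefix falls back to the newest tracking cache.
    print(resolve_cache("changeset-processed-prs-wf-3",
                        ["changeset-processed-prs-wf-"], existing))
    # A broad prefix can restore an unrelated entry when nothing narrower exists.
    print(resolve_cache("changeset-processed-prs-wf-3",
                        ["changeset-"], ["changeset-other-1"]))
```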

Suggested change
changeset-processed-prs-
changeset-processed-
changeset-

Comment on lines +162 to +163
- Format as a clear table or list


Copilot AI Nov 12, 2025

[nitpick] The instruction states "Format as a clear table or list" for the PR description but doesn't provide a specific format example. Given the structured nature of the data (PR numbers, titles, changeset files), providing a template would help ensure consistency.

Recommendation: Add an example PR description format:

## Changesets Generated

| PR | Title | Changeset File |
|----|-------|----------------|
| #123 | Fix rendering bug | `patch-pr-123-fix-rendering-bug.md` |
| #456 | Update documentation | `patch-pr-456-update-docs.md` |

Generated changesets for 2 merged PRs from the last 2 hours.
Suggested change
- Format as a clear table or list
- Format as a clear table or list
- Example PR description format:
```markdown
## Changesets Generated
| PR | Title | Changeset File |
|----|-------|----------------|
| #123 | Fix rendering bug | `patch-pr-123-fix-rendering-bug.md` |
| #456 | Update documentation | `patch-pr-456-update-docs.md` |
Generated changesets for 2 merged PRs from the last 2 hours.
```
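
The template above could also be produced mechanically. The following Python sketch builds the same table from a list of (PR number, title, changeset file) rows; `build_description` is a hypothetical helper, and escaping `|` in titles is an added precaution rather than something the review asks for.

```python
def build_description(rows: list[tuple[int, str, str]]) -> str:
    """Build the PR description table from (number, title, filename) rows."""
    lines = [
        "## Changesets Generated",
        "",
        "| PR | Title | Changeset File |",
        "|----|-------|----------------|",
    ]
    for number, title, filename in rows:
        safe_title = title.replace("|", "\\|")  # keep table columns intact
        lines.append(f"| #{number} | {safe_title} | `{filename}` |")
    noun = "PR" if len(rows) == 1 else "PRs"
    lines += ["", f"Generated changesets for {len(rows)} merged {noun} from the last 2 hours."]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_description([
        (123, "Fix rendering bug", "patch-pr-123-fix-rendering-bug.md"),
        (456, "Update documentation", "patch-pr-456-update-docs.md"),
    ]))
```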

Comment on lines +203 to +298
# Changeset Generator for Merged PRs

You are the Changeset Generator agent - responsible for automatically creating changeset files for recently merged pull requests.

## Mission

When pull requests are merged to the default branch, analyze the changes and create properly formatted changeset files that document the changes according to the changeset specification.

## Current Context

- **Repository**: ${{ github.repository }}
- **Analysis Period**: Last 2 hours
- **Cache Location**: `/tmp/gh-aw/cache-memory/` - Used to track which PRs have been processed
- **PR Data Location**: `/tmp/gh-aw/pr-data/recent-merged-prs.json` - Pre-fetched merged PR data

## Task Overview

### Phase 1: Load and Filter PR Data

1. **Load the pre-fetched PR data** from `/tmp/gh-aw/pr-data/recent-merged-prs.json`
2. **Check the cache** in `/tmp/gh-aw/cache-memory/` to identify which PRs have already been processed
- The cache should contain a file or data structure tracking processed PR numbers
- Create a simple tracking mechanism (e.g., a JSON file with processed PR numbers)
3. **Filter out already-processed PRs** to get the list of PRs that need changeset files

### Phase 2: Generate Changeset Files

For each unprocessed merged PR:

1. **Analyze the Pull Request**: Review the PR title and body to understand what has been modified
2. **Use the repository name as the package identifier** (gh-aw)
3. **Determine the Change Type**:
- **major**: Major breaking changes (X.0.0) - Very unlikely, probably should be **minor**
- **minor**: Breaking changes in the CLI (0.X.0) - indicated by "BREAKING CHANGE" or major API changes
- **patch**: Bug fixes, docs, refactoring, internal changes, tooling, new shared workflows (0.0.X)

**Important**: Internal changes, tooling, and documentation are always "patch" level.

4. **Generate the Changeset File**:
- Create file in `.changeset/` directory (already created by pre-step)
4. **Generate ONE Changeset File per PR**:
- Create file in `.changeset/` directory
- Use format from the changeset format reference above
- Filename: `<type>-<short-description>.md` (e.g., `patch-fix-bug.md`)

5. **Commit and Push Changes**:
- Git is already configured by pre-step
- Add and commit the changeset file using git commands:
```bash
git add .changeset/<filename> && git commit -m "Add changeset"
```
- **CRITICAL**: You MUST call the `push_to_pull_request_branch` tool to push your changes:
```javascript
push_to_pull_request_branch({
message: "Add changeset for this pull request"
})
```
- The `branch` parameter is optional - it will automatically detect the current PR branch
- This tool call is REQUIRED for your changes to be pushed to the pull request
- **WARNING**: If you don't call this tool, your changeset file will NOT be pushed and the job will be skipped
- Filename: `<type>-pr-<pr-number>-<short-description>.md` (e.g., `patch-pr-123-fix-bug.md`)
- Include PR number in the changeset description for traceability

5. **Update the cache** to mark this PR as processed:
- Add the PR number to your tracking file in `/tmp/gh-aw/cache-memory/`

### Phase 3: Create Pull Request

After generating all changeset files:

1. **Git operations are already configured** by the pre-step
2. **Stage and commit all changeset files**:
```bash
git add .changeset/*.md
git commit -m "Add changesets for merged PRs"
```

3. **The safe-outputs create-pull-request will automatically**:
- Create a new branch
- Push your changes
- Create a PR with the changeset files
- Use title: `[changeset] Add changesets for merged PRs`

4. **Include in the PR description**:
- List of PRs processed with their numbers and titles
- Summary of changeset files created
- Any notes about the changes

## Guidelines

- **Be Accurate**: Analyze the PR content carefully to determine the correct change type
- **Be Clear**: The changeset description should clearly explain what changed
- **Be Accurate**: Analyze each PR content carefully to determine the correct change type
- **Be Clear**: Each changeset description should clearly explain what changed
- **Be Concise**: Keep descriptions brief but informative
- **Follow Conventions**: Use the exact changeset format specified above
- **Single Package Default**: If unsure about package structure, default to "gh-aw"
- **Smart Naming**: Use descriptive filenames that indicate the change (e.g., `patch-fix-rendering-bug.md`)
- **Single Package Default**: Always use "gh-aw" as the package identifier
- **Track Progress**: Always update the cache after processing each PR
- **One File Per PR**: Each PR gets exactly one changeset file
- **Smart Naming**: Include PR number in filename for easy tracking (e.g., `patch-pr-456-update-docs.md`)

## Example Changeset File

```markdown
---
"gh-aw": patch
---

Fixed rendering bug in console output (PR #456)
```

## Important Notes

- The PR data is already fetched - it's in `/tmp/gh-aw/pr-data/recent-merged-prs.json`
- Use the cache to avoid processing the same PR twice
- Process all unprocessed PRs in a single workflow run
- If there are no unprocessed PRs, simply do nothing (the job won't run due to conditional)

Copilot AI Nov 12, 2025

The workflow markdown content contains a complete duplication of the agent instructions. Lines 74-202 are nearly identical to lines 203-298, with only minor variations in wording. This duplication significantly increases the token count sent to the AI agent and could cause confusion.

Recommendation: Remove the duplicate section (either lines 74-202 or 203-298) and keep only one consolidated version of the instructions.

Comment on lines +1417 to +1512
3. **Filter out already-processed PRs** to get the list of PRs that need changeset files

### Phase 2: Generate Changeset Files

For each unprocessed merged PR:

1. **Analyze the Pull Request**: Review the PR title and body to understand what has been modified
2. **Use the repository name as the package identifier** (gh-aw)
3. **Determine the Change Type**:
- **major**: Major breaking changes (X.0.0) - Very unlikely, probably should be **minor**
- **minor**: Breaking changes in the CLI (0.X.0) - indicated by "BREAKING CHANGE" or major API changes
- **patch**: Bug fixes, docs, refactoring, internal changes, tooling, new shared workflows (0.0.X)

**Important**: Internal changes, tooling, and documentation are always "patch" level.

4. **Generate the Changeset File**:
- Create file in `.changeset/` directory (already created by pre-step)
- Use format from the changeset format reference above
- Filename: `<type>-<short-description>.md` (e.g., `patch-fix-bug.md`)

5. **Commit and Push Changes**:
- Git is already configured by pre-step
- Add and commit the changeset file using git commands:
```bash
git add .changeset/<filename> && git commit -m "Add changeset"
```
- **CRITICAL**: You MUST call the `push_to_pull_request_branch` tool to push your changes:
```javascript
push_to_pull_request_branch({
message: "Add changeset for this pull request"
})
```
- The `branch` parameter is optional - it will automatically detect the current PR branch
- This tool call is REQUIRED for your changes to be pushed to the pull request
- **WARNING**: If you don't call this tool, your changeset file will NOT be pushed and the job will be skipped
4. **Generate ONE Changeset File per PR**:
- Create file in `.changeset/` directory
- Use format from the changeset format reference above
- Filename: `<type>-pr-<pr-number>-<short-description>.md` (e.g., `patch-pr-123-fix-bug.md`)
- Include PR number in the changeset description for traceability

5. **Update the cache** to mark this PR as processed:
- Add the PR number to your tracking file in `/tmp/gh-aw/cache-memory/`

### Phase 3: Create Pull Request

After generating all changeset files:

1. **Git operations are already configured** by the pre-step
2. **Stage and commit all changeset files**:
```bash
git add .changeset/*.md
git commit -m "Add changesets for merged PRs"
```

3. **The safe-outputs create-pull-request will automatically**:
- Create a new branch
- Push your changes
- Create a PR with the changeset files
- Use title: `[changeset] Add changesets for merged PRs`

4. **Include in the PR description**:
- List of PRs processed with their numbers and titles
- Summary of changeset files created
- Any notes about the changes

## Guidelines

- **Be Accurate**: Analyze each PR content carefully to determine the correct change type
- **Be Clear**: Each changeset description should clearly explain what changed
- **Be Concise**: Keep descriptions brief but informative
- **Follow Conventions**: Use the exact changeset format specified above
- **Single Package Default**: Always use "gh-aw" as the package identifier
- **Track Progress**: Always update the cache after processing each PR
- **One File Per PR**: Each PR gets exactly one changeset file
- **Smart Naming**: Include PR number in filename for easy tracking (e.g., `patch-pr-456-update-docs.md`)

## Example Changeset File

```markdown
---
"gh-aw": patch
---

Fixed rendering bug in console output (PR #456)
```

## Guidelines
## Important Notes

- **Be Accurate**: Analyze the PR content carefully to determine the correct change type
- **Be Clear**: The changeset description should clearly explain what changed
- **Be Concise**: Keep descriptions brief but informative
- **Follow Conventions**: Use the exact changeset format specified above
- **Single Package Default**: If unsure about package structure, default to "gh-aw"
- **Smart Naming**: Use descriptive filenames that indicate the change (e.g., `patch-fix-rendering-bug.md`)
- The PR data is already fetched - it's in `/tmp/gh-aw/pr-data/recent-merged-prs.json`
- Use the cache to avoid processing the same PR twice
- Process all unprocessed PRs in a single workflow run
- If there are no unprocessed PRs, simply do nothing (the job won't run due to conditional)

Copilot AI Nov 12, 2025

The workflow markdown content is duplicated in the compiled workflow. Lines 1288-1415 are nearly identical to lines 1417-1512. This duplication wastes token budget when the prompt is sent to the AI agent and could lead to conflicting instructions.

Recommendation: Remove the duplicate section in the source markdown file to prevent it from appearing twice in the compiled workflow.

See below for a potential fix:

# (Lines 1417-1474 removed; no replacement needed. The workflow continues with the next step.)

@@ -54,31 +51,15 @@ permissions:
pull-requests: read

concurrency:
group: "gh-aw-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}"
cancel-in-progress: true
group: "gh-aw-${{ github.workflow }}"

Copilot AI Nov 12, 2025

[nitpick] The concurrency group has changed from gh-aw-${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }} to gh-aw-${{ github.workflow }}. This means only one instance of the entire workflow can run at a time, rather than allowing concurrent runs for different PRs.

While this may be intentional for a scheduled workflow that processes all PRs in batch, the cancel-in-progress: true setting was also removed. This could lead to multiple scheduled runs queuing up if a previous run takes longer than 2 hours.

Recommendation: Consider adding cancel-in-progress: true to prevent queue buildup, or document why queuing multiple runs is acceptable.

Suggested change
group: "gh-aw-${{ github.workflow }}"
group: "gh-aw-${{ github.workflow }}"
cancel-in-progress: true

Comment on lines +130 to +132
- Filename: `<type>-pr-<pr-number>-<short-slug>.md` (e.g., `patch-pr-123-fix-bug.md`)
- Use lowercase and hyphens for the slug
- Keep the slug short (2-4 words max)

Copilot AI Nov 12, 2025

[nitpick] The filename pattern instruction includes <short-slug> but then the example shows patch-pr-123-fix-bug.md where "fix-bug" is the slug. However, line 130 says "Use format from the changeset format reference above" but doesn't specify what the slug should be derived from.

Recommendation: Clarify that the slug should be derived from the PR title (as mentioned in line 175: "Use the PR title") and provide clearer guidance:

Suggested change
- Filename: `<type>-pr-<pr-number>-<short-slug>.md` (e.g., `patch-pr-123-fix-bug.md`)
- Use lowercase and hyphens for the slug
- Keep the slug short (2-4 words max)
- Filename: `<type>-pr-<pr-number>-<short-slug>.md`
- **Derive `<short-slug>` from the PR title**:
- Take the PR title, convert to lowercase
- Replace spaces and punctuation with hyphens
- Use 2-4 words max for brevity
- Example: PR #123 "Fix rendering bug" → `patch-pr-123-fix-rendering-bug.md`
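
The slug rules in this suggestion can be sketched in Python as follows. The 4-word cap matches the "2-4 words max" guidance; the function names are hypothetical, and dropping the slug when a title has no letters or digits is an assumption made for the example.

```python
import re


def slugify(title: str, max_words: int = 4) -> str:
    """Lowercase the title, split on non-alphanumerics, keep the first words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words[:max_words])


def changeset_filename(change_type: str, pr_number: int, title: str) -> str:
    """Build <type>-pr-<pr-number>-<short-slug>.md from a PR title."""
    parts = [change_type, "pr", str(pr_number)]
    slug = slugify(title)
    if slug:  # assumption: omit the slug if the title yields no usable words
        parts.append(slug)
    return "-".join(parts) + ".md"


if __name__ == "__main__":
    print(changeset_filename("patch", 123, "Fix rendering bug"))
    # patch-pr-123-fix-rendering-bug.md
```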

- The PR data is already fetched - it's in `/tmp/gh-aw/pr-data/recent-merged-prs.json`
- Use the cache to avoid processing the same PR twice
- Process all unprocessed PRs in a single workflow run
- If there are no unprocessed PRs, simply do nothing (the job won't run due to conditional)

Copilot AI Nov 12, 2025

The "Important Notes" section (line 295) states "If there are no unprocessed PRs, simply do nothing (the job won't run due to conditional)" but there is no conditional check in the workflow that would prevent the job from running. The workflow relies on the agent to perform early exits, not on workflow-level conditionals.

This statement is misleading and inconsistent with lines 97-100 and 110-113 which correctly describe early exit behavior within the agent execution.

Recommendation: Update line 298 to match the accurate description: "If there are no unprocessed PRs, just exit without doing anything - this is normal and expected"

Suggested change
- If there are no unprocessed PRs, simply do nothing (the job won't run due to conditional)
- If there are no unprocessed PRs, just exit without doing anything - this is normal and expected

(needs.pre_activation.outputs.activated == 'true') && (((github.event.pull_request.base.ref == github.event.repository.default_branch) &&
((github.event_name != 'pull_request') || (github.event.pull_request.head.repo.id == github.repository_id))) &&
((github.event_name != 'pull_request') || ((github.event.action != 'labeled') || (github.event.label.name == 'changeset' ||
github.event.label.name == 'smoke'))))
runs-on: ubuntu-slim

Copilot AI Nov 12, 2025

The activation job no longer has an if condition to control when it runs. Previously it checked needs.pre_activation.outputs.activated == 'true' and various event conditions. Now the job always runs on schedule, which may be intentional but could lead to unnecessary executions.

Recommendation: Consider adding a condition like if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' to explicitly control when the activation job runs, or document why the unconditional execution is desired.

Suggested change
runs-on: ubuntu-slim
runs-on: ubuntu-slim
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'

2. **Stage and commit all changeset files**:
```bash
git add .changeset/*.md
git add /tmp/gh-aw/cache-memory/processed-prs.json

Copilot AI Nov 12, 2025

The instruction to add /tmp/gh-aw/cache-memory/processed-prs.json to the git commit (line 148) is incorrect. The cache-memory directory is outside the repository working directory and should not be committed to the PR. Only the changeset files in .changeset/ should be committed.

The cache is persisted via GitHub Actions cache mechanism, not via git commits.

Recommendation: Remove line 148 from the instructions. The correct git operations should be:

git add .changeset/*.md
git commit -m "Add changesets for merged PRs"
Suggested change
git add /tmp/gh-aw/cache-memory/processed-prs.json


**IMPORTANT - Token Optimization**: The pull request content above is already sanitized and available. DO NOT use `pull_request_read` or similar GitHub API tools to fetch PR details - you already have everything you need in the context above. Using API tools wastes 40k+ tokens per call.
4. **Filter out already-processed PRs** to get the list of PRs that need changeset files
- Use jq to filter: `jq --argjson processed "$(cat /tmp/gh-aw/cache-memory/processed-prs.json)" '[.[] | select([.number] | inside($processed) | not)]' /tmp/gh-aw/pr-data/recent-merged-prs.json`

Copilot AI Nov 12, 2025

The jq filter for removing already-processed PRs tests membership indirectly. The expression [.number] | inside($processed) wraps the PR number in a single-element array and checks whether that array is contained in the processed array.

For numeric PR numbers this does skip processed PRs, because inside is the inverse of contains, and contains on arrays checks that every element of the argument matches some element of the input:

  • If processed = [123, 456] and .number = 123
  • [123] | inside([123, 456]) returns true, so the PR is filtered out as intended

The form is fragile, though. contains compares strings by substring, so if PR numbers were ever stored as strings, "12" would count as processed whenever "123" is in the list.

Clearer: select(.number as $n | ($processed | index($n)) == null) - keeps a PR only when its number is not found in processed
Suggested change
- Use jq to filter: `jq --argjson processed "$(cat /tmp/gh-aw/cache-memory/processed-prs.json)" '[.[] | select([.number] | inside($processed) | not)]' /tmp/gh-aw/pr-data/recent-merged-prs.json`
- Use jq to filter: `jq --argjson processed "$(cat /tmp/gh-aw/cache-memory/processed-prs.json)" '[.[] | select(.number as $n | ($processed | index($n)) == null)]' /tmp/gh-aw/pr-data/recent-merged-prs.json`
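
The intent of the suggested filter, keeping only PRs whose number does not appear in the processed list, can be sketched in Python. The function name `filter_unprocessed` is hypothetical; it mirrors the `index($n) == null` check rather than reproducing jq itself.

```python
def filter_unprocessed(prs: list[dict], processed: list[int]) -> list[dict]:
    """Keep PRs whose number is absent from the processed list.

    Mirrors the intent of:
        select(.number as $n | ($processed | index($n)) == null)
    """
    seen = set(processed)  # set lookup avoids rescanning the list per PR
    return [pr for pr in prs if pr["number"] not in seen]


if __name__ == "__main__":
    prs = [{"number": 123}, {"number": 789}]
    print(filter_unprocessed(prs, [123, 456]))  # [{'number': 789}]
```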
