feat: instruction-aware generation — complement existing AGENTS.md, CLAUDE.md, and .instructions.md files by danielmeppiel · Pull Request #17 · microsoft/agentrc

danielmeppiel · 2026-02-26T16:15:45Z

TL;DR

Metric	Value
Prompt injection cost	167 tokens — lists existing instruction files + 3 output rules
Generated output savings	307 fewer tokens in copilot-instructions.md (921 → 614, -33.3%)
Formats detected	AGENTS.md, CLAUDE.md (anywhere in tree), .instructions.md (in `.github/instructions/`)
No-instruction-files impact	Zero — prompt section is empty when no instruction files exist
Tests	13 new, 470 total pass (24 files)
Files changed	2 files

Problem

Relates to #6 — "current instruction set is not considered when generating instructions".

agentrc instructions generates .github/copilot-instructions.md from scratch, without considering which instruction files already exist in the repo. When a repo already has AGENTS.md files (generated by tools like APM, or hand-authored), CLAUDE.md files (for Claude Code), or modular .github/instructions/*.instructions.md files, the generated output restates content those files already deliver, wasting context-window budget.

How the duplication happens

1. A repo has existing instruction files: AGENTS.md (via the agents.md standard), CLAUDE.md (for Claude Code), or .github/instructions/*.instructions.md (VS Code Copilot's native modular format). These may be hand-authored, generated by tools like APM, or a mix.
2. agentrc instructions explores the codebase, finds the same conventions those files already describe, and restates them in copilot-instructions.md.
3. The LLM that later reads the repo sees duplicate content: once from the existing instruction files and again from copilot-instructions.md.

Solution

This PR teaches agentrc instructions to detect existing instruction files in the repo and steer the model to generate complementary content instead of duplicating what those files already deliver.

What it detects

File	Location	Consumer
`AGENTS.md`	Anywhere in tree (hierarchical)	GitHub Copilot, Cursor, Codex, Gemini
`CLAUDE.md`	Anywhere in tree (hierarchical)	Claude Code
`*.instructions.md`	`.github/instructions/`	VS Code Copilot (native)

The walker excludes .git, node_modules, apm_modules, and .apm directories. Symlinks are skipped for safety.
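The detection described above can be sketched with nothing but Node's fs/promises and path modules. This is an illustrative sketch, not the PR's code: the function names (walkForMarkerFiles, listModularInstructionFiles, detectInstructionFilesSketch) are hypothetical, and the sketch assumes .instructions.md files are picked up only directly inside .github/instructions/ (no recursion into subfolders), which may differ from the actual implementation.

```typescript
import { readdir } from "node:fs/promises";
import * as path from "node:path";

// Directory names to skip, as listed in the PR description.
const EXCLUDED_DIRS = new Set([".git", "node_modules", "apm_modules", ".apm"]);
// Hierarchical instruction files that may appear anywhere in the tree.
const MARKER_FILES = new Set(["AGENTS.md", "CLAUDE.md"]);

// Hypothetical helper: recursively collect AGENTS.md / CLAUDE.md paths,
// relative to repoRoot and using "/" separators.
async function walkForMarkerFiles(repoRoot: string, dir: string = repoRoot): Promise<string[]> {
  const found: string[] = [];
  // withFileTypes returns Dirent objects, so no per-entry stat() is needed.
  for (const entry of await readdir(dir, { withFileTypes: true })) {
    if (entry.isSymbolicLink()) continue; // never follow or report symlinks
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (!EXCLUDED_DIRS.has(entry.name)) {
        found.push(...(await walkForMarkerFiles(repoRoot, full)));
      }
    } else if (entry.isFile() && MARKER_FILES.has(entry.name)) {
      found.push(path.relative(repoRoot, full).split(path.sep).join("/"));
    }
  }
  return found;
}

// Hypothetical helper: list *.instructions.md files directly inside
// .github/instructions/ (this sketch does not recurse into subfolders).
async function listModularInstructionFiles(repoRoot: string): Promise<string[]> {
  const dir = path.join(repoRoot, ".github", "instructions");
  try {
    const entries = await readdir(dir, { withFileTypes: true });
    return entries
      .filter((e) => e.isFile() && !e.isSymbolicLink() && e.name.endsWith(".instructions.md"))
      .map((e) => `.github/instructions/${e.name}`);
  } catch {
    return []; // directory missing or unreadable: nothing to report
  }
}

// Hypothetical entry point: all detected files, sorted for stable prompt output.
async function detectInstructionFilesSketch(repoRoot: string): Promise<string[]> {
  const markers = await walkForMarkerFiles(repoRoot);
  const modular = await listModularInstructionFiles(repoRoot);
  return [...markers, ...modular].sort();
}
```

Sorting keeps the prompt listing deterministic, since the order in which readdir returns entries depends on the filesystem.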

How it works

When instruction files are found, a context section is appended to the generation prompt. It lists every detected file path, followed by 3 output rules that steer the model to defer to those files rather than restate them.

System message (adds 4 words when instruction files exist):

"…generate a concise .github/copilot-instructions.md that complements existing instruction files."

Prompt section (appended — 167 tokens for test repo with 9 instruction files):

## Existing Instruction Files
This repo already contains instruction files that AI agents load automatically:
- `AGENTS.md`
- `backend/api/AGENTS.md`
- `docs/AGENTS.md`
- `scripts/deployment/AGENTS.md`
- `tests/AGENTS.md`
- `backend/api/CLAUDE.md`
- `docs/CLAUDE.md`
- `scripts/deployment/CLAUDE.md`
- `tests/CLAUDE.md`

### Output rules
- Content in the above files is already loaded by AI agents — do not restate it.
- For topics covered by existing files, use a single markdown link (e.g., `See [AGENTS.md](AGENTS.md)`).
- Focus only on project-specific conventions not already covered by the above files.
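A section like the one above can be assembled with a small string builder. This is a minimal sketch assuming the builder receives the detected paths as repo-relative strings; buildInstructionContext is a hypothetical stand-in for buildExistingInstructionsSection(), whose real signature is not shown here, and the rule wording is adapted from the sample. The early return mirrors the "no prompt when empty" behavior.

```typescript
// Output rules appended after the file list (wording adapted from the sample prompt).
const OUTPUT_RULES: readonly string[] = [
  "Content in the above files is already loaded by AI agents; do not restate it.",
  "For topics covered by existing files, use a single markdown link (e.g., `See [AGENTS.md](AGENTS.md)`).",
  "Focus only on project-specific conventions not already covered by the above files.",
];

// Hypothetical stand-in for buildExistingInstructionsSection(): returns ""
// when no files were detected, so repos without instruction files pay nothing.
function buildInstructionContext(files: readonly string[]): string {
  if (files.length === 0) return "";
  return [
    "## Existing Instruction Files",
    "This repo already contains instruction files that AI agents load automatically:",
    ...files.map((file) => `- \`${file}\``),
    "",
    "### Output rules",
    ...OUTPUT_RULES.map((rule) => `- ${rule}`),
  ].join("\n");
}

// Example usage with a subset of the test repo's files.
console.log(buildInstructionContext(["AGENTS.md", "backend/api/AGENTS.md", "docs/CLAUDE.md"]));
```

Keeping the rules in a separate constant makes the per-file cost of the section easy to see: each detected file adds one bullet line, while the header and rules are a fixed overhead.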

Reproducible test

Test target: danielmeppiel/corporate-website — a repo with 5 AGENTS.md and 4 CLAUDE.md files.

# 1. Clone and build this branch
git clone -b feat/instruction-aware-generation https://github.com/danielmeppiel/agentrc.git
cd agentrc && npm install && npm run build

# 2. Clone test repo
git clone https://github.com/danielmeppiel/corporate-website ~/Repos/corporate-website

# 3. Generate with this branch
node dist/index.js generate instructions ~/Repos/corporate-website --force

# 4. Compare baseline (main branch)
git checkout main && npm run build
node dist/index.js generate instructions ~/Repos/corporate-website --force

# 5. Measure tokens (run after step 3 and again after step 4; each generation overwrites copilot-instructions.md)
cat ~/Repos/corporate-website/.github/copilot-instructions.md | \
  python3 -c "import sys,tiktoken; t=tiktoken.encoding_for_model('gpt-4o').encode(sys.stdin.read()); print(f'{len(t)} tokens')"

Comparison

	Baseline (`main`)	This branch
Output tokens (tiktoken gpt-4o)	921	614 (-307 tokens, -33.3%)
Prompt injection cost	0	167 tokens
Net token savings	—	140 fewer tokens across prompt + output
Restated content	Form patterns, design system, styling rules, React conventions — all duplicating existing instruction files	Defers via `See [AGENTS.md]` link
Project-specific rules	Mixed with restated content	Clean section: module system, build output, hybrid stack
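The headline numbers in the table follow directly from the reported counts, with $C_{\text{prompt}} = 167$ as the prompt injection cost. Note that the net figure simply subtracts prompt tokens from saved output tokens, treating input and output tokens as equivalent:

```latex
\begin{aligned}
\Delta_{\text{out}} &= 921 - 614 = 307 \text{ tokens} \\
\frac{\Delta_{\text{out}}}{921} &= \frac{307}{921} = \frac{1}{3} \approx 33.3\% \\
\Delta_{\text{net}} &= \Delta_{\text{out}} - C_{\text{prompt}} = 307 - 167 = 140 \text{ tokens}
\end{aligned}
```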

Design decisions

Format-agnostic — Detects all three major instruction file formats (AGENTS.md, CLAUDE.md, .instructions.md) regardless of how they were created. No dependency on any specific tool.
No prompt when empty — If no instruction files exist, buildExistingInstructionsSection() returns "". Zero overhead for repos without existing instructions.
Output rules steer writing, not reading — The prompt doesn't tell the model what to avoid reading. It tells the model what to write: defer to existing files, use markdown links as pointers, focus on project-specific conventions not already covered.
Walker safety — Uses readdir({ withFileTypes: true }) to avoid extra stat() calls and skips symlinks to prevent infinite loops.
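The walker-safety point relies on how readdir({ withFileTypes: true }) reports entries: each Dirent describes the entry itself, much like lstat(), so a symlink is reported as a symlink and isDirectory()/isFile() both return false for it. The sketch below uses hypothetical helper names and a temporary directory created only for the demo; it builds a symlinked AGENTS.md plus a directory symlink pointing back at its own parent, then reports how each entry is classified.

```typescript
import { mkdtemp, mkdir, readdir, symlink, writeFile } from "node:fs/promises";
import * as os from "node:os";
import * as path from "node:path";

interface EntryKind {
  isSymbolicLink: boolean;
  isDirectory: boolean;
  isFile: boolean;
}

// Hypothetical helper: report how readdir({ withFileTypes: true }) classifies
// each entry in `dir`. No stat() call is made and symlinks are not resolved.
async function describeEntries(dir: string): Promise<Map<string, EntryKind>> {
  const kinds = new Map<string, EntryKind>();
  for (const entry of await readdir(dir, { withFileTypes: true })) {
    kinds.set(entry.name, {
      isSymbolicLink: entry.isSymbolicLink(),
      isDirectory: entry.isDirectory(),
      isFile: entry.isFile(),
    });
  }
  return kinds;
}

// Hypothetical demo tree: sub/AGENTS.md links to the real file, and sub/loop
// links back to the root (an infinite loop for a walker that followed it).
async function buildDemoTree(): Promise<string> {
  const root = await mkdtemp(path.join(os.tmpdir(), "dirent-demo-"));
  await writeFile(path.join(root, "AGENTS.md"), "# real file\n");
  await mkdir(path.join(root, "sub"));
  await symlink(path.join(root, "AGENTS.md"), path.join(root, "sub", "AGENTS.md"));
  await symlink(root, path.join(root, "sub", "loop"));
  return root;
}
```

For example, describeEntries(path.join(root, "sub")) reports both AGENTS.md and loop with isSymbolicLink true and isDirectory and isFile false. A walker that only recurses into isDirectory() entries therefore never enters the loop, and an explicit isSymbolicLink() check makes the exclusion of symlinked files obvious to readers.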

What's NOT in this PR

No changes to agentrc analyze, agentrc eval, or any other command
No external dependencies — uses only fs/promises (Node built-in)
No breaking changes for repos without existing instruction files

…LAUDE.md, and .instructions.md files

Detect existing AI instruction files (AGENTS.md, CLAUDE.md, .instructions.md) before generating copilot-instructions.md and steer the model to complement rather than duplicate their content.

- detectExistingInstructions() walks the repo tree for AGENTS.md and CLAUDE.md, scans .github/instructions/ for modular .instructions.md files
- buildExistingInstructionsSection() emits prompt context listing found files with output rules that prevent content duplication
- 13 new tests covering detection and prompt section generation

Relates to microsoft#6

Copilot

Pull request overview

This PR implements instruction-aware generation to address issue #6. It teaches the agentrc instructions command to detect existing instruction files (AGENTS.md, CLAUDE.md, and .instructions.md) in a repository and generate complementary content instead of duplicating what those files already provide. The feature adds 167 tokens to the generation prompt when instruction files exist but saves 307 output tokens for the test repository, resulting in net savings of 140 tokens.

Changes:

Added detection functions to find existing AGENTS.md, CLAUDE.md, and .instructions.md files across the repository
Modified both generateCopilotInstructions and generateAreaInstructions to use detected instruction context when building prompts
Added comprehensive test coverage with 13 new test cases covering empty repos, nested files, multiple formats, and edge cases

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
src/services/instructions.ts	Implements detection of existing instruction files and integrates instruction-aware prompt generation into both copilot-instructions and area-specific instruction generation workflows
src/services/__tests__/instructions.test.ts	Adds 13 test cases for the detectExistingInstructions and buildExistingInstructionsSection functions, including edge cases


- Add explicit symlink filter in findModularInstructionFiles for consistency with findInstructionMarkerFiles
- Fix missing newline before final bullet in area instruction prompt
- Add test verifying symlinked AGENTS.md/CLAUDE.md are excluded

danielmeppiel requested review from digitarald and pierceboggan as code owners February 26, 2026 16:15

Copilot AI review requested due to automatic review settings February 26, 2026 16:15

Copilot started reviewing on behalf of danielmeppiel February 26, 2026 16:16

Copilot AI reviewed Feb 26, 2026

src/services/instructions.ts (outdated, resolved)

src/services/__tests__/instructions.test.ts (resolved)

src/services/instructions.ts (outdated, resolved)

digitarald merged commit 11e0eaa into microsoft:main Feb 26, 2026
9 checks passed

danielmeppiel mentioned this pull request Feb 26, 2026

current instruction set is not considered when generating instructions #6

Closed

digitarald merged 2 commits into microsoft:main from danielmeppiel:feat/instruction-aware-generation

3 participants
