feat: instruction-aware generation — complement existing AGENTS.md, CLAUDE.md, and .instructions.md files#17

Merged
digitarald merged 2 commits into microsoft:main from danielmeppiel:feat/instruction-aware-generation on Feb 26, 2026

@danielmeppiel (Contributor)

TL;DR

| Metric | Value |
|---|---|
| Prompt injection cost | 167 tokens: lists existing instruction files + 3 output rules |
| Generated output savings | 307 fewer tokens in copilot-instructions.md (921 → 614, -33.3%) |
| Formats detected | AGENTS.md, CLAUDE.md (anywhere in tree), .instructions.md (in .github/instructions/) |
| No-instruction-files impact | Zero: the prompt section is empty when no instruction files exist |
| Tests | 13 new; 470 total passing (24 files) |
| Files changed | 2 |

Problem

Relates to #6: "current instruction set is not considered when generating instructions".

agentrc instructions generates .github/copilot-instructions.md from scratch without considering which instruction files already exist in the repo. When a repo already has AGENTS.md files (hand-authored or generated by tools like APM), CLAUDE.md files (for Claude Code), or modular .github/instructions/*.instructions.md files, the generated output restates content those files already deliver, wasting context-window budget.

How the duplication happens

  1. A repo has existing instruction files — AGENTS.md (via the agents.md standard), CLAUDE.md (for Claude Code), or .github/instructions/*.instructions.md (VS Code Copilot's native modular format). These may be hand-authored, generated by tools like APM, or a mix.

  2. agentrc instructions explores the codebase, finds the same conventions those files already describe, and restates them in copilot-instructions.md.

  3. The LLM that later reads the repo sees duplicate content — once from the existing instruction files and again from copilot-instructions.md.

Solution

This PR teaches agentrc instructions to detect existing instruction files in the repo and steer the model to generate complementary content instead of duplicating what those files already deliver.

What it detects

| File | Location | Consumer |
|---|---|---|
| `AGENTS.md` | Anywhere in tree (hierarchical) | GitHub Copilot, Cursor, Codex, Gemini |
| `CLAUDE.md` | Anywhere in tree (hierarchical) | Claude Code |
| `*.instructions.md` | `.github/instructions/` | VS Code Copilot (native) |

The walker excludes .git, node_modules, apm_modules, and .apm directories. Symlinks are skipped for safety.
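To make the detection rules concrete, here is a minimal TypeScript sketch of a walker that follows the constraints above: excluded directory names, symlinks skipped, and readdir with withFileTypes so no extra stat() call is needed per entry. It is loosely modeled on the PR's findInstructionMarkerFiles helper, but the names findMarkerFilesSketch and demoMarkerFiles, the sorted forward-slash output, and the fixture layout are illustrative assumptions, not the merged code.

```typescript
import { mkdtemp, mkdir, readdir, symlink, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join, relative, sep } from "node:path";

// Directory names the PR says the walker skips.
const EXCLUDED_DIRS = new Set([".git", "node_modules", "apm_modules", ".apm"]);
// Marker files that may appear at any depth (hierarchical formats).
const MARKER_FILES = new Set(["AGENTS.md", "CLAUDE.md"]);

// Hypothetical sketch, not the merged findInstructionMarkerFiles.
async function findMarkerFilesSketch(root: string): Promise<string[]> {
  const found: string[] = [];

  async function walk(dir: string): Promise<void> {
    // withFileTypes yields Dirent objects, so no stat() call per entry.
    const entries = await readdir(dir, { withFileTypes: true });
    for (const entry of entries) {
      // Dirent types describe the link itself, so symlinked files and
      // directories are skipped here and never followed.
      if (entry.isSymbolicLink()) continue;
      const full = join(dir, entry.name);
      if (entry.isDirectory()) {
        if (!EXCLUDED_DIRS.has(entry.name)) await walk(full);
      } else if (entry.isFile() && MARKER_FILES.has(entry.name)) {
        // Assumption: report repo-relative paths with forward slashes.
        found.push(relative(root, full).split(sep).join("/"));
      }
    }
  }

  await walk(root);
  return found.sort();
}

// Usage: build a throwaway fixture and run the walker over it.
async function demoMarkerFiles(): Promise<string[]> {
  const root = await mkdtemp(join(tmpdir(), "agentrc-sketch-"));
  await writeFile(join(root, "AGENTS.md"), "# root\n");
  await mkdir(join(root, "docs"));
  await writeFile(join(root, "docs", "CLAUDE.md"), "# docs\n");
  // Inside an excluded directory: must not be reported.
  await mkdir(join(root, "node_modules", "pkg"), { recursive: true });
  await writeFile(join(root, "node_modules", "pkg", "AGENTS.md"), "# dep\n");
  // Symlinked marker file: must not be reported.
  await mkdir(join(root, "tests"));
  await symlink(join(root, "AGENTS.md"), join(root, "tests", "AGENTS.md"));
  return findMarkerFilesSketch(root);
}
```

On this fixture the sketch reports only the root AGENTS.md and docs/CLAUDE.md; the dependency copy and the symlink are ignored.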

How it works

When instruction files are found, a context section is appended to the generation prompt listing every detected file path and 3 output rules that steer the model to defer rather than restate.

System message (adds 4 words when instruction files exist):

"…generate a concise .github/copilot-instructions.md that complements existing instruction files."

Prompt section (appended; 167 tokens for the test repo with 9 instruction files):

## Existing Instruction Files
This repo already contains instruction files that AI agents load automatically:
- `AGENTS.md`
- `backend/api/AGENTS.md`
- `docs/AGENTS.md`
- `scripts/deployment/AGENTS.md`
- `tests/AGENTS.md`
- `backend/api/CLAUDE.md`
- `docs/CLAUDE.md`
- `scripts/deployment/CLAUDE.md`
- `tests/CLAUDE.md`

### Output rules
- Content in the above files is already loaded by AI agents — do not restate it.
- For topics covered by existing files, use a single markdown link (e.g., `See [AGENTS.md](AGENTS.md)`).
- Focus only on project-specific conventions not already covered by the above files.
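A minimal TypeScript sketch of how a section in this shape could be assembled from a list of detected paths. The rule strings are copied from the prompt above; the function name buildExistingInstructionsSectionSketch and its signature (an ordered array of repo-relative paths) are assumptions, and the merged buildExistingInstructionsSection may differ in both.

```typescript
// The three output rules, copied from the prompt text above.
// "\u2014" reproduces the dash used in the original wording of rule 1.
const OUTPUT_RULES = [
  "Content in the above files is already loaded by AI agents \u2014 do not restate it.",
  "For topics covered by existing files, use a single markdown link (e.g., `See [AGENTS.md](AGENTS.md)`).",
  "Focus only on project-specific conventions not already covered by the above files.",
];

// Hypothetical sketch: paths are listed in the order the caller supplies.
function buildExistingInstructionsSectionSketch(paths: readonly string[]): string {
  // No detected files: return "" so the generation prompt is unchanged.
  if (paths.length === 0) return "";
  return [
    "## Existing Instruction Files",
    "This repo already contains instruction files that AI agents load automatically:",
    ...paths.map((p) => `- \`${p}\``),
    "",
    "### Output rules",
    ...OUTPUT_RULES.map((rule) => `- ${rule}`),
  ].join("\n");
}
```

Because the empty case returns an empty string, a caller can append the result to its prompt unconditionally without adding overhead for repos that have no instruction files.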

Reproducible test

Test target: danielmeppiel/corporate-website — a repo with 5 AGENTS.md and 4 CLAUDE.md files.

# 1. Clone and build this branch
git clone -b feat/instruction-aware-generation https://github.com/danielmeppiel/agentrc.git
cd agentrc && npm install && npm run build

# 2. Clone the test repo and define a token counter (requires: pip install tiktoken)
git clone https://github.com/danielmeppiel/corporate-website ~/Repos/corporate-website
count_tokens() {
  python3 -c "import sys,tiktoken; t=tiktoken.encoding_for_model('gpt-4o').encode(sys.stdin.read()); print(f'{len(t)} tokens')" \
    < ~/Repos/corporate-website/.github/copilot-instructions.md
}

# 3. Generate with this branch and measure (before the baseline overwrites the file)
node dist/index.js generate instructions ~/Repos/corporate-website --force
count_tokens

# 4. Regenerate with the baseline (main branch) and measure again
git checkout main && npm run build
node dist/index.js generate instructions ~/Repos/corporate-website --force
count_tokens

Comparison

| | Baseline (main) | This branch |
|---|---|---|
| Output tokens (tiktoken, gpt-4o) | 921 | 614 (-307 tokens, -33.3%) |
| Prompt injection cost | 0 | 167 tokens |
| Net token savings | n/a | 140 fewer tokens across prompt + output |
| Restated content | Form patterns, design system, styling rules, React conventions, all duplicating existing instruction files | Defers via a `See [AGENTS.md]` link |
| Project-specific rules | Mixed with restated content | Clean section: module system, build output, hybrid stack |
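The net figure follows directly from the two measured numbers. Note that the 167 prompt tokens are spent once per generation run, whereas the 307-token reduction is in the generated file itself, which is what agents later load.

```math
\begin{aligned}
\Delta_{\text{output}} &= 921 - 614 = 307 \text{ tokens}, \qquad \tfrac{307}{921} \approx 33.3\% \\
\Delta_{\text{net}} &= \Delta_{\text{output}} - 167 = 307 - 167 = 140 \text{ tokens}
\end{aligned}
```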

Design decisions

  1. Format-agnostic — Detects all three major instruction file formats (AGENTS.md, CLAUDE.md, .instructions.md) regardless of how they were created. No dependency on any specific tool.

  2. No prompt when empty — If no instruction files exist, buildExistingInstructionsSection() returns "". Zero overhead for repos without existing instructions.

  3. Output rules steer writing, not reading — The prompt doesn't tell the model what to avoid reading. It tells the model what to write: defer to existing files, use markdown links as pointers, focus on project-specific conventions not already covered.

  4. Walker safety — Uses readdir({ withFileTypes: true }) to avoid extra stat() calls and skips symlinks to prevent infinite loops.


What's NOT in this PR

  • No changes to agentrc analyze, agentrc eval, or any other command
  • No external dependencies — uses only fs/promises (Node built-in)
  • No breaking changes for repos without existing instruction files

…LAUDE.md, and .instructions.md files

Detect existing AI instruction files (AGENTS.md, CLAUDE.md, .instructions.md) before
generating copilot-instructions.md and steer the model to complement rather than
duplicate their content.

- detectExistingInstructions() walks the repo tree for AGENTS.md and CLAUDE.md,
  scans .github/instructions/ for modular .instructions.md files
- buildExistingInstructionsSection() emits prompt context listing found files
  with output rules that prevent content duplication
- 13 new tests covering detection and prompt section generation

Relates to microsoft#6
Copilot AI review requested due to automatic review settings February 26, 2026 16:15

Copilot AI left a comment

Pull request overview

This PR implements instruction-aware generation to address issue #6. It teaches the agentrc instructions command to detect existing instruction files (AGENTS.md, CLAUDE.md, and .instructions.md) in a repository and generate complementary content instead of duplicating what those files already provide. The feature adds 167 tokens to the generation prompt when instruction files exist but saves 307 tokens in output for the test repository, resulting in a net savings of 140 tokens.

Changes:

  • Added detection functions to find existing AGENTS.md, CLAUDE.md, and .instructions.md files across the repository
  • Modified both generateCopilotInstructions and generateAreaInstructions to use detected instruction context when building prompts
  • Added comprehensive test coverage with 13 new test cases covering empty repos, nested files, multiple formats, and edge cases

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| src/services/instructions.ts | Implements detection of existing instruction files and integrates instruction-aware prompt generation into both copilot-instructions and area-specific instruction generation workflows |
| src/services/tests/instructions.test.ts | Adds 13 test cases covering detectExistingInstructions and buildExistingInstructionsSection functions with comprehensive edge case coverage |

- Add explicit symlink filter in findModularInstructionFiles for
  consistency with findInstructionMarkerFiles
- Fix missing newline before final bullet in area instruction prompt
- Add test verifying symlinked AGENTS.md/CLAUDE.md are excluded
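The follow-up commit adds an explicit symlink filter when scanning .github/instructions/. A minimal TypeScript sketch of that scan, under stated assumptions: only the top level of the directory is read, a missing or unreadable directory yields no files, and the names findModularFilesSketch and demoModularFiles are hypothetical stand-ins rather than the merged findModularInstructionFiles.

```typescript
import { mkdtemp, mkdir, readdir, symlink, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch, not the merged findModularInstructionFiles.
// Assumption: only the top level of .github/instructions/ is scanned.
async function findModularFilesSketch(root: string): Promise<string[]> {
  const dir = join(root, ".github", "instructions");
  try {
    const entries = await readdir(dir, { withFileTypes: true });
    return entries
      // isFile() is already false for a symlink Dirent; the explicit
      // isSymbolicLink() check mirrors the marker-file walker and states intent.
      .filter((e) => !e.isSymbolicLink() && e.isFile() && e.name.endsWith(".instructions.md"))
      .map((e) => `.github/instructions/${e.name}`)
      .sort();
  } catch {
    // Assumption: a missing or unreadable directory means "no modular files".
    return [];
  }
}

// Usage: a fixture with one real file, one non-matching file, and one symlink.
async function demoModularFiles(): Promise<string[]> {
  const root = await mkdtemp(join(tmpdir(), "agentrc-modular-"));
  const dir = join(root, ".github", "instructions");
  await mkdir(dir, { recursive: true });
  await writeFile(join(dir, "react.instructions.md"), "# React rules\n");
  await writeFile(join(dir, "notes.md"), "not an instructions file\n");
  await symlink(join(dir, "react.instructions.md"), join(dir, "linked.instructions.md"));
  return findModularFilesSketch(root);
}
```

Only react.instructions.md is reported: notes.md lacks the .instructions.md suffix, and the symlinked copy is filtered out.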
digitarald merged commit 11e0eaa into microsoft:main on Feb 26, 2026
9 checks passed