Problem
When AI models review diffs, line-based unified diffs can be noisy and token-inefficient. Common scenarios where this hurts:
- JSON/YAML reformatting: A single value change plus auto-formatting creates a huge diff
- Config file updates: Version bumps or reordering keys produce misleading diffs
- CSV/data files: Row shifts make line-based diffs nearly unreadable
Models struggle to identify the actual change amid formatting noise, wasting context tokens and reducing comprehension accuracy.
Proposed Solution
Add a new tool compare_file_contents that:
- Takes two refs (base and head) plus a file path
- For supported formats (JSON and YAML to start; CSV and TOML later), produces a semantic diff showing only value changes
- For unsupported formats, falls back to unified diff
- Always shows the format used and whether fallback was applied
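The format selection and fallback behavior described above could look roughly like the following. This is a minimal sketch, assuming detection by file extension; the `Format` type, its constants, and `detectFormat` are hypothetical names, not part of any existing API.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Format names the diff strategy reported back to the caller.
// The type and constant names are illustrative assumptions.
type Format string

const (
	FormatJSON    Format = "json"
	FormatYAML    Format = "yaml"
	FormatUnified Format = "unified"
)

// detectFormat picks a strategy from the file extension and reports
// whether it had to fall back to a plain unified diff.
func detectFormat(path string) (format Format, fallback bool) {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".json":
		return FormatJSON, false
	case ".yaml", ".yml":
		return FormatYAML, false
	default:
		return FormatUnified, true
	}
}

func main() {
	for _, p := range []string{"config/app.json", "deploy/values.YML", "README.md"} {
		f, fb := detectFormat(p)
		fmt.Printf("%s: format=%s fallback=%t\n", p, f, fb)
	}
}
```

A real implementation would probably also fall back when a file with a supported extension fails to parse, so the tool never errors out on malformed input.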
Example: Semantic vs Line-based
Line-based diff (noisy):
-{"users":[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]}
+{
+ "users": [
+ {"id": 1, "name": "Alice"},
+ {"id": 2, "name": "Bobby"}
+ ]
+}
Semantic diff (clear):
users[1].name: "Bob" → "Bobby"
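One way to produce output like the line above, using only the Go standard library, is to decode both versions and walk the two trees in parallel. This is a sketch under stated assumptions: `semanticDiff`, `diffValues`, `render`, and `join` are hypothetical helpers, and arrays are compared index by index, so an insertion in the middle of a list would show up as a run of changes rather than a single addition.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

// render formats a decoded value as compact JSON for display.
func render(v any) string {
	raw, err := json.Marshal(v)
	if err != nil {
		return fmt.Sprintf("%v", v)
	}
	return string(raw)
}

// join appends a map key to a dotted path.
func join(path, key string) string {
	if path == "" {
		return key
	}
	return path + "." + key
}

// diffValues walks two decoded JSON values in parallel and appends one
// line per changed, added, or removed path to out.
func diffValues(path string, a, b any, out *[]string) {
	switch av := a.(type) {
	case map[string]any:
		bv, ok := b.(map[string]any)
		if !ok {
			break // type changed: reported as a plain value change below
		}
		keys := make([]string, 0, len(av)+len(bv))
		for k := range av {
			keys = append(keys, k)
		}
		for k := range bv {
			if _, dup := av[k]; !dup {
				keys = append(keys, k)
			}
		}
		sort.Strings(keys) // deterministic output order
		for _, k := range keys {
			x, inA := av[k]
			y, inB := bv[k]
			switch {
			case !inA:
				*out = append(*out, fmt.Sprintf("%s: added %s", join(path, k), render(y)))
			case !inB:
				*out = append(*out, fmt.Sprintf("%s: removed %s", join(path, k), render(x)))
			default:
				diffValues(join(path, k), x, y, out)
			}
		}
		return
	case []any:
		bv, ok := b.([]any)
		if !ok {
			break
		}
		for i := 0; i < len(av) || i < len(bv); i++ {
			child := fmt.Sprintf("%s[%d]", path, i)
			switch {
			case i >= len(av):
				*out = append(*out, fmt.Sprintf("%s: added %s", child, render(bv[i])))
			case i >= len(bv):
				*out = append(*out, fmt.Sprintf("%s: removed %s", child, render(av[i])))
			default:
				diffValues(child, av[i], bv[i], out)
			}
		}
		return
	}
	if !reflect.DeepEqual(a, b) {
		*out = append(*out, fmt.Sprintf("%s: %s → %s", path, render(a), render(b)))
	}
}

// semanticDiff decodes both documents and returns the changed paths.
func semanticDiff(base, head []byte) ([]string, error) {
	var a, b any
	if err := json.Unmarshal(base, &a); err != nil {
		return nil, fmt.Errorf("base: %w", err)
	}
	if err := json.Unmarshal(head, &b); err != nil {
		return nil, fmt.Errorf("head: %w", err)
	}
	var out []string
	diffValues("", a, b, &out)
	return out, nil
}

func main() {
	base := []byte(`{"users":[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]}`)
	head := []byte("{\n  \"users\": [\n    {\"id\": 1, \"name\": \"Alice\"},\n    {\"id\": 2, \"name\": \"Bobby\"}\n  ]\n}")
	lines, err := semanticDiff(base, head)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Run against the two versions from the example, this prints the single line `users[1].name: "Bob" → "Bobby"`, because whitespace and key order disappear once both sides are decoded.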
Tool Signature
compare_file_contents(
owner: string,
repo: string,
path: string,
base: string, // commit SHA, branch, or tag
head: string, // commit SHA, branch, or tag
)
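On the Go side, the arguments above might be modeled as a plain struct with a validation step before any API call is made. This is only a sketch: `CompareFileContentsArgs` and `Validate` are hypothetical names, and treating all five arguments as required is an assumption, since the signature does not mark any as optional.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// CompareFileContentsArgs mirrors the proposed tool signature.
// The struct name and JSON tags are illustrative assumptions.
type CompareFileContentsArgs struct {
	Owner string `json:"owner"`
	Repo  string `json:"repo"`
	Path  string `json:"path"`
	Base  string `json:"base"` // commit SHA, branch, or tag
	Head  string `json:"head"` // commit SHA, branch, or tag
}

// Validate reports every missing argument in one error, assuming all
// five are required.
func (a CompareFileContentsArgs) Validate() error {
	fields := []struct{ name, value string }{
		{"owner", a.Owner},
		{"repo", a.Repo},
		{"path", a.Path},
		{"base", a.Base},
		{"head", a.Head},
	}
	var missing []string
	for _, f := range fields {
		if strings.TrimSpace(f.value) == "" {
			missing = append(missing, f.name)
		}
	}
	if len(missing) > 0 {
		return errors.New("missing required arguments: " + strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	raw := []byte(`{"owner":"octo","repo":"demo","path":"config.json","base":"main"}`)
	var args CompareFileContentsArgs
	if err := json.Unmarshal(raw, &args); err != nil {
		panic(err)
	}
	fmt.Println(args.Validate())
}
```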
Use Cases
- Change verification: Model edits a file, uses this tool to confirm only intended changes were made
- PR review: Quickly understand what actually changed in config/data files
- Debugging: Compare file across commits without formatting noise
Implementation Notes
- Start behind a feature flag
- Semantic diff enabled by default for supported formats (no opt-out needed initially)
- Pure Go implementation using standard library JSON + yaml.v3
- Supported formats to start: JSON, YAML
- Future: CSV, TOML, other structured formats
Why This Helps Models
- Fewer tokens = more room for reasoning
- Unambiguous output = clearer before/after semantics
- Path notation (e.g., users[1].name) is already familiar to models
- Self-verification = models can check their own edits efficiently