feat(queries): expose fileHash in where and query JSON output #257
carlos-alm merged 7 commits into main
Write symbol-level change events to .codegraph/change-events.ndjson during watch mode. Each line records added/removed/modified symbols with node counts and edge data, enabling external tools to detect rule staleness without polling. File is size-capped at 1 MB with keep-last-half rotation.
Impact: 8 functions changed, 4 affected
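A minimal sketch of how an external tool might consume the change journal described above, assuming one JSON object per line. The field names in the sample (`file`, `event`, `symbols`) are hypothetical and only illustrate the added/removed/modified structure; the real event shape may differ.

```javascript
// Parse NDJSON text: one JSON object per line. Blank lines are skipped,
// and a line that is not valid JSON is ignored rather than aborting the read.
function parseChangeEvents(ndjsonText) {
  const events = [];
  for (const line of ndjsonText.split('\n')) {
    if (line.trim() === '') continue;
    try {
      events.push(JSON.parse(line));
    } catch {
      // Skip malformed lines instead of failing the whole read.
    }
  }
  return events;
}

// Hypothetical event shape; the real field names may differ.
const sample =
  '{"file":"src/a.js","event":"change","symbols":{"added":["foo"],"removed":[],"modified":["bar"]}}\n' +
  '{"file":"src/b.js","event":"unlink","symbols":{"added":[],"removed":["baz"],"modified":[]}}\n';

const touchedFiles = parseChangeEvents(sample).map((e) => e.file);
console.log(touchedFiles); // [ 'src/a.js', 'src/b.js' ]
```

Tolerating unparseable lines is a design choice for readers of an append-only file: a consumer that reads while the writer is mid-append can see a partial last line.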
Impact: 1 functions changed, 1 affected
All contributors have signed the CLA ✍️ ✅
Claude finished @carlos-alm's task

PR Review Complete
Reviewing PR #257: feat(watcher): structured NDJSON change journal

Root Cause Analysis
Problem Being Solved: This PR adds structured change event logging to complement the existing simple
Why This Approach: The NDJSON format enables streaming consumption by external tools, and the symbol diffing provides semantic change tracking beyond file-level changes. This is valuable for impact analysis and change visualization tools.
Risk Assessment: LOW RISK - This is purely additive functionality that runs alongside existing journal functionality without replacing it.

Configuration Change Detection
✅ No problematic configuration changes detected

Test Integrity Assessment
✅ Strong test coverage

Code Quality Review
Strengths
Minor Concerns

Security Assessment
✅ No security concerns identified

Technical Implementation
The integration is well-architected:
The test coverage is exceptional - the 21 unit tests cover:

Final Recommendation
This is a solid, well-tested feature addition that enhances codegraph's change tracking capabilities without breaking existing functionality. The implementation follows project conventions and handles errors gracefully.
Suggested follow-up: Consider adding debug logging for file I/O operations to improve observability in production environments.
Add fileHash field to queryNameData, whereSymbolImpl, and whereFileImpl return objects by looking up the file_hashes table. This lets consumers (e.g. code-praxis) detect when a rule's target file has changed since the rule was created, enabling staleness detection.
Impact: 4 functions changed, 16 affected
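To illustrate the staleness check this enables, here is a rough consumer-side sketch. Only the `fileHash` field comes from this PR; the rule object, its `recordedHash` property, and the handling of a missing hash are assumptions made for the example.

```javascript
// A rule remembers the hash its target file had when the rule was created.
// `recordedHash` and the rule shape are hypothetical, not part of codegraph.
function isRuleStale(rule, queryResult) {
  // Assumption: fileHash may be absent or null when the file has no row in
  // file_hashes, in which case staleness cannot be determined.
  if (queryResult.fileHash == null) return null;
  return queryResult.fileHash !== rule.recordedHash;
}

const rule = { target: 'src/queries.js', recordedHash: 'abc123' };
console.log(isRuleStale(rule, { fileHash: 'abc123' })); // false
console.log(isRuleStale(rule, { fileHash: 'def456' })); // true
console.log(isRuleStale(rule, {}));                     // null
```

Returning null for the unknown case keeps "file changed" separate from "no hash available", so a consumer can decide how to treat each.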
I have read the CLA Document and I hereby sign the CLA |
Address review feedback: debug log on successful append and rotation, warn when a single oversized line prevents rotation.
Impact: 2 functions changed, 2 affected
Impact: 1 functions changed, 1 affected
Greptile Summary
This PR adds structured NDJSON change journaling to watch mode, enabling symbol-level tracking of file modifications.
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Watcher as watcher.js
participant UpdateFile as updateFile()
participant DB as SQLite
participant Journal as change-journal.js
participant FS as File System
Watcher->>UpdateFile: file change detected
UpdateFile->>DB: listSymbols.all(relPath)
DB-->>UpdateFile: oldSymbols[]
UpdateFile->>DB: deleteEdgesForFile, deleteNodes
UpdateFile->>DB: insert new nodes & edges
UpdateFile->>DB: listSymbols.all(relPath)
DB-->>UpdateFile: newSymbols[]
UpdateFile->>Journal: diffSymbols(oldSymbols, newSymbols)
Journal-->>UpdateFile: symbolDiff {added, removed, modified}
UpdateFile-->>Watcher: return result with symbolDiff
Watcher->>Journal: buildChangeEvent(file, event, symbolDiff, counts)
Journal-->>Watcher: changeEvent object
Watcher->>Journal: appendChangeEvents(rootDir, [changeEvent])
Journal->>FS: appendFileSync(change-events.ndjson)
Journal->>Journal: rotateIfNeeded(filePath, 1MB)
alt file > 1MB
Journal->>FS: writeFileSync (keep last ~half)
end
Last reviewed commit: 5e49ba6
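The diffSymbols step in the diagram can be sketched roughly as below. This is not the PR's implementation: it assumes each symbol row has `name`, `kind`, and `line` fields, identifies a symbol by kind plus name, and treats a changed line number as "modified". The function name `diffSymbolsSketch` is hypothetical.

```javascript
// Compare the symbols listed before and after a file update.
// Assumption: a symbol is identified by kind + name, and counts as
// modified when its line number differs between the two snapshots.
function diffSymbolsSketch(oldSymbols, newSymbols) {
  const key = (s) => `${s.kind}:${s.name}`;
  const oldMap = new Map(oldSymbols.map((s) => [key(s), s]));
  const newMap = new Map(newSymbols.map((s) => [key(s), s]));

  const added = [];
  const removed = [];
  const modified = [];

  for (const [k, s] of newMap) {
    const prev = oldMap.get(k);
    if (!prev) added.push(s.name);
    else if (prev.line !== s.line) modified.push(s.name);
  }
  for (const [k, s] of oldMap) {
    if (!newMap.has(k)) removed.push(s.name);
  }
  return { added, removed, modified };
}

const before = [
  { name: 'a', kind: 'function', line: 1 },
  { name: 'b', kind: 'function', line: 5 },
];
const after = [
  { name: 'a', kind: 'function', line: 2 },
  { name: 'c', kind: 'function', line: 9 },
];
console.log(diffSymbolsSketch(before, after));
// { added: [ 'c' ], removed: [ 'b' ], modified: [ 'a' ] }
```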
if (stat.size <= maxBytes) return;

try {
  const content = fs.readFileSync(filePath, 'utf-8');
  const mid = Math.floor(content.length / 2);
  const newlineIdx = content.indexOf('\n', mid);
  if (newlineIdx === -1) {
Byte vs. character mismatch in rotation logic
stat.size is measured in bytes, but content.length counts UTF-16 code units. For multi-byte UTF-8 content (emoji, non-ASCII text), the midpoint no longer lines up with the byte-based threshold, so rotation may keep more or less than half the file.
Consider using Buffer.byteLength(content) to calculate the midpoint in bytes, for consistency with the threshold check.
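A small illustration of the mismatch, using a made-up journal line (the content is arbitrary; only the two length measurements matter):

```javascript
// One line containing a 2-byte character (U+00E9) and a 3-byte character (U+2615).
const line = '{"symbol":"caf\u00e9 \u2615"}\n';

console.log(line.length);                     // 20 (UTF-16 code units)
console.log(Buffer.byteLength(line, 'utf8')); // 23 (bytes, the unit stat.size reports)
```

The gap grows with the share of non-ASCII text, so a midpoint computed from the string length drifts further from the byte midpoint of the file.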
Fixed in 8b4bf03 — rotation now reads as a raw Buffer and uses buf.indexOf(0x0a, mid) to find the newline at the byte-level midpoint, keeping the threshold check and midpoint calculation both in bytes.
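For readers following along, the byte-level approach described in this reply might look roughly like the sketch below. It is not the code from 8b4bf03: the helper name `keepLastHalf` is hypothetical, and it works on an in-memory Buffer so the file read and write steps are left out.

```javascript
// Return the part of `buf` that starts just after the first newline at or
// beyond the byte midpoint. Returns null when no such newline exists, e.g.
// when the file is a single oversized line and cannot be split.
function keepLastHalf(buf) {
  const mid = Math.floor(buf.length / 2);   // midpoint in bytes
  const newlineIdx = buf.indexOf(0x0a, mid); // search for the '\n' byte
  if (newlineIdx === -1) return null;
  return buf.subarray(newlineIdx + 1);
}

// 8 + 15 + 8 = 31 bytes; the middle line contains three 2-byte characters.
const journal = Buffer.from('{"n":1}\n{"n":"\u00e9\u00e9\u00e9"}\n{"n":3}\n', 'utf8');
const kept = keepLastHalf(journal);
console.log(JSON.stringify(kept.toString('utf8'))); // "{\"n\":3}\n"
```

Searching for the byte 0x0a is safe in UTF-8 because that value never occurs inside a multi-byte sequence, so the cut always lands on a real line boundary.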
Addressed both minor concerns from the automated review in commits a87c5da and 5e49ba6:
Claude finished @carlos-alm's task

PR Review Complete
Reviewing PR #257: feat(queries): expose fileHash in where and query JSON output

Root Cause Analysis
Problem Being Solved: This PR adds
Why This Approach: Clean and minimal - adds a single helper function
Risk Assessment: VERY LOW RISK - This is purely additive functionality that doesn't modify existing behavior, just adds an optional field to JSON output.

Configuration Change Detection
✅ No configuration changes detected

Test Integrity Assessment
✅ Strong test coverage and integrity

Code Quality Review
Strengths
Minor Considerations

Security Assessment
✅ No security concerns

Final Recommendation
This is a well-executed, minimal feature addition that solves a real need for downstream tooling. The implementation is clean, well-tested, and follows established patterns in the codebase.
Recent improvements (commits a87c5da, 5e49ba6): Author proactively addressed the automated review feedback by adding debug/warn logging for better observability and fixing biome formatting issues.
stat.size returns bytes but String.length counts UTF-16 code units. Read as Buffer and use buf.indexOf(0x0a) to find the newline at the byte-level midpoint, ensuring consistent behavior with multi-byte UTF-8.
Impact: 1 functions changed, 1 affected
Summary
- getFileHash helper in src/queries.js that looks up file content hashes from the file_hashes table
- fileHash field to queryNameData, whereSymbolImpl (symbol mode), and whereFileImpl (file mode) return objects

Test plan
- file_hashes rows to integration test fixture DB
- fileHash value in queryNameData test
- fileHash value in whereData symbol mode test
- fileHash value in whereData file mode test
- fileHash appears in live CLI output: where --json, where -f --json, query --json