PR review currently processes diffs per hunk, that is, per raw unified-diff chunk produced by git. Hunks are syntactic boundaries, not semantic ones: a single logical change (e.g. renaming a parameter, moving a function, introducing a new abstraction) can span several hunks across files, while unrelated changes can land in the same hunk.
Grouping review context by semantic change instead of per-hunk could improve intent recognition and produce more useful review feedback.
Research areas
- Semantic chunking strategies — how to cluster diff hunks into logical change units (AST-level analysis, symbol-based grouping, commit-message/description cross-referencing, embedding similarity)
- Tradeoffs vs. per-hunk — cases where semantic grouping helps (renames, refactors, multi-file features) and where it might hurt (unrelated co-located edits, large PRs)
- Implementation approaches — lightweight heuristics (e.g. shared symbol references) vs. heavier analysis (tree-sitter / AST diffing, LLM-based clustering)
- Prior art — existing tools or papers on semantic diff grouping (e.g. GumTree, Semantic Diff, difftastic for structural diffs)
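As a starting point for the "lightweight heuristics" option above, here is a minimal sketch of shared-symbol grouping: hunks whose changed lines mention the same identifier are merged into one change unit with a union-find. Everything in it (`Hunk`, `changed_symbols`, `group_hunks`, the stop list, the `min_len` cutoff) is hypothetical and only for illustration. It is not Warden's existing API, and a real version would likely need language-aware tokenization.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Tiny illustrative stop list (assumption); a real one would be per-language.
STOPWORDS = {"def", "return", "if", "else", "for", "in", "import",
             "from", "class", "self", "None", "True", "False"}


@dataclass
class Hunk:
    file: str
    body: str  # hunk lines with their '+', '-' or ' ' prefix, no file headers


def changed_symbols(hunk):
    """Collect identifiers that appear on added or removed lines only."""
    symbols = set()
    for line in hunk.body.splitlines():
        if line.startswith(("+", "-")):
            symbols.update(IDENT.findall(line[1:]))
    return symbols - STOPWORDS


def group_hunks(hunks, min_len=4):
    """Return lists of hunk indices that share at least one changed symbol."""
    parent = list(range(len(hunks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # symbol -> index of the first hunk that touched it
    for idx, hunk in enumerate(hunks):
        for sym in changed_symbols(hunk):
            if len(sym) < min_len:
                continue  # skip short names such as i, x, p (too noisy)
            if sym in first_seen:
                union(first_seen[sym], idx)
            else:
                first_seen[sym] = idx

    groups = defaultdict(list)
    for idx in range(len(hunks)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


hunks = [
    Hunk("a.py", "-def load(path):\n+def load(file_path):\n"),
    Hunk("b.py", " x = 1\n-data = load(path=p)\n+data = load(file_path=p)\n"),
    Hunk("c.py", "-TIMEOUT = 30\n+TIMEOUT = 60\n"),
]
print(group_hunks(hunks))
```

In this example the parameter rename in `a.py` and its call-site update in `b.py` end up in one group, while the unrelated constant change in `c.py` stays on its own. The heuristic splits nothing, though: unrelated edits inside a single hunk remain together, and a very common identifier can glue unrelated hunks into one group, which is one of the tradeoffs the research should quantify.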
Goal
Determine whether semantic chunking meaningfully improves review quality and, if so, propose an approach suitable for integration into Warden's review pipeline.
Action taken on behalf of David Cramer.