Research: semantic change chunking for PR review #313

PR review currently processes diffs per hunk — the raw unified-diff chunks produced by git. Hunks are syntactic boundaries, not semantic ones, so a single logical change (e.g. renaming a parameter, moving a function, introducing a new abstraction) can span several hunks across files while unrelated changes can land in the same hunk.
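To make the per-hunk unit concrete, here is a minimal sketch that splits a git-style unified diff into hunks. It is an illustration only, not Warden's actual parser: `Hunk`, `split_hunks`, and the sample diff are hypothetical, and it assumes input in `git diff` format, where each file section starts with a `diff --git` line. In the sample, a single parameter rename (`uid` to `user_id`) lands in three separate hunks across two files.

```python
import re
from dataclasses import dataclass, field

# Matches hunk headers such as "@@ -10,2 +10,3 @@ optional context"
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class Hunk:
    path: str        # file path taken from the "+++ b/..." header
    old_start: int   # first line of the hunk in the old file
    new_start: int   # first line of the hunk in the new file
    lines: list[str] = field(default_factory=list)


def split_hunks(diff_text: str) -> list[Hunk]:
    """Split a git-style unified diff into hunks (assumes `git diff` output)."""
    hunks: list[Hunk] = []
    path = ""
    current = None
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            current = None  # a new file section begins; close any open hunk
            continue
        match = HUNK_HEADER.match(line)
        if match:
            current = Hunk(path, int(match.group(1)), int(match.group(3)))
            hunks.append(current)
        elif current is None:
            # Between "diff --git" and the first hunk: read the new-file path
            if line.startswith("+++ "):
                path = line[4:].removeprefix("b/")
        else:
            current.lines.append(line)
    return hunks


# Hypothetical sample: one logical rename spread over three hunks in two files
SAMPLE = """\
diff --git a/app/users.py b/app/users.py
--- a/app/users.py
+++ b/app/users.py
@@ -1,2 +1,2 @@
-def load(uid):
+def load(user_id):
     pass
@@ -10,1 +10,1 @@
-    return fetch(uid)
+    return fetch(user_id)
diff --git a/app/api.py b/app/api.py
--- a/app/api.py
+++ b/app/api.py
@@ -5,1 +5,1 @@
-    user = load(uid=request.id)
+    user = load(user_id=request.id)
"""

hunks = split_hunks(SAMPLE)
for h in hunks:
    print(h.path, h.old_start, len(h.lines))
```

A per-hunk reviewer would see these three hunks as independent units, even though they express one intent.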

Grouping review context by **semantic change** instead of per-hunk could improve intent recognition and produce more useful review feedback.

## Research areas

- **Semantic chunking strategies** — how to cluster diff hunks into logical change units (AST-level analysis, symbol-based grouping, commit-message/description cross-referencing, embedding similarity)
- **Tradeoffs vs. per-hunk** — cases where semantic grouping helps (renames, refactors, multi-file features) and where it might hurt (unrelated co-located edits, large PRs)
- **Implementation approaches** — lightweight heuristics (e.g. shared symbol references) vs. heavier analysis (tree-sitter / AST diffing, LLM-based clustering)
- **Prior art** — existing tools or papers on semantic diff grouping (e.g. GumTree, Semantic Diff, difftastic for structural diffs)
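
As a rough illustration of the lightweight end of that spectrum, the sketch below groups hunks that share changed symbols, using a union-find over hunk ids. Everything here is an assumption made for the example: the function names, the hunk ids, the regex-based notion of a "symbol", and the rule that a symbol counts as changed when it appears on only one side (removed or added) of a hunk. It assumes Python source and does no AST analysis, so it is a starting point for experiments rather than a proposed design.

```python
import keyword
import re
from collections import defaultdict

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def changed_symbols(lines: list[str]) -> set[str]:
    """Identifiers present on only the removed or only the added side of a hunk."""
    removed: set[str] = set()
    added: set[str] = set()
    for line in lines:
        if line.startswith("-"):
            removed.update(IDENTIFIER.findall(line[1:]))
        elif line.startswith("+"):
            added.update(IDENTIFIER.findall(line[1:]))
    # Symmetric difference drops identifiers that are unchanged within the hunk;
    # language keywords are ignored (assumes Python source).
    return {s for s in removed ^ added if not keyword.iskeyword(s)}


def group_hunks(hunks: dict[str, list[str]]) -> list[list[str]]:
    """Cluster hunk ids that share at least one changed symbol (union-find)."""
    parent = {hunk_id: hunk_id for hunk_id in hunks}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen: dict[str, str] = {}  # symbol -> first hunk that changed it
    for hunk_id, lines in hunks.items():
        for symbol in changed_symbols(lines):
            if symbol in first_seen:
                parent[find(hunk_id)] = find(first_seen[symbol])
            else:
                first_seen[symbol] = hunk_id

    groups: dict[str, list[str]] = defaultdict(list)
    for hunk_id in hunks:
        groups[find(hunk_id)].append(hunk_id)
    return list(groups.values())


# Hypothetical input: hunk id -> diff lines of that hunk
example = {
    "users.py#1": ["-def load(uid):", "+def load(user_id):"],
    "users.py#2": ["-    return fetch(uid)", "+    return fetch(user_id)"],
    "api.py#1": ["-    user = load(uid=request.id)", "+    user = load(user_id=request.id)"],
    "config.py#1": ["-TIMEOUT = 10", "+TIMEOUT = 30"],
}

groups = group_hunks(example)
print(groups)
```

In this example the three rename hunks end up in one group and the unrelated `config.py` edit stays on its own. The same heuristic also shows the failure mode noted under tradeoffs: two unrelated edits that happen to touch a common identifier would be merged, which is where heavier analysis (tree-sitter, AST diffing, or LLM-based clustering) might be needed.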

## Goal

Determine whether semantic chunking meaningfully improves review quality and, if so, propose an approach suitable for integration into Warden's review pipeline.

Action taken on behalf of David Cramer.

