Break combined diffs into hunks

When we display combined diffs, we should isolate the regions of change
as hunks, just as we do for normal diffs. For example, given the
following versions of a file from the base of a merge, the HEAD commit,
the target commit, and the merge commit:

      Base          Ours          Theirs        Merged
      alfa          alfa          echo          echo
      bravo         bravo         bravo         bravo
      charlie       delta         charlie       delta
                                                foxtrot

The diff of the Merged version against Ours and Theirs respectively
would be displayed, with hunk markers, as:

                Ours                Theirs
                @@ -1,3 +1,4 @@     @@ -1,3 +1,4 @@
                -alfa                echo
                +echo                bravo
                 bravo              -charlie
                 delta              +delta
                +foxtrot            +foxtrot

When displaying these as a combined diff, the two diffs are aligned on
their common content in the post-image (the Merged version), and the
hunk header should display the offsets in each pre-image (the Ours and
Theirs versions) and the post-image.

    @@@ -1,3 -1,3 +1,4 @@@
    - alfa
    + echo
      bravo
     -charlie
     +delta
    ++foxtrot
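
As a rough illustration of that alignment, the Ruby sketch below builds
the rows for this example by hand and prints one prefix column per
pre-image. The `Line`, `Edit` and `Row` structs follow the types
described in this message; the `SYMBOLS` table, the `render`, `del`,
`ins` and `same` helpers, and the hand-written `ROWS` data are
assumptions made for the example, not the code added by this commit.

```ruby
Line = Struct.new(:number, :text)
Edit = Struct.new(:type, :a_line, :b_line)
Row  = Struct.new(:edits)

# Assumed mapping from edit type to the marker printed in a diff column.
SYMBOLS = { eql: " ", ins: "+", del: "-" }.freeze

# One prefix character per pre-image. A nil edit means that pre-image has
# nothing to say about this row, so its column is left blank.
def render(row)
  prefix = row.edits.map { |edit| edit ? SYMBOLS.fetch(edit.type) : " " }.join
  edit   = row.edits.compact.first
  line   = edit.a_line || edit.b_line
  prefix + line.text
end

# Hypothetical constructors that keep the example data readable.
def del(number, text)
  Edit.new(:del, Line.new(number, text), nil)
end

def ins(number, text)
  Edit.new(:ins, nil, Line.new(number, text))
end

def same(a_number, b_number, text)
  Edit.new(:eql, Line.new(a_number, text), Line.new(b_number, text))
end

# Each row holds [edit against Ours, edit against Theirs].
ROWS = [
  Row.new([del(1, "alfa"), nil]),
  Row.new([ins(1, "echo"), same(1, 1, "echo")]),
  Row.new([same(2, 2, "bravo"), same(2, 2, "bravo")]),
  Row.new([nil, del(3, "charlie")]),
  Row.new([same(3, 3, "delta"), ins(3, "delta")]),
  Row.new([ins(4, "foxtrot"), ins(4, "foxtrot")])
].freeze

puts ROWS.map { |row| render(row) }
```

Running it prints the six rows of the combined diff for this example,
from `- alfa` through to `++foxtrot`.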

To see how to do this, we can look at the types of the underlying data
structures. I'm going to use a Haskell-like notation here since Ruby
does not have static types. To start with, the `Diff.diff` function
takes two strings, which it breaks into lines, and then returns a list
of edits.

    Diff.diff :: String -> String -> [Edit]

The `Edit` and `Line` structures have the types:

    Line = { number :: Int, text :: String }
    Edit = { type :: Symbol, a_line :: Line, b_line :: Line }

`a_line` is a `Line` from the pre-image, and `b_line` from the
post-image. The notation `[t]` means a list of values of type `t`. I am
ignoring that some of these values can be null because it only adds
noise to the argument, and the actual constraints on the values are more
complex, e.g. `a_line` and `b_line` cannot both be null.

The `Diff.combined` function takes a list of diffs (which are lists of
`Edit` values) and returns a list of `Row` values, where:

    Diff.combined :: [[Edit]] -> [Row]

    Row = { edits :: [Edit] }

(Again, the list of `edits` in a `Row` can contain nulls, but this is
not important here. The list will always contain either exactly one
non-null value, or it will contain no nulls.)

Now, the existing `Hunk.filter` function takes a list of edits and
returns a list of hunks, that is:

    Hunk.filter :: [Edit] -> [Hunk]

    Hunk = { a_start :: Int, b_start :: Int, edits :: [Edit] }

We'd like to amend `Hunk.filter` so it can take a list of `Row` values,
and return hunks containing the same, that is:

    Hunk.filter :: [Row] -> [Hunk]

    Hunk = { a_start :: Int, b_start :: Int, edits :: [Row] }

The question is, can we abstract over the `Edit` and `Row` types to
produce something with a consistent interface, which we've called
`Hunkable`, giving `Hunk.filter` the following type?

    Hunk.filter :: (Hunkable t) => [t] -> [Hunk t]

    Hunk t = { a_start :: Int, b_start :: Int, edits :: [t] }

Let's remind ourselves of those types:

    Line = { number :: Int, text :: String }
    Edit = { type :: Symbol, a_line :: Line, b_line :: Line }

    Row = { edits :: [Edit] }

`Hunk.filter` relies on the fact that the items in the list respond to
`type`, so it can determine regions of change that should be chunked
together. The functions for printing diffs also want this information to
decide which colour to use. `Hunk.filter` also relies on the `a_line`
and `b_line` properties of edits, and the `number` of those lines, to
calculate the header offsets. So in short, `Hunk.filter` relies on this
interface:

    class Hunkable t where
      type   :: t -> Symbol
      a_line :: t -> Line
      b_line :: t -> Line

But notice this difference between the normal and combined hunk
headers:

    normal:     @@ -1,3 +1,4 @@

    combined:   @@@ -1,3 -1,3 +1,4 @@@

A normal hunk header has only one `-` offset, which comes from the
`a_line` of its edits. Combined hunks have multiple `-` offsets -- one
from each `a_line` in the edits of the combined rows. Now, having a
single `a_line` (or none) is just a special case of having many
`a_lines`; a normal diff against a single pre-image is a special case of
a combined diff against multiple pre-images. So we can adjust our
interface like so:

    class Hunkable t where
      type    :: t -> Symbol
      a_lines :: t -> [Line]
      b_line  :: t -> Line

We're left with the question of whether `a_lines` can be defined for
edits, and whether the entire interface can be defined for rows. The
first is straightforward: `a_lines` for an `Edit` is just a list
containing its `a_line`.

    instance Hunkable Edit where
      a_lines edit = [a_line edit]

Defining the other functions for `Row` is also fairly simple. Its
`type` can be defined so that if it contains any deletions, it's a
deletion, and similarly for insertions.

    instance Hunkable Row where
      type row = selectType (map type (edits row))
        where
          selectType types
            | elem :del types = :del
            | elem :ins types = :ins
            | otherwise       = :eql
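
For readers more comfortable with Ruby, the same rule might be sketched
as below. The `select_type` name is hypothetical, and the function
assumes it is given the types of a row's non-null edits.

```ruby
# Collapse the types of a row's edits into a single type: any deletion
# wins, then any insertion, and otherwise the row is unchanged context.
def select_type(types)
  return :del if types.include?(:del)
  return :ins if types.include?(:ins)

  :eql
end

p select_type([:del])        # prints :del
p select_type([:eql, :ins])  # prints :ins
p select_type([:eql, :eql])  # prints :eql
```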

The `a_lines` for a `Row` are just the `a_line` for each of its edits:

    instance Hunkable Row where
      -- ...
      a_lines row = map a_line (edits row)

And the `b_line` is the `b_line` of the first non-null `Edit`, since all
the edits in a `Row` have the same `b_line`.

    instance Hunkable Row where
      -- ...
      b_line row = b_line (head (edits row))
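
To check that the interface is enough, here is a simplified Ruby sketch
in which one header routine serves both plain edits and combined rows
for the example at the top of this message. The `offset`, `header`,
`line` and `edit` helpers and the `OURS`, `THEIRS` and `ROWS` constants
are assumptions made for the illustration. Unlike the real `Hunk`
class, the sketch has no stored start offsets to fall back on and
simply uses 0 when a column has no lines.

```ruby
Line = Struct.new(:number, :text)

Edit = Struct.new(:type, :a_line, :b_line) do
  def a_lines
    [a_line]
  end
end

Row = Struct.new(:edits) do
  def a_lines
    edits.map { |edit| edit&.a_line }
  end

  def b_line
    edits.compact.first&.b_line
  end
end

# Format one "<sign><start>,<count>" offset, ignoring missing lines.
def offset(sign, lines)
  lines = lines.compact
  start = lines.first ? lines.first.number : 0
  "#{sign}#{start},#{lines.size}"
end

# Build a hunk header from any items that respond to a_lines and b_line.
def header(items)
  columns = items.map(&:a_lines).transpose
  offsets = columns.map { |lines| offset("-", lines) }
  offsets << offset("+", items.map(&:b_line))
  sep = "@" * offsets.size
  [sep, *offsets, sep].join(" ")
end

def line(number, text)
  number && Line.new(number, text)
end

# A nil line number means the line is absent from that side.
def edit(type, a_number, b_number, text)
  Edit.new(type, line(a_number, text), line(b_number, text))
end

OURS = [
  edit(:del, 1, nil, "alfa"),
  edit(:ins, nil, 1, "echo"),
  edit(:eql, 2, 2, "bravo"),
  edit(:eql, 3, 3, "delta"),
  edit(:ins, nil, 4, "foxtrot")
].freeze

THEIRS = [
  edit(:eql, 1, 1, "echo"),
  edit(:eql, 2, 2, "bravo"),
  edit(:del, 3, nil, "charlie"),
  edit(:ins, nil, 3, "delta"),
  edit(:ins, nil, 4, "foxtrot")
].freeze

# Rows pair up the edits that share a post-image line.
ROWS = [
  Row.new([OURS[0], nil]),
  Row.new([OURS[1], THEIRS[0]]),
  Row.new([OURS[2], THEIRS[1]]),
  Row.new([nil, THEIRS[2]]),
  Row.new([OURS[3], THEIRS[3]]),
  Row.new([OURS[4], THEIRS[4]])
].freeze

puts header(OURS)    # @@ -1,3 +1,4 @@
puts header(THEIRS)  # @@ -1,3 +1,4 @@
puts header(ROWS)    # @@@ -1,3 -1,3 +1,4 @@@
```

Transposing the `a_lines` of every item yields one column of lines per
pre-image, which is why a list of plain edits produces a single `-`
offset and a list of rows produces one per parent.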

This commit adds the necessary methods to `Diff::Edit` and
`Diff::Combined::Row` to accomplish this abstraction, and adjusts the
`Diff::Hunk` class so that it can work with an arbitrary number of
pre-images. This continues to work for normal diffs since they're a
special case of this more general behaviour.
jcoglan committed Jul 28, 2018
1 parent 97822cb commit 74398616e646f98c18e7990407d5c37f79149a8d
Showing with 35 additions and 11 deletions.
  1. +8 −0 lib/diff.rb
  2. +13 −0 lib/diff/combined.rb
  3. +14 −11 lib/diff/hunk.rb
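--- a/lib/diff.rb
+++ b/lib/diff.rb
@@ -12,6 +12,10 @@ module Diff
   Line = Struct.new(:number, :text)
 
   Edit = Struct.new(:type, :a_line, :b_line) do
+    def a_lines
+      [a_line]
+    end
+
     def to_s
       line = a_line || b_line
       SYMBOLS.fetch(type) + line.text
@@ -35,4 +39,8 @@ def self.combined(as, b)
     diffs = as.map { |a| diff(a, b) }
     Combined.new(diffs).to_a
   end
+
+  def self.combined_hunks(as, b)
+    Hunk.filter(combined(as, b))
+  end
 end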
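--- a/lib/diff/combined.rb
+++ b/lib/diff/combined.rb
@@ -4,6 +4,19 @@ class Combined
     include Enumerable
 
     Row = Struct.new(:edits) do
+      def type
+        types = edits.compact.map(&:type)
+        types.include?(:ins) ? :ins : types.first
+      end
+
+      def a_lines
+        edits.map { |edit| edit&.a_line }
+      end
+
+      def b_line
+        edits.first&.b_line
+      end
+
       def to_s
         symbols = edits.map { |edit| SYMBOLS.fetch(edit&.type, " ") }
 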
--- a/lib/diff/hunk.rb
+++ b/lib/diff/hunk.rb
@@ -2,7 +2,7 @@ module Diff
 
   HUNK_CONTEXT = 3
 
-  Hunk = Struct.new(:a_start, :b_start, :edits) do
+  Hunk = Struct.new(:a_starts, :b_start, :edits) do
     def self.filter(edits)
       hunks = []
       offset = 0
@@ -13,10 +13,10 @@ def self.filter(edits)
 
         offset -= HUNK_CONTEXT + 1
 
-        a_start = (offset < 0) ? 0 : edits[offset].a_line.number
-        b_start = (offset < 0) ? 0 : edits[offset].b_line.number
+        a_starts = (offset < 0) ? [] : edits[offset].a_lines.map(&:number)
+        b_start = (offset < 0) ? nil : edits[offset].b_line.number
 
-        hunks.push(Hunk.new(a_start, b_start, []))
+        hunks.push(Hunk.new(a_starts, b_start, []))
         offset = build_hunk(hunks.last, edits, offset)
       end
     end
@@ -42,19 +42,22 @@ def self.build_hunk(hunk, edits, offset)
     end
 
     def header
-      a_offset = offsets_for(:a_line, a_start).join(",")
-      b_offset = offsets_for(:b_line, b_start).join(",")
+      a_lines = edits.map(&:a_lines).transpose
+      offsets = a_lines.map.with_index { |lines, i| format("-", lines, a_starts[i]) }
 
-      "@@ -#{ a_offset } +#{ b_offset } @@"
+      offsets.push(format("+", edits.map(&:b_line), b_start))
+      sep = "@" * offsets.size
+
+      [sep, *offsets, sep].join(" ")
     end
 
     private
 
-    def offsets_for(line_type, default)
-      lines = edits.map(&line_type).compact
-      start = lines.first&.number || default
+    def format(sign, lines, start)
+      lines = lines.compact
+      start = lines.first&.number || start || 0
 
-      [start, lines.size]
+      "#{ sign }#{ start },#{ lines.size }"
     end
   end
 end
