Empty context lines are silently dropped, causing lost hunks and corrupted parsing #50

@andyfeller

Summary

When a unified diff contains empty context lines (i.e., lines that are completely empty with no leading space), parse-diff silently drops them. This causes hunk line counters to become desynchronized, which can cascade into losing entire hunks and even entire files from the parsed output.

Background

In the unified diff format, context lines (unchanged lines shown for surrounding context) are prefixed with a single space character. However, many tools strip trailing whitespace from diff output, and Git has a configuration option (diff.suppressBlankEmpty = true) that omits the leading space on blank context lines. This produces completely empty lines ("") in the diff output instead of lines containing a single space (" ").
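To make this concrete, here is a minimal sketch of how a whitespace-stripping step turns a blank context line into an empty string. The stripTrailingWhitespace helper is hypothetical; it stands in for whatever tool or pipeline post-processes the diff.

```javascript
// Hypothetical post-processing step: strip trailing spaces and tabs from every line.
function stripTrailingWhitespace(text) {
  return text
    .split('\n')
    .map((line) => line.replace(/[ \t]+$/, ''))
    .join('\n');
}

// A hunk body whose second line is a blank context line: a single space.
const hunkBody = [' first', ' ', '-old', '+new', ' last'].join('\n');

const stripped = stripTrailingWhitespace(hunkBody).split('\n');
console.log(JSON.stringify(stripped));
// The blank context line ' ' is now '', with no prefix character left.
```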

The Bug

The schemaContent array in parse.js defines four patterns for matching lines within a hunk body:

Pattern             Purpose
/^\\ No newline/    End-of-file marker
/^-/                Deleted line
/^\+/               Added line
/^\s+/              Context (unchanged) line

The context line pattern /^\s+/ requires one or more whitespace characters. An empty string "" matches none of the four patterns and is silently skipped by parseContentLine().
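The mismatch can be checked in isolation. The sketch below copies the four patterns as this issue describes them into a small first-match classifier; the classify function and the pattern names are illustrative assumptions, not parse-diff's actual code.

```javascript
// The four patterns described above, in the order listed.
// This mirrors the issue's description, not parse-diff's actual source.
const patterns = [
  ['eof', /^\\ No newline/],
  ['del', /^-/],
  ['add', /^\+/],
  ['context', /^\s+/],
];

// Return the name of the first pattern that matches, or null if none does.
function classify(line) {
  const hit = patterns.find(([, re]) => re.test(line));
  return hit ? hit[0] : null;
}

console.log(classify(' ')); // context
console.log(classify(''));  // null: no pattern matches the empty string
```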

The Cascade

Dropping an empty context line has consequences beyond just a missing line in the output:

  1. Each hunk header declares expected line counts (e.g., @@ -1,5 +1,5 @@ means 5 old lines, 5 new lines)
  2. The parser tracks oldLines and newLines counters, decrementing them as lines are parsed
  3. When a context line is dropped, those counters never reach 0
  4. The parser stays in "content mode" instead of transitioning back to "header mode"
  5. The next @@ hunk header is parsed as content instead of starting a new hunk
  6. Subsequent hunks — and even diff --git file boundaries — can be swallowed entirely
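The steps above can be modeled with a deliberately simplified state machine. This is a sketch under stated assumptions, not parse-diff's implementation: countHunks and its contextPattern parameter are hypothetical, and the model only tracks the two counters and the number of hunk headers it recognizes.

```javascript
// Simplified model of the counter logic described above.
// Not parse-diff's implementation; all names here are hypothetical.
function countHunks(diffText, contextPattern) {
  let oldLines = 0;
  let newLines = 0;
  let hunks = 0;

  for (const line of diffText.split('\n')) {
    if (oldLines > 0 || newLines > 0) {
      // Content mode: classify the line and decrement the counters.
      if (/^-/.test(line)) oldLines--;
      else if (/^\+/.test(line)) newLines--;
      else if (contextPattern.test(line)) { oldLines--; newLines--; }
      // Anything else is silently skipped, so the counters stay above zero.
      continue;
    }
    // Header mode: look for the next hunk header and load its line counts.
    const header = /^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@/.exec(line);
    if (header) {
      oldLines = header[1] === undefined ? 1 : Number(header[1]);
      newLines = header[2] === undefined ? 1 : Number(header[2]);
      hunks++;
    }
  }
  return hunks;
}

const twoHunkDiff = [
  'diff --git a/file.txt b/file.txt',
  'index 1234567..abcdefg 100644',
  '--- a/file.txt',
  '+++ b/file.txt',
  '@@ -1,3 +1,3 @@',
  ' context',
  '', // empty context line (no leading space)
  '-old line',
  '+new line',
  '@@ -10,3 +10,3 @@',
  ' another context',
  '-old second',
  '+new second',
].join('\n');

console.log(countHunks(twoHunkDiff, /^\s+/)); // 1: second header swallowed as content
console.log(countHunks(twoHunkDiff, /^\s*/)); // 2: counters reach zero, header recognized
```

With /^\s+/, the counters for the first hunk stop at 1 and 1, so the second @@ line is read in content mode and skipped.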

Examples

Example 1: Simple case — blank context line between changes

This diff has a blank context line between first and -old. It consists of a single space character, so it appears as an empty line below:

diff --git a/file.txt b/file.txt
index 1234567..abcdefg 100644
--- a/file.txt
+++ b/file.txt
@@ -1,4 +1,4 @@
 first
 
-old
+new
 last

Now consider the same diff with diff.suppressBlankEmpty = true, where the blank context line between first and -old has its leading space stripped:

diff --git a/file.txt b/file.txt
index 1234567..abcdefg 100644
--- a/file.txt
+++ b/file.txt
@@ -1,4 +1,4 @@
 first

-old
+new
 last

Expected: 5 changes (context, empty context, deletion, addition, context)
Actual: The empty line (line 7 of the diff) matches no pattern. The parser drops it, so the oldLines and newLines counters each end at 1 instead of 0, and the hunk is left incomplete.

Example 2: Cascade — second hunk is lost

diff --git a/file.txt b/file.txt
index 1234567..abcdefg 100644
--- a/file.txt
+++ b/file.txt
@@ -1,3 +1,3 @@
 context

-old line
+new line
@@ -10,3 +10,3 @@
 another context
-old second
+new second

Expected: 2 hunks, each with their changes
Actual: The empty context line in the first hunk is dropped. The counter for hunk 1 never reaches 0, so the parser stays in content mode. The @@ -10,3 +10,3 @@ header for hunk 2 is consumed as content. The entire second hunk is lost from the parsed output.

Example 3: Cascade — second file is lost

diff --git a/a.txt b/a.txt
index 1234567..abcdefg 100644
--- a/a.txt
+++ b/a.txt
@@ -1,3 +1,3 @@
 context

-old
+new
diff --git a/b.txt b/b.txt
index 2345678..bcdefgh 100644
--- a/b.txt
+++ b/b.txt
@@ -1,3 +1,3 @@
 hello
-world
+earth

Expected: 2 files parsed, each with 1 hunk
Actual: The empty context line in a.txt's hunk causes the parser to stay in content mode. The diff --git a/b.txt b/b.txt boundary is consumed as hunk content. The entire second file is lost from the parsed output.

Reproduction

const parse = require('parse-diff');

const diff = [
  'diff --git a/file.txt b/file.txt',
  'index 1234567..abcdefg 100644',
  '--- a/file.txt',
  '+++ b/file.txt',
  '@@ -1,4 +1,4 @@',
  ' first',
  '',           // ← empty context line (no leading space)
  '-old',
  '+new',
  ' last',
].join('\n');

const files = parse(diff);
console.log(files[0].chunks[0].changes.length);
// Expected: 5 (first, empty, -old, +new, last)
// Actual:   4 (empty context line is silently dropped)

Environment

This issue is triggered when:

  • git config diff.suppressBlankEmpty is set to true
  • Any diff tool or pipeline strips trailing whitespace from output
  • Diff output is post-processed and trailing spaces on blank lines are removed

Suggested Fix

Change the context line regex from /^\s+/ (one or more whitespace characters) to /^\s*/ (zero or more). This is safe because schemaContent patterns are checked in order: the end-of-file marker, deletions (/^-/), and additions (/^\+/) are matched first, so /^\s*/ acts as a catch-all for everything else in the hunk body, which within a correctly formed hunk can only be a context line.
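The ordering argument can be sketched as follows. The fixedPatterns list and classifyFixed function are illustrative assumptions that follow the pattern order described in this issue; they are not a patch against parse-diff's source.

```javascript
// Same ordering as the issue describes, with the proposed context pattern.
// A sketch of the idea, not a patch against parse-diff's source.
const fixedPatterns = [
  ['eof', /^\\ No newline/],
  ['del', /^-/],
  ['add', /^\+/],
  ['context', /^\s*/], // zero or more whitespace: matches any string
];

// First match wins, so the catch-all is only reached when nothing else matched.
function classifyFixed(line) {
  for (const [name, re] of fixedPatterns) {
    if (re.test(line)) return name;
  }
  return null; // unreachable while /^\s*/ is the last pattern
}

for (const line of ['', ' ', ' last', '-old', '+new', '\\ No newline at end of file']) {
  console.log(JSON.stringify(line), '->', classifyFixed(line));
}
```

One consequence of this choice: /^\s*/ matches every string, so any otherwise unrecognized line inside a hunk body would also be counted as context.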
