Empty context lines are silently dropped, causing lost hunks and corrupted parsing
Summary
When a unified diff contains empty context lines (i.e., lines that are completely empty with no leading space), parse-diff silently drops them. This causes hunk line counters to become desynchronized, which can cascade into losing entire hunks and even entire files from the parsed output.
Background
In the unified diff format, context lines (unchanged lines shown for surrounding context) are prefixed with a single space character. However, many tools strip trailing whitespace from diff output, and Git has a configuration option (diff.suppressBlankEmpty = true) that omits the leading space on blank context lines. This produces completely empty lines ("") in the diff output instead of lines containing a single space (" ").
The Bug
The schemaContent array in parse.js defines four patterns for matching lines within a hunk body:
| Pattern | Purpose |
| --- | --- |
| /^\ No newline/ | End-of-file marker |
| /^-/ | Deleted line |
| /^\+/ | Added line |
| /^\s+/ | Context (unchanged) line |
The context line pattern /^\s+/ requires one or more whitespace characters. An empty string "" matches none of the four patterns and is silently skipped by parseContentLine().
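To make the mismatch concrete, the four patterns can be tested directly against an empty string. This is a minimal sketch rather than parse-diff's actual code: `classify` is a hypothetical stand-in for the matching loop, and the first pattern is assumed to match a literal backslash, as in Git's "\ No newline at end of file" marker.

```javascript
// Patterns in the order listed above. The first one is written with an
// escaped backslash (assumption: it targets the "\ No newline..." marker).
const schemaContentPatterns = [
  ['eof', /^\\ No newline/],
  ['del', /^-/],
  ['add', /^\+/],
  ['normal', /^\s+/],
];

// Hypothetical helper: returns the name of the first matching pattern,
// or null when nothing matches (the case where a line would be skipped).
function classify(line) {
  for (const [name, pattern] of schemaContentPatterns) {
    if (pattern.test(line)) return name;
  }
  return null;
}

console.log(classify(' first')); // normal
console.log(classify(' '));      // normal (blank context line with its leading space)
console.log(classify('-old'));   // del
console.log(classify(''));       // null (blank context line with the space stripped)
```

Only the last call falls through all four patterns, which is the case described above.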
The Cascade
Dropping an empty context line has consequences beyond just a missing line in the output:
- Each hunk header declares expected line counts (e.g., @@ -1,5 +1,5 @@ means 5 old lines, 5 new lines)
- The parser tracks oldLines and newLines counters, decrementing them as lines are parsed
- When a context line is dropped, those counters are still above 0 when the hunk actually ends
- The parser stays in "content mode" instead of transitioning back to "header mode"
- The next @@ hunk header is parsed as content instead of starting a new hunk
- Subsequent hunks, and even diff --git file boundaries, can be swallowed entirely
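The steps above can be reproduced with a small stand-alone model of the counter logic. This is a sketch under stated assumptions, not parse-diff's source: `simulate`, `HUNK_HEADER`, and the returned shape are hypothetical, an omitted count in a hunk header is assumed to mean 1, and the end-of-file marker is left out for brevity.

```javascript
// Simplified two-mode model: "header mode" looks for @@ headers,
// "content mode" consumes lines until both counters reach 0.
const HUNK_HEADER = /^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@/;

function simulate(lines, contextPattern) {
  let oldLines = 0;
  let newLines = 0;
  let inContent = false;
  let hunks = 0;
  const skipped = [];

  for (const line of lines) {
    if (!inContent) {
      const m = line.match(HUNK_HEADER);
      if (m) {
        hunks++;
        oldLines = m[1] === undefined ? 1 : Number(m[1]);
        newLines = m[2] === undefined ? 1 : Number(m[2]);
        inContent = oldLines > 0 || newLines > 0;
      }
      continue; // other header lines are ignored in this model
    }
    if (/^-/.test(line)) oldLines--;
    else if (/^\+/.test(line)) newLines--;
    else if (contextPattern.test(line)) { oldLines--; newLines--; }
    else skipped.push(line); // matched nothing: silently dropped
    if (oldLines === 0 && newLines === 0) inContent = false;
  }
  return { hunks, skipped };
}

const twoHunks = [
  '@@ -1,3 +1,3 @@',
  ' context',
  '', // blank context line with its leading space stripped
  '-old line',
  '+new line',
  '@@ -10,3 +10,3 @@',
  ' another context',
  '',
  '-old second',
  '+new second',
];

console.log(simulate(twoHunks, /^\s+/)); // one hunk seen; '' and the second @@ header skipped
console.log(simulate(twoHunks, /^\s*/)); // two hunks seen; nothing skipped
```

With `/^\s+/` the model is still in content mode when the second `@@` header arrives, so the header is consumed as an unmatched content line and the second hunk is never opened.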
Examples
Example 1: Simple case — blank context line between changes
This diff has a blank context line between first and -old, written in the standard form as a line containing a single space:
diff --git a/file.txt b/file.txt
index 1234567..abcdefg 100644
--- a/file.txt
+++ b/file.txt
@@ -1,4 +1,4 @@
 first
 
-old
+new
 last
Now consider the same diff with diff.suppressBlankEmpty = true, where the blank context line between first and -old has its leading space stripped:
diff --git a/file.txt b/file.txt
index 1234567..abcdefg 100644
--- a/file.txt
+++ b/file.txt
@@ -1,4 +1,4 @@
 first

-old
+new
 last
Expected: 5 changes: context, context (empty), deletion, addition, context
Actual: The empty line after first matches no pattern. The parser drops it, the oldLines and newLines counters each stay one too high, and the hunk is left incomplete.
Example 2: Cascade — second hunk is lost
diff --git a/file.txt b/file.txt
index 1234567..abcdefg 100644
--- a/file.txt
+++ b/file.txt
@@ -1,3 +1,3 @@
 context

-old line
+new line
@@ -10,3 +10,3 @@
 another context

-old second
+new second
Expected: 2 hunks, each with their changes
Actual: The empty context line in the first hunk is dropped. The counters for hunk 1 are still above 0 when the hunk ends, so the parser stays in content mode. The @@ -10,3 +10,3 @@ header for hunk 2 is consumed as content, and the second hunk is missing from the parsed output.
Example 3: Cascade — second file is lost
diff --git a/a.txt b/a.txt
index 1234567..abcdefg 100644
--- a/a.txt
+++ b/a.txt
@@ -1,3 +1,3 @@
 context

-old
+new
diff --git a/b.txt b/b.txt
index 2345678..bcdefgh 100644
--- a/b.txt
+++ b/b.txt
@@ -1,3 +1,3 @@
 hello

-world
+earth
Expected: 2 files parsed, each with 1 hunk
Actual: The empty context line in a.txt's hunk causes the parser to stay in content mode. The diff --git a/b.txt b/b.txt boundary is consumed as hunk content, and the second file is missing from the parsed output.
Reproduction
const parse = require('parse-diff');

const diff = [
  'diff --git a/file.txt b/file.txt',
  'index 1234567..abcdefg 100644',
  '--- a/file.txt',
  '+++ b/file.txt',
  '@@ -1,4 +1,4 @@',
  ' first',
  '', // ← empty context line (no leading space)
  '-old',
  '+new',
  ' last',
].join('\n');

const files = parse(diff);
console.log(files[0].chunks[0].changes.length);
// Expected: 5 (first, empty, -old, +new, last)
// Actual: 4 (empty context line is silently dropped)
Environment
This issue is triggered when:
- git config diff.suppressBlankEmpty is set to true
- Any diff tool or pipeline strips trailing whitespace from output
- Diff output is post-processed and trailing spaces on blank lines are removed
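For pipelines that cannot control how the diff is produced, one possible workaround is to restore the leading space before the text reaches the parser. The following is a minimal sketch, not part of parse-diff: `restoreBlankContext` is a hypothetical helper, it assumes hunk headers of the form @@ -a,b +c,d @@, and it assumes an omitted count means 1.

```javascript
// Hypothetical pre-processor: inside a hunk body, turn '' back into ' '.
// Lines outside hunk bodies (file headers, trailing newline) are left alone.
function restoreBlankContext(diffText) {
  const header = /^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@/;
  let oldLeft = 0;
  let newLeft = 0;

  return diffText
    .split('\n')
    .map((line) => {
      if (oldLeft <= 0 && newLeft <= 0) {
        // Not inside a hunk body: only watch for the next hunk header.
        const m = line.match(header);
        if (m) {
          oldLeft = m[1] === undefined ? 1 : Number(m[1]);
          newLeft = m[2] === undefined ? 1 : Number(m[2]);
        }
        return line;
      }
      if (line.startsWith('\\')) return line; // "\ No newline at end of file"
      if (line.startsWith('-')) { oldLeft--; return line; }
      if (line.startsWith('+')) { newLeft--; return line; }
      // Anything else inside a hunk body is treated as a context line.
      oldLeft--;
      newLeft--;
      return line === '' ? ' ' : line;
    })
    .join('\n');
}

const stripped = ['@@ -1,4 +1,4 @@', ' first', '', '-old', '+new', ' last', ''].join('\n');
console.log(JSON.stringify(restoreBlankContext(stripped).split('\n')));
```

Tracking the hunk counters keeps the helper from touching empty lines outside hunk bodies, such as the trailing newline at the end of the diff.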
Suggested Fix
Change the context line regex from /^\s+/ (one or more whitespace characters) to /^\s*/ (zero or more). This is safe because schemaContent patterns are checked in order: the end-of-file marker, deletions (/^-/), and additions (/^\+/) are matched first, so /^\s*/ acts as a catch-all for everything else in the hunk body, which within a correctly formed hunk can only be a context line.
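The ordering argument can be checked in isolation. The sketch below is illustrative only: `firstMatch` is a hypothetical helper that mimics "first matching pattern wins", and the first pattern is assumed to match a literal backslash.

```javascript
// Hypothetical helper: name of the first pattern that matches, or null.
function firstMatch(line, patterns) {
  const hit = patterns.find(([, pattern]) => pattern.test(line));
  return hit ? hit[0] : null;
}

const proposed = [
  ['eof', /^\\ No newline/],
  ['del', /^-/],
  ['add', /^\+/],
  ['normal', /^\s*/], // proposed: zero or more whitespace
];

// Same patterns with the catch-all moved to the front, to show why order matters.
const misordered = [proposed[3], ...proposed.slice(0, 3)];

console.log(firstMatch('', proposed));       // normal
console.log(firstMatch('-old', proposed));   // del
console.log(firstMatch('-old', misordered)); // normal (the catch-all shadows /^-/)
```

Because /^\s*/ matches any string, it also accepts malformed lines inside a hunk body. If that is a concern, a stricter pattern such as /^(\s|$)/ (an alternative not taken from the report above) would accept only empty lines and lines that begin with whitespace.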