Skip to content

Loading…

Suppress unified_diff for files deemed to be "binary". #78

Closed
wants to merge 2 commits into from

2 participants

@tkanemoto

For commits with lots of binary files or large binary files, the
memory usage in write_object_diff could get quite nasty.

Moreover, it is not really interesting to see the binary level
diff for binary files.

This change makes write_object_diff display a line saying
"Binary files a/old/path and b/new/path differ" instead
of the unified diff for files that it deems "binary" like, using
the same strategy used in the git diff command [1].

  1. http://git.kernel.org/?p=git/git.git;a=blob;f=xdiff-interface.c#l197
@tkanemoto tkanemoto Make patch.write_object_diff suppress diff for binary files
For commits with lots of binary files or large binary files, the
memory usage in write_object_diff could get quite nasty.

Moreover, it is not really interesting to see the binary level
diff for binary files.

This change makes write_object_diff display a line saying
"Binary files a/<old_path> and b/<new_path> differ" instead
of the unified diff for files that it deems "binary" like, using
the same strategy used in the `git diff` command [1].

1. http://git.kernel.org/?p=git/git.git;a=blob;f=xdiff-interface.c#l197
3850e9e
@jelmer
Owner

Thanks! The changes look reasonable. Any chance you can add a trivial test for this new behaviour?

Hi. Thanks for the feedback.

Are there any repos/commits in the dulwich/tests/data that includes binary diffs that I can re-use for this? I'd rather not add something that's already.

Owner
@tkanemoto

Thanks. I'm on holiday at the moment. I'll pick this up when I get back.

Thanks,

Takeshi

@tkanemoto tkanemoto Added tests for write_object_diff with binary files
Added tests for patch.write_object_diff's binary comparison
behaviour.
4fdf31f
@tkanemoto

Tests have been added for the new behaviour.

@jelmer
Owner

Thanks, merged. Sorry for the delay.

@jelmer jelmer closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Dec 19, 2012
  1. @tkanemoto

    Make patch.write_object_diff suppress diff for binary files

    tkanemoto committed
    For commits with lots of binary files or large binary files, the
    memory usage in write_object_diff could get quite nasty.
    
    Moreover, it is not really interesting to see the binary level
    diff for binary files.
    
    This change makes write_object_diff display a line saying
    "Binary files a/<old_path> and b/<new_path> differ" instead
    of the unified diff for files that it deems "binary" like, using
    the same strategy used in the `git diff` command [1].
    
    1. http://git.kernel.org/?p=git/git.git;a=blob;f=xdiff-interface.c#l197
Commits on Feb 9, 2013
  1. @tkanemoto

    Added tests for write_object_diff with binary files

    tkanemoto committed
    Added tests for patch.write_object_diff's binary comparison
    behaviour.
Showing with 78 additions and 8 deletions.
  1. +27 −8 dulwich/patch.py
  2. +51 −0 dulwich/tests/test_patch.py
View
35 dulwich/patch.py
@@ -31,6 +31,9 @@
S_ISGITLINK,
)
+FIRST_FEW_BYTES = 8000
+
+
def write_commit_patch(f, commit, contents, progress, version=None):
"""Write a individual file patch.
@@ -103,6 +106,11 @@ def unified_diff(a, b, fromfile='', tofile='', n=3):
yield '+' + line
+def is_binary(content):
+ """See if the first few bytes contains any null characters."""
+ return '\0' in content[:FIRST_FEW_BYTES]
+
+
def write_object_diff(f, store, (old_path, old_mode, old_id),
(new_path, new_mode, new_id)):
"""Write the diff for an object.
@@ -119,13 +127,21 @@ def shortid(hexsha):
return "0" * 7
else:
return hexsha[:7]
- def lines(mode, hexsha):
+
+ def content(mode, hexsha):
if hexsha is None:
- return []
+ return ''
elif S_ISGITLINK(mode):
- return ["Submodule commit " + hexsha + "\n"]
+ return "Submodule commit " + hexsha + "\n"
else:
- return store[hexsha].data.splitlines(True)
+ return store[hexsha].data
+
+ def lines(content):
+ if not content:
+ return []
+ else:
+ return content.splitlines(True)
+
if old_path is None:
old_path = "/dev/null"
else:
@@ -146,10 +162,13 @@ def lines(mode, hexsha):
if new_mode is not None:
f.write(" %o" % new_mode)
f.write("\n")
- old_contents = lines(old_mode, old_id)
- new_contents = lines(new_mode, new_id)
- f.writelines(unified_diff(old_contents, new_contents,
- old_path, new_path))
+ old_content = content(old_mode, old_id)
+ new_content = content(new_mode, new_id)
+ if is_binary(old_content) or is_binary(new_content):
+ f.write("Binary files %s and %s differ\n" % (old_path, new_path))
+ else:
+ f.writelines(unified_diff(lines(old_content), lines(new_content),
+ old_path, new_path))
def write_blob_diff(f, (old_path, old_mode, old_blob),
View
51 dulwich/tests/test_patch.py
@@ -369,6 +369,57 @@ def test_object_diff_remove_blob(self):
'-same'
], f.getvalue().splitlines())
+ def test_object_diff_bin_blob(self):
+ f = StringIO()
+ # Prepare two slightly different PNG headers
+ b1 = Blob.from_string(
+ "\x89\x50\x4e\x47\x0d\x0a\x1a\x0a\x00\x00\x00\x0d\x49\x48\x44\x52"
+ "\x00\x00\x01\xd5\x00\x00\x00\x9f\x08\x04\x00\x00\x00\x05\x04\x8b")
+ b2 = Blob.from_string(
+ "\x89\x50\x4e\x47\x0d\x0a\x1a\x0a\x00\x00\x00\x0d\x49\x48\x44\x52"
+ "\x00\x00\x01\xd5\x00\x00\x00\x9f\x08\x03\x00\x00\x00\x98\xd3\xb3")
+ store = MemoryObjectStore()
+ store.add_objects([(b1, None), (b2, None)])
+ write_object_diff(f, store, ('foo.png', 0644, b1.id),
+ ('bar.png', 0644, b2.id))
+ self.assertEqual([
+ 'diff --git a/foo.png b/bar.png',
+ 'index f73e47d..06364b7 644',
+ 'Binary files a/foo.png and b/bar.png differ'
+ ], f.getvalue().splitlines())
+
+ def test_object_diff_add_bin_blob(self):
+ f = StringIO()
+ b2 = Blob.from_string(
+ '\x89\x50\x4e\x47\x0d\x0a\x1a\x0a\x00\x00\x00\x0d\x49\x48\x44\x52'
+ '\x00\x00\x01\xd5\x00\x00\x00\x9f\x08\x03\x00\x00\x00\x98\xd3\xb3')
+ store = MemoryObjectStore()
+ store.add_object(b2)
+ write_object_diff(f, store, (None, None, None),
+ ('bar.png', 0644, b2.id))
+ self.assertEqual([
+ 'diff --git /dev/null b/bar.png',
+ 'new mode 644',
+ 'index 0000000..06364b7 644',
+ 'Binary files /dev/null and b/bar.png differ'
+ ], f.getvalue().splitlines())
+
+ def test_object_diff_remove_bin_blob(self):
+ f = StringIO()
+ b1 = Blob.from_string(
+ '\x89\x50\x4e\x47\x0d\x0a\x1a\x0a\x00\x00\x00\x0d\x49\x48\x44\x52'
+ '\x00\x00\x01\xd5\x00\x00\x00\x9f\x08\x04\x00\x00\x00\x05\x04\x8b')
+ store = MemoryObjectStore()
+ store.add_object(b1)
+ write_object_diff(f, store, ('foo.png', 0644, b1.id),
+ (None, None, None))
+ self.assertEqual([
+ 'diff --git a/foo.png /dev/null',
+ 'deleted mode 644',
+ 'index f73e47d..0000000',
+ 'Binary files a/foo.png and /dev/null differ'
+ ], f.getvalue().splitlines())
+
def test_object_diff_kind_change(self):
f = StringIO()
b1 = Blob.from_string("new\nsame\n")
Something went wrong with that request. Please try again.