Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mistake in diff_cleanupMerge algorithm in diff_match_patch.cpp #31

Closed
GoogleCodeExporter opened this issue May 6, 2015 · 5 comments
Closed

Comments

@GoogleCodeExporter
Copy link

In diff_match_patch::diff_cleanupMerge the 'if (count_delete != 0 && 
count_insert != 0)' is always true because it is proceeded by 'while 
(count_delete-- > 0)' & 'while (count_insert-- > 0)'.
I think the test was supposed to be before the variables were decremented.

Luckily (or not) this only results in consuming extra cycles.

This same bug appears in the Java version, but not in the C#, JavaScript, Lua 
or Python versions.

Patch for C++ attached.

Original issue reported on code.google.com by chris...@gmail.com on 4 Nov 2010 at 1:36

Attachments:

@GoogleCodeExporter
Copy link
Author

Nice catch!  It looks like that bug has been in the code since the initial Java 
version in 2007, and was later faithfully ported to C++.  This function has 
comprehensive unit testing, but no amount of testing would have caught that.

While in that function I also noticed that we are rebuilding [Equality, 
Insertion, Equality] diffs for no reason.  So I've added a shortcut for 
single-edits sandwiched between two equalities.

The patch has gone out for overnight review.

Original comment by neil.fra...@gmail.com on 5 Nov 2010 at 7:38

  • Changed state: Started
  • Added labels: Performance

@GoogleCodeExporter
Copy link
Author

The changes were commited to Subversion a few hours ago.
  http://code.google.com/p/google-diff-match-patch/source/detail?r=72
Thanks for caching this.

There's another small optimization coming on Monday for C# and JavaScript, so I 
won't bother updating the download package until then.

Original comment by neil.fra...@gmail.com on 6 Nov 2010 at 6:21

  • Changed state: Fixed

@GoogleCodeExporter
Copy link
Author

Neil,

Unfortunately your change in r72 appears to cause test "diff_main: Overlap #1." 
to fail.

It's the 'if (count_delete + count_insert > 1)' which is not equivalent to
the old 'if (count_delete != 0 || count_insert != 0)'.

Original comment by chris...@gmail.com on 6 Nov 2010 at 6:01

@GoogleCodeExporter
Copy link
Author

I cannot replicate the error you report.  All unit tests pass in all languages:

diff_main: Overlap #1. OK
diff_main: Overlap #2. OK
diff_main: Overlap #3. OK
[...]
All tests passed.
Total time: 1297 ms
Done.

Could you double-check that the error is real?

Original comment by neil.fra...@gmail.com on 8 Nov 2010 at 8:36

@GoogleCodeExporter
Copy link
Author

Neil,

I'm terribly sorry.  You are absolutely correct - r72 does NOT cause a test 
failure.

Let me explain: I'd made a change that asserted that `text_delete` & 
`text_insert` were already empty at the bottom of the `case` statement & so 
there was no need for `text_delete = ""; text_insert = "";`.  This was true 
before r72.  Unfortunately I neglected to test my debug build with that change 
from r72 so the asserts weren't triggered.

Why did I make that change & why did I not suspect it?
Well, it's because I don't have/can't use Qt & I've ported the C++ 
diff-match-patch to use the standard C++ library, so it is just one of many 
changes.  I hope to submit my C++ version (cpp-std) to you soon for possible 
inclusion in diff-match-patch.

In fact I'm quite pleased that that change in r72 does work as it provides a 
nice performance improvement.
I've spent quite some time optimising my C++ conversion & it is now up to 4 
times as fast as the initial conversion.
And, of course it passes all the tests.  I've also added a couple of switches 
to the test program to provide some performance metrics.
Additionally, I've added another switch (currently not for submission) that 
processes a diff/patch file to produce HTML.  It's based on the JavaScript 
version in svnX <svnx.googlecode.com> .
[It converts a 91,500 line diff (from a user) to 29.2MB of HTML in as little as 
14 secs, even on my ageing machine.]

My conversion is pretty much complete, I just need to do some tidying up & test 
it with one more compiler.
I'll send you more details when it's ready.

Thanks for a very useful resource.

Regards,

Chris

Original comment by chris...@gmail.com on 9 Nov 2010 at 6:50

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant