Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

diff --color-moved[-ws] speedups #981

Closed

Conversation

phillipwood
Copy link

@phillipwood phillipwood commented Jun 14, 2021

Thanks to Dscho for his comments on V3.
Changes since V4:

  • Fixed a typo in the commit message to patch 6

Changes since V3:

  • Patch 1 now allows the user to choose different endpoints for the diff perf tests to facilitate testing with different repositories.
  • Fixed the alignment of the perf results column headers in a couple of patches.

Changes since V2:

  • Patches 1-3 are new and fix an existing bug.
  • Patch 8 includes Peff's unused parameter fix.
  • Patch 11 has been updated to fix a bug fix in V2.
  • Patch 13 has an expanded commit message explaining a change in behavior for lines starting with a form-feed.
  • Updated benchmark results.

The bug fix in patch 3 degrades the performance, but by the end of the series the timings are the same as V2 - see the range diff.

V2 Cover Letter:
Thanks to Ævar and Elijah for their comments, I've reworded the commit messages, addressed the enum initialization issue in patch 2 (now 3) and added some perf tests.

There are two new patches in this round. The first patch is new and adds the perf tests suggested by Ævar, the penultimate patch is also new and coverts the existing code to use a designated initializer.

I've converted the benchmark results in the commit messages to use the new tests, the percentage changes are broadly similar to the previous results though I ended up running them on a different computer this time.

V1 cover letter:

The current implementation of diff --color-moved-ws=allow-indentation-change is considerably slower that the implementation of diff --color-moved which is in turn slower than a regular diff. This patch series starts with a couple of bug fixes and then reworks the implementation of diff --color-moved and diff --color-moved-ws=allow-indentation-change to speed them up on large diffs. The time to run git diff --color-moved --no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs with --color-moved - the time to run git log -p --color-moved --no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent processors. On older processors these patches reduce the running time in all cases that I've tested. In general the larger the diff the larger the speed up. As an extreme example the time to run diff --color-moved --color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8 minutes to 6 seconds.

cc: Phillip Wood phillip.wood@dunelm.org.uk
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Elijah Newren newren@gmail.com
cc: Phillip Wood phillip.wood123@gmail.com
cc: Johannes Schindelin Johannes.Schindelin@gmx.de

@phillipwood
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 14, 2021

There is an issue in commit 27f23b5:
First line of commit message is too long (> 76 columns): diff --color-moved-ws=allow-indentation-change: simplify calculation of whitespace delta

1 similar comment
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 14, 2021

There is an issue in commit 27f23b5:
First line of commit message is too long (> 76 columns): diff --color-moved-ws=allow-indentation-change: simplify calculation of whitespace delta

@dscho
Copy link
Member

dscho commented Jun 14, 2021

diff --color-moved-ws=allow-indentation-change: simplify calculation of whitespace delta

How about: diff: simplify allow-indentation-change logic instead?

@phillipwood
Copy link
Author

diff --color-moved-ws=allow-indentation-change: simplify calculation of whitespace delta

How about: diff: simplify allow-indentation-change logic instead?

Thanks, I've reworded it - do I need to submit it again?

@phillipwood
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 14, 2021

Submitted as pull.981.git.1623675888.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-981/phillipwood/wip/diff-color-moved-tweaks-v1

To fetch this version to local tag pr-981/phillipwood/wip/diff-color-moved-tweaks-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-981/phillipwood/wip/diff-color-moved-tweaks-v1

@dscho
Copy link
Member

dscho commented Jun 14, 2021

Thanks, I've reworded it - do I need to submit it again?

Yes, it was not submitted automatically.

@@ -1142,14 +1142,14 @@ static void mark_color_as_moved(struct diff_options *o,
struct moved_block *pmb = NULL; /* potentially moved blocks */
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Subject: Re: [PATCH 01/10] diff --color-moved=zerba: fix alternate coloring

Zerba?

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> b0a2ba4776 ("diff --color-moved=zebra: be stricter with color
> alternation", 2018-11-23) sought to avoid using the alternate colors
> unless there are two adjacent moved blocks of the same
> sign. Unfortunately it contains two bugs that prevented it from fixing
> the problem properly. Firstly `last_symbol` is reset at the start of
> each iteration of the loop losing the symbol of the last line and
> secondly when deciding whether to use the alternate color it should be
> checking if the current line is the same sign of the last line, not a
> different sign. The combination of the two errors means that we still
> use the alternate color when we should do but we also use it when we
> shouldn't. This is most noticable when using
> --color-moved-ws=allow-indentation-change with hunks like
>
> -this line gets indented
> +    this line gets indented
>
> where the post image is colored with newMovedAlternate rather than
> newMoved. While this does not matter much, the next commit will change
> the coloring to be correct in this case, so lets fix the bug here to
> make it clear why the output is changing and add a regression test.
>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  diff.c                     |  4 +--
>  t/t4015-diff-whitespace.sh | 72 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 74 insertions(+), 2 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 52c791574b71..cb068f8258c0 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1142,6 +1142,7 @@ static void mark_color_as_moved(struct diff_options *o,
>  	struct moved_block *pmb = NULL; /* potentially moved blocks */
>  	int pmb_nr = 0, pmb_alloc = 0;
>  	int n, flipped_block = 0, block_length = 0;
> +	enum diff_symbol last_symbol = 0;
>  
>  
>  	for (n = 0; n < o->emitted_symbols->nr; n++) {
> @@ -1149,7 +1150,6 @@ static void mark_color_as_moved(struct diff_options *o,
>  		struct moved_entry *key;
>  		struct moved_entry *match = NULL;
>  		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
> -		enum diff_symbol last_symbol = 0;
>  
>  		switch (l->s) {
>  		case DIFF_SYMBOL_PLUS:
> @@ -1214,7 +1214,7 @@ static void mark_color_as_moved(struct diff_options *o,
>  			}
>  
>  			if (adjust_last_block(o, n, block_length) &&
> -			    pmb_nr && last_symbol != l->s)
> +			    pmb_nr && last_symbol == l->s)
>  				flipped_block = (flipped_block + 1) % 2;
>  			else
>  				flipped_block = 0;
> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 2c13b62d3c65..920114cd795c 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1442,6 +1442,78 @@ test_expect_success 'detect permutations inside moved code -- dimmed-zebra' '
>  	test_cmp expected actual
>  '
>  
> +test_expect_success 'zebra alternate color is only used when necessary' '
> +	cat >old.txt <<-\EOF &&
> +	line 1A should be marked as oldMoved newMovedAlternate
> +	line 1B should be marked as oldMoved newMovedAlternate
> +	unchanged
> +	line 2A should be marked as oldMoved newMovedAlternate
> +	line 2B should be marked as oldMoved newMovedAlternate
> +	line 3A should be marked as oldMovedAlternate newMoved
> +	line 3B should be marked as oldMovedAlternate newMoved
> +	unchanged
> +	line 4A should be marked as oldMoved newMovedAlternate
> +	line 4B should be marked as oldMoved newMovedAlternate
> +	line 5A should be marked as oldMovedAlternate newMoved
> +	line 5B should be marked as oldMovedAlternate newMoved
> +	line 6A should be marked as oldMoved newMoved
> +	line 6B should be marked as oldMoved newMoved
> +	EOF
> +	cat >new.txt <<-\EOF &&
> +	  line 1A should be marked as oldMoved newMovedAlternate
> +	  line 1B should be marked as oldMoved newMovedAlternate
> +	unchanged
> +	  line 3A should be marked as oldMovedAlternate newMoved
> +	  line 3B should be marked as oldMovedAlternate newMoved
> +	  line 2A should be marked as oldMoved newMovedAlternate
> +	  line 2B should be marked as oldMoved newMovedAlternate
> +	unchanged
> +	  line 6A should be marked as oldMoved newMoved
> +	  line 6B should be marked as oldMoved newMoved
> +	    line 4A should be marked as oldMoved newMovedAlternate
> +	    line 4B should be marked as oldMoved newMovedAlternate
> +	  line 5A should be marked as oldMovedAlternate newMoved
> +	  line 5B should be marked as oldMovedAlternate newMoved
> +	EOF
> +	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
> +		 --color-moved-ws=allow-indentation-change \
> +		 old.txt new.txt >output &&
> +	grep -v index output | test_decode_color >actual &&
> +	cat >expected <<-\EOF &&
> +	<BOLD>diff --git a/old.txt b/new.txt<RESET>
> +	<BOLD>--- a/old.txt<RESET>
> +	<BOLD>+++ b/new.txt<RESET>
> +	<CYAN>@@ -1,14 +1,14 @@<RESET>
> +	<BOLD;MAGENTA>-line 1A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;MAGENTA>-line 1B should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1B should be marked as oldMoved newMovedAlternate<RESET>
> +	 unchanged<RESET>
> +	<BOLD;MAGENTA>-line 2A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;MAGENTA>-line 2B should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;BLUE>-line 3A should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;BLUE>-line 3B should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3A should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3B should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2B should be marked as oldMoved newMovedAlternate<RESET>
> +	 unchanged<RESET>
> +	<BOLD;MAGENTA>-line 4A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;MAGENTA>-line 4B should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;BLUE>-line 5A should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;BLUE>-line 5B should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;MAGENTA>-line 6A should be marked as oldMoved newMoved<RESET>
> +	<BOLD;MAGENTA>-line 6B should be marked as oldMoved newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6A should be marked as oldMoved newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6B should be marked as oldMoved newMoved<RESET>
> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4B should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5A should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5B should be marked as oldMovedAlternate newMoved<RESET>
> +	EOF
> +	test_cmp expected actual
> +'
> +
>  test_expect_success 'cmd option assumes configured colored-moved' '
>  	test_config color.diff.oldMoved "magenta" &&
>  	test_config color.diff.newMoved "cyan" &&

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Phillip Wood wrote (reply to this):

On 15/06/2021 04:24, Junio C Hamano wrote:
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> Subject: Re: [PATCH 01/10] diff --color-moved=zerba: fix alternate coloring
> 
> Zerba?

They're quite rare in the wild :-/ Thanks for pointing that out I'll fix 
it when I re-roll

Best Wishes

Phillip

>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> b0a2ba4776 ("diff --color-moved=zebra: be stricter with color
>> alternation", 2018-11-23) sought to avoid using the alternate colors
>> unless there are two adjacent moved blocks of the same
>> sign. Unfortunately it contains two bugs that prevented it from fixing
>> the problem properly. Firstly `last_symbol` is reset at the start of
>> each iteration of the loop losing the symbol of the last line and
>> secondly when deciding whether to use the alternate color it should be
>> checking if the current line is the same sign of the last line, not a
>> different sign. The combination of the two errors means that we still
>> use the alternate color when we should do but we also use it when we
>> shouldn't. This is most noticable when using
>> --color-moved-ws=allow-indentation-change with hunks like
>>
>> -this line gets indented
>> +    this line gets indented
>>
>> where the post image is colored with newMovedAlternate rather than
>> newMoved. While this does not matter much, the next commit will change
>> the coloring to be correct in this case, so lets fix the bug here to
>> make it clear why the output is changing and add a regression test.
>>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>>   diff.c                     |  4 +--
>>   t/t4015-diff-whitespace.sh | 72 ++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 74 insertions(+), 2 deletions(-)
>>
>> diff --git a/diff.c b/diff.c
>> index 52c791574b71..cb068f8258c0 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -1142,6 +1142,7 @@ static void mark_color_as_moved(struct diff_options *o,
>>   	struct moved_block *pmb = NULL; /* potentially moved blocks */
>>   	int pmb_nr = 0, pmb_alloc = 0;
>>   	int n, flipped_block = 0, block_length = 0;
>> +	enum diff_symbol last_symbol = 0;
>>   
>>   
>>   	for (n = 0; n < o->emitted_symbols->nr; n++) {
>> @@ -1149,7 +1150,6 @@ static void mark_color_as_moved(struct diff_options *o,
>>   		struct moved_entry *key;
>>   		struct moved_entry *match = NULL;
>>   		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
>> -		enum diff_symbol last_symbol = 0;
>>   
>>   		switch (l->s) {
>>   		case DIFF_SYMBOL_PLUS:
>> @@ -1214,7 +1214,7 @@ static void mark_color_as_moved(struct diff_options *o,
>>   			}
>>   
>>   			if (adjust_last_block(o, n, block_length) &&
>> -			    pmb_nr && last_symbol != l->s)
>> +			    pmb_nr && last_symbol == l->s)
>>   				flipped_block = (flipped_block + 1) % 2;
>>   			else
>>   				flipped_block = 0;
>> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
>> index 2c13b62d3c65..920114cd795c 100755
>> --- a/t/t4015-diff-whitespace.sh
>> +++ b/t/t4015-diff-whitespace.sh
>> @@ -1442,6 +1442,78 @@ test_expect_success 'detect permutations inside moved code -- dimmed-zebra' '
>>   	test_cmp expected actual
>>   '
>>   
>> +test_expect_success 'zebra alternate color is only used when necessary' '
>> +	cat >old.txt <<-\EOF &&
>> +	line 1A should be marked as oldMoved newMovedAlternate
>> +	line 1B should be marked as oldMoved newMovedAlternate
>> +	unchanged
>> +	line 2A should be marked as oldMoved newMovedAlternate
>> +	line 2B should be marked as oldMoved newMovedAlternate
>> +	line 3A should be marked as oldMovedAlternate newMoved
>> +	line 3B should be marked as oldMovedAlternate newMoved
>> +	unchanged
>> +	line 4A should be marked as oldMoved newMovedAlternate
>> +	line 4B should be marked as oldMoved newMovedAlternate
>> +	line 5A should be marked as oldMovedAlternate newMoved
>> +	line 5B should be marked as oldMovedAlternate newMoved
>> +	line 6A should be marked as oldMoved newMoved
>> +	line 6B should be marked as oldMoved newMoved
>> +	EOF
>> +	cat >new.txt <<-\EOF &&
>> +	  line 1A should be marked as oldMoved newMovedAlternate
>> +	  line 1B should be marked as oldMoved newMovedAlternate
>> +	unchanged
>> +	  line 3A should be marked as oldMovedAlternate newMoved
>> +	  line 3B should be marked as oldMovedAlternate newMoved
>> +	  line 2A should be marked as oldMoved newMovedAlternate
>> +	  line 2B should be marked as oldMoved newMovedAlternate
>> +	unchanged
>> +	  line 6A should be marked as oldMoved newMoved
>> +	  line 6B should be marked as oldMoved newMoved
>> +	    line 4A should be marked as oldMoved newMovedAlternate
>> +	    line 4B should be marked as oldMoved newMovedAlternate
>> +	  line 5A should be marked as oldMovedAlternate newMoved
>> +	  line 5B should be marked as oldMovedAlternate newMoved
>> +	EOF
>> +	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
>> +		 --color-moved-ws=allow-indentation-change \
>> +		 old.txt new.txt >output &&
>> +	grep -v index output | test_decode_color >actual &&
>> +	cat >expected <<-\EOF &&
>> +	<BOLD>diff --git a/old.txt b/new.txt<RESET>
>> +	<BOLD>--- a/old.txt<RESET>
>> +	<BOLD>+++ b/new.txt<RESET>
>> +	<CYAN>@@ -1,14 +1,14 @@<RESET>
>> +	<BOLD;MAGENTA>-line 1A should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;MAGENTA>-line 1B should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1A should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1B should be marked as oldMoved newMovedAlternate<RESET>
>> +	 unchanged<RESET>
>> +	<BOLD;MAGENTA>-line 2A should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;MAGENTA>-line 2B should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;BLUE>-line 3A should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;BLUE>-line 3B should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3A should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3B should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2A should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2B should be marked as oldMoved newMovedAlternate<RESET>
>> +	 unchanged<RESET>
>> +	<BOLD;MAGENTA>-line 4A should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;MAGENTA>-line 4B should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;BLUE>-line 5A should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;BLUE>-line 5B should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;MAGENTA>-line 6A should be marked as oldMoved newMoved<RESET>
>> +	<BOLD;MAGENTA>-line 6B should be marked as oldMoved newMoved<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6A should be marked as oldMoved newMoved<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6B should be marked as oldMoved newMoved<RESET>
>> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4A should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4B should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5A should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5B should be marked as oldMovedAlternate newMoved<RESET>
>> +	EOF
>> +	test_cmp expected actual
>> +'
>> +
>>   test_expect_success 'cmd option assumes configured colored-moved' '
>>   	test_config color.diff.oldMoved "magenta" &&
>>   	test_config color.diff.newMoved "cyan" &&

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 15, 2021

This patch series was integrated into seen via git@434d4a9.

@gitgitgadget gitgitgadget bot added the seen label Jun 15, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 15, 2021

This branch is now known as pw/diff-color-moved-fix.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 15, 2021

There was a status update in the "New Topics" section about the branch pw/diff-color-moved-fix on the Git mailing list:

Long-overdue correctness and performance update to "diff
--color-moved" feature.

Waiting for reviews.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 15, 2021

User Phillip Wood <phillip.wood123@gmail.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 16, 2021

This patch series was integrated into seen via git@f329af8.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 16, 2021

This patch series was integrated into seen via git@5d625c0.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 16, 2021

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):


On Mon, Jun 14 2021, Phillip Wood via GitGitGadget wrote:

> The current implementation of diff --color-moved-ws=allow-indentation-change
> is considerably slower that the implementation of diff --color-moved which
> is in turn slower than a regular diff. This patch series starts with a
> couple of bug fixes and then reworks the implementation of diff
> --color-moved and diff --color-moved-ws=allow-indentation-change to speed
> them up on large diffs. The time to run git diff --color-moved
> --no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
> git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
> v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
> with --color-moved - the time to run git log -p --color-moved
> --no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
> processors. On older processors these patches reduce the running time in all
> cases that I've tested. In general the larger the diff the larger the speed
> up. As an extreme example the time to run diff --color-moved
> --color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
> minutes to 6 seconds.
>
> Phillip Wood (10):
>   diff --color-moved=zerba: fix alternate coloring
>   diff --color-moved: avoid false short line matches and bad zerba
>     coloring
>   diff: simplify allow-indentation-change delta calculation
>   diff --color-moved-ws=allow-indentation-change: simplify and optimize
>   diff --color-moved: call comparison function directly
>   diff --color-moved: unify moved block growth functions
>   diff --color-moved: shrink potential moved blocks as we go
>   diff --color-moved: stop clearing potential moved blocks
>   diff --color-moved-ws=allow-indentation-change: improve hash lookups
>   diff --color-moved: intern strings

Nice to see these land after the earlier on-list reference to them.

I skimmed these mostly, and am not familiar with this code, but didn't
see any glaring things missing. There was one existing oddity with
assigning a 0 to an "enum diff_symbol", don't we want
DIFF_SYMBOL_BINARY_DIFF_HEADER? In any case, it's just a line you touch
in 02/10, and pre-dates these changes.

One thing I would very much like to see here is a conversion of the
existing ad-hoc benchmarks you note in commit messages to something that
lives in t/perf/, it really helps future maintenance of perf-sensitive
code to be able to re-run those, and I for one find the output much
easier to read than whatever tool you're using to produce your
benchmarks.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 16, 2021

User Ævar Arnfjörð Bjarmason <avarab@gmail.com> has been added to the cc: list.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 17, 2021

This patch series was integrated into seen via git@4885b7e.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 17, 2021

There was a status update in the "Cooking" section about the branch pw/diff-color-moved-fix on the Git mailing list:

Long-overdue correctness and performance update to "diff
--color-moved" feature.

Waiting for reviews.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 17, 2021

This patch series was integrated into seen via git@a92be0a.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 19, 2021

This patch series was integrated into seen via git@7bf7283.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 21, 2021

On the Git mailing list, Phillip Wood wrote (reply to this):

On 16/06/2021 15:24, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Jun 14 2021, Phillip Wood via GitGitGadget wrote:
> 
>> The current implementation of diff --color-moved-ws=allow-indentation-change
>> is considerably slower that the implementation of diff --color-moved which
>> is in turn slower than a regular diff. This patch series starts with a
>> couple of bug fixes and then reworks the implementation of diff
>> --color-moved and diff --color-moved-ws=allow-indentation-change to speed
>> them up on large diffs. The time to run git diff --color-moved
>> --no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
>> git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
>> v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
>> with --color-moved - the time to run git log -p --color-moved
>> --no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
>> processors. On older processors these patches reduce the running time in all
>> cases that I've tested. In general the larger the diff the larger the speed
>> up. As an extreme example the time to run diff --color-moved
>> --color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
>> minutes to 6 seconds.
>>
>> Phillip Wood (10):
>>    diff --color-moved=zerba: fix alternate coloring
>>    diff --color-moved: avoid false short line matches and bad zerba
>>      coloring
>>    diff: simplify allow-indentation-change delta calculation
>>    diff --color-moved-ws=allow-indentation-change: simplify and optimize
>>    diff --color-moved: call comparison function directly
>>    diff --color-moved: unify moved block growth functions
>>    diff --color-moved: shrink potential moved blocks as we go
>>    diff --color-moved: stop clearing potential moved blocks
>>    diff --color-moved-ws=allow-indentation-change: improve hash lookups
>>    diff --color-moved: intern strings
> 
> Nice to see these land after the earlier on-list reference to them.
> 
> I skimmed these mostly, and am not familiar with this code, but didn't
> see any glaring things missing. There was one existing oddity with
> assigning a 0 to an "enum diff_symbol", don't we want
> DIFF_SYMBOL_BINARY_DIFF_HEADER? In any case, it's just a line you touch
> in 02/10, and pre-dates these changes.

Thanks for taking a look at these patches. I take your point about the 
assignment, I don't think the actual value matters so long as it's not 
DIFF_SYMBOL_PLUS or DIFF_SYMBOL_MINUS.

> One thing I would very much like to see here is a conversion of the
> existing ad-hoc benchmarks you note in commit messages to something that
> lives in t/perf/, it really helps future maintenance of perf-sensitive
> code to be able to re-run those, and I for one find the output much
> easier to read than whatever tool you're using to produce your
> benchmarks.

Adding some perf tests is a good idea, I'll do that when I reroll which 
may take a couple of weeks as I'm going offline for a while at the end 
of the week. The tool I have been using is hyperfine[1], it has been 
used by a few other contributors (see `git log --grep σ` if you're 
interested)

[1] https://github.com/sharkdp/hyperfine

Best Wishes

Phillip

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 29, 2021

This patch series was integrated into seen via git@e2e6481.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 29, 2021

This patch series was integrated into seen via git@7bc5c6e.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 30, 2021

This patch series was integrated into seen via git@3259b7d.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 30, 2021

There was a status update in the "Cooking" section about the branch pw/diff-color-moved-fix on the Git mailing list:

Long-overdue correctness and performance update to "diff
--color-moved" feature.

Waiting for reviews.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 1, 2021

This patch series was integrated into seen via git@d842cd6.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 1, 2021

This patch series was integrated into seen via git@0a4f9b4.

@phillipwood
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 9, 2021

Submitted as pull.981.v5.git.1639045809.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-981/phillipwood/wip/diff-color-moved-tweaks-v5

To fetch this version to local tag pr-981/phillipwood/wip/diff-color-moved-tweaks-v5:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-981/phillipwood/wip/diff-color-moved-tweaks-v5

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 9, 2021

This patch series was integrated into seen via git@55ef4eb.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 11, 2021

There was a status update in the "Cooking" section about the branch pw/diff-color-moved-fix on the Git mailing list:

Correctness and performance update to "diff --color-moved" feature.

Will merge to 'next'?
source: <pull.981.v5.git.1639045809.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 14, 2021

This patch series was integrated into seen via git@232f609.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 15, 2021

This patch series was integrated into seen via git@7b70897.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 15, 2021

This patch series was integrated into seen via git@92da42b.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 16, 2021

There was a status update in the "Cooking" section about the branch pw/diff-color-moved-fix on the Git mailing list:

Correctness and performance update to "diff --color-moved" feature.

Will merge to 'next'.
source: <pull.981.v5.git.1639045809.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 16, 2021

This patch series was integrated into seen via git@e6541db.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 20, 2021

This patch series was integrated into seen via git@2fec60e.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 21, 2021

This patch series was integrated into seen via git@89bd8e5.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 22, 2021

This patch series was integrated into seen via git@86e5a5e.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 23, 2021

This patch series was integrated into seen via git@756770a.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 24, 2021

There was a status update in the "Cooking" section about the branch pw/diff-color-moved-fix on the Git mailing list:

Correctness and performance update to "diff --color-moved" feature.

Will merge to 'next'.
source: <pull.981.v5.git.1639045809.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 25, 2021

This patch series was integrated into seen via git@3d6c7ae.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 27, 2021

This patch series was integrated into seen via git@e249cf7.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 27, 2021

This patch series was integrated into next via git@443dd9f.

@gitgitgadget gitgitgadget bot added the next label Dec 27, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Dec 28, 2021

There was a status update in the "Cooking" section about the branch pw/diff-color-moved-fix on the Git mailing list:

Correctness and performance update to "diff --color-moved" feature.

Will merge to 'master'.
source: <pull.981.v5.git.1639045809.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 30, 2021

This patch series was integrated into seen via git@f161a38.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 4, 2022

This patch series was integrated into seen via git@eec77b9.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 4, 2022

There was a status update in the "Cooking" section about the branch pw/diff-color-moved-fix on the Git mailing list:

Correctness and performance update to "diff --color-moved" feature.

Will merge to 'master'.
source: <pull.981.v5.git.1639045809.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 5, 2022

This patch series was integrated into seen via git@ca694ce.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 5, 2022

This patch series was integrated into seen via git@2b755b3.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 5, 2022

This patch series was integrated into next via git@2b755b3.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 5, 2022

This patch series was integrated into master via git@2b755b3.

@gitgitgadget gitgitgadget bot added the master label Jan 5, 2022
@gitgitgadget gitgitgadget bot closed this Jan 5, 2022
@gitgitgadget
Copy link

gitgitgadget bot commented Jan 5, 2022

Closed via 2b755b3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants