Teach diff to honor diff algorithms set through git attributes #1452

john-cai · 2023-02-04T17:55:51Z

When a repository contains different kinds of files, it may be desirable to use different algorithms based on file type. This is currently not feasible through the command line or using git configs. However, we can leverage the fact that gitattributes are path aware.

Teach the diff machinery to check gitattributes when diffing files by using the existing diff. scheme, and add an "algorithm" type to the external driver config.

Change since V3:

cleaned up documentation, typos
minor cleanup such as if statement ordering, and overly long lines

Changes since V2:

minor clean up and variable renaming
avoid parsing attribute files for the driver if the diff algorithm is set through the command line

Changes since V1:

utilize the existing diff.<driver>.* scheme where the driver is defined in gitattributes, but the algorithm is defined in the gitconfig.

To address some of the performance concerns in the previous series, a benchmark shows that now only a minor performance penalty is incurred, now that we are no longer adding an additional attributes parsing call:

$ echo "*.[ch] diff=other" >> .gitattributes
$ hyperfine -r 10 -L a git-bin-wrapper,git '{a} -c diff.other.algorithm=myers diff v2.0.0 v2.28.0'
Benchmark 1: git-bin-wrapper -c diff.other.algorithm=myers diff v2.0.0 v2.28.0
Time (mean ± σ): 716.3 ms ± 3.8 ms [User: 660.2 ms, System: 50.8 ms]
Range (min … max): 709.8 ms … 720.6 ms 10 runs

Benchmark 2: git -c diff.other.algorithm=myers diff v2.0.0 v2.28.0
Time (mean ± σ): 704.3 ms ± 2.9 ms [User: 656.6 ms, System: 44.3 ms]
Range (min … max): 700.1 ms … 708.6 ms 10 runs

Summary
'git -c diff.other.algorithm=myers diff v2.0.0 v2.28.0' ran
1.02 ± 0.01 times faster than 'git-bin-wrapper -c diff.other.algorithm=myers diff v2.0.0 v2.28.0'

cc: Eric Sunshine sunshine@sunshineco.com
cc: Phillip Wood phillip.wood123@gmail.com
cc: Ævar Arnfjörð Bjarmason avarab@gmail.com
cc: Jeff King peff@peff.net
cc: Elijah Newren newren@gmail.com

gitgitgadget-git · 2023-02-04T17:56:34Z

There are issues in commit 8e8fee8:
diff: Teach diff to read attribute diff-algorithm
Prefixed commit message must be in lower case

john-cai · 2023-02-05T03:31:43Z

/preview

gitgitgadget-git · 2023-02-05T03:32:39Z

Preview email sent as pull.1452.git.git.1675567956.gitgitgadget@gmail.com

john-cai · 2023-02-05T03:45:42Z

/submit

gitgitgadget-git · 2023-02-05T03:46:32Z

Submitted as pull.1452.git.git.1675568781.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1452/john-cai/jc/attr-diff-algo-v1

To fetch this version to local tag pr-git-1452/john-cai/jc/attr-diff-algo-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1452/john-cai/jc/attr-diff-algo-v1

gitgitgadget-git · 2023-02-05T17:53:23Z

Documentation/gitattributes.txt

@@ -736,6 +736,29 @@ String::
 	by the configuration variables in the "diff.foo" section of the


On the Git mailing list, Eric Sunshine wrote (reply to this):

On Sat, Feb 4, 2023 at 11:47 PM John Cai via GitGitGadget <gitgitgadget@gmail.com> wrote: > It can be useful to specify diff algorithms per file type. For example, > one may want to use the minimal diff algorithm for .json files, another > for .c files, etc. > > Teach the diff machinery to check attributes for a diff algorithm. > Enforce precedence by favoring the command line option, then looking at > attributes, then finally the config. > > To enforce precedence order, set the `xdl_opts_command_line` member > during options pasing to indicate the diff algorithm was set via command > line args. > > Signed-off-by: John Cai <johncai86@gmail.com> > --- > diff --git a/diff.c b/diff.c > @@ -3652,6 +3652,27 @@ static void builtin_diff(const char *name_a, > + if (!o->xdl_opts_command_line) { > + static struct attr_check *check; `check` is declared static... > + const char *one_diff_algo; > + const char *two_diff_algo; > + > + check = attr_check_alloc(); ... is allocated here... > + attr_check_append(check, git_attr("diff-algorithm")); > + > + git_check_attr(the_repository->index, NULL, one->path, check); > + one_diff_algo = check->items[0].value; > + git_check_attr(the_repository->index, NULL, two->path, check); > + two_diff_algo = check->items[0].value; > + > + if (!ATTR_UNSET(one_diff_algo) && !ATTR_UNSET(two_diff_algo) && > + !strcmp(one_diff_algo, two_diff_algo)) > + set_diff_algorithm(o, one_diff_algo); > + > + attr_check_free(check); ... and freed here... > + } ... so the reason for the `static` declaration is not clear. Am I missing something obvious?

On the Git mailing list, John Cai wrote (reply to this):

Hi Eric, On 5 Feb 2023, at 12:50, Eric Sunshine wrote: > On Sat, Feb 4, 2023 at 11:47 PM John Cai via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> It can be useful to specify diff algorithms per file type. For example, >> one may want to use the minimal diff algorithm for .json files, another >> for .c files, etc. >> >> Teach the diff machinery to check attributes for a diff algorithm. >> Enforce precedence by favoring the command line option, then looking at >> attributes, then finally the config. >> >> To enforce precedence order, set the `xdl_opts_command_line` member >> during options pasing to indicate the diff algorithm was set via command >> line args. >> >> Signed-off-by: John Cai <johncai86@gmail.com> >> --- >> diff --git a/diff.c b/diff.c >> @@ -3652,6 +3652,27 @@ static void builtin_diff(const char *name_a, >> + if (!o->xdl_opts_command_line) { >> + static struct attr_check *check; > > `check` is declared static... > >> + const char *one_diff_algo; >> + const char *two_diff_algo; >> + >> + check = attr_check_alloc(); > > ... is allocated here... > >> + attr_check_append(check, git_attr("diff-algorithm")); >> + >> + git_check_attr(the_repository->index, NULL, one->path, check); >> + one_diff_algo = check->items[0].value; >> + git_check_attr(the_repository->index, NULL, two->path, check); >> + two_diff_algo = check->items[0].value; >> + >> + if (!ATTR_UNSET(one_diff_algo) && !ATTR_UNSET(two_diff_algo) && >> + !strcmp(one_diff_algo, two_diff_algo)) >> + set_diff_algorithm(o, one_diff_algo); >> + >> + attr_check_free(check); > > ... and freed here... > >> + } > > ... so the reason for the `static` declaration is not clear. Am I > missing something obvious? No, you are correct. No reason for the static declaration. `check` is not used outside of the scope of this conditional. I think this made it in from an earlier iteration and I didn't catch the oversight. thanks John

gitgitgadget-git · 2023-02-05T17:53:26Z

User Eric Sunshine <sunshine@sunshineco.com> has been added to the cc: list.

gitgitgadget-git · 2023-02-06T16:24:19Z

diff.c

@@ -3437,6 +3437,22 @@ static int diff_filepair_is_phoney(struct diff_filespec *one,
 	return !DIFF_FILE_VALID(one) && !DIFF_FILE_VALID(two);


On the Git mailing list, Phillip Wood wrote (reply to this):

Hi John On 05/02/2023 03:46, John Cai via GitGitGadget wrote: > From: John Cai <jcai@gitlab.com> > > The diff option parsing for --minimal, --patience, --histgoram can all > be consolidated into one function. This is a preparatory step for the > subsequent commit which teaches diff to keep track of whether or not a > diff algorithm has been set via the command line. > > While we're at it, the logic that sets the diff algorithm in > diff_opt_diff_algorithm() can be refactored into a helper that will > allow multiple callsites to set the diff algorithm. You say "while we're at it" but isn't it a wholly necessary change for what you want to do? This patch basically looks good, I've left a couple of comments below, thanks for separating it out as a preparatory step > Signed-off-by: John Cai <johncai86@gmail.com> > --- > diff.c | 87 ++++++++++++++++++++++++++++++++++++---------------------- > 1 file changed, 54 insertions(+), 33 deletions(-) > > diff --git a/diff.c b/diff.c > index 329eebf16a0..a8a31c81fe7 100644 > --- a/diff.c > +++ b/diff.c > @@ -3437,6 +3437,22 @@ static int diff_filepair_is_phoney(struct diff_filespec *one, > return !DIFF_FILE_VALID(one) && !DIFF_FILE_VALID(two); > } > > +static int set_diff_algorithm(struct diff_options *opts, > + const char *alg) > +{ > + long value = parse_algorithm_value(alg); > + > + if (value < 0) > + return 1; > + > + /* clear out previous settings */ > + DIFF_XDL_CLR(opts, NEED_MINIMAL); > + opts->xdl_opts &= ~XDF_DIFF_ALGORITHM_MASK; > + opts->xdl_opts |= value; > + > + return 0; > +} > + > static void builtin_diff(const char *name_a, > const char *name_b, > struct diff_filespec *one, > @@ -5107,17 +5123,40 @@ static int diff_opt_diff_algorithm(const struct option *opt, > const char *arg, int unset) > { > struct diff_options *options = opt->value; > - long value = parse_algorithm_value(arg); > > BUG_ON_OPT_NEG(unset); > - if (value < 0) > + > + if (set_diff_algorithm(options, arg)) > return error(_("option diff-algorithm accepts \"myers\", " > "\"minimal\", \"patience\" and \"histogram\"")); > > - /* clear out previous settings */ > - DIFF_XDL_CLR(options, NEED_MINIMAL); > - options->xdl_opts &= ~XDF_DIFF_ALGORITHM_MASK; > - options->xdl_opts |= value; > + return 0; > +} > + > +static int diff_opt_diff_algorithm_no_arg(const struct option *opt, > + const char *arg, int unset) > +{ > + struct diff_options *options = opt->value; > + > + BUG_ON_OPT_NEG(unset); > + BUG_ON_OPT_ARG(arg); > + > + if (!strcmp(opt->long_name, "patience")) { > + int i; This is copied from the existing code but as `options->anchors_nr` is a size_t it is probably worth converting `i` to a size_t here. > + /* > + * Both --patience and --anchored use PATIENCE_DIFF > + * internally, so remove any anchors previously > + * specified. > + */ > + for (i = 0; i < options->anchors_nr; i++) > + free(options->anchors[i]); > + options->anchors_nr = 0; > + } > + > + if (set_diff_algorithm(options, opt->long_name)) > + return error(_("available diff algorithms include \"myers\", " > + "\"minimal\", \"patience\" and \"histogram\"")); I think this should be a BUG() as it is a programming error if we reach this point. Best Wishes Phillip > + > return 0; > } > > @@ -5242,26 +5281,6 @@ static enum parse_opt_result diff_opt_output(struct parse_opt_ctx_t *ctx, > return 0; > } > > -static int diff_opt_patience(const struct option *opt, > - const char *arg, int unset) > -{ > - struct diff_options *options = opt->value; > - int i; > - > - BUG_ON_OPT_NEG(unset); > - BUG_ON_OPT_ARG(arg); > - options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF); > - /* > - * Both --patience and --anchored use PATIENCE_DIFF > - * internally, so remove any anchors previously > - * specified. > - */ > - for (i = 0; i < options->anchors_nr; i++) > - free(options->anchors[i]); > - options->anchors_nr = 0; > - return 0; > -} > - > static int diff_opt_ignore_regex(const struct option *opt, > const char *arg, int unset) > { > @@ -5562,9 +5581,10 @@ struct option *add_diff_options(const struct option *opts, > N_("prevent rename/copy detection if the number of rename/copy targets exceeds given limit")), > > OPT_GROUP(N_("Diff algorithm options")), > - OPT_BIT(0, "minimal", &options->xdl_opts, > - N_("produce the smallest possible diff"), > - XDF_NEED_MINIMAL), > + OPT_CALLBACK_F(0, "minimal", options, NULL, > + N_("produce the smallest possible diff"), > + PARSE_OPT_NONEG | PARSE_OPT_NOARG, > + diff_opt_diff_algorithm_no_arg), > OPT_BIT_F('w', "ignore-all-space", &options->xdl_opts, > N_("ignore whitespace when comparing lines"), > XDF_IGNORE_WHITESPACE, PARSE_OPT_NONEG), > @@ -5589,10 +5609,11 @@ struct option *add_diff_options(const struct option *opts, > OPT_CALLBACK_F(0, "patience", options, NULL, > N_("generate diff using the \"patience diff\" algorithm"), > PARSE_OPT_NONEG | PARSE_OPT_NOARG, > - diff_opt_patience), > - OPT_BITOP(0, "histogram", &options->xdl_opts, > - N_("generate diff using the \"histogram diff\" algorithm"), > - XDF_HISTOGRAM_DIFF, XDF_DIFF_ALGORITHM_MASK), > + diff_opt_diff_algorithm_no_arg), > + OPT_CALLBACK_F(0, "histogram", options, NULL, > + N_("generate diff using the \"histogram diff\" algorithm"), > + PARSE_OPT_NONEG | PARSE_OPT_NOARG, > + diff_opt_diff_algorithm_no_arg), > OPT_CALLBACK_F(0, "diff-algorithm", options, N_("<algorithm>"), > N_("choose a diff algorithm"), > PARSE_OPT_NONEG, diff_opt_diff_algorithm),

gitgitgadget-git · 2023-02-06T16:24:23Z

User Phillip Wood <phillip.wood123@gmail.com> has been added to the cc: list.

gitgitgadget-git · 2023-02-06T16:29:54Z

Documentation/gitattributes.txt

@@ -736,6 +736,29 @@ String::
 	by the configuration variables in the "diff.foo" section of the


On the Git mailing list, Phillip Wood wrote (reply to this):

Hi John On 05/02/2023 03:46, John Cai via GitGitGadget wrote: > From: John Cai <johncai86@gmail.com> > > It can be useful to specify diff algorithms per file type. For example, > one may want to use the minimal diff algorithm for .json files, another > for .c files, etc. Have you got any examples of why this is useful? I find myself occasionally changing the algorithm when the default gives a sub-optimal diff but I've not really noticed any pattern with respect to file types. > Teach the diff machinery to check attributes for a diff algorithm. > Enforce precedence by favoring the command line option, then looking at > attributes, then finally the config. > > To enforce precedence order, set the `xdl_opts_command_line` member > during options pasing to indicate the diff algorithm was set via command > line args. I've only commented on the tests as it looks like Eric and had a careful look at the code > diff --git a/t/lib-diff-alternative.sh b/t/lib-diff-alternative.sh > index 8d1e408bb58..630c98ea65a 100644 > --- a/t/lib-diff-alternative.sh > +++ b/t/lib-diff-alternative.sh > @@ -107,8 +107,27 @@ EOF > > STRATEGY=$1 > > + test_expect_success "$STRATEGY diff from attributes" ' > + echo "file* diff-algorithm=$STRATEGY" >.gitattributes && > + test_must_fail git diff --no-index file1 file2 > output && > + test_cmp expect output > + ' > + > test_expect_success "$STRATEGY diff" ' > - test_must_fail git diff --no-index "--$STRATEGY" file1 file2 > output && > + test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output && > + test_cmp expect output > + ' > + > + test_expect_success "$STRATEGY diff command line precedence before attributes" ' > + echo "file* diff-algorithm=meyers" >.gitattributes && > + test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output && > + test_cmp expect output > + ' > + > + test_expect_success "$STRATEGY diff attributes precedence before config" ' > + git config diff.algorithm default && > + echo "file* diff-algorithm=$STRATEGY" >.gitattributes && > + test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output && This test passes with or without the code changes in this patch. I think you need to drop --diff-algorithm=$STRATEGY from the diff command. > test_cmp expect output > ' > > @@ -166,5 +185,11 @@ EOF > test_must_fail git diff --no-index "--$STRATEGY" uniq1 uniq2 > output && > test_cmp expect output > ' > + > + test_expect_success "$STRATEGY diff from attributes" ' > + echo "file* diff-algorithm=$STRATEGY" >.gitattributes && > + test_must_fail git diff --no-index uniq1 uniq2 > output && > + test_cmp expect output This test also passes with or without the code changes in this patch. It is the same as the first test added above but with files that give the same diff irrespective of the algorithm chosen so I don't think it is doing anything useful. Unless I've missed something it should be dropped. Best Wishes Phillip > + ' > } >

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Mon, Feb 6, 2023 at 11:46 AM Phillip Wood <phillip.wood123@gmail.com> wrote: > On 05/02/2023 03:46, John Cai via GitGitGadget wrote: > > Teach the diff machinery to check attributes for a diff algorithm. > > Enforce precedence by favoring the command line option, then looking at > > attributes, then finally the config. > > > > To enforce precedence order, set the `xdl_opts_command_line` member > > during options pasing to indicate the diff algorithm was set via command > > line args. > > I've only commented on the tests as it looks like Eric and had a careful > look at the code I only very quickly ran my eye over the code; didn't even really read it.

On the Git mailing list, John Cai wrote (reply to this):

Hi Phillip, On 6 Feb 2023, at 11:27, Phillip Wood wrote: > Hi John > > On 05/02/2023 03:46, John Cai via GitGitGadget wrote: >> From: John Cai <johncai86@gmail.com> >> >> It can be useful to specify diff algorithms per file type. For example, >> one may want to use the minimal diff algorithm for .json files, another >> for .c files, etc. > > Have you got any examples of why this is useful? I find myself occasionally changing the algorithm when the default gives a sub-optimal diff but I've not really noticed any pattern with respect to file types. At $DAYJOB, there has been a discussion and request for a feature like this [1]. One use case that came up was to be able to set a different diff algorithm for .json files. 1. https://gitlab.com/gitlab-org/gitaly/-/issues/2591 > >> Teach the diff machinery to check attributes for a diff algorithm. >> Enforce precedence by favoring the command line option, then looking at >> attributes, then finally the config. >> >> To enforce precedence order, set the `xdl_opts_command_line` member >> during options pasing to indicate the diff algorithm was set via command >> line args. > > I've only commented on the tests as it looks like Eric and had a careful look at the code > >> diff --git a/t/lib-diff-alternative.sh b/t/lib-diff-alternative.sh >> index 8d1e408bb58..630c98ea65a 100644 >> --- a/t/lib-diff-alternative.sh >> +++ b/t/lib-diff-alternative.sh >> @@ -107,8 +107,27 @@ EOF >> STRATEGY=$1 >> + test_expect_success "$STRATEGY diff from attributes" ' >> + echo "file* diff-algorithm=$STRATEGY" >.gitattributes && >> + test_must_fail git diff --no-index file1 file2 > output && >> + test_cmp expect output >> + ' >> + >> test_expect_success "$STRATEGY diff" ' >> - test_must_fail git diff --no-index "--$STRATEGY" file1 file2 > output && >> + test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output && >> + test_cmp expect output >> + ' >> + >> + test_expect_success "$STRATEGY diff command line precedence before attributes" ' >> + echo "file* diff-algorithm=meyers" >.gitattributes && >> + test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output && >> + test_cmp expect output >> + ' >> + >> + test_expect_success "$STRATEGY diff attributes precedence before config" ' >> + git config diff.algorithm default && >> + echo "file* diff-algorithm=$STRATEGY" >.gitattributes && >> + test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output && > > This test passes with or without the code changes in this patch. I think you need to drop --diff-algorithm=$STRATEGY from the diff command. an oversight indeed, thanks for catching this. > >> test_cmp expect output >> ' >> @@ -166,5 +185,11 @@ EOF >> test_must_fail git diff --no-index "--$STRATEGY" uniq1 uniq2 > output && >> test_cmp expect output >> ' >> + >> + test_expect_success "$STRATEGY diff from attributes" ' >> + echo "file* diff-algorithm=$STRATEGY" >.gitattributes && >> + test_must_fail git diff --no-index uniq1 uniq2 > output && >> + test_cmp expect output > > This test also passes with or without the code changes in this patch. It is the same as the first test added above but with files that give the same diff irrespective of the algorithm chosen so I don't think it is doing anything useful. Unless I've missed something it should be dropped. I should have been more thorough with this one as well, thanks. > > Best Wishes > > Phillip > >> + ' >> } >>

On the Git mailing list, Elijah Newren wrote (reply to this):

Hi John, On Mon, Feb 6, 2023 at 12:02 PM John Cai <johncai86@gmail.com> wrote: > > Hi Phillip, > > On 6 Feb 2023, at 11:27, Phillip Wood wrote: > > > Hi John > > > > On 05/02/2023 03:46, John Cai via GitGitGadget wrote: > >> From: John Cai <johncai86@gmail.com> > >> > >> It can be useful to specify diff algorithms per file type. For example, > >> one may want to use the minimal diff algorithm for .json files, another > >> for .c files, etc. > > > > Have you got any examples of why this is useful? I find myself occasionally changing the algorithm when the default gives a sub-optimal diff but I've not really noticed any pattern with respect to file types. > > At $DAYJOB, there has been a discussion and request for a feature like this [1]. > One use case that came up was to be able to set a different diff algorithm for > .json files. > > 1. https://gitlab.com/gitlab-org/gitaly/-/issues/2591 A couple points: First, there seems to be a misunderstanding in that issue. In particular, the merge algorithm does call into the xdiff library to do the three-way content merge of individual files, and when it does so, it has to specify the diff algorithm (or take the default, currently myers). merge-recursive allows the diff algorithm to be specified by the user (there are -Xdiff-algorithm={histogram,minimal,patience,myers} flags to merge/rebase for it), while merge-ort uses histogram (though it uses the same parser as merge-recursive and thus gets the variables set from the -Xdiff-algorithm flag, it just ignores those values and hardcodes histogram). Second, I also think the user request got converted to a particular solution without looking at the wider problem space: The idea seemed to assume "myers" is default for a good reason, and thus asked for an option to use something else. I'm not sure the assumption is valid; I think "myers" is default for historical reasons and histogram is better not just for special Salesforce xml files, but code files too. The output makes more sense to users. So much so that even though my simple testing suggested it had a 2% performance penalty compared to myers, I forced ort to use it[1] even though I designed everything else in that algorithm around eking out maximum performance. Others who have tested the diff algorithms have also found histogram has very similar performance to myers, and oftentimes even beats it[2][3]. Also, worries about invalidating rerere caches[4] was real, but we already paid that price when we switched to ort. And if performance is still a worry, [3] gives me reason to believe we can make our histogram implementation faster. Finally, for the period of time when Palantir was letting me make an internal git distribution (mostly for testing ort), I also carried a patch that changed the default diff algorithm to histogram (not just for ort, but for diff/log/etc. as well). Never had any complaints from the users from it. Perhaps you could do the same in your local version of git used by gitaly? [1] See c8017176ac ("merge-ort: use histogram diff", 2020-12-13) [2] From 85551232b5 ("perf: compare diff algorithms", 2012-03-06): "This does indeed show that histogram diff slightly beats Myers, while patience is much slower than the others." [3] https://github.com/pascalkuthe/imara-diff [4] https://lore.kernel.org/git/20120307114714.GA14990@sigill.intra.peff.net/

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Thu, Feb 09 2023, Elijah Newren wrote: > [...] I > think "myers" is default for historical reasons and histogram is > better not just for special Salesforce xml files, but code files too. > The output makes more sense to users. So much so that even though my > simple testing suggested it had a 2% performance penalty compared to > myers, I forced ort to use it[1] even though I designed everything > else in that algorithm around eking out maximum performance. Others > who have tested the diff algorithms have also found histogram has very > similar performance to myers, and oftentimes even beats it[2][3]. > Also, worries about invalidating rerere caches[4] was real, but we > already paid that price when we switched to ort. FWIW as someone who went through that one-time pain for my many git.git topics it wasn't even a price, it was paying me! Maybe it's just confirmation bias, or looking at the conflicts with fresh eyes, but I found I mis-solved some of them seemingly because the output from the old conflicts was so confusing, but much better with "ort". > And if performance > is still a worry, [3] gives me reason to believe we can make our > histogram implementation faster. Finally, for the period of time when > Palantir was letting me make an internal git distribution (mostly for > testing ort), I also carried a patch that changed the default diff > algorithm to histogram (not just for ort, but for diff/log/etc. as > well). Never had any complaints from the users from it. Perhaps you > could do the same in your local version of git used by gitaly? I think that might be worth considering for GitLab, John? Although the bias has definitely been to go with vanilla git semantics, but it sounds like you might be in favor of endorsing a patch to change the default, so if that happens to solve the problem... > I also think the user request got converted to a particular > solution without looking at the wider problem space: The idea seemed > to assume "myers" is default for a good reason, and thus asked for an > option to use something else. I'm not sure the assumption is valid; Just on the "wider problem", I've also looked at a lot of "bad diffs" and there's interesting cases where histogram does equally bad as the others *from most user's POV*, but from a "let's make a small diff" for a computer it's perfect. I've seen this most commonly with repetative file formats, e.g JSON-like ones are a good example. You've probably looked at this exact thing N times, but in case it's useful consider: $ cat a { thing => 'foo', a => 'b', }, { c => 'd', thing => 'bar', } $ cat b { thing => 'foo', a => 'b', }, { c => 'd', }, { e => 'f', thing => 'bar', } Here all of our diff algorithms generate this equally terrible diff, or a great diff, depending on your POV :): diff --git a/a b/b index 186017c1f38..5afadde1800 100644 --- a/a +++ b/b @@ -4,5 +4,8 @@ }, { c => 'd', +}, +{ + e => 'f', thing => 'bar', } The problem here is that from the user's POV they didn't add a closing bracket, comma, open bracket etc. They added a whole new copy/pasted block, and then modified a key-value in the subsequent one. But the diff engine doesn't know about any of that, and will "helpfully" proceed to "steal" parts of the previous block. All of the cases where users have asked me (in person) about some bad diffs have pretty much come down to this sort of thing. In those cases one of the algorithms sometimes *happened* to find a "better" diff, but in my experience it's been a wrong-clock-is-right-twice-a-day sort of thing. I've wondered if we couldn't have some much more stupid but effective solution to these common cases. All of the ones I remember could basically be expressed as a hypothetical: [diff "c"] balanceBrackets = true Where we'd try as hard as we could not to produce diffs that had un-balanced brackets. I.e. in this case (I manually produced this by converting the brackets[1] to []'s, then changing them back: diff --git a/a b/b index 186017c1f38..05cdd03bfa4 100644 --- a/a +++ b/b @@ -2,7 +2,10 @@ thing => 'foo', a => 'b', }, +{ + x => 'y', +}, { - c => 'd', + c => 'f', thing => 'bar', } That's a much worse diff to a computer (now 11 lines, v.s. 8 lines before), but I'd think to most users that's *much* more understandable, just by knowing just a bit about the language (although I'd argue we could go as far as assuming this in general, with how common balanced brackets[1] are across languages). P.S.: Funny story: at <pastjob> I once helped a user who'd been struggling for hours to turn their already working change into something that made more sense with "git diff". We eventually managed to come up with something that looked "right", I can't remember how, probably some mixture of -U<n>, diff algorithm etc. Their next question was "Ok, so how do I commit this?", referring to "this particular version of the diff". Which, having already spent more time than I'd like to admit in trying to "help" them was a good reminder to first ask what problem we're trying to solve :) 1. By "balanced bracket" I'm referring not just to "{}", but what's considered a "mirrored" character in Unicode. I.e. not just ()[]{}<> etc., but also ∈∋ and the like (see e.g. https://www.compart.com/en/unicode/U+2208) For better or worse the Perl 6 language has this as part of its grammar, see e.g.: https://andrewshitov.com/2018/01/23/embedded-comment-delimiters-in-perl-6/

On the Git mailing list, John Cai wrote (reply to this):

On 23/02/09 12:26AM, Elijah Newren wrote: > Hi John, > > On Mon, Feb 6, 2023 at 12:02 PM John Cai <johncai86@gmail.com> wrote: > > > > Hi Phillip, > > > > On 6 Feb 2023, at 11:27, Phillip Wood wrote: > > > > > Hi John > > > > > > On 05/02/2023 03:46, John Cai via GitGitGadget wrote: > > >> From: John Cai <johncai86@gmail.com> > > >> > > >> It can be useful to specify diff algorithms per file type. For example, > > >> one may want to use the minimal diff algorithm for .json files, another > > >> for .c files, etc. > > > > > > Have you got any examples of why this is useful? I find myself occasionally changing the algorithm when the default gives a sub-optimal diff but I've not really noticed any pattern with respect to file types. > > > > At $DAYJOB, there has been a discussion and request for a feature like this [1]. > > One use case that came up was to be able to set a different diff algorithm for > > .json files. > > > > 1. https://gitlab.com/gitlab-org/gitaly/-/issues/2591 > > A couple points: > > First, there seems to be a misunderstanding in that issue. In > particular, the merge algorithm does call into the xdiff library to do > the three-way content merge of individual files, and when it does so, > it has to specify the diff algorithm (or take the default, currently > myers). merge-recursive allows the diff algorithm to be specified by > the user (there are > -Xdiff-algorithm={histogram,minimal,patience,myers} flags to > merge/rebase for it), while merge-ort uses histogram (though it uses > the same parser as merge-recursive and thus gets the variables set > from the -Xdiff-algorithm flag, it just ignores those values and > hardcodes histogram). > > Second, I also think the user request got converted to a particular > solution without looking at the wider problem space: The idea seemed > to assume "myers" is default for a good reason, and thus asked for an > option to use something else. I'm not sure the assumption is valid; I > think "myers" is default for historical reasons and histogram is > better not just for special Salesforce xml files, but code files too. > The output makes more sense to users. So much so that even though my > simple testing suggested it had a 2% performance penalty compared to > myers, I forced ort to use it[1] even though I designed everything > else in that algorithm around eking out maximum performance. Others > who have tested the diff algorithms have also found histogram has very > similar performance to myers, and oftentimes even beats it[2][3]. > Also, worries about invalidating rerere caches[4] was real, but we > already paid that price when we switched to ort. And if performance > is still a worry, [3] gives me reason to believe we can make our > histogram implementation faster. Finally, for the period of time when > Palantir was letting me make an internal git distribution (mostly for > testing ort), I also carried a patch that changed the default diff > algorithm to histogram (not just for ort, but for diff/log/etc. as > well). Never had any complaints from the users from it. Perhaps you > could do the same in your local version of git used by gitaly? Thanks for that suggestion, Elijah. Changing the diff alg to histogram has been something we've considered doing as well. However, we've gotten customer requests to also have the option to view their diffs with patience. Seeing as the diff algorithm is sometimes a matter of preference, we thought giving users the control through the repository would be nice. thanks John > > [1] See c8017176ac ("merge-ort: use histogram diff", 2020-12-13) > [2] From 85551232b5 ("perf: compare diff algorithms", 2012-03-06): > "This does indeed show that histogram diff slightly beats Myers, while > patience is much slower than the others." > [3] https://github.com/pascalkuthe/imara-diff > [4] https://lore.kernel.org/git/20120307114714.GA14990@sigill.intra.peff.net/

gitgitgadget-git · 2023-02-06T16:56:56Z

Documentation/gitattributes.txt

@@ -736,6 +736,29 @@ String::
 	by the configuration variables in the "diff.foo" section of the


On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Sun, Feb 05 2023, John Cai via GitGitGadget wrote: > From: John Cai <johncai86@gmail.com> > [...] > + > + if (!o->xdl_opts_command_line) { > + static struct attr_check *check; > + const char *one_diff_algo; > + const char *two_diff_algo; > + > + check = attr_check_alloc(); > + attr_check_append(check, git_attr("diff-algorithm")); > + > + git_check_attr(the_repository->index, NULL, one->path, check); > + one_diff_algo = check->items[0].value; > + git_check_attr(the_repository->index, NULL, two->path, check); > + two_diff_algo = check->items[0].value; > + > + if (!ATTR_UNSET(one_diff_algo) && !ATTR_UNSET(two_diff_algo) && > + !strcmp(one_diff_algo, two_diff_algo)) > + set_diff_algorithm(o, one_diff_algo); > + > + attr_check_free(check); This is a bit nitpicky, but I for one would find this much easier to read with some shorter variables, here just with "a" rather than "one_diff_algo", "b" instead of "two_diff_algo", and splitting "the_repository->index" into "istate" (untested): + if (!o->xdl_opts_command_line) { + static struct attr_check *check; + const char *a; + const char *b; + struct index_state *istate = the_repository->index; + + check = attr_check_alloc(); + attr_check_append(check, git_attr("diff-algorithm")); + + git_check_attr(istate, NULL, one->path, check); + a = check->items[0].value; + git_check_attr(istate, NULL, two->path, check); + b = check->items[0].value; + + if (!ATTR_UNSET(a) && !ATTR_UNSET(b) && !strcmp(a, b)) + set_diff_algorithm(o, a); + + attr_check_free(check); + } That also nicely keeps the line length shorter. > @@ -333,6 +333,8 @@ struct diff_options { > int prefix_length; > const char *stat_sep; > int xdl_opts; > + /* If xdl_opts has been set via the command line. */ > + int xdl_opts_command_line; > > /* see Documentation/diff-options.txt */ > char **anchors; > diff --git a/t/lib-diff-alternative.sh b/t/lib-diff-alternative.sh > index 8d1e408bb58..630c98ea65a 100644 > --- a/t/lib-diff-alternative.sh > +++ b/t/lib-diff-alternative.sh > @@ -107,8 +107,27 @@ EOF > > STRATEGY=$1 > > + test_expect_success "$STRATEGY diff from attributes" ' > + echo "file* diff-algorithm=$STRATEGY" >.gitattributes && > + test_must_fail git diff --no-index file1 file2 > output && > + test_cmp expect output > + ' > + > test_expect_success "$STRATEGY diff" ' > - test_must_fail git diff --no-index "--$STRATEGY" file1 file2 > output && > + test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output && Nit: The usual style is ">output", not "> output". > + test_cmp expect output > + ' > + > + test_expect_success "$STRATEGY diff command line precedence before attributes" ' > + echo "file* diff-algorithm=meyers" >.gitattributes && > + test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output && > + test_cmp expect output > + ' > + > + test_expect_success "$STRATEGY diff attributes precedence before config" ' > + git config diff.algorithm default && > + echo "file* diff-algorithm=$STRATEGY" >.gitattributes && > + test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output && > test_cmp expect output > ' > > @@ -166,5 +185,11 @@ EOF > test_must_fail git diff --no-index "--$STRATEGY" uniq1 uniq2 > output && > test_cmp expect output > ' > + > + test_expect_success "$STRATEGY diff from attributes" ' > + echo "file* diff-algorithm=$STRATEGY" >.gitattributes && > + test_must_fail git diff --no-index uniq1 uniq2 > output && > + test_cmp expect output > + ' > } For some non-nitpicking, I do worry about exposing this as a DoS vector, e.g. here's a diff between two distant points in git.git with the various algorithms: $ hyperfine -r 1 -L a patience,minimal,histogram,myers 'git diff --diff-algorithm={a} v2.0.0 v2.28.0' Benchmark 1: git diff --diff-algorithm=patience v2.0.0 v2.28.0 Time (abs ≡): 42.121 s [User: 41.879 s, System: 0.144 s] Benchmark 2: git diff --diff-algorithm=minimal v2.0.0 v2.28.0 Time (abs ≡): 35.634 s [User: 35.473 s, System: 0.160 s] Benchmark 3: git diff --diff-algorithm=histogram v2.0.0 v2.28.0 Time (abs ≡): 46.912 s [User: 46.657 s, System: 0.228 s] Benchmark 4: git diff --diff-algorithm=myers v2.0.0 v2.28.0 Time (abs ≡): 33.233 s [User: 33.072 s, System: 0.160 s] Summary 'git diff --diff-algorithm=myers v2.0.0 v2.28.0' ran 1.07 times faster than 'git diff --diff-algorithm=minimal v2.0.0 v2.28.0' 1.27 times faster than 'git diff --diff-algorithm=patience v2.0.0 v2.28.0' 1.41 times faster than 'git diff --diff-algorithm=histogram v2.0.0 v2.28.0' Now, all of those are very slow overall, but some much more than others. I seem to recall that the non-default ones also had some pathological cases. Another thing to think about is that we've so far considered the diff algorithm to be purely about presentation, with some notable exceptions such as "patch-id". I've advocated for us getting to the point of having an in-repo .gitconfig or .gitattributes before with a whitelist of settings like diff.context for certain paths, or a diff.orderFile. But those seem easy to promise future behavior for, v.s. an entire diff algorithm (which we of course had before, but now we'd have it in repository data). Maybe that's not a distinction worth worrying about, just putting that out there. I think if others are concerned about the above something that would neatly side-step those is to have it opt-in via the .git/config somehow, similar to e.g. how you can commit *.gpg content, put this in .gitattributes: *.gpg diff=gpg But not have it do anything until this is in the repo's .git/config (or similar): [diff "gpg"] textconv = gpg --no-tty --decrypt For that you could still keep the exact .gitattributes format you have here, i.e.: file* diff-algorithm=$STRATEGY But we to pick it up we'd need either: [diff-algorithm] histogram = myers Or: [diff-algorithm "histogram"] allow = true The former form being one that would allow you to map the .gitattributes of the repo (but maybe that would be redundant to .git/info/attributes)...

On the Git mailing list, John Cai wrote (reply to this):

Hi Ævar, On 6 Feb 2023, at 11:39, Ævar Arnfjörð Bjarmason wrote: > On Sun, Feb 05 2023, John Cai via GitGitGadget wrote: > >> From: John Cai <johncai86@gmail.com> >> [...] >> + >> + if (!o->xdl_opts_command_line) { >> + static struct attr_check *check; >> + const char *one_diff_algo; >> + const char *two_diff_algo; >> + >> + check = attr_check_alloc(); >> + attr_check_append(check, git_attr("diff-algorithm")); >> + >> + git_check_attr(the_repository->index, NULL, one->path, check); >> + one_diff_algo = check->items[0].value; >> + git_check_attr(the_repository->index, NULL, two->path, check); >> + two_diff_algo = check->items[0].value; >> + >> + if (!ATTR_UNSET(one_diff_algo) && !ATTR_UNSET(two_diff_algo) && >> + !strcmp(one_diff_algo, two_diff_algo)) >> + set_diff_algorithm(o, one_diff_algo); >> + >> + attr_check_free(check); > > This is a bit nitpicky, but I for one would find this much easier to > read with some shorter variables, here just with "a" rather than > "one_diff_algo", "b" instead of "two_diff_algo", and splitting > "the_repository->index" into "istate" (untested): > > + if (!o->xdl_opts_command_line) { > + static struct attr_check *check; > + const char *a; > + const char *b; > + struct index_state *istate = the_repository->index; > + > + check = attr_check_alloc(); > + attr_check_append(check, git_attr("diff-algorithm")); > + > + git_check_attr(istate, NULL, one->path, check); > + a = check->items[0].value; > + git_check_attr(istate, NULL, two->path, check); > + b = check->items[0].value; > + > + if (!ATTR_UNSET(a) && !ATTR_UNSET(b) && !strcmp(a, b)) > + set_diff_algorithm(o, a); > + > + attr_check_free(check); > + } > > That also nicely keeps the line length shorter. Thanks, I think this does look better. > >> @@ -333,6 +333,8 @@ struct diff_options { >> int prefix_length; >> const char *stat_sep; >> int xdl_opts; >> + /* If xdl_opts has been set via the command line. */ >> + int xdl_opts_command_line; >> >> /* see Documentation/diff-options.txt */ >> char **anchors; >> diff --git a/t/lib-diff-alternative.sh b/t/lib-diff-alternative.sh >> index 8d1e408bb58..630c98ea65a 100644 >> --- a/t/lib-diff-alternative.sh >> +++ b/t/lib-diff-alternative.sh >> @@ -107,8 +107,27 @@ EOF >> >> STRATEGY=$1 >> >> + test_expect_success "$STRATEGY diff from attributes" ' >> + echo "file* diff-algorithm=$STRATEGY" >.gitattributes && >> + test_must_fail git diff --no-index file1 file2 > output && >> + test_cmp expect output >> + ' >> + >> test_expect_success "$STRATEGY diff" ' >> - test_must_fail git diff --no-index "--$STRATEGY" file1 file2 > output && >> + test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output && > > Nit: The usual style is ">output", not "> output". noted! > >> + test_cmp expect output >> + ' >> + >> + test_expect_success "$STRATEGY diff command line precedence before attributes" ' >> + echo "file* diff-algorithm=meyers" >.gitattributes && >> + test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output && >> + test_cmp expect output >> + ' >> + >> + test_expect_success "$STRATEGY diff attributes precedence before config" ' >> + git config diff.algorithm default && >> + echo "file* diff-algorithm=$STRATEGY" >.gitattributes && >> + test_must_fail git diff --no-index "--diff-algorithm=$STRATEGY" file1 file2 > output && >> test_cmp expect output >> ' >> >> @@ -166,5 +185,11 @@ EOF >> test_must_fail git diff --no-index "--$STRATEGY" uniq1 uniq2 > output && >> test_cmp expect output >> ' >> + >> + test_expect_success "$STRATEGY diff from attributes" ' >> + echo "file* diff-algorithm=$STRATEGY" >.gitattributes && >> + test_must_fail git diff --no-index uniq1 uniq2 > output && >> + test_cmp expect output >> + ' >> } > > For some non-nitpicking, I do worry about exposing this as a DoS vector, > e.g. here's a diff between two distant points in git.git with the > various algorithms: > > $ hyperfine -r 1 -L a patience,minimal,histogram,myers 'git diff --diff-algorithm={a} v2.0.0 v2.28.0' > Benchmark 1: git diff --diff-algorithm=patience v2.0.0 v2.28.0 > Time (abs ≡): 42.121 s [User: 41.879 s, System: 0.144 s] > > Benchmark 2: git diff --diff-algorithm=minimal v2.0.0 v2.28.0 > Time (abs ≡): 35.634 s [User: 35.473 s, System: 0.160 s] > > Benchmark 3: git diff --diff-algorithm=histogram v2.0.0 v2.28.0 > Time (abs ≡): 46.912 s [User: 46.657 s, System: 0.228 s] > > Benchmark 4: git diff --diff-algorithm=myers v2.0.0 v2.28.0 > Time (abs ≡): 33.233 s [User: 33.072 s, System: 0.160 s] > > Summary > 'git diff --diff-algorithm=myers v2.0.0 v2.28.0' ran > 1.07 times faster than 'git diff --diff-algorithm=minimal v2.0.0 v2.28.0' > 1.27 times faster than 'git diff --diff-algorithm=patience v2.0.0 v2.28.0' > 1.41 times faster than 'git diff --diff-algorithm=histogram v2.0.0 v2.28.0' Thanks for this analysis. To clarify, .gitconfig's diff.algorithm setting is already an attack vector right? I see how this would be adding another one. That being said, here's a separate issue. I benchmarked the usage of .gitattributes as introduced in this patch series, and indeed it does look like there is additional latency: $ echo "* diff-algorithm=patience >> .gitattributes $ hyperfine -r 5 'git-bin-wrapper diff --diff-algorithm=patience v2.0.0 v2.28.0' ✭ Benchmark 1: git-bin-wrapper diff --diff-algorithm=patience v2.0.0 v2.28.0 Time (mean ± σ): 889.4 ms ± 113.8 ms [User: 715.7 ms, System: 65.3 ms] Range (min … max): 764.1 ms … 1029.3 ms 5 runs $ hyperfine -r 5 'git-bin-wrapper diff v2.0.0 v2.28.0' ✭ Benchmark 1: git-bin-wrapper diff v2.0.0 v2.28.0 Time (mean ± σ): 2.146 s ± 0.368 s [User: 0.827 s, System: 0.243 s] Range (min … max): 1.883 s … 2.795 s 5 runs and I imagine the latency scales with the size of .gitattributes. Although I'm not familiar with other parts of the codebase and how it deals with the latency introduced by reading attributes files. > > Now, all of those are very slow overall, but some much more than > others. I seem to recall that the non-default ones also had some > pathological cases. > > Another thing to think about is that we've so far considered the diff > algorithm to be purely about presentation, with some notable exceptions > such as "patch-id". > > I've advocated for us getting to the point of having an in-repo > .gitconfig or .gitattributes before with a whitelist of settings like > diff.context for certain paths, or a diff.orderFile. > > But those seem easy to promise future behavior for, v.s. an entire diff > algorithm (which we of course had before, but now we'd have it in > repository data). > > Maybe that's not a distinction worth worrying about, just putting that > out there. > > I think if others are concerned about the above something that would > neatly side-step those is to have it opt-in via the .git/config somehow, > similar to e.g. how you can commit *.gpg content, put this in > .gitattributes: > > *.gpg diff=gpg > > But not have it do anything until this is in the repo's .git/config (or > similar): > > [diff "gpg"] > textconv = gpg --no-tty --decrypt > > For that you could still keep the exact .gitattributes format you have > here, i.e.: > > file* diff-algorithm=$STRATEGY > > But we to pick it up we'd need either: > > [diff-algorithm] > histogram = myers > > Or: > > [diff-algorithm "histogram"] > allow = true This would help address slowness from the diff algorithm itself. I'm not opposed to adding this config if this attack vector is concerning to people. However, it wouldn't help address the additional latency of scanning .gitattributes to find the diff algorithm. Would a separate config to allow gitattributes be helpful here? [diff-algorithm] attributes = true > > The former form being one that would allow you to map the .gitattributes > of the repo (but maybe that would be redundant to > .git/info/attributes)...

On the Git mailing list, Phillip Wood wrote (reply to this):

Hi John On 06/02/2023 20:37, John Cai wrote: > On 6 Feb 2023, at 11:39, Ævar Arnfjörð Bjarmason wrote: >> On Sun, Feb 05 2023, John Cai via GitGitGadget wrote: >> For some non-nitpicking, I do worry about exposing this as a DoS vector, >> e.g. here's a diff between two distant points in git.git with the >> various algorithms: >> >> $ hyperfine -r 1 -L a patience,minimal,histogram,myers 'git diff --diff-algorithm={a} v2.0.0 v2.28.0' >> Benchmark 1: git diff --diff-algorithm=patience v2.0.0 v2.28.0 >> Time (abs ≡): 42.121 s [User: 41.879 s, System: 0.144 s] >> >> Benchmark 2: git diff --diff-algorithm=minimal v2.0.0 v2.28.0 >> Time (abs ≡): 35.634 s [User: 35.473 s, System: 0.160 s] >> >> Benchmark 3: git diff --diff-algorithm=histogram v2.0.0 v2.28.0 >> Time (abs ≡): 46.912 s [User: 46.657 s, System: 0.228 s] >> >> Benchmark 4: git diff --diff-algorithm=myers v2.0.0 v2.28.0 >> Time (abs ≡): 33.233 s [User: 33.072 s, System: 0.160 s] >> >> Summary >> 'git diff --diff-algorithm=myers v2.0.0 v2.28.0' ran >> 1.07 times faster than 'git diff --diff-algorithm=minimal v2.0.0 v2.28.0' >> 1.27 times faster than 'git diff --diff-algorithm=patience v2.0.0 v2.28.0' >> 1.41 times faster than 'git diff --diff-algorithm=histogram v2.0.0 v2.28.0' > > Thanks for this analysis. To clarify, .gitconfig's diff.algorithm setting is > already an attack vector right? .gitconfig is under the user's control though whereas .gitattributes is attacker controlled if one clones a malicious repository. Having said the worst results above are for the historgram algorithm that merge-ort uses internally and no one has complained about ort's performance. > I see how this would be adding another one. > > That being said, here's a separate issue. I benchmarked the usage of > .gitattributes as introduced in this patch series, and indeed it does look like > there is additional latency: > > $ echo "* diff-algorithm=patience >> .gitattributes > $ hyperfine -r 5 'git-bin-wrapper diff --diff-algorithm=patience v2.0.0 v2.28.0' ✭ > Benchmark 1: git-bin-wrapper diff --diff-algorithm=patience v2.0.0 v2.28.0 > Time (mean ± σ): 889.4 ms ± 113.8 ms [User: 715.7 ms, System: 65.3 ms] > Range (min … max): 764.1 ms … 1029.3 ms 5 runs > > $ hyperfine -r 5 'git-bin-wrapper diff v2.0.0 v2.28.0' ✭ > Benchmark 1: git-bin-wrapper diff v2.0.0 v2.28.0 > Time (mean ± σ): 2.146 s ± 0.368 s [User: 0.827 s, System: 0.243 s] > Range (min … max): 1.883 s … 2.795 s 5 runs > > and I imagine the latency scales with the size of .gitattributes. Although I'm > not familiar with other parts of the codebase and how it deals with the latency > introduced by reading attributes files. Ouch! Thanks for benchmarking that is a suspiciously large slow down. I've a feeling the attributes code parses all the .gitattribute files from scratch for each path that's queried, so there may be scope for making it a bit smarter. I see some slow down but no where near as much $ hyperfine -r 3 'bin-wrappers/git diff --diff-algorithm=patience --no-color --no-color-moved v2.0.0 v2.28.0' Benchmark 1: bin-wrappers/git diff --diff-algorithm=patience --no-color --no-color-moved v2.0.0 v2.28.0 Time (mean ± σ): 1.996 s ± 0.008 s [User: 1.706 s, System: 0.286 s] Range (min … max): 1.989 s … 2.004 s 3 runs $ hyperfine -r 5 'bin-wrappers/git diff --no-color --no-color-moved v2.0.0 v2.28.0' Benchmark 1: bin-wrappers/git diff --no-color --no-color-moved v2.0.0 v2.28.0 Time (mean ± σ): 2.238 s ± 0.037 s [User: 1.880 s, System: 0.350 s] Range (min … max): 2.216 s … 2.303 s 5 runs Perhaps I'm over simplifying but having read the issue you linked to I couldn't help feeling that the majority of users might be satisfied by just changing gitlab to use the patience algorithm when generating diffs. The idea to use .gitattributes seems to have come from Patrick rather than a user request. This is slightly off topic but one thing I'd really like is a way to tell diff use automatically use --diff-words on some files (e.g. Documentation/*) Best Wishes Phillip >> Now, all of those are very slow overall, but some much more than >> others. I seem to recall that the non-default ones also had some >> pathological cases. >> >> Another thing to think about is that we've so far considered the diff >> algorithm to be purely about presentation, with some notable exceptions >> such as "patch-id". >> >> I've advocated for us getting to the point of having an in-repo >> .gitconfig or .gitattributes before with a whitelist of settings like >> diff.context for certain paths, or a diff.orderFile. >> >> But those seem easy to promise future behavior for, v.s. an entire diff >> algorithm (which we of course had before, but now we'd have it in >> repository data). >> >> Maybe that's not a distinction worth worrying about, just putting that >> out there. >> >> I think if others are concerned about the above something that would >> neatly side-step those is to have it opt-in via the .git/config somehow, >> similar to e.g. how you can commit *.gpg content, put this in >> .gitattributes: >> >> *.gpg diff=gpg >> >> But not have it do anything until this is in the repo's .git/config (or >> similar): >> >> [diff "gpg"] >> textconv = gpg --no-tty --decrypt >> >> For that you could still keep the exact .gitattributes format you have >> here, i.e.: >> >> file* diff-algorithm=$STRATEGY >> >> But we to pick it up we'd need either: >> >> [diff-algorithm] >> histogram = myers >> >> Or: >> >> [diff-algorithm "histogram"] >> allow = true > > This would help address slowness from the diff algorithm itself. I'm not opposed > to adding this config if this attack vector is concerning to people. > > However, it wouldn't help address the additional latency of scanning > .gitattributes to find the diff algorithm. > > Would a separate config to allow gitattributes be helpful here? > > [diff-algorithm] > attributes = true > >> >> The former form being one that would allow you to map the .gitattributes >> of the repo (but maybe that would be redundant to >> .git/info/attributes)...

On the Git mailing list, John Cai wrote (reply to this):

Hi Phillip, On 7 Feb 2023, at 9:55, Phillip Wood wrote: > Hi John > > On 06/02/2023 20:37, John Cai wrote: >> On 6 Feb 2023, at 11:39, Ævar Arnfjörð Bjarmason wrote: >>> On Sun, Feb 05 2023, John Cai via GitGitGadget wrote: >>> For some non-nitpicking, I do worry about exposing this as a DoS vector, >>> e.g. here's a diff between two distant points in git.git with the >>> various algorithms: >>> >>> $ hyperfine -r 1 -L a patience,minimal,histogram,myers 'git diff --diff-algorithm={a} v2.0.0 v2.28.0' >>> Benchmark 1: git diff --diff-algorithm=patience v2.0.0 v2.28.0 >>> Time (abs ≡): 42.121 s [User: 41.879 s, System: 0.144 s] >>> >>> Benchmark 2: git diff --diff-algorithm=minimal v2.0.0 v2.28.0 >>> Time (abs ≡): 35.634 s [User: 35.473 s, System: 0.160 s] >>> >>> Benchmark 3: git diff --diff-algorithm=histogram v2.0.0 v2.28.0 >>> Time (abs ≡): 46.912 s [User: 46.657 s, System: 0.228 s] >>> >>> Benchmark 4: git diff --diff-algorithm=myers v2.0.0 v2.28.0 >>> Time (abs ≡): 33.233 s [User: 33.072 s, System: 0.160 s] >>> >>> Summary >>> 'git diff --diff-algorithm=myers v2.0.0 v2.28.0' ran >>> 1.07 times faster than 'git diff --diff-algorithm=minimal v2.0.0 v2.28.0' >>> 1.27 times faster than 'git diff --diff-algorithm=patience v2.0.0 v2.28.0' >>> 1.41 times faster than 'git diff --diff-algorithm=histogram v2.0.0 v2.28.0' >> >> Thanks for this analysis. To clarify, .gitconfig's diff.algorithm setting is >> already an attack vector right? > > .gitconfig is under the user's control though whereas .gitattributes is attacker controlled if one clones a malicious repository. Having said the worst results above are for the historgram algorithm that merge-ort uses internally and no one has complained about ort's performance. > >> I see how this would be adding another one. >> >> That being said, here's a separate issue. I benchmarked the usage of >> .gitattributes as introduced in this patch series, and indeed it does look like >> there is additional latency: >> >> $ echo "* diff-algorithm=patience >> .gitattributes >> $ hyperfine -r 5 'git-bin-wrapper diff --diff-algorithm=patience v2.0.0 v2.28.0' ✭ >> Benchmark 1: git-bin-wrapper diff --diff-algorithm=patience v2.0.0 v2.28.0 >> Time (mean ± σ): 889.4 ms ± 113.8 ms [User: 715.7 ms, System: 65.3 ms] >> Range (min … max): 764.1 ms … 1029.3 ms 5 runs >> >> $ hyperfine -r 5 'git-bin-wrapper diff v2.0.0 v2.28.0' ✭ >> Benchmark 1: git-bin-wrapper diff v2.0.0 v2.28.0 >> Time (mean ± σ): 2.146 s ± 0.368 s [User: 0.827 s, System: 0.243 s] >> Range (min … max): 1.883 s … 2.795 s 5 runs >> >> and I imagine the latency scales with the size of .gitattributes. Although I'm >> not familiar with other parts of the codebase and how it deals with the latency >> introduced by reading attributes files. > > Ouch! Thanks for benchmarking that is a suspiciously large slow down. I've a feeling the attributes code parses all the .gitattribute files from scratch for each path that's queried, so there may be scope for making it a bit smarter. I see some slow down but no where near as much Yes, definitely worth looking into how this can be sped up. > > > $ hyperfine -r 3 'bin-wrappers/git diff --diff-algorithm=patience --no-color --no-color-moved v2.0.0 v2.28.0' > Benchmark 1: bin-wrappers/git diff --diff-algorithm=patience --no-color --no-color-moved v2.0.0 v2.28.0 > Time (mean ± σ): 1.996 s ± 0.008 s [User: 1.706 s, System: 0.286 s] > Range (min … max): 1.989 s … 2.004 s 3 runs > > $ hyperfine -r 5 'bin-wrappers/git diff --no-color --no-color-moved v2.0.0 v2.28.0' > Benchmark 1: bin-wrappers/git diff --no-color --no-color-moved v2.0.0 v2.28.0 > Time (mean ± σ): 2.238 s ± 0.037 s [User: 1.880 s, System: 0.350 s] > Range (min … max): 2.216 s … 2.303 s 5 runs > > > Perhaps I'm over simplifying but having read the issue you linked to I couldn't help feeling that the majority of users might be satisfied by just changing gitlab to use the patience algorithm when generating diffs. Right, I recognize this is a judgment call that may be best left up to the list. We don't have a way in GitLab to change the diff algorithm currently. Of course that can be implemented outside of Git, but having it as part of Git may have a wider benefit for users who would appreciate the convenience of automatically having certain files use one diff algorithm and other files use another, without having to pass in the diff algorithm through the command line each time. > > The idea to use .gitattributes seems to have come from Patrick rather than a user request. > > This is slightly off topic but one thing I'd really like is a way to tell diff use automatically use --diff-words on some files (e.g. Documentation/*) I think it's a bit similar to the spirit of this desired feature. thanks John > > Best Wishes > > Phillip > >>> Now, all of those are very slow overall, but some much more than >>> others. I seem to recall that the non-default ones also had some >>> pathological cases. >>> >>> Another thing to think about is that we've so far considered the diff >>> algorithm to be purely about presentation, with some notable exceptions >>> such as "patch-id". >>> >>> I've advocated for us getting to the point of having an in-repo >>> .gitconfig or .gitattributes before with a whitelist of settings like >>> diff.context for certain paths, or a diff.orderFile. >>> >>> But those seem easy to promise future behavior for, v.s. an entire diff >>> algorithm (which we of course had before, but now we'd have it in >>> repository data). >>> >>> Maybe that's not a distinction worth worrying about, just putting that >>> out there. >>> >>> I think if others are concerned about the above something that would >>> neatly side-step those is to have it opt-in via the .git/config somehow, >>> similar to e.g. how you can commit *.gpg content, put this in >>> .gitattributes: >>> >>> *.gpg diff=gpg >>> >>> But not have it do anything until this is in the repo's .git/config (or >>> similar): >>> >>> [diff "gpg"] >>> textconv = gpg --no-tty --decrypt >>> >>> For that you could still keep the exact .gitattributes format you have >>> here, i.e.: >>> >>> file* diff-algorithm=$STRATEGY >>> >>> But we to pick it up we'd need either: >>> >>> [diff-algorithm] >>> histogram = myers >>> >>> Or: >>> >>> [diff-algorithm "histogram"] >>> allow = true >> >> This would help address slowness from the diff algorithm itself. I'm not opposed >> to adding this config if this attack vector is concerning to people. >> >> However, it wouldn't help address the additional latency of scanning >> .gitattributes to find the diff algorithm. >> >> Would a separate config to allow gitattributes be helpful here? >> >> [diff-algorithm] >> attributes = true >> >>> >>> The former form being one that would allow you to map the .gitattributes >>> of the repo (but maybe that would be redundant to >>> .git/info/attributes)...

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Tue, Feb 07 2023, Phillip Wood wrote: > This is slightly off topic but one thing I'd really like is a way to > tell diff use automatically use --diff-words on some files > (e.g. Documentation/*) Unlike changing the algorithm, -U options, diff.orderFile etc. doing that would give you a diff that can't be applied with "git apply" or other tools that expect a valid unified diff. So I can imagine that it would be neat in some contexts, but such a change would have much wider implications than options that tweak how a valid unified diff looks, or is generated. We'd need some way to mark a diff as "for ad-hoc viewing only". But as it sounds like you want this for git.git the Documentation/doc-diff script is much better than anything word-diff could spew out, as it diffs the resulting generated docs. I wonder (but haven't tried) whether you can't "diff" that using the same method that can be used to diff binary files using a custom driver. Hrm, except that in that case (with includes etc) there isn't really a 1=1 mapping between files within Documentation/ and generated docs (due to includes etc.). But I suppose it could be used only for those files that 1=1 correspond to the generated manpages.

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Feb 14, 2023 at 06:35:00PM -0800, Elijah Newren wrote: > > Just a small counter-point, since I happened to be looking at myers vs > > patience for something elsewhere in the thread, but: > > > > git show 35bd13fcd2caa4185bf3729655ca20b6a5fe9b6f builtin/add.c > > "fatal: bad object 35bd13fcd2caa4185bf3729655ca20b6a5fe9b6f" > > Is that a local commit of yours? Oh, sorry. It's not my commit, but it may have been something I picked up off the list. The version from Junio is 20b813d7d3d. > * They had two people annotating the diffs and independently > scoring them, and then checked for agreement between their answers > afterwards. (No, they didn't always agree, but they did have > substantial agreement.) Kind of a weird methodology. I'd think you'd do better to put those diffs in front of a lot of people to get a consensus. But OK... > For the (again, non-identical) diffs modifying non-code, they found > (see table 11) that: > * 14.9% of the myers diffs are better > * 13.4% of the histogram diffs are better > * 71.6% of the diffs have equal quality > > For the (non-identical) diffs modifying code, they found (again, see > table 11) that: > * 16.9% of the myers diffs are better > * 62.6% of the histogram diffs are better > * 20.6% of the diffs have equal quality > > A ratio of 4 to 1 for histogram being better on code diffs is pretty > weighty to me. Interesting. They do slightly worse on non-code, but probably not enough to worry about. Despite my complaint above, I do find those results compelling. Or at least, it is much closer to being real data than anything I have seen before, so it seems like the best guide we have. :) Like I said, I don't have a very strong opinion myself. Mostly I wanted to note that while it is easy for us to remember the times we saw a Myers diff, said "yuck", and tried patience and it was better, most of us may be blind to the opposite case. If patience is not the default, most of us would not have hit "yuck" case for it at all. FWIW, I coincidentally hit this case earlier today where patience does a _much_ better job than myers: https://lore.kernel.org/git/Y+vV8Ifkj1QV7KF0@coredump.intra.peff.net/ So don't count me as an objection, but just a curious observer. ;) -Peff

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jeff King <peff@peff.net> writes: >> > git show 35bd13fcd2caa4185bf3729655ca20b6a5fe9b6f builtin/add.c >> >> "fatal: bad object 35bd13fcd2caa4185bf3729655ca20b6a5fe9b6f" >> >> Is that a local commit of yours? > > Oh, sorry. It's not my commit, but it may have been something I picked > up off the list. The version from Junio is 20b813d7d3d. $ git show -s --notes=amlog --format='%N %s' 20b813d7d3d Message-Id: <patch-v2-1.3-71c7922b25f-20230206T225639Z-avarab@gmail.com> add: remove "add.interactive.useBuiltin" & Perl "git add--interactive" > FWIW, I coincidentally hit this case earlier today where patience does a > _much_ better job than myers: > > https://lore.kernel.org/git/Y+vV8Ifkj1QV7KF0@coredump.intra.peff.net/ > > So don't count me as an objection, but just a curious observer. ;) Yeah, I know some people consider that Patience is the best thing since sliced bread, and it does produce more readable diffs often. I haven't used it often enough to cite/remember a bad case, though.

On the Git mailing list, Phillip Wood wrote (reply to this):

Hi Peff Thanks for the counter example On 11/02/2023 01:59, Jeff King wrote: > Just a small counter-point, since I happened to be looking at myers vs > patience for something elsewhere in the thread, but: > > git show 35bd13fcd2caa4185bf3729655ca20b6a5fe9b6f builtin/add.c > > looks slightly better to me with myers, even though it is 2 lines > longer. The issue is that patience and histogram are very eager to use > blank lines as anchor points, so a diff like: > > -some words > - > -and some more > +unrelated content > + > +but it happens to also be two paragraphs > > in myers becomes: > > -some words > +unrelated content > > -and some more > +but it happens to also be two paragraphs > > in patience (here I'm using single lines, but in practice these may be > paragraphs, or stanzas of code). I think that's also the _strength_ of > patience in many cases, but it really depends on the content. Indeed. Ironically as there are no unique context lines in that example the blank lines are being matched by patience implementation falling back to the myers algorithm. Normally the myers implementation tries to avoid matching common context lines between two blocks of changed lines but I think because in this case it is only called on a small part of the file the blank lines are not common enough to trigger that heuristic. I've got a patch[1] that stops the patience implementation falling back to the myers algorithm and just trims any leading and trailing context. On the whole it I think it gives more readable diffs but I've not got any systematic data to back that up. I also suspect there are pathological cases such as each line in the file being duplicated where the falling back to the myers algorithm gives a much better result. > Replacing > a multi-stanza block with another one may be the best explanation for > what happened. Or the two stanzas may be independent, and showing the > change for each one may be better. > > I'm not sure which one happens more often. And you'd probably want to > weight it by how good/bad the change is. In the example I showed I don't > find patience very much worse, since it's already a pretty ugly diff. > But in cases where patience shines, it may be making things > significantly more readable. I agree that having some data would be useful if we're going to change the default but collecting it would entail quite a bit of work and as the scoring is subjective we'd want a few people doing it. It's great that someone has done that for the histogram algorithm in the paper Elijah cited. > I don't have a super strong opinion, but I just wanted to chime in that > it is not clear to me that patience/histogram is always a win over myers > (yes, I know your examples were comparing patience vs histogram, but the > larger thread is discussing the other). Agreed, there are definitely cases where myers gives more readable diffs, I think if we're going to change the default the question we need to answer is which algorithm gives the best result most of the time. Best Wishes Phillip [1] https://github.com/phillipwood/git/commits/pure-patience-diff

On the Git mailing list, Phillip Wood wrote (reply to this):

Hi Ævar On 07/02/2023 17:27, Ævar Arnfjörð Bjarmason wrote: > > On Tue, Feb 07 2023, Phillip Wood wrote: > >> This is slightly off topic but one thing I'd really like is a way to >> tell diff use automatically use --diff-words on some files >> (e.g. Documentation/*) > > Unlike changing the algorithm, -U options, diff.orderFile etc. doing > that would give you a diff that can't be applied with "git apply" or > other tools that expect a valid unified diff. That's a good point, we'd probably would want to guard it by checking if the output is going to a tty. > So I can imagine that it would be neat in some contexts, but such a > change would have much wider implications than options that tweak how a > valid unified diff looks, or is generated. > > We'd need some way to mark a diff as "for ad-hoc viewing only". > > But as it sounds like you want this for git.git the > Documentation/doc-diff script is much better than anything word-diff > could spew out, as it diffs the resulting generated docs. It's true that I should look at the doc-diff script for git, but it is a wider wish for text files in other projects as well. Best Wishes Phillip > I wonder (but haven't tried) whether you can't "diff" that using the > same method that can be used to diff binary files using a custom driver. > > Hrm, except that in that case (with includes etc) there isn't really a > 1=1 mapping between files within Documentation/ and generated docs (due > to includes etc.). But I suppose it could be used only for those files > that 1=1 correspond to the generated manpages.

On the Git mailing list, Jeff King wrote (reply to this):

On Wed, Feb 15, 2023 at 02:44:59PM +0000, Phillip Wood wrote: > Indeed. Ironically as there are no unique context lines in that example the > blank lines are being matched by patience implementation falling back to the > myers algorithm. Normally the myers implementation tries to avoid matching > common context lines between two blocks of changed lines but I think because > in this case it is only called on a small part of the file the blank lines > are not common enough to trigger that heuristic. I've got a patch[1] that > stops the patience implementation falling back to the myers algorithm and > just trims any leading and trailing context. On the whole it I think it > gives more readable diffs but I've not got any systematic data to back that > up. I also suspect there are pathological cases such as each line in the > file being duplicated where the falling back to the myers algorithm gives a > much better result. Ah, I should have suspected it was something like that (since one of the purposes of patience is trying not to key on meaningless lines). I tried your patch on my test case, and the result is even more readable than the myers output, because it really was effectively a complete rewrite of the function. It is, of course, not the minimal diff. I'm not sure if there would be cases where you'd prefer the minimal. I guess if each stanza's change really was independent of the others. But if there is no commonality except for blank lines, I find it hard to imagine that it's much worse to just treat the whole thing as a block. Anyway, thank you (and Elijah) for explaining. I'm getting more comfortable with the idea of switching the default. -Peff

gitgitgadget-git · 2023-02-06T16:56:59Z

User Ævar Arnfjörð Bjarmason <avarab@gmail.com> has been added to the cc: list.

gitgitgadget-git · 2023-02-07T17:58:53Z

Documentation/gitattributes.txt

@@ -736,6 +736,29 @@ String::
 	by the configuration variables in the "diff.foo" section of the


On the Git mailing list, Jeff King wrote (reply to this):

On Sun, Feb 05, 2023 at 03:46:21AM +0000, John Cai via GitGitGadget wrote: > +`diff-algorithm` > +^^^^^^^^^^^^^^^^ > + > +The attribute `diff-algorithm` affects which algorithm Git uses to generate > +diffs. This allows defining diff algorithms per file extension. Precedence rules > +are as follows, in order from highest to lowest: > + > +*Command line option* > + > +Pass in the `--diff-algorithm` command line option int git-diff(1) > + > +*Git attributes* > + > +------------------------ > +*.json diff-algorithm=histogram > +------------------------ > + > +*Git config* > + > +---------------------------------------------------------------- > +[diff] > + algorithm = histogram > +---------------------------------------------------------------- From the user's perspective, this is weirdly inconsistent with the existing diff attributes, which would be more like: # in .gitattributes *.json diff=json # in config [diff "json"] algorithm = histogram I know why one might choose the scheme you did; it kicks in if the repo sets the algorithm, without users having to set up any extra config. Which is sort of nice, if we assume that malicious actors don't have any incentive to pick the algorithm. In theory they don't, though I saw Ævar mention possible DoS elsewhere in the thread. Side note: It's also possible that algorithm selection could be required to trigger a separate security bug (say, a buffer overflow in the patience code or something), so restricting that works in a belt-and-suspenders way. But that somehow feels like like the wrong side of the paranoia-vs-feature line. So I dunno. I recognize that this scheme fulfills your immediate needs better, but I fear that we'll be stuck with a weird split between "diff" and "diff-*" attributes forever. In the long run, having a way for the repo to say "and here is some config I recommend to you" would give you the best of both, but that is a challenging topic that has been discussed and punted on for many years. -Peff

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Tue, Feb 07 2023, Jeff King wrote: > On Sun, Feb 05, 2023 at 03:46:21AM +0000, John Cai via GitGitGadget wrote: > From the user's perspective, this is weirdly inconsistent with the > existing diff attributes, which would be more like: > > # in .gitattributes > *.json diff=json > > # in config > [diff "json"] > algorithm = histogram That does look more elegant. > I know why one might choose the scheme you did; it kicks in if the repo > sets the algorithm, without users having to set up any extra config. > Which is sort of nice, if we assume that malicious actors don't have any > incentive to pick the algorithm. In theory they don't, though I saw Ævar > mention possible DoS elsewhere in the thread. > > Side note: It's also possible that algorithm selection could be > required to trigger a separate security bug (say, a buffer overflow in > the patience code or something), so restricting that works in a > belt-and-suspenders way. But that somehow feels like like the wrong > side of the paranoia-vs-feature line. > > So I dunno. I recognize that this scheme fulfills your immediate needs > better, but I fear that we'll be stuck with a weird split between "diff" > and "diff-*" attributes forever. In the long run, having a way for the > repo to say "and here is some config I recommend to you" would give you > the best of both, but that is a challenging topic that has been > discussed and punted on for many years. If (and I'm not taking a stance on whether this is the case here) we think that a hypothetical .gitconfig in-repo combined with .gitattributes would be more elegant for this or other cases, I don't see why the path to some very limited version of that where we read literally one whitelisted config variable from such a .gitconfig, and ignore everything else, wouldn't be OK. I.e. the more general in-repo .gitconfig discussion (of which there was some discussion this past Git Merge in Chicago) includes potentially much harder things Like what to do with changing genreal in-repo config, if and how to compare that against some local whitelist of what sort of config we should trust (or none) etc. etc. But if we agreed that we'd be willing to trust the remote with some config-by-another-name unconditionally, as is being proposed here, we could easily read that one thing from an in-repo .gitconfig, ignore everything else, and then just document that interface as a special-case. We have several such "limited config" readers already, and the relevant bits of the config parser already have to handle arbitrary "git config" files in-tree, due to .gitmodules. We even have special-snowflake config in the configspace already that doesn't obey the usual rules: The trace2.* config is only read from the --system config, as it has an inherent bootsrtapping problem in wanting to log as early as possible. The advantage of the limited in-repo .gitconfig would be that if we think the interface was more elegant this would be a way to start small in a way that would be future-proof as far as the permanent UX goes. If there's interest the read_early_config(), read_very_early_config() and gitmodules_config_oid() functions are good places to look. The last one in particular could be generalized pretty easily to read a limited in-tree .gitconfig. Although plugging it into the "config order" might be tricky, I haven't tried. Even a minimal implementation of such a thing would probably want a "git config --repository" (or whatever we call the flag) to go with "--local", "--system" etc, to spew out something sensible from "--show-origin" etc.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: >> From the user's perspective, this is weirdly inconsistent with the >> existing diff attributes, which would be more like: >> >> # in .gitattributes >> *.json diff=json >> >> # in config >> [diff "json"] >> algorithm = histogram > > That does look more elegant. We use attributes to define what it is, while configurations are used to define what to do on different things. The separation of attributes and configuration came not from "elegance" or "security" but from a lot more practical reasons. For a tracked file, the fact that it contains JSON text as payload does not change per user who cloned the project, or per platform the user used to do so. In-tree .gitattributes that the project controls is a perfect way to define what it is for each file. On the other hand, the diff program suitable to compare two JSON files may vary per platform (your favorite Windows program may not be available to me) and per user (a platform may support more than one and the choice becomes the matter of personal taste). The security aspect of giving users tighter control over which exact programs are to be run by not allowing the attributes or so called in-tree configuration mechansim is a small bonus that fell out as a consequence.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Tue, Feb 07 2023, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >>> From the user's perspective, this is weirdly inconsistent with the >>> existing diff attributes, which would be more like: >>> >>> # in .gitattributes >>> *.json diff=json >>> >>> # in config >>> [diff "json"] >>> algorithm = histogram >> >> That does look more elegant. > > We use attributes to define what it is, while configurations are > used to define what to do on different things. The separation of > attributes and configuration came not from "elegance" or "security" > but from a lot more practical reasons. > > For a tracked file, the fact that it contains JSON text as payload > does not change per user who cloned the project, or per platform the > user used to do so. In-tree .gitattributes that the project > controls is a perfect way to define what it is for each file. > > On the other hand, the diff program suitable to compare two JSON > files may vary per platform (your favorite Windows program may not > be available to me) and per user (a platform may support more than > one and the choice becomes the matter of personal taste). > > The security aspect of giving users tighter control over which exact > programs are to be run by not allowing the attributes or so called > in-tree configuration mechansim is a small bonus that fell out as a > consequence. To clarify, I'm not suggesting that we ever read arbitrary parts of the "diff.<driver>.<key>" config space, but that we could whitelist one set of "diff.<driver>.<known-key>"="<known-values>". The reason to do it being that, as Jeff points out, that config mechanism is already established, and arguably more elegant. I.e. that this is a natural fit for "diff=<driver>" in .gitattributes), and that mechanism is already tied to config as Jeff's example shows. Some of your reply seems like it assumed that I was suggesting that we read "diff.algorithm" (i.e. a config setting to apply for all paths) from an in-repo .gitconfig. I wasn't suggesting that, nor that we open Pandora's box on starting a limited in-repo .gitconfig support with anything remotely to do with executing arbitrary commands (which the full "diff.<driver>.<key>" space does support).

On the Git mailing list, Junio C Hamano wrote (reply to this):

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > To clarify, I'm not suggesting that we ever read arbitrary parts of the > "diff.<driver>.<key>" config space, but that we could whitelist one set > of "diff.<driver>.<known-key>"="<known-values>". When the value names the path to an executable or the command line to invoke a program, there is no "portable" value that is useful. Whitelisting macOS only program only because its pathname is one of the known values does not help me running something else.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Tue, Feb 07 2023, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > >> To clarify, I'm not suggesting that we ever read arbitrary parts of the >> "diff.<driver>.<key>" config space, but that we could whitelist one set >> of "diff.<driver>.<known-key>"="<known-values>". > > When the value names the path to an executable or the command line > to invoke a program, there is no "portable" value that is useful. > Whitelisting macOS only program only because its pathname is one of > the known values does not help me running something else. We're not talking about invoking an executable of any kind, but a mechanism to pick up a "diff.algorithm" setting referring to one of our built-in algorithms, either via the in-repo .gitattributes, as in John's original proposal: *.json diff-algorithm=histogram Or via some hybrid .gitattributes/.gitconfig mechanism, where (if I understand Jeff's suggestion correctly) the two would contain: # In .gitattributes *.json diff=json # In .gitconfig [diff "json"] algorithm = histogram In terms of implementation this would be pretty much what we do with "submodule.<name>.update", where "Allowed values here are checkout, rebase, merge or none.". I don't see where you're getting this suggestion of "[a] diff program suitable to compare two JSON files" from something I said. That is a thing we generally support, but I've only mentioned that arbitrary command execution (e.g. in [1]) as something we explicitly *don't* want to support in this context. The "submodule.<name>.update" example is particularly relevant because it also supports arbitrary command execution with "!"-prefixed values (the rest being the arbitrary command). But that's something we specifically exclude when reading the in-repo version, it's only allowed in the user-controlled config. I'm not saying that we should be going for the read-just-the-one-key in-repo .gitconfig approach here, just *if* we like that interface better the implementation should be relatively straightforward, assuming that we like the proposal in principle, and are just bikeshedding about what the mecanhism to enable it would be. The advantage of that being that it more naturally slots into existing config, e.g. "diff.<driver>.binary=<bool>", this being just another "diff.<driver>.<known-key>". And that the way to do that securely is something we're doing for several "submodule.*" keys, so if we pick that approach tweaking or stealing ideas from that code should be relatively straightforward. 1. https://lore.kernel.org/git/230206.865yce7n1w.gmgdl@evledraar.gmail.com/

On the Git mailing list, John Cai wrote (reply to this):

On 23/02/07 12:56PM, Jeff King wrote: > On Sun, Feb 05, 2023 at 03:46:21AM +0000, John Cai via GitGitGadget wrote: > > > +`diff-algorithm` > > +^^^^^^^^^^^^^^^^ > > + > > +The attribute `diff-algorithm` affects which algorithm Git uses to generate > > +diffs. This allows defining diff algorithms per file extension. Precedence rules > > +are as follows, in order from highest to lowest: > > + > > +*Command line option* > > + > > +Pass in the `--diff-algorithm` command line option int git-diff(1) > > + > > +*Git attributes* > > + > > +------------------------ > > +*.json diff-algorithm=histogram > > +------------------------ > > + > > +*Git config* > > + > > +---------------------------------------------------------------- > > +[diff] > > + algorithm = histogram > > +---------------------------------------------------------------- > > From the user's perspective, this is weirdly inconsistent with the > existing diff attributes, which would be more like: > > # in .gitattributes > *.json diff=json > > # in config > [diff "json"] > algorithm = histogram Thanks for this suggestion, Peff. What I like about this is that it builds off of the existing diff.<driver> scheme rather than inventing another one. Additionally, we won't get hit with a performance penalty since we already read gitattrbitues to see if a driver has been set or not. Thinking out loud, if we add "algorithm" as a key for diff.<driver>, it would be mutually exclusive with "command" where "command" takes precedence, correct? > > I know why one might choose the scheme you did; it kicks in if the repo > sets the algorithm, without users having to set up any extra config. > Which is sort of nice, if we assume that malicious actors don't have any > incentive to pick the algorithm. In theory they don't, though I saw Ævar > mention possible DoS elsewhere in the thread. > > Side note: It's also possible that algorithm selection could be > required to trigger a separate security bug (say, a buffer overflow in > the patience code or something), so restricting that works in a > belt-and-suspenders way. But that somehow feels like like the wrong > side of the paranoia-vs-feature line. > > So I dunno. I recognize that this scheme fulfills your immediate needs > better, but I fear that we'll be stuck with a weird split between "diff" > and "diff-*" attributes forever. In the long run, having a way for the > repo to say "and here is some config I recommend to you" would give you > the best of both, but that is a challenging topic that has been > discussed and punted on for many years. > > -Peff

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, Feb 09, 2023 at 11:34:00AM -0500, John Cai wrote: > > From the user's perspective, this is weirdly inconsistent with the > > existing diff attributes, which would be more like: > > > > # in .gitattributes > > *.json diff=json > > > > # in config > > [diff "json"] > > algorithm = histogram > > Thanks for this suggestion, Peff. What I like about this is that it builds off > of the existing diff.<driver> scheme rather than inventing another one. > Additionally, we won't get hit with a performance penalty since we already read > gitattrbitues to see if a driver has been set or not. > > Thinking out loud, if we add "algorithm" as a key for diff.<driver>, it would be > mutually exclusive with "command" where "command" takes precedence, correct? Yes. I think the documentation would be something like "When generating a diff internally, use <algorithm> to do so." And external diffs obviously skip that code path internally. I didn't check whether your patch does this or not, but should this feature (however it is engaged) apply to diff-stats, too? I think --patience, etc, does, which makes sense. But that might be something worth elaborating in the description, too. -Peff

gitgitgadget-git · 2023-02-07T17:58:57Z

User Jeff King <peff@peff.net> has been added to the cc: list.

gitgitgadget-git · 2023-02-08T21:59:10Z

There are issues in commit 94b5ef6:
diff: Teach diff to read attribute diff-algorithm
Prefixed commit message must be in lower case

gitgitgadget-git · 2023-02-08T21:59:11Z

There are issues in commit 65c8b2d:
wip: read config
Commit checks stopped - the message is too short
Commit not signed off

gitgitgadget-git · 2023-02-08T22:01:45Z

There are issues in commit 5d81c3f:
wip: read config
Commit checks stopped - the message is too short
Commit not signed off

gitgitgadget-git · 2023-02-09T07:56:16Z

User Elijah Newren <newren@gmail.com> has been added to the cc: list.

john-cai · 2023-02-14T21:17:52Z

/preview

gitgitgadget-git · 2023-02-14T21:20:53Z

Preview email sent as pull.1452.v2.git.git.1676409650.gitgitgadget@gmail.com

john-cai · 2023-02-14T21:39:42Z

/submit

john-cai · 2023-02-20T16:54:15Z

/preview

gitgitgadget-git · 2023-02-20T16:55:03Z

Preview email sent as pull.1452.v4.git.git.1676912099.gitgitgadget@gmail.com

john-cai · 2023-02-20T20:54:50Z

/preview

gitgitgadget-git · 2023-02-20T20:57:19Z

Preview email sent as pull.1452.v4.git.git.1676926635.gitgitgadget@gmail.com

john-cai · 2023-02-20T21:04:04Z

/submit

gitgitgadget-git · 2023-02-20T21:04:59Z

Submitted as pull.1452.v4.git.git.1676927082.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1452/john-cai/jc/attr-diff-algo-v4

To fetch this version to local tag pr-git-1452/john-cai/jc/attr-diff-algo-v4:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1452/john-cai/jc/attr-diff-algo-v4

gitgitgadget-git · 2023-02-21T17:36:24Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> When a repository contains different kinds of files, it may be desirable to
> use different algorithms based on file type. This is currently not feasible
> through the command line or using git configs. However, we can leverage the
> fact that gitattributes are path aware.
> ...
> To address some of the performance concerns in the previous series, a
> benchmark shows that now only a minor performance penalty is incurred, now
> that we are no longer adding an additional attributes parsing call:
>
> $ echo "*.[ch] diff=other" >> .gitattributes $ hyperfine -r 10 -L a
> git-bin-wrapper,git '{a} -c diff.other.algorithm=myers diff v2.0.0 v2.28.0'
> Benchmark 1: git-bin-wrapper -c diff.other.algorithm=myers diff v2.0.0
> v2.28.0 Time (mean ± σ): 716.3 ms ± 3.8 ms [User: 660.2 ms, System: 50.8 ms]
> Range (min … max): 709.8 ms … 720.6 ms 10 runs
>
> Benchmark 2: git -c diff.other.algorithm=myers diff v2.0.0 v2.28.0 Time
> (mean ± σ): 704.3 ms ± 2.9 ms [User: 656.6 ms, System: 44.3 ms] Range (min …
> max): 700.1 ms … 708.6 ms 10 runs
>
> Summary 'git -c diff.other.algorithm=myers diff v2.0.0 v2.28.0' ran 1.02 ±
> 0.01 times faster than 'git-bin-wrapper -c diff.other.algorithm=myers diff
> v2.0.0 v2.28.0'

Hopefully this round can immediately be merged down to 'next'?
Thanks.

gitgitgadget-git · 2023-02-21T18:09:27Z

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Feb 21, 2023 at 9:34 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > When a repository contains different kinds of files, it may be desirable to
> > use different algorithms based on file type. This is currently not feasible
> > through the command line or using git configs. However, we can leverage the
> > fact that gitattributes are path aware.
> > ...
> > To address some of the performance concerns in the previous series, a
> > benchmark shows that now only a minor performance penalty is incurred, now
> > that we are no longer adding an additional attributes parsing call:
> >
> > $ echo "*.[ch] diff=other" >> .gitattributes $ hyperfine -r 10 -L a
> > git-bin-wrapper,git '{a} -c diff.other.algorithm=myers diff v2.0.0 v2.28.0'
> > Benchmark 1: git-bin-wrapper -c diff.other.algorithm=myers diff v2.0.0
> > v2.28.0 Time (mean ± σ): 716.3 ms ± 3.8 ms [User: 660.2 ms, System: 50.8 ms]
> > Range (min … max): 709.8 ms … 720.6 ms 10 runs
> >
> > Benchmark 2: git -c diff.other.algorithm=myers diff v2.0.0 v2.28.0 Time
> > (mean ± σ): 704.3 ms ± 2.9 ms [User: 656.6 ms, System: 44.3 ms] Range (min …
> > max): 700.1 ms … 708.6 ms 10 runs
> >
> > Summary 'git -c diff.other.algorithm=myers diff v2.0.0 v2.28.0' ran 1.02 ±
> > 0.01 times faster than 'git-bin-wrapper -c diff.other.algorithm=myers diff
> > v2.0.0 v2.28.0'
>
> Hopefully this round can immediately be merged down to 'next'?
> Thanks.

I'll leave that up to you and John, but are we risking merging code
that could go unused or that we need to fundamentally change?  I don't
see how to handle the issues over at
https://lore.kernel.org/git/647D3D49-B85B-4B66-A857-695CFF9685EE@gmail.com/
(not that I've spent much thought on it besides asking how things are
going to work), and would be worried that we'd end up needing to
change the mechanism somehow to cover the things John ultimately
wants.

gitgitgadget-git · 2023-02-21T18:54:21Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

Elijah Newren <newren@gmail.com> writes:

> I'll leave that up to you and John, but are we risking merging code
> that could go unused or that we need to fundamentally change?  I don't
> see how to handle the issues over at
> https://lore.kernel.org/git/647D3D49-B85B-4B66-A857-695CFF9685EE@gmail.com/

If this is useful enough for desktop users already, then that is a
good enough reason to take it, I would say.

GitLab can easily add WebUI that says "You can define what diff
algorithm is used for files with which suffix" to allow you to
configure a table like this:

	json | histogram
	c    | patience
	*    | myers

and populate the server-side equivalent of .git/info/attributes and
.git/config based on that, and without anything further than the
posted patches, the result should just work, no?

gitgitgadget-git · 2023-02-21T19:39:26Z

On the Git mailing list, John Cai wrote (reply to this):

On 21 Feb 2023, at 13:51, Junio C Hamano wrote:

> Elijah Newren <newren@gmail.com> writes:
>
>> I'll leave that up to you and John, but are we risking merging code
>> that could go unused or that we need to fundamentally change?  I don't
>> see how to handle the issues over at
>> https://lore.kernel.org/git/647D3D49-B85B-4B66-A857-695CFF9685EE@gmail.com/
>
> If this is useful enough for desktop users already, then that is a
> good enough reason to take it, I would say.
>
> GitLab can easily add WebUI that says "You can define what diff
> algorithm is used for files with which suffix" to allow you to
> configure a table like this:
>
> 	json | histogram
> 	c    | patience
> 	*    | myers
>
> and populate the server-side equivalent of .git/info/attributes and
> .git/config based on that, and without anything further than the
> posted patches, the result should just work, no?

Yes, that is my thinking too. The goal was to make this user friendly for
everyone on the Git command line, and then GitLab can hook into this accordingly
to make it ussable with the GitLab UI.

thanks
John

gitgitgadget-git · 2023-02-21T20:18:19Z

On the Git mailing list, Elijah Newren wrote (reply to this):

On Tue, Feb 21, 2023 at 11:36 AM John Cai <johncai86@gmail.com> wrote:
>
> On 21 Feb 2023, at 13:51, Junio C Hamano wrote:
>
> > Elijah Newren <newren@gmail.com> writes:
> >
> >> I'll leave that up to you and John, but are we risking merging code
> >> that could go unused or that we need to fundamentally change?  I don't
> >> see how to handle the issues over at
> >> https://lore.kernel.org/git/647D3D49-B85B-4B66-A857-695CFF9685EE@gmail.com/
> >
> > If this is useful enough for desktop users already, then that is a
> > good enough reason to take it, I would say.
> >
> > GitLab can easily add WebUI that says "You can define what diff
> > algorithm is used for files with which suffix" to allow you to
> > configure a table like this:
> >
> >       json | histogram
> >       c    | patience
> >       *    | myers
> >
> > and populate the server-side equivalent of .git/info/attributes and
> > .git/config based on that, and without anything further than the
> > posted patches, the result should just work, no?
>
> Yes, that is my thinking too. The goal was to make this user friendly for
> everyone on the Git command line, and then GitLab can hook into this accordingly
> to make it ussable with the GitLab UI.

Ok, sounds good.  Merge away.  :-)

gitgitgadget-git · 2023-02-21T23:27:47Z

This patch series was integrated into seen via 14a1eb9.

gitgitgadget-git · 2023-02-22T23:21:39Z

This patch series was integrated into seen via e00c58a.

gitgitgadget-git · 2023-02-22T23:21:40Z

This patch series was integrated into next via 2375640.

gitgitgadget-git · 2023-02-23T03:17:12Z

There was a status update in the "New Topics" section about the branch jc/diff-algo-attribute on the Git mailing list:

The "diff" drivers specified by the "diff" attribute attached to
paths can now specify which algorithm (e.g. histogram) to use.

Will merge to 'master'.
source: <pull.1452.v4.git.git.1676927082.gitgitgadget@gmail.com>

gitgitgadget-git · 2023-02-24T00:05:18Z

This patch series was integrated into seen via b32135d.

gitgitgadget-git · 2023-02-24T20:11:38Z

This patch series was integrated into seen via d05da27.

gitgitgadget-git · 2023-02-24T20:30:14Z

There was a status update in the "Cooking" section about the branch jc/diff-algo-attribute on the Git mailing list:

The "diff" drivers specified by the "diff" attribute attached to
paths can now specify which algorithm (e.g. histogram) to use.

Will merge to 'master'.
source: <pull.1452.v4.git.git.1676927082.gitgitgadget@gmail.com>

gitgitgadget-git · 2023-02-24T23:13:12Z

This patch series was integrated into seen via 01f1ff8.

gitgitgadget-git · 2023-02-25T07:34:19Z

This patch series was integrated into seen via 5fa284c.

gitgitgadget-git · 2023-02-27T18:11:55Z

This patch series was integrated into seen via a0134ff.

gitgitgadget-git · 2023-02-27T19:03:29Z

This patch series was integrated into seen via ece8dc9.

gitgitgadget-git · 2023-02-27T19:03:30Z

This patch series was integrated into master via ece8dc9.

gitgitgadget-git · 2023-02-27T19:03:31Z

This patch series was integrated into next via ece8dc9.

gitgitgadget-git · 2023-02-27T19:03:35Z

Closed via ece8dc9.

john-cai force-pushed the jc/attr-diff-algo branch from 8e8fee8 to 602ab51 Compare February 4, 2023 22:49

john-cai force-pushed the jc/attr-diff-algo branch from 602ab51 to 8e73793 Compare February 5, 2023 03:44

gitgitgadget-git bot reviewed Feb 5, 2023

View reviewed changes

gitgitgadget-git bot reviewed Feb 6, 2023

View reviewed changes

gitgitgadget-git bot reviewed Feb 7, 2023

View reviewed changes

john-cai force-pushed the jc/attr-diff-algo branch from 8e73793 to 65c8b2d Compare February 8, 2023 21:57

john-cai force-pushed the jc/attr-diff-algo branch from 65c8b2d to f8c0223 Compare February 8, 2023 21:59

john-cai force-pushed the jc/attr-diff-algo branch 5 times, most recently from 6ff3dbc to cb03056 Compare February 14, 2023 21:04

This comment was marked as off-topic.

Sign in to view

john-cai force-pushed the jc/attr-diff-algo branch from 430d6f8 to 77e66ab Compare February 20, 2023 17:21

gitgitgadget-git bot added the next label Feb 22, 2023

gitgitgadget-git bot added the master label Feb 27, 2023

gitgitgadget-git bot closed this Feb 27, 2023

		@@ -736,6 +736,29 @@ String::
		by the configuration variables in the "diff.foo" section of the

		@@ -3437,6 +3437,22 @@ static int diff_filepair_is_phoney(struct diff_filespec *one,
		return !DIFF_FILE_VALID(one) && !DIFF_FILE_VALID(two);

Teach diff to honor diff algorithms set through git attributes #1452

Teach diff to honor diff algorithms set through git attributes #1452

Conversation

john-cai commented Feb 4, 2023 • edited

gitgitgadget-git bot commented Feb 4, 2023

john-cai commented Feb 5, 2023

gitgitgadget-git bot commented Feb 5, 2023

john-cai commented Feb 5, 2023

gitgitgadget-git bot commented Feb 5, 2023

gitgitgadget-git bot Feb 5, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 6, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot commented Feb 5, 2023

gitgitgadget-git bot Feb 6, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot commented Feb 6, 2023

gitgitgadget-git bot Feb 6, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 6, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 6, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 9, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 9, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 9, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 6, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 6, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 7, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 7, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 7, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 15, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 15, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 15, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 15, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 15, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot commented Feb 6, 2023

gitgitgadget-git bot Feb 7, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 7, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 7, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 7, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 7, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 7, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 9, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot Feb 11, 2023

Choose a reason for hiding this comment

gitgitgadget-git bot commented Feb 7, 2023

gitgitgadget-git bot commented Feb 8, 2023

gitgitgadget-git bot commented Feb 8, 2023

gitgitgadget-git bot commented Feb 8, 2023

gitgitgadget-git bot commented Feb 9, 2023

john-cai commented Feb 14, 2023

gitgitgadget-git bot commented Feb 14, 2023

john-cai commented Feb 14, 2023

This comment was marked as off-topic.

john-cai commented Feb 20, 2023

gitgitgadget-git bot commented Feb 20, 2023

john-cai commented Feb 20, 2023

gitgitgadget-git bot commented Feb 20, 2023

john-cai commented Feb 20, 2023

john-cai commented Feb 4, 2023 •

edited