Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Range diff with ranges lacking dotdot #841

Closed

Conversation

dscho
Copy link
Member

@dscho dscho commented Jan 21, 2021

In https://lore.kernel.org/git/20200306091933.mx2jmurmdnsjua4b@pengutronix.de/, it was reported that git range-diff does not handle commit ranges like rev^!. This patch series fixes that.

Changes since v5:

  • The commit marks are now cleared in a separate loop.
  • The regression test no longer looks only for "rang" but for "range" in the error message.
  • We now pass NULL as opt parameter to setup_revisions(), not 0.

Changes since v4:

  • The commit marks are now cleared in is_range_diff_range().
  • A regression test now verifies that HEAD^{/something..or other} isn't mistaken for a commit range.
  • The manual page no longer mentions "symmetric range", to avoid contentious language.

Changes since v3:

  • The revision machinery is now used directly to validate the commit ranges.

Changes since v2:

  • Move the helper function from revision.c to range-diff.c and rename it.
  • Use a regex to make it easier to understand what we're trying to match.
  • Fix the documentation that claimed that we used git merge-base internally when git range-diff parses ...-style arguments, which is not the case.

Changes since v1:

  • In addition to git range-diff, git format-patch --range-diff gets the same improvement.
  • The comment talking about ^@ was removed.
  • The parsing was made a bit safer (e.g. catching ! by its own as an invalid range).

Cc: Uwe Kleine-König u.kleine-koenig@pengutronix.de
cc: Eric Sunshine sunshine@sunshineco.com

@dscho dscho force-pushed the range-diff-with-ranges-lacking-dotdot branch from a2b7896 to 041456b Compare January 21, 2021 14:36
@dscho
Copy link
Member Author

dscho commented Jan 21, 2021

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 21, 2021

Submitted as pull.841.git.1611267638.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-841/dscho/range-diff-with-ranges-lacking-dotdot-v1

To fetch this version to local tag pr-841/dscho/range-diff-with-ranges-lacking-dotdot-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-841/dscho/range-diff-with-ranges-lacking-dotdot-v1

@@ -11,6 +11,11 @@ N_("git range-diff [<options>] <base> <old-tip> <new-tip>"),
NULL
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Currently, when called with exactly two arguments, we test for a literal
> `..` in each of the two.
>
> However, `<commit>^!` is a perfectly valid commit range, equivalent to
> `<commit>^..<commit>` according to the `SPECIFYING RANGES` section of
> gitrevisions[7].
>
> In preparation for allowing more sophisticated ways to specify commit
> ranges, let's refactor the conditional into its own function.

Makes sense.

@@ -11,6 +11,30 @@ N_("git range-diff [<options>] <base> <old-tip> <new-tip>"),
NULL
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Thu, Jan 21, 2021 at 5:22 PM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> In the `SPECIFYING RANGES` section of gitrevisions[7], two ways are
> described to specify commit ranges that `range-diff` does not yet
> accept: "<commit>^!" and "<commit>^-<n>".
>
> Let's accept them.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> diff --git a/builtin/range-diff.c b/builtin/range-diff.c
> @@ -13,7 +13,26 @@ NULL
>  static int is_range(const char *range)
>  {
> +       if (strstr(range, ".."))
> +               return 1;
> +
> +       i = strlen(range);
> +       c = i ? range[--i] : 0;
> +       if (c == '!')
> +               i--; /* might be ...^! or ...^@ */
> +       else if (isdigit(c)) {
> +               /* handle ...^-<n> */
> +               while (i > 2 && isdigit(range[--i]))
> +                       ; /* keep trimming trailing digits */
> +               if (i < 2 || range[i--] != '-')
> +                       return 0;
> +       } else
> +               return 0;
> +
> +       return i > 0 && range[i] == '^';
>  }

Is this something that the --range-diff option of git-format-patch
will want to do, as well? At present,
builtin/log.c:infer_range_diff_ranges() detects a range only by
checking for "..", much like this function did before this patch. If
so, perhaps this function can be part of the public range-diff API
(or, indeed, part of some other more general API if it's not really
specific to range-diff).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Eric,

On Thu, 21 Jan 2021, Eric Sunshine wrote:

> On Thu, Jan 21, 2021 at 5:22 PM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > In the `SPECIFYING RANGES` section of gitrevisions[7], two ways are
> > described to specify commit ranges that `range-diff` does not yet
> > accept: "<commit>^!" and "<commit>^-<n>".
> >
> > Let's accept them.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> > diff --git a/builtin/range-diff.c b/builtin/range-diff.c
> > @@ -13,7 +13,26 @@ NULL
> >  static int is_range(const char *range)
> >  {
> > +       if (strstr(range, ".."))
> > +               return 1;
> > +
> > +       i = strlen(range);
> > +       c = i ? range[--i] : 0;
> > +       if (c == '!')
> > +               i--; /* might be ...^! or ...^@ */
> > +       else if (isdigit(c)) {
> > +               /* handle ...^-<n> */
> > +               while (i > 2 && isdigit(range[--i]))
> > +                       ; /* keep trimming trailing digits */
> > +               if (i < 2 || range[i--] != '-')
> > +                       return 0;
> > +       } else
> > +               return 0;
> > +
> > +       return i > 0 && range[i] == '^';
> >  }
>
> Is this something that the --range-diff option of git-format-patch
> will want to do, as well?

Thank you for pointing that out. I should have checked via `git grep
'strstr.*"\.\."'` myself. There are two more instances, one in
`rev-parse.c` and the other in `revision.c`, but both are necessary as-are
because their return value is actually used to further disect a `..`-style
commit range.

Thanks,
Dscho

> At present, builtin/log.c:infer_range_diff_ranges() detects a range only
> by checking for "..", much like this function did before this patch. If
> so, perhaps this function can be part of the public range-diff API (or,
> indeed, part of some other more general API if it's not really specific
> to range-diff).

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 21, 2021

User Eric Sunshine <sunshine@sunshineco.com> has been added to the cc: list.

@@ -11,6 +11,30 @@ N_("git range-diff [<options>] <base> <old-tip> <new-tip>"),
NULL
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In the `SPECIFYING RANGES` section of gitrevisions[7], two ways are
> described to specify commit ranges that `range-diff` does not yet
> accept: "<commit>^!" and "<commit>^-<n>".
>
> Let's accept them.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  builtin/range-diff.c  | 21 ++++++++++++++++++++-
>  t/t3206-range-diff.sh |  8 ++++++++
>  2 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/range-diff.c b/builtin/range-diff.c
> index 551d3e689cb..6097635c432 100644
> --- a/builtin/range-diff.c
> +++ b/builtin/range-diff.c
> @@ -13,7 +13,26 @@ NULL
>  
>  static int is_range(const char *range)
>  {
> -	return !!strstr(range, "..");
> +	size_t i;
> +	char c;
> +
> +	if (strstr(range, ".."))
> +		return 1;
> +
> +	i = strlen(range);
> +	c = i ? range[--i] : 0;
> +	if (c == '!')
> +		i--; /* might be ...^! or ...^@ */

I am confused.  If it ends with '!', I do not see how it can end
with "^@".

If the input were "!", i gets strlen("!") which is 1, c gets '!'
while predecrementing i down to 0, and we notice c is '!' and
decrement i again to make it (size_t)(-1) which is a fairly large
number.

Then we skip all the else/if cascade, ensure that i is positive, and
happily access range[i], which likely is way out of bounds (but it
probably is almost one turn around the earth out of bounds, it may
access just a single byte before the array).

Am I reading the code right?

IOW, "git range-diff \! A..B" would do something strange, I would
guess.

> +	else if (isdigit(c)) {
> +		/* handle ...^-<n> */
> +		while (i > 2 && isdigit(range[--i]))
> +			; /* keep trimming trailing digits */
> +		if (i < 2 || range[i--] != '-')
> +			return 0;
> +	} else
> +		return 0;
> +
> +	return i > 0 && range[i] == '^';
>  }
>  
>  int cmd_range_diff(int argc, const char **argv, const char *prefix)
> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> index 6eb344be031..e217cecac9e 100755
> --- a/t/t3206-range-diff.sh
> +++ b/t/t3206-range-diff.sh
> @@ -150,6 +150,14 @@ test_expect_success 'simple A B C (unmodified)' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'A^! and A^-<n> (unmodified)' '
> +	git range-diff --no-color topic^! unmodified^-1 >actual &&
> +	cat >expect <<-EOF &&
> +	1:  $(test_oid t4) = 1:  $(test_oid u4) s/12/B/
> +	EOF
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'trivial reordering' '
>  	git range-diff --no-color master topic reordered >actual &&
>  	cat >expect <<-EOF &&

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Thu, 21 Jan 2021, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > In the `SPECIFYING RANGES` section of gitrevisions[7], two ways are
> > described to specify commit ranges that `range-diff` does not yet
> > accept: "<commit>^!" and "<commit>^-<n>".
> >
> > Let's accept them.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  builtin/range-diff.c  | 21 ++++++++++++++++++++-
> >  t/t3206-range-diff.sh |  8 ++++++++
> >  2 files changed, 28 insertions(+), 1 deletion(-)
> >
> > diff --git a/builtin/range-diff.c b/builtin/range-diff.c
> > index 551d3e689cb..6097635c432 100644
> > --- a/builtin/range-diff.c
> > +++ b/builtin/range-diff.c
> > @@ -13,7 +13,26 @@ NULL
> >
> >  static int is_range(const char *range)
> >  {
> > -	return !!strstr(range, "..");
> > +	size_t i;
> > +	char c;
> > +
> > +	if (strstr(range, ".."))
> > +		return 1;
> > +
> > +	i = strlen(range);
> > +	c = i ? range[--i] : 0;
> > +	if (c == '!')
> > +		i--; /* might be ...^! or ...^@ */
>
> I am confused.  If it ends with '!', I do not see how it can end
> with "^@".

Bah. This is a left-over from an earlier version. I tried to hide the fact
that I had misunderstood `<rev>^@` to specify a commit range. Oh well.

> If the input were "!", i gets strlen("!") which is 1, c gets '!'
> while predecrementing i down to 0, and we notice c is '!' and
> decrement i again to make it (size_t)(-1) which is a fairly large
> number.

Right, guarding that `range[--i]` only by `i` is not enough. My idea was
to exit early if the string is too short, anyway, i.e. if `i < 3`.

> Then we skip all the else/if cascade, ensure that i is positive, and
> happily access range[i], which likely is way out of bounds (but it
> probably is almost one turn around the earth out of bounds, it may
> access just a single byte before the array).
>
> Am I reading the code right?
>
> IOW, "git range-diff \! A..B" would do something strange, I would
> guess.

Right. Will be fixed in the next iteration.

Thanks,
Dscho

@@ -28,6 +28,19 @@ Finally, the list of matching commits is shown in the order of the
second commit range, with unmatched commits being inserted just after
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> There are three forms, depending whether the user specifies one, two or
> three non-option arguments. We've never actually explained how this
> works in the manual, so let's explain it.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  Documentation/git-range-diff.txt | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
> index 9701c1e5fdd..76359baf26d 100644
> --- a/Documentation/git-range-diff.txt
> +++ b/Documentation/git-range-diff.txt
> @@ -28,6 +28,19 @@ Finally, the list of matching commits is shown in the order of the
>  second commit range, with unmatched commits being inserted just after
>  all of their ancestors have been shown.
>  
> +There are three ways to specify the commit ranges:
> +
> +- `<range1> <range2>`: Either commit range can be of the form
> +  `<base>..<rev>`, `<rev>^!` or `<rev>^-<n>`. See `SPECIFYING RANGES`
> +  in linkgit:gitrevisions[7] for more details.

Good.

> +- `<rev1>...<rev2>`. This resembles the symmetric ranges mentioned in
> +  the `SPECIFYING RANGES` section of linkgit:gitrevisions[7], and is
> +  equivalent to `<base>..<rev1> <base>..<rev2>` where `<base>` is the
> +  merge base as obtained via `git merge-base <rev1> <rev2>`.

Does this merely resemble?  Isn't it exactly what a symmetric range is?

> +- `<base> <rev1> <rev2>`: This is equivalent to `<base>..<rev1>
> +  <base>..<rev2>`.

Nice to see this documented.

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Thu, 21 Jan 2021, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > There are three forms, depending whether the user specifies one, two or
> > three non-option arguments. We've never actually explained how this
> > works in the manual, so let's explain it.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  Documentation/git-range-diff.txt | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
> > index 9701c1e5fdd..76359baf26d 100644
> > --- a/Documentation/git-range-diff.txt
> > +++ b/Documentation/git-range-diff.txt
> > @@ -28,6 +28,19 @@ Finally, the list of matching commits is shown in the order of the
> >  second commit range, with unmatched commits being inserted just after
> >  all of their ancestors have been shown.
> >
> > +There are three ways to specify the commit ranges:
> > +
> > +- `<range1> <range2>`: Either commit range can be of the form
> > +  `<base>..<rev>`, `<rev>^!` or `<rev>^-<n>`. See `SPECIFYING RANGES`
> > +  in linkgit:gitrevisions[7] for more details.
>
> Good.
>
> > +- `<rev1>...<rev2>`. This resembles the symmetric ranges mentioned in
> > +  the `SPECIFYING RANGES` section of linkgit:gitrevisions[7], and is
> > +  equivalent to `<base>..<rev1> <base>..<rev2>` where `<base>` is the
> > +  merge base as obtained via `git merge-base <rev1> <rev2>`.
>
> Does this merely resemble?  Isn't it exactly what a symmetric range is?

No, it is not exactly what a symmetric range is because `range-diff`
treats both arms of the symmetric range independently, as two distinct
non-symmetric ranges.

> > +- `<base> <rev1> <rev2>`: This is equivalent to `<base>..<rev1>
> > +  <base>..<rev2>`.
>
> Nice to see this documented.

Yes. I was rather surprised that I had overlooked that in my original
submission of the `range-diff` command.

Ciao,
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > +- `<rev1>...<rev2>`. This resembles the symmetric ranges mentioned in
>> > +  the `SPECIFYING RANGES` section of linkgit:gitrevisions[7], and is
>> > +  equivalent to `<base>..<rev1> <base>..<rev2>` where `<base>` is the
>> > +  merge base as obtained via `git merge-base <rev1> <rev2>`.

The above paragraph says A...B is turned into $(git merge-base A
B)..A and $(git merge-base A B)..B, but I wonder if we should be
rewriting it into A..B and B..A instead; that would make it
unnecessary to explain what should happen when there are more than
one merge bases.

>> Does this merely resemble?  Isn't it exactly what a symmetric range is?
>
> No, it is not exactly what a symmetric range is because `range-diff`
> treats both arms of the symmetric range independently, as two distinct
> non-symmetric ranges.

This however is an end-user documentation, isn't it?

The fact that range-diff internally translates A...B into something
else is an implementation detail of achieving something, and from
the end-users' point of view, isn't it "take A...B from the
end-user, and show the difference between the left and right
branches of the symmetric range" that range-diff achieves to do?

So it processes exactly the same two sets of commits as what is
given by "--left-right A...B", no?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Fri, 22 Jan 2021, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> > +- `<rev1>...<rev2>`. This resembles the symmetric ranges mentioned in
> >> > +  the `SPECIFYING RANGES` section of linkgit:gitrevisions[7], and is
> >> > +  equivalent to `<base>..<rev1> <base>..<rev2>` where `<base>` is the
> >> > +  merge base as obtained via `git merge-base <rev1> <rev2>`.
>
> The above paragraph says A...B is turned into $(git merge-base A
> B)..A and $(git merge-base A B)..B, but I wonder if we should be
> rewriting it into A..B and B..A instead; that would make it
> unnecessary to explain what should happen when there are more than
> one merge bases.

You know what? I lied. The code already does what you suggested. Look
[here](https://github.com/git/git/blob/v2.30.0/builtin/range-diff.c#L59-L77):

	[...]
        } else if (argc == 1) {
                const char *b = strstr(argv[0], "..."), *a = argv[0];
                int a_len;

                if (!b) {
                        error(_("single arg format must be symmetric range"));
                        usage_with_options(builtin_range_diff_usage, options);
                }

                a_len = (int)(b - a);
                if (!a_len) {
                        a = "HEAD";
                        a_len = strlen(a);
                }
                b += 3;
                if (!*b)
                        b = "HEAD";
                strbuf_addf(&range1, "%s..%.*s", b, a_len, a);
                strbuf_addf(&range2, "%.*s..%s", a_len, a, b);
	[...]

> >> Does this merely resemble?  Isn't it exactly what a symmetric range is?
> >
> > No, it is not exactly what a symmetric range is because `range-diff`
> > treats both arms of the symmetric range independently, as two distinct
> > non-symmetric ranges.
>
> This however is an end-user documentation, isn't it?

Yes, and the end user is talking about _two_ commit ranges in the context
of `git range-diff`, and about _one_ commit range in the context of `git
log`.

So I still think that it really just only resembles the symmetric range.
It is fundamentally as different from it as the number 2 is different from
the number 1.

Ciao,
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> The above paragraph says A...B is turned into $(git merge-base A
>> B)..A and $(git merge-base A B)..B, but I wonder if we should be
>> rewriting it into A..B and B..A instead; that would make it
>> unnecessary to explain what should happen when there are more than
>> one merge bases.
>
> You know what? I lied.

I knew.  So we need an update to the patch.

>> >> Does this merely resemble?  Isn't it exactly what a symmetric range is?
>> >
>> > No, it is not exactly what a symmetric range is because `range-diff`
>> > treats both arms of the symmetric range independently, as two distinct
>> > non-symmetric ranges.
>>
>> This however is an end-user documentation, isn't it?
>
> Yes, and the end user is talking about _two_ commit ranges in the context
> of `git range-diff`, and about _one_ commit range in the context of `git
> log`.

You are forgetting that "log A...B" shows only one half of what a
symmetric range means, and hides which side each commit belongs to.

"git log --left-right A...B" shows what a symmetric range really is;
there are two sides, left and right, and "git range-diff A...B" is
a natural way to compare these two ranges.

I've been quite happy with the way how "git range-diff @{-1}..."
shows the comparison of two sides of the symmetric range, given by
"git log --oneline --left-right @{-1}..."


@gitgitgadget
Copy link

gitgitgadget bot commented Jan 22, 2021

This branch is now known as js/range-diff-wo-dotdot.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 22, 2021

This patch series was integrated into seen via git@898f7c5.

@gitgitgadget gitgitgadget bot added the seen label Jan 22, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Jan 22, 2021

On the Git mailing list, Uwe Kleine-König wrote (reply to this):


--a56bftk6pd3gksas
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Hello Johannes,

On Thu, Jan 21, 2021 at 10:20:35PM +0000, Johannes Schindelin via GitGitGad=
get wrote:
> In
> https://lore.kernel.org/git/20200306091933.mx2jmurmdnsjua4b@pengutronix.d=
e/,
> it was reported that git range-diff does not handle commit ranges like
> rev^!. This patch series fixes that.

Oh wow, great you picked that up. It still occasionally annoys me that
the range parser of range-diff cannot do $hash^!.

I assume you will fix the stuff Junio pointed out (or argument your
patch 1 is right, I didn't try to check). When this is discussed to an
end or fixed I will happily test your series.

And by the way you also made one of my coworkers happy with your series.

Best regards and thanks
Uwe

--=20
Pengutronix e.K.                           | Uwe Kleine-K=F6nig            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

--a56bftk6pd3gksas
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQEzBAABCgAdFiEEfnIqFpAYrP8+dKQLwfwUeK3K7AkFAmAKf2EACgkQwfwUeK3K
7AmliAf7B40jC1CMV7LPsiSS0oqPaEx7gq2XYgze0YVF2dUdwJcO1aJ4RZDzeAx3
dV32y1CBo6gMuDlfprKgbbRcly4WUdZbgcxEzGYL6x/NIOVQG+bzK+fWg+nYaCO5
2+rtDerBPLw9ZDzyyI1BtdTHpRAEAqOw+zOay1+DOMKBvW8uBzIlq4JZj+QGfQKI
2FK9m0OMoKXqw74hxXEccHe0Vc/IoaY3b7CvNON6yfivS8XaRw7G06GcqBHC/rgj
IQk0Gn7W7QsYi+WmbNxaPLh3bUUSj2aE+KUc8SdfJ7prv9dwPOkQHU79gI6BTUiU
PUcZnCxCCZRxnVobHnYXGBdwwhGxow==
=FUiQ
-----END PGP SIGNATURE-----

--a56bftk6pd3gksas--

@dscho dscho force-pushed the range-diff-with-ranges-lacking-dotdot branch 2 times, most recently from 33d6b9c to 4f5e5ac Compare January 22, 2021 16:22
@dscho
Copy link
Member Author

dscho commented Jan 22, 2021

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 22, 2021

Submitted as pull.841.v2.git.1611339373.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-841/dscho/range-diff-with-ranges-lacking-dotdot-v2

To fetch this version to local tag pr-841/dscho/range-diff-with-ranges-lacking-dotdot-v2:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-841/dscho/range-diff-with-ranges-lacking-dotdot-v2

revision.c Outdated
@@ -4206,3 +4206,28 @@ void put_revision_mark(const struct rev_info *revs, const struct commit *commit)
fputs(mark, stdout);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> +	i = strlen(range);
> +	c = i > 2 ? range[--i] : 0;
> +	if (c == '!')
> +		i--; /* might be ...^! */
> +	else if (isdigit(c)) {
> +		/* handle ...^-<n> */
> +		while (i > 2 && isdigit(range[--i]))
> +			; /* keep trimming trailing digits */
> +		if (i < 2 || range[i--] != '-')
> +			return 0;
> +	} else
> +		return 0;
> +
> +	/* Before the `!` or the `-<n>`, we expect `<rev>^` */
> +	return i > 0 && range[i] == '^';

This is still way too complex for my liking, but at least I cannot
immediately spot a glaring off-by-one like the previous round ;-)

This is a tangent ([*1*]), but we often equate an omission to
implicitly specified HEAD; e.g. "git log @{u}.." is what we did
since we started building on top of our upstream.  I wonder if it
makes sense to allow similar short-hand so that ^! alone would mean
HEAD^!, ^@ alone would mean HEAD^@, and so on.

Thanks.


[Footnote]

*1* read: this has nothing to do with how ready I think this patch
    is, but the topic reminds me of it strongly enough that I raise
    it here, because I know the opinions on this unrelated thing on
    recipients of this response are worth listening to.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Fri, 22 Jan 2021, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > +	i = strlen(range);
> > +	c = i > 2 ? range[--i] : 0;
> > +	if (c == '!')
> > +		i--; /* might be ...^! */
> > +	else if (isdigit(c)) {
> > +		/* handle ...^-<n> */
> > +		while (i > 2 && isdigit(range[--i]))
> > +			; /* keep trimming trailing digits */
> > +		if (i < 2 || range[i--] != '-')
> > +			return 0;
> > +	} else
> > +		return 0;
> > +
> > +	/* Before the `!` or the `-<n>`, we expect `<rev>^` */
> > +	return i > 0 && range[i] == '^';
>
> This is still way too complex for my liking, but at least I cannot
> immediately spot a glaring off-by-one like the previous round ;-)

Maybe I am too stuck with the idea of avoiding regular expressions after
that StackOverflow incident... Maybe

	static regex_t *regex;

	if (strstr(range, ".."))
		return 1;

	if (!regex) {
		regex = xmalloc(sizeof(*regex));
		if (regcomp(regex, "\\^(!|-[0-9]*)$", REG_EXTENDED))
			BUG("could not compile regex");
	}

	return !regexec(regex, range, 0, NULL, 0);

is more readable, and acceptable in this context?

> This is a tangent ([*1*]), but we often equate an omission to
> implicitly specified HEAD; e.g. "git log @{u}.." is what we did
> since we started building on top of our upstream.  I wonder if it
> makes sense to allow similar short-hand so that ^! alone would mean
> HEAD^!, ^@ alone would mean HEAD^@, and so on.

I don't know. On one hand, it would make things more consistent. On the
other hand, it would be relatively hard to explain, I think. And we
already have the shortcut `@!^`, which is hard enough to explain as it is.

Ciao,
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Maybe I am too stuck with the idea of avoiding regular expressions after
> that StackOverflow incident... Maybe
>
> 	static regex_t *regex;
>
> 	if (strstr(range, ".."))
> 		return 1;
>
> 	if (!regex) {
> 		regex = xmalloc(sizeof(*regex));
> 		if (regcomp(regex, "\\^(!|-[0-9]*)$", REG_EXTENDED))
> 			BUG("could not compile regex");
> 	}
>
> 	return !regexec(regex, range, 0, NULL, 0);
>
> is more readable, and acceptable in this context?

Readable, yes, acceptable, I don't know, and I am not even sure if I
want to be the one to judge to be honest ;-)

Have you tried the approach to feed the thing to setup_revisions()
and inspect what objects are in revs.pending?  

When you got a valid range, you should find one or more positive and
one or more negative commits , and the approach won't be fooled by
strings like "HEAD^{/other than A..B/}".

Or does the revision parsing machinery too eager to "die" when we
find a syntax error?  If so, scratch that idea, but in the longer
haul, fixing these die's would also be something we'd want to do to
make more parts of libgit.a callable as a proper library.

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Wed, 27 Jan 2021, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Maybe I am too stuck with the idea of avoiding regular expressions after
> > that StackOverflow incident... Maybe
> >
> > 	static regex_t *regex;
> >
> > 	if (strstr(range, ".."))
> > 		return 1;
> >
> > 	if (!regex) {
> > 		regex = xmalloc(sizeof(*regex));
> > 		if (regcomp(regex, "\\^(!|-[0-9]*)$", REG_EXTENDED))
> > 			BUG("could not compile regex");
> > 	}
> >
> > 	return !regexec(regex, range, 0, NULL, 0);
> >
> > is more readable, and acceptable in this context?
>
> Readable, yes, acceptable, I don't know, and I am not even sure if I
> want to be the one to judge to be honest ;-)
>
> Have you tried the approach to feed the thing to setup_revisions()
> and inspect what objects are in revs.pending?

I hadn't thought of that.

> When you got a valid range, you should find one or more positive and
> one or more negative commits , and the approach won't be fooled by
> strings like "HEAD^{/other than A..B/}".
>
> Or does the revision parsing machinery too eager to "die" when we
> find a syntax error?  If so, scratch that idea, but in the longer
> haul, fixing these die's would also be something we'd want to do to
> make more parts of libgit.a callable as a proper library.

It should probably be libified anyway if it calles `die()` (I don't know
off the top of my head).

Worse than the `die()` is probably the allocated memory; IIRC there is no
way to release the memory allocated by `setup_revisions()` unless one
performs a revwalk.

Will try to find some time to play with the `setup_revisions()` idea.

Ciao,
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Thu, 28 Jan 2021, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Worse than the `die()` is probably the allocated memory; IIRC there is no
> > way to release the memory allocated by `setup_revisions()` unless one
> > performs a revwalk.
>
> Hmph, do you mean the singleton instances of "struct object" and
> those specific types that contain it?

I am concerned when I see what `add_rev_cmdline()` does, for example.

Ciao,
Dscho

@@ -1680,7 +1680,7 @@ static void infer_range_diff_ranges(struct strbuf *r1,
struct commit *head)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> Currently, when called with exactly two arguments, `git range-diff`
> tests for a literal `..` in each of the two. Likewise, the argument
> provided via `--range-diff` to `git format-patch` is checked in the same
> manner.
>
> However, `<commit>^!` is a perfectly valid commit range, equivalent to
> `<commit>^..<commit>` according to the `SPECIFYING RANGES` section of
> gitrevisions[7].
>
> In preparation for allowing more sophisticated ways to specify commit
> ranges, let's refactor the check into its own function.

I think the sharing between the two makes sense, but the helper
function should make it clear in its name that this is "the kind of
commit range range-diff wants to take".  Among the commit range "git
log" and friends can take, range-diff can take only a subset of it,
and only a subset of it is meaningful to range-diff (e.g. HEAD^@ is
still a commit range you can give to "git log", but it would not
make much sense to give it to range-diff).

Perhaps

    s/specifies_commit_range/is_range_diff_range/

or something.

> diff --git a/revision.h b/revision.h

Move this to range-diff.h, not revision.h which is about true commit
ranges.

> index 086ff10280d..66777c8e60f 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -457,4 +457,11 @@ int rewrite_parents(struct rev_info *revs,
>   */
>  struct commit_list *get_saved_parents(struct rev_info *revs, const struct commit *commit);
>  
> +/*
> + * Determine whether the given argument defines a commit range, e.g. A..B.
> + * Note that this only validates the format but does _not_ parse it, i.e.
> + * it does _not_ look up the specified commits in the local repository.
> + */

And s/defines a commit range/is usable as a range to range-diff/

Thanks.

> +int specifies_commit_range(const char *range);
> +
>  #endif

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Uwe Kleine-König wrote (reply to this):


--aftbhbozzha65n2k
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Jan 22, 2021 at 12:27:10PM -0800, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>=20
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > Currently, when called with exactly two arguments, `git range-diff`
> > tests for a literal `..` in each of the two. Likewise, the argument
> > provided via `--range-diff` to `git format-patch` is checked in the same
> > manner.
> >
> > However, `<commit>^!` is a perfectly valid commit range, equivalent to
> > `<commit>^..<commit>` according to the `SPECIFYING RANGES` section of
> > gitrevisions[7].
> >
> > In preparation for allowing more sophisticated ways to specify commit
> > ranges, let's refactor the check into its own function.
>=20
> I think the sharing between the two makes sense, but the helper
> function should make it clear in its name that this is "the kind of
> commit range range-diff wants to take".  Among the commit range "git
> log" and friends can take, range-diff can take only a subset of it,
> and only a subset of it is meaningful to range-diff (e.g. HEAD^@ is
> still a commit range you can give to "git log", but it would not
> make much sense to give it to range-diff).

Does it make so little sense to forbid passing HEAD^@ as a range to
range-diff? I can imagine situations where is would make sense, e.g. I
often create customer patch stacks from a set of topic branches using
octopus merge. To compare two of these ^@ might be handy.

My POV is that if it's easy to use the same function (and so the same
set of range descriptors) for git log and git range-diff then do so.
This yields a consistent behaviour which is IMHO better than preventing
people to do things that are considered strange today.

Best regards
Uwe

--=20
Pengutronix e.K.                           | Uwe Kleine-K=F6nig            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

--aftbhbozzha65n2k
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQEzBAABCgAdFiEEfnIqFpAYrP8+dKQLwfwUeK3K7AkFAmAOdKkACgkQwfwUeK3K
7AlxZwf/SLEM299R6ngrm/9LBrEqqypGPZp9sysvqH2WJBLYnn4TtJW2eyVHqoAY
22T8N3cVzRgWgx+cuQO0stTtlqKKu31Ruw5anR2KGJwF88t2bZoYSw9ePoKmgXLc
HR8XJKPhGCl17InySausPGBJiXqFLjMzH7wW3bF0jlwaOeBtxBCp0tsFbe0xKccU
VVHnLQkaxNd6R81rEMBxgDRzt+SSI4vgkd/6BZKbIPM9X58BdsuMr2MgC0OWmCA8
QZqmKGT5GviAbw96jT2rrO/IBJwJyoCXc/VFPIkhG2ODXcQMAF3uTmrFZjpuocFQ
o7OJW7rZb1MD2uu0nsTo0o3WGEYa9w==
=zI9f
-----END PGP SIGNATURE-----

--aftbhbozzha65n2k--

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Uwe Kleine-König wrote (reply to this):


--oxlh2ygc52upvp5w
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Hello Junio,

On Mon, Jan 25, 2021 at 11:24:39AM -0800, Junio C Hamano wrote:
> Uwe Kleine-K=F6nig <u.kleine-koenig@pengutronix.de> writes:
>=20
> >> > In preparation for allowing more sophisticated ways to specify commit
> >> > ranges, let's refactor the check into its own function.
> >>=20
> >> I think the sharing between the two makes sense, but the helper
> >> function should make it clear in its name that this is "the kind of
> >> commit range range-diff wants to take".  Among the commit range "git
> >> log" and friends can take, range-diff can take only a subset of it,
> >> and only a subset of it is meaningful to range-diff (e.g. HEAD^@ is
> >> still a commit range you can give to "git log", but it would not
> >> make much sense to give it to range-diff).
> >
> > Does it make so little sense to forbid passing HEAD^@ as a range to
> > range-diff? I can imagine situations where is would make sense, e.g. I
> > often create customer patch stacks from a set of topic branches using
> > octopus merge. To compare two of these ^@ might be handy.
>=20
> You can discuss for each individual syntax of a single-token range
> and decide which ones are and are not suitable for range-diff, but I
> suspect the reason behind this business started with dot-dot is to
> perform a superficial "sanity check" at the command line parser
> level before passing them to the revision machinery, and having to
> deal with potential errors and/or having to compare unreasonably
> large segments of history that the user did not intend.
>=20
> Also I first thought that the command changes the behaviour, given
> two tokens, depending on the shape of these two tokens (i.e. when
> they satisfy the "is-range?" we are discussing, they are taken as
> two ranges to be compared, and otherwise does something else), but
> after revisiting the code and "git help range-diff", it always does
> one thing when given=20
>=20
>  (1) one arg: gives a symmetric range and what is to be compared
>      is its left and right half,
>=20
>  (2) two args: each is meant to name a set of commits and these two
>      are to be compared) or
>=20
>  (3) three args: each is meant to name a commit, and the arg1..arg2
>      and arg1..arg3 are the ranges to be compared.
>=20
> so ...
>=20
> > My POV is that if it's easy to use the same function (and so the same
> > set of range descriptors) for git log and git range-diff then do so.
> > This yields a consistent behaviour which is IMHO better than preventing
> > people to do things that are considered strange today.
>=20
> ... I am OK with that point of view.  It certainly is simpler to
> explain to end users.

It seems you understood my argument :-)

> Having said that, it would make it much harder to implement
> efficiently, though.  For example, when your user says
>=20
> 	git range-diff A B
>=20
> to compare "git log A" (all the way down to the root) and "git log
> B" (likewise), you'd certainly optimize the older common part of the
> history out, essentially turning it into
>=20
> 	git range-diff A..B B..A
>=20
> or its moral equivalent
>=20
> 	git range-diff A...B
>=20
> But you cannot apply such an optimization blindly.  When the user
> gives A..B and B..A as two args, you somehow need to notice that=20
> you shouldn't rewrite it to "A..B...B..A", and for that, you'd still
> need some "parsing" of these args.

I agree that for a long history

	git range-diff A B

is an expensive request and I wouldn't invest too many cycles optimizing
it. (And if I'd optimize it, it wouldn't be done using textual
combination of the two strings but by checking if the two ranges
intersect. This way something like

	git range-diff v4.0..v4.6-rc1 v4.0..v4.5.6

and maybe even

	git range-diff v4.0..v4.6-rc1 v4.0-rc1..v4.5.6

would benefit, too. But note I'm not (anymore) familiar with the git
source code, so I don't know if this is easy/sensible to do and I'm just
looking at the problem from an architectural and theoretical POV.)

> So, I dunno.  Limiting the second form to only forms that the
> implementation does not have to do such optimization would certainly
> make it simpler for Dscho to implement ;-)

I don't want to make it more complicated for Dscho, I'm happy if I can
in the near future use range-diff with $rev1^! $ref2^! . So feel free to
ignore me.

Best regards
Uwe

--=20
Pengutronix e.K.                           | Uwe Kleine-K=F6nig            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

--oxlh2ygc52upvp5w
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQEzBAABCgAdFiEEfnIqFpAYrP8+dKQLwfwUeK3K7AkFAmAPN0IACgkQwfwUeK3K
7AkT2Qf/U0hjauGpubVi6j2+BMClfZDlVRMYx+9TV+KWKoSTvqbNC1BngzO/fBmS
eBIIpS8u4jCMKKfxRRW1vl5OW+b/tYRAME3RYO2shUroXVeUWrY4KaIwwBxgcHOR
4Rw0e2Pi7vkK8Zrd6FWZCYVTzs9YOXfgBRJJf9OBzSGsjV66qkonVKY1SCwfIur1
Za8+Abw8VTg4syBWwcuKvnbeyYyJuvCKE4vi17I6gqEppNfxZ9CJEe0Vo3QM7zKy
mEnznudTd2fF2NyIuz5EdwUlStket/1PsT9m8dMPyfqnpTQUjTJ6x+/k4R0t3Sgk
U5+1znHBEsUP1/dfAcA+reDHHyfPgw==
=urxk
-----END PGP SIGNATURE-----

--oxlh2ygc52upvp5w--

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:

>> > In preparation for allowing more sophisticated ways to specify commit
>> > ranges, let's refactor the check into its own function.
>> 
>> I think the sharing between the two makes sense, but the helper
>> function should make it clear in its name that this is "the kind of
>> commit range range-diff wants to take".  Among the commit range "git
>> log" and friends can take, range-diff can take only a subset of it,
>> and only a subset of it is meaningful to range-diff (e.g. HEAD^@ is
>> still a commit range you can give to "git log", but it would not
>> make much sense to give it to range-diff).
>
> Does it make so little sense to forbid passing HEAD^@ as a range to
> range-diff? I can imagine situations where is would make sense, e.g. I
> often create customer patch stacks from a set of topic branches using
> octopus merge. To compare two of these ^@ might be handy.

You can discuss for each individual syntax of a single-token range
and decide which ones are and are not suitable for range-diff, but I
suspect the reason behind this business started with dot-dot is to
perform a superficial "sanity check" at the command line parser
level before passing them to the revision machinery, and having to
deal with potential errors and/or having to compare unreasonably
large segments of history that the user did not intend.

Also I first thought that the command changes the behaviour, given
two tokens, depending on the shape of these two tokens (i.e. when
they satisfy the "is-range?" we are discussing, they are taken as
two ranges to be compared, and otherwise does something else), but
after revisiting the code and "git help range-diff", it always does
one thing when given 

 (1) one arg: gives a symmetric range and what is to be compared
     is its left and right half,

 (2) two args: each is meant to name a set of commits and these two
     are to be compared) or

 (3) three args: each is meant to name a commit, and the arg1..arg2
     and arg1..arg3 are the ranges to be compared.

so ...

> My POV is that if it's easy to use the same function (and so the same
> set of range descriptors) for git log and git range-diff then do so.
> This yields a consistent behaviour which is IMHO better than preventing
> people to do things that are considered strange today.

... I am OK with that point of view.  It certainly is simpler to
explain to end users.

Having said that, it would make it much harder to implement
efficiently, though.  For example, when your user says

	git range-diff A B

to compare "git log A" (all the way down to the root) and "git log
B" (likewise), you'd certainly optimize the older common part of the
history out, essentially turning it into

	git range-diff A..B B..A

or its moral equivalent

	git range-diff A...B

But you cannot apply such an optimization blindly.  When the user
gives A..B and B..A as two args, you somehow need to notice that 
you shouldn't rewrite it to "A..B...B..A", and for that, you'd still
need some "parsing" of these args.

So, I dunno.  Limiting the second form to only forms that the
implementation does not have to do such optimization would certainly
make it simpler for Dscho to implement ;-)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Uwe Kleine-König <u.kleine-koenig@pengutronix.de> writes:

>> > My POV is that if it's easy to use the same function (and so the same
>> > set of range descriptors) for git log and git range-diff then do so.
>> > This yields a consistent behaviour which is IMHO better than preventing
>> > people to do things that are considered strange today.
>> 
>> ... I am OK with that point of view.  It certainly is simpler to
>> explain to end users.
>
> It seems you understood my argument :-)

So it seems ;-).

>> Having said that, it would make it much harder to implement
>> efficiently, though.  For example, when your user says
>> 
>> 	git range-diff A B
>> 
>> to compare "git log A" (all the way down to the root) and "git log
>> B" (likewise), you'd certainly optimize the older common part of the
>> history out, essentially turning it into
>> 
>> 	git range-diff A..B B..A
>> 
>> or its moral equivalent
>> 
>> 	git range-diff A...B
>> 
>> But you cannot apply such an optimization blindly.  When the user
>> gives A..B and B..A as two args, you somehow need to notice that 
>> you shouldn't rewrite it to "A..B...B..A", and for that, you'd still
>> need some "parsing" of these args.
>
> I agree that for a long history
>
> 	git range-diff A B
>
> is an expensive request and I wouldn't invest too many cycles optimizing
> it.

Well, your devil's advocate can argue that accepting an input that
can easily make the tool unusable would be irresponsible, though.

And there are two possible ways out:

 (1) declare that optimizing "A B" into "A...B" is way too difficult
     to do in general, and find a good enough way to see if A and B
     individually gives a "range" that should be manageable; or

 (2) invest cycles to optimize, so your users do not have to care.

I think the series takes the former approach, and I find it
understandable.

It is a different matter if the way found and implemented in the
patch is "good enough" to tell if a given argument names a
manageable range.  You said something about "rev^{/^here are
two..dots}" that can be mistaken as a "good enough" range, but it
actually names a revision and all the way down to the root.

> (And if I'd optimize it, it wouldn't be done using textual
> combination of the two strings but by checking if the two ranges
> intersect.

Yup, that is in line with what I mumbled earlier about
setup_revisions() and inspecting the rev_info.pending, I think.

>> So, I dunno.  Limiting the second form to only forms that the
>> implementation does not have to do such optimization would certainly
>> make it simpler for Dscho to implement ;-)
>
> I don't want to make it more complicated for Dscho, I'm happy if I can
> in the near future use range-diff with $rev1^! $ref2^! . So feel free to
> ignore me.
>
> Best regards
> Uwe

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 23, 2021

This patch series was integrated into seen via git@71dd3b5.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 23, 2021

This patch series was integrated into seen via git@7301e53.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 24, 2021

This patch series was integrated into seen via git@e0c26c4.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 25, 2021

This patch series was integrated into seen via git@f103fb4.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 26, 2021

This patch series was integrated into seen via git@07def50.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 26, 2021

This patch series was integrated into seen via git@c537d63.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 27, 2021

This patch series was integrated into seen via git@21469a8.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 27, 2021

This patch series was integrated into seen via git@791a076.

Currently, when called with exactly two arguments, `git range-diff`
tests for a literal `..` in each of the two. Likewise, the argument
provided via `--range-diff` to `git format-patch` is checked in the same
manner.

However, `<commit>^!` is a perfectly valid commit range, equivalent to
`<commit>^..<commit>` according to the `SPECIFYING RANGES` section of
gitrevisions[7].

In preparation for allowing more sophisticated ways to specify commit
ranges, let's refactor the check into its own function.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho force-pushed the range-diff-with-ranges-lacking-dotdot branch 2 times, most recently from c595b36 to 5ab9321 Compare January 27, 2021 16:06
@dscho
Copy link
Member Author

dscho commented Jan 27, 2021

/submit

@@ -11,6 +11,7 @@
#include "pretty.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> +test_expect_success 'A^! and A^-<n> (unmodified)' '
> +	git range-diff --no-color topic^! unmodified^-1 >actual &&
> +	cat >expect <<-EOF &&
> +	1:  $(test_oid t4) = 1:  $(test_oid u4) s/12/B/
> +	EOF
> +	test_cmp expect actual
> +'

Now we actually parse the single-token range, instead of relying on
"does it have dot-dot" heuristics, we can make sure that we reject

    "HEAD^{/^string with .. in it}"

as "not a range with negative and positive ends".

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Thu, 4 Feb 2021, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > +test_expect_success 'A^! and A^-<n> (unmodified)' '
> > +	git range-diff --no-color topic^! unmodified^-1 >actual &&
> > +	cat >expect <<-EOF &&
> > +	1:  $(test_oid t4) = 1:  $(test_oid u4) s/12/B/
> > +	EOF
> > +	test_cmp expect actual
> > +'
>
> Now we actually parse the single-token range, instead of relying on
> "does it have dot-dot" heuristics, we can make sure that we reject
>
>     "HEAD^{/^string with .. in it}"
>
> as "not a range with negative and positive ends".

Sure, why not.

Ciao,
Dscho

@dscho dscho force-pushed the range-diff-with-ranges-lacking-dotdot branch from 295fdc1 to bc5de80 Compare February 4, 2021 21:54
@dscho
Copy link
Member Author

dscho commented Feb 4, 2021

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 4, 2021

Submitted as pull.841.v5.git.1612481392.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-841/dscho/range-diff-with-ranges-lacking-dotdot-v5

To fetch this version to local tag pr-841/dscho/range-diff-with-ranges-lacking-dotdot-v5:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-841/dscho/range-diff-with-ranges-lacking-dotdot-v5

@@ -11,6 +11,7 @@
#include "pretty.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> +	init_revisions(&revs, NULL);
> +	if (setup_revisions(3, argv, &revs, 0) == 1)
> +		for (i = 0; i < revs.pending.nr; i++) {
> +			struct object *obj = revs.pending.objects[i].item;
> +
> +			if (obj->flags & UNINTERESTING)
> +				negative++;
> +			else
> +				positive++;
> +			if (obj->type == OBJ_COMMIT)
> +				clear_commit_marks((struct commit *)obj,
> +						   ALL_REV_FLAGS);

Ah, I didn't think of adding this inside the same loop.  As long as
the pending objects are independent, this should work, but it is
unsafe, I think.  Imagine what happens if revs.pending[0].item can
reach the object revs.pending[1].item?  By the time the loop wants
to inspect the latter, its flags may have been cleared already
because we clear flags in objects that are reachable from the
former.

Let's do this as a post-processing for safety (or correctness).

	int negpos = 0;

	if (setup_revisions(3, argv, &revs, 0) != 1)
		goto cleanup;
	for (i = 0; i < revs.pending.nr && negpos != 3; i++) {
		struct object *obj = revs.pending.objects[i].item;
		negpos |= (obj->flags & UNINTERESTING) ? 01 : 02;
	}

	for (i = 0; i < revs.pending.nr; i++) {
		struct object *obj = revs.pending.objects[i].item;
                if (obj->type == OBJ_COMMIT)
			clear_commit_marks((struct commit *)obj, ALL_REV_FLAGS);
	}

    cleanup:
> +	free(copy);
> +	object_array_clear(&revs.pending);
> +	return negative > 0 && positive > 0;
>  }

or something like that.

> +test_expect_success 'A^{/..} is not mistaken for a range' '
> +	test_must_fail git range-diff topic^.. topic^{/..} 2>error &&
> +	test_i18ngrep "not a commit rang" error
> +'

s/rang/range/, I would think.

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

> Ah, I didn't think of adding this inside the same loop.  As long as
> the pending objects are independent, this should work, but it is
> unsafe, I think.

Here is what I added on top of the 3-patch series for tonight's
pushout.

 * No need to count positive/negative; just seeing both exists is
   sufficient and there is no need to examine any later elements, if
   any.

 * Clearing commit marks needs to be done after we have seen enough,
   or it can clear the stuff we would want to inspect before we
   have a chance to.  Do it in a separate post-cleaning loop.

 * Dedent by judicious use of 'goto'.

 * The last parameter to setup_revisions() is a pointer, so spell it
   NULL not 0.

 * "rang" -> "range" typofix (it might be even better to look for
   "range:")

----- >8 ---------- >8 ---------- >8 ---------- >8 ---------- >8 ---------- >8 -----
Subject: [PATCH] fixup! range-diff/format-patch: handle commit ranges other than A..B

 range-diff.c          | 30 +++++++++++++++++-------------
 t/t3206-range-diff.sh |  2 +-
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/range-diff.c b/range-diff.c
index c307bca9de..7a38dc8715 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -570,24 +570,28 @@ int is_range_diff_range(const char *arg)
 {
 	char *copy = xstrdup(arg); /* setup_revisions() modifies it */
 	const char *argv[] = { "", copy, "--", NULL };
-	int i, positive = 0, negative = 0;
+	int i;
 	struct rev_info revs;
+	unsigned npmask = 0;
 
 	init_revisions(&revs, NULL);
-	if (setup_revisions(3, argv, &revs, 0) == 1)
-		for (i = 0; i < revs.pending.nr; i++) {
-			struct object *obj = revs.pending.objects[i].item;
+	if (setup_revisions(3, argv, &revs, NULL) != 1)
+		goto cleanup;
 
-			if (obj->flags & UNINTERESTING)
-				negative++;
-			else
-				positive++;
-			if (obj->type == OBJ_COMMIT)
-				clear_commit_marks((struct commit *)obj,
-						   ALL_REV_FLAGS);
-		}
+	for (i = 0; i < revs.pending.nr && npmask != 3; i++) {
+		struct object *obj = revs.pending.objects[i].item;
+
+		npmask |= (obj->flags & UNINTERESTING) ? 01 : 02;
+	}
+
+	for (i = 0; i < revs.pending.nr; i++) {
+		struct object *obj = revs.pending.objects[i].item;
+		if (obj->type == OBJ_COMMIT)
+			clear_commit_marks((struct commit *)obj, ALL_REV_FLAGS);
+	}
 
+cleanup:
 	free(copy);
 	object_array_clear(&revs.pending);
-	return negative > 0 && positive > 0;
+	return npmask == 3;
 }
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index 45f21ee215..2b518378d4 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -160,7 +160,7 @@ test_expect_success 'A^! and A^-<n> (unmodified)' '
 
 test_expect_success 'A^{/..} is not mistaken for a range' '
 	test_must_fail git range-diff topic^.. topic^{/..} 2>error &&
-	test_i18ngrep "not a commit rang" error
+	test_i18ngrep "not a commit range" error
 '
 
 test_expect_success 'trivial reordering' '
-- 
2.30.0-601-g9b6178ed87

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Thu, 4 Feb 2021, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
> > Ah, I didn't think of adding this inside the same loop.  As long as
> > the pending objects are independent, this should work, but it is
> > unsafe, I think.
>
> Here is what I added on top of the 3-patch series for tonight's
> pushout.
>
>  * No need to count positive/negative; just seeing both exists is
>    sufficient and there is no need to examine any later elements, if
>    any.

Your code is substantially harder to read. I really dislike that `npmask`
idea.

>  * Clearing commit marks needs to be done after we have seen enough,
>    or it can clear the stuff we would want to inspect before we
>    have a chance to.  Do it in a separate post-cleaning loop.

I implemented that.

>  * Dedent by judicious use of 'goto'.

Not necessary: the code is short and narrow enough.

>  * The last parameter to setup_revisions() is a pointer, so spell it
>    NULL not 0.

I had missed that in your review. Fixed, too.

>  * "rang" -> "range" typofix (it might be even better to look for
>    "range:")

Technically, it is not necessary, as we're only verifying that the
intended error message is shown, and that "e" does not make any
difference.

But it _was_ a typo, so I fixed it, too.

Next iteration is coming soon,
Dscho

>
> ----- >8 ---------- >8 ---------- >8 ---------- >8 ---------- >8 ---------- >8 -----
> Subject: [PATCH] fixup! range-diff/format-patch: handle commit ranges other than A..B
>
>  range-diff.c          | 30 +++++++++++++++++-------------
>  t/t3206-range-diff.sh |  2 +-
>  2 files changed, 18 insertions(+), 14 deletions(-)
>
> diff --git a/range-diff.c b/range-diff.c
> index c307bca9de..7a38dc8715 100644
> --- a/range-diff.c
> +++ b/range-diff.c
> @@ -570,24 +570,28 @@ int is_range_diff_range(const char *arg)
>  {
>  	char *copy = xstrdup(arg); /* setup_revisions() modifies it */
>  	const char *argv[] = { "", copy, "--", NULL };
> -	int i, positive = 0, negative = 0;
> +	int i;
>  	struct rev_info revs;
> +	unsigned npmask = 0;
>
>  	init_revisions(&revs, NULL);
> -	if (setup_revisions(3, argv, &revs, 0) == 1)
> -		for (i = 0; i < revs.pending.nr; i++) {
> -			struct object *obj = revs.pending.objects[i].item;
> +	if (setup_revisions(3, argv, &revs, NULL) != 1)
> +		goto cleanup;
>
> -			if (obj->flags & UNINTERESTING)
> -				negative++;
> -			else
> -				positive++;
> -			if (obj->type == OBJ_COMMIT)
> -				clear_commit_marks((struct commit *)obj,
> -						   ALL_REV_FLAGS);
> -		}
> +	for (i = 0; i < revs.pending.nr && npmask != 3; i++) {
> +		struct object *obj = revs.pending.objects[i].item;
> +
> +		npmask |= (obj->flags & UNINTERESTING) ? 01 : 02;
> +	}
> +
> +	for (i = 0; i < revs.pending.nr; i++) {
> +		struct object *obj = revs.pending.objects[i].item;
> +		if (obj->type == OBJ_COMMIT)
> +			clear_commit_marks((struct commit *)obj, ALL_REV_FLAGS);
> +	}
>
> +cleanup:
>  	free(copy);
>  	object_array_clear(&revs.pending);
> -	return negative > 0 && positive > 0;
> +	return npmask == 3;
>  }
> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> index 45f21ee215..2b518378d4 100755
> --- a/t/t3206-range-diff.sh
> +++ b/t/t3206-range-diff.sh
> @@ -160,7 +160,7 @@ test_expect_success 'A^! and A^-<n> (unmodified)' '
>
>  test_expect_success 'A^{/..} is not mistaken for a range' '
>  	test_must_fail git range-diff topic^.. topic^{/..} 2>error &&
> -	test_i18ngrep "not a commit rang" error
> +	test_i18ngrep "not a commit range" error
>  '
>
>  test_expect_success 'trivial reordering' '
> --
> 2.30.0-601-g9b6178ed87
>
>

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 5, 2021

This patch series was integrated into seen via git@a024319.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 5, 2021

This patch series was integrated into seen via git@3677c7c.

@dscho dscho force-pushed the range-diff-with-ranges-lacking-dotdot branch from bc5de80 to 16801ce Compare February 5, 2021 13:32
In the `SPECIFYING RANGES` section of gitrevisions[7], two ways are
described to specify commit ranges that `range-diff` does not yet
accept: "<commit>^!" and "<commit>^-<n>".

Let's accept them, by parsing them via the revision machinery and
looking for at least one interesting and one uninteresting revision in
the resulting `pending` array.

This also finally lets us reject arguments that _do_ contain `..` but
are not actually ranges, e.g. `HEAD^{/do.. match this}`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
There are three forms, depending whether the user specifies one, two or
three non-option arguments. We've never actually explained how this
works in the manual, so let's explain it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho force-pushed the range-diff-with-ranges-lacking-dotdot branch from 16801ce to 08c5f87 Compare February 5, 2021 14:06
@dscho
Copy link
Member Author

dscho commented Feb 5, 2021

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 5, 2021

Submitted as pull.841.v6.git.1612536290.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-841/dscho/range-diff-with-ranges-lacking-dotdot-v6

To fetch this version to local tag pr-841/dscho/range-diff-with-ranges-lacking-dotdot-v6:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-841/dscho/range-diff-with-ranges-lacking-dotdot-v6

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 6, 2021

This patch series was integrated into seen via git@c3102dc.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 8, 2021

This patch series was integrated into seen via git@cf594fe.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 8, 2021

This patch series was integrated into next via git@9ae8b46.

@gitgitgadget gitgitgadget bot added the next label Feb 8, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Feb 10, 2021

This patch series was integrated into seen via git@9ae8b46.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 10, 2021

This patch series was integrated into seen via git@40e1b9b.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 10, 2021

This patch series was integrated into seen via git@a64c060.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 11, 2021

This patch series was integrated into seen via git@31076c2.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 11, 2021

This patch series was integrated into seen via git@b98cf07.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 12, 2021

This patch series was integrated into seen via git@4779459.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 19, 2021

This patch series was integrated into seen via git@77348b0.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 19, 2021

This patch series was integrated into next via git@77348b0.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 19, 2021

This patch series was integrated into master via git@77348b0.

@gitgitgadget gitgitgadget bot added the master label Feb 19, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Feb 19, 2021

Closed via 77348b0.

@gitgitgadget gitgitgadget bot closed this Feb 19, 2021
@dscho dscho deleted the range-diff-with-ranges-lacking-dotdot branch February 19, 2021 12:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant