Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-fix rebase -i with SHA-1 collisions #529

Closed

Conversation

dscho
Copy link
Member

@dscho dscho commented Jan 16, 2020

Triggered by one of brian's SHA-256 patch series, I looked at the reason why the SHA-1 collision test case passed when it shouldn't. Turns out that the regression test was not quite thorough enough, and the interactive rebase did regress recently.

While in the area, I realized that the same bug exists in the code backing the rebase.missingCommitsCheck feature: the backed-up todo list uses shortened commit IDs that may very well become ambiguous during the rebase. For good measure, this patch series fixes that, too.

Finally, I saw that git rebase --edit-todo reported the line in an awkward, maybe even incorrect, way when there was an ambiguous commit ID, and I also fixed that.

To make sure that the code can be easily adapted to SHA-256 after these patches, I actually already made those adjustments on top and offered them up at bk2204#1.

Changes since v2:

  • Clarified in the first commit message that the change from a positive return value to a negative return value (to indicate failure, in both cases) both was intentional, and does not require any change in the corresponding function's only caller.

Changes since v1:

  • Turned the error condition when parsing the todo list with just-expanded commit IDs failed into a BUG().

Cc: brian m. carlson sandals@crustytoothpaste.net, Alban Gruin alban.gruin@gmail.com, Eric Sunshine sunshine@sunshineco.com, Junio C Hamano gitster@pobox.com

@dscho dscho force-pushed the re-fix-rebase-i-with-sha-collisions branch 3 times, most recently from 890fb1c to e7d9ea8 Compare January 16, 2020 14:50
@dscho
Copy link
Member Author

dscho commented Jan 16, 2020

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 16, 2020

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 16, 2020

On the Git mailing list, "brian m. carlson" wrote (reply to this):


--/w6WUUxYkubDgwa5
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2020-01-16 at 21:18:23, Johannes Schindelin via GitGitGadget wrote:
> Triggered by one of brian's SHA-256 patch series, I looked at the reason =
why
> the SHA-1 collision test case passed when it shouldn't. Turns out that the
> regression test was not quite thorough enough, and the interactive rebase=
=20
> did regress recently.
>=20
> While in the area, I realized that the same bug exists in the code backing
> the rebase.missingCommitsCheck feature: the backed-up todo list uses
> shortened commit IDs that may very well become ambiguous during the rebas=
e.
> For good measure, this patch series fixes that, too.
>=20
> Finally, I saw that git rebase --edit-todo reported the line in an awkwar=
d,
> maybe even incorrect, way when there was an ambiguous commit ID, and I al=
so
> fixed that.
>=20
> To make sure that the code can be easily adapted to SHA-256 after these
> patches, I actually already made those adjustments on top and offered them
> up at https://github.com/bk2204/git/pull/1.

This series looks great to me, and thanks for fixing this.

As mentioned in the PR, I'm happy for you to drop the SHA-256 patch into
this series if you like, or I can carry it in a future series.  Either
way is fine with me.
--=20
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

--/w6WUUxYkubDgwa5
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.2.19 (GNU/Linux)

iQIzBAABCgAdFiEEX8OngXdrJt+H9ww3v1NdgR9S9osFAl4g96MACgkQv1NdgR9S
9ouVXhAAwMl1vy8nQxbIyAnq+YIm/SkwvbPoK00XHJ3KlS8AR70d+RoV6oq8RchO
Swzo/54QYaP0/Jn3zCQv3p/OD7CLLA3Aq7ln21XvZKn7DW8eu0h8ZgN1dE5uuHjF
voFEJT3v7/p9jTocOKIR5XBh7dFiYa++mAswnE+IYENXp727fm+qnFqUj6ZvXliV
AmC+Z/MzK1Ni1ivXfRkHfS6H9Cv4Lz84GgTKUOkStjvS+DHT4HK6sXSAwnL9dlzd
qeS7Kldtez4uVEzFE4CaVJLEO12SCFJ9kyq9m1Hx4cIrTnFPnFBirgp0z7aGR2qk
HABzuFVT5i4m4rSuzBwtLJ+m8V+FqbZCd/j5xInHgI3+XY2y1N/RsB5B3QEDMwQt
Yn1O/agpYDc1BTHnUdJq1CI2Rl4hSfV29f+PBMzPVUHx4+M9AEH7BjQci6ensDKr
LR675U37A6OzkuC2gqpDp3utWI5pa1YsUX5eYi10TviMrJ7Um7OUsCV442NsggeN
S+f859XQzCDcvevWBh4vp4kJ7BSLzatVXYfSVDUnXIEzAVWE0qMDbFd3QqwYRP3U
xa/0TTQX1+izNnZ9wPh7gNftvACTj4HVfn/9iZ439Aqjt2Uej5w0cscNzq3WGFgt
CBKbWB7Wvf3qRW/KBvH4fA1MTJhZ1oZdPutPjwj8kt/rUcS2aGU=
=oprG
-----END PGP SIGNATURE-----

--/w6WUUxYkubDgwa5--

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 17, 2020

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi brian,

On Thu, 16 Jan 2020, brian m. carlson wrote:

> On 2020-01-16 at 21:18:23, Johannes Schindelin via GitGitGadget wrote:
> > Triggered by one of brian's SHA-256 patch series, I looked at the reason why
> > the SHA-1 collision test case passed when it shouldn't. Turns out that the
> > regression test was not quite thorough enough, and the interactive rebase
> > did regress recently.
> >
> > While in the area, I realized that the same bug exists in the code backing
> > the rebase.missingCommitsCheck feature: the backed-up todo list uses
> > shortened commit IDs that may very well become ambiguous during the rebase.
> > For good measure, this patch series fixes that, too.
> >
> > Finally, I saw that git rebase --edit-todo reported the line in an awkward,
> > maybe even incorrect, way when there was an ambiguous commit ID, and I also
> > fixed that.
> >
> > To make sure that the code can be easily adapted to SHA-256 after these
> > patches, I actually already made those adjustments on top and offered them
> > up at https://github.com/bk2204/git/pull/1.
>
> This series looks great to me, and thanks for fixing this.
>
> As mentioned in the PR, I'm happy for you to drop the SHA-256 patch into
> this series if you like, or I can carry it in a future series.  Either
> way is fine with me.

Excellent. Given that the re-fix to avoid short commit ID collisions has
little to do with supporting SHA-256, I would like to keep the patch
series separate, then.

The question whether to move the SHA-256 support patch into your series is
more a question to Junio, i.e. which patch series will be merged down
faster.

Junio, what is your preference here?

Ciao,
Dscho

@@ -2118,18 +2118,19 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
saved = *end_of_object_name;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In the case that a `get_oid()` call failed, we showed some rather bogus
> part of the line instead of the precise string we sent to said function.
> That makes it rather hard for users to understand what is going wrong,
> so let's fix that.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  sequencer.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index b9dbf1adb0..7c30dad59c 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -2118,6 +2118,8 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
>  	saved = *end_of_object_name;
>  	*end_of_object_name = '\0';
>  	status = get_oid(bol, &commit_oid);
> +	if (status < 0)
> +		error(_("could not parse '%s'"), bol); /* return later */
>  	*end_of_object_name = saved;

OK, so after making sure the line begins a command that takes an
object name, we first NUL terminated the token after the command
but it turns out the string is not a valid oid.  We show the part
we thought was the object name before reverting the NUL termination.

Makes sense.  And then later...

>  	bol = end_of_object_name + strspn(end_of_object_name, " \t");
> @@ -2125,11 +2127,10 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
>  	item->arg_len = (int)(eol - bol);
>  
>  	if (status < 0)
> -		return error(_("could not parse '%.*s'"),
> -			     (int)(end_of_object_name - bol), bol);
> +		return status;

...this is the *only* code that uses the status that was assigned to
the return value of get_oid().  

Because the implementation of this function (mis)uses "bol", which
stands for "beginning of line", as a moving pointer of "beginning of
the token we are looking at", the value of "bol" at this point of
the control in the original code is not where the "bol" pointer used
to be when end-of-object-name was computed (if it were, the '%.*s'
argument would have been correct) and shows a token after where the
object name would have been, which may help the user identify the
line but does not directly show what token we had trouble with
parsing.

And the updated code will avoid the issue by using the "bol" pointer
when it is still pointing at the right place.


>  	item->commit = lookup_commit_reference(r, &commit_oid);
> -	return !item->commit;
> +	return item->commit ? 0 : -1;

This changes the polarity of the error exit from positive 1 to
negative 1.  The only caller of this function takes anything
non-zero as a failure so this would not cause behaviour change, but
returning negative is more in line with the practice so it is an
improvement (it is unrelated to the theme of this patch and is not
explained, though).

OK.

@@ -2118,18 +2118,19 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
saved = *end_of_object_name;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> However, the bug resurfaced as a side effect of 393adf7a6f6 (sequencer:
> directly call pick_commits() from complete_action(), 2019-11-24): as of
> this commit, the sequencer no longer re-reads the todo list after
> writing it out with expanded commit IDs.

Ouch.  The analysis above is quite understandable.

> @@ -5128,6 +5128,18 @@ int complete_action(struct repository *r, struct replay_opts *opts, unsigned fla
>  		return -1;
>  	}
>  
> +	/* Expand the commit IDs */
> +	todo_list_to_strbuf(r, &new_todo, &buf2, -1, 0);
> +	strbuf_swap(&new_todo.buf, &buf2);
> +	strbuf_release(&buf2);
> +	new_todo.total_nr -= new_todo.nr;
> +	if (todo_list_parse_insn_buffer(r, new_todo.buf.buf, &new_todo) < 0) {
> +		fprintf(stderr, _(edit_todo_list_advice));
> +		checkout_onto(r, opts, onto_name, &onto->object.oid, orig_head);
> +		todo_list_release(&new_todo);
> +		return error(_("invalid todo list after expanding IDs"));
> +	}

The above happens after edit_todo_list() returns and then the
resulting data (i.e. new_todo) that came from the on-disk file has
been parsed with an existing call to todo_lsit_parse_insn_buffer()?

I am wondering when this if() statement would trigger, iow, under
what condition the result of "Expand the commit IDs" operation fails
to be parsed correctly, and what the user can do to remedy it.
Especially given that incoming new_todo has passed the existing
parse and check without hitting "return -1" we see above in the
context, I am not sure if there is anything, other than any
potential glitch in the added code above, that can cause this second
parse_insn_buffer() to fail.  Shouldn't the body of if() be simply a
BUG()?

Or am I somehow missing a hunk that removes an existing call to
parse&check?

> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> index ae6e55ce79..1cc9f36bc7 100755
> --- a/t/t3404-rebase-interactive.sh
> +++ b/t/t3404-rebase-interactive.sh
> @@ -1264,13 +1264,24 @@ test_expect_success SHA1 'short SHA-1 setup' '
>  test_expect_success SHA1 'short SHA-1 collide' '
>  	test_when_finished "reset_rebase && git checkout master" &&
>  	git checkout collide &&
> +	colliding_sha1=6bcda37 &&
> +	test $colliding_sha1 = "$(git rev-parse HEAD | cut -c 1-7)" &&
>  	(
>  		unset test_tick &&
>  		test_tick &&
>  		set_fake_editor &&
>  		FAKE_COMMIT_MESSAGE="collide2 ac4f2ee" \
> -		FAKE_LINES="reword 1 2" git rebase -i HEAD~2
> -	)
> +		FAKE_LINES="reword 1 break 2" git rebase -i HEAD~2 &&
> +		test $colliding_sha1 = "$(git rev-parse HEAD | cut -c 1-7)" &&
> +		grep "^pick $colliding_sha1 " \
> +			.git/rebase-merge/git-rebase-todo.tmp &&
> +		grep "^pick [0-9a-f]\{40\}" \
> +			.git/rebase-merge/git-rebase-todo &&
> +		git rebase --continue
> +	) &&
> +	collide2="$(git rev-parse HEAD~1 | cut -c 1-4)" &&
> +	collide3="$(git rev-parse collide3 | cut -c 1-4)" &&
> +	test "$collide2" = "$collide3"
>  '
>  
>  test_expect_success 'respect core.abbrev' '

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Fri, 17 Jan 2020, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > +	/* Expand the commit IDs */
> > +	todo_list_to_strbuf(r, &new_todo, &buf2, -1, 0);
> > +	strbuf_swap(&new_todo.buf, &buf2);
> > +	strbuf_release(&buf2);
> > +	new_todo.total_nr -= new_todo.nr;
> > +	if (todo_list_parse_insn_buffer(r, new_todo.buf.buf, &new_todo) < 0) {
> > +		fprintf(stderr, _(edit_todo_list_advice));
> > +		checkout_onto(r, opts, onto_name, &onto->object.oid, orig_head);
> > +		todo_list_release(&new_todo);
> > +		return error(_("invalid todo list after expanding IDs"));
> > +	}
>
> The above happens after edit_todo_list() returns and then the
> resulting data (i.e. new_todo) that came from the on-disk file has
> been parsed with an existing call to todo_lsit_parse_insn_buffer()?
>
> I am wondering when this if() statement would trigger, iow, under
> what condition the result of "Expand the commit IDs" operation fails
> to be parsed correctly, and what the user can do to remedy it.

You're right, this is defensive code to guard against bugs, but the error
message suggests that it might be a user error: it isn't.

> Especially given that incoming new_todo has passed the existing
> parse and check without hitting "return -1" we see above in the
> context, I am not sure if there is anything, other than any
> potential glitch in the added code above, that can cause this second
> parse_insn_buffer() to fail.  Shouldn't the body of if() be simply a
> BUG()?

It should.

Will fix,
Dscho

> Or am I somehow missing a hunk that removes an existing call to
> parse&check?
>
> > diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> > index ae6e55ce79..1cc9f36bc7 100755
> > --- a/t/t3404-rebase-interactive.sh
> > +++ b/t/t3404-rebase-interactive.sh
> > @@ -1264,13 +1264,24 @@ test_expect_success SHA1 'short SHA-1 setup' '
> >  test_expect_success SHA1 'short SHA-1 collide' '
> >  	test_when_finished "reset_rebase && git checkout master" &&
> >  	git checkout collide &&
> > +	colliding_sha1=6bcda37 &&
> > +	test $colliding_sha1 = "$(git rev-parse HEAD | cut -c 1-7)" &&
> >  	(
> >  		unset test_tick &&
> >  		test_tick &&
> >  		set_fake_editor &&
> >  		FAKE_COMMIT_MESSAGE="collide2 ac4f2ee" \
> > -		FAKE_LINES="reword 1 2" git rebase -i HEAD~2
> > -	)
> > +		FAKE_LINES="reword 1 break 2" git rebase -i HEAD~2 &&
> > +		test $colliding_sha1 = "$(git rev-parse HEAD | cut -c 1-7)" &&
> > +		grep "^pick $colliding_sha1 " \
> > +			.git/rebase-merge/git-rebase-todo.tmp &&
> > +		grep "^pick [0-9a-f]\{40\}" \
> > +			.git/rebase-merge/git-rebase-todo &&
> > +		git rebase --continue
> > +	) &&
> > +	collide2="$(git rev-parse HEAD~1 | cut -c 1-4)" &&
> > +	collide3="$(git rev-parse collide3 | cut -c 1-4)" &&
> > +	test "$collide2" = "$collide3"
> >  '
> >
> >  test_expect_success 'respect core.abbrev' '
>
>

@dscho dscho force-pushed the re-fix-rebase-i-with-sha-collisions branch from e7d9ea8 to 486e941 Compare January 17, 2020 21:40
@gitgitgadget
Copy link

gitgitgadget bot commented Jan 17, 2020

On the Git mailing list, "brian m. carlson" wrote (reply to this):


--Oh6wehMCKJhhoxlo
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2020-01-17 at 09:51:42, Johannes Schindelin wrote:
> > This series looks great to me, and thanks for fixing this.
> >
> > As mentioned in the PR, I'm happy for you to drop the SHA-256 patch into
> > this series if you like, or I can carry it in a future series.  Either
> > way is fine with me.
>=20
> Excellent. Given that the re-fix to avoid short commit ID collisions has
> little to do with supporting SHA-256, I would like to keep the patch
> series separate, then.
>=20
> The question whether to move the SHA-256 support patch into your series is
> more a question to Junio, i.e. which patch series will be merged down
> faster.

I need to do a reroll of part 8, so I'll pick it into a future series
(part 9) of test fixes and drop my patch.  That way it won't interfere
with either series making progress, but it will still be included at the
end.
--=20
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

--Oh6wehMCKJhhoxlo
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.2.19 (GNU/Linux)

iQIzBAABCgAdFiEEX8OngXdrJt+H9ww3v1NdgR9S9osFAl4iNy4ACgkQv1NdgR9S
9ovwwBAAkBauEqjtA+CwuFRw1DVOyVxWJSvIf/42vQ2Djt+Q6wj9eRzMJXJ85QEY
9Pdw47iZ1R43hPLy7vLngnh+Oxw98jSDielHR+L/4I8fGsJ6UcARKS/KMdqDPB7Z
sNcU44VJ+oSN0ohRG+5iNhQfj2oR9ngJybUSlgQd6I372kqRt19yFyTK/p6eTrm6
gpc5ym/Abf6FpUBFj4syXFyK/wu/LKqXNDS/ybcfkN4r/6StIl+sEAytO7ZuEcDF
vNHgJbA8n0cX0YgWWn0d76vaZ4+Uf59QEL6WiST9wEFbo00NiWsgUj/wvfGlFd4Y
yb2z6eFt8OXbpdXNRTexwi9dzXyJN6QCW1zDSAJV4jmZXrxwjZWcjyHDjmEjFM+G
zvilEyasURhh8n/UcSUIMyMQe0b7PswcbyRKgU1AIAZnGY/d+dB0d6KZmIadNVu4
hmy0X0t91uPBGfd3EwtM7/tnvXWD5cdmCItyRk0pY6PI7PBvUywBbynPioBJnosc
6h3jhjdmjsMIeVASYXxHt9ZV4g8KUuvorTRnsU/Mt4/nL6/kikVQZK3nfxLYK/OV
aQSdHB/nUNxXU+pOwv5tlHOdiv7Z05XXEQg++hfzejfbpiV5MKP5ebWyGeXJEeVT
GQuckOeaTVEZy1NcqQhn9AJlvYFmaER4NYG1gN1jBPLlbR67VqA=
=JzCz
-----END PGP SIGNATURE-----

--Oh6wehMCKJhhoxlo--

@dscho
Copy link
Member Author

dscho commented Jan 17, 2020

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 17, 2020

@@ -2118,18 +2118,19 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
saved = *end_of_object_name;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Fri, Jan 17, 2020 at 6:38 PM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> @@ -1264,13 +1264,24 @@ test_expect_success SHA1 'short SHA-1 setup' '
>  test_expect_success SHA1 'short SHA-1 collide' '
>         test_when_finished "reset_rebase && git checkout master" &&
>         git checkout collide &&
> +       colliding_sha1=6bcda37 &&
> +       test $colliding_sha1 = "$(git rev-parse HEAD | cut -c 1-7)" &&

How much do we care that this is introducing new code with git
upstream of a pipe (considering recent efforts to eradicate such
usage)? Same question regarding several other new instances introduce
by this patch.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Eric,

On Mon, 20 Jan 2020, Eric Sunshine wrote:

> On Fri, Jan 17, 2020 at 6:38 PM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> > diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> > @@ -1264,13 +1264,24 @@ test_expect_success SHA1 'short SHA-1 setup' '
> >  test_expect_success SHA1 'short SHA-1 collide' '
> >         test_when_finished "reset_rebase && git checkout master" &&
> >         git checkout collide &&
> > +       colliding_sha1=6bcda37 &&
> > +       test $colliding_sha1 = "$(git rev-parse HEAD | cut -c 1-7)" &&
>
> How much do we care that this is introducing new code with git
> upstream of a pipe (considering recent efforts to eradicate such
> usage)? Same question regarding several other new instances introduce
> by this patch.

I would argue that the test case will fail if the `git` call fails. So I
am not overly concerned if that `git` call is upstream of a pipe.

Ciao,
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Mon, Jan 20, 2020 at 3:04 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> On Mon, 20 Jan 2020, Eric Sunshine wrote:
> > On Fri, Jan 17, 2020 at 6:38 PM Johannes Schindelin via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> > > +       test $colliding_sha1 = "$(git rev-parse HEAD | cut -c 1-7)" &&
> >
> > How much do we care that this is introducing new code with git
> > upstream of a pipe (considering recent efforts to eradicate such
> > usage)? Same question regarding several other new instances introduce
> > by this patch.
>
> I would argue that the test case will fail if the `git` call fails. So I
> am not overly concerned if that `git` call is upstream of a pipe.

Unless the git command crashes _after_ it produces the correct output...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Eric,

On Mon, 20 Jan 2020, Eric Sunshine wrote:

> On Mon, Jan 20, 2020 at 3:04 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > On Mon, 20 Jan 2020, Eric Sunshine wrote:
> > > On Fri, Jan 17, 2020 at 6:38 PM Johannes Schindelin via GitGitGadget
> > > <gitgitgadget@gmail.com> wrote:
> > > > +       test $colliding_sha1 = "$(git rev-parse HEAD | cut -c 1-7)" &&
> > >
> > > How much do we care that this is introducing new code with git
> > > upstream of a pipe (considering recent efforts to eradicate such
> > > usage)? Same question regarding several other new instances introduce
> > > by this patch.
> >
> > I would argue that the test case will fail if the `git` call fails. So I
> > am not overly concerned if that `git` call is upstream of a pipe.
>
> Unless the git command crashes _after_ it produces the correct output...

Yes, that is true. In a very hypothetical way, of course.

:-)

Ciao,
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Mon, Jan 20, 2020 at 3:04 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>> On Mon, 20 Jan 2020, Eric Sunshine wrote:
>> > On Fri, Jan 17, 2020 at 6:38 PM Johannes Schindelin via GitGitGadget
>> > <gitgitgadget@gmail.com> wrote:
>> > > +       test $colliding_sha1 = "$(git rev-parse HEAD | cut -c 1-7)" &&
>> >
>> > How much do we care that this is introducing new code with git
>> > upstream of a pipe (considering recent efforts to eradicate such
>> > usage)? Same question regarding several other new instances introduce
>> > by this patch.
>>
>> I would argue that the test case will fail if the `git` call fails. So I
>> am not overly concerned if that `git` call is upstream of a pipe.
>
> Unless the git command crashes _after_ it produces the correct output...

True.  Doesn't rev-parse have an appropriate option for this kind of
thing that gets rid of the need for "cut" in the first place?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Tue, 21 Jan 2020, Junio C Hamano wrote:

> Eric Sunshine <sunshine@sunshineco.com> writes:
>
> > On Mon, Jan 20, 2020 at 3:04 PM Johannes Schindelin
> > <Johannes.Schindelin@gmx.de> wrote:
> >> On Mon, 20 Jan 2020, Eric Sunshine wrote:
> >> > On Fri, Jan 17, 2020 at 6:38 PM Johannes Schindelin via GitGitGadget
> >> > <gitgitgadget@gmail.com> wrote:
> >> > > +       test $colliding_sha1 = "$(git rev-parse HEAD | cut -c 1-7)" &&
> >> >
> >> > How much do we care that this is introducing new code with git
> >> > upstream of a pipe (considering recent efforts to eradicate such
> >> > usage)? Same question regarding several other new instances introduce
> >> > by this patch.
> >>
> >> I would argue that the test case will fail if the `git` call fails. So I
> >> am not overly concerned if that `git` call is upstream of a pipe.
> >
> > Unless the git command crashes _after_ it produces the correct output...
>
> True.  Doesn't rev-parse have an appropriate option for this kind of
> thing that gets rid of the need for "cut" in the first place?

You mean `git rev-parse --short=4`? That does something _sligthly_
different: it tries to shorten the OID to 4 characters _unless that would
be ambiguous_. In our case, it _will_ be ambiguous. That's why I use
`cut`.

As to the crash in `rev-parse` _after_ printing out the OID: yes, there is
a possibility for that. But that regression test is not about `rev-parse`,
so it is actually a good thing that it would not trigger on such a bug ;-)

Ciao,
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> True.  Doesn't rev-parse have an appropriate option for this kind of
>> thing that gets rid of the need for "cut" in the first place?
>
> You mean `git rev-parse --short=4`? That does something _sligthly_
> different: it tries to shorten the OID to 4 characters _unless that would
> be ambiguous_. In our case, it _will_ be ambiguous. That's why I use
> `cut`.

Ah, yes of course; we want ambiguous prefix.  I think a more
thorough test would be to see that the output with --short=$n (where
n is the length of the abbreviated object name in $colliding_sha1)
is longer than $colliding_sha1 and the output prefix-matches
$colliding_sha1 iow, something like

	abbreviated=$(git rev-parse --short=7 HEAD) &&
	case "$abbreviated" in
	"$colliding_sha1"?*) : happy ;;
	*) false ;;
	esac &&
	...

which would make sure that we are testing colliding case.


> As to the crash in `rev-parse` _after_ printing out the OID: yes, there is
> a possibility for that. But that regression test is not about `rev-parse`,
> so it is actually a good thing that it would not trigger on such a bug ;-)

No, I do not think this test should be about rev-parse working
correctly---just that if it is easy enough to make the test robust
enough against such a breakage, it would be nice to do so, that's
all.

I'm not Eric but I suspect his primary point was not about worrying
about rev-parse crashing but more about avoiding to add a pattern
less experienced developers can copy&paste without thinking enough
to realize why it would be OK here and not OK in the context of the
tests they are adding.  That would be what I would worry about more
than rev-parse crashing in the part of the test under discussion.

Thanks.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 21, 2020

On the Git mailing list, Junio C Hamano wrote (reply to this):

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>> The question whether to move the SHA-256 support patch into your series is
>> more a question to Junio, i.e. which patch series will be merged down
>> faster.
>
> I need to do a reroll of part 8, so I'll pick it into a future series
> (part 9) of test fixes and drop my patch.  That way it won't interfere
> with either series making progress, but it will still be included at the
> end.

Alright.  Thanks for coordinating between you two.

@@ -2118,18 +2118,19 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
saved = *end_of_object_name;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In the case that a `get_oid()` call failed, we showed some rather bogus
> part of the line instead of the precise string we sent to said function.
> That makes it rather hard for users to understand what is going wrong,
> so let's fix that.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ...
> @@ -2125,11 +2127,10 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
>  	item->arg_len = (int)(eol - bol);
>  
>  	if (status < 0)
> -		return error(_("could not parse '%.*s'"),
> -			     (int)(end_of_object_name - bol), bol);
> +		return status;
>  
>  	item->commit = lookup_commit_reference(r, &commit_oid);
> -	return !item->commit;
> +	return item->commit ? 0 : -1;

This changes the polarity of the error exit from positive 1 to
negative 1.  The only caller of this function takes anything
non-zero as a failure so this would not cause behaviour change, but
returning negative is more in line with the practice so it is an
improvement.

It is unrelated to the theme of this patch, and the proposed log
message does not even mention it, though.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Tue, 21 Jan 2020, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > In the case that a `get_oid()` call failed, we showed some rather bogus
> > part of the line instead of the precise string we sent to said function.
> > That makes it rather hard for users to understand what is going wrong,
> > so let's fix that.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ...
> > @@ -2125,11 +2127,10 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
> >  	item->arg_len = (int)(eol - bol);
> >
> >  	if (status < 0)
> > -		return error(_("could not parse '%.*s'"),
> > -			     (int)(end_of_object_name - bol), bol);
> > +		return status;
> >
> >  	item->commit = lookup_commit_reference(r, &commit_oid);
> > -	return !item->commit;
> > +	return item->commit ? 0 : -1;
>
> This changes the polarity of the error exit from positive 1 to
> negative 1.  The only caller of this function takes anything
> non-zero as a failure so this would not cause behaviour change, but
> returning negative is more in line with the practice so it is an
> improvement.
>
> It is unrelated to the theme of this patch, and the proposed log
> message does not even mention it, though.

You're right. I amended the commit message. Will send out a v3 soon.

Thanks,
Dscho

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 22, 2020

This patch series was integrated into pu via git@76318f9.

@gitgitgadget gitgitgadget bot added the pu label Jan 22, 2020
@gitgitgadget
Copy link

gitgitgadget bot commented Jan 22, 2020

This branch is now known as js/rebase-i-with-colliding-hash.

In the case that a `get_oid()` call failed, we showed some rather bogus
part of the line instead of the precise string we sent to said function.
That makes it rather hard for users to understand what is going wrong,
so let's fix that.

While at it, return a negative value from `parse_insn_line()` in case of
an error, as per our convention. This function's only caller,
`todo_list_parse_insn_buffer()`, cares only whether that return value is
non-zero or not, i.e. does not need to be changed.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In 66ae9a5 (t3404: rebase -i: demonstrate short SHA-1 collision,
2013-08-23), we added a test case that demonstrated how it is possible
that a previously unambiguous short commit ID could become ambiguous
*during* a rebase.

In 75c6976 (rebase -i: fix short SHA-1 collision, 2013-08-23), we
fixed that problem simply by writing out the todo list with expanded
commit IDs (except *right* before letting the user edit the todo list,
in which case we shorten them, but we expand them right after the file
was edited).

However, the bug resurfaced as a side effect of 393adf7 (sequencer:
directly call pick_commits() from complete_action(), 2019-11-24): as of
this commit, the sequencer no longer re-reads the todo list after
writing it out with expanded commit IDs.

The only redeeming factor is that the todo list is already parsed at
that stage, including all the commits corresponding to the commands,
therefore the sequencer can continue even if the internal todo list has
short commit IDs.

That does not prevent problems, though: the sequencer writes out the
`done` and `git-rebase-todo` files incrementally (i.e. overwriting the
todo list with a version that has _short_ commit IDs), and if a merge
conflict happens, or if an `edit` or a `break` command is encountered, a
subsequent `git rebase --continue` _will_ re-read the todo list, opening
an opportunity for the "short SHA-1 collision" bug again.

To avoid that, let's make sure that we do expand the commit IDs in the
todo list as soon as we have parsed it after letting the user edit it.

Additionally, we improve the 'short SHA-1 collide' test case in t3404 to
test specifically for the case where the rebase is resumed. We also
hard-code the expected colliding short SHA-1s, to document the
expectation (and to make it easier on future readers).

Note that we specifically test that the short commit ID is used in the
`git-rebase-todo.tmp` file: this file is created by the fake editor in
the test script and reflects the state that would have been presented to
the user to edit.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When `rebase.missingCommitsCheck` is in effect, we use the backup of the
todo list that was copied just before the user was allowed to edit it.

That backup is, of course, just as susceptible to the hash collision as
the todo list itself: a reworded commit could make a previously
unambiguous short commit ID ambiguous all of a sudden.

So let's not just copy the todo list, but let's instead write out the
backup with expanded commit IDs.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho force-pushed the re-fix-rebase-i-with-sha-collisions branch from 486e941 to b7b0634 Compare January 23, 2020 11:47
@dscho
Copy link
Member Author

dscho commented Jan 23, 2020

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 23, 2020

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 24, 2020

This patch series was integrated into pu via git@ac924fe.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 27, 2020

This patch series was integrated into pu via git@5b403db.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 29, 2020

This patch series was integrated into pu via git@f16ef31.

@gitgitgadget
Copy link

gitgitgadget bot commented Jan 30, 2020

This patch series was integrated into pu via git@8c0352b.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 5, 2020

This patch series was integrated into pu via git@ac03faf.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 7, 2020

This patch series was integrated into pu via git@ca6cd45.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 7, 2020

This patch series was integrated into next via git@670ba07.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 11, 2020

This patch series was integrated into pu via git@306348a.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 12, 2020

This patch series was integrated into pu via git@ed3c8e8.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 14, 2020

This patch series was integrated into pu via git@2511870.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 14, 2020

This patch series was integrated into next via git@2511870.

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 14, 2020

This patch series was integrated into master via git@2511870.

@gitgitgadget gitgitgadget bot added the master label Feb 14, 2020
@gitgitgadget
Copy link

gitgitgadget bot commented Feb 14, 2020

Closed via 2511870.

@gitgitgadget gitgitgadget bot closed this Feb 14, 2020
@dscho dscho deleted the re-fix-rebase-i-with-sha-collisions branch February 15, 2020 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant