Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix difftool problem with intent-to-add files #654

Closed

Conversation

dscho
Copy link
Member

@dscho dscho commented Jun 9, 2020

This problem was reported in git-for-windows#2677, but the problem actually lies with git diff --raw, and it seems that the bug has been with us ever since --intent-to-add was introduced.

Changes since v4:

  • Improved both commit messages after feedback from Junio.

Changes since v3:

  • Instead of showing the same OID in git diff-files --raw as in git diff-files -p for intent-to-add files, we now imitate the logic for modified (non-intent-to-add) files: we show the "null" OID (i.e. all-zero).

  • As a consequence, we no longer just undo the inadvertent change where intent-to-add files were reported with the empty tree instead of the empty blob, but we fix the bug that has been with us since run_diff_files() was taught about intent-to-add files, i.e. we fix the original bug of 425a28e (diff-lib: allow ita entries treated as "not yet exist in index", 2016-10-24), at long last.

Changes since v2:

  • Now we also drop the no-longer-used definition of hash_t in t2203.

Changes since v1:

  • Rebased onto sk/diff-files-show-i-t-a-as-new.
  • Verified that sk/diff-files-show-i-t-a-as-new does not completely resolve the issue (the --raw output still claims the empty blob as the post-image, although the difftool symptom "went away").
  • Amended the central patch of this PR to include a fix for the regression test that was introduced in sk/diff-files-show-i-t-a-as-new: it expected the raw diff to contain the hash of the empty tree object (which is incorrect no matter how you turn it: any hash in any raw diff should refer to blob objects).
  • Reordered the patches so that the central patch comes first (otherwise, the "empty tree" fix would cause a test failure in t2203).

Cc: Srinidhi Kaushik shrinidhi.kaushik@gmail.com

@dscho
Copy link
Member Author

dscho commented Jun 11, 2020

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 11, 2020

@dscho dscho changed the base branch from master to sk/diff-files-show-i-t-a-as-new June 23, 2020 10:27
@dscho
Copy link
Member Author

dscho commented Jun 23, 2020

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 23, 2020

diff-lib.c Outdated
@@ -217,6 +217,20 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
!is_null_oid(&ce->oid),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In `run_diff_files()`, files that have been staged with the intention to
> add are queued without a valid OID in the `diff_filepair`.
>
> When the output mode is, say, `DIFF_FORMAT_PATCH`, the
> `diff_fill_oid_info()` function, called from `run_diff()`, will remedy
> that situation by reading the file contents from disk.

The above is true.  What do we do for a path that is actually added
to the index but is stat-dirty or actually modified in the working
tree when we are giving the raw output?  Don't we give 0{40} to mean
"we dunno---you go look at the working tree"?  I think we should do
the same for i-t-a file wrt the object name.  In both cases, the
index does not know what the actual object name is, and we do not
want to run the index_path() and write out a new object in the
object database.

Using the status letter 'A' would also be appropriate, as we would
show "new file ..." in the --patch output in this case, which would
be consistent.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Tue, 23 Jun 2020, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > In `run_diff_files()`, files that have been staged with the intention to
> > add are queued without a valid OID in the `diff_filepair`.
> >
> > When the output mode is, say, `DIFF_FORMAT_PATCH`, the
> > `diff_fill_oid_info()` function, called from `run_diff()`, will remedy
> > that situation by reading the file contents from disk.
>
> The above is true.  What do we do for a path that is actually added
> to the index but is stat-dirty or actually modified in the working
> tree when we are giving the raw output?  Don't we give 0{40} to mean
> "we dunno---you go look at the working tree"?  I think we should do
> the same for i-t-a file wrt the object name.  In both cases, the
> index does not know what the actual object name is, and we do not
> want to run the index_path() and write out a new object in the
> object database.

Sure, but my intention was to synchronize the `--raw` vs the `--patch`
output: the latter _already_ shows the correct hash. This here patch makes
the hash in the former's output match the latter's.

Besides, we're talking about the post-image of `diff-files`, i.e. the
worktree version, here. I seem to remember that the pre-image already uses
the all-zero hash to indicate precisely what you mentioned above.

Ciao,
Dscho

> Using the status letter 'A' would also be appropriate, as we would
> show "new file ..." in the --patch output in this case, which would
> be consistent.
>
>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Sure, but my intention was to synchronize the `--raw` vs the `--patch`
> output: the latter _already_ shows the correct hash. This here patch makes
> the hash in the former's output match the latter's.

That is shooting for a wrong uniformity and breaking consistency
among the `--raw` modes.

 $ git reset --hard
 $ echo "/* end */" >cache.h ;# taint
 $ git diff-files --raw
 ... this shows (M)odified with 0{40} on the postimage
 ... 0{40} for side that is known to have contents from low-level diff
 ... means "object name unknown; figure it out yourself if you need it"
 $ git update-index cache.h
 $ git diff-files --raw
 ... of course we see nothing here.  Wait for a bit.
 $ touch cache.h ;# smudge
 $ git diff-files --raw
 ... this shows (M)odified with 0{40} on the postimage
 ... again, it says "it is stat dirty so I do not bother to compute"
 $ git update-index --refresh
 $ git diff-files --raw
 ... again we see nothing.

Any tools that work on "--raw" output must be already prepared to
see 0{40} on the side that is known to have contents and must know
to grab the contents from the working tree file if they need them,
so showing the 0{40} for i-t-a entry (whose definition is "the user
said in the past that the final contents of the file will be added
later, but Git does not know what object it will be yet") cannot
break them.  And the behaviour of giving 0{40} in such a case aligns
well with what is already done for paths already added to the index
when Git does not have an already-computed object name handy.

> Besides, we're talking about the post-image of `diff-files`, i.e. the
> worktree version, here. I seem to remember that the pre-image already uses
> the all-zero hash to indicate precisely what you mentioned above.

The 0{40} you see for pre-image for (A)dded paths means a completely
different thing from the 0{40} I have been explaining in the above,
so that is not relevant here.

By definition, there is *no* contents for the pre-image side of
(A)dded paths (that is why I stressed the "side that must have
contents" in the above description---it is determined by the type of
the change), but because the format requires us to place some
hexadecimal there, we fill the space with 0{40}.  

When we do not know the object name for the side that is known to
have contents without performing extra computation (including "stat
dirty so we cannot tell without rehashing"), we also use 0{40} as a
signal to say "we do not know the actual contents", but the consumer
of "--raw" format is expected to know the difference between "this
side is known to have no data and 0{40} is here as filler" and "this
side must have contents but we see 0{40} because Git does not have
it handy in precomputed form".

The above is the same for "diff-index --raw" without "--cached";
when we have to hash before we can give the object name (e.g. the
path is stat-dirty), we give 0{40} and let the consumer figure it
out if it needs to.

 $ git reset --hard
 $ touch COPYING
 $ git diff-index --raw HEAD
 ... we see (M)odified with 0{40} on the right hand side.

When the caller asks for "--patch" or any other output format that
actually needs contents for output, however, these low-level tools
do read the contents, and as a side effect, they may hash to obtain
the object name and show it [*1*].

By the way, as I do not want to see you waste your time going in a
wrong direction just to be "different", let me make it clear that as
far as the design of low level diff plumbing is concerned, what I
said here is final.  Please don't waste your time on arguing for
changing the design now after 15 years.  I want to see your time
used in a more productive way for the project.

Thanks.


[Footnote]

*1* This division of labor to free "--raw" mode of anything remotely
    unnecessary stems from the original diff plumbing design in May
    2005 where the "--raw" mode was the only output mode, and there
    was a separate "git-diff-helper" (look for it in the mailing
    list archive if you want to learn more) that reads a "--raw"
    output and transforms it into the patch form.  That "once we
    have the raw diff, we can pipe it to post-processing and do more
    interesting things" eventually led to the design of the diffcore
    pipeline where we match up (A)dded and (D)eleted entries to
    detect renames, etc.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> Sure, but my intention was to synchronize the `--raw` vs the `--patch`
>> output: the latter _already_ shows the correct hash. This here patch makes
>> the hash in the former's output match the latter's.
>
> That is shooting for a wrong uniformity and breaking consistency
> among the `--raw` modes.
> ...
>
> [Footnote]
>
> *1* This division of labor to free "--raw" mode of anything remotely
>     unnecessary stems from the original diff plumbing design in May
>     2005 where the "--raw" mode was the only output mode, and there
>     was a separate "git-diff-helper" (look for it in the mailing
>     list archive if you want to learn more) that reads a "--raw"
>     output and transforms it into the patch form.  That "once we
>     have the raw diff, we can pipe it to post-processing and do more
>     interesting things" eventually led to the design of the diffcore
>     pipeline where we match up (A)dded and (D)eleted entries to
>     detect renames, etc.

Having said all that.

If somebody wants to shift the burden of computing object names from
the consumers of "diff --raw" output to generators like diff-files
and diff-index, I do not mind if it is done under a new command line
option and done consistently for not just I-T-A additions, but
"modified but not added yet" and "not modified but stat information
is dirty" paths.  As I said, I would not recommend it, though.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Wed, 24 Jun 2020, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Sure, but my intention was to synchronize the `--raw` vs the `--patch`
> > output: the latter _already_ shows the correct hash. This here patch makes
> > the hash in the former's output match the latter's.
>
> That is shooting for a wrong uniformity and breaking consistency
> among the `--raw` modes.
>
>  $ git reset --hard
>  $ echo "/* end */" >cache.h ;# taint
>  $ git diff-files --raw
>  ... this shows (M)odified with 0{40} on the postimage
>  ... 0{40} for side that is known to have contents from low-level diff
>  ... means "object name unknown; figure it out yourself if you need it"
>  $ git update-index cache.h
>  $ git diff-files --raw
>  ... of course we see nothing here.  Wait for a bit.
>  $ touch cache.h ;# smudge
>  $ git diff-files --raw
>  ... this shows (M)odified with 0{40} on the postimage
>  ... again, it says "it is stat dirty so I do not bother to compute"
>  $ git update-index --refresh
>  $ git diff-files --raw
>  ... again we see nothing.
>
> Any tools that work on "--raw" output must be already prepared to
> see 0{40} on the side that is known to have contents and must know
> to grab the contents from the working tree file if they need them,
> so showing the 0{40} for i-t-a entry (whose definition is "the user
> said in the past that the final contents of the file will be added
> later, but Git does not know what object it will be yet") cannot
> break them.  And the behaviour of giving 0{40} in such a case aligns
> well with what is already done for paths already added to the index
> when Git does not have an already-computed object name handy.

Well, don't you know, I never realized that the hash shown by `git
diff-files --raw` for modified files was all-zero while `git diff-files
-p` showed the computed one matching the current worktree version!

> > Besides, we're talking about the post-image of `diff-files`, i.e. the
> > worktree version, here. I seem to remember that the pre-image already uses
> > the all-zero hash to indicate precisely what you mentioned above.
>
> The 0{40} you see for pre-image for (A)dded paths means a completely
> different thing from the 0{40} I have been explaining in the above,
> so that is not relevant here.
>
> By definition, there is *no* contents for the pre-image side of
> (A)dded paths (that is why I stressed the "side that must have
> contents" in the above description---it is determined by the type of
> the change), but because the format requires us to place some
> hexadecimal there, we fill the space with 0{40}.
>
> When we do not know the object name for the side that is known to
> have contents without performing extra computation (including "stat
> dirty so we cannot tell without rehashing"), we also use 0{40} as a
> signal to say "we do not know the actual contents", but the consumer
> of "--raw" format is expected to know the difference between "this
> side is known to have no data and 0{40} is here as filler" and "this
> side must have contents but we see 0{40} because Git does not have
> it handy in precomputed form".
>
> The above is the same for "diff-index --raw" without "--cached";
> when we have to hash before we can give the object name (e.g. the
> path is stat-dirty), we give 0{40} and let the consumer figure it
> out if it needs to.
>
>  $ git reset --hard
>  $ touch COPYING
>  $ git diff-index --raw HEAD
>  ... we see (M)odified with 0{40} on the right hand side.
>
> When the caller asks for "--patch" or any other output format that
> actually needs contents for output, however, these low-level tools
> do read the contents, and as a side effect, they may hash to obtain
> the object name and show it [*1*].
>
> By the way, as I do not want to see you waste your time going in a
> wrong direction just to be "different", let me make it clear that as
> far as the design of low level diff plumbing is concerned, what I
> said here is final.  Please don't waste your time on arguing for
> changing the design now after 15 years.  I want to see your time
> used in a more productive way for the project.

Thank you for patienty explaining to me something I managed to miss for a
decade and a half.

I'll send out v4 in a moment.

Ciao,
Dscho

>
> Thanks.
>
>
> [Footnote]
>
> *1* This division of labor to free "--raw" mode of anything remotely
>     unnecessary stems from the original diff plumbing design in May
>     2005 where the "--raw" mode was the only output mode, and there
>     was a separate "git-diff-helper" (look for it in the mailing
>     list archive if you want to learn more) that reads a "--raw"
>     output and transforms it into the patch form.  That "once we
>     have the raw diff, we can pipe it to post-processing and do more
>     interesting things" eventually led to the design of the diffcore
>     pipeline where we match up (A)dded and (D)eleted entries to
>     detect renames, etc.
>
>

diff-lib.c Outdated
@@ -217,10 +217,24 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
!is_null_oid(&ce->oid),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In c26022ea8f5 (diff: convert diff_addremove to struct object_id,
> 2017-05-30), the OID to use for intent-to-add files was inadvertently
> changed from the empty blob to the empty tree.

Well spotted.  Regardless of the other two patches, this is
absolutely the right thing to do.  It is quite embarrassing
that nobody noticed the typo so far.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 24, 2020

This branch is now known as js/diff-files-i-t-a-fix-for-difftool.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 24, 2020

This patch series was integrated into seen via git@559fb15.

@gitgitgadget gitgitgadget bot added the seen label Jun 24, 2020
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 24, 2020

This patch series was integrated into seen via git@36248c6.

diff-lib.c Outdated
@@ -217,6 +217,20 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
!is_null_oid(&ce->oid),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Srinidhi Kaushik wrote (reply to this):

Hi Johannes,

> The underlying problem is that some time ago, the (already incorrect)
> empty blob constant was replaced by the empty tree constant, by mistake. I
> contributed a patch series to fix that, and Cc:ed you you in v2 that I
> sent out earlier today.

Thanks for CC-ing me!

[...]

> +			} else if (ce_intent_to_add(ce) &&
> +				   !(revs->diffopt.output_format &
> +				     ~(DIFF_FORMAT_RAW | DIFF_FORMAT_NAME_STATUS))) {
> +				struct object_id oid;
> +				int ret = lstat(ce->name, &st);
> +
> +				if (ret < 0)
> +					oidclr(&oid);
> +				else
> +					ret = index_path(istate, &oid,
> +						 ce->name, &st, 0);
> +				diff_addremove(&revs->diffopt, '+', ce->ce_mode,
> +					       &oid, ret >= 0, ce->name, 0);
> +				continue;

Instead of showing the hash for empty blobs for all entries previously,
introducing this shows the hash of non-empty "i-t-a" files correctly.
Nice.

[...]

>  			} else if (revs->diffopt.ita_invisible_in_index &&
>  				   ce_intent_to_add(ce)) {
>  				diff_addremove(&revs->diffopt, '+', ce->ce_mode,
> -					       the_hash_algo->empty_tree, 0,
> +					       the_hash_algo->empty_blob, 0,
>  					       ce->name, 0);
>  				continue;
>  			}

Oh, I totally missed this in my patch; the change looks good!

> -  :000000 100644 0000000 $(git rev-parse --short $hash_t) A$(printf "\t")empty
> -  :000000 100644 0000000 $(git rev-parse --short $hash_t) A$(printf "\t")not-empty
> +  :000000 100644 0000000 $(git rev-parse --short $hash_e) A$(printf "\t")empty
> +  :000000 100644 0000000 $(git rev-parse --short $hash_n) A$(printf "\t")not-empty

Changing the test-case to reflect to the hash of the blob also makes
sense.


[...]

> > +     hash_e=$(git hash-object empty) &&
> > +     hash_n=$(git hash-object not-empty) &&
> > +     hash_t=$(git hash-object -t tree /dev/null) &&
>
> > So this is the hash of the empty tree object, and...

I guess we can get rid of the `hash_t' assignment here, because it
won't be used anywhere else in the test.

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Srinidhi,

On Wed, 24 Jun 2020, Srinidhi Kaushik wrote:

> > The underlying problem is that some time ago, the (already incorrect)
> > empty blob constant was replaced by the empty tree constant, by mistake. I
> > contributed a patch series to fix that, and Cc:ed you you in v2 that I
> > sent out earlier today.
>
> Thanks for CC-ing me!

Of course!

> > -  :000000 100644 0000000 $(git rev-parse --short $hash_t) A$(printf "\t")empty
> > -  :000000 100644 0000000 $(git rev-parse --short $hash_t) A$(printf "\t")not-empty
> > +  :000000 100644 0000000 $(git rev-parse --short $hash_e) A$(printf "\t")empty
> > +  :000000 100644 0000000 $(git rev-parse --short $hash_n) A$(printf "\t")not-empty
>
> Changing the test-case to reflect to the hash of the blob also makes
> sense.

Yes, that's the post-image side. The pre-image side is marked with 0000000
(which is Git's way to say "dunno! no current information in the index").

And it is reflecting the `--patch` case that is unfortunately not visible
in the diff context:

        cat >expect.diff_p <<-EOF &&
        diff --git a/empty b/empty
        new file mode 100644
        index 0000000..$(git rev-parse --short $hash_e)
        diff --git a/not-empty b/not-empty
        new file mode 100644
        index 0000000..$(git rev-parse --short $hash_n)
        --- /dev/null
        +++ b/not-empty
        @@ -0,0 +1 @@
        +$content
        EOF

(This comment is not so much directed at you as it really is a
continuation of my reply to Junio, see
https://lore.kernel.org/git/nycvar.QRO.7.76.6.2006241517320.54@tvgsbejvaqbjf.bet/)

> [...]
>
> > > +     hash_e=$(git hash-object empty) &&
> > > +     hash_n=$(git hash-object not-empty) &&
> > > +     hash_t=$(git hash-object -t tree /dev/null) &&
> >
> > > So this is the hash of the empty tree object, and...
>
> I guess we can get rid of the `hash_t' assignment here, because it
> won't be used anywhere else in the test.

Right! I _knew_ there was something I wanted to do, still, but then
forgot all about it...

Thanks,
Dscho

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > +  :000000 100644 0000000 $(git rev-parse --short $hash_e) A$(printf "\t")empty
>> > +  :000000 100644 0000000 $(git rev-parse --short $hash_n) A$(printf "\t")not-empty
>>
>> Changing the test-case to reflect to the hash of the blob also makes
>> sense.
>
> Yes, that's the post-image side. The pre-image side is marked with 0000000
> (which is Git's way to say "dunno! no current information in the index").

The post-image side for (A)dded and (M)odified can have 0{40} when
Git wants to say "we don't have it in precomputed form.  you can
grab the working tree files and figure it out, if you need it", as
you said, but the pre-image side for (A)dded entries are "we know
there is nothing because we are Adding, but we need to have
something here, so we have 0+ as a filler".

@dscho
Copy link
Member Author

dscho commented Jun 24, 2020

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 24, 2020

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 24, 2020

This patch series was integrated into seen via git@f7a89e4.

@dscho
Copy link
Member Author

dscho commented Jun 25, 2020

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 25, 2020

@@ -220,8 +220,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
} else if (revs->diffopt.ita_invisible_in_index &&
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> diff --git a/diff-lib.c b/diff-lib.c
> index 61812f48c2..25fd2dee19 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -220,8 +220,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  			} else if (revs->diffopt.ita_invisible_in_index &&
>  				   ce_intent_to_add(ce)) {
>  				diff_addremove(&revs->diffopt, '+', ce->ce_mode,
> -					       the_hash_algo->empty_tree, 0,
> -					       ce->name, 0);
> +					       &null_oid, 0, ce->name, 0);

This (even if the preimage were correctly using empty_blob) is *so*
simple a change that it is very surprising that the new test in
[2/2] passes without any other code change.

It means that difftool was written correctly in the first place,
right?

Nicely done.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,


On Thu, 25 Jun 2020, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > diff --git a/diff-lib.c b/diff-lib.c
> > index 61812f48c2..25fd2dee19 100644
> > --- a/diff-lib.c
> > +++ b/diff-lib.c
> > @@ -220,8 +220,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
> >  			} else if (revs->diffopt.ita_invisible_in_index &&
> >  				   ce_intent_to_add(ce)) {
> >  				diff_addremove(&revs->diffopt, '+', ce->ce_mode,
> > -					       the_hash_algo->empty_tree, 0,
> > -					       ce->name, 0);
> > +					       &null_oid, 0, ce->name, 0);
>
> This (even if the preimage were correctly using empty_blob) is *so*
> simple a change that it is very surprising that the new test in
> [2/2] passes without any other code change.
>
> It means that difftool was written correctly in the first place,
> right?

Well, it means that difftool does not even consume the OID. Sure, it
parses it, but then it ignores it. All it _really_ is interested in is
that status flag (`A`), so technically, my fix in 1/2 is not even needed
after `sk/diff-files-show-i-t-a-as-new` to let 2/2 pass.

Ciao,
Dscho

>
> Nicely done.
>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Thu, 25 Jun 2020, Junio C Hamano wrote:
>
>> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
>> writes:
>>
>> > diff --git a/diff-lib.c b/diff-lib.c
>> > index 61812f48c2..25fd2dee19 100644
>> > --- a/diff-lib.c
>> > +++ b/diff-lib.c
>> > @@ -220,8 +220,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>> >  			} else if (revs->diffopt.ita_invisible_in_index &&
>> >  				   ce_intent_to_add(ce)) {
>> >  				diff_addremove(&revs->diffopt, '+', ce->ce_mode,
>> > -					       the_hash_algo->empty_tree, 0,
>> > -					       ce->name, 0);
>> > +					       &null_oid, 0, ce->name, 0);
>>
>> This (even if the preimage were correctly using empty_blob) is *so*
>> simple a change that it is very surprising that the new test in
>> [2/2] passes without any other code change.
>>
>> It means that difftool was written correctly in the first place,
>> right?
>
> Well, it means that difftool does not even consume the OID. Sure, it
> parses it, but then it ignores it. All it _really_ is interested in is
> that status flag (`A`),

Ah, that's an ultimately defensive code.  No matter what is on the
right hand side of the raw output, as long as it knows that the
right hand side is a file on the working tree, it can safely ignore
the object name and directly go to the working tree file.

Nice ;-)

> so technically, my fix in 1/2 is not even needed
> after `sk/diff-files-show-i-t-a-as-new` to let 2/2 pass.

Sure, but I think it is the right thing to do nevertheless.  Giving
the object name for an empty blob would be wrong unless the working
tree file is known to be empty (and empty tree?  what were we even
thinking, gee...).

@@ -720,6 +720,14 @@ test_expect_success SYMLINKS 'difftool --dir-diff handles modified symlinks' '
test_cmp expect actual
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In https://github.com/git-for-windows/git/issues/2677, a `git difftool
> -d` problem was reported. The underlying cause was a bug in `git
> diff-files --raw` that we just fixed.

That leaves a gap between "there is some unspecified problem" and
"the problem was cased by such and such" that forces readers to
either know the problem at heart (may apply to you and me right now,
but I am not sure about me 3 months in the future) or go check the
external website.

Can we fill the gap by saying how seeing the object name of empty
blob (or worse, tree) instead of 0{40} made "difftool -d" upset?

Thanks.


> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  t/t7800-difftool.sh | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh
> index 29b92907e2..524f30f7dc 100755
> --- a/t/t7800-difftool.sh
> +++ b/t/t7800-difftool.sh
> @@ -720,6 +720,14 @@ test_expect_success SYMLINKS 'difftool --dir-diff handles modified symlinks' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'add -N and difftool -d' '
> +	test_when_finished git reset --hard &&
> +
> +	test_write_lines A B C >intent-to-add &&
> +	git add -N intent-to-add &&
> +	git difftool --dir-diff --extcmd ls
> +'
> +
>  test_expect_success 'outside worktree' '
>  	echo 1 >1 &&
>  	echo 2 >2 &&

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Thu, 25 Jun 2020, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > In https://github.com/git-for-windows/git/issues/2677, a `git difftool
> > -d` problem was reported. The underlying cause was a bug in `git
> > diff-files --raw` that we just fixed.
>
> That leaves a gap between "there is some unspecified problem" and
> "the problem was cased by such and such" that forces readers to
> either know the problem at heart (may apply to you and me right now,
> but I am not sure about me 3 months in the future) or go check the
> external website.
>
> Can we fill the gap by saying how seeing the object name of empty
> blob (or worse, tree) instead of 0{40} made "difftool -d" upset?

Sorry about catching this only now, after the commit hit `next`.

Filling the gap is a slightly more complicated.

And now that I looked at the code again, to make sure that I don't say
anything stupid, I realize that I just provided incorrect information in
my reply elsewhere in this thread: Srinidhi's fix is _not_ enough to fix
t7800 with this here patch.

Your guess was almost spot on: the empty blob would have worked (as in:
not caused an error, but it would have shown incorrect information). The
problem really is the attempt trying to read the empty tree as if it was a
blob. That results in something like this:

	error: unable to read sha1 file of /tmp/git-difftool.O8CoK9/right/intent-to-add (4b825dc642cb6eb9a060e54bf8d69288fbee4904)
	error: could not write 'intent-to-add'

And yes, it would have been good to adjust the commit message as you
suggested. Sorry for not getting to it in time before it hit `next`.

Do you want me to send out a v5 and drop v4 from `next` in favor of the
new iteration?

Ciao,
Dscho

>
> Thanks.
>
>
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  t/t7800-difftool.sh | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh
> > index 29b92907e2..524f30f7dc 100755
> > --- a/t/t7800-difftool.sh
> > +++ b/t/t7800-difftool.sh
> > @@ -720,6 +720,14 @@ test_expect_success SYMLINKS 'difftool --dir-diff handles modified symlinks' '
> >  	test_cmp expect actual
> >  '
> >
> > +test_expect_success 'add -N and difftool -d' '
> > +	test_when_finished git reset --hard &&
> > +
> > +	test_write_lines A B C >intent-to-add &&
> > +	git add -N intent-to-add &&
> > +	git difftool --dir-diff --extcmd ls
> > +'
> > +
> >  test_expect_success 'outside worktree' '
> >  	echo 1 >1 &&
> >  	echo 2 >2 &&
>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> Can we fill the gap by saying how seeing the object name of empty
>> blob (or worse, tree) instead of 0{40} made "difftool -d" upset?
>
> Sorry about catching this only now, after the commit hit `next`.
>
> Filling the gap is a slightly more complicated.
>
> And now that I looked at the code again, to make sure that I don't say
> anything stupid, I realize that I just provided incorrect information in
> my reply elsewhere in this thread: Srinidhi's fix is _not_ enough to fix
> t7800 with this here patch.
>
> Your guess was almost spot on: the empty blob would have worked (as in:
> not caused an error, but it would have shown incorrect information). The
> problem really is the attempt trying to read the empty tree as if it was a
> blob. That results in something like this:
>
> 	error: unable to read sha1 file of /tmp/git-difftool.O8CoK9/right/intent-to-add (4b825dc642cb6eb9a060e54bf8d69288fbee4904)
> 	error: could not write 'intent-to-add'
>
> And yes, it would have been good to adjust the commit message as you
> suggested. Sorry for not getting to it in time before it hit `next`.
>
> Do you want me to send out a v5 and drop v4 from `next` in favor of the
> new iteration?

That would help the future "us" quite a lot.  Thanks for carefully
thinking it through.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Wed, 1 Jul 2020, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> Can we fill the gap by saying how seeing the object name of empty
> >> blob (or worse, tree) instead of 0{40} made "difftool -d" upset?
> >
> > Sorry about catching this only now, after the commit hit `next`.
> >
> > Filling the gap is a slightly more complicated.
> >
> > And now that I looked at the code again, to make sure that I don't say
> > anything stupid, I realize that I just provided incorrect information in
> > my reply elsewhere in this thread: Srinidhi's fix is _not_ enough to fix
> > t7800 with this here patch.
> >
> > Your guess was almost spot on: the empty blob would have worked (as in:
> > not caused an error, but it would have shown incorrect information). The
> > problem really is the attempt trying to read the empty tree as if it was a
> > blob. That results in something like this:
> >
> > 	error: unable to read sha1 file of /tmp/git-difftool.O8CoK9/right/intent-to-add (4b825dc642cb6eb9a060e54bf8d69288fbee4904)
> > 	error: could not write 'intent-to-add'
> >
> > And yes, it would have been good to adjust the commit message as you
> > suggested. Sorry for not getting to it in time before it hit `next`.
> >
> > Do you want me to send out a v5 and drop v4 from `next` in favor of the
> > new iteration?
>
> That would help the future "us" quite a lot.  Thanks for carefully
> thinking it through.

You're welcome. Here goes v5:
https://lore.kernel.org/git/pull.654.v5.git.1593638347.gitgitgadget@gmail.com/

Ciao,
Dscho

@@ -220,8 +220,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
} else if (revs->diffopt.ita_invisible_in_index &&
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The documented behavior of `git diff-files --raw` is to display
>
> 	[...] 0{40} if creation, unmerged or "look at work tree".

"on the right hand (i.e. postimage) side" is missing, which is
crucial in this description.

> This happens for example when showing modified, unstaged files.

Modified I would understand.  We notice that the contents on the
work tree is different from what is in the index, and we tell the
consumer "look at work tree".  I do not think you meant "unstaged",
though.  If truly removed from the index, diff-files won't even know
about the path.  You probably meant "removed from the working tree",
but 0{40} in that case means totally different thing (i.e. it would
be a (D)eletion record, so 0{40} on the postimage side is a filler,
not even "look at work tree").  What would make more sense to
describe might be

	This happens for paths that are modified, or unmodified but
	stat-dirty.

Either case, we tell the consumer to check the "work tree".

> For intent-to-add files, we used to show the empty blob's hash instead.
> In c26022ea8f5 (diff: convert diff_addremove to struct object_id,
> 2017-05-30), we made that worse by inadvertently changing that to the
> hash of the empty tree.
>
> Let's make the behavior consistent with modified files by showing
> all-zero values also for intent-to-add files.

Well described.

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Thu, 25 Jun 2020, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > The documented behavior of `git diff-files --raw` is to display
> >
> > 	[...] 0{40} if creation, unmerged or "look at work tree".
>
> "on the right hand (i.e. postimage) side" is missing, which is
> crucial in this description.

True. Sorry for omitting that.

> > This happens for example when showing modified, unstaged files.
>
> Modified I would understand.  We notice that the contents on the
> work tree is different from what is in the index, and we tell the
> consumer "look at work tree".  I do not think you meant "unstaged",

Right. I probably should have written "[...] when showing files with
unstaged modifications".

> though.  If truly removed from the index, diff-files won't even know
> about the path.  You probably meant "removed from the working tree",
> but 0{40} in that case means totally different thing (i.e. it would
> be a (D)eletion record, so 0{40} on the postimage side is a filler,
> not even "look at work tree").  What would make more sense to
> describe might be
>
> 	This happens for paths that are modified, or unmodified but
> 	stat-dirty.

Yes, but that includes more than I wanted to describe because the
stat-dirty does not matter for intent-to-add files, and I wanted to point
out the analogy between unstaged modifications and intent-to-add-files.

I noticed that you merged this commit into `next` already, so I am
assuming that you do not want me to send a fifth iteration ;-)

Ciao,
Dscho

> Either case, we tell the consumer to check the "work tree".
>
> > For intent-to-add files, we used to show the empty blob's hash instead.
> > In c26022ea8f5 (diff: convert diff_addremove to struct object_id,
> > 2017-05-30), we made that worse by inadvertently changing that to the
> > hash of the empty tree.
> >
> > Let's make the behavior consistent with modified files by showing
> > all-zero values also for intent-to-add files.
>
> Well described.
>
> Thanks.
>

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 25, 2020

This patch series was integrated into seen via git@fea4f28.

@@ -220,8 +220,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
} else if (revs->diffopt.ita_invisible_in_index &&
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Srinidhi Kaushik wrote (reply to this):

Sorry for the late reply.

> Let's make the behavior consistent with modified files by showing
> all-zero values also for intent-to-add files.
>
-- >8 --
>
>                               diff_addremove(&revs->diffopt, '+', ce->ce_mode,
> -                                            the_hash_algo->empty_tree, 0,
> -                                            ce->name, 0);
> +                                            &null_oid, 0, ce->name, 0);
>                               continue;
>                       }
>
> diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
> index 8a5d55054f..cf0175ad6e 100755
> --- a/t/t2203-add-intent.sh
> +++ b/t/t2203-add-intent.sh
> @@ -240,7 +240,6 @@ test_expect_success 'i-t-a files shown as new for "diff", "diff-files"; not-new
>
>       hash_e=$(git hash-object empty) &&
>       hash_n=$(git hash-object not-empty) &&
> -     hash_t=$(git hash-object -t tree /dev/null) &&
>
>       cat >expect.diff_p <<-EOF &&
>       diff --git a/empty b/empty
> @@ -259,8 +258,8 @@ test_expect_success 'i-t-a files shown as new for "diff", "diff-files"; not-new
>        create mode 100644 not-empty
>       EOF
>       cat >expect.diff_a <<-EOF &&
> -     :000000 100644 0000000 $(git rev-parse --short $hash_t) A$(printf "\t")empty
> -     :000000 100644 0000000 $(git rev-parse --short $hash_t) A$(printf "\t")not-empty
> +     :000000 100644 0000000 0000000 A$(printf "\t")empty
> +     :000000 100644 0000000 0000000 A$(printf "\t")not-empty

This change (and the new test in [PATCH v4 2/2]) looks good to me.

I learnt quite a bit about what the string "0{40}" means in the context
of `diff'  post-image from the conversations around this patch. It was
very helpful.

Thank you.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 29, 2020

This patch series was integrated into seen via git@30064d9.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 29, 2020

This patch series was integrated into next via git@a9fcdab.

@gitgitgadget gitgitgadget bot added the next label Jun 29, 2020
dscho added 2 commits July 1, 2020 23:11
The documented behavior of `git diff-files --raw` is to display

	[...] 0{40} if creation, unmerged or "look at work tree".

on the right hand (i.e. postimage) side. This happens for files that
have unstaged modifications, and for files that are unmodified but
stat-dirty.

For intent-to-add files, we used to show the empty blob's hash instead.
In c26022e (diff: convert diff_addremove to struct object_id,
2017-05-30), we made that worse by inadvertently changing that to the
hash of the empty tree.

Let's make the behavior consistent with files that have unstaged
modifications (which applies to intent-to-add files, too) by showing
all-zero values also for intent-to-add files.

Accordingly, this patch adjusts the expectations set by the regression
test introduced in feea694 (diff-files: treat "i-t-a" files as
"not-in-index", 2020-06-20).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In git-for-windows#2677, a `git difftool
-d` problem was reported. The underlying cause was a bug in `git
diff-files --raw` that we just fixed: it reported intent-to-add files
with the empty _tree_ as the post-image OID, when we need to show
an all-zero (or, "null") OID instead, to indicate to the caller that
they have to look at the worktree file.

The symptom of that problem shown by `git difftool` was this:

	error: unable to read sha1 file of <path> (<empty-tree-OID>)
	error: could not write '<filename>'

Make sure that the reported `difftool` problem stays fixed.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho
Copy link
Member Author

dscho commented Jul 1, 2020

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 1, 2020

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 2, 2020

This patch series was integrated into seen via git@6387277.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 2, 2020

This patch series was integrated into next via git@cf5d186.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 7, 2020

This patch series was integrated into seen via git@0ac0947.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 7, 2020

This patch series was integrated into next via git@0ac0947.

@gitgitgadget
Copy link

gitgitgadget bot commented Jul 7, 2020

This patch series was integrated into master via git@0ac0947.

@gitgitgadget gitgitgadget bot added the master label Jul 7, 2020
@gitgitgadget gitgitgadget bot closed this Jul 7, 2020
@gitgitgadget
Copy link

gitgitgadget bot commented Jul 7, 2020

Closed via 0ac0947.

@dscho dscho deleted the difftool-ita branch July 8, 2020 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant