-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix difftool problem with intent-to-add files #654
Fix difftool problem with intent-to-add files #654
Conversation
/submit |
Submitted as pull.654.git.1591879139.gitgitgadget@gmail.com |
/submit |
Submitted as pull.654.v2.git.1592916485.gitgitgadget@gmail.com |
diff-lib.c
Outdated
@@ -217,6 +217,20 @@ int run_diff_files(struct rev_info *revs, unsigned int option) | |||
!is_null_oid(&ce->oid), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In `run_diff_files()`, files that have been staged with the intention to
> add are queued without a valid OID in the `diff_filepair`.
>
> When the output mode is, say, `DIFF_FORMAT_PATCH`, the
> `diff_fill_oid_info()` function, called from `run_diff()`, will remedy
> that situation by reading the file contents from disk.
The above is true. What do we do for a path that is actually added
to the index but is stat-dirty or actually modified in the working
tree when we are giving the raw output? Don't we give 0{40} to mean
"we dunno---you go look at the working tree"? I think we should do
the same for i-t-a file wrt the object name. In both cases, the
index does not know what the actual object name is, and we do not
want to run the index_path() and write out a new object in the
object database.
Using the status letter 'A' would also be appropriate, as we would
show "new file ..." in the --patch output in this case, which would
be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Junio,
On Tue, 23 Jun 2020, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > In `run_diff_files()`, files that have been staged with the intention to
> > add are queued without a valid OID in the `diff_filepair`.
> >
> > When the output mode is, say, `DIFF_FORMAT_PATCH`, the
> > `diff_fill_oid_info()` function, called from `run_diff()`, will remedy
> > that situation by reading the file contents from disk.
>
> The above is true. What do we do for a path that is actually added
> to the index but is stat-dirty or actually modified in the working
> tree when we are giving the raw output? Don't we give 0{40} to mean
> "we dunno---you go look at the working tree"? I think we should do
> the same for i-t-a file wrt the object name. In both cases, the
> index does not know what the actual object name is, and we do not
> want to run the index_path() and write out a new object in the
> object database.
Sure, but my intention was to synchronize the `--raw` vs the `--patch`
output: the latter _already_ shows the correct hash. This here patch makes
the hash in the former's output match the latter's.
Besides, we're talking about the post-image of `diff-files`, i.e. the
worktree version, here. I seem to remember that the pre-image already uses
the all-zero hash to indicate precisely what you mentioned above.
Ciao,
Dscho
> Using the status letter 'A' would also be appropriate, as we would
> show "new file ..." in the --patch output in this case, which would
> be consistent.
>
>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> Sure, but my intention was to synchronize the `--raw` vs the `--patch`
> output: the latter _already_ shows the correct hash. This here patch makes
> the hash in the former's output match the latter's.
That is shooting for a wrong uniformity and breaking consistency
among the `--raw` modes.
$ git reset --hard
$ echo "/* end */" >cache.h ;# taint
$ git diff-files --raw
... this shows (M)odified with 0{40} on the postimage
... 0{40} for side that is known to have contents from low-level diff
... means "object name unknown; figure it out yourself if you need it"
$ git update-index cache.h
$ git diff-files --raw
... of course we see nothing here. Wait for a bit.
$ touch cache.h ;# smudge
$ git diff-files --raw
... this shows (M)odified with 0{40} on the postimage
... again, it says "it is stat dirty so I do not bother to compute"
$ git update-index --refresh
$ git diff-files --raw
... again we see nothing.
Any tools that work on "--raw" output must be already prepared to
see 0{40} on the side that is known to have contents and must know
to grab the contents from the working tree file if they need them,
so showing the 0{40} for i-t-a entry (whose definition is "the user
said in the past that the final contents of the file will be added
later, but Git does not know what object it will be yet") cannot
break them. And the behaviour of giving 0{40} in such a case aligns
well with what is already done for paths already added to the index
when Git does not have an already-computed object name handy.
> Besides, we're talking about the post-image of `diff-files`, i.e. the
> worktree version, here. I seem to remember that the pre-image already uses
> the all-zero hash to indicate precisely what you mentioned above.
The 0{40} you see for pre-image for (A)dded paths means a completely
different thing from the 0{40} I have been explaining in the above,
so that is not relevant here.
By definition, there is *no* contents for the pre-image side of
(A)dded paths (that is why I stressed the "side that must have
contents" in the above description---it is determined by the type of
the change), but because the format requires us to place some
hexadecimal there, we fill the space with 0{40}.
When we do not know the object name for the side that is known to
have contents without performing extra computation (including "stat
dirty so we cannot tell without rehashing"), we also use 0{40} as a
signal to say "we do not know the actual contents", but the consumer
of "--raw" format is expected to know the difference between "this
side is known to have no data and 0{40} is here as filler" and "this
side must have contents but we see 0{40} because Git does not have
it handy in precomputed form".
The above is the same for "diff-index --raw" without "--cached";
when we have to hash before we can give the object name (e.g. the
path is stat-dirty), we give 0{40} and let the consumer figure it
out if it needs to.
$ git reset --hard
$ touch COPYING
$ git diff-index --raw HEAD
... we see (M)odified with 0{40} on the right hand side.
When the caller asks for "--patch" or any other output format that
actually needs contents for output, however, these low-level tools
do read the contents, and as a side effect, they may hash to obtain
the object name and show it [*1*].
By the way, as I do not want to see you waste your time going in a
wrong direction just to be "different", let me make it clear that as
far as the design of low level diff plumbing is concerned, what I
said here is final. Please don't waste your time on arguing for
changing the design now after 15 years. I want to see your time
used in a more productive way for the project.
Thanks.
[Footnote]
*1* This division of labor to free "--raw" mode of anything remotely
unnecessary stems from the original diff plumbing design in May
2005 where the "--raw" mode was the only output mode, and there
was a separate "git-diff-helper" (look for it in the mailing
list archive if you want to learn more) that reads a "--raw"
output and transforms it into the patch form. That "once we
have the raw diff, we can pipe it to post-processing and do more
interesting things" eventually led to the design of the diffcore
pipeline where we match up (A)dded and (D)eleted entries to
detect renames, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Junio C Hamano <gitster@pobox.com> writes:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> Sure, but my intention was to synchronize the `--raw` vs the `--patch`
>> output: the latter _already_ shows the correct hash. This here patch makes
>> the hash in the former's output match the latter's.
>
> That is shooting for a wrong uniformity and breaking consistency
> among the `--raw` modes.
> ...
>
> [Footnote]
>
> *1* This division of labor to free "--raw" mode of anything remotely
> unnecessary stems from the original diff plumbing design in May
> 2005 where the "--raw" mode was the only output mode, and there
> was a separate "git-diff-helper" (look for it in the mailing
> list archive if you want to learn more) that reads a "--raw"
> output and transforms it into the patch form. That "once we
> have the raw diff, we can pipe it to post-processing and do more
> interesting things" eventually led to the design of the diffcore
> pipeline where we match up (A)dded and (D)eleted entries to
> detect renames, etc.
Having said all that.
If somebody wants to shift the burden of computing object names from
the consumers of "diff --raw" output to generators like diff-files
and diff-index, I do not mind if it is done under a new command line
option and done consistently for not just I-T-A additions, but
"modified but not added yet" and "not modified but stat information
is dirty" paths. As I said, I would not recommend it, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Junio,
On Wed, 24 Jun 2020, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Sure, but my intention was to synchronize the `--raw` vs the `--patch`
> > output: the latter _already_ shows the correct hash. This here patch makes
> > the hash in the former's output match the latter's.
>
> That is shooting for a wrong uniformity and breaking consistency
> among the `--raw` modes.
>
> $ git reset --hard
> $ echo "/* end */" >cache.h ;# taint
> $ git diff-files --raw
> ... this shows (M)odified with 0{40} on the postimage
> ... 0{40} for side that is known to have contents from low-level diff
> ... means "object name unknown; figure it out yourself if you need it"
> $ git update-index cache.h
> $ git diff-files --raw
> ... of course we see nothing here. Wait for a bit.
> $ touch cache.h ;# smudge
> $ git diff-files --raw
> ... this shows (M)odified with 0{40} on the postimage
> ... again, it says "it is stat dirty so I do not bother to compute"
> $ git update-index --refresh
> $ git diff-files --raw
> ... again we see nothing.
>
> Any tools that work on "--raw" output must be already prepared to
> see 0{40} on the side that is known to have contents and must know
> to grab the contents from the working tree file if they need them,
> so showing the 0{40} for i-t-a entry (whose definition is "the user
> said in the past that the final contents of the file will be added
> later, but Git does not know what object it will be yet") cannot
> break them. And the behaviour of giving 0{40} in such a case aligns
> well with what is already done for paths already added to the index
> when Git does not have an already-computed object name handy.
Well, don't you know, I never realized that the hash shown by `git
diff-files --raw` for modified files was all-zero while `git diff-files
-p` showed the computed one matching the current worktree version!
> > Besides, we're talking about the post-image of `diff-files`, i.e. the
> > worktree version, here. I seem to remember that the pre-image already uses
> > the all-zero hash to indicate precisely what you mentioned above.
>
> The 0{40} you see for pre-image for (A)dded paths means a completely
> different thing from the 0{40} I have been explaining in the above,
> so that is not relevant here.
>
> By definition, there is *no* contents for the pre-image side of
> (A)dded paths (that is why I stressed the "side that must have
> contents" in the above description---it is determined by the type of
> the change), but because the format requires us to place some
> hexadecimal there, we fill the space with 0{40}.
>
> When we do not know the object name for the side that is known to
> have contents without performing extra computation (including "stat
> dirty so we cannot tell without rehashing"), we also use 0{40} as a
> signal to say "we do not know the actual contents", but the consumer
> of "--raw" format is expected to know the difference between "this
> side is known to have no data and 0{40} is here as filler" and "this
> side must have contents but we see 0{40} because Git does not have
> it handy in precomputed form".
>
> The above is the same for "diff-index --raw" without "--cached";
> when we have to hash before we can give the object name (e.g. the
> path is stat-dirty), we give 0{40} and let the consumer figure it
> out if it needs to.
>
> $ git reset --hard
> $ touch COPYING
> $ git diff-index --raw HEAD
> ... we see (M)odified with 0{40} on the right hand side.
>
> When the caller asks for "--patch" or any other output format that
> actually needs contents for output, however, these low-level tools
> do read the contents, and as a side effect, they may hash to obtain
> the object name and show it [*1*].
>
> By the way, as I do not want to see you waste your time going in a
> wrong direction just to be "different", let me make it clear that as
> far as the design of low level diff plumbing is concerned, what I
> said here is final. Please don't waste your time on arguing for
> changing the design now after 15 years. I want to see your time
> used in a more productive way for the project.
Thank you for patienty explaining to me something I managed to miss for a
decade and a half.
I'll send out v4 in a moment.
Ciao,
Dscho
>
> Thanks.
>
>
> [Footnote]
>
> *1* This division of labor to free "--raw" mode of anything remotely
> unnecessary stems from the original diff plumbing design in May
> 2005 where the "--raw" mode was the only output mode, and there
> was a separate "git-diff-helper" (look for it in the mailing
> list archive if you want to learn more) that reads a "--raw"
> output and transforms it into the patch form. That "once we
> have the raw diff, we can pipe it to post-processing and do more
> interesting things" eventually led to the design of the diffcore
> pipeline where we match up (A)dded and (D)eleted entries to
> detect renames, etc.
>
>
diff-lib.c
Outdated
@@ -217,10 +217,24 @@ int run_diff_files(struct rev_info *revs, unsigned int option) | |||
!is_null_oid(&ce->oid), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In c26022ea8f5 (diff: convert diff_addremove to struct object_id,
> 2017-05-30), the OID to use for intent-to-add files was inadvertently
> changed from the empty blob to the empty tree.
Well spotted. Regardless of the other two patches, this is
absolutely the right thing to do. It is quite embarrassing
that nobody noticed the typo so far.
This branch is now known as |
This patch series was integrated into seen via git@559fb15. |
This patch series was integrated into seen via git@36248c6. |
diff-lib.c
Outdated
@@ -217,6 +217,20 @@ int run_diff_files(struct rev_info *revs, unsigned int option) | |||
!is_null_oid(&ce->oid), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Srinidhi Kaushik wrote (reply to this):
Hi Johannes,
> The underlying problem is that some time ago, the (already incorrect)
> empty blob constant was replaced by the empty tree constant, by mistake. I
> contributed a patch series to fix that, and Cc:ed you you in v2 that I
> sent out earlier today.
Thanks for CC-ing me!
[...]
> + } else if (ce_intent_to_add(ce) &&
> + !(revs->diffopt.output_format &
> + ~(DIFF_FORMAT_RAW | DIFF_FORMAT_NAME_STATUS))) {
> + struct object_id oid;
> + int ret = lstat(ce->name, &st);
> +
> + if (ret < 0)
> + oidclr(&oid);
> + else
> + ret = index_path(istate, &oid,
> + ce->name, &st, 0);
> + diff_addremove(&revs->diffopt, '+', ce->ce_mode,
> + &oid, ret >= 0, ce->name, 0);
> + continue;
Instead of showing the hash for empty blobs for all entries previously,
introducing this shows the hash of non-empty "i-t-a" files correctly.
Nice.
[...]
> } else if (revs->diffopt.ita_invisible_in_index &&
> ce_intent_to_add(ce)) {
> diff_addremove(&revs->diffopt, '+', ce->ce_mode,
> - the_hash_algo->empty_tree, 0,
> + the_hash_algo->empty_blob, 0,
> ce->name, 0);
> continue;
> }
Oh, I totally missed this in my patch; the change looks good!
> - :000000 100644 0000000 $(git rev-parse --short $hash_t) A$(printf "\t")empty
> - :000000 100644 0000000 $(git rev-parse --short $hash_t) A$(printf "\t")not-empty
> + :000000 100644 0000000 $(git rev-parse --short $hash_e) A$(printf "\t")empty
> + :000000 100644 0000000 $(git rev-parse --short $hash_n) A$(printf "\t")not-empty
Changing the test-case to reflect to the hash of the blob also makes
sense.
[...]
> > + hash_e=$(git hash-object empty) &&
> > + hash_n=$(git hash-object not-empty) &&
> > + hash_t=$(git hash-object -t tree /dev/null) &&
>
> > So this is the hash of the empty tree object, and...
I guess we can get rid of the `hash_t' assignment here, because it
won't be used anywhere else in the test.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Srinidhi,
On Wed, 24 Jun 2020, Srinidhi Kaushik wrote:
> > The underlying problem is that some time ago, the (already incorrect)
> > empty blob constant was replaced by the empty tree constant, by mistake. I
> > contributed a patch series to fix that, and Cc:ed you you in v2 that I
> > sent out earlier today.
>
> Thanks for CC-ing me!
Of course!
> > - :000000 100644 0000000 $(git rev-parse --short $hash_t) A$(printf "\t")empty
> > - :000000 100644 0000000 $(git rev-parse --short $hash_t) A$(printf "\t")not-empty
> > + :000000 100644 0000000 $(git rev-parse --short $hash_e) A$(printf "\t")empty
> > + :000000 100644 0000000 $(git rev-parse --short $hash_n) A$(printf "\t")not-empty
>
> Changing the test-case to reflect to the hash of the blob also makes
> sense.
Yes, that's the post-image side. The pre-image side is marked with 0000000
(which is Git's way to say "dunno! no current information in the index").
And it is reflecting the `--patch` case that is unfortunately not visible
in the diff context:
cat >expect.diff_p <<-EOF &&
diff --git a/empty b/empty
new file mode 100644
index 0000000..$(git rev-parse --short $hash_e)
diff --git a/not-empty b/not-empty
new file mode 100644
index 0000000..$(git rev-parse --short $hash_n)
--- /dev/null
+++ b/not-empty
@@ -0,0 +1 @@
+$content
EOF
(This comment is not so much directed at you as it really is a
continuation of my reply to Junio, see
https://lore.kernel.org/git/nycvar.QRO.7.76.6.2006241517320.54@tvgsbejvaqbjf.bet/)
> [...]
>
> > > + hash_e=$(git hash-object empty) &&
> > > + hash_n=$(git hash-object not-empty) &&
> > > + hash_t=$(git hash-object -t tree /dev/null) &&
> >
> > > So this is the hash of the empty tree object, and...
>
> I guess we can get rid of the `hash_t' assignment here, because it
> won't be used anywhere else in the test.
Right! I _knew_ there was something I wanted to do, still, but then
forgot all about it...
Thanks,
Dscho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>> > + :000000 100644 0000000 $(git rev-parse --short $hash_e) A$(printf "\t")empty
>> > + :000000 100644 0000000 $(git rev-parse --short $hash_n) A$(printf "\t")not-empty
>>
>> Changing the test-case to reflect to the hash of the blob also makes
>> sense.
>
> Yes, that's the post-image side. The pre-image side is marked with 0000000
> (which is Git's way to say "dunno! no current information in the index").
The post-image side for (A)dded and (M)odified can have 0{40} when
Git wants to say "we don't have it in precomputed form. you can
grab the working tree files and figure it out, if you need it", as
you said, but the pre-image side for (A)dded entries are "we know
there is nothing because we are Adding, but we need to have
something here, so we have 0+ as a filler".
/submit |
Submitted as pull.654.v3.git.1593010066.gitgitgadget@gmail.com |
This patch series was integrated into seen via git@f7a89e4. |
/submit |
Submitted as pull.654.v4.git.1593107621.gitgitgadget@gmail.com |
@@ -220,8 +220,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option) | |||
} else if (revs->diffopt.ita_invisible_in_index && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> diff --git a/diff-lib.c b/diff-lib.c
> index 61812f48c2..25fd2dee19 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -220,8 +220,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
> } else if (revs->diffopt.ita_invisible_in_index &&
> ce_intent_to_add(ce)) {
> diff_addremove(&revs->diffopt, '+', ce->ce_mode,
> - the_hash_algo->empty_tree, 0,
> - ce->name, 0);
> + &null_oid, 0, ce->name, 0);
This (even if the preimage were correctly using empty_blob) is *so*
simple a change that it is very surprising that the new test in
[2/2] passes without any other code change.
It means that difftool was written correctly in the first place,
right?
Nicely done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Junio,
On Thu, 25 Jun 2020, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > diff --git a/diff-lib.c b/diff-lib.c
> > index 61812f48c2..25fd2dee19 100644
> > --- a/diff-lib.c
> > +++ b/diff-lib.c
> > @@ -220,8 +220,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
> > } else if (revs->diffopt.ita_invisible_in_index &&
> > ce_intent_to_add(ce)) {
> > diff_addremove(&revs->diffopt, '+', ce->ce_mode,
> > - the_hash_algo->empty_tree, 0,
> > - ce->name, 0);
> > + &null_oid, 0, ce->name, 0);
>
> This (even if the preimage were correctly using empty_blob) is *so*
> simple a change that it is very surprising that the new test in
> [2/2] passes without any other code change.
>
> It means that difftool was written correctly in the first place,
> right?
Well, it means that difftool does not even consume the OID. Sure, it
parses it, but then it ignores it. All it _really_ is interested in is
that status flag (`A`), so technically, my fix in 1/2 is not even needed
after `sk/diff-files-show-i-t-a-as-new` to let 2/2 pass.
Ciao,
Dscho
>
> Nicely done.
>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> On Thu, 25 Jun 2020, Junio C Hamano wrote:
>
>> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
>> writes:
>>
>> > diff --git a/diff-lib.c b/diff-lib.c
>> > index 61812f48c2..25fd2dee19 100644
>> > --- a/diff-lib.c
>> > +++ b/diff-lib.c
>> > @@ -220,8 +220,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>> > } else if (revs->diffopt.ita_invisible_in_index &&
>> > ce_intent_to_add(ce)) {
>> > diff_addremove(&revs->diffopt, '+', ce->ce_mode,
>> > - the_hash_algo->empty_tree, 0,
>> > - ce->name, 0);
>> > + &null_oid, 0, ce->name, 0);
>>
>> This (even if the preimage were correctly using empty_blob) is *so*
>> simple a change that it is very surprising that the new test in
>> [2/2] passes without any other code change.
>>
>> It means that difftool was written correctly in the first place,
>> right?
>
> Well, it means that difftool does not even consume the OID. Sure, it
> parses it, but then it ignores it. All it _really_ is interested in is
> that status flag (`A`),
Ah, that's an ultimately defensive code. No matter what is on the
right hand side of the raw output, as long as it knows that the
right hand side is a file on the working tree, it can safely ignore
the object name and directly go to the working tree file.
Nice ;-)
> so technically, my fix in 1/2 is not even needed
> after `sk/diff-files-show-i-t-a-as-new` to let 2/2 pass.
Sure, but I think it is the right thing to do nevertheless. Giving
the object name for an empty blob would be wrong unless the working
tree file is known to be empty (and empty tree? what were we even
thinking, gee...).
@@ -720,6 +720,14 @@ test_expect_success SYMLINKS 'difftool --dir-diff handles modified symlinks' ' | |||
test_cmp expect actual |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In https://github.com/git-for-windows/git/issues/2677, a `git difftool
> -d` problem was reported. The underlying cause was a bug in `git
> diff-files --raw` that we just fixed.
That leaves a gap between "there is some unspecified problem" and
"the problem was cased by such and such" that forces readers to
either know the problem at heart (may apply to you and me right now,
but I am not sure about me 3 months in the future) or go check the
external website.
Can we fill the gap by saying how seeing the object name of empty
blob (or worse, tree) instead of 0{40} made "difftool -d" upset?
Thanks.
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> t/t7800-difftool.sh | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh
> index 29b92907e2..524f30f7dc 100755
> --- a/t/t7800-difftool.sh
> +++ b/t/t7800-difftool.sh
> @@ -720,6 +720,14 @@ test_expect_success SYMLINKS 'difftool --dir-diff handles modified symlinks' '
> test_cmp expect actual
> '
>
> +test_expect_success 'add -N and difftool -d' '
> + test_when_finished git reset --hard &&
> +
> + test_write_lines A B C >intent-to-add &&
> + git add -N intent-to-add &&
> + git difftool --dir-diff --extcmd ls
> +'
> +
> test_expect_success 'outside worktree' '
> echo 1 >1 &&
> echo 2 >2 &&
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Junio,
On Thu, 25 Jun 2020, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > In https://github.com/git-for-windows/git/issues/2677, a `git difftool
> > -d` problem was reported. The underlying cause was a bug in `git
> > diff-files --raw` that we just fixed.
>
> That leaves a gap between "there is some unspecified problem" and
> "the problem was cased by such and such" that forces readers to
> either know the problem at heart (may apply to you and me right now,
> but I am not sure about me 3 months in the future) or go check the
> external website.
>
> Can we fill the gap by saying how seeing the object name of empty
> blob (or worse, tree) instead of 0{40} made "difftool -d" upset?
Sorry about catching this only now, after the commit hit `next`.
Filling the gap is a slightly more complicated.
And now that I looked at the code again, to make sure that I don't say
anything stupid, I realize that I just provided incorrect information in
my reply elsewhere in this thread: Srinidhi's fix is _not_ enough to fix
t7800 with this here patch.
Your guess was almost spot on: the empty blob would have worked (as in:
not caused an error, but it would have shown incorrect information). The
problem really is the attempt trying to read the empty tree as if it was a
blob. That results in something like this:
error: unable to read sha1 file of /tmp/git-difftool.O8CoK9/right/intent-to-add (4b825dc642cb6eb9a060e54bf8d69288fbee4904)
error: could not write 'intent-to-add'
And yes, it would have been good to adjust the commit message as you
suggested. Sorry for not getting to it in time before it hit `next`.
Do you want me to send out a v5 and drop v4 from `next` in favor of the
new iteration?
Ciao,
Dscho
>
> Thanks.
>
>
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> > t/t7800-difftool.sh | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh
> > index 29b92907e2..524f30f7dc 100755
> > --- a/t/t7800-difftool.sh
> > +++ b/t/t7800-difftool.sh
> > @@ -720,6 +720,14 @@ test_expect_success SYMLINKS 'difftool --dir-diff handles modified symlinks' '
> > test_cmp expect actual
> > '
> >
> > +test_expect_success 'add -N and difftool -d' '
> > + test_when_finished git reset --hard &&
> > +
> > + test_write_lines A B C >intent-to-add &&
> > + git add -N intent-to-add &&
> > + git difftool --dir-diff --extcmd ls
> > +'
> > +
> > test_expect_success 'outside worktree' '
> > echo 1 >1 &&
> > echo 2 >2 &&
>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>> Can we fill the gap by saying how seeing the object name of empty
>> blob (or worse, tree) instead of 0{40} made "difftool -d" upset?
>
> Sorry about catching this only now, after the commit hit `next`.
>
> Filling the gap is a slightly more complicated.
>
> And now that I looked at the code again, to make sure that I don't say
> anything stupid, I realize that I just provided incorrect information in
> my reply elsewhere in this thread: Srinidhi's fix is _not_ enough to fix
> t7800 with this here patch.
>
> Your guess was almost spot on: the empty blob would have worked (as in:
> not caused an error, but it would have shown incorrect information). The
> problem really is the attempt trying to read the empty tree as if it was a
> blob. That results in something like this:
>
> error: unable to read sha1 file of /tmp/git-difftool.O8CoK9/right/intent-to-add (4b825dc642cb6eb9a060e54bf8d69288fbee4904)
> error: could not write 'intent-to-add'
>
> And yes, it would have been good to adjust the commit message as you
> suggested. Sorry for not getting to it in time before it hit `next`.
>
> Do you want me to send out a v5 and drop v4 from `next` in favor of the
> new iteration?
That would help the future "us" quite a lot. Thanks for carefully
thinking it through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Junio,
On Wed, 1 Jul 2020, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> Can we fill the gap by saying how seeing the object name of empty
> >> blob (or worse, tree) instead of 0{40} made "difftool -d" upset?
> >
> > Sorry about catching this only now, after the commit hit `next`.
> >
> > Filling the gap is a slightly more complicated.
> >
> > And now that I looked at the code again, to make sure that I don't say
> > anything stupid, I realize that I just provided incorrect information in
> > my reply elsewhere in this thread: Srinidhi's fix is _not_ enough to fix
> > t7800 with this here patch.
> >
> > Your guess was almost spot on: the empty blob would have worked (as in:
> > not caused an error, but it would have shown incorrect information). The
> > problem really is the attempt trying to read the empty tree as if it was a
> > blob. That results in something like this:
> >
> > error: unable to read sha1 file of /tmp/git-difftool.O8CoK9/right/intent-to-add (4b825dc642cb6eb9a060e54bf8d69288fbee4904)
> > error: could not write 'intent-to-add'
> >
> > And yes, it would have been good to adjust the commit message as you
> > suggested. Sorry for not getting to it in time before it hit `next`.
> >
> > Do you want me to send out a v5 and drop v4 from `next` in favor of the
> > new iteration?
>
> That would help the future "us" quite a lot. Thanks for carefully
> thinking it through.
You're welcome. Here goes v5:
https://lore.kernel.org/git/pull.654.v5.git.1593638347.gitgitgadget@gmail.com/
Ciao,
Dscho
@@ -220,8 +220,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option) | |||
} else if (revs->diffopt.ita_invisible_in_index && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The documented behavior of `git diff-files --raw` is to display
>
> [...] 0{40} if creation, unmerged or "look at work tree".
"on the right hand (i.e. postimage) side" is missing, which is
crucial in this description.
> This happens for example when showing modified, unstaged files.
Modified I would understand. We notice that the contents on the
work tree is different from what is in the index, and we tell the
consumer "look at work tree". I do not think you meant "unstaged",
though. If truly removed from the index, diff-files won't even know
about the path. You probably meant "removed from the working tree",
but 0{40} in that case means totally different thing (i.e. it would
be a (D)eletion record, so 0{40} on the postimage side is a filler,
not even "look at work tree"). What would make more sense to
describe might be
This happens for paths that are modified, or unmodified but
stat-dirty.
Either case, we tell the consumer to check the "work tree".
> For intent-to-add files, we used to show the empty blob's hash instead.
> In c26022ea8f5 (diff: convert diff_addremove to struct object_id,
> 2017-05-30), we made that worse by inadvertently changing that to the
> hash of the empty tree.
>
> Let's make the behavior consistent with modified files by showing
> all-zero values also for intent-to-add files.
Well described.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Junio,
On Thu, 25 Jun 2020, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > The documented behavior of `git diff-files --raw` is to display
> >
> > [...] 0{40} if creation, unmerged or "look at work tree".
>
> "on the right hand (i.e. postimage) side" is missing, which is
> crucial in this description.
True. Sorry for omitting that.
> > This happens for example when showing modified, unstaged files.
>
> Modified I would understand. We notice that the contents on the
> work tree is different from what is in the index, and we tell the
> consumer "look at work tree". I do not think you meant "unstaged",
Right. I probably should have written "[...] when showing files with
unstaged modifications".
> though. If truly removed from the index, diff-files won't even know
> about the path. You probably meant "removed from the working tree",
> but 0{40} in that case means totally different thing (i.e. it would
> be a (D)eletion record, so 0{40} on the postimage side is a filler,
> not even "look at work tree"). What would make more sense to
> describe might be
>
> This happens for paths that are modified, or unmodified but
> stat-dirty.
Yes, but that includes more than I wanted to describe because the
stat-dirty does not matter for intent-to-add files, and I wanted to point
out the analogy between unstaged modifications and intent-to-add-files.
I noticed that you merged this commit into `next` already, so I am
assuming that you do not want me to send a fifth iteration ;-)
Ciao,
Dscho
> Either case, we tell the consumer to check the "work tree".
>
> > For intent-to-add files, we used to show the empty blob's hash instead.
> > In c26022ea8f5 (diff: convert diff_addremove to struct object_id,
> > 2017-05-30), we made that worse by inadvertently changing that to the
> > hash of the empty tree.
> >
> > Let's make the behavior consistent with modified files by showing
> > all-zero values also for intent-to-add files.
>
> Well described.
>
> Thanks.
>
This patch series was integrated into seen via git@fea4f28. |
@@ -220,8 +220,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option) | |||
} else if (revs->diffopt.ita_invisible_in_index && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Srinidhi Kaushik wrote (reply to this):
Sorry for the late reply.
> Let's make the behavior consistent with modified files by showing
> all-zero values also for intent-to-add files.
>
-- >8 --
>
> diff_addremove(&revs->diffopt, '+', ce->ce_mode,
> - the_hash_algo->empty_tree, 0,
> - ce->name, 0);
> + &null_oid, 0, ce->name, 0);
> continue;
> }
>
> diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
> index 8a5d55054f..cf0175ad6e 100755
> --- a/t/t2203-add-intent.sh
> +++ b/t/t2203-add-intent.sh
> @@ -240,7 +240,6 @@ test_expect_success 'i-t-a files shown as new for "diff", "diff-files"; not-new
>
> hash_e=$(git hash-object empty) &&
> hash_n=$(git hash-object not-empty) &&
> - hash_t=$(git hash-object -t tree /dev/null) &&
>
> cat >expect.diff_p <<-EOF &&
> diff --git a/empty b/empty
> @@ -259,8 +258,8 @@ test_expect_success 'i-t-a files shown as new for "diff", "diff-files"; not-new
> create mode 100644 not-empty
> EOF
> cat >expect.diff_a <<-EOF &&
> - :000000 100644 0000000 $(git rev-parse --short $hash_t) A$(printf "\t")empty
> - :000000 100644 0000000 $(git rev-parse --short $hash_t) A$(printf "\t")not-empty
> + :000000 100644 0000000 0000000 A$(printf "\t")empty
> + :000000 100644 0000000 0000000 A$(printf "\t")not-empty
This change (and the new test in [PATCH v4 2/2]) looks good to me.
I learnt quite a bit about what the string "0{40}" means in the context
of `diff' post-image from the conversations around this patch. It was
very helpful.
Thank you.
This patch series was integrated into seen via git@30064d9. |
This patch series was integrated into next via git@a9fcdab. |
The documented behavior of `git diff-files --raw` is to display [...] 0{40} if creation, unmerged or "look at work tree". on the right hand (i.e. postimage) side. This happens for files that have unstaged modifications, and for files that are unmodified but stat-dirty. For intent-to-add files, we used to show the empty blob's hash instead. In c26022e (diff: convert diff_addremove to struct object_id, 2017-05-30), we made that worse by inadvertently changing that to the hash of the empty tree. Let's make the behavior consistent with files that have unstaged modifications (which applies to intent-to-add files, too) by showing all-zero values also for intent-to-add files. Accordingly, this patch adjusts the expectations set by the regression test introduced in feea694 (diff-files: treat "i-t-a" files as "not-in-index", 2020-06-20). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In git-for-windows#2677, a `git difftool -d` problem was reported. The underlying cause was a bug in `git diff-files --raw` that we just fixed: it reported intent-to-add files with the empty _tree_ as the post-image OID, when we need to show an all-zero (or, "null") OID instead, to indicate to the caller that they have to look at the worktree file. The symptom of that problem shown by `git difftool` was this: error: unable to read sha1 file of <path> (<empty-tree-OID>) error: could not write '<filename>' Make sure that the reported `difftool` problem stays fixed. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
/submit |
Submitted as pull.654.v5.git.1593638347.gitgitgadget@gmail.com |
This patch series was integrated into seen via git@6387277. |
This patch series was integrated into next via git@cf5d186. |
This patch series was integrated into seen via git@0ac0947. |
This patch series was integrated into next via git@0ac0947. |
This patch series was integrated into master via git@0ac0947. |
Closed via 0ac0947. |
This problem was reported in git-for-windows#2677, but the problem actually lies with
git diff --raw
, and it seems that the bug has been with us ever since--intent-to-add
was introduced.Changes since v4:
Changes since v3:
Instead of showing the same OID in
git diff-files --raw
as ingit diff-files -p
for intent-to-add files, we now imitate the logic for modified (non-intent-to-add) files: we show the "null" OID (i.e. all-zero).As a consequence, we no longer just undo the inadvertent change where intent-to-add files were reported with the empty tree instead of the empty blob, but we fix the bug that has been with us since
run_diff_files()
was taught about intent-to-add files, i.e. we fix the original bug of 425a28e (diff-lib: allow ita entries treated as "not yet exist in index", 2016-10-24), at long last.Changes since v2:
hash_t
in t2203.Changes since v1:
sk/diff-files-show-i-t-a-as-new
.sk/diff-files-show-i-t-a-as-new
does not completely resolve the issue (the--raw
output still claims the empty blob as the post-image, although thedifftool
symptom "went away").sk/diff-files-show-i-t-a-as-new
: it expected the raw diff to contain the hash of the empty tree object (which is incorrect no matter how you turn it: any hash in any raw diff should refer to blob objects).Cc: Srinidhi Kaushik shrinidhi.kaushik@gmail.com