Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix yet another bug with diff file names #12771

Merged

Conversation

zeripath
Copy link
Contributor

@zeripath zeripath commented Sep 8, 2020

Following further testing it has become apparent that the diff line
cannot be used to determine filenames for diffs with any sort of predictability
the answer therefore is to use the other lines that are provided with a diff

Fix #12768
Closes #12769
Replaces #12769

Signed-off-by: Andrew Thornton art27@cantab.net

Following further testing it has become apparent that the diff line
cannot be used to determine filenames for diffs with any sort of predictability
the answer therefore is to use the other lines that are provided with a diff

Fix go-gitea#12768

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath zeripath added this to the 1.13.0 milestone Sep 8, 2020
@codecov-commenter
Copy link

codecov-commenter commented Sep 8, 2020

Codecov Report

Merging #12771 into master will increase coverage by 0.02%.
The diff coverage is 69.76%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #12771      +/-   ##
==========================================
+ Coverage   43.21%   43.24%   +0.02%     
==========================================
  Files         649      649              
  Lines       72000    72046      +46     
==========================================
+ Hits        31117    31156      +39     
- Misses      35839    35844       +5     
- Partials     5044     5046       +2     
Impacted Files Coverage Δ
services/gitdiff/gitdiff.go 72.85% <69.41%> (-1.70%) ⬇️
modules/templates/helper.go 47.04% <100.00%> (ø)
modules/process/manager.go 72.50% <0.00%> (-2.50%) ⬇️
models/gpg_key.go 54.81% <0.00%> (-0.59%) ⬇️
services/pull/pull.go 41.57% <0.00%> (-0.47%) ⬇️
models/issue_comment.go 53.75% <0.00%> (-0.16%) ⬇️
modules/git/commit.go 54.72% <0.00%> (+0.67%) ⬆️
modules/log/event.go 59.43% <0.00%> (+1.88%) ⬆️
modules/queue/workerpool.go 60.81% <0.00%> (+2.04%) ⬆️
modules/git/tree_blob.go 88.23% <0.00%> (+2.94%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1fbc50f...d8b9d7c. Read the comment docs.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Sep 8, 2020
@fashberg
Copy link
Contributor

fashberg commented Sep 8, 2020

@zeripath

zeripath/gitea/tree/fix-12768-I-hate-our-diff-infrastructure 😁

Haha, yes, i was very surprised about the buggy function of getting the filenames from diff, one of the most basic functionality of a git-gui.

Your current approach look great.

You must not forget to trim the tab-character at the end of the filename in the +++/--- lines, it results currently in broken links at "View file" .

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Sep 8, 2020
@lunny
Copy link
Member

lunny commented Sep 9, 2020

@zeripath Could you add some tests from #12768?

@zeripath
Copy link
Contributor Author

zeripath commented Sep 9, 2020

@zeripath

You must not forget to trim the tab-character at the end of the filename in the +++/--- lines, it results currently in broken links at "View file" .

What do you mean tab character?

$ git diff master | grep '^+++' | xxd 

00000000: 2b2b 2b20 622f 6d6f 6475 6c65 732f 7465  +++ b/modules/te
00000010: 6d70 6c61 7465 732f 6865 6c70 6572 2e67  mplates/helper.g
00000020: 6f0a 2b2b 2b20 622f 7365 7276 6963 6573  o.+++ b/services
00000030: 2f67 6974 6469 6666 2f67 6974 6469 6666  /gitdiff/gitdiff
00000040: 2e67 6f0a                                .go.

There is no tab character.

@zeripath
Copy link
Contributor Author

zeripath commented Sep 9, 2020

oh that's interesting...

a file with spacing gets \t at the end of the line.

WHY GIT WHY?!

@zeripath
Copy link
Contributor Author

zeripath commented Sep 9, 2020

$ git diff -M 25d51d73c7 25d51d73c7'^' | xxd                        [08:46:01]
00000000: 6469 6666 202d 2d67 6974 2061 2f66 696c  diff --git a/fil
00000010: 6520 7769 7468 2062 6c61 6e6b 7320 622f  e with blanks b/
00000020: 6669 6c65 2077 6974 6820 626c 616e 6b73  file with blanks
00000030: 0a64 656c 6574 6564 2066 696c 6520 6d6f  .deleted file mo
00000040: 6465 2031 3030 3634 340a 696e 6465 7820  de 100644.index 
00000050: 3839 3836 3531 612e 2e30 3030 3030 3030  898651a..0000000
00000060: 0a2d 2d2d 2061 2f66 696c 6520 7769 7468  .--- a/file with
00000070: 2062 6c61 6e6b 7309 0a2b 2b2b 202f 6465   blanks..+++ /de
00000080: 762f 6e75 6c6c 0a40 4020 2d31 2c35 202b  v/null.@@ -1,5 +
00000090: 302c 3020 4040 0a2d 6120 626c 616e 6b20  0,0 @@.-a blank 
000000a0: 6669 6c65 0a2d 0a2d 6861 7320 6120 636f  file.-.-has a co
000000b0: 7570 6c65 206f 206c 696e 650a 2d0a 2d74  uple o line.-.-t
000000c0: 6865 2035 7468 206c 696e 6520 6973 2074  he 5th line is t
000000d0: 6865 206c 6173 740a                      he last.

Signed-off-by: Andrew Thornton <art27@cantab.net>
@fashberg
Copy link
Contributor

fashberg commented Sep 9, 2020

Hi @zeripath

GNU itself says: blanks are not supported: https://www.gnu.org/software/diffutils/manual/html_node/Unusual-File-Names.html#Unusual%20File%20Names

They append a timestamp after the filename: https://www.gnu.org/software/diffutils/manual/html_node/Example-Unified.html#Example-Unified - but this cannot be seen at GIT

GIT adds a 'tab' character after the ---/+++ filename, if it contains a blank

Take a look: https://github.com/git/git/blob/master/diff.c#L1504

	case DIFF_SYMBOL_FILEPAIR_MINUS:
		meta = diff_get_color_opt(o, DIFF_METAINFO);
		reset = diff_get_color_opt(o, DIFF_RESET);
		fprintf(o->file, "%s%s--- %s%s%s\n", diff_line_prefix(o), meta,
			line, reset,
			strchr(line, ' ') ? "\t" : "");
		break;

test:

cd testrepo 
date > 'testfile'
date > 'testfile A'
git add testfile*
git commit -m 'test"
git log -p -n 1 | grep "+++" | xxd
00000000: 2b2b 2b20 622f 7465 7374 6669 6c65 0a2b  +++ b/testfile.+
00000010: 2b2b 2062 2f74 6573 7466 696c 6520 4109  ++ b/testfile A.
00000020: 0a   

After the filename 'testfile A' there is a tab-character (0x09)

@zeripath
Copy link
Contributor Author

zeripath commented Sep 9, 2020

yeah that's what I've found from looking at the git source code myself.

This whole section of code needs rewriting.

@fashberg
Copy link
Contributor

fashberg commented Sep 9, 2020

Hi @zeripath you were faster.

Maybe my better testing: fashberg@8c6231c

It iterates all variants and adds a timestamp after \t (timestamp has to be removed)

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath
Copy link
Contributor Author

zeripath commented Sep 9, 2020

@lunny I've added several testcases to the ParsePatch test and renamed it singlefile and fixed a few more issues in the PR

@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Sep 9, 2020
@zeripath
Copy link
Contributor Author

zeripath commented Sep 9, 2020

Hi @zeripath you were faster.

Maybe my better testing: fashberg@8c6231c

It iterates all variants and adds a timestamp after \t (timestamp has to be removed)

So the reason why I went for the if contains(name, " ") route was because it exactly reverses what git does:

			strchr(line, ' ') ? "\t" : "");

I didn't want to consider what would happen if say git decided that \t was not a reason to quote a filename.

In general I'm not really a fan of Split(...) because it creates (potentially n-1) strings unnecessarily, and if you really just want the 0th element you should just use Index and grab the preceding bit of the string i.e.:

index := strings.Index(foo, "\t")
if index > -1 {
  foo = foo[:index]
}

that way if the rest of foo contains mulitple \t you don't end up with n individual strings. (It's also worth realising that go actually copies the bytes arrays around for each of those new strings.)


As I've said above this whole section of code needs rewriting with proper reference to what git does and can do. It's highly inefficient, makes bad assumptions about sizes, copies this data around multiple times and the resulting AST it creates is rather large.

@zeripath zeripath merged commit 96969dd into go-gitea:master Sep 9, 2020
@zeripath zeripath deleted the fix-12768-I-hate-our-diff-infrastructure branch September 9, 2020 13:08
zeripath added a commit to zeripath/gitea that referenced this pull request Sep 9, 2020
Backport go-gitea#12771

Following further testing it has become apparent that the diff line
cannot be used to determine filenames for diffs with any sort of predictability
the answer therefore is to use the other lines that are provided with a diff

Fix go-gitea#12768

Signed-off-by: Andrew Thornton <art27@cantab.net>
zeripath added a commit that referenced this pull request Sep 9, 2020
Backport #12771

Following further testing it has become apparent that the diff line
cannot be used to determine filenames for diffs with any sort of predictability
the answer therefore is to use the other lines that are provided with a diff

Fix #12768

Signed-off-by: Andrew Thornton <art27@cantab.net>
@lafriks lafriks added the backport/done All backports for this PR have been created label Sep 9, 2020
@zeripath zeripath linked an issue Sep 11, 2020 that may be closed by this pull request
7 tasks
zeripath added a commit to zeripath/gitea that referenced this pull request Oct 13, 2020
go-gitea#12771 attempted to fix diff by avoiding the git diff line as
it is possible to have an ambiguous line here.

go-gitea#12254 attempted to fix diff by assuming that names would quoted
if they needed to be and if one was quoted then both would be.

Both of these were wrong.

I have now discovered `--src-prefix` and `--dst-prefix` which
means that we can set this in such a way to force the git diff
to always be unambiguous.

Therefore this PR rollsback most of the changes in go-gitea#12771 and
uses these options to fix this.

Signed-off-by: Andrew Thornton <art27@cantab.net>
lafriks added a commit that referenced this pull request Oct 14, 2020
* Finally fix diff names

#12771 attempted to fix diff by avoiding the git diff line as
it is possible to have an ambiguous line here.

#12254 attempted to fix diff by assuming that names would quoted
if they needed to be and if one was quoted then both would be.

Both of these were wrong.

I have now discovered `--src-prefix` and `--dst-prefix` which
means that we can set this in such a way to force the git diff
to always be unambiguous.

Therefore this PR rollsback most of the changes in #12771 and
uses these options to fix this.

Signed-off-by: Andrew Thornton <art27@cantab.net>

* Update services/gitdiff/gitdiff.go

* Update services/gitdiff/gitdiff.go

* Update modules/repofiles/temp_repo.go

* fix test

Signed-off-by: Andrew Thornton <art27@cantab.net>

Co-authored-by: Lauris BH <lauris@nix.lv>
zeripath added a commit to zeripath/gitea that referenced this pull request Oct 14, 2020
Backport go-gitea#13136

it is possible to have an ambiguous line here.

if they needed to be and if one was quoted then both would be.

Both of these were wrong.

I have now discovered `--src-prefix` and `--dst-prefix` which
means that we can set this in such a way to force the git diff
to always be unambiguous.

Therefore this PR rollsback most of the changes in go-gitea#12771 and
uses these options to fix this.

Signed-off-by: Andrew Thornton <art27@cantab.net>
lafriks pushed a commit that referenced this pull request Oct 14, 2020
Backport #13136

it is possible to have an ambiguous line here.

if they needed to be and if one was quoted then both would be.

Both of these were wrong.

I have now discovered `--src-prefix` and `--dst-prefix` which
means that we can set this in such a way to force the git diff
to always be unambiguous.

Therefore this PR rollsback most of the changes in #12771 and
uses these options to fix this.

Signed-off-by: Andrew Thornton <art27@cantab.net>
@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backport/done All backports for this PR have been created lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. type/bug
Projects
None yet
7 participants