Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display issue in commit diff #14231

Closed
fnetX opened this issue Jan 3, 2021 · 10 comments · Fixed by #14678
Closed

Display issue in commit diff #14231

fnetX opened this issue Jan 3, 2021 · 10 comments · Fixed by #14678
Labels
Milestone

Comments

@fnetX
Copy link
Contributor

fnetX commented Jan 3, 2021

Description

First reported on https://codeberg.org/Codeberg/Community/issues/374

I found a display issue in this commit diff view: https://codeberg.org/eventbike/vturn_server/commit/8d672a398f22ea465edd42a6d9de7cd2e88c7ab5 (main.v, line 27)

Displayed line:
run(dban class="o">)
Actual file content:
run(db)

HTML code at this is point:

<td class="lines-code lines-code-new halfwidth add-code"><span class="mono wrap">
<span class="n">run</span><span class="added-code"><span class="o">(
</span><span class="n">d</span>b</span>
<sp<span class="added-code">an class="o</sp<span></span>"&gt;)</td>

Screenshots

image

Thank you a lot.

@6543 6543 added the type/bug label Jan 3, 2021
@bagasme
Copy link
Contributor

bagasme commented Jan 5, 2021

@fnetX i see that for your case as noted by your screenshot, the wrong color for brackets at run().

@fnetX
Copy link
Contributor Author

fnetX commented Jan 5, 2021

You mean the brackets have the wrong colour? Yes, but that's because the whole HTML is messed up here.

@zeripath
Copy link
Contributor

zeripath commented Jan 5, 2021

diff --git a/services/gitdiff/gitdiff_test.go b/services/gitdiff/gitdiff_test.go
index cd7b2273c..813260f3b 100644
--- a/services/gitdiff/gitdiff_test.go
+++ b/services/gitdiff/gitdiff_test.go
@@ -15,6 +15,7 @@ import (
 
 	"code.gitea.io/gitea/models"
 	"code.gitea.io/gitea/modules/git"
+	"code.gitea.io/gitea/modules/highlight"
 	"code.gitea.io/gitea/modules/setting"
 	dmp "github.com/sergi/go-diff/diffmatchpatch"
 	"github.com/stretchr/testify/assert"
@@ -101,6 +102,15 @@ func TestDiffToHTML(t *testing.T) {
 	}, DiffLineAdd))
 }
 
+func TestDiffToHTML_14231(t *testing.T) {
+	setting.Cfg = ini.Empty()
+
+	diffRecord := diffMatchPatch.DiffMain(highlight.Code("main.v", "		run()\n"), highlight.Code("main.v", "		run(db)\n"), true)
+	diffRecord = diffMatchPatch.DiffCleanupEfficiency(diffRecord)
+	assertEqual(t, `		<span class="n">run</span><span class="added-code"><span class="o">(</span><span class="n">db</span><span class="o">)</span></span>`,
+		diffToHTML("main.v", diffRecord, DiffLineAdd))
+}
+
 func TestParsePatch_singlefile(t *testing.T) {
 	type testcase struct {
 		name        string

will add a testcase to work with

@zeripath
Copy link
Contributor

zeripath commented Jan 6, 2021

the diffRecord is:

 [
    {Equal 		<span class="n">run</span><span class="}
    {Insert o">(</span><span class="n">d}
    {Equal b}
    {Insert </span><s}
    {Equal p}
    {Insert an class="o}
    {Equal ">}
    {Delete (} {Equal )</span>}
]

@zeripath
Copy link
Contributor

zeripath commented Jan 6, 2021

I honestly think the regexp approach cannot work here.

Record 2 here is an attribute in deleted case and is text in added case.

@zeripath
Copy link
Contributor

zeripath commented Jan 6, 2021

In my previous reverted pr I attempted to do the kind of necessary fixup.

Now, we have several options:

  1. Abandon highlighting of diffs.
  2. Do the kind of fixup I initially suggested - but correctly.
  3. Tokenize the output of chroma highlight and use a token aware diff or block diff. We'd need to copy the code from the current diff library and adjust it to make the work in tokens.
  4. Improve the regexp but add some kind of fixup.

@zeripath
Copy link
Contributor

zeripath commented Jan 9, 2021

thinking on here - I think the fixup technique is the correct answer.

Intention of the proposed algo - No Equals/Insert/Delete block starts or ends within an element, entity or even within an unclosed span. (EDIT: this last case needs a bit more thought)

  1. If an equals tag and normal state ends with an incomplete element or entity - split off the incomplete bit and add it to insert & delete stacks. Set state for both insert/delete to in entity in element, in element, in entity or normal, and move to next tag.
  2. If an insert tag - prepend insert stack and clear stack. Check end of tag and split off incomplete bit adding it to insert stack. Set state for insert to in entity in element, in element, in entity or normal, and move to next tag.
  3. If a delete tag - prepend delete stack and clear stack. Check end of tag and split off incomplete bit adding it to delete stack. Set state for delete to in entity in element, in element, in entity or normal, and move to next tag.
  4. If an equals tag (not normal state) then this has to be removed at least until the end of element/entity and its content added to both the insert and delete stack. If there is a complete element/entity then insert and delete tags should be inserted before the Equal. Do the rest as per the end of the stack.
  5. Merge insert/delete tags.

I think that would do it.

@6543 6543 added this to the 1.14.0 milestone Jan 28, 2021
This was referenced Feb 11, 2021
zeripath added a commit to zeripath/gitea that referenced this issue Feb 14, 2021
Gitea runs diff on highlighted code fragment for each line in order to provide
code highlight diffs. Unfortunately this diff algorithm is not aware that span tags
and entities are atomic and cannot be split.

The current fixup code makes some attempt to fix these broken tags however, it cannot
handle situations where a tag is split over multiple blocks.

This PR provides a more algorithmic fixup mechanism whereby spans and entities are
completely coalesced into their respective blocks.

This may result in a incompletely reduced diff but - it will definitely prevent the
broken entities and spans that are currently possible.

As a result of this fixup several inconsistencies were discovered in our testcases
and these were also fixed.

Fix go-gitea#14231

Signed-off-by: Andrew Thornton <art27@cantab.net>
6543 pushed a commit that referenced this issue Feb 14, 2021
Gitea runs diff on highlighted code fragment for each line in order to provide
code highlight diffs. Unfortunately this diff algorithm is not aware that span tags
and entities are atomic and cannot be split.

The current fixup code makes some attempt to fix these broken tags however, it cannot
handle situations where a tag is split over multiple blocks.

This PR provides a more algorithmic fixup mechanism whereby spans and entities are
completely coalesced into their respective blocks.

This may result in a incompletely reduced diff but - it will definitely prevent the
broken entities and spans that are currently possible.

As a result of this fixup several inconsistencies were discovered in our testcases
and these were also fixed.

Fix #14231

Signed-off-by: Andrew Thornton <art27@cantab.net>
@fnetX
Copy link
Contributor Author

fnetX commented Feb 14, 2021

@zeripath Thank you!

zeripath added a commit to zeripath/gitea that referenced this issue Feb 14, 2021
Backport go-gitea#14678

Gitea runs diff on highlighted code fragment for each line in order to
provide code highlight diffs. Unfortunately this diff algorithm is not
aware that span tags and entities are atomic and cannot be split.

The current fixup code makes some attempt to fix these broken tags
however, it cannot handle situations where a tag is split over multiple
blocks.

This PR provides a more algorithmic fixup mechanism whereby spans and
entities are completely coalesced into their respective blocks.

This may result in a incompletely reduced diff but - it will definitely
prevent the broken entities and spans that are currently possible.

As a result of this fixup several inconsistencies were discovered in our
testcases and these were also fixed.

Fix go-gitea#14231

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath
Copy link
Contributor

@fnetX Sorry it's taken so long. It was unfortunately a non-trivial bug.

@zeripath
Copy link
Contributor

I should also add I was significantly delayed by breaking my left hand over the Christmas period.

6543 added a commit that referenced this issue Feb 14, 2021
Backport #14678

Gitea runs diff on highlighted code fragment for each line in order to
provide code highlight diffs. Unfortunately this diff algorithm is not
aware that span tags and entities are atomic and cannot be split.

The current fixup code makes some attempt to fix these broken tags
however, it cannot handle situations where a tag is split over multiple
blocks.

This PR provides a more algorithmic fixup mechanism whereby spans and
entities are completely coalesced into their respective blocks.

This may result in a incompletely reduced diff but - it will definitely
prevent the broken entities and spans that are currently possible.

As a result of this fixup several inconsistencies were discovered in our
testcases and these were also fixed.

Fix #14231

Signed-off-by: Andrew Thornton <art27@cantab.net>

Co-authored-by: 6543 <6543@obermui.de>
@go-gitea go-gitea locked and limited conversation to collaborators May 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants