Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api-minor] Fix issues in text selection #13424

Merged
merged 1 commit into from Oct 18, 2021
Merged

Conversation

calixteman
Copy link
Contributor

@calixteman calixteman commented May 23, 2021

  • PR [api-minor] Fix the way to chunk the strings #13257 fixed a lot of issues but not all and this patch aims to fix almost all remaining issues.
  • the idea in this new patch is to compare position of new glyph with the last position where a glyph has been drawn;
    • no space are "drawn": it just moves the cursor but they aren't added in the chunk;
    • so this way a space followed by a cursor move can be treated as only one space: it helps to merge all spaces into one.
  • to make difference between real spaces and tracking ones, we used a factor of the space width (from the font)
    • it was a pretty good idea in general but it fails with some fonts where space was too big:
    • in Poppler, they're using a factor of the font size: this is an excellent idea (<= 0.1 * fontSize implies tracking space).

Fixes #6705
Fixes #7833
Fixes #9247
Fixes #9998
Fixes #10448
Fixes #10640
Fixes #10900
Fixes #11913
Fixes #13201
Fixes #13226

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/ba7e5367fe2761d/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 1

Live output at: http://3.101.106.178:8877/ff49f0a56667729/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/ba7e5367fe2761d/output.txt

Total script time: 25.83 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/ba7e5367fe2761d/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://3.101.106.178:8877/ff49f0a56667729/output.txt

Total script time: 30.32 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED

Image differences available at: http://3.101.106.178:8877/ff49f0a56667729/reftest-analyzer.html#web=eq.log

Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this patch causes rendering regressions in eq-tests, and those tests should definitely not be changing for a text-selection only patch (see a possibly incomplete list below). My guess is that the src/core/fonts.js changes are not generally appropriate and/or correct here.

  • bug921760
  • issue9949
  • issue6127
  • font_ascent_descent (slight movement)
  • issue6737 (slight movement)

There's also some text-tests which looks slightly worse than before, some examples include:

  • tracemonkey-text-page14
  • issue11713
  • issue6019
  • issue5972
  • issue3925

src/core/fonts.js Outdated Show resolved Hide resolved
@calixteman calixteman force-pushed the chunks2 branch 2 times, most recently from fe968b1 to a008b8a Compare May 29, 2021 17:42
@calixteman
Copy link
Contributor Author

Unfortunately this patch causes rendering regressions in eq-tests, and those tests should definitely not be changing for a text-selection only patch (see a possibly incomplete list below). My guess is that the src/core/fonts.js changes are not generally appropriate and/or correct here.

* bug921760

* issue9949

* issue6127

* font_ascent_descent (slight movement)

* issue6737 (slight movement)

They should be fixed now (I reverted changes in fonts.js).

There's also some text-tests which looks slightly worse than before, some examples include:

* tracemonkey-text-page14

There's slightly less spans so it implies a shift between canvas and text layer.

* issue11713

There no font change between sigmas but scale factor changes so chunks are not flushed and glyphes with different scale factor can be merged.

* issue6019

I handled this in using the rotation angle from document: it's ok now.

* issue5972

I'm not sure to know if it's worth or better.
Anyway some chunks are flushed because of https://github.com/mozilla/pdf.js/pull/13424/files#diff-cf40014d9c7d352f3ae212e7fffe5386decf96a10e53066a4d900c3eb8fc559eR2339.
We can change the factor in HALF_LAST_CHAR (and the name too) to have all the chars in the same chunk, what do you think ?

* issue3925

I fixed the big space things but some shift are due to less spans.

src/core/fonts.js Outdated Show resolved Hide resolved
src/core/evaluator.js Show resolved Hide resolved
@calixteman
Copy link
Contributor Author

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Oct 3, 2021

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/94d0bf5ff830e90/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Oct 3, 2021

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/74a22de94fb2026/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Oct 3, 2021

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/94d0bf5ff830e90/output.txt

Total script time: 22.36 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: FAILED
  • Regression tests: FAILED
  different ref/snapshot: 112
  different first/second rendering: 1

Image differences available at: http://54.241.84.105:8877/94d0bf5ff830e90/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Oct 3, 2021

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/74a22de94fb2026/output.txt

Total script time: 41.21 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: FAILED
  • Regression tests: FAILED
  different ref/snapshot: 105
  different first/second rendering: 1

Image differences available at: http://54.193.163.58:8877/74a22de94fb2026/reftest-analyzer.html#web=eq.log

Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the test results, I do believe that there's too much movement (and some outright regressions) in existing test-cases. The following is a non-exhaustive list (focusing on Firefox), so please go through the test results and see what can be done to improve (some of) these cases:

  • tracemonkey-text-page1, one case of worse agreement between the canvas/textLayer.
  • issue3925-page1, many cases of disagreement between the canvas/textLayer.
  • taro-text-page1..4, worse agreement between the canvas/textLayer for the vertical text.
  • issue11713-page1, the sizing looks off for the last glyph.
  • issue1127-text-page1, there's both improvements and regressions,
  • issue4684-text-page1, lots of separate spaces being inserted.
  • issue5896-text-page1, another possible issue with space handling (based on the reference images).
  • mao-text-page1, slightly worse agreement between the canvas/textLayer in places.
  • issue6605-page1, all spaces seem to be missing in the textLayer (causing bad alignment).

src/core/fonts.js Outdated Show resolved Hide resolved
src/core/evaluator.js Outdated Show resolved Hide resolved
Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly not really sure what to make of the test results, since regardless of the MAX_FACTOR value there's going to be some regressions.
/cc @brendandahl What's your opinion here, what sort of movement (in existing test-cases) are we willing to trade for a handful of fixed issues?

src/display/text_layer.js Outdated Show resolved Hide resolved
src/display/text_layer.js Outdated Show resolved Hide resolved
@brendandahl
Copy link
Contributor

Calixte mentioned .5 seemed to be the best compromise, so I looked through those ones. As mentioned above there are few regressions but there also some improvements (issue6019-text-page1, taro-text-page1). IIRC the PDF in issue3925-page1 is a really weird one, so I'm not too worried about regressing it.

One I'm a bit concerned about is mao-text-page1, we have very few CJK text layer tests. Is there something we should do differently for CJK fonts?

If we can fix mao, I'd be fine with taking a few minor regressions for all the copy/paste bugs this fixes.

  - PR mozilla#13257 fixed a lot of issues but not all and this patch aims to fix almost all remaining issues.
  - the idea in this new patch is to compare position of new glyph with the last position where a glyph has been drawn;
    - no space are "drawn": it just moves the cursor but they aren't added in the chunk;
    - so this way a space followed by a cursor move can be treated as only one space: it helps to merge all spaces into one.
  - to make difference between real spaces and tracking ones, we used a factor of the space width (from the font)
    - it was a pretty good idea in general but it fails with some fonts where space was too big:
    - in Poppler, they're using a factor of the font size: this is an excellent idea (<= 0.1 * fontSize implies tracking space).
@calixteman
Copy link
Contributor Author

I set the MAX_FACTOR to 0.6 and fixed few issues in mao test.
/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/6b6d60f532e2ded/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/d0283e3b1279d8d/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/6b6d60f532e2ded/output.txt

Total script time: 20.51 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  errors: 654
  different ref/snapshot: 90

Image differences available at: http://54.241.84.105:8877/6b6d60f532e2ded/reftest-analyzer.html#web=eq.log

@calixteman
Copy link
Contributor Author

/botio-linux test

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/fb791b216583a90/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/d0283e3b1279d8d/output.txt

Total script time: 40.88 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 114
  different first/second rendering: 1

Image differences available at: http://54.193.163.58:8877/d0283e3b1279d8d/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/fb791b216583a90/output.txt

Total script time: 22.53 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 112

Image differences available at: http://54.241.84.105:8877/fb791b216583a90/reftest-analyzer.html#web=eq.log

@Snuffleupagus
Copy link
Collaborator

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/2c304410c0dd850/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/2c304410c0dd850/output.txt

Total script time: 4.44 mins

Published

@calixteman calixteman merged commit bbb6436 into mozilla:master Oct 18, 2021
@calixteman
Copy link
Contributor Author

/botio makeref

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/1e5d6828893ccdb/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @calixteman received. Current queue size: 1

Live output at: http://54.193.163.58:8877/024a1819bef3382/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/1e5d6828893ccdb/output.txt

Total script time: 20.36 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.193.163.58:8877/024a1819bef3382/output.txt

Total script time: 38.20 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment