Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifiy the way to compute baseline to have a better match between canvas and text layer #12896

Merged
merged 1 commit into from Feb 13, 2021

Conversation

calixteman
Copy link
Contributor

  • compute ascent as a ratio of font height
  • in case the font doesn't have ascent/descent/bbox info:
    • use TextMetrics.fontBoundingBoxAscent if available
    • use a basic heuristic to guess values in drawing char on a canvas

In showing text layer, before the patch:
Screenshot_2021-01-23 tracemonkey pdf

and after the patch:

image

src/core/evaluator.js Outdated Show resolved Hide resolved
src/display/text_layer.js Outdated Show resolved Hide resolved
src/display/text_layer.js Outdated Show resolved Hide resolved
src/display/text_layer.js Show resolved Hide resolved
src/display/text_layer.js Show resolved Hide resolved
src/display/text_layer.js Outdated Show resolved Hide resolved
@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/ab05c8e5c258f66/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 2

Live output at: http://3.101.106.178:8877/fcfc1981a9cbef0/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/ab05c8e5c258f66/output.txt

Total script time: 26.59 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Integration Tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/ab05c8e5c258f66/reftest-analyzer.html#web=eq.log

@timvandermeij
Copy link
Contributor

The Windows bot seems down again. Aside from that, while this seems to work nicely for the Tracemonkey file, it seems to regress quite a bit of other files unfortunately, so this needs some more work. Examples of noticeable movement are firefox-arabiccidtruetype-text-page1, firefox-issue4550-text-page2 and firefox-extgstate-text-page1.

@calixteman
Copy link
Contributor Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.67.70.0:8877/cd302387c6f4e16/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://3.101.106.178:8877/38528dbb4f1a2b4/output.txt

@calixteman
Copy link
Contributor Author

So I reboot the windows bot.
The new version of the patch doesn't take anymore into account the ascent/descent of the font in the pdf to position the baseline of the fallback font in the text layer but use ones from the fallback font which makes sense finally.

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/cd302387c6f4e16/output.txt

Total script time: 26.63 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Integration Tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/cd302387c6f4e16/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://3.101.106.178:8877/38528dbb4f1a2b4/output.txt

Total script time: 26.75 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://3.101.106.178:8877/38528dbb4f1a2b4/reftest-analyzer.html#web=eq.log

@calixteman
Copy link
Contributor Author

I'm not sure to understand how to interpret some results of unit tests.
For example with firefox-taro-text-page1 (i.e. TaroUTR50SortedList112.pdf ), the text in text_layer is better positioned with patch locally but it isn't the case in reftest-analyzer where it seems to be a way better in the reference.
And the same for tracemonkey.pdf: it looks ok locally but neither in the test nor in the reference.

@Snuffleupagus
Copy link
Collaborator

[...] the text in text_layer is better positioned with patch locally but it isn't the case in reftest-analyzer where it seems to be a way better in the reference.

In theory the output of the text tests should agree with the development viewer, when it's opened with http://localhost:8888/web/viewer.html#textLayer=visible, so if that's not actually the case this suggests a pre-existing bug in our test-suite.

The relevant test-code is found in

pdf.js/test/driver.js

Lines 27 to 107 in 6249ef5

/**
* @class
*/
var rasterizeTextLayer = (function rasterizeTextLayerClosure() {
var SVG_NS = "http://www.w3.org/2000/svg";
var textLayerStylePromise = null;
function getTextLayerStyle() {
if (textLayerStylePromise) {
return textLayerStylePromise;
}
textLayerStylePromise = new Promise(function (resolve) {
var xhr = new XMLHttpRequest();
xhr.open("GET", "./text_layer_test.css");
xhr.onload = function () {
resolve(xhr.responseText);
};
xhr.send(null);
});
return textLayerStylePromise;
}
// eslint-disable-next-line no-shadow
function rasterizeTextLayer(
ctx,
viewport,
textContent,
enhanceTextSelection
) {
return new Promise(function (resolve, reject) {
// Building SVG with size of the viewport.
var svg = document.createElementNS(SVG_NS, "svg:svg");
svg.setAttribute("width", viewport.width + "px");
svg.setAttribute("height", viewport.height + "px");
// items are transformed to have 1px font size
svg.setAttribute("font-size", 1);
// Adding element to host our HTML (style + text layer div).
var foreignObject = document.createElementNS(SVG_NS, "svg:foreignObject");
foreignObject.setAttribute("x", "0");
foreignObject.setAttribute("y", "0");
foreignObject.setAttribute("width", viewport.width + "px");
foreignObject.setAttribute("height", viewport.height + "px");
var style = document.createElement("style");
var stylePromise = getTextLayerStyle();
foreignObject.appendChild(style);
var div = document.createElement("div");
div.className = "textLayer";
foreignObject.appendChild(div);
// Rendering text layer as HTML.
var task = pdfjsLib.renderTextLayer({
textContent,
container: div,
viewport,
enhanceTextSelection,
});
Promise.all([stylePromise, task.promise]).then(function (results) {
task.expandTextDivs(true);
style.textContent = results[0];
svg.appendChild(foreignObject);
// We need to have UTF-8 encoded XML.
var svg_xml = unescape(
encodeURIComponent(new XMLSerializer().serializeToString(svg))
);
var img = new Image();
img.src = "data:image/svg+xml;base64," + btoa(svg_xml);
img.onload = function () {
ctx.drawImage(img, 0, 0);
resolve();
};
img.onerror = function (e) {
reject(new Error("Error rasterizing text layer " + e));
};
});
});
}
return rasterizeTextLayer;
})();
and the CSS-file that it loads is found in https://github.com/mozilla/pdf.js/blob/master/test/text_layer_test.css


Unfortunately we're going to need to somehow debug/fix the underlying issue first, since it's virtually impossible to review this patch if we cannot actually trust the test-suite.

@timvandermeij
Copy link
Contributor

Now that the issue mentioned above should be fixed, could you rebase this and retrigger the tests to see how this goes now?

@calixteman
Copy link
Contributor Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://3.101.106.178:8877/0aa7a86a5d538bc/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.67.70.0:8877/1bbd7ece8d4728b/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/1bbd7ece8d4728b/output.txt

Total script time: 26.88 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/1bbd7ece8d4728b/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://3.101.106.178:8877/0aa7a86a5d538bc/output.txt

Total script time: 28.34 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED

Image differences available at: http://3.101.106.178:8877/0aa7a86a5d538bc/reftest-analyzer.html#web=eq.log

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Jan 31, 2021

Looking at the latest test results, it seems to be a pretty even split between test-cases that improve respectively regress with this patch; hence it's unfortunately not really clear (at least to me) that this is an overall improvement as-is.

@timvandermeij
Copy link
Contributor

Yes, for example chrome-arabiccidtruetype-text-page1 is clearly improved and chrome-issue6387-text-page1 is even quite a dramatic improvement, but I also see firefox-issue8229-page1 for example which looks a bit worse. In general my feeling is that overall this is an improvement, but it would be good to have an additional look at the ones that look a bit worse here to see if it can be improved further.

@calixteman
Copy link
Contributor Author

@timvandermeij I've the impress that overall this is an improvement too even if it's a bit worse for some of them.

@brendandahl
Copy link
Contributor

/botio test
(re-running to see with chrome updates)

@calixteman
Copy link
Contributor Author

/botio-windows test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://3.101.106.178:8877/8751e8aba462300/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://3.101.106.178:8877/8751e8aba462300/output.txt

Total script time: 29.56 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED

Image differences available at: http://3.101.106.178:8877/8751e8aba462300/reftest-analyzer.html#web=eq.log

@timvandermeij
Copy link
Contributor

/botio-linux test

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/c9a878dbc0da2b8/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/c9a878dbc0da2b8/output.txt

Total script time: 23.85 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/c9a878dbc0da2b8/reftest-analyzer.html#web=eq.log

@timvandermeij
Copy link
Contributor

I have looked at all reference test differences with this version of the patch and I'd say this is an overall improvement. Some files are improved quite dramatically, others have minor improvements and for a few it's a bit worse, but given that we have to rely on a heuristic here I don't think that will ever be perfect, so this will remain to be a trade-off. In this case the majority of cases is an improvements, so let's go with this and we can always tweak it further once more test cases are presented to us.

Thank you for putting effort into this!

@timvandermeij timvandermeij merged commit c79fd71 into mozilla:master Feb 13, 2021
@timvandermeij
Copy link
Contributor

/botio makeref

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/10bb771551af041/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 1

Live output at: http://3.101.106.178:8877/4b96eb3e2509288/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/10bb771551af041/output.txt

Total script time: 21.91 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://3.101.106.178:8877/4b96eb3e2509288/output.txt

Total script time: 27.44 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@timvandermeij
Copy link
Contributor

This closed quite a few issues, some of them mentioned above, so nice work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants