Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for #7833 - Test for considering using addingFakeSpaces in textMove op compares wrong width/height #7834

Closed
wants to merge 1 commit into from

Conversation

lightman76
Copy link

@lightman76 lightman76 commented Nov 21, 2016

The arguments of the textMove (Td) operator are relative to the last text positioning operator. Therefor to determine the remaining unused width to check for wether to consider adding spaces, need to look at the difference in the parameter and the lastAdvanceWidth/Height of the current textContentItem.

Fixes #7833.

@@ -1459,10 +1459,11 @@ var PartialEvaluator = (function PartialEvaluatorClosure() {
var isSameTextLine = !textState.font ? false :
((textState.font.vertical ? args[0] : args[1]) === 0);
advance = args[0] - args[1];
var relativeAdvance = advance - textContentItem.lastAdvanceWidth - textContentItem.lastAdvanceHeight;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's diff that we are calculating below. Also shall we calculate advance this way (instead of introducing relative advance)? Does it require resetting textContentItem.lastAdvanceWidth/lastAdvanceHeight to different value?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, would it make sense to adjust the OPS.setTextMatrix case similarly case as well?

@yurydelendik
Copy link
Contributor

@lightman76 Can you introduce a 'text' test case? (or at least point to PDF to verify the fix)?

@timvandermeij
Copy link
Contributor

timvandermeij commented Nov 21, 2016

The PDF file is in #7833. I have edited the PR description to include this so that GitHub will close the issue automatically if this is merged.

As for testing, I think we need something like https://github.com/mozilla/pdf.js/pull/5783/files#diff-7776f1033c4a5613443defe1ecb39256. I wonder if we will see the difference with reference tests.

@Snuffleupagus
Copy link
Collaborator

As for testing, I think we need something like https://github.com/mozilla/pdf.js/pull/5783/files#diff-7776f1033c4a5613443defe1ecb39256. I wonder if we will see the difference with reference tests.

A standard text reference test ought to be sufficient here, so no need to complicate things :-)

@lightman76
Copy link
Author

Ok - I've updated to move the diff calculation earlier in the OPS.moveText block and removed the relativeAdvance. This also resolved the linting check that failed.

As far as OPS.setTextMatrix @Snuffleupagus mentioned in the review of the earlier version of the pull request, I'm pretty new to the guts of the PDF spec, but at first glance, it would appear it suffers from a similar comparison issue.

From the same review of the earlier version of the pr by @yurydelendik "Does it require resetting textContentItem.lastAdvanceWidth/lastAdvanceHeight to different value?": Again, I'm pretty new to this but I think you are right, because if there's a series of textMove and showText commands it will continue appending to the same textContentItem since the moveText branch that calls addFakeSpaces then skips flushing the current textContentItem.

It seems to me (a bit sleep deprived at the moment though) that this branch doesn't account for the moveText command then in the textContentItem - seems it should add to lastAdvanceWidth/Height and/or the width/height for where this move was supposed to be. I haven't dug in enough to determine if it should just be accounting for the width of the fake spaces inserted or for the full "diff".

If anyone is familiar with this code and can help me wrap my head around this, would appreciate any insight. Otherwise will need to look at this with a clearer head, which probably may not happen until next week as I'm traveling right now.

@@ -1459,17 +1459,17 @@ var PartialEvaluator = (function PartialEvaluatorClosure() {
var isSameTextLine = !textState.font ? false :
((textState.font.vertical ? args[0] : args[1]) === 0);
advance = args[0] - args[1];
diff = (args[0] - textContentItem.lastAdvanceWidth) -
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the advance variable above this line is not used anymore, so we can probably remove it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. I've updated pr to remove that.

@lightman76
Copy link
Author

I've updated the setTextMatrix to use the diff (distance from last rendered text to the new position) for comparing to the min/max space to consider adding fake spaces.

I've also added some test PDFs and test cases in the api_spec.js that exercise the addFakeSpaces cases for the moveText and setTextMatrix cases. These tests failed prior to these changes and now pass.

Just saw I have some linter errors - will fix those up and update.

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Nov 28, 2016

I've also added some test PDFs and test cases in the api_spec.js that exercise the addFakeSpaces cases for the moveText and setTextMatrix cases.

Instead of adding a large number of test files, how about adding just one PDF file with multiple pages instead?

Edit: Also, please don't forget to squash the commits, see https://github.com/mozilla/pdf.js/wiki/Squashing-Commits.

@lightman76
Copy link
Author

Ah - hadn't thought of that. Will consolidate the test PDFs...

Copy link
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a few inline nits, and also a suggestion on how you can make the unit-tests a bit nicer.

A hopefully useful piece of advice, since I noticed that you've got a merge commit above, is to use git rebase rather than git merge when updating a patch series.
Please also remember to squash the commits, see https://github.com/mozilla/pdf.js/wiki/Squashing-Commits.

@@ -1492,18 +1491,20 @@ var PartialEvaluator = (function PartialEvaluatorClosure() {
// Optimization to treat same line movement as advance.
advance = textState.calcTextLineMatrixAdvance(
args[0], args[1], args[2], args[3], args[4], args[5]);
if(advance !== null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: missing space between if and (, i.e. this line should read
if (advance !== null) {.

@@ -40,6 +40,7 @@
!issue7544.pdf
!issue7598.pdf
!issue7665.pdf
!issue7833*.pdf
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you can remove the * now.

@@ -890,6 +890,141 @@ describe('api', function() {
done.fail(reason);
});
});

describe('adds whitespace', function() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this isn't a terribly descriptive name for a test, how about something like e.g. the following instead?
gets text content - check insertion of fake spaces (issue 7833)

@@ -890,6 +890,141 @@ describe('api', function() {
done.fail(reason);
});
});

describe('adds whitespace', function() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you use one PDF file, I think that you should be able to make the unit-test a lot more efficient by only loading the PDF file once, and by using a helper function for getting the text. E.g. something like this: https://gist.github.com/Snuffleupagus/152e9450f568fb03dc4a78e1c86d3aff

@lightman76
Copy link
Author

Thanks for the feedback. Have a deadline coming up later this week. After that will make the suggested changes.

@lightman76
Copy link
Author

Ok - updated the tests as you suggested - hadn't used the beforeAll before. Much cleaner - thanks!

Let me know if there are any other changes needed or other tests I should add around this. Thanks!

@Snuffleupagus
Copy link
Collaborator

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Jan 4, 2017

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/ef8920bc03b9afb/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Jan 4, 2017

From: Bot.io (Linux)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/e7123fe3d16e848/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Jan 4, 2017

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/ef8920bc03b9afb/output.txt

Total script time: 25.36 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/ef8920bc03b9afb/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Jan 4, 2017

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/e7123fe3d16e848/output.txt

Total script time: 25.93 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/e7123fe3d16e848/reftest-analyzer.html#web=eq.log

@Snuffleupagus
Copy link
Collaborator

@lightman76 Really sorry for the delay with this PR, I think it fell off the radar for a bit!
As you can see in the test results just above, there seems to be a couple of regressions, could you please take a look?

For information about running tests locally, please refer to https://github.com/mozilla/pdf.js/wiki/Contributing and more specifically https://github.com/mozilla/pdf.js/wiki/Contributing#4-run-lint-and-testing.

…around when addFakeSpaces was triggered. Fixed some lint errors.
@lightman76
Copy link
Author

/botio test

@lightman76
Copy link
Author

I've fixed some linting errors that popped up. I'm not having any luck getting the image tests to run locally on my machine (OS X). I tried to generate the reference images from master, however i'm getting some failures there, so it looks like I can't generate the master set then. I tried to see if I could kick off a re-run here with bot.io, but appears it won't listen to me.

Going to try to get my linux box setup to build this and see if I have better luck with the image tests there.

@yurydelendik
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Jan 6, 2017

From: Bot.io (Windows)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.22.172.223:8877/4623faa81a98589/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Jan 6, 2017

From: Bot.io (Linux)


Received

Command cmd_test from @yurydelendik received. Current queue size: 0

Live output at: http://107.21.233.14:8877/98cd36772f75a2f/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Jan 6, 2017

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/4623faa81a98589/output.txt

Total script time: 25.68 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/4623faa81a98589/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Jan 6, 2017

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/98cd36772f75a2f/output.txt

Total script time: 26.20 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/98cd36772f75a2f/reftest-analyzer.html#web=eq.log

@lightman76
Copy link
Author

Thanks. I'll take a look at those image failures now that this branch is up to date and see what's going on there.

@yurydelendik
Copy link
Contributor

yurydelendik commented Jan 6, 2017

I'm not having any luck getting the image tests to run locally on my machine (OS X).

@lightman76 There are some intermittent failures on mac osx. (you can copy/move generated "tmp" master images for "ref" folder for workaround fix the issue)

@brendandahl
Copy link
Contributor

@lightman76 There are some intermittent failures on mac osx. (you can copy/move generated "tmp" master images for "ref" folder for workaround fix the issue)

Or disable all the FBF tests temporarily.

@lightman76
Copy link
Author

Hey all - just wanted let you know I haven't forgotten about this. Been sick and have a major deadline coming up for another project. Will get this back up to date and figure out the remaining issues hopefully in a few weeks.

Most of the tests that were failing the screenshots I think are fine, but the text layer positioning is different from the previous version (generally aligns better I think). However there are 2 or 3 that are way off. I need to dig into those PDFs to understand which operations are used and why that's happening.

@timvandermeij
Copy link
Contributor

Awesome! Thank you for letting us know.

@timvandermeij
Copy link
Contributor

No more work has been done here and it's not in a mergeable state now, so I'm closing this. However, the associated issue will remain open with a reference to this PR so the work can be continued later.

@burtonator
Copy link

@lightman76 Thanks for your hard work on this issue.! Would love to see this fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants