Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider scaling by fontSize(Tf, Tfs) and text matrix (Tm) #559

Merged
merged 2 commits into from
Dec 21, 2022
Merged

consider scaling by fontSize(Tf, Tfs) and text matrix (Tm) #559

merged 2 commits into from
Dec 21, 2022

Conversation

oliver681
Copy link
Contributor

@oliver681 oliver681 commented Nov 13, 2022

closes #532

Previously, the coordinates in getDataTm() were calculated without taking into accout the scaling that happens by the set font size and text matrix.

In a PDF that would e.g. look like:

  • /R9 11.04 Tf (second operand is font size)
  • 0.999402 0 0 1 70.8 698.96 Tm (first and fourth operand set horizontal and vertical scaliing)

When the posiiton is changed using the text positioning operators Td or TD, the new coordinates are now calculated with the set scaling.

One line in the test files was changed to reflect the correct scaling.

This pull request also fixes a sign error in the calculation of the y-coordinate in TD.

Copy link
Collaborator

@k00ni k00ni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oliver681 Thank you for taking the time and this pull request.

In the following just a few remarks and questions:

One line in the test files was changed to reflect the correct scaling.

That is alright. Are there any edge cases now we have to take care of? I consider changing tests bad practice in general, but in this case core functionality was buggy in the first place. I would welcome at least one additional test.

This pull request also fixes a sign error in the calculation of the y-coordinate in TD.

What do you mean with "signing" here?

To fix the failing test just run PHP-CS-Fixer locally.

@oliver681
Copy link
Contributor Author

oliver681 commented Nov 14, 2022

@k00ni

Are there any edge cases now we have to take care of?

As far as I can see there aren't. But the coordinates are still not correct in some cases: see #558. I mentioned this issue there as well. I will remove the part that references #532 after this pull request has been merged to avoid confusion.

I would welcome at least one additional test.

Do you know how I can easily create a PDF manually? I was trying to write an extra test yesterday, but it is necessary to write the test file manually to ensure that Td, TD and next line operators are called to test all possible cases of position translation.

What do you mean with "signing" here?

See line 765. The y-translation was being subtracted instead of added:

$Ty += (float) $coord[1] * (float) $fontSize * (float) $Tm[$vSc];

To fix the failing test just run PHP-CS-Fixer locally.

Ok, i will do that. I wasn't sure if that was right, because when I run that yesterday it changed a lot of code that I haven't touched.

@oliver681 oliver681 marked this pull request as draft November 14, 2022 09:47
@oliver681 oliver681 requested a review from k00ni November 14, 2022 09:59
@oliver681
Copy link
Contributor Author

oliver681 commented Nov 14, 2022

@k00ni What about the errors phpstan finds in RawDataParser?

@k00ni
Copy link
Collaborator

k00ni commented Nov 15, 2022

Ok, i will do that. I wasn't sure if that was right, because when I run that yesterday it changed a lot of code that I haven't touched.

I thought about it and my proposal is to fix all problems which arise during a pull request (even though it "pollutes" your fix). This way we always have a stable code base and can't ignore errors (by mistake). So thanks for fixing it right away.

What about the errors PHPStan finds in RawDataParser?

We use PHPStan always in the latest stable version. A side effect is, that its getting better in finding errors 😆 Can you fix them? If not or you are unsure, I can try to help out.

Do you know how I can easily create a PDF manually? I was trying to write an extra test yesterday, but it is necessary to write the test file manually to ensure that Td, TD and next line operators are called to test all possible cases of position translation.

Please try to build a basic test around Page.php::getDataTm(). You could make faulty $dataCommands-instances to trigger your code part and see how it reacts. If that type of testing is new to you, give me a ping, I will give you an example.

@ilconfo
Copy link

ilconfo commented Nov 15, 2022

Hello,
I was working on some other issue and I've found a minus sign issue also under the T* case, I think it should not be:

$Ty -= $Tl;

but

$Ty += $Tl;

@oliver681
Copy link
Contributor Author

oliver681 commented Nov 15, 2022

@ilconfo No, in this case the minus is right. The text leading is given as a positive number, but the y-coordinate in a PDF is from bottom to top, so the coordinate must be decreased by Tl to obtain the new y-position.

But the comment above the code line is actually wrong. It should be changed to 0 -Tl Td instead of 0 Tl Td.

See also here in the PDF specification:

grafik

@k00ni
Copy link
Collaborator

k00ni commented Nov 24, 2022

@oliver681 can you tell me the status here? Any plans to finish this soon? If you need help, let me know.

@oliver681
Copy link
Contributor Author

@k00ni

can you tell me the status here? Any plans to finish this soon? If you need help, let me know.

I was busy unfortunately. I hope I can write the test case in the next days.

@oliver681
Copy link
Contributor Author

oliver681 commented Nov 29, 2022

@k00ni Sorry, I won't have time to work on this until 9th December.

@oliver681
Copy link
Contributor Author

oliver681 commented Dec 16, 2022

@k00ni

Finally I can finish this :)

Inside getDataCommands, Tf and TF might be igored by the following setting . This will lead to wrong positions by default (default value = false) if font size is changed by these operators.

Is there a good reason why to exclude these commands? Otherwise I would suggest removing this if-statement at that place (getDataCommands) and only leave it within getDataTm(), thus resulting in the desired behaviour (only including font information in dataTm if set to true, but still returning correct coordinates).

grafik

@k00ni
Copy link
Collaborator

k00ni commented Dec 16, 2022

Please merge in latest master branch so we can get rid of these coding style fixes.

Inside getDataCommands, Tf and TF might be igored by the following setting . This will lead to wrong positions by default (default value = false) if font size is changed by these operators.
Is there a good reason why to exclude these commands? Otherwise I would suggest removing this if-statement at that place (getDataCommands) and only leave it within getDataTm(), thus resulting in the desired behaviour (only including font information in dataTm if set to true, but still returning correct coordinates).

Which file do you mean? And can you please check via Git Blame, if you find the author of this line (He or she might be of help later on). You can do this via Github and it might look like here: https://github.com/smalot/pdfparser/blame/8ae72ac808a6d6913350545463058e1e7ca3561f/tests/Integration/ParserTest.php

@oliver681
Copy link
Contributor Author

oliver681 commented Dec 16, 2022

@k00ni

The line was changed by shtayerc.

grafik

As far as I can see, removing it won't affect functionality, because it just affects the data commands returned by get DataCommands() but the font info is still ignored in get DataTm().

Which file do you mean?

Page.php:617

@k00ni
Copy link
Collaborator

k00ni commented Dec 16, 2022

@shtayerc I would really appreciate if you could help us out here. There is a question about code you pushed in the past: #559 (comment)

@oliver681
Copy link
Contributor Author

@k00ni When running phpunit locally, i get the following errors (always the same error):

grafik

Any idea how I can get rid of them?

@shtayerc
Copy link
Contributor

I quickly checked the code. If statement is probably there to keep the same behavior as before and to pass tests for getDataCommands (

$this->assertCount(168, $dataCommands);
). I did not put that much thought in that. The correct way was to remove the if and fix tests.

@k00ni
Copy link
Collaborator

k00ni commented Dec 16, 2022

@shtayerc thank you for your fast response!

When running phpunit locally, i get the following errors (always the same error)
Any idea how I can get rid of them?

Do you have PHP-extension iconv installed? https://www.php.net/manual/en/book.iconv.php

@k00ni
Copy link
Collaborator

k00ni commented Dec 19, 2022

If I see it correctly, you force-pushing and remove/replace older commits. Thats fine, but if you only do it so that we have a clean history, dont worry, I will squash all commits when merging.

@oliver681
Copy link
Contributor Author

@k00ni It will be only one commit anyway ;) I needed to force push because I haven't got phpunit to work locally to see if my tests are correct or not. But now I got it to work locally. I'm adding another test case and I think I'll be ready soon ;)

@oliver681 oliver681 marked this pull request as ready for review December 19, 2022 18:10
Copy link
Collaborator

@k00ni k00ni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oliver681 thank you for taking the effort! It looks good to me, I just have one question (see below).

*/
$hSc = 0; // horizontal scaling
$vSc = 3; // vertical scaling
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why using 3 as default value for vertical scaling? Is there a reference in the PDF spec for instance?

Copy link
Contributor Author

@oliver681 oliver681 Dec 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k00ni It is not the value of the vertical scaling, but the index of vertical scaling the in the array that encodes the text matrix.

grafik

So it corresponds to d here, while hSc corresponds to a

@oliver681
Copy link
Contributor Author

@k00ni I forgot to exclude the if-statement we talked about above. Now its removed and I adapted the corresponding test cases.

@k00ni
Copy link
Collaborator

k00ni commented Dec 21, 2022

@oliver681 I appreciate your work here, thanks.

I will merge it probably today/tomorrow and add it to a Christmas release.

@k00ni k00ni merged commit 70433d1 into smalot:master Dec 21, 2022
@oliver681 oliver681 deleted the y-coordinate branch December 21, 2022 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error in Y coordinate
4 participants