-
Notifications
You must be signed in to change notification settings - Fork 534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
consider scaling by fontSize(Tf, Tfs) and text matrix (Tm) #559
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oliver681 Thank you for taking the time and this pull request.
In the following just a few remarks and questions:
One line in the test files was changed to reflect the correct scaling.
That is alright. Are there any edge cases now we have to take care of? I consider changing tests bad practice in general, but in this case core functionality was buggy in the first place. I would welcome at least one additional test.
This pull request also fixes a sign error in the calculation of the y-coordinate in TD.
What do you mean with "signing" here?
To fix the failing test just run PHP-CS-Fixer locally.
As far as I can see there aren't. But the coordinates are still not correct in some cases: see #558. I mentioned this issue there as well. I will remove the part that references #532 after this pull request has been merged to avoid confusion.
Do you know how I can easily create a PDF manually? I was trying to write an extra test yesterday, but it is necessary to write the test file manually to ensure that Td, TD and next line operators are called to test all possible cases of position translation.
See line 765. The y-translation was being subtracted instead of added:
Ok, i will do that. I wasn't sure if that was right, because when I run that yesterday it changed a lot of code that I haven't touched. |
@k00ni What about the errors phpstan finds in RawDataParser? |
I thought about it and my proposal is to fix all problems which arise during a pull request (even though it "pollutes" your fix). This way we always have a stable code base and can't ignore errors (by mistake). So thanks for fixing it right away.
We use PHPStan always in the latest stable version. A side effect is, that its getting better in finding errors 😆 Can you fix them? If not or you are unsure, I can try to help out.
Please try to build a basic test around |
Hello,
but
|
@ilconfo No, in this case the minus is right. The text leading is given as a positive number, but the y-coordinate in a PDF is from bottom to top, so the coordinate must be decreased by Tl to obtain the new y-position. But the comment above the code line is actually wrong. It should be changed to See also here in the PDF specification: |
@oliver681 can you tell me the status here? Any plans to finish this soon? If you need help, let me know. |
I was busy unfortunately. I hope I can write the test case in the next days. |
@k00ni Sorry, I won't have time to work on this until 9th December. |
Finally I can finish this :) Inside getDataCommands, Is there a good reason why to exclude these commands? Otherwise I would suggest removing this if-statement at that place (getDataCommands) and only leave it within getDataTm(), thus resulting in the desired behaviour (only including font information in dataTm if set to true, but still returning correct coordinates). |
Please merge in latest master branch so we can get rid of these coding style fixes.
Which file do you mean? And can you please check via Git Blame, if you find the author of this line (He or she might be of help later on). You can do this via Github and it might look like here: https://github.com/smalot/pdfparser/blame/8ae72ac808a6d6913350545463058e1e7ca3561f/tests/Integration/ParserTest.php |
The line was changed by shtayerc. As far as I can see, removing it won't affect functionality, because it just affects the data commands returned by get DataCommands() but the font info is still ignored in get DataTm().
Page.php:617 |
@shtayerc I would really appreciate if you could help us out here. There is a question about code you pushed in the past: #559 (comment) |
@k00ni When running phpunit locally, i get the following errors (always the same error): Any idea how I can get rid of them? |
I quickly checked the code. If statement is probably there to keep the same behavior as before and to pass tests for getDataCommands ( pdfparser/tests/Integration/PageTest.php Line 259 in 988efcb
|
@shtayerc thank you for your fast response!
Do you have PHP-extension |
If I see it correctly, you force-pushing and remove/replace older commits. Thats fine, but if you only do it so that we have a clean history, dont worry, I will squash all commits when merging. |
@k00ni It will be only one commit anyway ;) I needed to force push because I haven't got phpunit to work locally to see if my tests are correct or not. But now I got it to work locally. I'm adding another test case and I think I'll be ready soon ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oliver681 thank you for taking the effort! It looks good to me, I just have one question (see below).
src/Smalot/PdfParser/Page.php
Outdated
*/ | ||
$hSc = 0; // horizontal scaling | ||
$vSc = 3; // vertical scaling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why using 3
as default value for vertical scaling? Is there a reference in the PDF spec for instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@k00ni It is not the value of the vertical scaling, but the index of vertical scaling the in the array that encodes the text matrix.
So it corresponds to d
here, while hSc
corresponds to a
…Tm) to calculate accurate x,y-coordinates
@k00ni I forgot to exclude the if-statement we talked about above. Now its removed and I adapted the corresponding test cases. |
@oliver681 I appreciate your work here, thanks. I will merge it probably today/tomorrow and add it to a Christmas release. |
closes #532
Previously, the coordinates in getDataTm() were calculated without taking into accout the scaling that happens by the set font size and text matrix.
In a PDF that would e.g. look like:
When the posiiton is changed using the text positioning operators Td or TD, the new coordinates are now calculated with the set scaling.
One line in the test files was changed to reflect the correct scaling.
This pull request also fixes a sign error in the calculation of the y-coordinate in TD.