-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for Issue #454 #455
Fix for Issue #454 #455
Conversation
The file makes that 2 of the Page methods fails. The Page->extractDecodedRawData was not returning the correct string. This was corrected. The Page->getTextArray breaks when the Page->get(´Contents´) returns a PDFObject, but this object makes that the PDFObject->getTextArray($this) throw an Error. But if you detected it and instead call PDFObject->getTextArray() , it returns the correct data. This is a workaround, because, what is exactly the difference in the format of this PDF and why it fails, needs to have a more deep investigation. I run all the PageTests and they work. This happends because the sample Pdf file is not format as we usually see in other files. Actually, I have a similar (not exactly the same) case for a file created with FPDI, that also broke the getTextArray and getDataTm methods, but I am doing a research to see what is actually happends before I open an Issue for that. As soon as I know what is happening in that case, I will opened the Issue, hopefully with the workaround or fix already done.
This test is a bit wonky because it relies on memory values which may differ from system to system and run to run. Adjusted values to fix it. Ref: https://github.com/smalot/pdfparser/pull/453/checks?check_run_id=3397695916#step:6:22
Taking out the line: $decodedText = ''; This was not needed. Thanks @j0k3r
To catching Throwable.
When the pdf files is produced by setasign/fpdi/fpdi or FPDF, this correct that nothing is returning by the methods. But for doing that things like to know that the producer is FPDF and the page number are required and used in conjunction with getXObjects.
Hi, @k00ni it seems that I still have the problem (when I create a new branch) with the CS Fixer. May you help me with that? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have much time these days, but I hope it helps you a bit for now.
Also, did you run php-cs-fixer in dev-tools
folder? You can use the following shortcut, run it in root folder:
make run-php-cs-fixer
Ok... I am just travelling for 2 weeks. As soon as I get back, I will make all changes and tests. |
I am getting back to this. I will start looking all the recommendations. Just one thing: I have to look how to use the: This is because I develop in a windows machine where I cann't run make, I am looking if I can install a windows tools that let me use make. By the way... i run: dev-tools\vendor\bin\php-cs-fixer fix --verbose --dry-run
and:
It finished without errors. |
Some of the changes asked in Github by kOOni
Other changes asked by k00ny
Some other @k00ni recommendations
I already installed a make tool for windows (the GNU one). But when you run the: |
I manually run dev-tools\vendor\bin\php-cs-fixer fix
just to make a code enhacement
Ok. Let me know if with the changes, this is Ok. |
I try to get back to you as soon as possible. Looks good so far. |
Some corrections metions by @k00ni.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took it out, that was a file for debugging that I forget to take it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, thank you for another important pull request.
Your implementation looks fine, but I would like to discuss placements of these new if-clauses which check for isFpdf. Using them this way bloats the code to an extent IMHO and makes code also a bit harder to read.
In your opinion, does it makes sense to use special functions, like getTextAreaForFPdf
etc. to handle it? Or add further functions, because some code seems like (almost) the same multiple times. I would prefer the latter.
What do you think?
@k00ni About your comment:
Actually I just start doing a workaround/fix over the problem as soon as possible. Let me re-think the fix, and see if there is a better solution, now that we Know what the problem is, and how to solve it. Give me a couple of days to work on this. |
Follow the recomendation of @k00ni on using extra function to have the code clearer.
@k00ni , May you have a look on this approach? Let me know if this is ok or not. Thanks for all your recommendations |
It looks better, thank you. I try to find the time to give you a more in depth feedback soon, hopefully next week, maybe soon. |
Many changes following @k00ni recommendations.
Better explanation for the function
Changes in comments, functions names and variable names.
@k00ni Let me know if there is something more needed on this pull request. I believe I answered everything, and made all the changes that were requested. |
Sorry, I am currently super busy. Thank you for bearing with me. I made a few code nice ups and think that this PR is in a good state right now. If you have no objections I will merge it soon. |
I am OK with this! |
This is the fix/workaround I have been working this to fix that Issue.