Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for Issue #454 #455

Merged
merged 23 commits into from
Oct 18, 2021
Merged

Fix for Issue #454 #455

merged 23 commits into from
Oct 18, 2021

Conversation

izabala
Copy link
Contributor

@izabala izabala commented Aug 28, 2021

This is the fix/workaround I have been working this to fix that Issue.

izabala and others added 10 commits August 19, 2021 16:34
The file makes that 2 of the Page methods fails.

The Page->extractDecodedRawData was not returning the correct string. This was corrected.

The Page->getTextArray breaks when the Page->get(´Contents´) returns a PDFObject, but this object makes that the PDFObject->getTextArray($this) throw an Error. But if you detected it and instead call PDFObject->getTextArray() , it returns the correct data. This is a workaround, because, what is exactly the difference in the format of this PDF and why it fails, needs to have a more deep investigation. I run all the PageTests and they work.

This happends because the sample Pdf file is not format as we usually see in other files. Actually, I have a similar (not exactly the same) case for a file created with FPDI, that also broke the getTextArray and getDataTm methods, but I am doing a research to see what is actually happends before I open an Issue for that. As soon as I know what is happening in that case, I will opened the Issue, hopefully with the workaround or fix already done.
This test is a bit wonky because it relies on memory values which may differ from system to system and run to run.
Adjusted values to fix it.

Ref: https://github.com/smalot/pdfparser/pull/453/checks?check_run_id=3397695916#step:6:22
Taking out the line:
$decodedText = '';
This was not needed. Thanks @j0k3r
To catching Throwable.
When the pdf files is produced by setasign/fpdi/fpdi or FPDF, this correct that nothing is returning by the methods.
But for doing that things like to know that the producer is FPDF and the page number are required and used in conjunction with getXObjects.
@izabala
Copy link
Contributor Author

izabala commented Aug 28, 2021

Hi, @k00ni it seems that I still have the problem (when I create a new branch) with the CS Fixer. May you help me with that?

Copy link
Collaborator

@k00ni k00ni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much time these days, but I hope it helps you a bit for now.

Also, did you run php-cs-fixer in dev-tools folder? You can use the following shortcut, run it in root folder:

make run-php-cs-fixer

src/Smalot/PdfParser/Page.php Outdated Show resolved Hide resolved
src/Smalot/PdfParser/Page.php Outdated Show resolved Hide resolved
src/Smalot/PdfParser/Page.php Outdated Show resolved Hide resolved
src/Smalot/PdfParser/Page.php Outdated Show resolved Hide resolved
src/Smalot/PdfParser/Page.php Outdated Show resolved Hide resolved
tests/Integration/PageTest.php Outdated Show resolved Hide resolved
@izabala
Copy link
Contributor Author

izabala commented Aug 31, 2021

Ok... I am just travelling for 2 weeks. As soon as I get back, I will make all changes and tests.

@izabala
Copy link
Contributor Author

izabala commented Sep 22, 2021

I am getting back to this. I will start looking all the recommendations. Just one thing:

I have to look how to use the:
make run-php-cs-fixer

This is because I develop in a windows machine where I cann't run make, I am looking if I can install a windows tools that let me use make.

By the way... i run: dev-tools\vendor\bin\php-cs-fixer fix --verbose --dry-run
And even though it has some warnings like:

You are running PHP CS Fixer v2, which is not maintained anymore. Please update to v3. If you need help while solving warnings, ask at https://gitter.im/PHP-CS-Fixer, we will help you!

and:

- Configuration file `.php_cs` is deprecated, rename to `.php-cs-fixer.php`.
- PhpCsFixer\Config::create is deprecated since 2.17 and will be removed in 3.0, use the constructor instead.

It finished without errors.

Some of the changes asked in Github by kOOni
Other changes asked by k00ny
Some other @k00ni recommendations
@izabala
Copy link
Contributor Author

izabala commented Sep 24, 2021

I already installed a make tool for windows (the GNU one). But when you run the:
make run-php-cs-fixer
but it tries to run:
dev-tools/vendor/bin/php-cs-fixer fix
that in windows is not a command (the / instead of \).
So I have to run it manually ... it makes 64 changes in files that I havent touched. I will push that changes. Let me know if I have to re-do everything again.

I manually run dev-tools\vendor\bin\php-cs-fixer fix
just to make a code enhacement
@izabala
Copy link
Contributor Author

izabala commented Sep 24, 2021

Ok. Let me know if with the changes, this is Ok.

src/Smalot/PdfParser/Page.php Outdated Show resolved Hide resolved
.vscode/launch.json Outdated Show resolved Hide resolved
@k00ni
Copy link
Collaborator

k00ni commented Sep 27, 2021

I try to get back to you as soon as possible. Looks good so far.

@izabala izabala requested a review from k00ni September 27, 2021 14:48
Copy link
Contributor Author

@izabala izabala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took it out, that was a file for debugging that I forget to take it out.

Copy link
Collaborator

@k00ni k00ni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, thank you for another important pull request.

Your implementation looks fine, but I would like to discuss placements of these new if-clauses which check for isFpdf. Using them this way bloats the code to an extent IMHO and makes code also a bit harder to read.

In your opinion, does it makes sense to use special functions, like getTextAreaForFPdf etc. to handle it? Or add further functions, because some code seems like (almost) the same multiple times. I would prefer the latter.

What do you think?

@izabala
Copy link
Contributor Author

izabala commented Oct 4, 2021

@k00ni About your comment:

In your opinion, does it makes sense to use special functions, like getTextAreaForFPdf etc. to handle it? Or add further functions, because some code seems like (almost) the same multiple times. I would prefer the latter.

What do you think?

Actually I just start doing a workaround/fix over the problem as soon as possible. Let me re-think the fix, and see if there is a better solution, now that we Know what the problem is, and how to solve it.

Give me a couple of days to work on this.

Follow the recomendation of @k00ni on using extra function to have the code clearer.
@izabala
Copy link
Contributor Author

izabala commented Oct 4, 2021

@k00ni , May you have a look on this approach? Let me know if this is ok or not.

Thanks for all your recommendations

src/Smalot/PdfParser/Page.php Outdated Show resolved Hide resolved
src/Smalot/PdfParser/Page.php Outdated Show resolved Hide resolved
src/Smalot/PdfParser/Page.php Outdated Show resolved Hide resolved
@k00ni
Copy link
Collaborator

k00ni commented Oct 5, 2021

It looks better, thank you. I try to find the time to give you a more in depth feedback soon, hopefully next week, maybe soon.

Many changes following @k00ni recommendations.
Better explanation for the function
Changes in comments, functions names and variable names.
@izabala
Copy link
Contributor Author

izabala commented Oct 14, 2021

@k00ni Let me know if there is something more needed on this pull request. I believe I answered everything, and made all the changes that were requested.

@k00ni
Copy link
Collaborator

k00ni commented Oct 15, 2021

Let me know if there is something more needed on this pull request. I believe I answered everything, and made all the changes that were requested.

Sorry, I am currently super busy. Thank you for bearing with me. I made a few code nice ups and think that this PR is in a good state right now. If you have no objections I will merge it soon.

@izabala
Copy link
Contributor Author

izabala commented Oct 15, 2021

I am OK with this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants