Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workaround for the Issue #450 #453

Merged
merged 8 commits into from
Aug 27, 2021
Merged

workaround for the Issue #450 #453

merged 8 commits into from
Aug 27, 2021

Conversation

izabala
Copy link
Contributor

@izabala izabala commented Aug 19, 2021

The file makes that 2 of the Page methods fails.

The Page->extractDecodedRawData was not returning the correct string. This was corrected.

The Page->getTextArray breaks when the Page->get(´Contents´) returns a PDFObject, but this object makes that the PDFObject->getTextArray($this) throw an Error. But if you detected it and instead call PDFObject->getTextArray() , it returns the correct data. This is a workaround, because, what is exactly the difference in the format of this PDF and why it fails, needs to have a more deep investigation. I run all the PageTests and they work.

This happends because the sample Pdf file is not format as we usually see in other files. Actually, I have a similar (not exactly the same) case for a file created with FPDI, that also broke the getTextArray and getDataTm methods, but I am doing a research to see what is actually happends before I open an Issue for that. As soon as I know what is happening in that case, I will opened the Issue, hopefully with the workaround or fix already done.

The file makes that 2 of the Page methods fails.

The Page->extractDecodedRawData was not returning the correct string. This was corrected.

The Page->getTextArray breaks when the Page->get(´Contents´) returns a PDFObject, but this object makes that the PDFObject->getTextArray($this) throw an Error. But if you detected it and instead call PDFObject->getTextArray() , it returns the correct data. This is a workaround, because, what is exactly the difference in the format of this PDF and why it fails, needs to have a more deep investigation. I run all the PageTests and they work.

This happends because the sample Pdf file is not format as we usually see in other files. Actually, I have a similar (not exactly the same) case for a file created with FPDI, that also broke the getTextArray and getDataTm methods, but I am doing a research to see what is actually happends before I open an Issue for that. As soon as I know what is happening in that case, I will opened the Issue, hopefully with the workaround or fix already done.
@izabala
Copy link
Contributor Author

izabala commented Aug 19, 2021

Guys, I already run CS Fixer, but I still get the errors. Please some advice here.

@k00ni
Copy link
Collaborator

k00ni commented Aug 20, 2021

Thank you for your work @izabala!

Guys, I already run CS Fixer, but I still get the errors. Please some advice here.

I am busy this week, but I will try to assist you next week.

@eddturtle
Copy link

I grabbed a copy of your repo @izabala + phpcs was failing (like above). I opened it in my editor and saved and it auto changed quite a few things, now it passes php cs. Here's the updated code:

    public function testExtractDecodedRawDataIssue450()
    {
        $filename = $this->rootDir.'/samples/bugs/Issue450.pdf';
        $parser = $this->getParserInstance();
        $document = $parser->parseFile($filename);
        $pages = $document->getPages();
        $page = $pages[0];
        $extractedDecodedRawData = $page->extractDecodedRawData();
        $this->assertIsArray($extractedDecodedRawData);
        $this->assertGreaterThan(3, \count($extractedDecodedRawData));
        $this->assertIsArray($extractedDecodedRawData[3]);
        $this->assertEquals('TJ', $extractedDecodedRawData[3]['o']);
        $this->assertIsArray($extractedDecodedRawData[3]['c']);
        $this->assertIsArray($extractedDecodedRawData[3]['c'][0]);
        $this->assertEquals(3, \count($extractedDecodedRawData[3]['c'][0]));
        $this->assertEquals('{signature:signer505906:Please+Sign+Here}', $extractedDecodedRawData[3]['c'][0]['c']);
    }

    public function testGetDataTmIssue450()
    {
        $filename = $this->rootDir.'/samples/bugs/Issue450.pdf';
        $parser = $this->getParserInstance();
        $document = $parser->parseFile($filename);
        $pages = $document->getPages();
        $page = $pages[0];
        $dataTm = $page->getDataTm();
        $this->assertIsArray($dataTm);
        $this->assertEquals(1, \count($dataTm));
        $this->assertIsArray($dataTm[0]);
        $this->assertEquals(2, \count($dataTm[0]));
        $this->assertIsArray($dataTm[0][0]);
        $this->assertEquals(6, \count($dataTm[0][0]));
        $this->assertEquals(1, $dataTm[0][0][0]);
        $this->assertEquals(0, $dataTm[0][0][1]);
        $this->assertEquals(0, $dataTm[0][0][2]);
        $this->assertEquals(1, $dataTm[0][0][3]);
        $this->assertEquals(67.5, $dataTm[0][0][4]);
        $this->assertEquals(756.25, $dataTm[0][0][5]);
        $this->assertEquals('{signature:signer505906:Please+Sign+Here}', $dataTm[0][1]);
    }

Hope this helps!

As mentioned in the issue, I think the actual code change is looking good to me.

@izabala
Copy link
Contributor Author

izabala commented Aug 20, 2021

Thanks @k00ni. I wait, I just want to be sure what I am doing wrong!

@k00ni k00ni self-assigned this Aug 22, 2021
Copy link
Collaborator

@k00ni k00ni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@izabala thank you for the PR.

I took the liberty to directly adapt a few files in your repository (see latest commits). All tests are good now.

In ParserTest I had to refine a few lines to fix failing test. It was not related to your changes.

I have one question (see below). After we solved it we are good to go.

src/Smalot/PdfParser/Page.php Outdated Show resolved Hide resolved
Taking out the line:
$decodedText = '';
This was not needed. Thanks @j0k3r
To catching Throwable.
Copy link
Collaborator

@k00ni k00ni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏿 Thanks.

As always, I will keep this open for a few days so others can comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants