Extract data, objects, elements from the PDF #8

pknabe · 2023-10-19T17:05:58Z

What do I want to do?
I would like to extract content from a PDF in PHP using the Smalot/PDFparser library (already installed and running).
What do I want to extract?
First, I would like to extract the PDF's content from one specific level/layer.
Second, I would like to hide these levels individually.
Finally, I would like to save the content of the extracted level to a new PDF and/or output it as XML.
I have attached an example PDF.
It would be nice if you could show how to do this in a small code example.

My development and system environment:
I develop on a Windows 10 machine under Laragon (localhost) with PHP.
example_pdf.pdf

Thanks for your help in advance.

fahadadeel · 2024-01-03T10:28:23Z

@pknabe

You want to extract content from a specific level or layer of a PDF and then manipulate it (hide individual layers, or save the content as a new PDF or as XML). The Smalot\PdfParser library for PHP is good at extracting text, images, and other basic elements from PDFs, but it may not natively support working with individual layers or levels of a PDF document.

As far as I know, the library focuses on extracting basic elements and might not provide functions for layer-level manipulation. Such tasks usually require reading and modifying the PDF's internal structure, which can be complex and is generally outside the scope of basic parsing libraries.
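To give a rough idea of what "the PDF's structure" means here: the PDF specification calls layers optional content groups (OCGs), and content belonging to a layer is typically wrapped in a marked-content section of the form `/OC /MC0 BDC ... EMC` inside the page's content stream. The sketch below is not Smalot\PdfParser code and is written in Python purely for illustration. It assumes you already have a decompressed content stream as text, and a mapping from resource names (such as `MC0`) to layer names, which would come from the page's `/Resources /Properties` dictionary. The function name `split_by_layer`, the sample stream, and the layer names are all hypothetical, and the tokenizer is deliberately simplified (no inline images, no nested parentheses inside strings).

```python
import re

# Simplified tokenizer: a literal string "( ... )" or a run of other characters.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|[^\s()]+")
# Operators are bare keywords such as Tj, cm, T*, ' or "; operands are not.
OPERATOR = re.compile(r"[A-Za-z'\"][A-Za-z0-9*]*")


def is_operator(token):
    """Return True if the token looks like a content-stream operator."""
    if token in ("true", "false", "null"):
        return False
    return OPERATOR.fullmatch(token) is not None


def split_by_layer(stream, properties):
    """Group drawing instructions by the layer (OCG) that encloses them.

    `properties` maps resource names (e.g. "MC0") to layer names.
    Instructions outside any layer are stored under the key None.
    """
    layers = {}
    stack = []      # one entry per open BMC/BDC section: layer name or None
    operands = []   # operands collected since the previous operator
    for token in TOKEN.findall(stream):
        if not is_operator(token):
            operands.append(token)
            continue
        if token == "BDC":
            # A layer section looks like: /OC /MC0 BDC
            is_layer = len(operands) == 2 and operands[0] == "/OC"
            name = operands[1][1:] if is_layer else None
            stack.append(properties.get(name, name) if is_layer else None)
        elif token == "BMC":
            stack.append(None)  # marked content without a property list
        elif token == "EMC":
            if stack:
                stack.pop()
        else:
            # Innermost enclosing layer, if any
            current = next((n for n in reversed(stack) if n is not None), None)
            layers.setdefault(current, []).append(" ".join(operands + [token]))
        operands = []
    return layers


# Hypothetical sample stream with two layers
stream = """
q 1 0 0 1 0 0 cm
/OC /MC0 BDC
BT /F1 12 Tf 72 700 Td (Base map) Tj ET
EMC
/OC /MC1 BDC
BT /F1 12 Tf 72 680 Td (Annotations) Tj ET
EMC
Q
"""
properties = {"MC0": "Background", "MC1": "Notes"}

layers = split_by_layer(stream, properties)
for name, instructions in layers.items():
    print(name, instructions)
```

Under these assumptions, "extracting a layer" amounts to keeping only the instructions stored under that layer's name, and "hiding a layer" amounts to leaving them out when the stream is written back. A real implementation would also need to parse the file's object structure, decompress the streams, and write a valid PDF again, which is the part a text-extraction library usually does not cover.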

However, if you have found a solution or workaround that works with PHP and Smalot\PdfParser, it would be great if you shared it with the community. Thanks

pknabe · 2024-01-04T09:28:12Z

Hello Fahad,

Even though I didn't really expect any feedback after such a long time, I would of course like to thank you very much for your answer. In any case, the situation is now clear, and we will have to rely on a different solution for this project.

Best regards, and thank you again for your effort.
Paulo

sabir-aspose closed this as completed Jan 11, 2024
