PDF-Processor

PDF-Processor is a Python application that uses Adobe's PDF Services SDK to extract text and tables from PDF files. The extracted data is saved as a ZIP file in the 'Processed' folder. A Gradio web interface lets you upload and process PDF files from the browser.

Overview of the Code

The main script (main.py) performs the following steps:

1. Configures the logging level.
2. Defines a function process_pdf that:
   - Logs the start of the process.
   - Creates a credentials instance using the client ID and client secret from environment variables.
   - Creates an ExecutionContext from the credentials, and a new ExtractPDFOperation instance.
   - Sets the uploaded file as the input for the operation.
   - Builds and sets options for the PDF extraction operation, specifying which elements to extract.
   - Executes the operation and gets the result as a FileRef object.
   - Checks whether the "Processed" folder exists and creates it if it does not.
   - Saves the result (a ZIP file containing the extracted data) to the "Processed" folder.
3. Creates a Gradio interface to interact with the process_pdf function.
4. Launches the Gradio interface.
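
The credential and output-folder steps of process_pdf can be sketched with the standard library alone. This is a minimal sketch, not the code in main.py: the environment variable names PDF_SERVICES_CLIENT_ID and PDF_SERVICES_CLIENT_SECRET and the helper names load_credentials_from_env and build_output_path are assumptions made for illustration, and the Adobe SDK calls themselves are left out.

```python
import os
from pathlib import Path

# Assumed variable names; main.py may read different ones.
CLIENT_ID_VAR = "PDF_SERVICES_CLIENT_ID"
CLIENT_SECRET_VAR = "PDF_SERVICES_CLIENT_SECRET"


def load_credentials_from_env(environ=os.environ):
    """Return (client_id, client_secret), or raise if either value is missing."""
    missing = [
        name for name in (CLIENT_ID_VAR, CLIENT_SECRET_VAR)
        if not environ.get(name)
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return environ[CLIENT_ID_VAR], environ[CLIENT_SECRET_VAR]


def build_output_path(input_pdf, output_dir="Processed"):
    """Create the output folder if needed and return the ZIP path for input_pdf."""
    folder = Path(output_dir)
    # exist_ok=True makes this a no-op when the folder is already there.
    folder.mkdir(parents=True, exist_ok=True)
    return folder / (Path(input_pdf).stem + ".zip")
```

Failing early on missing credentials gives a clearer error than letting the SDK call fail later, and naming the ZIP after the uploaded PDF keeps results from different uploads apart. Both are design choices of this sketch, not confirmed behavior of the application.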

Requirements

Installation

1. Install the Adobe PDF Services SDK for Python, following Adobe's installation instructions for the SDK. The repository also contains an archive named adobe-dc-pdf-services-sdk-python.zip.
2. Install the Gradio library using pip:
   ```
   pip install gradio
   ```
3. Clone this repository or download the source code.
4. Replace the placeholders in the Credentials/pdfservices-api-credentials.json file with your Adobe PDF Services API credentials.
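
After installing, a quick check can confirm that the required packages are importable before launching the app. The helper below is a hypothetical sketch and not part of this repository; it assumes that Gradio imports as `gradio` and that the Adobe SDK installs a top-level package named `adobe`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names are assumptions; adjust them to match your installation.
REQUIRED_MODULES = ["gradio", "adobe"]
```

Calling `missing_modules(REQUIRED_MODULES)` returns an empty list when both packages are found, and otherwise lists the ones that still need to be installed.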

Usage

1. Run the launch_app.bat file. This starts the Python script and opens a new browser window with the Gradio interface.
2. In the Gradio interface, upload the PDF file you want to process.
3. Click the 'Submit' button to start processing. When processing is complete, a message confirms that it succeeded.
4. The result (a ZIP file containing the extracted data) is saved in the 'Processed' folder.
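
To see what a result archive contains without unpacking it, Python's standard zipfile module is enough. This is a small sketch; the function name list_extracted_files is hypothetical, and the exact files inside the archive depend on the extraction options used.

```python
import zipfile


def list_extracted_files(zip_path):
    """Return the sorted names of the files stored in a result ZIP."""
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries so only actual files are reported.
        return sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )
```

For example, `list_extracted_files("Processed/report.zip")` lists the contents of a result archive, where report.zip stands in for whatever file name the application wrote.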
