PDF-Processor is a Python application that uses Adobe's PDF Services SDK to extract text and tables from PDF files. The extracted data is saved as a ZIP file in the 'Processed' folder. A Gradio web interface lets you upload and process PDF files.
The main script (main.py) performs the following steps:
- Configures logging level.
- Defines a function process_pdf that:
  - Logs the start of the process.
  - Creates a credentials instance using the client ID and secret from environment variables.
  - Creates an ExecutionContext from the credentials, and a new ExtractPDFOperation instance.
  - Sets the uploaded file as the input for the operation.
  - Builds and sets options for the PDF extraction operation, specifying which elements to extract.
  - Executes the operation and gets the result as a FileRef object.
  - Checks whether the "Processed" folder exists and creates it if it does not.
  - Saves the result (a ZIP file containing the extracted data) to the "Processed" folder.
- Creates a Gradio interface to interact with the process_pdf function.
- Launches the Gradio interface.
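The steps that do not touch the Adobe SDK can be sketched as follows. This is a minimal illustration and not the actual contents of main.py. The environment variable names and the helper names (load_credentials, output_path_for, build_interface) are assumptions made for the example, and the extraction call itself is left out.

```python
import logging
import os
from pathlib import Path

# Configure the logging level, as the first step of the script does.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pdf_processor")

# Hypothetical environment variable names; use whatever main.py actually reads.
CLIENT_ID_VAR = "PDF_SERVICES_CLIENT_ID"
CLIENT_SECRET_VAR = "PDF_SERVICES_CLIENT_SECRET"


def load_credentials(env=None):
    """Return (client_id, client_secret), failing early if either is missing."""
    env = os.environ if env is None else env
    client_id = env.get(CLIENT_ID_VAR)
    client_secret = env.get(CLIENT_SECRET_VAR)
    if not client_id or not client_secret:
        raise RuntimeError(f"Set {CLIENT_ID_VAR} and {CLIENT_SECRET_VAR} first")
    return client_id, client_secret


def output_path_for(pdf_path, out_dir="Processed"):
    """Create out_dir if it does not exist and return the ZIP path for this PDF."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return folder / f"{Path(pdf_path).stem}.zip"


def build_interface(process_fn):
    """Wrap a processing function in a Gradio file-upload UI (requires gradio)."""
    import gradio as gr  # imported lazily so the helpers above work without it
    return gr.Interface(fn=process_fn, inputs=gr.File(), outputs="text")
```

With gradio installed, build_interface(process_pdf).launch() would start the web UI. Passing the function in as an argument keeps the file-handling helpers usable on their own.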
- Install the Adobe PDF Services SDK for Python. The SDK and its installation instructions are available from Adobe.
- Install the Gradio library using pip:
pip install gradio
- Clone this repository or download the source code.
- Replace the placeholders in the Credentials/pdfservices-api-credentials.json file with your Adobe PDF Services API credentials.
- Run the launch_app.bat file. This will start the Python script and open a new browser window with the Gradio interface.
- In the Gradio interface, upload the PDF file you want to process.
- Click the 'Submit' button to start the processing. Once the processing is complete, you will see a message indicating the successful completion of the process.
- The result (a ZIP file containing the extracted data) will be saved in the 'Processed' folder.
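Once a result ZIP is in the 'Processed' folder, it can be inspected with Python's standard library. The sketch below lists the archive members and, as an assumption about the output layout, looks for a structuredData.json member whose "elements" entries carry a "Text" field. If your archive is organized differently, adjust the member and key names. The function names are hypothetical.

```python
import json
import zipfile


def list_zip_contents(zip_path):
    """Return the names of all files stored in the result ZIP."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


def read_text_elements(zip_path, member="structuredData.json"):
    """Collect the text of each element in the JSON member, if it is present.

    The member name and the "elements"/"Text" keys are assumptions about the
    archive layout; an archive without that member yields an empty list.
    """
    with zipfile.ZipFile(zip_path) as archive:
        if member not in archive.namelist():
            return []
        data = json.loads(archive.read(member))
    return [el["Text"] for el in data.get("elements", []) if "Text" in el]
```

For example, list_zip_contents("Processed/report.zip") would show what the archive contains, where report.zip stands in for whatever file name the application wrote.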