Description
Is your feature request related to a problem? Please describe.
I have OCR'd an image to generate a text layer over it. This text layer is invisible in the PDF. I then use Ghostscript to remove the image and vector data and keep only the text layer, which further reduces file size while leaving the textual structure of the page intact.
TestOCR.pdf - OCR'd image as pdf
TestOCR_textonly.pdf - image and vector data removed using Ghostscript with -dFILTERIMAGE -dFILTERVECTOR. Selecting (highlighting) across this "blank" PDF shows that the text layer is still there.
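For reference, the Ghostscript step can be sketched roughly as below. The function names `gs_text_only_args` and `strip_to_text_only` are only illustrative, and the sketch assumes the executable is called `gs` and is on the PATH (on Windows it is typically `gswin64c`).

```python
import subprocess


def gs_text_only_args(src, dst, gs="gs"):
    """Build a Ghostscript command line that keeps only the text of a PDF.

    `gs` is the executable name; this is an assumption and differs by platform.
    """
    return [
        gs,
        "-o", dst,             # output file (also implies -dBATCH -dNOPAUSE)
        "-sDEVICE=pdfwrite",   # write a PDF again
        "-dFILTERIMAGE",       # drop raster images
        "-dFILTERVECTOR",      # drop vector drawings
        src,
    ]


def strip_to_text_only(src, dst, gs="gs"):
    """Run Ghostscript; raises CalledProcessError if it exits non-zero."""
    subprocess.run(gs_text_only_args(src, dst, gs), check=True)
```

Usage would be something like `strip_to_text_only("TestOCR.pdf", "TestOCR_textonly.pdf")`.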
Describe the solution you'd like
Make this text layer visible in TestOCR_textonly.pdf. I want the OCR'd text to be visible following the same structural layout as the input.
Can I change the render mode or color for all the text in this pdf to be visible?
My pipeline will eventually deal with very large PDF files, so I would like the solution to be performant as well.
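To make the render-mode question concrete, here is a rough sketch of the kind of rewrite I have in mind. It assumes the OCR text is hidden with text render mode 3 (the `3 Tr` operator) directly in each page's content streams, which I have not confirmed for this file. The helper names `make_text_visible` and `make_pdf_text_visible` are made up for illustration. The byte-level replace is naive: it would also alter a literal `3 Tr` inside a text string, and it does not look inside Form XObjects.

```python
import re

# Matches the operand "3" followed by the Tr operator as standalone tokens.
_INVISIBLE_TR = re.compile(rb"(?:(?<=\s)|\A)3(\s+)Tr(?=\s|\Z)")


def make_text_visible(stream: bytes) -> bytes:
    """Replace render mode 3 (invisible) with 0 (fill) in a content stream."""
    return _INVISIBLE_TR.sub(rb"0\g<1>Tr", stream)


def make_pdf_text_visible(src: str, dst: str) -> None:
    """Apply make_text_visible to every page content stream of a PDF."""
    import fitz  # PyMuPDF, imported lazily so the helper above has no dependency

    doc = fitz.open(src)
    seen = set()  # content streams can be shared between pages
    for page in doc:
        for xref in page.get_contents():
            if xref in seen:
                continue
            seen.add(xref)
            old = doc.xref_stream(xref)  # decompressed stream bytes, or None
            if old is None:
                continue
            new = make_text_visible(old)
            if new != old:
                doc.update_stream(xref, new)
    doc.save(dst, garbage=3, deflate=True)
    doc.close()
```

Since this only touches raw stream bytes and never extracts or re-renders text, I would expect it to scale reasonably to large files, but I have not measured it. Whether the text then shows up in a sensible color depends on the fill color already set in the stream.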
@JorjMcKie I have tried your solutions for changing text font color found here but to no avail. Would really appreciate any support.