
Change visibility of OCR'd PDF text layer #3533

@mikejokic

Description


Is your feature request related to a problem? Please describe.

I have OCR'd an image to generate a text layer over it. This text layer is invisible in the PDF. I then use Ghostscript to remove the image and vector data and keep only the text layer, which further reduces the file size while keeping the page's textual structure intact.

TestOCR.pdf: the OCR'd image as a PDF.

TestOCR_textonly.pdf: the same file with the image and vector data removed using Ghostscript (-dFILTERIMAGE -dFILTERVECTOR). Highlighting over this "blank" PDF shows that the text layer is still there.
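For reference, the Ghostscript step described above can be scripted from Python. This is a minimal sketch that assumes the Ghostscript console binary is called `gs` and is on `PATH` (on Windows it is usually `gswin64c`); the function names `build_gs_text_only_cmd` and `strip_to_text_layer` are hypothetical. `-dFILTERIMAGE` and `-dFILTERVECTOR` are the pdfwrite options used in the issue, and `-o` is Ghostscript's shorthand for setting the output file together with `-dBATCH -dNOPAUSE`.

```python
import shutil
import subprocess
from typing import List


def build_gs_text_only_cmd(src: str, dst: str, gs: str = "gs") -> List[str]:
    """Return a pdfwrite command line that drops images and vector graphics."""
    return [
        gs,
        "-o", dst,              # output file; implies -dBATCH -dNOPAUSE
        "-sDEVICE=pdfwrite",
        "-dFILTERIMAGE",        # omit images
        "-dFILTERVECTOR",       # omit vector (path) graphics
        src,
    ]


def strip_to_text_layer(src: str, dst: str, gs_name: str = "gs") -> None:
    """Run Ghostscript; raises if the binary is missing or gs exits non-zero."""
    gs = shutil.which(gs_name)
    if gs is None:
        raise FileNotFoundError(f"Ghostscript executable {gs_name!r} not found on PATH")
    subprocess.run(build_gs_text_only_cmd(src, dst, gs), check=True)
```

Usage would be `strip_to_text_layer("TestOCR.pdf", "TestOCR_textonly.pdf")`. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.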

Describe the solution you'd like

Make this text layer visible in TestOCR_textonly.pdf, so that the OCR'd text is displayed with the same layout as the input.
Can I change the render mode or color of all the text in this PDF to make it visible?
My pipeline will eventually handle very large PDF files, so I would like the solution to be performant as well.
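OCR text layers are usually invisible because they are drawn with PDF text rendering mode 3 (`3 Tr`), which neither fills nor strokes the glyphs; for the same reason, changing the fill color alone has no visible effect while that mode is active. A minimal sketch, assuming the OCR text sets `3 Tr` directly in each page's content stream or in a Form XObject the page uses, is to rewrite that operand to mode 0 (fill) with PyMuPDF. `make_tr_visible` and `make_text_visible` are hypothetical helper names. The regex is a byte-level substitution, not a real content-stream parser, so it could also alter a literal string such as `(3 Tr)`, and it does not follow nested XObjects. Some OCR engines may also embed a font whose glyphs have no visible outlines; in that case the text will still not show after this change.

```python
import re

# Matches the operand "3" of a Tr (set text rendering mode) operator, e.g. "3 Tr".
# The lookbehind rejects the tail of another number ("13", "0.3"); the lookahead
# ensures "Tr" is the whole operator name. String literals are NOT parsed.
_INVISIBLE_TR = re.compile(rb"(?<![0-9.+-])3(\s+Tr)(?![A-Za-z0-9])")


def make_tr_visible(stream: bytes) -> bytes:
    """Rewrite render mode 3 (invisible) to 0 (fill) in a decompressed content stream."""
    return _INVISIBLE_TR.sub(rb"0\1", stream)


def make_text_visible(src: str, dst: str) -> int:
    """Patch page content streams and first-level Form XObjects; return #streams changed."""
    import fitz  # PyMuPDF; imported lazily so make_tr_visible works without it

    doc = fitz.open(src)
    seen = set()
    changed = 0
    for page in doc:
        xrefs = list(page.get_contents())
        xrefs += [item[0] for item in page.get_xobjects()]  # Form XObjects only
        for xref in xrefs:
            if xref in seen:  # shared streams only need patching once
                continue
            seen.add(xref)
            old = doc.xref_stream(xref)  # decompressed bytes
            new = make_tr_visible(old)
            if new != old:
                doc.update_stream(xref, new)
                changed += 1
    doc.save(dst, garbage=3, deflate=True)
    doc.close()
    return changed
```

Usage would be `make_text_visible("TestOCR_textonly.pdf", "TestOCR_visible.pdf")`. A regex pass is linear in stream size and unchanged streams are not rewritten, which should scale reasonably to large files; `garbage=3, deflate=True` cleans up unused objects and recompresses the streams on save.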

@JorjMcKie I have tried your solutions for changing the text font color found here, but to no avail. I would really appreciate any support.
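If switching the render mode or the font color still shows nothing (for example because the embedded OCR font has no visible glyph outlines), a different technique is to extract the OCR text with its positions and draw it again in a visible font on blank pages. This is a minimal sketch with PyMuPDF, assuming unrotated pages and Latin text that the Base-14 font `helv` can render; `iter_spans` and `rebuild_visible` are hypothetical names. Each span returned by `page.get_text("dict")` is re-inserted at its baseline `origin` with its original font size via `Page.insert_text`.

```python
from typing import Iterator, Tuple

Span = Tuple[Tuple[float, float], str, float]  # (baseline origin, text, font size)


def iter_spans(text_dict: dict) -> Iterator[Span]:
    """Yield (origin, text, size) for each non-blank span of a get_text("dict") result."""
    for block in text_dict.get("blocks", []):
        if block.get("type") != 0:  # 0 = text block, 1 = image block
            continue
        for line in block["lines"]:
            for span in line["spans"]:
                if span["text"].strip():  # skip whitespace-only spans
                    yield tuple(span["origin"]), span["text"], span["size"]


def rebuild_visible(src: str, dst: str, fontname: str = "helv") -> None:
    """Write a new PDF whose pages carry the extracted text in a visible font."""
    import fitz  # PyMuPDF; imported lazily so iter_spans works without it

    doc = fitz.open(src)
    out = fitz.open()  # new, empty document
    for page in doc:
        new_page = out.new_page(width=page.rect.width, height=page.rect.height)
        for origin, text, size in iter_spans(page.get_text("dict")):
            new_page.insert_text(origin, text, fontsize=size, fontname=fontname)
    out.save(dst, garbage=3, deflate=True)
    out.close()
    doc.close()
```

Because Helvetica's glyph widths differ from those of the OCR font, each span starts at the right position but may come out slightly wider or narrower than the original. The output contains only the redrawn text, with no image layer.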
