OCR PDF Attachments? #259

jmrichardson · 2018-04-20T02:52:44Z

Does/will OCRmyPDF support embedded documents/attachments in a portfolio? Thanks

jbarlow83 · 2018-04-20T06:35:18Z

Not currently, and it's not planned any time soon, but I think you're the second or third person to ask, so there's some demand. (See also #197)

I made some notes about how to go about doing this, whether they're useful to you now or to me as a reference when I implement it:

Ghostscript recently added PDF/A-3 support, so this is possible within Ghostscript. The current solution would be to modify the pdfmark file, named pdfa.ps, generated by ocrmypdf/pdfa.py, to include a step that embeds the file according to the pdfmark specification (see page 30 for the /EMBED command) and this Ghostscript bug for a working example. Use absolute paths.
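As a rough illustration of the Ghostscript route, here is a Python sketch that generates pdfmark lines one might append to pdfa.ps. It assumes the /OBJ, /PUT, /CLOSE and /EMBED pdfmark pattern used in Ghostscript's PDF/A-3 attachment examples; `embed_pdfmark` and the `{AttachmentStream}` object name are hypothetical, and the exact syntax should be checked against page 30 of the pdfmark reference before relying on it.

```python
import os


def _ps_string(text: str) -> str:
    """Quote text as a PostScript literal string, escaping \\, ( and )."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def embed_pdfmark(path: str) -> str:
    """Return pdfmark lines embedding the file at `path` (hypothetical helper).

    Assumed pattern: create a named stream object, fill it from the file,
    close it, then register it in the EmbeddedFiles name tree via /EMBED.
    """
    if not os.path.isabs(path):
        # The notes above say to use absolute paths, so enforce that here.
        raise ValueError("an absolute path is required")
    name = _ps_string(os.path.basename(path))
    return "\n".join([
        # Declare an empty stream object named {AttachmentStream}.
        "[ /_objdef {AttachmentStream} /type /stream /OBJ pdfmark",
        # Mark the stream as an embedded file.
        "[ {AttachmentStream} << /Type /EmbeddedFile >> /PUT pdfmark",
        # Copy the file's bytes into the stream; (r) file opens it for reading.
        f"[ {{AttachmentStream}} {_ps_string(path)} (r) file /PUT pdfmark",
        "[ {AttachmentStream} /CLOSE pdfmark",
        # Add a file specification pointing at the stream to the name tree.
        f"[ /Name {name} /FS << /Type /Filespec /F {name} "
        f"/EF << /F {{AttachmentStream}} >> >> /EMBED pdfmark",
    ])
```

Newer Ghostscript releases run with -dSAFER by default, which may stop the `file` operator from reading the attachment, so this route may need extra permissions depending on the version.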

A better option would be to teach pikepdf how to embed files according to the PDF reference manual, section 7.11.4, since this would work without Ghostscript. OCRmyPDF will add pikepdf as a dependency soon (I maintain both).
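To show what section 7.11.4 asks for, here is a minimal, dependency-free Python sketch. It is not pikepdf's API; `build_pdf_with_attachment` and its helpers are hypothetical names. It writes a one-page PDF whose catalog has a /Names entry with an /EmbeddedFiles name tree mapping the file name to a file specification, whose /EF dictionary points at an uncompressed /EmbeddedFile stream. It assumes an ASCII-safe /F key and stores the full Unicode name in /UF.

```python
def _pdf_literal(text: str) -> bytes:
    """Encode text as a PDF literal string; non-ASCII chars become '?'."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return b"(" + escaped.encode("ascii", "replace") + b")"


def _pdf_text_hex(text: str) -> bytes:
    """Encode text as a UTF-16BE hex string with BOM (a PDF text string)."""
    return b"<FEFF" + text.encode("utf-16-be").hex().upper().encode("ascii") + b">"


def build_pdf_with_attachment(filename: str, payload: bytes) -> bytes:
    """Build a one-page PDF carrying `payload` as an embedded file."""
    key = _pdf_literal(filename)
    objects = [
        # 1: catalog with an EmbeddedFiles name tree (single leaf, so sorted).
        b"<< /Type /Catalog /Pages 2 0 R /Names << /EmbeddedFiles << /Names [ "
        + key + b" 5 0 R ] >> >> >>",
        # 2-3: a minimal blank page so viewers accept the document.
        b"<< /Type /Pages /Kids [ 3 0 R ] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [ 0 0 612 792 ] /Resources << >> >>",
        # 4: the embedded file stream holding the raw bytes.
        b"<< /Type /EmbeddedFile /Length %d /Params << /Size %d >> >>\nstream\n"
        % (len(payload), len(payload)) + payload + b"\nendstream",
        # 5: file specification linking the name to the stream via /EF.
        b"<< /Type /Filespec /F " + key + b" /UF " + _pdf_text_hex(filename)
        + b" /EF << /F 4 0 R >> >>",
    ]
    out = bytearray(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_offset)
    return bytes(out)
```

For example, `open("out.pdf", "wb").write(build_pdf_with_attachment("notes.txt", b"hello"))` should produce a file whose attachment shows up in a viewer's attachments panel. A real pikepdf feature would instead add these objects to an existing document's catalog rather than building a new file.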

If you're able to do a PR for either I'd be happy to accept.

jbarlow83 added the enhancement label Apr 20, 2018

jbarlow83 added the help wanted label Apr 20, 2018

jbarlow83 removed the help wanted label Aug 19, 2018
