OCR PDF Attachments? #259

Open
jmrichardson opened this issue Apr 20, 2018 · 1 comment

@jmrichardson

Does/will OCRmyPDF support embedded documents/attachments in a portfolio? Thanks

@jbarlow83
Collaborator

Not currently, and it's not planned any time soon, but I think you're the second or third person to ask, so there's some demand anyway. (See also #197)

I made some notes about how to go about doing this, which may be useful to you, or to me as a reference when I implement it:

Ghostscript recently added PDF/A-3 support, so this is possible within Ghostscript. The current solution would be to modify the pdfmark file, pdfa.ps, which is generated by ocrmypdf/pdfa.py, so that it includes a step to embed the file according to the pdfmark specification (see page 30 for the /EMBED command, and this Ghostscript bug for a functioning example). Use absolute paths.
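As a rough illustration of the pdfmark route, here is a minimal Python sketch that builds the extra commands one might append to pdfa.ps. The helper names (`ps_string`, `embed_pdfmark`) and the `{attachment_stream}` object name are hypothetical, and this is not code from ocrmypdf/pdfa.py. It assumes the usual pattern of defining a stream object, filling it from a file, and then referencing it from a /EMBED pdfmark. I have not checked it against every Ghostscript version.

```python
from pathlib import Path


def ps_string(text: str) -> str:
    """Quote text as a PostScript string literal, escaping backslash, ( and )."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def embed_pdfmark(attachment: Path, display_name: str) -> str:
    """Return pdfmark commands that embed `attachment` under `display_name`.

    Hypothetical helper: the object name {attachment_stream} is arbitrary.
    """
    # Resolve to an absolute path, since relative paths are unreliable here.
    abs_path = attachment.resolve().as_posix()
    lines = [
        # Define an empty stream object to hold the file contents.
        "[ /_objdef {attachment_stream} /type /stream /OBJ pdfmark",
        # Fill the stream from the file on disk.
        f"[ {{attachment_stream}} {ps_string(abs_path)} (r) file /PUT pdfmark",
        # Register the stream as an embedded file via a file specification.
        f"[ /Name {ps_string(display_name)}",
        f"  /FS << /Type /Filespec /F {ps_string(display_name)}",
        "        /EF << /F {attachment_stream} >> >>",
        "  /EMBED pdfmark",
        # Close the stream object.
        "[ {attachment_stream} /CLOSE pdfmark",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(embed_pdfmark(Path("invoice.xml"), "invoice.xml"))
```

The returned text would be appended to the generated pdfa.ps before it is passed to Ghostscript. Ghostscript also has to be allowed to read the attachment path.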

A better option would be to teach pikepdf how to embed files according to the PDF reference manual, section 7.11.4, since this would work without Ghostscript. OCRmyPDF will add pikepdf as a dependency soon (I maintain both).

If you're able to do a PR for either I'd be happy to accept.
