Improve PDF rasterisation safety #1083

Closed
wants to merge 2 commits into from

@sihil sihil commented Mar 20, 2023

As discussed in #514, I stumbled across an issue whereby a file processed by OCRmyPDF would produce corrupted output (some output pages were blank where the input pages were not).

This PR makes a couple of changes to fix this and make the behaviour of OCRmyPDF generally safer:

  • Adds the fonts-droid-fallback package to the Dockerfile to ensure that Ghostscript's fallback font is present
  • Adds the PDFSTOPONERROR option to the Ghostscript command when rasterising a page of a PDF, so that OCRmyPDF fails fast rather than continuing and outputting a bad PDF

The fonts-droid-fallback package is required when rasterising certain PDFs. When it was missing, Ghostscript was outputting a blank page.

Without the PDFSTOPONERROR option, Ghostscript ignores errors when rasterising pages, which causes OCRmyPDF to output corrupted files. With this option, the error bubbles up and OCRmyPDF fails.
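
To illustrate the fail-fast behaviour described above, here is a minimal sketch of rasterising a single page with Ghostscript from Python. It is not the code from this PR or from OCRmyPDF itself: the function names `build_rasterise_args` and `rasterise_page`, the `png16m` device, and the default resolution are assumptions chosen for illustration. The sketch assumes a `gs` executable on the PATH and that Ghostscript exits with a non-zero status when `-dPDFSTOPONERROR` halts on a damaged page.

```python
import subprocess
from pathlib import Path
from typing import List


def build_rasterise_args(pdf: Path, png: Path, page: int, dpi: int = 300,
                         stop_on_error: bool = True) -> List[str]:
    """Build a Ghostscript command line that renders one PDF page to PNG.

    Hypothetical helper for illustration; not OCRmyPDF's actual API.
    """
    args = [
        "gs",
        "-dQUIET",
        "-dSAFER",
        "-dBATCH",
        "-dNOPAUSE",
        "-sDEVICE=png16m",        # 24-bit colour PNG output (assumed device)
        f"-dFirstPage={page}",    # render only the requested page
        f"-dLastPage={page}",
        f"-r{dpi}",
    ]
    if stop_on_error:
        # Ask Ghostscript to stop on PDF errors instead of ignoring them
        # and carrying on, which could otherwise yield a blank page.
        args.append("-dPDFSTOPONERROR")
    args += [f"-sOutputFile={png}", str(pdf)]
    return args


def rasterise_page(pdf: Path, png: Path, page: int) -> None:
    """Run Ghostscript; raise CalledProcessError on a non-zero exit status."""
    subprocess.run(build_rasterise_args(pdf, png, page), check=True)
```

With `check=True`, a non-zero exit status raises `subprocess.CalledProcessError`, so the caller can abort instead of writing a page that may be blank.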
@sihil sihil changed the title Improve PDF rasterisation Improve PDF rasterisation safety Mar 20, 2023
@jbarlow83 jbarlow83 closed this in dbe6148 Jun 3, 2023

sihil commented Jun 9, 2023

Thanks very much for getting this into the codebase @jbarlow83 - much appreciated!
