fixed skip-big when there are no images in pdf #152
Merged
This happens when you use the --skip-big argument on a PDF that contains only text and no images.
In this case, the pageinfo dict is missing the "width_pixels" and "height_pixels" keys, which causes the following exception:
ERROR - Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/ruffus/task.py", line 751, in run_pooled_job_without_exceptions
    register_cleanup, touch_files_only)
  File "/usr/local/lib/python3.5/dist-packages/ruffus/task.py", line 567, in job_wrapper_io_files
    ret_val = user_defined_work_func(*params)
  File "/usr/local/lib/python3.5/dist-packages/ocrmypdf/pipeline.py", line 294, in split_pages
    '.ocr.page.pdf' if is_ocr_required(pageinfo, log, options) \
  File "/usr/local/lib/python3.5/dist-packages/ocrmypdf/pipeline.py", line 251, in is_ocr_required
    pixel_count = pageinfo['width_pixels'] * pageinfo['height_pixels']
KeyError: 'width_pixels'
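
For illustration, here is a minimal sketch of the kind of guard that avoids the KeyError. It is not the actual diff from this pull request. The helper names `page_pixel_count` and `too_big_for_ocr` are hypothetical, and the sketch assumes the --skip-big threshold is given in megapixels.

```python
def page_pixel_count(pageinfo):
    """Return the page's image pixel count, or 0 for a page with no images.

    Hypothetical helper: a text-only page has no 'width_pixels' or
    'height_pixels' entries, so .get() with a default of 0 is used
    instead of direct indexing, which raised the KeyError above.
    """
    width = pageinfo.get('width_pixels', 0)
    height = pageinfo.get('height_pixels', 0)
    return width * height


def too_big_for_ocr(pageinfo, skip_big_megapixels):
    """Return True if the page exceeds the skip-big threshold.

    Assumption: the threshold is expressed in megapixels.
    """
    return page_pixel_count(pageinfo) > skip_big_megapixels * 1000000


# A text-only page has no pixel dimensions and is never "too big".
text_only_page = {}
# A page holding a 5000 x 4000 image is 20 megapixels.
image_page = {'width_pixels': 5000, 'height_pixels': 4000}

print(too_big_for_ocr(text_only_page, 10))
print(too_big_for_ocr(image_page, 10))
```

With this guard, a page without images is treated as having zero pixels, so the size check simply does not apply to it.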