[BUG] --deskew not compatible with blank pages or with tesseract_timeout = 0 #1049
Comments
I want to deskew PDFs without running OCR. The --deskew option is not working. OCRmyPDF v14.0.4.
@jbarlow83 Any thoughts on the above? :)
Fixed as of v14.1.0
Hi @jbarlow83, with v14.3.0 I am still encountering that problem:
Adding
Version information:
Sample file:
As pointed out by @azogue in #868 (comment), d48254d seems not to cover sample pages like this. (I am experiencing this error quite frequently when duplex scanning incoming mail where one sheet, e.g. the last, has an empty back page.) Any chance to address this? Happy to provide further information or test solutions / workarounds.
You are also using the OpenCL version of Tesseract. Although I haven't checked in recently, when I last looked it was quite unstable and not something anyone should expect to work reliably. As you can see from the commits, I'm relying on Tesseract to provide consistent error messages, and it seems to not do that for OpenCL. What happens if you use regular Tesseract?
Thanks for the quick response. Using the Tesseract version (currently 5.3.0) mentioned in the Tesseract online documentation, built without the OpenCL flag (though packaged for openSUSE 15.4 rather than 15.5), does indeed work without the error. I will stick to that for the time being...
Describe the bug
The --deskew option is not behaving as expected on OCRmyPDF 13.7.0. I am experiencing two issues related to deskew.
Issue 1: Deskew not working on blank pages
I'm using the options --output-type=pdf --tesseract-timeout=30 on this blank_image.pdf. When I run ocrmypdf with these options, I get a SubprocessOutputError. I see that issue is referenced here: #868, but I don't think the bug fix covered all scenarios.
Issue 2: Deskew not working with tesseract_timeout=0
I want to deskew PDFs without running OCR on them, as mentioned in the docs here. However, with --tesseract-timeout=0, the document is not deskewed because OCR is not run. If I change --tesseract-timeout to a different integer, it deskews successfully. Here is a skewed PDF that can be used to reproduce the issue: skewed_text.pdf
To Reproduce
Issue 1: Use blank_image.pdf and run
ocrmypdf --deskew --output-type=pdf --tesseract-timeout=30 blank_image.pdf result.pdf
Issue 2: Use skewed_text.pdf and run
ocrmypdf --deskew --output-type=pdf --tesseract-timeout=0 skewed_text.pdf result.pdf
Expected behavior
I expect that blank pages do not completely block the ocrmypdf command from running. It should be able to gracefully handle the error and skip deskewing that specific page.
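To make the requested behavior concrete, here is a minimal sketch of "skip the page instead of aborting". It is not OCRmyPDF's actual code: `safe_deskew_angles`, `detect_angle` and `fake_detector` are hypothetical names invented for illustration, and the detector is assumed to raise an exception on pages it cannot analyze, such as blank ones.

```python
from typing import Callable, Iterable, List, Optional


def safe_deskew_angles(
    pages: Iterable[str],
    detect_angle: Callable[[str], float],
) -> List[Optional[float]]:
    """Return one skew angle per page, or None where detection failed.

    `detect_angle` is a hypothetical stand-in for whatever measures skew;
    it is assumed to raise on pages it cannot analyze (e.g. blank pages).
    """
    angles: List[Optional[float]] = []
    for page in pages:
        try:
            angles.append(detect_angle(page))
        except Exception as exc:  # broad on purpose: any failure means "skip"
            print(f"skipping deskew for {page}: {exc}")
            angles.append(None)
    return angles


def fake_detector(page: str) -> float:
    """Toy detector used only for this example: fails on the blank page."""
    if page == "blank":
        raise RuntimeError("no text found")
    return 1.5


if __name__ == "__main__":
    print(safe_deskew_angles(["page1", "blank", "page3"], fake_detector))
```

A page whose entry is None would simply be left unrotated, so one blank back page would not stop the rest of the document from being processed.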
I expect that with --tesseract-timeout=0 the page can be deskewed without having OCR applied.
Screenshots
If applicable, add screenshots to help explain your problem.
Deskew with 0 second timeout:
Deskew with 30 second timeout:
System (please complete the following information):
Installation
brew install ocrmypdf
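For anyone who wants to re-run both cases from "To Reproduce" in one go, the following sketch wraps the two commands in a small script. The flags and input file names are taken from the report; the helper names (`deskew_command`, `run_cases`) and the separate output file names are assumptions made here so the two runs do not overwrite each other.

```python
import shutil
import subprocess
from typing import List


def deskew_command(input_pdf: str, output_pdf: str, timeout: int) -> List[str]:
    """Build the ocrmypdf argument list used in the report."""
    return [
        "ocrmypdf",
        "--deskew",
        "--output-type=pdf",
        f"--tesseract-timeout={timeout}",
        input_pdf,
        output_pdf,
    ]


# (input, output, tesseract timeout); output names are illustrative only.
CASES = [
    ("blank_image.pdf", "result_blank.pdf", 30),  # Issue 1
    ("skewed_text.pdf", "result_skewed.pdf", 0),  # Issue 2
]


def run_cases() -> None:
    """Run each case and print the exit code without raising on failure."""
    if shutil.which("ocrmypdf") is None:
        print("ocrmypdf not found on PATH; nothing was run")
        return
    for input_pdf, output_pdf, timeout in CASES:
        cmd = deskew_command(input_pdf, output_pdf, timeout)
        proc = subprocess.run(cmd, capture_output=True, text=True)
        print(" ".join(cmd), "-> exit code", proc.returncode)


if __name__ == "__main__":
    run_cases()
```

The script only reports exit codes; whether the second output is actually deskewed still has to be checked by opening the resulting file.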