How to use a timeout for gs? #1010
We can offer --tesseract-timeout because it's still possible to produce a functional, mostly OCRed PDF if Tesseract fails on certain pages. But Ghostscript is a one-shot: it has to run to completion or we don't get a usable PDF at all. (Or in some cases, we can't produce the images Ghostscript needs.)
I assume it's a private file you won't share with me, but if you run ocrmypdf with |
Thanks for the quick help and many hints. The gs call that hangs is for a single page:
I reduced my options to just one ( |
That's the problem... ocrmypdf picked too high a rendering resolution for the file for some reason. It tries to pick a resolution that will capture all details in the file. This is not a Ghostscript problem. |
Encrypted for @jbarlow83 as documented in the Wiki. |
Would a maximum for the rendering resolution be a solution? |
I'm having exactly the same problem. In my case: gs hangs indefinitely, very high CPU usage ensues (100%). Is there any command line parameter that can fix this? I'm fine with setting a limit to DPI or just setting a fixed value. Log excerpt: |
@jbarlow83 Sorry for bothering you, but is there anything that can be done to prevent this from happening? |
I'm afraid the thing preventing anything from happening on this issue is that I'm too busy with other projects and a comprehensive resolution is not trivial. I am hoping to have time in late December. In the meantime if anyone wants to attempt a PR I'd be happy to help with that. |
@jbarlow83 Just made a contribution on Open Collective. It is not much and I do not expect anything, but should you ever have the time and nerves, it would be very helpful if you could take a look at this. Thanks a lot in any case for the awesome project! |
@jbarlow83 Any chance you could look into this? I sadly had to stop using ocrmypdf because it would push the server to 100% CPU with no way to prevent that. Really, any sort of fix (even if it cannot OCR the document and just exits) would be awesome! |
@hrst you can use the
|
@svenha Thanks for the tip. I had considered this, but the problem is that it is hard to determine a good value for the timeout. I could retrieve the number of pages first and then set the timeout dynamically, but I had hoped this would get resolved at some point. I might just use the timeout and not use ocrmypdf at all for documents with a larger number of pages. However, if I remember correctly, ocrmypdf still used 100% CPU even when using the --jobs parameter, but I have to try that. |
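For anyone who wants to try the page-count idea above, here is a minimal sketch of an external wrapper, not anything built into ocrmypdf. The helper names (timeout_for, run_with_timeout) and the per-page and base budgets are assumptions chosen for illustration, and the page count has to be obtained separately. It starts the command in its own session so that, on timeout, the whole process group (including any gs child) is killed rather than just the parent. POSIX only.

```python
import os
import signal
import subprocess
import sys


def timeout_for(pages, per_page_s=60.0, base_s=30.0):
    """Hypothetical budget: a fixed base plus a per-page allowance (seconds)."""
    return base_s + per_page_s * max(pages, 1)


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return True only if it exits with status 0 within timeout_s.

    The child gets its own session, so its process group id equals its pid
    and the whole group (e.g. a hung gs child) can be killed on timeout.
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        print(f"timed out after {timeout_s}s: {cmd[0]}", file=sys.stderr)
        os.killpg(proc.pid, signal.SIGKILL)  # kill the child and its descendants
        proc.wait()  # reap the child so no zombie is left behind
        return False


# Example use in a batch loop (page count obtained elsewhere):
# ok = run_with_timeout(["ocrmypdf", "in.pdf", "out.pdf"], timeout_for(pages=12))
```

As noted elsewhere in the thread, any fixed budget is a guess: a large document on a slow machine can hit the limit even when nothing is wrong, so failed files are probably best logged and retried with a larger allowance rather than discarded.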
Underlying issue fixed in v15 - I probably won't add a timeout on Ghostscript itself because it's difficult to say what a reasonable completion time is. Large documents on slow computers might fail when nothing was wrong. |
I am batch processing PDF files. Some files lead to a never-ending (?, I gave up after 2 hours of CPU time) gs job. I am looking for a counterpart of the option --tesseract-timeout, I guess. (Ubuntu 22.04, with packaged ocrmypdf.)