tesseract: broken pipe ⇒ Falling back on Paperwork's heuristic #392
Comments
This is more a PyOCR bug than a Paperwork one, but OK. I'll have a look, probably this weekend.
Got it: It's a regression in Tesseract 3.04 (not present in 3.03). Since Tesseract 3.x, it's possible to specify "stdin" as input for Tesseract. It's handy since it avoids using temporary files in PyOCR/Paperwork. Example:
Unfortunately, with Tesseract 3.04, when trying to detect the orientation of the image, it tries to open the file 'stdin' ... :
I'll open a ticket on the Tesseract side later. In the meantime, since the 'stdin' argument is not reliable, I'll change PyOCR so that it uses a temporary file ... :/
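For readers wondering what the temporary-file workaround might look like, here is a minimal sketch. It is not PyOCR's actual code: the function names and the `runner` parameter are hypothetical, and the command line (`stdout` as output base, `-psm 0` for orientation detection) is assumed from Tesseract 3.x usage.

```python
import os
import subprocess
import tempfile


def detect_orientation_cmd(image_path):
    # Assumed Tesseract 3.x CLI: input file, output base, then "-psm 0"
    # (orientation and script detection only).
    return ["tesseract", image_path, "stdout", "-psm", "0"]


def run_osd_via_tempfile(image_bytes, suffix=".png", runner=subprocess.run):
    """Write the image to a temporary file and pass its path to Tesseract,
    instead of relying on the 'stdin' pseudo-filename."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # Flush and close the file before Tesseract tries to read it.
        with os.fdopen(fd, "wb") as handle:
            handle.write(image_bytes)
        return runner(
            detect_orientation_cmd(path),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
    finally:
        # Always clean up, even if the call fails.
        os.remove(path)
```

The cost is one extra disk write per image, but Tesseract then receives a real path and never has to interpret 'stdin' as a filename.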
Confirmed: tesseract-ocr/tesseract#85
Fixed in PyOCR. You can run Please reopen this ticket if you still have a problem.
Please forgive the newbie question… What is the proper way to upgrade when the install was done following the instructions there:
I actually don't work with virtualenv usually, but I think so, yes.
I tested. The issue is solved. Thank you.
Orientation detection failed once again, so I decided to ditch the Arch Linux package and install paperwork/stable from git as explained here:
https://github.com/jflesch/paperwork/blob/stable/doc/install.devel.markdown#paperwork-in-a-python-virtualenv
Unfortunately, this did not solve the issue. Here are the relevant logs:
The tesseract packages installed on my system are:
tesseract 3.04.00-1
tesseract-data-deu 3.02.02-5
tesseract-data-eng 3.02.02-5
tesseract-data-fra 3.02.02-5
and I use “fra” in Paperwork.