This repository has been archived by the owner on Dec 18, 2019. It is now read-only.

tesseract: broken pipe ⇒ Falling back on Paperwork's heuristic #392

Closed
tYYGH opened this issue Sep 2, 2015 · 8 comments

Comments

tYYGH commented Sep 2, 2015

Orientation detection failed once again, so I decided to ditch the Arch Linux package and install paperwork/stable from git, as explained here:
https://github.com/jflesch/paperwork/blob/stable/doc/install.devel.markdown#paperwork-in-a-python-virtualenv
Unfortunately, this did not solve the issue. Here are the relevant logs:

INFO   paperwork.frontend.mainwindow.scan Scan started
INFO   paperwork.frontend.util.canvas.drawers Drawer: Target area: (105, 0) ((527, 727)) << (105, 0) ((526, 727))
INFO   paperwork.frontend.mainwindow.scan Scan done
INFO   paperwork.frontend.mainwindow.scan Will use tool 'Tesseract'
INFO   paperwork.frontend.mainwindow.scan Animator: Angle 0: (105, 0) (526, 727) -> (73, 20) (222, 323)
INFO   paperwork.frontend.mainwindow.scan Animator: Angle 90: (105, 0) (526, 727) -> (441, 20) (222, 323)
INFO   paperwork.frontend.mainwindow.scan Animator: Angle 180: (105, 0) (526, 727) -> (73, 384) (222, 323)
INFO   paperwork.frontend.mainwindow.scan Animator: Angle 270: (105, 0) (526, 727) -> (441, 384) (222, 323)
INFO   paperwork.frontend.mainwindow.scan Failed to use OCR tool heuristic for orientation detection: [Errno 32] Broken pipe
INFO   paperwork.frontend.mainwindow.scan Falling back on Paperwork's heuristic

The tesseract packages installed on my system are:
tesseract 3.04.00-1
tesseract-data-deu 3.02.02-5
tesseract-data-eng 3.02.02-5
tesseract-data-fra 3.02.02-5
and I use “fra” in Paperwork.

jflesch added this to the 0.2.5-stable milestone Sep 2, 2015

jflesch commented Sep 2, 2015

This is more of a PyOCR bug than a Paperwork one, but OK. I'll have a look, probably this weekend.

jflesch commented Sep 9, 2015

Got it: It's a regression in Tesseract 3.04 (not present in 3.03).

Since Tesseract 3.x, it's possible to specify "stdin" as input for Tesseract. It's handy since it avoids using temporary files in PyOCR/Paperwork.

Example:

$ tesseract stdin out_file < test.png
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
$ echo $?
0

Unfortunately, with Tesseract 3.04, when asked to detect the orientation of the image (-psm 0), it tries to open a file literally named 'stdin':

$ tesseract -psm 0 stdin out_file < test.png 
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Error in fopenReadStream: file not found
Error in pixRead: image file not found: stdin
Cannot open input file: stdin
$ echo $?
2
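
The two shell sessions above can also be reproduced from Python. The following is only a sketch, not PyOCR's actual code: the helper name `stdin_input_works` and its parameters are hypothetical, and it assumes Python 3.5+ (for `subprocess.run`) and a `tesseract` binary on the PATH. It pipes the image bytes to Tesseract's standard input and reports whether the exit status was 0.

```python
import subprocess


def stdin_input_works(image_bytes, extra_args=(), runner=subprocess.run,
                      tesseract_bin="tesseract"):
    """Return True if Tesseract exits with status 0 when the image is piped on stdin.

    Hypothetical helper for illustration; `runner` is injectable so the logic
    can be exercised without Tesseract installed.
    """
    # Same argument order as the shell sessions: [-psm 0] stdin out_file
    command = [tesseract_bin, *extra_args, "stdin", "out_file"]
    completed = runner(
        command,
        input=image_bytes,          # bytes are written to the child's stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return completed.returncode == 0
```

Calling `stdin_input_works(data)` corresponds to the first session, and `stdin_input_works(data, extra_args=("-psm", "0"))` to the second. Note that a real run writes its output next to the current directory under the `out_file` base name, just as the shell commands do.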

jflesch commented Sep 9, 2015

I'll open a ticket on the Tesseract side later. In the meantime, since the 'stdin' argument is not reliable, I'll change PyOCR so that it uses a temporary file ... :/
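
A minimal sketch of what such a temporary-file workaround could look like is shown here. It is not the actual PyOCR change: the function name `run_osd_with_temp_file`, the file names `input.png` and `out_file`, and the injectable `runner` parameter are all assumptions made for illustration. The idea is simply to write the image to a real path and pass that path to Tesseract instead of 'stdin'.

```python
import os
import subprocess
import tempfile


def run_osd_with_temp_file(image_bytes, runner=subprocess.run,
                           tesseract_bin="tesseract"):
    """Run orientation detection (-psm 0) on a temporary copy of the image.

    Hypothetical helper; returns whatever `runner` returns (a CompletedProcess
    when the default subprocess.run is used).
    """
    # The directory and everything in it is deleted when the block exits,
    # including any output files Tesseract creates there.
    with tempfile.TemporaryDirectory() as workdir:
        image_path = os.path.join(workdir, "input.png")
        with open(image_path, "wb") as handle:
            handle.write(image_bytes)
        out_base = os.path.join(workdir, "out_file")
        # Same argument order as in the thread, with a real path in place of 'stdin'.
        command = [tesseract_bin, "-psm", "0", image_path, out_base]
        return runner(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
```

The trade-off is an extra disk write per scan, which is what the 'stdin' input was meant to avoid, in exchange for not depending on the behaviour that regressed in 3.04.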

jflesch commented Sep 9, 2015

Confirmed: tesseract-ocr/tesseract#85

jflesch commented Sep 10, 2015

Fixed in PyOCR.

You can run sudo pip install --upgrade pyocr; that should solve this problem.

Please reopen this ticket if you still have a problem.

jflesch closed this as completed Sep 10, 2015

tYYGH commented Sep 11, 2015

Please forgive the newbie question… What is the proper way to upgrade when the install was done by following the instructions here:
https://github.com/jflesch/paperwork/blob/stable/doc/install.devel.markdown#paperwork-in-a-python-virtualenv
Should I run the above command after having run source bin/activate, or not?

jflesch commented Sep 11, 2015

I don't usually work with virtualenv, but I think so, yes.
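
One way to avoid guessing which installation pip will touch is to invoke pip through the interpreter that is currently active. The sketch below assumes it is run with the virtualenv's own Python (for example after source bin/activate); the helper names `in_virtualenv` and `pip_upgrade_command` are hypothetical. Note that running pip through sudo may resolve to the system-wide pip instead of the one in the virtualenv, depending on how sudo is configured.

```python
import sys


def in_virtualenv():
    """Best-effort check: True when this interpreter belongs to a virtual environment."""
    # The classic virtualenv tool sets sys.real_prefix; the stdlib venv module
    # makes sys.base_prefix differ from sys.prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


def pip_upgrade_command(package, python=None):
    """Build a pip upgrade command bound to one specific interpreter."""
    # "python -m pip" upgrades the package for that interpreter, whatever
    # other pip executables happen to be on the PATH.
    return [python or sys.executable, "-m", "pip", "install", "--upgrade", package]
```

With the environment activated, `subprocess.check_call(pip_upgrade_command("pyocr"))` would then upgrade PyOCR inside the virtualenv, and `in_virtualenv()` can be printed first as a sanity check.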

tYYGH commented Sep 14, 2015

I tested. The issue is solved. Thank you.
