OCRmyPDF fails to detect text on pages created by Tesseract 3.04 #26

jbarlow83 · 2015-12-04T10:18:11Z

This is due to bug(s) in PyPDF's extractText, which does not find text OCR'ed by Tesseract 3.04.

There are probably other cases.
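
For illustration only, here is a minimal sketch of one way a text check could avoid depending on extractText: look for text-showing operators in a page's content stream instead of trying to decode the text. This is not what OCRmyPDF does. It assumes the content stream has already been decompressed to bytes, and that the OCR layer is still drawn with ordinary text-showing operators even when extraction returns nothing. The function name `has_text_operators` is hypothetical, and the regexes are a rough heuristic, not a PDF parser.

```python
import re

# A text object in a decoded PDF content stream: BT ... ET.
_TEXT_OBJECT = re.compile(rb"\bBT\b(.*?)\bET\b", re.DOTALL)

# A string or array operand followed by a text-showing operator
# (Tj, TJ, ' or "). Heuristic only: no real tokenizing is done.
_SHOW_TEXT = re.compile(rb"(?:\)|\]|>)\s*(?:Tj|TJ|'|\")")


def has_text_operators(content_stream: bytes) -> bool:
    """Return True if any BT...ET block appears to show text.

    Hypothetical helper: expects an already-decompressed content
    stream and does not try to decode the text itself.
    """
    return any(
        _SHOW_TEXT.search(match.group(1))
        for match in _TEXT_OBJECT.finditer(content_stream)
    )


if __name__ == "__main__":
    text_page = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
    image_page = b"q 100 0 0 100 0 0 cm /Im0 Do Q"
    print(has_text_operators(text_page))
    print(has_text_operators(image_page))
```

A check like this can report false positives (for example, operator-like bytes inside a string operand), so it is a fallback signal at best, not a replacement for real text extraction.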

jbarlow83 · 2015-12-04T12:44:39Z

Improved in v3.1, but not all imaginable cases are fixed.

jbarlow83 · 2016-01-04T21:33:39Z

Related issue: tesseract-ocr/tesseract#182

jbarlow83 · 2016-01-04T21:33:58Z

Also related: tesseract-ocr/tesseract#170

jbarlow83 · 2016-02-17T09:25:25Z

This is now fixed in Tesseract.

jbarlow83 added the bug label Dec 4, 2015

jbarlow83 changed the title ~~OCRmyPDF fails to detect text on pages after Tesseract 3.04~~ OCRmyPDF fails to detect text on pages created by Tesseract 3.04 Dec 4, 2015

jbarlow83 closed this as completed Feb 17, 2016
