issue with extract_text #5

rsteca · 2016-08-30T20:58:07Z

When doing:

import doc2text
doc = doc2text.Document()
doc.read('something.pdf')
doc.process()
doc.extract_text()

I get the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-5-57184997370d> in <module>()
----> 1 doc.extract_text()

/usr/local/lib/python2.7/dist-packages/doc2text/__init__.pyc in extract_text(self)
     89             for page in self.processed_pages:
     90                 new = page
---> 91                 text = new.extract_text()
     92                 self.page_content.append(text)
     93         else:

/usr/local/lib/python2.7/dist-packages/doc2text/page.pyc in extract_text(self)
     36     def extract_text(self):
     37         temp_path = 'text_temp.png'
---> 38         cv2.imwrite(temp_path, self.image)
     39         self.text = pytesseract.image_to_string(Image.open(temp_path))
     40         os.remove(temp_path)

AttributeError: Page instance has no attribute 'image'
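
For illustration only, here is a minimal sketch of one way this kind of error can arise, assuming the attribute is assigned in a step that never ran. The `Page` class, its `crop` method and the `original` attribute below are hypothetical stand-ins, not doc2text's actual code, and the thread does not confirm that this is the real cause.

```python
class Page:
    """Hypothetical stand-in, not doc2text's actual Page class."""

    def __init__(self, original):
        self.original = original  # 'image' is deliberately not set here

    def crop(self):
        # In this sketch, only this step creates the attribute
        # that extract_text() needs.
        self.image = self.original

    def extract_text(self):
        # Stand-in for the OCR step; it only reads self.image.
        return "text from {}".format(self.image)


page = Page("scan.png")

# Calling extract_text() before the attribute exists raises AttributeError.
try:
    page.extract_text()
    error = None
except AttributeError as exc:
    error = str(exc)

# Once the step that sets the attribute has run, the call succeeds.
page.crop()
result = page.extract_text()
```

In a sketch like this, assigning a default in `__init__` (for example `self.image = original`) would avoid the error regardless of which steps ran first.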

achikin · 2016-08-30T22:14:13Z

Fixed in #6.

jlsutherland closed this as completed Aug 30, 2016
