TypeError: object of type 'PDFObjRef' has no len() #15

Closed
PonteIneptique opened this issue Jun 23, 2014 · 2 comments

@PonteIneptique

I think this time the problem is in your Python code and not in pdfminer (let's hope?). The file is available here.

Traceback (most recent call last):
  File "lltToJson.py", line 521, in <module>
    main(sys.argv[1:])
  File "lltToJson.py", line 494, in main
    occurences = llt.getFolder()
  File "lltToJson.py", line 227, in getFolder
    occurences[identifier] += self.getFile(join(path,f))
  File "lltToJson.py", line 164, in getFile
    pdf.load()
  File "/usr/local/lib/python2.7/dist-packages/pdfquery/pdfquery.py", line 288, in load
    self.tree = self.get_tree(*_flatten(page_numbers))
  File "/usr/local/lib/python2.7/dist-packages/pdfquery/pdfquery.py", line 365, in get_tree
    root.set(k, smart_unicode_decode(v))
  File "/usr/local/lib/python2.7/dist-packages/pdfquery/pdfquery.py", line 89, in smart_unicode_decode
    detected_encoding = chardet.detect(encoded_string)
  File "/usr/lib/python2.7/dist-packages/chardet/__init__.py", line 24, in detect
    u.feed(aBuf)
  File "/usr/lib/python2.7/dist-packages/chardet/universaldetector.py", line 64, in feed
    aLen = len(aBuf)
TypeError: object of type 'PDFObjRef' has no len()
@jcushman
Owner

Yep, this was a pdfquery bug; thanks for the report. I just pushed v0.2.5, which should fix this.
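The traceback shows the failure mechanism: pdfquery's `smart_unicode_decode` passed an attribute value straight to `chardet.detect`, and chardet calls `len()` on its input, which a pdfminer `PDFObjRef` (an unresolved indirect object reference) does not support. Below is a minimal sketch of the kind of guard that avoids this; it is not the actual v0.2.5 patch. The name `safe_attr_value` and the `resolve` hook are hypothetical, the sketch targets Python 3 rather than the 2.7 in the traceback, and encoding detection is simplified to a UTF-8 attempt with a Latin-1 fallback instead of chardet.

```python
from typing import Any, Callable, Optional


def safe_attr_value(value: Any, resolve: Optional[Callable[[Any], Any]] = None) -> str:
    """Convert a PDF attribute value to text without calling len() on non-strings.

    `resolve` is a hypothetical hook for dereferencing indirect objects first
    (with pdfminer, something like pdfminer.pdftypes.resolve1 could fill this role).
    """
    if resolve is not None:
        value = resolve(value)
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        # Simplified decoding (an assumption, not pdfquery's chardet logic):
        # try UTF-8, then fall back to Latin-1, which accepts any byte sequence.
        try:
            return bytes(value).decode("utf-8")
        except UnicodeDecodeError:
            return bytes(value).decode("latin-1")
    # Numbers, lists, and unresolved references are stringified instead of
    # being handed to an encoding detector that expects a byte string.
    return str(value)
```

For example, `safe_attr_value(b"caf\xc3\xa9")` returns `"café"`, while an object without `__len__` is simply converted with `str()` instead of raising `TypeError`.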

Note: in the file you posted, once this is fixed, there is a separate bug: the text prints as "(cid:5)(cid:152)(cid:150)(cid:150)(cid:152)(cid:141) ..." instead of readable text. This is a pdfminer issue that other people are discussing here: euske/pdfminer#39

If you have the "cid" problem, first try uninstalling pdfminer and then installing it manually as described here: http://www.unixuser.org/~euske/python/pdfminer/#cmap

If that doesn't work, it'll have to be fixed on the PDFMiner end (if at all).
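When processing many files, it can help to flag the ones affected by the "cid" problem automatically. The helper below is a hypothetical sketch (not part of pdfquery or pdfminer) that measures how much of an extracted string consists of `(cid:N)` tokens; the 0.5 threshold is an arbitrary assumption you would tune for your own documents.

```python
import re

# Matches pdfminer's placeholder for glyphs it could not map to Unicode.
CID_TOKEN = re.compile(r"\(cid:\d+\)")


def cid_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that belong to (cid:N) tokens."""
    stripped = re.sub(r"\s+", "", text)
    if not stripped:
        return 0.0
    cid_chars = sum(len(m.group(0)) for m in CID_TOKEN.finditer(stripped))
    return cid_chars / len(stripped)


def has_cid_problem(text: str, threshold: float = 0.5) -> bool:
    """Heuristic check; the default threshold is an assumption, not a standard."""
    return cid_ratio(text) >= threshold
```

A string like the one in the posted file, made almost entirely of `(cid:N)` tokens, is flagged, while ordinary extracted text scores close to zero.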

Thanks,
Jack

@PonteIneptique
Author

Thanks ;) I'm happy my use of your library helped make it handle a wider range of malformed PDFs :D

2 participants