TypeError: object of type 'PDFObjRef' has no len() #15

PonteIneptique · 2014-06-23T09:14:06Z

I think this time the problem is in your Python code (pdfquery) and not in pdfminer (let's hope?). File available here

Traceback (most recent call last):
  File "lltToJson.py", line 521, in <module>
    main(sys.argv[1:])
  File "lltToJson.py", line 494, in main
    occurences = llt.getFolder()
  File "lltToJson.py", line 227, in getFolder
    occurences[identifier] += self.getFile(join(path,f))
  File "lltToJson.py", line 164, in getFile
    pdf.load()
  File "/usr/local/lib/python2.7/dist-packages/pdfquery/pdfquery.py", line 288, in load
    self.tree = self.get_tree(*_flatten(page_numbers))
  File "/usr/local/lib/python2.7/dist-packages/pdfquery/pdfquery.py", line 365, in get_tree
    root.set(k, smart_unicode_decode(v))
  File "/usr/local/lib/python2.7/dist-packages/pdfquery/pdfquery.py", line 89, in smart_unicode_decode
    detected_encoding = chardet.detect(encoded_string)
  File "/usr/lib/python2.7/dist-packages/chardet/__init__.py", line 24, in detect
    u.feed(aBuf)
  File "/usr/lib/python2.7/dist-packages/chardet/universaldetector.py", line 64, in feed
    aLen = len(aBuf)
TypeError: object of type 'PDFObjRef' has no len()

jcushman · 2014-06-30T01:44:57Z

Yep, this was a pdfquery bug -- thanks for the report. I just pushed v. 0.2.5 which should fix this.
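For anyone reading the traceback above: the error comes from a value that is not a string (an unresolved object reference) reaching `chardet.detect`, which immediately calls `len()` on it. The snippet below is only a sketch of the kind of type guard that avoids this; it is not the actual 0.2.5 patch. `FakeObjRef` and `safe_attribute_text` are hypothetical names made up for illustration, and the code is written for Python 3 rather than the Python 2.7 shown in the traceback.

```python
class FakeObjRef(object):
    """Hypothetical stand-in for an indirect PDF object reference.

    Like the object in the traceback, it defines no __len__, so len() on it
    raises TypeError.
    """

    def __init__(self, objid):
        self.objid = objid

    def __repr__(self):
        return "<FakeObjRef:%d>" % self.objid


def safe_attribute_text(value, fallback_encoding="utf-8"):
    """Return text for an attribute value without calling len() on non-strings."""
    if isinstance(value, bytes):
        # Only real byte strings are candidates for decoding or encoding detection.
        return value.decode(fallback_encoding, "replace")
    if isinstance(value, str):
        # Already text: nothing to decode.
        return value
    # Anything else (for example an unresolved reference) is stringified
    # instead of being handed to an encoding detector.
    return repr(value)
```

With a guard like this, `safe_attribute_text(FakeObjRef(5))` returns the object's repr instead of raising, while byte strings are still decoded as before.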

Note: In the file you posted, once this is fixed, there is a separate bug where the text in the file prints like "(cid:5)(cid:152)(cid:150)(cid:150)(cid:152)(cid:141) ..." instead of readable text. This is a pdfminer issue which other people are discussing here: euske/pdfminer#39

If you have the "cid" problem, you should first try uninstalling pdfminer and manually installing as described here: http://www.unixuser.org/~euske/python/pdfminer/#cmap

If that doesn't work, it'll have to be fixed on the PDFMiner end (if at all).
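To check quickly whether a given file is affected by the "cid" problem, one option is to look for `(cid:N)` tokens in the extracted text. This is a rough sketch only: `cid_ratio`, `looks_like_cid_output`, and the 0.5 threshold are assumptions made up for illustration, not part of pdfquery or pdfminer.

```python
import re

# Matches placeholder tokens such as "(cid:152)" that appear when a glyph
# could not be mapped back to a readable character.
CID_TOKEN = re.compile(r"\(cid:\d+\)")


def cid_ratio(text):
    """Return the fraction of non-whitespace characters that belong to (cid:N) tokens."""
    stripped = "".join(text.split())
    if not stripped:
        return 0.0
    cid_chars = sum(len(token) for token in CID_TOKEN.findall(text))
    return cid_chars / len(stripped)


def looks_like_cid_output(text, threshold=0.5):
    """Heuristic: True when (cid:N) tokens make up at least `threshold` of the text."""
    return cid_ratio(text) >= threshold
```

Running `looks_like_cid_output` on the text extracted from a page gives a cheap signal for whether the manual pdfminer install is worth trying for that file.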

Thanks,
Jack

PonteIneptique · 2014-06-30T13:28:24Z

Thanks ;) I am happy that my use of your Python library helped you make sure it handles a wider range of malformed PDFs :D

jcushman closed this as completed Jun 30, 2014

jezhiggins mentioned this issue Jul 3, 2014

TypeError: object of type 'PSLiteral' has no len() #17

Closed
