This repository has been archived by the owner on May 14, 2019. It is now read-only.

TypeError: decode() argument 1 must be string, not None #1

Closed
dvogel opened this issue Aug 28, 2012 · 1 comment

dvogel commented Aug 28, 2012

Traceback (most recent call last):
  File "/projects/pressley/src/pressley/pressley/releases/management/commands/scrape_releases.py", line 73, in handle
    self.scrape_releases(source)
  File "/projects/pressley/src/pressley/pressley/releases/management/commands/scrape_releases.py", line 47, in scrape_releases
    scrape_release(source, feed, entry, link)
  File "/projects/pressley/src/pressley/pressley/releases/scrape.py", line 26, in scrape_release
    body = get_link_content(link)
  File "/projects/pressley/src/pressley/pressley/releases/scrape.py", line 14, in get_link_content
    (title, body) = readability_extract(response.content)
  File "/projects/pressley/src/pressley/pressley/util.py", line 25, in readability_extract
    title_text = unicode(lxml.html.fromstring(doc.short_title()).text_content())
  File "/projects/pressley/virt/local/lib/python2.7/site-packages/readability/readability.py", line 124, in short_title
    return shorten_title(self._html(True))
  File "/projects/pressley/virt/local/lib/python2.7/site-packages/readability/readability.py", line 104, in _html
    self.html = self._parse(self.input)
  File "/projects/pressley/virt/local/lib/python2.7/site-packages/readability/readability.py", line 108, in _parse
    doc = build_doc(input)
  File "/projects/pressley/virt/local/lib/python2.7/site-packages/readability/htmls.py", line 17, in build_doc
    page_unicode = page.decode(enc, 'replace')
TypeError: decode() argument 1 must be string, not None

dvogel commented Aug 28, 2012

This is caused by links that point to a PDF rather than an HTML page, e.g. http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9MTQ2MDYxfENoaWxkSUQ9LTF8VHlwZT0z&t=1. We need to check the response's Content-Type header before passing the content to readability_extract.
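
A minimal sketch of that check, not the fix that was actually committed. It assumes the response object exposes `headers` and `content` the way the `response.content` call in the traceback suggests; `HTML_MEDIA_TYPES`, `is_html_response`, and `extract_if_html` are hypothetical names introduced here for illustration.

```python
# Hypothetical guard: only hand HTML responses to the extractor.
# The set of accepted media types is an assumption, not taken from the project.
HTML_MEDIA_TYPES = ("text/html", "application/xhtml+xml")


def is_html_response(headers):
    """Return True if the Content-Type header names an HTML media type."""
    # Look up both spellings so a plain dict works as well as a
    # case-insensitive header mapping.
    content_type = headers.get("Content-Type") or headers.get("content-type") or ""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in HTML_MEDIA_TYPES


def extract_if_html(response, extract):
    """Call extract(response.content) for HTML responses; otherwise return None."""
    if not is_html_response(response.headers):
        return None
    return extract(response.content)
```

In `get_link_content`, something like `extract_if_html(response, readability_extract)` would then skip PDF links instead of raising; the caller has to handle the `None` result.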

dvogel closed this as completed Aug 28, 2012