Don't throw exception when HTML title is empty #1

yaph · 2013-01-24T23:02:14Z

Extracting content from an HTML file with an empty title raises an AttributeError (see the traceback below). To avoid this, I added a call to filter that removes possible None values from data_values in the cleanup function. I also added a test.

Traceback (most recent call last):
  File "tests/tests.py", line 123, in test_empty_title
    extracted = self.extractor.extract(EMPTY_TITLE_HTML)
  File "/home/ramiro/repos/pub/bookmark-tools/local/lib/python2.7/site-packages/extraction/__init__.py", line 248, in extract
    return self.extracted_class(**self.cleanup(extracted, html, source_url=source_url))
  File "/home/ramiro/repos/pub/bookmark-tools/local/lib/python2.7/site-packages/extraction/__init__.py", line 211, in cleanup
    data_values = [self.cleanup_text(x) for x in data_values]
  File "/home/ramiro/repos/pub/bookmark-tools/local/lib/python2.7/site-packages/extraction/__init__.py", line 183, in cleanup_text
    return " ".join(value.split())
AttributeError: 'NoneType' object has no attribute 'split'
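
For illustration, here is a minimal sketch of the idea, not the actual patch. cleanup_text mirrors the line shown in the traceback; cleanup_values is a hypothetical stand-in for the relevant part of cleanup, and the real method's signature and surrounding logic are not reproduced here.

```python
def cleanup_text(value):
    """Collapse runs of whitespace into single spaces (as in the traceback)."""
    return " ".join(value.split())


def cleanup_values(data_values):
    """Hypothetical helper: drop None entries, then normalize the rest.

    Assumption: an empty <title> yields None among the extracted values,
    which is what triggers the AttributeError in cleanup_text.
    """
    # Remove None before calling cleanup_text, since None has no .split().
    present = [x for x in data_values if x is not None]
    return [cleanup_text(x) for x in present]


if __name__ == "__main__":
    print(cleanup_values(["  Hello   world ", None, "a\tb"]))
```

Note that filter(None, data_values) would also drop empty strings, because it removes every falsy item, whereas the comprehension above drops only None.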

…sure 1st h1 (if exists) is used as title

lethain · 2013-01-25T03:44:30Z

Awesome! Much appreciated, merging it in.

Don't through exception when HTML title is empty

Merge pull request #1 from lethain/svven

Don't through exception when HTML title is empty, added test to make …

87bbaf1

…sure 1st h1 (if exists) is used as title

lethain added a commit that referenced this pull request Jan 25, 2013

Merge pull request #1 from yaph/develop

6bdedc0

Don't through exception when HTML title is empty

lethain merged commit 6bdedc0 into lethain:master Jan 25, 2013

lethain pushed a commit that referenced this pull request Jun 6, 2014

Merge pull request #1 from lethain/svven

a34cd2b

