ZeroDivisionError when training with zero-length data #49

haywhisksoftware opened this issue Feb 4, 2014 · 4 comments


(Minor bug.)
I installed scrapely from pip this morning.

This is a wacky edge case, but I think you could raise a more constructive error.

(Who wants to extract a zero-length string from a document? It's a bit like a magician pulling some atmosphere out of a hat: it's always going to be there...)

Check it out:

In [97]: from scrapely import Scraper

In [98]: s = Scraper()

In [99]: s.train('', {'image': u''})
- - - - - - - - - - - - - - - - -
ZeroDivisionError                         Traceback (most recent call last)
/home/username/myfolder/<ipython-input-99-233d0ac90e7f> in <module>()
----> 1 s.train('', {'image': u''})

/usr/local/lib/python2.7/dist-packages/scrapely/__init__.pyc in train(self, url, data, encoding)
     44     def train(self, url, data, encoding=None):
     45         page = url_to_page(url, encoding)
---> 46         self.train_from_htmlpage(page, data)
     48     def scrape(self, url, encoding=None):

/usr/local/lib/python2.7/dist-packages/scrapely/__init__.pyc in train_from_htmlpage(self, htmlpage, data)
     39                 if isinstance(value, str):
     40                     value = value.decode(htmlpage.encoding or 'utf-8')
---> 41                 tm.annotate(field, best_match(value))
     42         self.add_template(tm.get_template())

/usr/local/lib/python2.7/dist-packages/scrapely/template.pyc in annotate(self, field, score_func, best_match)
     32         """
---> 33         indexes =
     34         if not indexes:
     35             raise FragmentNotFound("Fragment not found annotating %r using: %s" % 

/usr/local/lib/python2.7/dist-packages/scrapely/template.pyc in select(self, score_func)
     46         matches = []
     47         for i, fragment in enumerate(htmlpage.parsed_body):
---> 48             score = score_func(fragment, htmlpage)
     49             if score:
     50                 matches.append((score, i))

/usr/local/lib/python2.7/dist-packages/scrapely/template.pyc in func(fragment, page)
     95         fdata = page.fragment_data(fragment).strip()
     96         if text in fdata:
---> 97             return float(len(text)) / len(fdata) - (1e-6 * fragment.start)
     98         else:
     99             return 0.0

ZeroDivisionError: float division by zero
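The last frame can be reproduced without scrapely. Below is a minimal sketch, assuming only what the traceback shows: `text` is the expected value and `fdata` is the stripped fragment text. The `score` helper is a hypothetical stand-in, not part of scrapely, and it leaves out the position penalty.

```python
def score(text, fdata):
    # Mirrors the ratio from the last traceback frame, without the
    # (1e-6 * fragment.start) penalty. Hypothetical helper, not scrapely API.
    if text in fdata:
        return float(len(text)) / len(fdata)
    return 0.0


# The empty string is a substring of every string, including itself,
# so the membership test passes even when fdata is empty.
assert "" in ""

try:
    score("", "")
    raised = False
except ZeroDivisionError:
    raised = True
```

A non-empty `text` can never reach the division with an empty `fdata`, because the membership test fails first.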
pablohoffman changed the title from "Train with zero-length expected text -> ZeroDivisionError" to "ZeroDivisionError when training with zero-length data" on Apr 25, 2014
This is the reason for the error.

return float(len(text)) / len(fdata) - (1e-6 * fragment.start)

Since the returned score is inversely proportional to the length of fdata, could we just guard against the empty case like this?

def best_match(text):
    def func(fragment, page):
        fdata = page.fragment_data(fragment).strip()
        if text in fdata:
            # An empty fdata can only match an empty text; avoid dividing by zero.
            if not len(fdata):
                return float("inf")
            return float(len(text)) / len(fdata) - (1e-6 * fragment.start)
        return 0.0
    return func
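The guard proposed above can be exercised in isolation. This is a sketch under stated assumptions: `Fragment` and `FakePage` are hypothetical stand-ins that model only the attributes the scoring function touches (`start` and `fragment_data`), not scrapely's real classes.

```python
from collections import namedtuple

# Hypothetical stand-ins for scrapely's fragment and page objects.
Fragment = namedtuple("Fragment", ["start", "text"])


class FakePage:
    def fragment_data(self, fragment):
        return fragment.text


def best_match(text):
    def func(fragment, page):
        fdata = page.fragment_data(fragment).strip()
        if text in fdata:
            if not fdata:
                # Only reachable when text is also empty.
                return float("inf")
            return float(len(text)) / len(fdata) - (1e-6 * fragment.start)
        return 0.0
    return func


page = FakePage()
empty_score = best_match("")(Fragment(0, "   "), page)    # whitespace-only fragment
normal_score = best_match("ab")(Fragment(0, "abcd"), page)
```

Returning `float("inf")` makes an empty fragment the top-ranked match for an empty value. Whether that is the desired behaviour is a design choice; rejecting empty training values up front with a descriptive error would be an alternative, closer to what the original report asked for.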

This isn't a wacky edge-case at all.

I got the same error using actual data and had to patch it.

Same here, I reproduced this error using regular, non-empty data.

marekyggdrasil added a commit to marekyggdrasil/scrapely that referenced this issue Nov 27, 2019
ruairif added a commit that referenced this issue Nov 28, 2019
patch for issue #49 and fixed Travis tests
The patch has been merged, so I believe this issue can be closed?


4 participants