Specifying integer values in the data dict #16

@buzypi

Amazing work! This is really useful.

I ran into a minor issue with the way you provide data. The documentation does not say you can't provide integer values, so I ended up providing this data:

In [1]: from scrapely import Scraper

In [2]: s = Scraper()

In [3]: data = {'name': 'scrapy/scrapely', 'url': 'https://github.com/scrapy/scrapely', 'description': 'A pure-python HTML screen-scraping library', 'watchers': 42, 'forks': 9}

In [4]: url = "https://github.com/scrapy/scrapely"

and ran into this exception:

In [5]: s.train(url, data)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

...

/home/ubuntu/scrapely/scrapely/template.py in func(fragment, page)
     93     def func(fragment, page):
     94         fdata = page.fragment_data(fragment).strip()
---> 95         if text in fdata:
     96             return float(len(text)) / len(fdata) - (1e-6 * fragment.start)
     97         else:

TypeError: 'in <string>' requires string as left operand

It took me a while to realize that the problem was the integer values in the data dict.
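The same error can be reproduced without scrapely. This is only a minimal sketch: the `fdata` string below is made up for illustration and is not taken from the actual page.

```python
# Hypothetical fragment text, standing in for page.fragment_data(fragment).strip()
fdata = "42 watchers"

error = None
try:
    # An int on the left of `in` with a string on the right raises TypeError
    42 in fdata
except TypeError as exc:
    error = str(exc)

# Converting the value to a string first makes the membership test work
found = str(42) in fdata
```

Here `error` holds the "requires string as left operand" message and `found` is true, which matches what happens on line 95 of template.py when `text` is an integer.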

So you could either convert the value to a unicode string before comparing:

if unicode(text) in fdata:
    return float(len(unicode(text))) / len(fdata) - (1e-6 * fragment.start)

or specify in the documentation that values should all be strings.
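In the meantime, a workaround on the caller's side is to convert the values before training. The sketch below assumes a hypothetical helper, `stringify_values`, which is not part of scrapely; it leaves existing strings alone and converts everything else to text.

```python
try:
    string_types = (basestring,)  # Python 2, as in the traceback above
    to_text = unicode
except NameError:
    string_types = (str,)         # Python 3
    to_text = str


def stringify_values(data):
    """Return a copy of `data` with every non-string value converted to text.

    Hypothetical helper for illustration; not part of the scrapely API.
    """
    return {
        key: value if isinstance(value, string_types) else to_text(value)
        for key, value in data.items()
    }


data = {'name': 'scrapy/scrapely', 'watchers': 42, 'forks': 9}
clean = stringify_values(data)
# `clean` can then be passed in place of `data`: s.train(url, clean)
```

The original dict is left untouched, so the integer values are still available for anything else that needs them.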
