Add python 3 support #67

ruairif · 2014-11-10T13:20:59Z

No description provided.

pablohoffman · 2014-11-30T14:58:26Z

Thanks for contributing this, @ruairif. Can someone review and merge this? /cc @kalessin @kmike @dangra

kmike · 2015-03-03T07:57:46Z

scrapely/__init__.py

-                if isinstance(value, str):
-                    value = value.decode(htmlpage.encoding or 'utf-8')
+                if isinstance(value, six.string_types):
+                    value = str_to_unicode(value, htmlpage.encoding)


I think we can avoid the indirection and make the code more straightforward by replacing 'str' with 'bytes':

if isinstance(value, bytes):
    value = value.decode(htmlpage.encoding or 'utf8')
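A minimal, runnable sketch of the call-site pattern suggested above, assuming Python 3. `PageStub` and `to_text` are hypothetical names used only for illustration; the stub stands in for scrapely's page object and carries nothing but an `encoding` attribute.

```python
from collections import namedtuple

# Hypothetical stand-in for the page object; only .encoding is used here.
PageStub = namedtuple('PageStub', 'encoding')


def to_text(value, htmlpage):
    # bytes are decoded with the page encoding, or UTF-8 when it is unknown;
    # str values are already text and pass through unchanged.
    if isinstance(value, bytes):
        value = value.decode(htmlpage.encoding or 'utf8')
    return value


print(to_text(b'caf\xc3\xa9', PageStub(None)))     # UTF-8 fallback
print(to_text(b'caf\xe9', PageStub('latin-1')))    # page encoding wins
```

With `.decode()` spelled out at the call site, the fallback to UTF-8 is visible without reading a helper.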

kmike · 2015-03-03T13:52:28Z

scrapely/__init__.py

-                if isinstance(value, str):
-                    value = value.decode(htmlpage.encoding or 'utf-8')
+                if isinstance(value, bytes):
+                    value = str_to_unicode(value, htmlpage.encoding)


Actually, my main issue was the str_to_unicode function :) There are many ways to write such a function; with .decode() it is clear what's going on. But it could be just a style preference.

What about getting rid of the check and just using value = str_to_unicode(value, htmlpage.encoding)?
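To drop the check at the call site, the helper itself has to accept both types. The sketch below is a hypothetical Python 3 implementation of a function with that name; it is not the code from this PR, and the `errors` parameter is an assumption added for illustration.

```python
def str_to_unicode(text, encoding=None, errors='strict'):
    """Hypothetical sketch: return text as str, decoding bytes if needed.

    Because str input passes through unchanged, callers can call this
    unconditionally instead of doing their own isinstance() check.
    """
    if isinstance(text, str):
        return text
    if isinstance(text, bytes):
        # Fall back to UTF-8 when the page encoding is unknown.
        return text.decode(encoding or 'utf-8', errors)
    raise TypeError('expected bytes or str, got %s' % type(text).__name__)


value = str_to_unicode(b'caf\xc3\xa9', None)
```

Raising `TypeError` for anything else keeps unexpected values from being silently passed through.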

sounds good

As for the __iter__ check, scrapy has something similar:
https://github.com/scrapy/scrapy/blob/master/scrapy/utils/misc.py#L17

Looks like it needs to be like this though:
if (isinstance(values, bytes) or isinstance(values, str) or not hasattr(values, '__iter__'))

Can you think of a better way of writing this? (The not hasattr(values, '__iter__') check deals with a unicode string in Python 2.)

kmike · 2015-03-03T15:20:53Z

It looks good 👍
I think it can be merged after the following:

- make sure the __iter__ check is OK;
- travis.yml should be fixed to run tests in Python 3.3 and 3.4;
- I think it is better to add Python 3.3 to tox.

ruairif · 2015-03-03T17:06:42Z

Unicode in Python 2 has no __iter__, so if it's written as

if (isinstance(values, (bytes, str)) or
    not hasattr(values, '__iter__')):

The first check catches str in Python 2 (where bytes is an alias for str), and both bytes and str (unicode) in Python 3.
The second check catches unicode and all other non-iterables in Python 2, and all non-iterables in Python 3.
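The condition above can be wrapped in a small helper that turns a single value into a one-element list while leaving real iterables alone. `iterate_values` is a hypothetical name; the sketch runs on Python 3, where it matters most because str defines __iter__ and a bare string would otherwise be split into characters.

```python
def iterate_values(values):
    """Hypothetical helper: wrap scalars and strings in a list.

    Strings and bytes are iterable in Python 3, so they are excluded
    explicitly; anything else without __iter__ is a scalar.
    """
    if (isinstance(values, (bytes, str)) or
            not hasattr(values, '__iter__')):
        return [values]
    return values


for v in iterate_values('price'):
    print(v)  # one item, not five characters
```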

kmike · 2015-03-03T17:18:33Z

@ruairif you're right, thanks for the explanation

ruairif · 2015-03-03T17:34:37Z

The previous configuration ran the tests only on Python 2.7. What's the best way to set up Travis for Python 3?

kmike · 2015-03-03T17:39:06Z

It doesn't matter which Python version we use to start tox, leaving it equal to 2.7 is fine.

kmike · 2015-03-03T18:55:07Z

scrapely/extraction/regionextract.py

-                < labelled_element(first_region).end_index:
+        while (following_regions and
+               _int_cmp(labelled_element(following_regions[0]).start_index,
+                        labelled_element(first_region).end_index)):


I'm too picky today :) On its own, '_int_cmp' reads as 'equals?', not as 'less than'. What about removing the default value for the 'op' argument of the _int_cmp function? Or maybe you have other suggestions for making the code more readable?

kmike · 2015-03-03T19:54:54Z

Thanks @ruairif!

Add python 3 support


58f0886

kmike reviewed Mar 3, 2015

Replace six.string_type with bytes

834cf7e

kmike reviewed Mar 3, 2015

Update tox and travis for Py3. Fix string and iterable check

afcfdc1

Ruairi Fahy added 2 commits March 3, 2015 17:19

Style change for instance check

04be2b7

Update python versions for travil

c24087b

Remove unnecessary python versions

5481633

kmike reviewed Mar 3, 2015

Remove ambiguity in _int_cmp function

ffb4aae

kmike added a commit that referenced this pull request Mar 3, 2015

Merge pull request #67 from ruairif/master

0c4d10a

Add python 3 support

kmike merged commit 0c4d10a into scrapy:master Mar 3, 2015

kmike mentioned this pull request Mar 3, 2015

Python 3 support #51

Closed
