Add python 3 support #67

Merged
merged 7 commits into from
Mar 3, 2015

Conversation

@ruairif
Collaborator

ruairif commented Nov 10, 2014

No description provided.

@pablohoffman
Member

thanks for contributing this @ruairif. can someone review & merge this? /cc @kalessin @kmike @dangra

- if isinstance(value, str):
-     value = value.decode(htmlpage.encoding or 'utf-8')
+ if isinstance(value, six.string_types):
+     value = str_to_unicode(value, htmlpage.encoding)
Member

I think we can avoid the indirection and make code more straightforward by replacing 'str' with 'bytes':

if isinstance(value, bytes):
    value = value.decode(htmlpage.encoding or 'utf8')

- if isinstance(value, str):
-     value = value.decode(htmlpage.encoding or 'utf-8')
+ if isinstance(value, bytes):
+     value = str_to_unicode(value, htmlpage.encoding)
Member

Actually my main issue was str_to_unicode function :) There are many ways to write such function; with .decode() it is clear what's going on. But it could be just a style preference.

Collaborator Author

What about getting rid of the check and just using value = str_to_unicode(value, htmlpage.encoding)?

Member

sounds good
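
A minimal sketch of what dropping the check at the call site could look like, assuming the conversion helper itself passes text through unchanged and decodes only bytes. `to_text` is a hypothetical stand-in written for illustration; it is not the actual `str_to_unicode` implementation, and the UTF-8 fallback is taken from the `.decode()` variant suggested above.

```python
def to_text(value, encoding=None):
    """Hypothetical stand-in for str_to_unicode (not the real helper).

    Bytes are decoded with the given encoding, falling back to UTF-8;
    anything else is returned unchanged.
    """
    if isinstance(value, bytes):
        # On Python 2, `bytes` is an alias of `str`, so this branch
        # covers byte strings on both major versions.
        return value.decode(encoding or 'utf-8')
    return value


# The caller no longer needs its own isinstance check.
value = to_text(b'caf\xc3\xa9', None)
```

With a helper shaped like this, the type check lives in one place instead of being repeated wherever a value is converted.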

Collaborator Author

As for the __iter__ check scrapy has something similar:
https://github.com/scrapy/scrapy/blob/master/scrapy/utils/misc.py#L17

Looks like it needs to be like this though:
if (isinstance(values, bytes) or isinstance(values, str) or not hasattr(values, '__iter__')):

Can you think of a better way of writing this? (The not hasattr(values, '__iter__') check deals with a unicode string in Python 2.)

@kmike
Member

kmike commented Mar 3, 2015

It looks good 👍
I think it can be merged after the following:

  1. make sure __iter__ check is OK;
  2. travis.yml should be fixed to run tests in Python 3.3 and 3.4;
  3. I think it is better to add Python 3.3 to tox.

@ruairif
Collaborator Author

ruairif commented Mar 3, 2015

Unicode in Python 2 has no '__iter__', so if it's written as

if (isinstance(values, (bytes, str)) or
    not hasattr(values, '__iter__')):

The first check catches str in Python 2, and both bytes and str (unicode) in Python 3.
The second check catches unicode and all non-iterables in Python 2, and all non-iterables in Python 3.
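
The two checks can be sketched as a small standalone function. `as_iterable` is a hypothetical name chosen for illustration, and wrapping the value in a list is an assumption about how the result is used; the condition itself is the one quoted above.

```python
def as_iterable(values):
    """Wrap strings and non-iterables in a list; pass other iterables through.

    `as_iterable` is a hypothetical name used only for this sketch.
    """
    if (isinstance(values, (bytes, str)) or
            not hasattr(values, '__iter__')):
        # Python 2: str is caught by isinstance, unicode by the missing __iter__.
        # Python 3: bytes and str are both caught by isinstance.
        return [values]
    return values


single = as_iterable('abc')       # a string is treated as one value
many = as_iterable(['a', 'b'])    # a list is passed through unchanged
```

Without the isinstance check, a Python 3 str would count as iterable and be processed character by character.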

@kmike
Member

kmike commented Mar 3, 2015

@ruairif you're right, thanks for the explanation

@ruairif
Collaborator Author

ruairif commented Mar 3, 2015

With the previous setup, Travis was using only Python 2.7. What's the best way to set up Travis for this?

@kmike
Member

kmike commented Mar 3, 2015

It doesn't matter which Python version we use to start tox, leaving it equal to 2.7 is fine.

- < labelled_element(first_region).end_index:
+ while (following_regions and
+        _int_cmp(labelled_element(following_regions[0]).start_index,
+                 labelled_element(first_region).end_index)):
Member

I'm too picky today :) A bare '_int_cmp' call reads as 'equals?', not as 'less than'. What about removing the default value for the 'op' argument of the _int_cmp function? Or maybe you have other suggestions about how to make the code more readable?
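
One way the suggestion could look, as a rough sketch rather than the code that was merged: the operator becomes a required argument placed between the operands, so the call site spells out the comparison. `int_cmp` and its signature are hypothetical. Treating `None` as smaller than any number is an assumption added here, on the guess that the helper exists because Python 3 raises `TypeError` when `None` is ordered against an int.

```python
import operator


def int_cmp(a, op, b):
    """Compare a and b using the operator named by `op` ('lt', 'le', 'eq', ...).

    Hypothetical sketch: None is treated as negative infinity (an assumption),
    since Python 3 refuses to order None against numbers.
    """
    compare = getattr(operator, op)
    a = float('-inf') if a is None else a
    b = float('-inf') if b is None else b
    return compare(a, b)


# The comparison is visible at the call site.
overlaps = int_cmp(3, 'lt', 7)
```

Requiring `op` trades a slightly longer call for one that does not need the reader to look up a default.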

@kmike
Member

kmike commented Mar 3, 2015

Thanks @ruairif!

kmike added a commit that referenced this pull request Mar 3, 2015
@kmike kmike merged commit 0c4d10a into scrapy:master Mar 3, 2015
@kmike kmike mentioned this pull request Mar 3, 2015