Add python 3 support #67
-    if isinstance(value, str):
-        value = value.decode(htmlpage.encoding or 'utf-8')
+    if isinstance(value, six.string_types):
+        value = str_to_unicode(value, htmlpage.encoding)
I think we can avoid the indirection and make code more straightforward by replacing 'str' with 'bytes':
    if isinstance(value, bytes):
        value = value.decode(htmlpage.encoding or 'utf8')
-    if isinstance(value, str):
-        value = value.decode(htmlpage.encoding or 'utf-8')
+    if isinstance(value, bytes):
+        value = str_to_unicode(value, htmlpage.encoding)
Actually, my main issue was the str_to_unicode function :) There are many ways to write such a function; with .decode() it is clear what's going on. But it could be just a style preference.
What about getting rid of the check and just using value = str_to_unicode(value, htmlpage.encoding)?
sounds good
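For illustration, here is a minimal sketch of the explicit-decode style discussed above. The helper name `to_text` is hypothetical, and this is not the project's `str_to_unicode`; it only shows the `bytes` check with a UTF-8 fallback, assuming text values should pass through unchanged.

```python
def to_text(value, encoding=None):
    """Return `value` as text (hypothetical helper, not part of the PR).

    Bytes are decoded with `encoding`, falling back to UTF-8 when no
    encoding is given. Anything else is returned unchanged.
    """
    if isinstance(value, bytes):
        # On Python 2 `bytes` is an alias of `str`, so the same check
        # covers byte strings on both major versions.
        return value.decode(encoding or 'utf-8')
    return value
```

Called as `to_text(value, htmlpage.encoding)`, this keeps the decoding step visible at the call site, which is the readability point raised in the review.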
As for the __iter__ check, scrapy has something similar:
https://github.com/scrapy/scrapy/blob/master/scrapy/utils/misc.py#L17
Looks like it needs to be like this though:
    if (isinstance(values, bytes) or isinstance(values, str) or not hasattr(values, '__iter__'))
Can you think of a better way of writing this? (The not hasattr(values, '__iter__') check deals with a unicode string in Python 2.)
It looks good 👍
Unicode in python2 has no '__iter__' so if it's written as
The first check catches str in python2 and bytes, str/unicode in python3
@ruairif you're right, thanks for the explanation
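The check discussed in this thread could be wrapped up roughly as follows. This is a sketch under assumptions: the function name `as_iterable` is hypothetical, and the exact behaviour wanted by the PR (for example, how `None` should be handled) is not stated in the conversation.

```python
def as_iterable(values):
    """Wrap a single value in a list; return real iterables unchanged.

    Hypothetical helper illustrating the check from the review thread.
    """
    # Strings and byte strings are iterable on Python 3, so they must be
    # caught explicitly or they would be iterated character by character.
    # On Python 2, unicode has no __iter__, so the hasattr test catches it.
    if isinstance(values, (bytes, str)) or not hasattr(values, '__iter__'):
        return [values]
    return values
```

Passing a tuple of types to a single `isinstance` call is equivalent to chaining the two `isinstance` checks with `or`, and reads a little shorter.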
With the previous way it was using only python 2.7. What's the best way to set up travis for it?
It doesn't matter which Python version we use to start tox, leaving it equal to 2.7 is fine.
-        < labelled_element(first_region).end_index:
+    while (following_regions and
+           _int_cmp(labelled_element(following_regions[0]).start_index,
+                    labelled_element(first_region).end_index)):
I'm too picky today :) Just '_int_cmp' reads as 'equals?', not as 'less than'. What about removing the default value for 'op' argument of _int_cmp function? Or maybe you have other suggestions about how to make the code more readable?
Thanks @ruairif!
No description provided.