Unicode support for urlparse 4 #3

malloxpb · 2018-06-07T21:48:38Z

No description provided.

malloxpb · 2018-06-07T21:49:21Z

@lopuhin, this PR should be clearer than the one in the original repository 😄

lopuhin

@nctl144 I think this is a good start, but it would also be great to add tests that check unicode support. It would also be great to first add Travis CI and tox support to the repo before proceeding with more changes (but that's a topic for another PR), since it greatly simplifies testing and reviewing.
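A minimal sketch of such a unicode test, assuming a unittest-based suite on Python 3. `split_under_test` is a hypothetical placeholder for whichever urlparse4 function is being tested; the stdlib `urllib.parse.urlsplit` stands in for it here only so the sketch runs on its own.

```python
import unittest
from urllib.parse import urlsplit

# Hypothetical stand-in: replace with the urlparse4 function under test.
# The stdlib urlsplit is used only so this sketch is self-contained.
split_under_test = urlsplit


class UnicodeSupportTest(unittest.TestCase):
    url = u"http://пример.рф/путь?запрос=значение#фрагмент"

    def test_components_are_text(self):
        # Every component should come back as text (str on Python 3).
        for component in split_under_test(self.url):
            self.assertIsInstance(component, str)

    def test_non_ascii_preserved(self):
        # Non-ASCII characters must survive parsing unchanged.
        result = split_under_test(self.url)
        self.assertEqual(result.netloc, u"пример.рф")
        self.assertEqual(result.path, u"/путь")
        self.assertEqual(result.query, u"запрос=значение")
        self.assertEqual(result.fragment, u"фрагмент")

    def test_matches_stdlib(self):
        # Use the stdlib as a reference implementation for comparison.
        urls = [self.url, u"https://example.com/caf\u00e9?q=\u00fcber"]
        for url in urls:
            self.assertEqual(tuple(split_under_test(url)), tuple(urlsplit(url)))
```

Run it with `python -m unittest`; once tox and Travis CI are set up, the same tests can run on every supported Python version.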

lopuhin · 2018-06-08T07:40:50Z

urlparse4/cgurl.pyx

-            slice_component(url, parsed.query),
-            slice_component(url, parsed.ref)
-        ))
+        if six.PY2:


Two points about this bit of code:

1. It would be nice to avoid duplicating code between Python 2 and 3.

2. I'm not sure the unicode() function is right here. From the docs: "Create a new Unicode object from the given encoded string. Encoding defaults to the current default string encoding." I don't think we should rely on the default encoding. Also, if we need to do encoding/decoding, it's more efficient to do it in Cython.
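One way to address both points, sketched in pure Python for readability: slice the UTF-8 bytes using the parser's byte offsets and decode with an explicit codec, so a single code path serves Python 2 and 3 and never consults the default encoding. `Component` and this signature are hypothetical (a variant of the `slice_component` helper in the diff, modeled loosely on Chromium's url::Component, with a length of -1 meaning "absent"); in the .pyx file the same slice-and-decode could presumably be done on the C buffer directly.

```python
from collections import namedtuple

# Hypothetical component type: a byte offset and length into the
# UTF-8 encoded URL; len == -1 means the component is absent.
Component = namedtuple("Component", ["begin", "len"])


def slice_component(url_bytes, comp):
    """Return one URL component as text, decoded with an explicit codec.

    The same code works on Python 2 and 3: slicing bytes and calling
    .decode("utf-8") never depends on the interpreter's default encoding.
    """
    if comp.len < 0:
        return u""
    return url_bytes[comp.begin:comp.begin + comp.len].decode("utf-8")


# Usage: offsets are byte positions in the encoded URL.
url_bytes = u"http://example.com/caf\u00e9?q=1".encode("utf-8")
path = slice_component(url_bytes, Component(18, 6))   # u"/café"
query = slice_component(url_bytes, Component(25, 3))  # u"q=1"
```

Because URL delimiters such as `/`, `?` and `#` are ASCII, and every byte of a multi-byte UTF-8 sequence is 0x80 or above, slicing valid UTF-8 at component boundaries never splits a character.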

Regarding whether to return unicode: the stdlib urlparse functions can return unicode, so it seems saner to me to always return unicode, even on Python 2, and that may also be easier to support. This might require some extra effort for the Scrapy integration, though; see e.g. scrapy/scrapy#1949 (comment)
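A sketch of the "always return unicode" policy, assuming Python 3 and using the stdlib `urlsplit` as a stand-in for the Cython parser; `to_text` and `urlsplit_text` are hypothetical names. On Python 3 the stdlib returns the same type it was given (bytes in, bytes out), whereas this wrapper returns text for either input type.

```python
from urllib.parse import urlsplit


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: coerce bytes to text with an explicit codec."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def urlsplit_text(url, encoding="utf-8"):
    """Split a URL and always return text components.

    Hypothetical wrapper illustrating the policy; the stdlib urlsplit
    stands in for the real parser.
    """
    return urlsplit(to_text(url, encoding))


# Bytes and text input both yield text components.
result = urlsplit_text(b"http://example.com/caf\xc3\xa9?q=1")
```

Decoding once at the API boundary keeps the rest of the code text-only; callers that still need bytes can encode the results explicitly.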

malloxpb · 2018-06-08T15:52:52Z

You're right! I will set up tox and Travis CI in the next PRs :D

… unicodesp

malloxpb added 5 commits June 7, 2018 14:51

testing unicode sp

74d7493

rebuild cython

1c389d9

indentation

f4e21ad

convert result to unicode on python2

fa7ba59

compile cython on py3

7509bf4

lopuhin reviewed Jun 8, 2018


malloxpb added 8 commits June 8, 2018 10:54

testing unicode sp

92dfbcc

rebuild cython

1819f78

indentation

7fb618a

convert result to unicode on python2

452a430

compile cython on py3

feebe43

Merge branch 'unicodesp' of https://github.com/nctl144/urlparse4 into…

70f9128

… unicodesp

decode in cython

65d927d

compile cython

13eb988

malloxpb merged commit 0ea4927 into master Jun 8, 2018

malloxpb mentioned this pull request Jun 8, 2018

Check the solution for supporting unicode #5

Closed

malloxpb deleted the unicodesp branch August 1, 2018 17:07
