-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix urlsplit #13
Fix urlsplit #13
Conversation
Running the performance test gives 0.18sec for urlsplit and urljoin, which is expected 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @nctl144 this looks great, I left some comments on possible code improvements
tests/test_urlparse.py
Outdated
@@ -989,6 +1007,7 @@ def test_all(self): | |||
self.assertCountEqual(urlparse4.__all__, expected) | |||
|
|||
|
|||
@pytest.mark.skip(reason="We dont need this test") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're not going to need this in the future, it's fine to remove it completely
tests/test_urlparse.py
Outdated
@@ -1137,7 +1156,7 @@ def test_unwrap(self): | |||
url = urlparse4._unwrap('<URL:type://host/path>') | |||
self.assertEqual(url, 'type://host/path') | |||
|
|||
|
|||
@pytest.mark.skip(reason="we dont need this test") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - if we're not going to need this in the future, it's fine to remove it completely
urlparse4/cgurl.pyx
Outdated
if parsed.port.len > 0: | ||
port = int(slice_component(url, parsed.port)) | ||
if port <= 65535: | ||
return port | ||
|
||
elif prop == "username": | ||
if decoded: | ||
return slice_component(url, parsed.username).decode('utf-8') or None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to avoid repeating slice_component(url, parsed.username)
on this line and below
urlparse4/cgurl.pyx
Outdated
return slice_component(url, parsed.username) or None | ||
elif prop == "password": | ||
if decoded: | ||
return slice_component(url, parsed.password).decode('utf-8') or None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to avoid repeating slice_component(...)
on this line and below
urlparse4/cgurl.pyx
Outdated
return slice_component(url, parsed.password) or None | ||
elif prop == "hostname": | ||
if decoded: | ||
return slice_component(url, parsed.host).lower().decode('utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to avoid repeating slice_component(...).lower()
on this line and below
benchmarks/performance_test.py
Outdated
|
||
print("the total time is", total, "seconds") | ||
a = urlsplit(url.encode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to move encoding of url outside of timed part, and specify encoding
benchmarks/performance_test.py
Outdated
|
||
start = timer() | ||
try: | ||
if argv[1] == "encode": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use argparse
library instead? It's in stdlib both for python 2 and 3, and will make the code much nicer
benchmarks/performance_test.py
Outdated
print("the urlsplit time with encode in python is", total / 5, "seconds") | ||
|
||
|
||
total2 = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe urljoin_time
and urlsplit_time
would be better than total
and total2
@@ -1,17 +1,78 @@ | |||
from urlparse4 import urlsplit, urljoin | |||
from timeit import default_timer as timer | |||
|
|||
total = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this file is changes so heavily in this PR, maybe also move all code into some function, e.g. main
, and call it at the end? There are two reasons:
- with a function, it's more natural to add helper functions when the code grows, and we avoid accidentally using global variables
- PyPy has much better optimizations for local function variables than for globals, so it would be more fair to measure performance inside a function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense! I will move all the code inside a main() func
urlparse4/cgurl.pyx
Outdated
TO DO: | ||
What do we return here | ||
""" | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will ParseStandardURL
or ParsePathURL
work here? If not, maybe raise an exception, and then in the python part we can either let it propogate, or catch it and use stdlib function instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will let it use stdlib function since we don't want any weird behavior from this library for now 😄
Hey @lopuhin, thank you so much for the review. I have optimized the code as you have suggested. Can you take a look at it when you have time 😄 |
Looks great, thanks @nctl144 ! |
For this PR, I implemented parsing based on what was implemented in GURL class :) #12