Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix urlsplit #13

Merged
merged 73 commits into from
Jun 15, 2018
Merged

Fix urlsplit #13

merged 73 commits into from
Jun 15, 2018

Conversation

malloxpb
Copy link
Member

@malloxpb malloxpb commented Jun 14, 2018

For this PR, I implemented parsing based on what was implemented in GURL class :) #12

@malloxpb
Copy link
Member Author

Running the performance test gives 0.18sec for urlsplit and urljoin, which is expected 😄

Copy link
Member

@lopuhin lopuhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @nctl144 this looks great, I left some comments on possible code improvements

@@ -989,6 +1007,7 @@ def test_all(self):
self.assertCountEqual(urlparse4.__all__, expected)


@pytest.mark.skip(reason="We dont need this test")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're not going to need this in the future, it's fine to remove it completely

@@ -1137,7 +1156,7 @@ def test_unwrap(self):
url = urlparse4._unwrap('<URL:type://host/path>')
self.assertEqual(url, 'type://host/path')


@pytest.mark.skip(reason="we dont need this test")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - if we're not going to need this in the future, it's fine to remove it completely

if parsed.port.len > 0:
port = int(slice_component(url, parsed.port))
if port <= 65535:
return port

elif prop == "username":
if decoded:
return slice_component(url, parsed.username).decode('utf-8') or None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to avoid repeating slice_component(url, parsed.username) on this line and below

return slice_component(url, parsed.username) or None
elif prop == "password":
if decoded:
return slice_component(url, parsed.password).decode('utf-8') or None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to avoid repeating slice_component(...) on this line and below

return slice_component(url, parsed.password) or None
elif prop == "hostname":
if decoded:
return slice_component(url, parsed.host).lower().decode('utf-8')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to avoid repeating slice_component(...).lower() on this line and below


print("the total time is", total, "seconds")
a = urlsplit(url.encode())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to move encoding of url outside of timed part, and specify encoding


start = timer()
try:
if argv[1] == "encode":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use argparse library instead? It's in stdlib both for python 2 and 3, and will make the code much nicer

print("the urlsplit time with encode in python is", total / 5, "seconds")


total2 = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe urljoin_time and urlsplit_time would be better than total and total2

@@ -1,17 +1,78 @@
from urlparse4 import urlsplit, urljoin
from timeit import default_timer as timer

total = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this file is changes so heavily in this PR, maybe also move all code into some function, e.g. main, and call it at the end? There are two reasons:

  • with a function, it's more natural to add helper functions when the code grows, and we avoid accidentally using global variables
  • PyPy has much better optimizations for local function variables than for globals, so it would be more fair to measure performance inside a function

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense! I will move all the code inside a main() func

TO DO:
What do we return here
"""
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will ParseStandardURL or ParsePathURL work here? If not, maybe raise an exception, and then in the python part we can either let it propogate, or catch it and use stdlib function instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will let it use stdlib function since we don't want any weird behavior from this library for now 😄

@malloxpb
Copy link
Member Author

Hey @lopuhin, thank you so much for the review. I have optimized the code as you have suggested. Can you take a look at it when you have time 😄

@malloxpb malloxpb merged commit 3abf155 into master Jun 15, 2018
@lopuhin
Copy link
Member

lopuhin commented Jun 18, 2018

Looks great, thanks @nctl144 !

@lopuhin lopuhin deleted the fix_split branch June 18, 2018 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants