Fix urlsplit #13

malloxpb · 2018-06-14T14:30:47Z

For this PR, I implemented the parsing based on how it is done in the GURL class :) #12

malloxpb · 2018-06-14T21:03:42Z

Running the performance test gives 0.18 sec for urlsplit and urljoin, which is as expected 😄

lopuhin

Hey @nctl144 this looks great, I left some comments on possible code improvements

lopuhin · 2018-06-15T06:40:20Z

tests/test_urlparse.py

@@ -989,6 +1007,7 @@ def test_all(self):
        self.assertCountEqual(urlparse4.__all__, expected)


+@pytest.mark.skip(reason="We dont need this test")


If we're not going to need this in the future, it's fine to remove it completely

lopuhin · 2018-06-15T06:40:40Z

tests/test_urlparse.py

@@ -1137,7 +1156,7 @@ def test_unwrap(self):
        url = urlparse4._unwrap('<URL:type://host/path>')
        self.assertEqual(url, 'type://host/path')

-
+@pytest.mark.skip(reason="we dont need this test")


Same here - if we're not going to need this in the future, it's fine to remove it completely

lopuhin · 2018-06-15T06:43:06Z

urlparse4/cgurl.pyx

                if parsed.port.len > 0:
                    port = int(slice_component(url, parsed.port))
                    if port <= 65535:
                        return port

            elif prop == "username":
+                if decoded:
+                    return slice_component(url, parsed.username).decode('utf-8') or None


would be nice to avoid repeating slice_component(url, parsed.username) on this line and below

lopuhin · 2018-06-15T06:43:30Z

urlparse4/cgurl.pyx

                return slice_component(url, parsed.username) or None
            elif prop == "password":
+                if decoded:
+                    return slice_component(url, parsed.password).decode('utf-8') or None


would be nice to avoid repeating slice_component(...) on this line and below

lopuhin · 2018-06-15T06:43:44Z

urlparse4/cgurl.pyx

                return slice_component(url, parsed.password) or None
            elif prop == "hostname":
+                if decoded:
+                    return slice_component(url, parsed.host).lower().decode('utf-8')


would be nice to avoid repeating slice_component(...).lower() on this line and below
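One way to address the three comments on repetition is to slice each component once and decode only when the caller passed text. The following is a minimal pure-Python sketch, not the actual Cython code: `slice_component` here is a hypothetical stand-in that takes a `(begin, length)` tuple, and `get_userinfo` / `get_hostname` are illustrative names that do not exist in the PR.

```python
def slice_component(url, component):
    """Hypothetical stand-in: return the bytes covered by (begin, length)."""
    begin, length = component
    if length <= 0:
        return b""
    return url[begin:begin + length]


def get_userinfo(url, component, decoded):
    """Slice once, then decode only if the caller passed text (assumed flag)."""
    value = slice_component(url, component)
    if decoded:
        value = value.decode("utf-8")
    # Empty username/password is reported as None, as in the diff.
    return value or None


def get_hostname(url, component, decoded):
    """Slice and lowercase once; decode only when needed."""
    value = slice_component(url, component).lower()
    if decoded:
        value = value.decode("utf-8")
    return value
```

The same helper can serve both `username` and `password`, since the diff treats them identically.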

lopuhin · 2018-06-15T06:53:33Z

benchmarks/performance_test.py


-print("the total time is", total, "seconds")
+                a = urlsplit(url.encode())


I think it's better to move the encoding of the url outside of the timed part, and to specify the encoding explicitly
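A rough sketch of what this suggestion could look like, assuming Python 3 and using the stdlib `urlsplit` as a stand-in for `urlparse4.urlsplit`. The function name `time_urlsplit` and the `encoding` parameter are made up for illustration.

```python
from timeit import default_timer as timer
from urllib.parse import urlsplit  # stand-in for urlparse4.urlsplit (assumption)


def time_urlsplit(urls, encoding="utf-8"):
    """Return the seconds spent in urlsplit alone for the given urls."""
    # Encode up front, with an explicit encoding, so that the conversion
    # cost is not part of the measurement.
    encoded_urls = [url.encode(encoding) for url in urls]

    start = timer()
    for url in encoded_urls:
        urlsplit(url)
    return timer() - start
```

With this shape, only the parsing loop sits between the two `timer()` calls.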

lopuhin · 2018-06-15T06:54:21Z

benchmarks/performance_test.py


-        start = timer()
+try:
+    if argv[1] == "encode":


Maybe use argparse library instead? It's in stdlib both for python 2 and 3, and will make the code much nicer
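For illustration, a minimal argparse version of the `argv[1] == "encode"` check might look like the sketch below. The `--encode` flag name and the `parse_args` helper are assumptions, not what the PR ended up using.

```python
import argparse


def parse_args(argv=None):
    """Parse benchmark options; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Benchmark urlsplit and urljoin")
    # Hypothetical flag replacing the positional "encode" argument.
    parser.add_argument(
        "--encode", action="store_true",
        help="pass bytes (encoded urls) instead of str to the functions")
    return parser.parse_args(argv)
```

Calling `parse_args(["--encode"])` yields a namespace whose `encode` attribute is `True`; with no arguments it is `False`, and `--help` output comes for free.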

lopuhin · 2018-06-15T06:57:16Z

benchmarks/performance_test.py

+    print("the urlsplit time with encode in python is", total / 5, "seconds")
+
+
+    total2 = 0


maybe urljoin_time and urlsplit_time would be better than total and total2

lopuhin · 2018-06-15T06:59:54Z

benchmarks/performance_test.py

@@ -1,17 +1,78 @@
 from urlparse4 import urlsplit, urljoin
 from timeit import default_timer as timer

-total = 0


Since this file is changed so heavily in this PR, maybe also move all code into some function, e.g. main, and call it at the end? There are two reasons:

with a function, it's more natural to add helper functions when the code grows, and we avoid accidentally using global variables

PyPy has much better optimizations for local function variables than for globals, so it would be more fair to measure performance inside a function

It makes sense! I will move all the code inside a main() func
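As a sketch of the structure discussed in this thread, the benchmark could keep its state in local variables (named `urlsplit_time` and `urljoin_time`, as suggested in the review) and be driven from `main()`. The stdlib functions stand in for `urlparse4` here, and `run_benchmark`, the sample urls and the base url are all hypothetical.

```python
from timeit import default_timer as timer
from urllib.parse import urljoin, urlsplit  # stand-ins for urlparse4 (assumption)


def run_benchmark(urls, base="http://example.com/"):
    """Return (urlsplit_time, urljoin_time) in seconds, using only locals."""
    urlsplit_time = 0.0
    urljoin_time = 0.0
    for url in urls:
        start = timer()
        urlsplit(url)
        urlsplit_time += timer() - start

        start = timer()
        urljoin(base, url)
        urljoin_time += timer() - start
    return urlsplit_time, urljoin_time


def main():
    urls = ["http://example.com/a?b=1#c", "/relative/path"]
    urlsplit_time, urljoin_time = run_benchmark(urls)
    print("the urlsplit time is", urlsplit_time, "seconds")
    print("the urljoin time is", urljoin_time, "seconds")


if __name__ == "__main__":
    main()
```

Nothing is stored at module level except the functions themselves, which matches both reasons given in the review comment.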

lopuhin · 2018-06-15T07:01:52Z

urlparse4/cgurl.pyx

+            TO DO:
+            What do we return here
+            """
+            return False


will ParseStandardURL or ParsePathURL work here? If not, maybe raise an exception, and then in the python part we can either let it propagate, or catch it and use stdlib function instead?

I will let it use stdlib function since we don't want any weird behavior from this library for now 😄
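The fallback described here could be sketched roughly as follows, assuming Python 3. `UnsupportedURL` and `fast_urlsplit` are hypothetical stand-ins for whatever the Cython parser raises and exposes; the real code may signal failure differently.

```python
from urllib.parse import urlsplit as stdlib_urlsplit


class UnsupportedURL(ValueError):
    """Hypothetical error raised when the fast parser cannot handle a url."""


def fast_urlsplit(url):
    # Hypothetical stand-in for the Cython parser: it only accepts
    # http(s) urls and raises for everything else.
    if not url.startswith(("http://", "https://")):
        raise UnsupportedURL(url)
    return stdlib_urlsplit(url)


def urlsplit(url):
    """Try the fast parser first; fall back to the stdlib on failure."""
    try:
        return fast_urlsplit(url)
    except UnsupportedURL:
        # Prefer the well-known stdlib behaviour over a wrong result.
        return stdlib_urlsplit(url)
```

With this wrapper, a `mailto:` url that the fast path rejects is still split by the stdlib, so callers see no exception.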

malloxpb · 2018-06-15T15:58:19Z

Hey @lopuhin, thank you so much for the review. I have optimized the code as you suggested. Can you take a look at it when you have time? 😄

lopuhin · 2018-06-18T08:26:42Z

Looks great, thanks @nctl144 !

malloxpb added 30 commits June 11, 2018 14:12

fix failing test by importing

734e3d7

import more variables to not fail the test

e2abab6

test on py3 only for now

1ff8b35

decode the result to string on Python 3

31e224c

compile cython on py3

23c8b3d

fix failing test by importing

3d7eb0e

import more variables to not fail the test

95260b4

test on py3 only for now

790bd94

decode the result to string on Python 3

168830f

compile cython on py3

89a71b3

Merge branch 'fix_tests' of https://github.com/nctl144/urlparse4 into…

6b76ede

… fix_tests

do not change test cases

1699be0

decode result based on input types

978eada

compile in py3

3aa1b38

remove tests that are not related

af5eced

DRY code

c351048

compile in py3

5c28dd6

update the performance test

0924478

update the performance test

4165c4f

return based on input types (DRY)

55de674

recompile cython on py3

3a034c2

fix encoding urljoin without base

23e10ca

recompile cython

9a7d433

return username, password based on input

325b518

recompile cython py3

9d7f33b

reduce the performance test time

0974a1e

update mozilla files from chromium

8c97272

import func to compare scheme

f958e75

import scheme constants

66f4828

import the rest of the functions in urlparse

de26520

malloxpb added 10 commits June 14, 2018 15:01

fix return error

222a80c

recompile cython py3

dba7798

recompile cython py3

2de8884

Merge branch 'fix_split' of https://github.com/nctl144/urlparse4 into…

9cb095b

… fix_split

recompile cython, resolve merge conflicts

3effa97

mark as todo

0bc77f4

return result based on input type

8f160ae

compile cython

e0e4a97

skip mailto tests for now

3c5f21c

add note to discuss

3ece971

malloxpb mentioned this pull request Jun 14, 2018

Discuss the mailto: parsing correctness #15

Open

lopuhin reviewed Jun 15, 2018


malloxpb added 11 commits June 15, 2018 09:54

remove deprecation tests

507d427

avoid repetition

70da959

avoid repetition and indentation

455f810

shorten the code

37f39c4

recompile cython py3

61de690

rename variables

f9bbeb3

move code inside main func, move encode outside timer

eaec0a9

fallback to stdlib when failed to parse

9f1a6be

recompile cython

4b7c716

use argparse instead

89378bf

indentation

7f57de6

malloxpb added 2 commits June 15, 2018 12:52

std split url based on input type

5d66f3f

recompile Cython

409129f

malloxpb merged commit 3abf155 into master Jun 15, 2018

lopuhin deleted the fix_split branch June 18, 2018 08:26
