[WIP] Python 3 port of urlgrabber #8

brejoc · 2018-11-13T09:29:41Z

With these changes it is possible to use urlgrabber with Python 3 while remaining Python 2 compatible.

Please see #7.

brejoc · 2018-11-13T09:36:58Z

Feedback is very welcome!

Contents length with non-ascii chars was off by one. Returning the buffer length instead fixes that.
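A minimal sketch of how a character count and a byte count can disagree for non-ASCII content. This is not the urlgrabber code; the sample string is made up purely for illustration.

```python
# A string with one non-ASCII character: 'é' is a single code point,
# but it occupies two bytes once encoded as UTF-8.
text = u"caf\u00e9"
buf = text.encode("utf-8")

char_count = len(text)  # counts code points
byte_count = len(buf)   # counts the bytes actually held in the buffer

print(char_count, byte_count)
```

Reporting the length of the encoded buffer keeps the reported size in line with what is actually written.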

setup.py

Conan-Kudo · 2019-01-20T17:42:40Z

test/test_mirror.py

-        self.assertEquals(self.code, 503); del self.code
+        self.assertEquals([e.exception.errno for e in err], [5])
+        # self.assertEquals(self.code, 503)
+        # del self.code


What's broken here?

If I remember correctly, we got multiple errors here and the 503 was not in self.code. But I'd have to re-check that.

Conan-Kudo · 2019-01-20T17:43:22Z

urlgrabber/__init__.py

@@ -52,4 +52,4 @@
              'Zdenek Pavlas <zpavlas@redhat.com>'
 __url__     = 'http://urlgrabber.baseurl.org/'

-from grabber import urlgrab, urlopen, urlread
+from .grabber import urlgrab, urlopen, urlread


Please change all source files to consistently use Python 3 style absolute imports.

from __future__ import absolute_import

Thanks, I'll check that.
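For reference, a small runnable sketch of what the requested import style looks like in a package. The package name demo_pkg and its contents are hypothetical; the script writes a throwaway package to a temporary directory so the explicit relative import can actually be exercised.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk (hypothetical name: demo_pkg).
pkg_root = tempfile.mkdtemp()
pkg_dir = os.path.join(pkg_root, "demo_pkg")
os.mkdir(pkg_dir)

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    # absolute_import turns off Python 2's implicit relative imports,
    # so the sibling module has to be named explicitly with a leading dot.
    f.write("from __future__ import absolute_import\n")
    f.write("from .grabber import urlgrab\n")

with open(os.path.join(pkg_dir, "grabber.py"), "w") as f:
    f.write("def urlgrab(url):\n")
    f.write("    return 'grabbed ' + url\n")

sys.path.insert(0, pkg_root)
demo_pkg = importlib.import_module("demo_pkg")
print(demo_pkg.urlgrab("http://example.com/file"))
```

With the __future__ import in place, a bare "from grabber import ..." inside the package would look for a top-level grabber module on both Python 2 and Python 3, which is why the dotted form is needed.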

Conan-Kudo · 2019-01-20T17:44:20Z

scripts/urlgrabber

-        except getopt.GetoptError, e:
-            print >>sys.stderr, "Error:", e
+        except getopt.GetoptError as e:
+            print("Error:", e, file=sys.stderr)


Please ensure everything using print() has the __future__ import so that it disables the legacy behavior.

from __future__ import print_function
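A short sketch of the pattern, assuming a hypothetical helper named report_error and a made-up error message. The Capture class only exists so the output can be inspected without writing to the real stderr.

```python
from __future__ import print_function  # must precede every other statement

import sys


class Capture(object):
    """Minimal file-like object that records everything written to it."""

    def __init__(self):
        self.parts = []

    def write(self, text):
        self.parts.append(text)


def report_error(err, stream=sys.stderr):
    # With the __future__ import, print is a function on Python 2 as well,
    # so the file= keyword is accepted by both interpreters.
    print("Error:", err, file=stream)


captured = Capture()
report_error("option -x not recognized", stream=captured)
output = "".join(captured.parts)
```

Without the import, Python 2 would parse print("Error:", e) as the print statement applied to a tuple, and the file= keyword would be a syntax error.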

This reverts commit 5ced2a3.

Since we can't rely on everybody using utf-8, we are now doing an auto-detection with chardet.
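A rough sketch of what chardet-based detection can look like. The helper name decode_buffer and the UTF-8 fallback are assumptions for illustration and are not taken from the commit; chardet is a third-party package, so the import is guarded.

```python
try:
    import chardet  # third-party; may not be installed
except ImportError:
    chardet = None


def decode_buffer(raw, default="utf-8"):
    """Decode bytes using a detected encoding, falling back to a default."""
    encoding = None
    if chardet is not None:
        # chardet.detect returns a dict; "encoding" may be None for odd input.
        encoding = chardet.detect(raw).get("encoding")
    try:
        return raw.decode(encoding or default, errors="replace")
    except LookupError:
        # The detected name is not a codec Python knows about.
        return raw.decode(default, errors="replace")


print(decode_buffer(b"plain ascii text"))
```

Detection is a heuristic, so short inputs in particular can be guessed wrongly; the errors="replace" argument keeps decoding from raising in that case.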

* grabber_fix.diff
* python-urlgrabber-3.9.1-preserve-queryparams-in-urls.patch
* declare-dollar-sign-as-safe-in-urlquote.patch
* python-urlgrabber-3.9.1-set-SSL_VERIFYHOST-correct.dif

Conan-Kudo · 2019-01-29T14:58:21Z

@brejoc Please don't merge a bunch of independent changes into a single commit like that.

If you want to do unrelated things, please structure like so:

Commit for porting to Python 3
Commit for bug fix
Commit for feature enhancement

and so on.

We need to be able to understand the changeset after it's merged too, which is why I'm asking this.

brejoc · 2019-01-30T15:48:04Z

@brejoc Please don't merge a bunch of independent changes into a single commit like that.

…

We need to be able to understand the changeset after it's merged too, which is why I'm asking this.

Yeah, that wasn't very good. Sorry about that!

Python2 only code was introduced with the addition of one of the SUSE patches. This is now Python version agnostic again.

keszybz

In my experience, it is best to run each fixer (e.g. 2to3 -f raise -nw .) as a separate step and to commit such automatic changes as separate commits. It seems that the raise syntax changes were done manually and errors were introduced. My recommendation would be to start by running all fixers that make useful changes (at least raise and except) as separate commits, and then apply the other changes on top.

The repr fixer proposes what this patch does, but it's arguably the wrong thing to do: too verbose and ugly. I suggest using %r instead.

keszybz · 2019-02-04T16:52:32Z

.gitignore

@@ -5,3 +5,6 @@ build
 *.kdev*
 *.kateproject
 ipython.log*
+
+# virtualenv
+sandbox/*


/sandbox/* ?

Yes, that's the folder we use most of the time for virtualenvs internally.

A leading slash may be used to only match the folder if it appears in this directory, and not any of the subdirectories. Unless you actually want to match the same name in subdirectories, it's generally recommended to always lead with a slash in .gitignore.

keszybz · 2019-02-04T16:55:26Z

test/munittest.py

-            raise ValueError, "no such test method in %s: %s" % \
-                  (self.__class__, methodName)
+            raise(ValueError, "no such test method in %s: %s" % \
+                  (self.__class__, methodName))


That is very wrong. raise is not a function, so it does not need parentheses, and they should not be used in this misleading way.

Something like this:

raise ValueError("no such test method in %s: %s" % (self.__class__, methodName))

Thanks, I hope I'll be able to take another look this week. To be honest, I don't remember where this came from. Let me check.
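To make the problem concrete, here is a small sketch of how the two forms behave on Python 3. The function names broken, fixed and outcome are made up for the demonstration, and the message is shortened.

```python
def broken(method_name):
    # The parentheses build a tuple, and Python 3 then tries to raise
    # that tuple, which is not an exception.
    raise(ValueError, "no such test method: %s" % method_name)


def fixed(method_name):
    # Instantiate the exception with its message, then raise it.
    raise ValueError("no such test method: %s" % method_name)


def outcome(func):
    """Return the exception type name and message raised by func."""
    try:
        func("runTest")
    except Exception as exc:
        return type(exc).__name__, str(exc)


print(outcome(broken))
print(outcome(fixed))
```

On Python 3 the parenthesized form fails with a TypeError instead of the intended ValueError, so the original message never reaches the caller.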

keszybz · 2019-02-04T16:56:41Z

test/munittest.py

@@ -361,15 +370,15 @@ def _exc_info(self):

    def fail(self, msg=None):
        """Fail immediately, with the given message."""
-        raise self.failureException, msg
+        raise(self.failureException, msg)


Noooooo

raise self.failureException(msg)

It is usually not necessary to make such changes by hand. Please run 2to3-3.6 -f raise -nw . in a clean repo, and commit the result as a separate commit.

keszybz · 2019-02-04T16:56:54Z

test/munittest.py


    def failIf(self, expr, msg=None):
        "Fail the test if the expression is true."
-        if expr: raise self.failureException, msg
+        if expr: raise(self.failureException, msg)


keszybz · 2019-02-04T16:57:29Z

test/munittest.py


    def failUnless(self, expr, msg=None):
        """Fail the test unless the expression is true."""
-        if not expr: raise self.failureException, msg
+        if not expr: raise(self.failureException, msg)


keszybz · 2019-02-04T17:01:45Z

test/munittest.py

-            raise self.failureException, \
-                  (msg or '%s == %s' % (`first`, `second`))
+            raise(self.failureException, \
+                  (msg or '%s == %s' % (repr(first), repr(second))))


raise is wrong, as above. Also, %r should be used instead:

... '%r == %r' % (first, second)
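A quick sketch showing that the two spellings produce the same text; the sample values are arbitrary.

```python
first, second = "abc", 42

# What the repr fixer produces: explicit repr() calls fed into %s.
verbose = '%s == %s' % (repr(first), repr(second))

# The shorter spelling: %r applies repr() to each argument itself.
concise = '%r == %r' % (first, second)

print(concise)
```

Both work on Python 2 and Python 3, so the %r form does not cost any compatibility.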

keszybz · 2019-02-04T17:03:05Z

test/munittest.py

-        if self.sortTestMethodsUsing:
-            testFnNames.sort(self.sortTestMethodsUsing)
+        # if self.sortTestMethodsUsing:
+        #     testFnNames.sort(key=self.sortTestMethodsUsing)
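A note on this hunk, as a sketch under one assumption: that sortTestMethodsUsing holds a cmp-style function taking two names, as it does in the standard library's unittest (munittest may differ). In that case it cannot be passed as key= directly, but functools.cmp_to_key can wrap it. The function compare_names below is a made-up stand-in.

```python
import functools


def compare_names(a, b):
    # cmp-style comparison: negative, zero or positive result.
    return (a > b) - (a < b)


names = ["test_b", "test_c", "test_a"]

# Passing a two-argument function as key= fails, because key functions
# are called with a single element.
try:
    sorted(names, key=compare_names)
    direct_key_error = None
except TypeError as exc:
    direct_key_error = exc

# cmp_to_key adapts the comparison function to the key= protocol.
names.sort(key=functools.cmp_to_key(compare_names))
print(names)
```

functools.cmp_to_key is available from Python 2.7 onwards, so this form would keep the sorting active on both interpreters instead of commenting it out.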


keszybz · 2019-02-04T17:03:29Z

test/munittest.py

@@ -737,7 +752,8 @@ def startSuite(self, suite):
        if self.showAll and self.descriptions:
            self.stream.write(self.indent * self.depth)
            try: desc = self.getDescription(suite)
-            except AttributeError: desc = '(no description)'


It seems strange to leave try as it was.

Agreed. But at that time the focus was on getting it running, not on refactoring the code, which has its own style in various places.

keszybz · 2019-02-04T17:05:13Z

urlgrabber/byterange.py

+    from urllib2 import ftpwrapper as urllib_ftpwrapper
+    from urllib2 import splitport
+except ImportError:
+    # Python3


IMHO the Python3 imports should be placed first, and the deprecated names tried only as fallback.

Yeah, we can agree on that. Thanks!
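A sketch of the ordering being suggested, using urlparse as a stand-in because its location in both versions is well known; the names actually imported in byterange.py differ.

```python
try:
    # Python 3 location first.
    from urllib.parse import urlparse
except ImportError:
    # Python 2 fallback, only tried when the modern name is missing.
    from urlparse import urlparse

parts = urlparse("http://urlgrabber.baseurl.org/path?x=1")
print(parts.netloc)
```

Ordering the block this way means the fallback branch can simply be deleted once Python 2 support is dropped.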

keszybz · 2019-02-04T17:05:34Z

urlgrabber/byterange.py


 DEBUG = None

-try:    
+try:
    from cStringIO import StringIO


The same for those imports: io should be tried first.

The Python 3 port broke binary file downloads. This re-adds support for them: urlgrabber now differentiates between binary and text downloads. Binary downloads are handled via BytesIO and text downloads via StringIO. To look up the "content-type", a HEAD request is performed.

Moves content type detection and initialization to its own method and adds detection for prior initialization of self.fo by urlgrab function.
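A minimal sketch of the idea described in these commit messages. The helper name make_buffer and the rule "anything under text/ is text, everything else is binary" are assumptions for illustration; the actual detection logic in the commits is not shown here, and the HEAD request itself is left out.

```python
import io


def make_buffer(content_type):
    """Pick a buffer type from a Content-Type header value (hypothetical)."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    mime = (content_type or "").split(";", 1)[0].strip().lower()
    if mime.startswith("text/"):
        return io.StringIO()
    # Default to binary when the type is missing or not textual.
    return io.BytesIO()


text_buf = make_buffer("text/html; charset=utf-8")
binary_buf = make_buffer("application/x-rpm")
print(type(text_buf).__name__, type(binary_buf).__name__)
```

Defaulting to BytesIO is the cautious choice in this sketch, since writing undecoded bytes never fails while decoding unknown content can.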

hroncok · 2019-02-06T15:59:13Z

@keszybz makes good points about porting... those are all documented at https://portingguide.readthedocs.io/en/latest/process.html#port-the-code - not sure if you are aware of this piece of doc, so sharing it for a reference.

brejoc · 2019-02-06T18:57:10Z

Thanks, I'll bookmark that link. And yes, that's roughly what we did here. In the first phase we leveraged 2to3, and in the second phase we tried to iron things out to make it actually work in both Python 2 and Python 3. The tests were our indicator. While the result might not be the cleanest code on earth, we have to keep in mind that the starting point wasn't either. The focus was on getting it running, not on refactoring it. I'm definitely willing to improve this PR, but one thing should be clear: I don't have the time and resources to completely redo it. If that were needed, I'd have to close this PR; my constraints simply wouldn't allow it at the moment.

Right now we are focusing on the integration into the repo sync mechanism of SUSE Manager, and some additional issues came up that are either already fixed or will be fixed in the coming days. After that we can talk about getting this PR into better shape. I guess chat would be a better tool for that. @Conan-Kudo, will you be my contact for this endeavor?

Conan-Kudo · 2019-02-06T19:00:40Z

@brejoc That's fine. What I meant was that the commits should represent a logical working change. I don't really care too much if you make the underlying code cleaner right now. But the idea here is I need to be able to make sense of what you're changing and why.

Does that make sense?

Conan-Kudo · 2019-02-25T12:30:23Z

This is now done with the 4.0.0 release. Tarballs coming soon!

brejoc added 8 commits October 5, 2018 14:37

Adds virtualenv to gitignore

924f2e7

2to3 code changes and making it work again in Python2

5d8d122

Python 3 porting

ad20579

Reached test failure parity with unmodified master

0c2f357

Removing pudb break points

98a07e3

Removes white-space

e5618d1

2to3 for scripts

59301cf

Removes unneeded white space

a86e2cc

brejoc mentioned this pull request Nov 14, 2018

Python3: Alternative for python-urlgrabber cobbler/cobbler#1931

Closed

meaksh and others added 3 commits January 17, 2019 11:26

Fix setup.py to be compatible with Python 3

8a5f2f0

Do not install the urlgrabber-ext-down script

5ced2a3

Adds fix for utf8 unicode issue

4b4600f

Contents length with non-ascii chars was off by one. Returning the buffer length instead fixes that.

Conan-Kudo requested changes Jan 20, 2019


meaksh and others added 3 commits January 23, 2019 09:04

Revert "Do not install the urlgrabber-ext-down script"

8da3c30

This reverts commit 5ced2a3.

Adds encoding detection for buffer

569e7ea

Since we can't rely on everybody using utf-8, we are now doing an auto-detection with chardet.

Incorprates the SUSE patches

1f56ca1

* grabber_fix.diff
* python-urlgrabber-3.9.1-preserve-queryparams-in-urls.patch
* declare-dollar-sign-as-safe-in-urlquote.patch
* python-urlgrabber-3.9.1-set-SSL_VERIFYHOST-correct.dif

Making _join_url() Python3 compatible again

8b39a6d

Python2 only code was introduced with the addition of one of the SUSE patches. This is now Python version agnostic again.

keszybz suggested changes Feb 4, 2019


brejoc added 2 commits February 5, 2019 14:44

Adds detection for prior initialization by urlgrab

b7b3532

Moves content type detection and initialization to its own method and adds detection for prior initialization of self.fo by urlgrab function.

keszybz mentioned this pull request Feb 12, 2019

Python3 compatibility #9

Merged

Conan-Kudo closed this Feb 25, 2019
