New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Python 3 port of urlgrabber #8
Conversation
Feedback is very welcome! |
Contents length with non-ascii chars was off by one. Returning the buffer length instead fixes that.
self.assertEquals(self.code, 503); del self.code | ||
self.assertEquals([e.exception.errno for e in err], [5]) | ||
# self.assertEquals(self.code, 503) | ||
# del self.code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's broken here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly, we got multiple errors here and the 503 was not in self.code
. But I'd have to re-check that.
@@ -52,4 +52,4 @@ | |||
'Zdenek Pavlas <zpavlas@redhat.com>' | |||
__url__ = 'http://urlgrabber.baseurl.org/' | |||
|
|||
from grabber import urlgrab, urlopen, urlread | |||
from .grabber import urlgrab, urlopen, urlread |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change all source files to consistently use Python 3 style absolute imports.
from __future__ import absolute_import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll check that.
except getopt.GetoptError, e: | ||
print >>sys.stderr, "Error:", e | ||
except getopt.GetoptError as e: | ||
print("Error:", e, file=sys.stderr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please ensure everything using print()
has the __future__
import so that it disables the legacy behavior.
from __future__ import print_function
This reverts commit 5ced2a3.
Since we can't rely on everybody using utf-8, we are now doing an auto-detection with chardet.
* grabber_fix.diff * python-urlgrabber-3.9.1-preserve-queryparams-in-urls.patch * declare-dollar-sign-as-safe-in-urlquote.patch * python-urlgrabber-3.9.1-set-SSL_VERIFYHOST-correct.dif
@brejoc Please don't merge a bunch of independent changes into a single commit like that. If you want to do unrelated things, please structure like so:
and so on. We need to be able to understand the changeset after it's merged too, which is why I'm asking this. |
Yeah, that wasn't very good. Sorry about that! |
Python2 only code was introduced with the addition of one of the SUSE patches. This is now Python version agnostic again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my experience, it is best to run each fixer (e.g. 2to3 -f raise -nw .
) as a separate step, and commit such automatic changes as separate commits. It seems that the raise
syntax changes were done manually and errors were introduced. My recommendation would be start with running all fixers that make useful changes (at least raise
and except
) as separate commits, and then apply the other changes on top.
repr
fixer proposes what this patch does, but it's arguably the wrong thing to do. Too verbose and ugly. I suggest using %r
instead.
@@ -5,3 +5,6 @@ build | |||
*.kdev* | |||
*.kateproject | |||
ipython.log* | |||
|
|||
# virtualenv | |||
sandbox/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/sandbox/*
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the folder we are most of the times using for virtualenv internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A leading slash may be used to only match the folder if it appears in this directory, and not any of the subdirectories. Unless you actually want to match the same name in subdirectories, it's generally recommended to always lead with a slash in .gitignore
.
test/munittest.py
Outdated
raise ValueError, "no such test method in %s: %s" % \ | ||
(self.__class__, methodName) | ||
raise(ValueError, "no such test method in %s: %s" % \ | ||
(self.__class__, methodName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is very wrong. raise
is not a function, so it does not need parentheses, and they should not be used in this misleading way.
Something like this:
raise ValueError("no such test method in %s: %s" %
(self.__class__, methodName))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I hope I'll be able to take a look again this week. To be hones I don't remember where this is coming from. Let me check this.
test/munittest.py
Outdated
@@ -361,15 +370,15 @@ def _exc_info(self): | |||
|
|||
def fail(self, msg=None): | |||
"""Fail immediately, with the given message.""" | |||
raise self.failureException, msg | |||
raise(self.failureException, msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noooooo
raise self.failureException(msg)
It usually is not necessary to do such changes by hand. Please run 2to3-3.6 -f raise -nw .
in a clean repo, and commit that as a separate commit.
test/munittest.py
Outdated
|
||
def failIf(self, expr, msg=None): | ||
"Fail the test if the expression is true." | ||
if expr: raise self.failureException, msg | ||
if expr: raise(self.failureException, msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise
test/munittest.py
Outdated
|
||
def failUnless(self, expr, msg=None): | ||
"""Fail the test unless the expression is true.""" | ||
if not expr: raise self.failureException, msg | ||
if not expr: raise(self.failureException, msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too
test/munittest.py
Outdated
raise self.failureException, \ | ||
(msg or '%s == %s' % (`first`, `second`)) | ||
raise(self.failureException, \ | ||
(msg or '%s == %s' % (repr(first), repr(second)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise
is wrong, as above. Also, %r
should be used instead:
... '%r == %r' % (first, second)
if self.sortTestMethodsUsing: | ||
testFnNames.sort(self.sortTestMethodsUsing) | ||
# if self.sortTestMethodsUsing: | ||
# testFnNames.sort(key=self.sortTestMethodsUsing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
@@ -737,7 +752,8 @@ def startSuite(self, suite): | |||
if self.showAll and self.descriptions: | |||
self.stream.write(self.indent * self.depth) | |||
try: desc = self.getDescription(suite) | |||
except AttributeError: desc = '(no description)' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems strange to leave try
as it was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. But at that time the focus was on getting it running, not refactoring the code - which has it's own style in various places.
from urllib2 import ftpwrapper as urllib_ftpwrapper | ||
from urllib2 import splitport | ||
except ImportError: | ||
# Python3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO the Python3 imports should be placed first, and the deprecated names tried only as fallback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can agree on that. Thanks!
|
||
DEBUG = None | ||
|
||
try: | ||
try: | ||
from cStringIO import StringIO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same for those imports: io
should be tried first.
With the Python3 addition binary file download was broken. This re-adds the possibility to do that. urlgrabber now differenciates between binary and text downloads. Binary downloads are handled via BytesIO and text downloads via StringIO. To look up the "content-type" a HEAD-request is performed.
Moves content type detection and initialization to its own method and adds detection for prior initialization of self.fo by urlgrab function.
@keszybz makes good points about porting... those are all documented at https://portingguide.readthedocs.io/en/latest/process.html#port-the-code - not sure if you are aware of this piece of doc, so sharing it for a reference. |
Thanks, I'll bookmark that link. And yes, and that's roughly what we did here. In the first phase we leveraged 2to3 and in the second phase we tried to iron some things out to make it actually work in Python2 and Python3. The tests where our indicator. While the results might not be the cleanest code on earth, we do have to keep in mind that the starting base also wasn't. The focus was on getting it running and not refactoring it. I'm definitely willing to improve this PR, but one thing should also be clear: I don't have the time and resources to completely redo this. If this would be needed, I'd have to close this PR. My constraints simply wouldn't allow that at the moment. Right now we are focusing on the integration in the repo sync mechanism of SUSE Manager and some additional issues came up that are partly already fixed or will be fixed in the coming days. After that we can talk about getting this PR into a better shape. I guess chat would be a better tool to do that. @Conan-Kudo, will you be my contact for this endeavor? |
@brejoc That's fine. What I meant was that the commits should represent a logical working change. I don't really care too much if you make the underlying code cleaner right now. But the idea here is I need to be able to make sense of what you're changing and why. Does that make sense? |
This is now done with the 4.0.0 release. Tarballs coming soon! |
With these changes it is possible to use urlgrabber with Python 3, while still be Python 2 compatible.
Please see #7.