Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Python 3 port of urlgrabber #8

Closed
wants to merge 17 commits into from

Conversation

brejoc
Copy link
Contributor

@brejoc brejoc commented Nov 13, 2018

With these changes it is possible to use urlgrabber with Python 3, while still be Python 2 compatible.

Please see #7.

@brejoc
Copy link
Contributor Author

brejoc commented Nov 13, 2018

Feedback is very welcome!

meaksh and others added 3 commits January 17, 2019 11:26
Contents length with non-ascii chars was off by one. Returning the
buffer length instead fixes that.
setup.py Outdated Show resolved Hide resolved
self.assertEquals(self.code, 503); del self.code
self.assertEquals([e.exception.errno for e in err], [5])
# self.assertEquals(self.code, 503)
# del self.code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's broken here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, we got multiple errors here and the 503 was not in self.code. But I'd have to re-check that.

@@ -52,4 +52,4 @@
'Zdenek Pavlas <zpavlas@redhat.com>'
__url__ = 'http://urlgrabber.baseurl.org/'

from grabber import urlgrab, urlopen, urlread
from .grabber import urlgrab, urlopen, urlread
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change all source files to consistently use Python 3 style absolute imports.

from __future__ import absolute_import

Copy link
Contributor Author

@brejoc brejoc Jan 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll check that.

except getopt.GetoptError, e:
print >>sys.stderr, "Error:", e
except getopt.GetoptError as e:
print("Error:", e, file=sys.stderr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure everything using print() has the __future__ import so that it disables the legacy behavior.

from __future__ import print_function

meaksh and others added 3 commits January 23, 2019 09:04
Since we can't rely on everybody using utf-8, we are now doing an
auto-detection with chardet.
  * grabber_fix.diff
  * python-urlgrabber-3.9.1-preserve-queryparams-in-urls.patch
  * declare-dollar-sign-as-safe-in-urlquote.patch
  * python-urlgrabber-3.9.1-set-SSL_VERIFYHOST-correct.dif
@Conan-Kudo
Copy link
Member

@brejoc Please don't merge a bunch of independent changes into a single commit like that.

If you want to do unrelated things, please structure like so:

  • Commit for porting to Python 3
  • Commit for bug fix
  • Commit for feature enhancement

and so on.

We need to be able to understand the changeset after it's merged too, which is why I'm asking this.

@brejoc
Copy link
Contributor Author

brejoc commented Jan 30, 2019

@brejoc Please don't merge a bunch of independent changes into a single commit like that.

We need to be able to understand the changeset after it's merged too, which is why I'm asking this.

Yeah, that wasn't very good. Sorry about that!

Python2 only code was introduced with the addition of one of the SUSE patches.
This is now Python version agnostic again.
Copy link
Contributor

@keszybz keszybz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience, it is best to run each fixer (e.g. 2to3 -f raise -nw .) as a separate step, and commit such automatic changes as separate commits. It seems that the raise syntax changes were done manually and errors were introduced. My recommendation would be start with running all fixers that make useful changes (at least raise and except) as separate commits, and then apply the other changes on top.

repr fixer proposes what this patch does, but it's arguably the wrong thing to do. Too verbose and ugly. I suggest using %r instead.

@@ -5,3 +5,6 @@ build
*.kdev*
*.kateproject
ipython.log*

# virtualenv
sandbox/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/sandbox/* ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the folder we are most of the times using for virtualenv internally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A leading slash may be used to only match the folder if it appears in this directory, and not any of the subdirectories. Unless you actually want to match the same name in subdirectories, it's generally recommended to always lead with a slash in .gitignore.

raise ValueError, "no such test method in %s: %s" % \
(self.__class__, methodName)
raise(ValueError, "no such test method in %s: %s" % \
(self.__class__, methodName))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is very wrong. raise is not a function, so it does not need parentheses, and they should not be used in this misleading way.

Something like this:

 raise ValueError("no such test method in %s: %s" %
                  (self.__class__, methodName))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I hope I'll be able to take a look again this week. To be hones I don't remember where this is coming from. Let me check this.

@@ -361,15 +370,15 @@ def _exc_info(self):

def fail(self, msg=None):
"""Fail immediately, with the given message."""
raise self.failureException, msg
raise(self.failureException, msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noooooo

raise self.failureException(msg)

It usually is not necessary to do such changes by hand. Please run 2to3-3.6 -f raise -nw . in a clean repo, and commit that as a separate commit.


def failIf(self, expr, msg=None):
"Fail the test if the expression is true."
if expr: raise self.failureException, msg
if expr: raise(self.failureException, msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise


def failUnless(self, expr, msg=None):
"""Fail the test unless the expression is true."""
if not expr: raise self.failureException, msg
if not expr: raise(self.failureException, msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too

raise self.failureException, \
(msg or '%s == %s' % (`first`, `second`))
raise(self.failureException, \
(msg or '%s == %s' % (repr(first), repr(second))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise is wrong, as above. Also, %r should be used instead:

... '%r == %r' % (first, second)

if self.sortTestMethodsUsing:
testFnNames.sort(self.sortTestMethodsUsing)
# if self.sortTestMethodsUsing:
# testFnNames.sort(key=self.sortTestMethodsUsing)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

@@ -737,7 +752,8 @@ def startSuite(self, suite):
if self.showAll and self.descriptions:
self.stream.write(self.indent * self.depth)
try: desc = self.getDescription(suite)
except AttributeError: desc = '(no description)'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems strange to leave try as it was.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. But at that time the focus was on getting it running, not refactoring the code - which has it's own style in various places.

from urllib2 import ftpwrapper as urllib_ftpwrapper
from urllib2 import splitport
except ImportError:
# Python3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO the Python3 imports should be placed first, and the deprecated names tried only as fallback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can agree on that. Thanks!


DEBUG = None

try:
try:
from cStringIO import StringIO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same for those imports: io should be tried first.

With the Python3 addition binary file download was broken. This re-adds
the possibility to do that.
urlgrabber now differenciates between binary and text downloads. Binary
downloads are handled via BytesIO and text downloads via StringIO. To
look up the "content-type" a HEAD-request is performed.
Moves content type detection and initialization to its own method and
adds detection for prior initialization of self.fo by urlgrab function.
@hroncok
Copy link

hroncok commented Feb 6, 2019

@keszybz makes good points about porting... those are all documented at https://portingguide.readthedocs.io/en/latest/process.html#port-the-code - not sure if you are aware of this piece of doc, so sharing it for a reference.

@brejoc
Copy link
Contributor Author

brejoc commented Feb 6, 2019

Thanks, I'll bookmark that link. And yes, and that's roughly what we did here. In the first phase we leveraged 2to3 and in the second phase we tried to iron some things out to make it actually work in Python2 and Python3. The tests where our indicator. While the results might not be the cleanest code on earth, we do have to keep in mind that the starting base also wasn't. The focus was on getting it running and not refactoring it. I'm definitely willing to improve this PR, but one thing should also be clear: I don't have the time and resources to completely redo this. If this would be needed, I'd have to close this PR. My constraints simply wouldn't allow that at the moment.

Right now we are focusing on the integration in the repo sync mechanism of SUSE Manager and some additional issues came up that are partly already fixed or will be fixed in the coming days. After that we can talk about getting this PR into a better shape. I guess chat would be a better tool to do that. @Conan-Kudo, will you be my contact for this endeavor?

@Conan-Kudo
Copy link
Member

@brejoc That's fine. What I meant was that the commits should represent a logical working change. I don't really care too much if you make the underlying code cleaner right now. But the idea here is I need to be able to make sense of what you're changing and why.

Does that make sense?

@keszybz keszybz mentioned this pull request Feb 12, 2019
@Conan-Kudo
Copy link
Member

This is now done with the 4.0.0 release. Tarballs coming soon!

@Conan-Kudo Conan-Kudo closed this Feb 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants