New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented support for basic authentication #38
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,6 +13,7 @@ | |
import re | ||
import sys | ||
import urllib | ||
import urllib2 | ||
|
||
import mozinfo | ||
|
||
|
@@ -50,7 +51,8 @@ class Scraper(object): | |
"""Generic class to download an application from the Mozilla server""" | ||
|
||
def __init__(self, directory, version, platform=None, | ||
application='firefox', locale='en-US', extension=None): | ||
application='firefox', locale='en-US', extension=None, | ||
authentication=None): | ||
|
||
# Private properties for caching | ||
self._target = None | ||
|
@@ -61,6 +63,7 @@ def __init__(self, directory, version, platform=None, | |
self.platform = platform or self.detect_platform() | ||
self.version = version | ||
self.extension = extension or DEFAULT_FILE_EXTENSIONS[self.platform] | ||
self.authentication = authentication | ||
|
||
# build the base URL | ||
self.application = application | ||
|
@@ -170,11 +173,28 @@ def download(self): | |
|
||
print 'Downloading build: %s' % (urllib.unquote(self.final_url)) | ||
tmp_file = self.target + ".part" | ||
urllib.urlretrieve(self.final_url, tmp_file) | ||
|
||
if self.authentication \ | ||
and self.authentication['username'] \ | ||
and self.authentication['password']: | ||
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm() | ||
password_mgr.add_password(None, | ||
self.final_url, | ||
self.authentication['username'], | ||
self.authentication['password']) | ||
handler = urllib2.HTTPBasicAuthHandler(password_mgr) | ||
opener = urllib2.build_opener(urllib2.HTTPHandler, handler) | ||
urllib2.install_opener(opener) | ||
|
||
r = urllib2.urlopen(self.final_url) | ||
CHUNK = 16 * 1024 | ||
with open(tmp_file, 'wb') as f: | ||
for chunk in iter(lambda: r.read(CHUNK), ''): | ||
f.write(chunk) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice! Have found the same on stackoverflow. Ii just wonder if 16KB is the right amount of data we want to read-in for each iteration or if we should better go with 64KB, which I think is a better size. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Out of interest, what is this based on? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Previous experience. But it's a couple of years back so things could have changed. Values too small could cause an overhead while larger values could block the thread. I think we can leave it for now and figure out performance improvements later. |
||
os.rename(tmp_file, self.target) | ||
except: | ||
try: | ||
if tmp_file: | ||
if os.path.isfile(tmp_file): | ||
os.remove(tmp_file) | ||
except OSError: | ||
pass | ||
|
@@ -672,6 +692,16 @@ def cli(): | |
metavar='EXTENSION', | ||
help='File extension of the build (e.g. "zip"), default:\ | ||
the standard build extension on the platform.') | ||
parser.add_option('--username', | ||
dest='username', | ||
default=None, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. default is only necessary for non There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We do this in four other places so I was going for consistency. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So please remove all of them so that we are doing it right. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps a later refactor? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Works for me. |
||
metavar='USERNAME', | ||
help='Username for basic HTTP authentication.') | ||
parser.add_option('--password', | ||
dest='password', | ||
default=None, | ||
metavar='PASSWORD', | ||
help='Password for basic HTTP authentication.') | ||
|
||
# Option group for candidate builds | ||
group = OptionGroup(parser, "Candidate builds", | ||
|
@@ -727,7 +757,11 @@ def cli(): | |
'platform': options.platform, | ||
'version': options.version, | ||
'directory': options.directory, | ||
'extension': options.extension} | ||
'extension': options.extension, | ||
'authentication': { | ||
'username': options.username, | ||
'password': options.password} | ||
} | ||
scraper_options = {'candidate': { | ||
'build_number': options.build_number, | ||
'no_unsigned': options.no_unsigned}, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we still need urllib because of unquote()? Is there no replacement in urllib2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
urllib2 uses urllib itself to do unquote.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't answer my question why we need urllib in our file regarding the unquote() call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not making this clear enough. Why would we remove urllib if we're still using it? Given that urllib2 uses unquote from urllib itself, I don't see urllib2 as a direct replacement for urllib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so the only existing code which makes use of this method is:
print 'Downloading build: %s' % (urllib.unquote(self.final_url))
. I don't think we can get around it. So it should be fine to leave it in for now. But if you have an idea...