
Implemented support for basic authentication #38

Closed · wants to merge 2 commits
42 changes: 38 additions & 4 deletions mozdownload/scraper.py
@@ -13,6 +13,7 @@
import re
import sys
import urllib
import urllib2
Contributor:

So we still need urllib because of unquote()? Is there no replacement in urllib2?

Member Author:

urllib2 uses urllib itself to do unquote.

Contributor:

That doesn't answer my question of why we need urllib in our file for the unquote() call.

Member Author:

Sorry for not making this clear enough. Why would we remove urllib if we're still using it? Given that urllib2 uses unquote from urllib itself, I don't see urllib2 as a direct replacement for urllib.

Contributor:

Hm, so the only existing code which makes use of this method is: print 'Downloading build: %s' % (urllib.unquote(self.final_url)). I don't think we can get around it. So it should be fine to leave it in for now. But if you have an idea...
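
For context, a minimal Python 2 sketch (not part of this patch) of the point made above: urllib2 does not define its own unquote(), it re-exports the one from urllib, so code that unquotes URLs still needs the urllib import.

    import urllib
    import urllib2

    url = 'https://example.com/firefox%2010.0.2.exe'  # made-up URL
    print urllib.unquote(url)                # https://example.com/firefox 10.0.2.exe
    print urllib2.unquote is urllib.unquote  # True; urllib2 borrows urllib's unquote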


import mozinfo

@@ -50,7 +51,8 @@ class Scraper(object):
"""Generic class to download an application from the Mozilla server"""

def __init__(self, directory, version, platform=None,
application='firefox', locale='en-US', extension=None):
application='firefox', locale='en-US', extension=None,
authentication=None):

# Private properties for caching
self._target = None
@@ -61,6 +63,7 @@ def __init__(self, directory, version, platform=None,
self.platform = platform or self.detect_platform()
self.version = version
self.extension = extension or DEFAULT_FILE_EXTENSIONS[self.platform]
self.authentication = authentication

# build the base URL
self.application = application
@@ -170,11 +173,28 @@ def download(self):

print 'Downloading build: %s' % (urllib.unquote(self.final_url))
tmp_file = self.target + ".part"
urllib.urlretrieve(self.final_url, tmp_file)

if self.authentication \
and self.authentication['username'] \
and self.authentication['password']:
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None,
self.final_url,
self.authentication['username'],
self.authentication['password'])
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(urllib2.HTTPHandler, handler)
urllib2.install_opener(opener)

r = urllib2.urlopen(self.final_url)
CHUNK = 16 * 1024
with open(tmp_file, 'wb') as f:
for chunk in iter(lambda: r.read(CHUNK), ''):
f.write(chunk)
Contributor:

Nice! I found the same on Stack Overflow. I just wonder whether 16 KB is the right amount of data to read in on each iteration, or whether we should go with 64 KB, which I think is a better size.

Member Author:

"we should go with 64 KB, which I think is a better size"

Out of interest, what is this based on?

Contributor:

Previous experience, but that was a couple of years ago, so things could have changed. Values that are too small can cause overhead, while larger values can block the thread. I think we can leave it for now and figure out performance improvements later.

os.rename(tmp_file, self.target)
except:
try:
if tmp_file:
if os.path.isfile(tmp_file):
os.remove(tmp_file)
except OSError:
pass
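
Pulled out of the diff above, a self-contained Python 2 sketch of what the new download path does. Names like final_url and target are placeholders, and the 16 KB chunk size is the value discussed above, exposed as a parameter so it could be raised to 64 KB later.

    import os
    import urllib2

    def download_build(final_url, target, username=None, password=None,
                       chunk_size=16 * 1024):
        """Stream final_url to target, optionally using HTTP basic auth."""
        if username and password:
            # Register the credentials for any realm on this URL and install
            # a global opener that answers 401 challenges with basic auth.
            password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
            password_mgr.add_password(None, final_url, username, password)
            handler = urllib2.HTTPBasicAuthHandler(password_mgr)
            urllib2.install_opener(urllib2.build_opener(urllib2.HTTPHandler, handler))

        tmp_file = target + '.part'
        response = urllib2.urlopen(final_url)
        try:
            with open(tmp_file, 'wb') as f:
                # read() returns '' at EOF, which stops the iterator
                for chunk in iter(lambda: response.read(chunk_size), ''):
                    f.write(chunk)
            os.rename(tmp_file, target)
        except Exception:
            # Clean up the partial download on any failure
            if os.path.isfile(tmp_file):
                os.remove(tmp_file)
            raise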
@@ -672,6 +692,16 @@ def cli():
metavar='EXTENSION',
help='File extension of the build (e.g. "zip"), default:\
the standard build extension on the platform.')
parser.add_option('--username',
dest='username',
default=None,
Contributor:

default is only necessary for non-None values. You can omit it in both cases.

Member Author:

We do this in four other places, so I was going for consistency.

Contributor:

So please remove all of them so that we are doing it right.

Member Author:

Perhaps a later refactor?

Contributor:

Works for me.

metavar='USERNAME',
help='Username for basic HTTP authentication.')
parser.add_option('--password',
dest='password',
default=None,
metavar='PASSWORD',
help='Password for basic HTTP authentication.')
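
A minimal sketch (not part of the patch) of the optparse behaviour discussed above: options that are never passed are stored as None even without an explicit default, so default=None is redundant.

    from optparse import OptionParser

    parser = OptionParser()
    # No default= given: optparse falls back to None on its own.
    parser.add_option('--username', dest='username', metavar='USERNAME',
                      help='Username for basic HTTP authentication.')
    options, args = parser.parse_args([])
    assert options.username is None  # holds without default=None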

# Option group for candidate builds
group = OptionGroup(parser, "Candidate builds",
@@ -727,7 +757,11 @@ def cli():
'platform': options.platform,
'version': options.version,
'directory': options.directory,
'extension': options.extension}
'extension': options.extension,
'authentication': {
'username': options.username,
'password': options.password}
}
scraper_options = {'candidate': {
'build_number': options.build_number,
'no_unsigned': options.no_unsigned},
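
For completeness, a sketch of how the CLI wires the new options through to the scraper constructor. All concrete values below are made up, and scraper_class stands in for whichever scraper type the CLI selects.

    kwargs = {'platform': 'win32',
              'version': '10.0.2',
              'directory': '/tmp/builds',
              'extension': 'exe',
              'authentication': {'username': 'jdoe',     # from --username
                                 'password': 's3cret'}}  # from --password
    scraper = scraper_class(**kwargs)
    scraper.download()  # uses basic auth only when both credentials are set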