Implemented support for basic authentication #38

Closed
wants to merge 2 commits into
from

Projects

None yet

2 participants

@davehunt
Member
davehunt commented Nov 7, 2012

Fixes issue #36

@whimboo whimboo commented on an outdated diff Nov 9, 2012
mozdownload/scraper.py
@@ -50,7 +51,8 @@ class Scraper(object):
"""Generic class to download an application from the Mozilla server"""
def __init__(self, directory, version, platform=None,
- application='firefox', locale='en-US', extension=None):
+ application='firefox', locale='en-US', extension=None,
+ username=None, password=None):
whimboo
whimboo Nov 9, 2012 Contributor

I would use a dict here with authentication as name. It should have username and password as properties.

@whimboo whimboo commented on an outdated diff Nov 9, 2012
mozdownload/scraper.py
@@ -170,7 +174,18 @@ def download(self):
print 'Downloading build: %s' % (urllib.unquote(self.final_url))
tmp_file = self.target + ".part"
- urllib.urlretrieve(self.final_url, tmp_file)
+
+ if self.username and self.password:
+ password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
+ password_mgr.add_password(None, self.final_url, self.username, self.password)
+ handler = urllib2.HTTPBasicAuthHandler(password_mgr)
+ opener = urllib2.build_opener(urllib2.HTTPHandler, handler)
+ urllib2.install_opener(opener)
+
+ f = urllib2.urlopen(self.final_url)
+ data = f.read()
+ with open(tmp_file, 'wb') as code:
+ code.write(data)
whimboo
whimboo Nov 9, 2012 Contributor

I don't think we put the whole file into memory first. Whether we should write in chunks or directly to the file. Doesn't let us urllib2 do that?

@whimboo whimboo commented on the diff Nov 9, 2012
mozdownload/scraper.py
@@ -13,6 +13,7 @@
import re
import sys
import urllib
+import urllib2
whimboo
whimboo Nov 9, 2012 Contributor

So we still need urllib because of unquote()? Is there no replacement in urllib2?

davehunt
davehunt Nov 11, 2012 Member

urllib2 uses urllib itself to do unquote.

whimboo
whimboo Nov 14, 2012 Contributor

It doesn't answer my question why we need urllib in our file regarding the unquote() call.

davehunt
davehunt Nov 14, 2012 Member

Sorry for not making this clear enough. Why would we remove urllib if we're still using it? Given that urllib2 uses unquote from urllib itself, I don't see urllib2 as a direct replacement for urllib.

whimboo
whimboo Nov 14, 2012 Contributor

Hm, so the only existing code which makes use of this method is: print 'Downloading build: %s' % (urllib.unquote(self.final_url)). I don't think we can get around it. So it should be fine to leave it in for now. But if you have an idea...

@whimboo whimboo commented on the diff Nov 9, 2012
mozdownload/scraper.py
@@ -672,6 +687,16 @@ def cli():
metavar='EXTENSION',
help='File extension of the build (e.g. "zip"), default:\
the standard build extension on the platform.')
+ parser.add_option('--username',
+ dest='username',
+ default=None,
whimboo
whimboo Nov 9, 2012 Contributor

default is only necessary for non None values. You can omit it in both cases.

davehunt
davehunt Nov 11, 2012 Member

We do this in four other places so I was going for consistency.

whimboo
whimboo Nov 14, 2012 Contributor

So please remove all of them so that we are doing it right.

davehunt
davehunt Dec 4, 2012 Member

Perhaps a later refactor?

whimboo
whimboo Dec 4, 2012 Contributor

Works for me.

@whimboo whimboo commented on an outdated diff Nov 9, 2012
mozdownload/scraper.py
@@ -672,6 +687,16 @@ def cli():
metavar='EXTENSION',
help='File extension of the build (e.g. "zip"), default:\
the standard build extension on the platform.')
+ parser.add_option('--username',
+ dest='username',
+ default=None,
+ metavar='USERNAME',
+ help='Username for protected files.')
whimboo
whimboo Nov 9, 2012 Contributor

I would call it Username basic HTTP Authentication. Same for the password.

Contributor
whimboo commented Nov 9, 2012

Looks fine otherwise.

Member

Updated

@whimboo whimboo commented on the diff Nov 14, 2012
mozdownload/scraper.py
+ and self.authentication['username'] \
+ and self.authentication['password']:
+ password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
+ password_mgr.add_password(None,
+ self.final_url,
+ self.authentication['username'],
+ self.authentication['password'])
+ handler = urllib2.HTTPBasicAuthHandler(password_mgr)
+ opener = urllib2.build_opener(urllib2.HTTPHandler, handler)
+ urllib2.install_opener(opener)
+
+ r = urllib2.urlopen(self.final_url)
+ CHUNK = 16 * 1024
+ with open(tmp_file, 'wb') as f:
+ for chunk in iter(lambda: r.read(CHUNK), ''):
+ f.write(chunk)
whimboo
whimboo Nov 14, 2012 Contributor

Nice! Have found the same on stackoverflow. Ii just wonder if 16KB is the right amount of data we want to read-in for each iteration or if we should better go with 64KB, which I think is a better size.

davehunt
davehunt Nov 14, 2012 Member

we should better go with 64KB, which I think is a better size

Out of interest, what is this based on?

whimboo
whimboo Nov 14, 2012 Contributor

Previous experience. But it's a couple of years back so things could have changed. Values too small could cause an overhead while larger values could block the thread. I think we can leave it for now and figure out performance improvements later.

Contributor
whimboo commented Nov 14, 2012

So please merge to patch to master. It looks fine!

Member
davehunt commented Dec 4, 2012

Raised issue #43 to take care of the default=None items.

Member
davehunt commented Dec 4, 2012

Landed in commit 44efef0

@davehunt davehunt closed this Dec 4, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment