Db downloader #5

Closed
wants to merge 12 commits into
from

Projects

None yet

2 participants

Contributor

Implemented a script that downloads xpis based on AMO data.

Note: there are enough differences between Python 2.6 & 2.7 that my best advice is to run this script using the 'python2.7' binary on the vm.

@ochameau ochameau commented on the diff Feb 7, 2013
scripts/fetch_amo.py
+ else:
+ print "File already exists: %s" % filename
+ except mdb.Error, e:
+
+ print "Error %d: %s" % (e.args[0],e.args[1])
+ sys.exit(1)
+
+ finally:
+
+ if con:
+ con.close()
+
+ print len(errors)
+
+ if len(errors) > 0:
+ print errors
ochameau
ochameau Feb 7, 2013 Owner

errors is always empty. we can just get rid of it.

@ochameau ochameau commented on an outdated diff Feb 7, 2013
scripts/fetch_amo.py
+ url_tpl = 'https://addons.cdn.mozilla.net/storage/public-staging/%d/%s'
+
+ remote = url_tpl % (id, filename)
+ local = join(download_dir, filename)
+
+ # return errors...
+ stdout.write("(%d/%d) Downloading %s" % (i, total_rows, filename))
+ stdout.flush()
+ try:
+ u = urllib2.urlopen(remote)
+ except urllib2.HTTPError, e:
+ reason = ''
+ if hasattr(e, 'reason'):
+ reason = e.reason
+ stdout.write("\nERR %s %s: %s %s\n" % ( e.code, reason, id, filename ))
+ stdout.flush()
ochameau
ochameau Feb 7, 2013 Owner

Better use stderr.write instead of suggesting to grep for ERR.

@ochameau ochameau commented on an outdated diff Feb 7, 2013
scripts/fetch_amo.py
+ remote = url_tpl % (id, filename)
+ local = join(download_dir, filename)
+
+ # return errors...
+ stdout.write("(%d/%d) Downloading %s" % (i, total_rows, filename))
+ stdout.flush()
+ try:
+ u = urllib2.urlopen(remote)
+ except urllib2.HTTPError, e:
+ reason = ''
+ if hasattr(e, 'reason'):
+ reason = e.reason
+ stdout.write("\nERR %s %s: %s %s\n" % ( e.code, reason, id, filename ))
+ stdout.flush()
+ err = {'code': e.code, 'reason': reason}
+ return err
ochameau
ochameau Feb 7, 2013 Owner

We aren't using the returned value of download.

@ochameau ochameau commented on an outdated diff Feb 7, 2013
scripts/fetch_amo.py
+ total_rows = len(rows)
+
+ print "Downloading %d extensions, hold on." % total_rows
+
+ for r in rows:
+ filename = r[5]
+ id = r[0]
+ i += 1
+ if not exists(join(download_dir, filename)):
+ download(id, filename, download_dir, i, total_rows)
+ else:
+ print "File already exists: %s" % filename
+ except mdb.Error, e:
+
+ print "Error %d: %s" % (e.args[0],e.args[1])
+ sys.exit(1)
ochameau
ochameau Feb 7, 2013 Owner

Same thing about stderr here.

@ochameau ochameau commented on an outdated diff Feb 7, 2013
scripts/fetch_amo.py
+from shutil import copyfile, copytree, rmtree
+from time import strftime, strptime, localtime
+import urllib2
+import MySQLdb as mdb
+import sys
+from yaml import load, dump
+
+# speedy YAML
+try:
+ from yaml import CLoader as Loader, CDumper as Dumper
+except ImportError:
+ from yaml import Loader, Dumper
+
+con = None
+config_file = './amo_db_config.yml'
+download_dir = abspath('./addons')
ochameau
ochameau Feb 7, 2013 Owner

Could you use ../ftp in order to match the behavior of existing script?

@ochameau ochameau commented on the diff Feb 7, 2013
scripts/README.md
@@ -12,13 +12,11 @@ $ wget --no-check-certificate https://github.com/ochameau/jetpack-repacker/tarba
1/ Download xpi files
```
-$ mkdir /addons/ftp
-$ cd /addons/ftp
-$ /addons/scripts/fetch-ftp.sh
-# This script will download all xpi files. You can re-run it at anytime to download only new files.
-# but note that remove files from mozilla ftp will be kept locally.
-# xpi will be in /addons/ftp/ftp.mozilla.org/pub/mozilla.org/addons/
-# (On 2012/07, it downloaded 17GB of files.)
+$ python2.7 fetch_amo.py
+# this will fetch records from the AMO database for every current sdk-based
+# add-on and download the xpi files to ./srcipts/addons/.
+# to capture errors, grep for 'ERR' eg
+# python2.7 fetch_amo.py | grep ERR > errors.txt
ochameau
ochameau Feb 7, 2013 Owner

I'd keep both scripts documented, as some people may play with that without having access to the VM.

With the usage of stderr, we can suggest doing: python2.7 fetch_amo.py 2 > errors.txt.
And I'd rather keep a normalized folder layout and keep mkdir/cd commands.

$ mkdir /addons/ftp
$ cd /addons
$ python2.7 scripts/fetch_amo.py
... download xpi files to /addons/ftp ...

This normalized folder layout allows to run these command and be sure that everything works without having to play with custom paths.

@ochameau ochameau commented on the diff Feb 7, 2013
scripts/fetch_amo.py
+from yaml import load, dump
+
+# speedy YAML
+try:
+ from yaml import CLoader as Loader, CDumper as Dumper
+except ImportError:
+ from yaml import Loader, Dumper
+
+con = None
+config_file = './amo_db_config.yml'
+download_dir = abspath('./addons')
+
+def getYaml(path):
+ """ loadin' YAML files. """
+ if not exists(path):
+ raise "YAML file doesn't exist: %s" % path
ochameau
ochameau Feb 7, 2013 Owner

-> raise Exception("YAML file doesn't exist: %s" % path)

@ochameau ochameau added a commit that referenced this pull request Feb 7, 2013
@ochameau ochameau Address review comments from PR #5 and update docs to make it simplie…
…r to launch repack scripts.
1645f2e
Owner
ochameau commented Feb 7, 2013

Finally I ended up fixing all these comments and started fixing the script to support the new layout.
See PR #6.

@canuckistani canuckistani addressed review comments in PR 5 - switched to using stderr, removed…
… garbage from previous attempts at error collection.
6c98ac7
Contributor

Closing, this is all addressed in PR6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment