Explained generating the categories file for the Powerhouse example.

commit 0f8303a51b3362c71ff63dc6e22dd4948a89dd6b 1 parent cd0dad3
@malcolmt authored
Showing with 29 additions and 5 deletions.
  1. +24 −5 Powerhouse/README.txt
  2. +5 −0 README.txt
29 Powerhouse/README.txt
@@ -9,16 +9,35 @@ licensed under a Creative Commons Attribution-Sharealike 2.5 Australia license.
This particular tutorial used a version that was downloaded on June 22, 2010,
but there's nothing particularly special about that copy.
+As explained in the talk, the importing code (load.py) expects two files. The
+first is a list of categories, one per line with repetitions permitted. The
+second is the unadulterated Powerhouse dataset. The categories file can be
+generated (on a Unix-like system) via:
+
+ tail -n +2 phm_collection.txt | cut -f9 | grep -v "^$" | tr "|" "\n" \
+ > categories.txt
+
+You can also generate the same file (with a few extra blanks, which are
+harmless) using the Python one-liner (split over two lines for readability, but
+it should be a single line):
+
+ python -c "print '\n'.join(l.split('\t')[8].replace('|', '\n')
+ for l in open('phm_collection.txt').readlines())" > categories.txt
+
+and then removing the first line ("Categories"), which is the header line and
+not a real category name.
+
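For readers on Python 3, where the print statement in the one-liner above no
longer works, the same extraction can be sketched as a small function. This is
an illustration rather than part of the original package: the function names
extract_categories and write_categories are hypothetical, the UTF-8 encoding is
an assumption about the dataset, and the only facts taken from the text above
are that the file is tab-separated, that categories sit in the ninth column
separated by "|", and that the first line is a header.

```python
def extract_categories(lines, column=8, skip_header=True):
    """Yield one category name per entry, keeping repetitions.

    `lines` is any iterable of tab-separated text lines. `column` is the
    zero-based index of the categories field (8 is the ninth column).
    """
    iterator = iter(lines)
    if skip_header:
        # The first line holds column names ("Categories"), not data.
        next(iterator, None)
    for line in iterator:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= column:
            # Short or malformed row: no categories field to read.
            continue
        for name in fields[column].split("|"):
            if name:
                # Blank names are skipped here, so no cleanup is needed later.
                yield name


def write_categories(source_path, dest_path, encoding="utf-8"):
    """Write one category per line to dest_path (encoding is an assumption)."""
    with open(source_path, encoding=encoding) as source, \
            open(dest_path, "w", encoding=encoding) as dest:
        for name in extract_categories(source):
            dest.write(name + "\n")
```

Calling write_categories("phm_collection.txt", "categories.txt") should
produce a file with roughly the same content as the shell pipeline, with the
header line already dropped.
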
When "python manage.py syncdb --noinput" is run to initialise the database, an
admin user with username "admin" and password "admin" will automatically be
created. The current settings use a local sqlite database. However, for speed
of import when I was developing this, I used an sqlite database on a
memory-backed filesystem (in Linux). I ran:
- mount -t tmpfs -o uid=500,gid=500 tmpfs /home/malcolm/store
- # Changed settings.py to put the database in ~/store/phm.sqlite
- PYTHONPATH=.. DJANGO_SETTINGS_MODULE=settings ./load.py \
- categories.txt phm_collection.txt
+ mount -t tmpfs -o uid=500,gid=500 tmpfs /home/malcolm/store
+ # Changed settings.py to put the database in ~/store/phm.sqlite
+ PYTHONPATH=.. DJANGO_SETTINGS_MODULE=settings ./load.py \
+ categories.txt phm_collection.txt
-Using this setup (memory-backed filesystem), it takes just over 5 minutes to import the full dataset using a single 2.5GHz core on my laptop.
+Using this setup (memory-backed filesystem), it takes just over 5 minutes to
+import the full dataset using a single 2.5GHz core on my laptop.
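The settings.py change mentioned in the comment above is not shown in the
README. A minimal sketch of what it might look like follows, assuming the
dictionary-style DATABASES setting introduced in Django 1.2; the project's
actual settings.py may use the older DATABASE_ENGINE and DATABASE_NAME
variables instead, and the ~/store path only applies if the tmpfs mount above
is in place.

```python
import os

# Assumption: Django 1.2+ style database configuration.
# The file lives on the memory-backed filesystem mounted at ~/store,
# so its contents are lost when that filesystem is unmounted.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": os.path.join(os.path.expanduser("~"), "store", "phm.sqlite"),
    }
}
```
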
5 README.txt
@@ -2,6 +2,11 @@ The two Django projects here were used to illustrate data importing techniques
in my talk "Displaying Australian datasets with Django" at PyCon-AU, 27 June,
2010.
+Each project contains its own README.txt file explaining how to download the
+relevant dataset. Aside from the dataset and database, all the other software
+pieces are included in this package to enable you to display the data in
+Django's admin interface.
+
All original text and data is released under the Creative Commons
Attribution-ShareAlike 3.0 Australia license. Refer to
http://creativecommons.org/licenses/by-sa/3.0/au/ . As far as attribution for