A simple Python abstraction library for the various blob stores out there, including S3, Google Storage and local disk.
OFS is a bucket/object storage library.

It provides a common API for storing bitstreams (plus related metadata) in 'bucket/object' stores such as:

  • S3, Google Storage, Eucalyptus
  • Filesystem (via pairtree)
  • 'REST' Store (see remote/ - implementation at
  • Riak (buggy)
  • add a backend here - just implement the methods in
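
As a rough illustration of what adding a backend involves, the sketch below implements a handful of the methods used in the usage example in this README (claim_bucket, list_buckets, exists, put_stream, del_stream) on top of a plain dictionary. The class name MemoryStore is hypothetical and not part of OFS. It does not subclass the library's real base class, and the exact set of methods a real backend must provide is an assumption here.

```python
import uuid


class MemoryStore(object):
    """Hypothetical in-memory backend sketch; NOT part of OFS."""

    def __init__(self):
        # bucket id -> {label: (bytes, metadata dict)}
        self._buckets = {}

    def claim_bucket(self, bucket=None):
        # Fall back to a fresh UUID when no name is given or the name is taken.
        if bucket is None or bucket in self._buckets:
            bucket = uuid.uuid4().hex
        self._buckets[bucket] = {}
        return bucket

    def list_buckets(self):
        return iter(self._buckets)

    def exists(self, bucket, label):
        return label in self._buckets.get(bucket, {})

    def put_stream(self, bucket, label, stream_object, params=None):
        # Accept either a file-like object or a plain string/bytes value.
        if hasattr(stream_object, "read"):
            data = stream_object.read()
        else:
            data = stream_object
        if not isinstance(data, bytes):
            data = data.encode("utf-8")
        # User-supplied params are stored next to the reserved "_" keys.
        metadata = dict(params or {})
        metadata["_label"] = label
        metadata["_content_length"] = len(data)
        self._buckets[bucket][label] = (data, metadata)
        return dict(metadata)

    def del_stream(self, bucket, label):
        del self._buckets[bucket][label]
```

Code written only against these method names would not need to know whether it is talking to this sketch or to one of the real stores listed above.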

Why use the library:

  • Abstraction: write common code but use different storage backends
  • More than a filesystem, less than a database - support for metadata as well as bitstreams


All boto-based stores (Google Storage, S3, etc.) require boto>=2.0.

Example Usage

(local version - depends on 'pairtree' and 'simplejson'):

 >>> from ofs.local import PTOFS

 >>> o = PTOFS()
 (Equivalent to 'o = PTOFS(storage_dir = "data", uri_base="urn:uuid:", hashing_type="md5")')

 # Claim a bucket - this will add the bucket to the list of existing ones
 >>> uuid_id = o.claim_bucket()
 >>> uuid_id

 # Choose a bucket name - if it already exists, a new UUID-based id is generated and returned instead
 >>> bucket_id = o.claim_bucket("foo")
 >>> bucket_id
 >>> bucket_id = o.claim_bucket("foo")
 >>> bucket_id

 # Store a file:
 >>> o.put_stream(bucket_id, "foo.txt", open("foo....))
 {'_label': 'foo.txt', '_content_length': 10, '_checksum': 'md5:10feda25f8da2e2ebfbe646eea351224', '_last_modified': '2010-08-02T11:37:21', '_creation_date': '2010-08-02T11:37:21'}

 # or:
 >>> o.put_stream(bucket_id, "foo.txt", "asidaisdiasjdiajsidjasidji")
 {'_label': 'foo.txt', '_content_length': 26, '_checksum': 'md5:3d690d7e0f4479c5a7038b8a4572d0fe', '_last_modified': '2010-08-02T11:37:21', '_creation_date': '2010-08-02T11:37:21'}

 # adding a file with some parameters:
 >>> o.put_stream(bucket_id, "foooo", "asidaisdiasjdiajsidjasidji", params={"original_uri":"http://...."})
 {'_label': 'foooo', 'original_uri': 'http://....', '_last_modified': '2010-08-02T11:39:11', '_checksum': 'md5:3d690d7e0f4479c5a7038b8a4572d0fe', '_creation_date': '2010-08-02T11:39:11', '_content_length': 26}

 # Get the underlying URL pointing to a resource
 >>> o.get_url(bucket_id, "foo")
   [typical local pairtree response:]
   [typical remote response]

 # adding to existing metadata:
 >>> o.update_metadata(bucket_id, "foooo", {'foo':'bar'})
 {'_label': 'foooo', 'original_uri': 'http://....', '_last_modified': '2010-08-02T11:39:11', '_checksum': 'md5:3d690d7e0f4479c5a7038b8a4572d0fe', '_creation_date': '2010-08-02T11:39:11', '_content_length': 26, 'foo': 'bar'}

 # Remove keys
 >>> o.remove_metadata_keys(bucket_id, "foooo", ['foo'])
 {'_label': 'foooo', 'original_uri': 'http://....', '_last_modified': '2010-08-02T11:39:11', '_checksum': 'md5:3d690d7e0f4479c5a7038b8a4572d0fe', '_creation_date': '2010-08-02T11:39:11', '_content_length': 26}

 # Delete blob
 >>> o.exists(bucket_id, "foooo")
 >>> o.del_stream(bucket_id, "foooo")
 >>> o.exists(bucket_id, "foooo")

 # Iterate through ids for buckets held:
 >>> for item in o.list_buckets():
 ...   print(item)
 .... etc
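
The metadata dictionaries returned in the calls above can be approximated outside the library. The sketch below is an assumption about how such a record might be built and edited, not the library's actual implementation: build_metadata, merged_metadata and without_keys are hypothetical helper names, and only the key names and the 'md5:' prefix are taken from the example output.

```python
import hashlib
from datetime import datetime


def build_metadata(label, data, params=None):
    """Build a record shaped like the example output (hypothetical helper)."""
    if not isinstance(data, bytes):
        data = data.encode("utf-8")
    now = datetime.now().replace(microsecond=0).isoformat()
    metadata = dict(params or {})
    metadata.update({
        "_label": label,
        "_content_length": len(data),
        # Checksum is stored as "<algorithm>:<hex digest>".
        "_checksum": "md5:" + hashlib.md5(data).hexdigest(),
        "_creation_date": now,
        "_last_modified": now,
    })
    return metadata


def merged_metadata(metadata, params):
    """Return a copy with extra keys added, as update_metadata appears to do."""
    merged = dict(metadata)
    merged.update(params)
    return merged


def without_keys(metadata, keys):
    """Return a copy with the given keys dropped."""
    return {k: v for k, v in metadata.items() if k not in keys}


record = build_metadata("foooo", "asidaisdiasjdiajsidjasidji",
                        {"original_uri": "http://...."})
tagged = merged_metadata(record, {"foo": "bar"})
untagged = without_keys(tagged, ["foo"])
```

Here untagged ends up equal to record again, mirroring the update_metadata / remove_metadata_keys round trip shown in the example.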

 # Display the labels in a specific bucket:


The tests use plain unittest, but running them with nose is recommended.

To run the botostore tests, copy test.ini.tmpl to test.ini and fill in the details of a Google Storage account.


v0.4.1: 2011-08-13

  • Set checksum (md5) based on etag (botostore backends) if not set

v0.4: 2011-04-28

  • New authenticate_request method for boto based backends.
  • Improved update_metadata in botostore (no need to download and re-upload).

v0.3: 2011-01-20

  • S3Bounce backend (use authorization credentials from CKAN).
  • Use setuptools plugins with ofs.backend to allow for 3rd party backends
  • ofs_upload command

v0.2: 2010-11-20

  • Google Storage support.
  • REST store

v0.1: 2010-10-14

  • Initial implementation with PairTree and S3
