A polite, minimal interface for sending python objects to and from Amazon S3.
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
s3plz
tests
.gitignore
README.md
requirements.txt
setup.py

README.md

s3plz

A polite, minimal interface for sending python objects to and from Amazon S3.

Installation

pip install s3plz

Tests

To run tests, you must first set these environmental variabels:

export AWS_ACCESS_KEY_ID='fdsaf'
export AWS_SECRET_ACCESS_KEY='fdsaf'
export S3PLZ_TEST_BUCKET='s3://my-cool-bucket'

and then run:

nosetests

Bacic Usage

import s3plz

# Return an `s3plz.S3` object with 
# methods for sending objects
# to and from Amazon S3.

plz = s3plz.connect('s3://asteroid', 
    key='navigate',
    secret='shield',
    serializer="json.gz",
    public = False
)

# You can also set `AWS_ACCESS_KEY_ID` and 
# `AWS_SECRET_ACCESS_KEY` as environmental variables
# instead of passing `key` and `secret` to `s3plz.connect`



# Serialize an object, format its
# filepath, put it on s3, and return
# the formatted filepath (with an absolute s3path) 
# for your records

obj1 = {"key": "value"}
filepath = 'test/{key}.json.gz'

fp = plz.put(obj1, filepath, **obj1)
print fp

# >>> 's3://asteroid/test/value.json.gz'
# you can now fetch this object with its filepath

obj2 =  plz.get(fp)
assert(obj1 == obj2)

Customization.

Filepaths

s3plz will attempt to format your filepath for you given arbitrary **kwargs passed to any method. You also have access to UTC time via the "@" operator.

These include:

  • @second: 56
  • @minute: 54
  • @hour: 23
  • @day: 29
  • @month: 01
  • @year: 2014
  • @timestamp: 1234567
  • @date_path: 2014/01/14
  • @date_slug: 2014-01-14
  • @datetime_slug: 2013-12-12-06-08-52
  • @uid: dasfas-23r32-sad-3sadf-sdf

For instance,

import s3plz

obj = {"key": "value"}
filepath = 'test/{key}/{@date_path}/{@uid}.json.gz'

plz = s3plz.connect('s3://my-bucket', serializer="json.gz")
fp = plz.put(obj, filepath, **obj)
print fp 
# >>> 's3://my-bucket/test/value/2014/08/25/3225-sdsa-35235-asdfas-235.json.gz'

Serialization

By default, s3plz will send strings to/from S3. You can also serialize / deserialize objects to / from json.gz, json, gz, zip, or pickle (set with serializer via s3plz.connect). These can also be changed on-the-fly by passing serializer as a kwarg into get, put, stream or upsert.

For example,

import s3plz 

s3 = s3plz.connect('s3://my-bucket')
obj1 = {"foo":"bar"}
fp = s3.put(obj1, "test/{foo}.json.gz", serializer="json.gz", **obj1)
obj2 = s3.get(fp, serializer="json.gz")
assert(obj1 == obj2)

string1 = "hello world"
fp = s3.put(string1, "test/string.zip", serializer="zip")
string2 = s3.get(fp, serializer="zip")
assert(string1 == string2)

However, you can also inherit from the core s3plz.S3 class and overwrite the serialize and deserialize methods:

from s3plz import S3

class SqlAlchemyToS3(S3):

    def serialize(self, obj):
        return "Do something here."

    def deserialize(self, string):
        return "Undo it."

s3 = SqlAlchemyToS3('s3://bucket')
print s3.get('s3://bucket/file.mycoolformat')
# >>> `A SqLAlchemy Model`

API

Full API Methods

  • S3.put(data, filepath, headers, serializer, **kw)

    • desciption: Upload a file if it doesnt already exist, otherwise return False
    • params:
      • data: Object to upload
      • filepath: filepath format string
      • headers: http headers to set on the object.
      • serializer: serializer to use for the object.
      • **kw: arbitrary kwargs to pass to the filepath format string.
    • returns: s3uri / False
  • S3.upsert(data, filepath, headers, serializer, **kw)

    • desciption: Upload a file if it doesnt already exist, otherwise return False
    • params:
      • data: Object to upload
      • filepath: filepath format string
      • headers: http headers to set on the object.
      • serializer: serializer to use for the object.
      • **kw: arbitrary kwargs to pass to the filepath format string.
    • returns: s3uri / False
  • S3.get(filepath, headers, serializer, **kw)

    • desciption: Download a file from s3. If it doesn't exist return None.
    • params:
      • filepath: filepath format string
      • headers: http headers to set on the object.
      • serializer: serializer to use for the object.
      • **kw: arbitrary kwargs to pass to the filepath format string.
    • returns: deserialized contents
  • S3.get_meta(filepath,**kw)

    • desciption: Get a dictionary of metadata fields for a filepath. If it doesn't exist, return {}
    • params:
      • filepath: filepath format string
      • **kw: arbitrary kwargs to pass to the filepath format string.
    • returns: dict of metadata.
  • S3.get_age(filepath, **kw)

    • desciption: Get the age of a a filepath. If it doesn't exist, return {}
    • params:
      • filepath: filepath format string
      • **kw: arbitrary kwargs to pass to the filepath format string.
    • returns: datetime.timedelta
  • S3.exists(filepath, **kw)

    • desciption: Check if a file exists on s3.
    • params:
      • filepath: filepath format string
      • **kw: arbitrary kwargs to pass to the filepath format string.
    • returns: s3_uri / False
  • S3.ls(filepath, **kw)

    • desciption: Return a generator of filepaths under a directory.
    • params:
      • filepath: filepath format string
      • **kw: arbitrary kwargs to pass to the filepath format string.
    • returns: generator of s3_uri's
  • S3.stream(filepath, headers, serializer, **kw)

    • desciption: Return a generator of tuples of (s3_uri, contents) under a directory.
    • params:
      • filepath: filepath format string
      • headers: http headers to set on the object.
      • serializer: serializer to use for the object.
      • **kw: arbitrary kwargs to pass to the filepath format string.
    • returns: generator of tuples of (s3_uri, contents) from s3.
  • S3.delete(filepath, **kw)

    • desciption: Delete a filepath from s3.
    • params:
      • filepath: filepath format string
      • **kw: arbitrary kwargs to pass to the filepath format string.