multiple backends
nathancahill committed Sep 1, 2014
1 parent 064d6b1 commit 11af09e
Showing 11 changed files with 357 additions and 124 deletions.
118 changes: 87 additions & 31 deletions README.md
@@ -1,49 +1,105 @@
### MimicDB: An Isomorphic Key-Value Store for S3
## MimicDB

#### S3 Metadata without the Latency or Costs
[![PyPI](http://img.shields.io/pypi/v/mimicdb.svg?style=flat)](https://pypi.python.org/pypi/mimicdb/)
[![Build Status](http://img.shields.io/travis/nathancahill/mimicdb/master.svg?style=flat)](https://travis-ci.org/nathancahill/mimicdb)
[![Coverage Status](http://img.shields.io/coveralls/nathancahill/mimicdb/master.svg?style=flat)](https://coveralls.io/r/nathancahill/mimicdb)

MimicDB is a local database of the metadata of objects stored on S3. Many tasks like listing, searching keys and calculating storage usage can be completely handled locally, without the latency or costs of calling the S3 API.

On average, tasks like these are __2000x__ faster using MimicDB.
#### Python Implementation of MimicDB

The Python implementation works with the Boto library.

#### Installation

By default, MimicDB requires Redis (although other backends can be used instead).

__Boto__
```python
>>> c = S3Connection(KEY, SECRET)
>>> bucket = c.get_bucket('bucket_name')
>>> start = time.time()
>>> bucket.get_all_keys()
>>> print time.time() - start
0.425064992905
```
$ pip install redis
$ pip install mimicdb
```

#### Quickstart

__Boto + MimicDB__
```python
>>> c = S3Connection(KEY, SECRET)
>>> bucket = c.get_bucket('bucket_name')
>>> start = time.time()
>>> bucket.get_all_keys()
>>> print time.time() - start
0.000198841094971
If you're using Boto already, replace ```boto``` imports with ```mimicdb``` imports.

Change:
```
from boto.s3.connection import S3Connection
from boto.s3.key import Key
```

#### Key Value Store
To:
```
from mimicdb.s3.connection import S3Connection
from mimicdb.s3.key import Key
```

MimicDB uses a Redis backend to store S3 metadata. Data is stored in the following layout.
Additionally, import the MimicDB object itself, and initialize the backend:
```
from mimicdb import MimicDB
MimicDB()
```

`mimicdb` A set of buckets
After establishing a connection for the first time, sync the connection to save the metadata locally:
```
conn = S3Connection(KEY, SECRET)
conn.sync()
```

`mimicdb:bucket` A set of keys
Or sync only a couple buckets from the connection:
```
conn.sync('bucket1', 'bucket2')
```

`mimicdb:bucket:key` A hash of key metadata (size and MD5)
After that, upload, download and list as you usually would. Calls that can be answered from the local database return without contacting S3. Calls that do go to S3 through MimicDB are mirrored locally, which keeps the local database consistent with the remote servers.

The `mimicdb` prefix can additionally use an optional `namespace` string, which allows multiple S3 connections to share the same backend. In that case, the layout looks like this:
Pass ```force=True``` to most functions to force a call to the S3 API. This also updates the local database.

`mimicdb:namespace`
#### Alternate Backends

`mimicdb:namespace:bucket`
Besides the default Redis backend, MimicDB has SQLite and in-memory backends available.
```
from mimicdb.backends.sqlite import SQLite
MimicDB(SQLite())
```
```
from mimicdb.backends.memory import Memory
MimicDB(Memory())
```

#### Documentation

[mimicdb.readthedocs.org](http://mimicdb.readthedocs.org)

#### Contributing


1. Fork the repo.
2. Run tests to ensure a clean, working slate.
3. Improve/fix the code.
4. Add test cases if new functionality is introduced or a bug is fixed (100% test coverage).
5. Ensure tests pass.
6. Push to your fork and submit a pull request to the develop branch.

#### Tests

`mimicdb:namespace:bucket:key`
Run tests after installing nose and coverage.

#### Implementation
```
$ nosetests --with-coverage --cover-package=mimicdb
```

Integration testing is provided by Travis-CI at [travis-ci.org/nathancahill/mimicdb](https://travis-ci.org/nathancahill/mimicdb)

Test coverage reporting is provided by Coveralls at [coveralls.io/r/nathancahill/mimicdb](https://coveralls.io/r/nathancahill/mimicdb)

MimicDB is currently implemented in Python via Boto. If you're using Boto already, the MimicDB Python library works as a drop in replacement.
#### Benchmarks

Run ```benchmarks.py``` in the root of the repo:

```
$ python benchmarks.py
Boto Time: 0.338411092758
MimicDB Time: 0.00015789039612
Factor: 2143x faster
```
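The `force=True` behaviour described in the new README can be pictured with a small stand-alone sketch. It does not use MimicDB or Boto; `FakeRemote` and `MirroredBucket` are hypothetical names invented for illustration, and the real library may organise this differently.

```python
class FakeRemote(object):
    """Hypothetical stand-in for the S3 API; counts how often it is called."""

    def __init__(self, keys):
        self.keys = set(keys)
        self.calls = 0

    def list_keys(self):
        self.calls += 1
        return set(self.keys)


class MirroredBucket(object):
    """Answers listings from a local copy unless force=True is passed."""

    def __init__(self, remote):
        self.remote = remote
        self.local = set()

    def sync(self):
        # Initial sync: copy the remote listing into the local mirror.
        self.local = self.remote.list_keys()

    def list_keys(self, force=False):
        if force:
            # Go to the remote and refresh the local mirror with the result.
            self.local = self.remote.list_keys()
        return set(self.local)


remote = FakeRemote(['a.txt', 'b.txt'])
bucket = MirroredBucket(remote)
bucket.sync()                          # first remote call
bucket.list_keys()                     # answered locally
remote.keys.add('c.txt')               # change made outside the mirror
stale = bucket.list_keys()             # still the old listing
fresh = bucket.list_keys(force=True)   # second remote call, mirror refreshed
```

The sketch also shows why `force=True` exists: changes made to the bucket by other clients are invisible to the local mirror until a forced call or a new sync.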
26 changes: 12 additions & 14 deletions mimicdb/__init__.py
@@ -1,19 +1,17 @@
-from redis import StrictRedis
-from .s3 import tpl
+"""Python implementation of MimicDB
+"""


 class MimicDB(object):
-    def __init__(self, *args, **kwargs):
+    def __init__(self, backend=None, namespace=None):
+        """Initialize the MimicDB backend with an optional namespace.
         """
-        Initialize the MimicDB object by passing the Redis connection parameters:
-        :host='localhost'
-        :port=6379
-        :db=0
+        if not backend:
+            from .backends.default import Redis
+            backend = Redis()

-        The Redis connection is accessed elsewhere in the module by importing
-        mimicdb, then calling mimicdb.redis
-        """
-        if kwargs and 'namespace' in kwargs:
-            tpl.set_namespace(kwargs.pop('namespace'))
+        globals()['backend'] = backend

-        globals()['redis'] = StrictRedis(*args, **kwargs)
+        if namespace:
+            from .backends import tpl
+            tpl.set_namespace(namespace)
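The `tpl` module is not part of this diff, so how it builds key names is not shown here. As a rough illustration of the layout described in the previous README (`mimicdb`, `mimicdb:bucket`, `mimicdb:bucket:key`, with an optional namespace after the prefix), a hypothetical helper could look like the following. `make_key` is an invented name and is not MimicDB's API.

```python
def make_key(bucket=None, key=None, namespace=None):
    """Build a backend key name following the README layout (illustrative only)."""
    parts = ['mimicdb']
    if namespace:
        # The namespace sits directly after the prefix.
        parts.append(namespace)
    if bucket is not None:
        parts.append(bucket)
        if key is not None:
            parts.append(key)
    return ':'.join(parts)


bucket_set = make_key()                                   # set of buckets
key_set = make_key('photos')                              # set of keys in a bucket
key_hash = make_key('photos', 'cat.jpg')                  # hash of key metadata
namespaced = make_key('photos', 'cat.jpg', namespace='prod')
```

With a namespace, two S3 connections can share one backend without their bucket sets colliding, because every name they write starts with a different prefix.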
27 changes: 27 additions & 0 deletions mimicdb/backends/__init__.py
@@ -0,0 +1,27 @@
"""Base class for MimicDB backends
"""

class Backend(object):
    def __init__(self, *args, **kwargs):
        pass

    def delete(self, *names):
        pass

    def sadd(self, name, *values):
        pass

    def srem(self, name, *values):
        pass

    def sismember(self, name, value):
        pass

    def smembers(self, name):
        pass

    def hmset(self, name, mapping):
        pass

    def hgetall(self, name):
        pass
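Because every method of the base class is a bare `pass`, a subclass that forgets to override one of them returns `None` silently instead of failing. One possible variation, shown only as a sketch and not as what the commit does, is to raise `NotImplementedError` in the base class. `StrictBackend` and `PartialBackend` are hypothetical names, and only two of the methods are reproduced.

```python
class StrictBackend(object):
    """Variant of the base class whose unimplemented methods raise."""

    def sadd(self, name, *values):
        raise NotImplementedError('sadd')

    def smembers(self, name):
        raise NotImplementedError('smembers')


class PartialBackend(StrictBackend):
    """Implements sadd but deliberately leaves smembers out."""

    def __init__(self):
        self._sets = {}

    def sadd(self, name, *values):
        self._sets.setdefault(name, set()).update(values)


backend = PartialBackend()
backend.sadd('mimicdb', 'photos')

try:
    backend.smembers('mimicdb')
    missing_method_detected = False
except NotImplementedError:
    # The gap is reported at the first call rather than returning None.
    missing_method_detected = True
```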
33 changes: 33 additions & 0 deletions mimicdb/backends/default.py
@@ -0,0 +1,33 @@

from redis import StrictRedis

from . import Backend


class Redis(Backend):
    def __init__(self, *args, **kwargs):
        self._redis = StrictRedis(*args, **kwargs)

    def keys(self, pattern='*'):
        return self._redis.keys(pattern)

    def delete(self, *names):
        return self._redis.delete(*names)

    def sadd(self, name, *values):
        return self._redis.sadd(name, *values)

    def srem(self, name, *values):
        return self._redis.srem(name, *values)

    def sismember(self, name, value):
        return self._redis.sismember(name, value)

    def smembers(self, name):
        return self._redis.smembers(name)

    def hmset(self, name, mapping):
        return self._redis.hmset(name, mapping)

    def hgetall(self, name):
        return self._redis.hgetall(name)
41 changes: 41 additions & 0 deletions mimicdb/backends/memory.py
@@ -0,0 +1,41 @@


from . import Backend


class Memory(Backend):
    def __init__(self):
        self._data = dict()

    def keys(self, pattern='*'):
        pattern = pattern.replace('*', '')
        return [key for key in self._data if key.startswith(pattern)]

    def delete(self, *names):
        for name in names:
            self._data.pop(name, None)

    def sadd(self, name, *values):
        if name in self._data:
            self._data[name].update(values)
        else:
            self._data[name] = set(values)

    def srem(self, name, *values):
        if name in self._data:
            self._data[name].difference_update(values)

    def sismember(self, name, value):
        if name in self._data:
            return value in self._data[name]

        return False

    def smembers(self, name):
        return self._data.get(name, [])

    def hmset(self, name, mapping):
        self._data[name] = mapping

    def hgetall(self, name):
        return self._data.get(name, dict())
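`Memory.keys` strips every `*` from the pattern and then does a prefix match, so only patterns of the form `prefix*` behave like their Redis counterparts. If fuller glob matching were wanted, a sketch using the standard library's `fnmatch` could look like this. `match_keys` is a hypothetical helper, and `fnmatch` syntax is close to, but not identical to, Redis glob syntax.

```python
import fnmatch


def match_keys(data, pattern='*'):
    """Return the names in `data` that match a shell-style glob pattern."""
    return [name for name in data if fnmatch.fnmatchcase(name, pattern)]


data = {
    'mimicdb': set(['photos']),
    'mimicdb:photos': set(['cat.jpg']),
    'mimicdb:photos:cat.jpg': {'size': '10', 'md5': 'abc'},
    'other': set(),
}

by_prefix = sorted(match_keys(data, 'mimicdb:*'))
by_suffix = match_keys(data, '*:cat.jpg')
everything = sorted(match_keys(data))
```

A suffix pattern such as `*:cat.jpg` matches here, whereas the prefix-only approach would turn it into `:cat.jpg` and find nothing.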
78 changes: 78 additions & 0 deletions mimicdb/backends/sqlite.py
@@ -0,0 +1,78 @@

import sqlite3

from . import Backend


class SQLite(Backend):
    def __init__(self, *args, **kwargs):
        if not args and not kwargs:
            args = [':memory:']

        self._sqlite = sqlite3.connect(*args, **kwargs)

    def keys(self, pattern='*'):
        c = self._sqlite.cursor()
        pattern = pattern.replace('*', '%')
        c.execute('SELECT name FROM sqlite_master WHERE type="%s" AND name LIKE "%s"' % ('table', pattern))

        return [row[0] for row in c.fetchall()]

    def delete(self, *names):
        c = self._sqlite.cursor()

        for name in names:
            c.execute('DROP TABLE IF EXISTS "%s"' % (name,))

        self._sqlite.commit()

    def sadd(self, name, *values):
        c = self._sqlite.cursor()
        c.execute('CREATE TABLE IF NOT EXISTS "%s" (member text)' % (name,))

        for value in values:
            c.execute('INSERT INTO "%s" VALUES ("%s")' % (name, value))

        self._sqlite.commit()

    def srem(self, name, *values):
        c = self._sqlite.cursor()
        c.execute('CREATE TABLE IF NOT EXISTS "%s" (member text)' % (name,))

        for value in values:
            c.execute('DELETE FROM "%s" WHERE member="%s"' % (name, value))

        self._sqlite.commit()

    def sismember(self, name, value):
        c = self._sqlite.cursor()
        c.execute('CREATE TABLE IF NOT EXISTS "%s" (member text)' % (name,))
        c.execute('SELECT * FROM "%s" WHERE member="%s"' % (name, value))

        return c.fetchone() is not None

    def smembers(self, name):
        c = self._sqlite.cursor()
        c.execute('CREATE TABLE IF NOT EXISTS "%s" (member text)' % (name,))
        c.execute('SELECT * FROM "%s"' % (name,))

        return [row[0] for row in c.fetchall()]

    def hmset(self, name, mapping):
        c = self._sqlite.cursor()
        c.execute('CREATE TABLE IF NOT EXISTS "%s" (size text, md5 text)' % (name,))
        c.execute('INSERT INTO "%s" VALUES ("%s", "%s")' % (name, mapping['size'], mapping['md5']))

        self._sqlite.commit()

    def hgetall(self, name):
        c = self._sqlite.cursor()
        c.execute('CREATE TABLE IF NOT EXISTS "%s" (size text, md5 text)' % (name,))
        c.execute('SELECT * FROM "%s"' % (name,))

        row = c.fetchone()

        if row:
            return dict(size=row[0], md5=row[1])

        return dict()
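The SQLite backend builds its statements with `%` interpolation, so a key name containing a double quote would break the statement, and `sadd` can store the same member twice. A possible alternative, sketched under the assumption that the one-table-per-set layout is kept, binds values with `?` placeholders and makes the member column unique. The function names mirror the backend methods, but this is stand-alone code and not the commit's implementation; `quote_ident` and `ensure_set` are hypothetical helpers.

```python
import sqlite3

conn = sqlite3.connect(':memory:')


def quote_ident(name):
    # Table names cannot be bound as parameters, so quote them by hand
    # and double any embedded double quotes.
    return '"%s"' % name.replace('"', '""')


def ensure_set(name):
    table = quote_ident(name)
    # PRIMARY KEY keeps members unique, as a set requires.
    conn.execute('CREATE TABLE IF NOT EXISTS %s (member TEXT PRIMARY KEY)' % table)
    return table


def sadd(name, *values):
    table = ensure_set(name)
    # Values are bound with ? placeholders; duplicates are ignored.
    conn.executemany('INSERT OR IGNORE INTO %s VALUES (?)' % table,
                     [(value,) for value in values])
    conn.commit()


def sismember(name, value):
    table = ensure_set(name)
    cursor = conn.execute('SELECT 1 FROM %s WHERE member = ?' % table, (value,))
    return cursor.fetchone() is not None


def smembers(name):
    table = ensure_set(name)
    return set(row[0] for row in conn.execute('SELECT member FROM %s' % table))


sadd('mimicdb:photos', 'cat.jpg', 'cat.jpg', 'say "hi".txt')
members = smembers('mimicdb:photos')
```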
File renamed without changes.
