Blob Storage

Rustam Aliyev edited this page Jul 7, 2013 · 11 revisions

ElasticInbox is designed to store millions of emails with linear scalability. To achieve this, ElasticInbox stores message sources on scalable blob storage systems such as OpenStack Object Storage and AWS S3.

Blob URI Format

The metadata record for each message contains a blob URI in one of the following formats:

blob://blob-profile/f1ca99e0-99a0-11e2-95f0-040cced3bd7a:email@domail.tld?c=dfl&e=ekey2
blob://db/f1ca99e0-99a0-11e2-95f0-040cced3bd7a?c=dfl&b=1

Each blob URI above has four parts:

  1. URI scheme. Always blob. This is currently the only supported scheme and indicates that the message is stored in blob storage.
  2. Host. The blob profile name configured in elasticinbox.yaml (see below). db is a special profile indicating that the blob is stored in the metadata store (database).
  3. Path. A unique file name generated by ElasticInbox based on the Blob Naming Policy.
  4. Parameters. Various blob attributes such as the compression algorithm, encryption key, total block count, etc.
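
The four parts listed above map directly onto a standard URI parser. The following is a minimal sketch in Python, not ElasticInbox's own code: the function name parse_blob_uri and the keys of the returned dictionary are hypothetical, and the parameters are returned as-is without interpreting them.

```python
from urllib.parse import urlsplit, parse_qsl


def parse_blob_uri(uri):
    """Split a blob URI into scheme-checked profile, name and parameters.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(uri)
    if parts.scheme != "blob":
        raise ValueError("unsupported scheme: %r" % parts.scheme)
    return {
        # Host part: blob profile name from elasticinbox.yaml, or "db"
        "profile": parts.netloc,
        # Path part: unique file name, without the leading slash
        "name": parts.path.lstrip("/"),
        # Query part: blob attributes, kept as plain strings
        "params": dict(parse_qsl(parts.query)),
        # "db" is the special profile for blobs kept in the metadata store
        "in_database": parts.netloc == "db",
    }


cloud = parse_blob_uri(
    "blob://blob-profile/f1ca99e0-99a0-11e2-95f0-040cced3bd7a:email@domail.tld?c=dfl&e=ekey2"
)
local = parse_blob_uri("blob://db/f1ca99e0-99a0-11e2-95f0-040cced3bd7a?c=dfl&b=1")
```

With the two example URIs, cloud["profile"] is blob-profile and local["in_database"] is true; both carry c=dfl in their parameters.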

Multi-Cloud Configuration

It is possible to configure one or more blob stores in elasticinbox.yaml. For instance, AWS S3 and OpenStack can coexist in the same setup:

### Blob storage settings
blobstore_profiles:
  openstack-example:
    provider: swift
    endpoint: http://10.0.0.1:8066/auth/
    container: elasticinbox
    identity: user:elasticinbox
    credential: mysecret
    apiversion: 1.0
  aws-example:
    provider: aws-s3
    endpoint: https://s3-eu-west-1.amazonaws.com
    container: mybucket.mydomain.tld
    identity: AWSGENERATEDID
    credential: myverylongawssecret

NOTE: Once configured, blob store profile names must never be changed and must be identical on all ElasticInbox nodes, because profile names are stored in the metadata as the host part of each blob URI (as seen above). ElasticInbox does not verify profile configurations on the nodes (this may change in the future).

Messages are written to the blob store configured in the blobstore_write_profile parameter, but can be read from any configured blob store.
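
The write/read asymmetry can be sketched as follows. This is an illustrative Python model under stated assumptions, not the actual implementation: PROFILES stands in for the blobstore_profiles section, WRITE_PROFILE for the value of blobstore_write_profile, and both function names are hypothetical.

```python
from urllib.parse import urlsplit

# Assumed stand-in for the blobstore_profiles section of elasticinbox.yaml
PROFILES = {
    "openstack-example": {"provider": "swift"},
    "aws-example": {"provider": "aws-s3"},
}
# Assumed value of blobstore_write_profile
WRITE_PROFILE = "aws-example"


def profile_for_write():
    """New messages always go to the single configured write profile."""
    return WRITE_PROFILE


def profile_for_read(blob_uri):
    """Reads use whichever profile is recorded in the blob URI's host part."""
    name = urlsplit(blob_uri).netloc
    if name == "db":
        return name  # special profile: blob lives in the metadata store
    if name not in PROFILES:
        raise KeyError("blob profile not configured on this node: %s" % name)
    return name
```

The KeyError branch illustrates why profile names must stay stable: a URI written under a renamed profile can no longer be resolved.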

Each blob profile has the following properties:

  • Provider. Blob store provider. ElasticInbox supports most blob stores through jClouds. However, the default configuration only includes support for aws-s3 (AWS S3), swift (OpenStack), azure (Microsoft Azure) and filesystem.
  • Endpoint. Endpoint URL of the blob store.
  • Container. Container name. On AWS S3 it is known as a bucket.
  • Identity. Identity or username of your blob store account.
  • Credential. Credential or password of your blob store account.
  • API Version. Optional. Set with the apiversion key.

Support for multi-cloud blob stores enables even greater scalability and flexibility. For instance, you can start with a public cloud such as AWS S3 and later move to a private cloud. Because existing messages remain readable from any configured profile while new messages go to the write profile, the migration is smooth and requires no downtime. This technique can also facilitate migration from legacy filesystem storage to ElasticInbox.

Hybrid Storage

In hybrid mode, ElasticInbox can use the metadata store for small blobs and cloud storage for large ones.

Email conversations without large attachments typically produce very small blobs. Depending on the nature of the mail service, small blobs (less than 24K) can account for up to 95% of total email traffic (based on our evaluations). Since most cloud providers charge per request, storing small blobs in the local metadata store is more efficient and economical.

To enable hybrid mode, set the maximum size of a blob that may be stored in the metadata store (e.g. Cassandra) in the configuration file:

# Maximum blob size in bytes which can be stored in the database.
# Blobs larger than this value will be stored with the default blob profile (blobstore_write_profile).
# If compression is enabled, this threshold is applied to the compressed blob size.
# Set to 0 to disable using the database as blob storage. Maximum allowed value is 128K.
database_blob_max_size: 32768

To disable hybrid storage, set database_blob_max_size to 0. The maximum blob size currently supported in the metadata store is 128K. The recommended value is between 12K and 30K. When compression is enabled, the threshold is applied to the compressed blob size.
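
The threshold rule can be sketched as a small decision function. This is a Python illustration under assumptions, not ElasticInbox's code: the function name choose_store is hypothetical, and the "at or below the threshold goes to the database" comparison is inferred from the configuration comment stating that blobs larger than the value go to the write profile.

```python
MAX_ALLOWED_DB_BLOB_SIZE = 128 * 1024  # documented upper limit (128K)


def choose_store(stored_size, write_profile, database_blob_max_size=32768):
    """Return "db" or the write profile name for a blob of stored_size bytes.

    stored_size is the size actually written: the compressed size when
    compression is enabled, otherwise the raw size.
    """
    if not 0 <= database_blob_max_size <= MAX_ALLOWED_DB_BLOB_SIZE:
        raise ValueError("database_blob_max_size must be between 0 and 128K")
    # A threshold of 0 disables hybrid storage entirely
    if database_blob_max_size > 0 and stored_size <= database_blob_max_size:
        return "db"
    return write_profile
```

For example, with the threshold of 32768 shown above, a 10,000-byte blob would be stored under the db profile and a 1,000,000-byte blob under the write profile.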

Compression

To save space and network traffic, you can enable blob compression in the configuration file:

# Compress objects written to the blob store (including database blobs)
blobstore_enable_compression: true

ElasticInbox uses the standard Deflate algorithm (RFC 1951) for compression. Compressed blobs have the c=dfl parameter in their blob URI, which is how they are distinguished from uncompressed blobs.
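
As a rough illustration of Deflate compression, the Python sketch below round-trips a message through a raw RFC 1951 stream using the standard zlib module. It is not taken from ElasticInbox: the function names are hypothetical, and whether ElasticInbox writes a raw Deflate stream or a zlib-wrapped one is an assumption here (the sketch uses raw Deflate, selected with a negative window size).

```python
import zlib

RAW_DEFLATE = -15  # negative wbits: raw Deflate stream, no zlib header or checksum


def deflate_raw(data, level=6):
    """Compress bytes into a raw Deflate (RFC 1951) stream."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, RAW_DEFLATE)
    return compressor.compress(data) + compressor.flush()


def inflate_raw(data):
    """Decompress a raw Deflate (RFC 1951) stream back into bytes."""
    return zlib.decompress(data, RAW_DEFLATE)


message = b"Subject: hello\r\n" * 200  # repetitive sample data
packed = deflate_raw(message)
restored = inflate_raw(packed)
```

A reader that sees c=dfl in the blob URI would apply the inflate step before returning the message; blobs without the parameter would be returned unchanged.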

Encryption

Security is important when considering a public cloud. ElasticInbox supports encrypting all blobs before sending them to your cloud provider. For more details, see the Encryption page.