Data Model

Rustam Aliyev edited this page Sep 19, 2013 · 8 revisions

ElasticInbox uses 5 column families (tables):

  • Accounts
  • MessageMetadata
  • MessageBlob
  • IndexLabels
  • Counters

The complete schema can be found here: https://github.com/elasticinbox/elasticinbox/blob/master/config/elasticinbox.cml

An RFC 5322-compatible email address is used as the unique account identifier in all CFs.

Accounts CF

This CF contains account information such as labels and custom attributes.

Schema syntax:

CREATE COLUMN FAMILY Accounts WITH 
    key_validation_class = UTF8Type AND
    caching = all AND
    comment = 'Basic information about accounts';

Sample contents:

"Accounts" {
    "user@elasticinbox.com" {
        "label:0"  : "all",
        "label:1"  : "inbox",
        "label:2"  : "drafts",
        ...
        "label:1234" : "MyLabel",
        "lattr:1234:color" : "Green",
        "lattr:1234:MyAttribute" : "My Text Value",
        ...
    }
}
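The column naming in the sample can be read back into structured data. The sketch below splits one Accounts row into label names ("label:<id>") and per-label attributes ("lattr:<id>:<name>"). It assumes the row has already been fetched as a map of column name to value; the class `AccountRow` is hypothetical and is not part of ElasticInbox.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: splits the columns of one Accounts row into
 * label names ("label:<id>") and label attributes ("lattr:<id>:<name>").
 * Column names follow the sample above; this is an illustration only.
 */
public class AccountRow {
    public final Map<Integer, String> labels = new TreeMap<>();
    public final Map<Integer, Map<String, String>> labelAttributes = new TreeMap<>();

    public static AccountRow parse(Map<String, String> columns) {
        AccountRow row = new AccountRow();
        for (Map.Entry<String, String> col : columns.entrySet()) {
            String name = col.getKey();
            if (name.startsWith("label:")) {
                // "label:1234" -> label ID 1234, column value is the label name
                int id = Integer.parseInt(name.substring("label:".length()));
                row.labels.put(id, col.getValue());
            } else if (name.startsWith("lattr:")) {
                // "lattr:1234:color" -> attribute "color" of label 1234
                String[] parts = name.split(":", 3);
                if (parts.length == 3) {
                    int id = Integer.parseInt(parts[1]);
                    row.labelAttributes
                       .computeIfAbsent(id, k -> new HashMap<>())
                       .put(parts[2], col.getValue());
                }
            }
            // Any other columns are ignored in this sketch
        }
        return row;
    }
}
```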

MessageMetadata SCF

MessageMetadata is a super column family. Each row contains all messages for a particular account, identified by its email address. This keeps all messages for an account on the same Cassandra node and speeds up read operations.

Each super column contains information about a particular message and is identified and ordered by the message UUID. The message UUID is time-based: it is generated from the message time.
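A time-based UUID is what lets the TimeUUIDType comparator order messages chronologically. The sketch below builds a version-1 UUID (RFC 4122 layout) from a Unix timestamp in milliseconds and recovers the time with `java.util.UUID.timestamp()`. It assumes a random clock sequence and node are acceptable; how ElasticInbox actually generates its message UUIDs is not described here, and `TimeUuids` is a hypothetical name.

```java
import java.security.SecureRandom;
import java.util.UUID;

/**
 * Minimal sketch: build a version-1 (time-based) UUID from a message time.
 * Assumption: a random clock sequence and node are acceptable; this is not
 * necessarily how ElasticInbox generates its message UUIDs.
 */
public final class TimeUuids {
    // 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch)
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;
    private static final SecureRandom RANDOM = new SecureRandom();

    private TimeUuids() {}

    public static UUID fromMillis(long unixMillis) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET;   // 60-bit timestamp
        long timeLow = ts & 0xFFFFFFFFL;
        long timeMid = (ts >>> 32) & 0xFFFFL;
        long timeHi = (ts >>> 48) & 0x0FFFL;
        // time_low | time_mid | version (1) | time_hi
        long msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi;
        // Random clock sequence and node, with the RFC 4122 variant bits (10xx) set
        long lsb = (RANDOM.nextLong() & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
        return new UUID(msb, lsb);
    }

    public static long toMillis(UUID uuid) {
        // UUID.timestamp() throws for anything other than a version-1 UUID
        return (uuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000L;
    }
}
```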

Schema syntax:

CREATE COLUMN FAMILY MessageMetadata WITH 
    column_type = Super AND
    key_validation_class = UTF8Type AND
    comparator = TimeUUIDType AND 
    subcomparator = BytesType AND
    caching = keys_only AND
    comment='Message metadata including headers, labels, markers, physical location, etc.';

Sample contents:

"MessageMetadata" {
    "user@elasticinbox.com" {
        "550e8400-e29b-41d4-a716-446655440000" {
            "from"     : "[["EI Test","test@elasticinbox.com"]]", # JSON encoded data
            "to"       : "[["Me","user@elasticinbox.com"],[...]]",
            "subject"  : "Hello world!",
            "date"     : "12 March 2011",
            "location" : "blob://fs-local/container/username@elasticinbox.com:753eef70-d5fb-14ce-abd4-040cced3bd7a",
            "l:1"      : true,   # Label ID
            "m:1"      : true,   # Marker ID
            ...
        }
    }
}
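Labels and markers are stored as flag subcolumns ("l:<id>", "m:<id>") next to the headers. The sketch below collects them from one super column. It treats subcolumn names as strings for readability, although the schema declares them as BytesType; `MessageFlags` is a hypothetical name.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Sketch: collect label IDs ("l:<id>") and marker IDs ("m:<id>") from the
 * subcolumns of one MessageMetadata super column. Assumes the BytesType
 * subcolumn names have been decoded to strings; names are hypothetical.
 */
public final class MessageFlags {
    public final Set<Integer> labels = new TreeSet<>();
    public final Set<Integer> markers = new TreeSet<>();

    public static MessageFlags fromSubcolumns(Map<String, ?> subcolumns) {
        MessageFlags flags = new MessageFlags();
        for (String name : subcolumns.keySet()) {
            if (name.startsWith("l:")) {
                flags.labels.add(Integer.parseInt(name.substring(2)));
            } else if (name.startsWith("m:")) {
                flags.markers.add(Integer.parseInt(name.substring(2)));
            }
            // Headers such as "from", "subject" or "location" are not flags
        }
        return flags;
    }
}
```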

MessageBlob CF

When enabled, the MessageBlob column family stores chunks of message blobs, so that Cassandra itself can serve as blob storage, keeping messages in 128K chunks. Each row key consists of the message UUID and the block (chunk) ID. In turn, column names are sub-block IDs and column values hold the sub-block data. For more see ...

Schema syntax:

CREATE COLUMN FAMILY MessageBlob WITH
    key_validation_class = 'CompositeType(TimeUUIDType, Int32Type)' AND
    comparator = Int32Type AND
    caching = keys_only AND
    comment='Chunked message blobs';

Sample contents:

"MessageBlob" {
    "550e8400-e29b-41d4-a716-446655440000:0" {
        "0" : "binary.message.content.for.block.0.subblock.0"
        ...
    },
    "550e8400-e29b-41d4-a716-446655440000:1" {
        "0" : "binary.message.content.for.block.1.subblock.0"
        ...
    },
    ...
}
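The block/sub-block layout above can be sketched as a pure function that splits a message into rows. Two assumptions are made here: "128K" is taken as 128 * 1024 bytes, and `SUB_BLOCK_SIZE` is a hypothetical value because the page does not state the sub-block size. Row keys are rendered as "uuid:blockId" like the sample, whereas the real key is a CompositeType(TimeUUIDType, Int32Type) value.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Sketch of splitting a message blob into MessageBlob rows:
 * row key (message UUID, block ID) -> columns (sub-block ID -> bytes).
 * Assumptions: 128K = 128 * 1024 bytes; SUB_BLOCK_SIZE is hypothetical.
 */
public final class BlobChunker {
    public static final int BLOCK_SIZE = 128 * 1024;
    public static final int SUB_BLOCK_SIZE = 32 * 1024; // hypothetical

    /** Row key rendered as "uuid:blockId", matching the sample notation. */
    public static String rowKey(UUID messageId, int blockId) {
        return messageId + ":" + blockId;
    }

    /** Returns row key -> (sub-block ID -> sub-block bytes), in order. */
    public static Map<String, Map<Integer, byte[]>> split(UUID messageId, byte[] blob) {
        Map<String, Map<Integer, byte[]>> rows = new LinkedHashMap<>();
        for (int blockStart = 0, blockId = 0; blockStart < blob.length;
             blockStart += BLOCK_SIZE, blockId++) {
            int blockEnd = Math.min(blockStart + BLOCK_SIZE, blob.length);
            Map<Integer, byte[]> columns = new LinkedHashMap<>();
            for (int subStart = blockStart, subId = 0; subStart < blockEnd;
                 subStart += SUB_BLOCK_SIZE, subId++) {
                int subEnd = Math.min(subStart + SUB_BLOCK_SIZE, blockEnd);
                columns.put(subId, Arrays.copyOfRange(blob, subStart, subEnd));
            }
            rows.put(rowKey(messageId, blockId), columns);
        }
        return rows;
    }
}
```

Reassembly is the reverse: read rows for block IDs 0, 1, 2, ... in order and concatenate their columns in sub-block ID order.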

IndexLabels CF

IndexLabels is a reverse index for labels. Each row is uniquely identified by a key composed of the email address and the label ID (e.g. "user@elasticinbox.com:1"). Each label index contains the UUIDs of the messages that belong to the label, sorted as TimeUUIDs.

Schema syntax:

CREATE COLUMN FAMILY IndexLabels WITH
    key_validation_class = UTF8Type AND
    comparator = TimeUUIDType AND 
    caching = all AND
    comment = 'Message ID indexes grouped by labels and ordered by time';

Sample contents:

"IndexLabels" {
    "user@elasticinbox.com:1" {
        "550e8400-e29b-41d4-a716-446655440000" : null,
        "892e8300-e29b-41d4-a716-446655440000" : null,
        "a0232400-e29b-41d4-a716-446655440000" : null,
        ...
    }
}
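An in-memory model helps show what "sorted as TimeUUID" means. Note that `java.util.UUID.compareTo` does not order version-1 UUIDs by time, so the sketch below compares by `UUID.timestamp()` first. It requires version-1 UUIDs and does not talk to Cassandra; `LabelIndex` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.UUID;

/**
 * In-memory model of IndexLabels: row key "email:labelId" -> message UUIDs
 * ordered by their embedded timestamp, similar to TimeUUIDType ordering.
 * Illustrative only; works with version-1 UUIDs.
 */
public final class LabelIndex {
    // Order by UUID timestamp, then by the full UUID to break ties
    static final Comparator<UUID> TIME_ORDER =
        Comparator.<UUID>comparingLong(UUID::timestamp)
                  .thenComparing(Comparator.<UUID>naturalOrder());

    private final Map<String, NavigableSet<UUID>> rows = new HashMap<>();

    static String rowKey(String email, int labelId) {
        return email + ":" + labelId;
    }

    public void add(String email, int labelId, UUID messageId) {
        rows.computeIfAbsent(rowKey(email, labelId), k -> new TreeSet<>(TIME_ORDER))
            .add(messageId);
    }

    /** Newest messages first, e.g. for the first page of a label. */
    public List<UUID> newest(String email, int labelId, int limit) {
        List<UUID> result = new ArrayList<>();
        NavigableSet<UUID> row = rows.get(rowKey(email, labelId));
        if (row == null) return result;
        for (UUID id : row.descendingSet()) {
            if (result.size() == limit) break;
            result.add(id);
        }
        return result;
    }
}
```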

Purge Indexes

In addition to the normal label indexes, there is a special purge index type in the IndexLabels CF. The purge index keeps track of deleted messages.

Each time a message is deleted, ElasticInbox removes it from all label indexes and adds an entry to the purge index. The purge index's column name is the timestamp of the delete event (in the form of a TimeUUID) and the column value is the message UUID.

Sample contents:

"IndexLabels" {
    "user@elasticinbox.com:purge" {
        "550e8400-e29b-41d4-a716-446655440000" : "892e8300-e29b-41d4-a716-446655440000",
        "892e8300-e29b-41d4-a716-446655440000" : "892e8300-e29b-41d4-a716-446655440000",
        "a0232400-e29b-41d4-a716-446655440000" : "892e8300-e29b-41d4-a716-446655440000",
        ...
    }
}
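The delete flow described above can be sketched on in-memory maps: remove the message UUID from each label index row, then record (delete-event TimeUUID -> message UUID) in the "email:purge" row. The label IDs would come from the message's "l:<id>" flags, and the delete-event ID is passed in so the sketch stays deterministic; `DeleteFlow` is hypothetical.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.UUID;

/**
 * Sketch of the delete flow on in-memory maps. MessageMetadata and the
 * blob store are deliberately left untouched. Names are hypothetical.
 */
public final class DeleteFlow {
    // Purge rows are ordered by delete time (version-1 UUID timestamp)
    static final Comparator<UUID> TIME_ORDER =
        Comparator.<UUID>comparingLong(UUID::timestamp)
                  .thenComparing(Comparator.<UUID>naturalOrder());

    // "email:labelId" -> message UUIDs
    final Map<String, Set<UUID>> labelIndexes = new HashMap<>();
    // "email:purge" -> (delete-event TimeUUID -> message UUID)
    final Map<String, NavigableMap<UUID, UUID>> purgeIndexes = new HashMap<>();

    public void delete(String email, Set<Integer> labelIds, UUID messageId, UUID deleteEventId) {
        for (int labelId : labelIds) {
            Set<UUID> row = labelIndexes.get(email + ":" + labelId);
            if (row != null) row.remove(messageId);
        }
        purgeIndexes
            .computeIfAbsent(email + ":purge", k -> new TreeMap<>(TIME_ORDER))
            .put(deleteEventId, messageId);
    }
}
```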

NOTE: The delete operation does not remove the message from MessageMetadata or the blob store. This is done in order to 1) speed up the delete operation and 2) provide a restore mechanism in case of accidental deletion. Deleted messages should be purged periodically using an API call.
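Because purge rows are ordered by delete time, a periodic purge can walk a row from the oldest entry and stop at the first entry newer than a cutoff. The sketch below removes those entries and returns their message UUIDs so a caller could then delete the MessageMetadata super columns and blobs. The cutoff policy and the `Purger` class are assumptions, not the actual API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.UUID;

/**
 * Sketch of a periodic purge over one "email:purge" row. The row is assumed
 * to be ordered by delete-event time; the cutoff policy is an assumption.
 */
public final class Purger {
    /** cutoffTimestamp is a 60-bit UUID timestamp (100-ns units since 1582-10-15). */
    public static List<UUID> purgeOlderThan(NavigableMap<UUID, UUID> purgeRow, long cutoffTimestamp) {
        List<UUID> toDelete = new ArrayList<>();
        Iterator<Map.Entry<UUID, UUID>> it = purgeRow.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<UUID, UUID> entry = it.next();
            // Ordered by delete time, so everything after this entry is newer too
            if (entry.getKey().timestamp() >= cutoffTimestamp) break;
            toDelete.add(entry.getValue());   // message UUID to remove from metadata/blobs
            it.remove();                      // drop the purge index entry itself
        }
        return toDelete;
    }
}
```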

Counters CF

Counters is a column family that keeps track of mailbox stats (it could potentially also be used for IMAP serial ID generation).

The following stats are currently stored for each label:

  • Size in bytes (only available for the ALL_MAILS label)
  • Total message count
  • New (unread) message count

For total mailbox stats, query the ALL_MAILS label (ID=0).

Schema syntax:

CREATE COLUMN FAMILY Counters WITH
    comparator = 'CompositeType(UTF8Type,UTF8Type,UTF8Type)' AND
    key_validation_class = UTF8Type AND
    default_validation_class = CounterColumnType AND
    replicate_on_write = true AND
    caching = all AND
    comment = 'All counters for an account';

Sample contents:

"Counters" {
    "user@elasticinbox.com" {
        "l:0:b" : 18239090,   # bytes (composite column name: counter type "l", label ID, counter identifier)
        "l:0:m" : 394,        # messages
        "l:0:u" : 12,         # unread
        ...
    }
}
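When a message is stored, the counters change by small deltas that would be applied as counter increments. The sketch below computes those deltas using the composite column names from the sample, rendered as "l:<labelId>:<b|m|u>" although the real comparator is CompositeType(UTF8Type,UTF8Type,UTF8Type). It assumes every message is counted under ALL_MAILS (ID 0) plus each of its labels, with bytes tracked only for ALL_MAILS; `CounterDeltas` is hypothetical.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of counter deltas for one new message. Assumption: the message is
 * counted in ALL_MAILS (ID 0) and in each of its labels; bytes are tracked
 * only for ALL_MAILS, as stated in the list above.
 */
public final class CounterDeltas {
    public static final int ALL_MAILS = 0;

    static String column(int labelId, String counter) {
        return "l:" + labelId + ":" + counter;   // rendered like the sample
    }

    public static Map<String, Long> forNewMessage(Collection<Integer> labelIds,
                                                  long sizeBytes, boolean unread) {
        Map<String, Long> deltas = new LinkedHashMap<>();
        deltas.put(column(ALL_MAILS, "b"), sizeBytes);     // size only for ALL_MAILS
        deltas.put(column(ALL_MAILS, "m"), 1L);
        if (unread) deltas.put(column(ALL_MAILS, "u"), 1L);
        for (int labelId : labelIds) {
            if (labelId == ALL_MAILS) continue;           // already counted above
            deltas.merge(column(labelId, "m"), 1L, Long::sum);
            if (unread) deltas.merge(column(labelId, "u"), 1L, Long::sum);
        }
        return deltas;
    }
}
```

Marking a message as read would then be a single -1 delta on each "u" counter of its labels, which is why unread counts fit naturally into Cassandra counter columns.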