Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 15 additions & 16 deletions docs/design.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,13 +6,13 @@ Scalar DB v1 is a distributed storage abstraction and client-coordinated distrib

## Background and Objectives

Distributed storage is widely adopted in real-world applications and recent open-source distributed storages such as Cassandra and HBase accelerates the trend. They are particularly accepted by large and sometimes mission-critical applications because of its high performance, high availability and high scalability. However, they are mostly lack of transaction capability, which are particularly important in mission-critical applications. HBase has the transaction capability by third-party libraries, but they tend to be a single point of failure due to master-slave architecture, thus it sacrifices some availability property of the storage. Also, distributed storages are so diverge per implementation, and some of them are getting too complex to meet various user requirements. It's sometimes a big headache for engineers to use and manage such storage systems. Some engineers or companies ended up with creating yet another distributed transactional database (such as CockroachDB and TiDB) from scratch to overcome such problems.
Distributed storage is widely adopted in real-world applications and recent open-source distributed storages such as Cassandra and HBase have accelerated the trend. They are particularly used by large and sometimes mission-critical applications because of their high performance, high availability and high scalability. However, they often lack transaction capability, which is particularly important in mission-critical applications. Transaction capability can be added to HBase via third-party libraries, but they tend to be a single point of failure due to the master-slave architecture, and thus it sacrifices the availability property of the storage. Also, distributed storages are so diverse in their implementations, and some of them are getting too complex to meet various user requirements. It is sometimes a big headache for engineers to use and manage such storage systems. Some companies have ended up creating yet another distributed transactional database from scratch (such as CockroachDB and TiDB) to overcome such problems.

Scalar DB v1 is a simple and practical solution to solve the above mentioned problems in a little different way. It provides a storage abstraction layer on the existing storage implementations and storage-independent distributed transaction layer on top of the storage abstraction. So, it can fully utilizes not only battle-tested existing implementations, tools and good properties of storages, but also the eco system and the community which have been grown for a long time.
Scalar DB v1 is a simple and practical solution to solve the above mentioned problems in a different way. It provides a storage abstraction layer on the existing storage implementations and a storage-independent distributed transaction layer on top of the storage abstraction. So, it can fully utilize, not only battle-tested existing implementations, tools and good properties of storages, but also the eco-system and the community which have grown for a long time.

## Design Goals

The primary design goals of Scalar DB v1 are high availability, horizontal scalability and strong consistency for distributed storage and transaction operations. It aims to tolerate disk, machine, rack, and even data-center failures with minimal performance degradation. It achieves the goals with unbundled transaction layer [1] with easy and unified API so the underlining storage implementations can be replaced with others without application code change. The performance of the Scalar DB is highly dependent on the underlining storage performance, and is usually slower than other scratch-built distributed databases since it adds a storage abstraction layer and storage-oblivious transaction layer.
The primary design goals of Scalar DB v1 are high availability, horizontal scalability and strong consistency for distributed storage and transaction operations. It aims to tolerate disk, machine, rack, and even data-center failures, with minimal performance degradation. It achieves these goals with an unbundled transaction layer [1] with easy and unified API so that the underlining storage implementations can be replaced with others without application code change. The performance of the Scalar DB is highly dependent on the underlining storage performance, and is usually slower than other scratch-built distributed databases since it adds a storage abstraction layer and storage-oblivious transaction layer.

## High-level Architecture

Expand All @@ -22,16 +22,16 @@ The primary design goals of Scalar DB v1 are high availability, horizontal scala

- TODO : diagram

The data model of Scalar DB v1 is a multi-dimensional map based on key-value data model. A logical record is composed of partition-key, clustering-key and a set of values. The value is uniquely mapped by a primary key composed of partition-key, clustering-key and value-name as described in the following scheme.
The data model of Scalar DB v1 is a multi-dimensional map based on the key-value data model. A logical record is composed of partition-key, clustering-key and a set of values. The value is uniquely mapped by a primary key composed of partition-key, clustering-key and value-name as described in the following scheme.

(partition-key, clustering-key, value-name) -> value-content

### Physical Data Model

Scalar DB v1 is a multi-dimensional map distributed to multiple nodes by key-based hash partitioning.
Records are assumed to be hash-partitioned by partition-key (even though an underlining implementation supports range partitioning).
Records with the same partition-key, which is called a partition, are sorted by clustering-key.
It is similar to Google BigTable[2] but it differs in clustering-key structure and partitioning scheme.
Records are assumed to be hash-partitioned by partition-key (even though an underlining implementation may support range partitioning).
Records with the same partition-key define a partition. Partitions are then sorted by the clustering-key.
It is similar to Google BigTable [2] but it differs in clustering-key structure and partitioning scheme.

### Limitation of the data model

Expand All @@ -42,26 +42,25 @@ Range scan is only supported for clustering-key access within the same partition

### Storage

Scalar DB supports only Cassandra storage as a storage implementation in v1.0. More correctly, it supports Cassandra java-driver API. Thus Cassandra java-driver compatible storage system such as ScyllaDB can also be used potentially.
The storage abstraction assumes the following features/properties, which most of recent distributed storages have:
- Atomic CRUD operation (Each single-record operation needs to be atomic)
Scalar DB supports only Cassandra storage as a storage implementation in v1.0. More correctly, it supports Cassandra java-driver API. Thus Cassandra java-driver compatible storage systems, such as ScyllaDB, can potentially also be used. The storage abstraction assumes the following features/properties, which most recent distributed storages have:
- Atomic CRUD operations (each single-record operation needs to be atomic)
- Sequential consistency support
- Atomic/Linearlizable conditional mutation (CUD)
- Ability to include user-defined meta-data for each record

Please see the javadoc for more detail and the usage.
Please see the javadoc for more details and usage.

### Transaction

It basically follows Cherry Garicia protocol[3] to achieve highly-available and scalable transaction on top of storage, in other words, paxos-commit[4] like transaction management.
More specifically, it achieves scalable distributed transaction by utilizing atomic conditional mutation for managing transaction state and storing WAL (Write-Ahead-Logging) records in distributed fashion in each record by using meta-data ability.
Transactions basically follow the Cherry Garcia protocol [3] to achieve highly-available and scalable transactions on top of storage, in other words, paxos-commit [4] like transaction management.
More specifically, Scalar DB achieves scalable distributed transaction by utilizing atomic conditional mutation for managing transaction state and storing WAL (Write-Ahead-Logging) records in distributed fashion in each record by using meta-data ability.

Please see the javadoc for more detail and the usage.
Please see the javadoc for more details and usage.

## Future Work

* It might support Hbase for another storage implementation for more performance.
* It might utlize deterministic nature of Scalar DL (ledger middleware used for stroing ledger information with Scalar DB) to avoid heavy-wight global consensus for linearizability and serializablity.
* Support Hbase for another storage implementation for more performance.
* Utilize deterministic nature of Scalar DL (ledger middleware used for storing ledger information with Scalar DB) to avoid heavy-write global consensus for linearizability and serializablity.

## References

Expand Down
36 changes: 18 additions & 18 deletions docs/getting-started.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# Getting Started with Scalar DB v1

## Overview
Scalar DB v1 is a library that provides an distributed storage abstraction and client-coordinated distributed transaction on the storage.
Scalar DB v1 is a library that provides a distributed storage abstraction and client-coordinated distributed transaction on the storage.
This document briefly explains how you can get started with Scalar DB with a simple electronic money application.

## Install prerequisites
Expand All @@ -22,19 +22,19 @@ $ cd /path/to/scalardb
$ ./gradlew installDist
```

Let's move to the getting started directory if you don't want to copy-and-paste the following codes.
Let's move to the getting started directory so that we do not have to copy-and-paste too much.
```
$ cd docs/getting-started
```

## Set up database schema

First of all, you need to define how a data is organized (a.k.a database schema) in an application.
Currently you need to define it with storage implementation specific schema.
First of all, you need to define how the data will be organized (a.k.a database schema) in the application.
Currently you will need to define it with a storage implementation specific schema.
For the mapping between Cassandra schema and Scalar DB schema, please take a look at [this document](schema.md).
NOTICE: We are planning to have Scalar DB specific schema definition and schema loader.
NOTICE: We are planning to have a Scalar DB specific schema definition and schema loader.

In this document, let's use the following Cassandra schema.
The following ([`emoney-storage.cql`](getting-started/emoney-storage.cql)) specifies a Cassandra schema.

```sql:emoney-storage.cql
DROP KEYSPACE IF EXISTS emoney;
Expand All @@ -49,16 +49,16 @@ CREATE TABLE emoney.account (
);
```

Now, you can load the schema to Cassandra with the following command.
This schema may be loaded into Cassandra with the following command.
```
$ cqlsh -f emoney-storage.cql
```

## Store & retrieve data with storage service

Here is a simple electronic money application with storage service.
(Be careful that it's simplified for ease of reading and far from practical and production-ready.)
You can find the full source code at [here](./getting-started).
[`ElectronicMoneyWithStorage.java`](./getting-started/src/main/java/sample/ElectronicMoneyWithStorage.java)
is a simple electronic money application with storage service.
(Be careful: it is simplified for ease of reading and far from practical and is certainly not production-ready.)

```java:ElectronicMoneyWithStorage.java
public class ElectronicMoneyWithStorage extends ElectronicMoney {
Expand Down Expand Up @@ -120,18 +120,18 @@ public class ElectronicMoneyWithStorage extends ElectronicMoney {
}
```

LET's run the application.
Now let's run the application.
```
$ ../../gradlew run --args="-mode storage -action charge -amount 1000 -to user1"
$ ../../gradlew run --args="-mode storage -action pay -amount 100 -to merchant1 -from user1"
$ ../../gradlew run --args="-mode storage -action pay -amount 100 -to merchant1 -from user1"
```

## Store & retrieve data with transaction service

The previous application seems fine in normal cases, but it's problematic when some failure happens during the operation or when multiple operations occur at the same time because it is not transactional.
In other words, money transfer (pay) from `from account's balance` to `to account's balance` is not done atomically in the application, and there might be case where only `from accont's balance` is decreased if a failure happens right after the first `put` or some money will be lost.
The previous application seems fine in ideal conditions, but it's problematic when some failure happens during the operation or when multiple operations occur at the same time because it is not transactional.
For example, money transfer (pay) from `A's balance` to `B's balance` is not done atomically in the application, and there might be a case where only `A's balance` is decreased (and `B's balance` is not increased) if a failure happens right after the first `put` and some money will be lost.

With transaction capability of Scalar DB, we can make such operations to be executed with ACID properties.
With the transaction capability of Scalar DB, we can make such operations to be executed with ACID properties.
Before updating the code, we need to update the schema to make it transaction capable.

```sql:emoney-transaction.cql
Expand Down Expand Up @@ -165,10 +165,10 @@ CREATE TABLE IF NOT EXISTS coordinator.state (
PRIMARY KEY (tx_id)
);
```
We don't go deeper here to explan what are those, but added definitions are metadata used by client-coordinated transaction of Scalar DB.
For more detail, please take a look at [this document](schema.md).
We will not go deeper into the details here, but the added definitions are metadata used by client-coordinated transactions of Scalar DB.
For more explanation, please take a look at [this document](schema.md).

After re-applying the schema, we can update the code like the following to make it transactional.
After reapplying the schema, we can update the code as follows to make it transactional.
```java:ElectronicMoneyWithTransaction.java
public class ElectronicMoneyWithTransaction extends ElectronicMoney {
private final TransactionService service;
Expand Down
30 changes: 16 additions & 14 deletions docs/schema.md
Original file line number Diff line number Diff line change
@@ -1,17 +1,17 @@
## Databaes schema in Scalar DB
## Database schema in Scalar DB

Scalar DB has its own data model (and schema), and it's mapped to implementation-specific data model and schema.
Scalar DB has its own data model and schema and is mapped to a implementation-specific data model and schema.
Also, it stores internal metadata for managing transaction logs and statuses.
This document briefly explains how data model and schema is mapped in between Scalar DB and other implementations, and what are the internal metadata.
This document briefly explains how the data model and schema is mapped between Scalar DB and other implementations, and what are the internal metadata.

## Scalar DB and Cassandra

Data model in Scalar DB is pretty similar to the data model in Cassandra except that Scalar DB is more like a simple key-value data model and it does not support secondary indexes.
The primary key in Scalar DB is composed of one or more partition keys and zero or more clustering keys. Similary, the primary key in Cassandra is composed of one or more partition keys and zero or more clustering columns.
The data model in Scalar DB is quite similar to the data model in Cassandra, except that in Scalar DB it is more like a simple key-value data model and it does not support secondary indexes.
The primary key in Scalar DB is composed of one or more partition keys and zero or more clustering keys. Similarly, the primary key in Cassandra is composed of one or more partition keys and zero or more clustering columns.


Data types supported in Scalar DB is a little restricted than Cassandra.
Here is the supported data types and the mapping to Cassandra data types.
Data types supported in Scalar DB are a little more restricted than in Cassandra.
Here are the supported data types and their mapping to Cassandra data types.

|Scalar DB |Cassandra |
|---|---|
Expand All @@ -26,9 +26,10 @@ Here is the supported data types and the mapping to Cassandra data types.
## Internal metadata in Scalar DB

Scalar DB executes transactions in a client-coordinated manner by storing and retrieving metadata stored along with the actual records.
Thus, additional values for the metadata need to be defined in the schema in addition to required values by applications.
Thus, along with any required values by the application, additional values for the metadata need to be defined in the schema.

Here is an example schema in Cassandra when it is used with Scalar DB transactions.

Here is an example schmea in Cassadra when it's used with Scalar DB transactions.
```sql
CREATE TABLE example.table1 (
; keys and values required by an application
Expand All @@ -53,13 +54,14 @@ CREATE TABLE example.table1 (
);
```

Lets' assume that k1 is a partition key, k2 is a clustering key and v1 is a value, and those are the values required by an application.
Let's assume that k1 is a partition key, k2 is a clustering key and v1 is a value, and those are the values required by an application.
In addition to those, Scalar DB requires metadata for managing transactions.
The rule behind it is as follows.
* add `tx_id`, `tx_prepared_at`, `tx_committed_at`, `tx_state`, `tx_version` for metadata for the current record
* add `tx_id`, `tx_prepared_at`, `tx_committed_at`, `tx_state`, `tx_version` as metadata for the current record
* add `before_` prefixed values for each existing value except for primary keys (partition keys and clustering keys) for managing before image

Also, we need state table for managing transaction states as follows.
Additionally, we need a state table for managing transaction states as follows.

```sql
CREATE TABLE IF NOT EXISTS coordinator.state (
tx_id text,
Expand All @@ -71,7 +73,7 @@ CREATE TABLE IF NOT EXISTS coordinator.state (

## Schema generator and loader

It's a little hard for application developers to care for the schema mapping and metadata for transactions,
we are preparing tools to generate implementation-specific schema and load the schema without caring too much about those.
It is a little hard for application developers to care for the schema mapping and metadata for transactions,
so we are preparing tools to generate implementation-specific schema and to load the schema.

coming soon