diff --git a/docs/design.md b/docs/design.md index e889f3198a..c7cfebe933 100644 --- a/docs/design.md +++ b/docs/design.md @@ -6,13 +6,13 @@ Scalar DB v1 is a distributed storage abstraction and client-coordinated distrib ## Background and Objectives -Distributed storage is widely adopted in real-world applications and recent open-source distributed storages such as Cassandra and HBase accelerates the trend. They are particularly accepted by large and sometimes mission-critical applications because of its high performance, high availability and high scalability. However, they are mostly lack of transaction capability, which are particularly important in mission-critical applications. HBase has the transaction capability by third-party libraries, but they tend to be a single point of failure due to master-slave architecture, thus it sacrifices some availability property of the storage. Also, distributed storages are so diverge per implementation, and some of them are getting too complex to meet various user requirements. It's sometimes a big headache for engineers to use and manage such storage systems. Some engineers or companies ended up with creating yet another distributed transactional database (such as CockroachDB and TiDB) from scratch to overcome such problems. +Distributed storage is widely adopted in real-world applications and recent open-source distributed storages such as Cassandra and HBase have accelerated the trend. They are particularly used by large and sometimes mission-critical applications because of their high performance, high availability and high scalability. However, they often lack transaction capability, which is particularly important in mission-critical applications. Transaction capability can be added to HBase via third-party libraries, but they tend to be a single point of failure due to the master-slave architecture, and thus it sacrifices the availability property of the storage. Also, distributed storages are so diverse in their implementations, and some of them are getting too complex to meet various user requirements. It is sometimes a big headache for engineers to use and manage such storage systems. Some companies have ended up creating yet another distributed transactional database from scratch (such as CockroachDB and TiDB) to overcome such problems. -Scalar DB v1 is a simple and practical solution to solve the above mentioned problems in a little different way. It provides a storage abstraction layer on the existing storage implementations and storage-independent distributed transaction layer on top of the storage abstraction. So, it can fully utilizes not only battle-tested existing implementations, tools and good properties of storages, but also the eco system and the community which have been grown for a long time. +Scalar DB v1 is a simple and practical solution to solve the above mentioned problems in a different way. It provides a storage abstraction layer on the existing storage implementations and a storage-independent distributed transaction layer on top of the storage abstraction. So, it can fully utilize, not only battle-tested existing implementations, tools and good properties of storages, but also the eco-system and the community which have grown for a long time. ## Design Goals -The primary design goals of Scalar DB v1 are high availability, horizontal scalability and strong consistency for distributed storage and transaction operations. It aims to tolerate disk, machine, rack, and even data-center failures with minimal performance degradation. It achieves the goals with unbundled transaction layer [1] with easy and unified API so the underlining storage implementations can be replaced with others without application code change. The performance of the Scalar DB is highly dependent on the underlining storage performance, and is usually slower than other scratch-built distributed databases since it adds a storage abstraction layer and storage-oblivious transaction layer. +The primary design goals of Scalar DB v1 are high availability, horizontal scalability and strong consistency for distributed storage and transaction operations. It aims to tolerate disk, machine, rack, and even data-center failures, with minimal performance degradation. It achieves these goals with an unbundled transaction layer [1] with easy and unified API so that the underlining storage implementations can be replaced with others without application code change. The performance of the Scalar DB is highly dependent on the underlining storage performance, and is usually slower than other scratch-built distributed databases since it adds a storage abstraction layer and storage-oblivious transaction layer. ## High-level Architecture @@ -22,16 +22,16 @@ The primary design goals of Scalar DB v1 are high availability, horizontal scala - TODO : diagram -The data model of Scalar DB v1 is a multi-dimensional map based on key-value data model. A logical record is composed of partition-key, clustering-key and a set of values. The value is uniquely mapped by a primary key composed of partition-key, clustering-key and value-name as described in the following scheme. +The data model of Scalar DB v1 is a multi-dimensional map based on the key-value data model. A logical record is composed of partition-key, clustering-key and a set of values. The value is uniquely mapped by a primary key composed of partition-key, clustering-key and value-name as described in the following scheme. (partition-key, clustering-key, value-name) -> value-content ### Physical Data Model Scalar DB v1 is a multi-dimensional map distributed to multiple nodes by key-based hash partitioning. -Records are assumed to be hash-partitioned by partition-key (even though an underlining implementation supports range partitioning). -Records with the same partition-key, which is called a partition, are sorted by clustering-key. -It is similar to Google BigTable[2] but it differs in clustering-key structure and partitioning scheme. +Records are assumed to be hash-partitioned by partition-key (even though an underlining implementation may support range partitioning). +Records with the same partition-key define a partition. Partitions are then sorted by the clustering-key. +It is similar to Google BigTable [2] but it differs in clustering-key structure and partitioning scheme. ### Limitation of the data model @@ -42,26 +42,25 @@ Range scan is only supported for clustering-key access within the same partition ### Storage -Scalar DB supports only Cassandra storage as a storage implementation in v1.0. More correctly, it supports Cassandra java-driver API. Thus Cassandra java-driver compatible storage system such as ScyllaDB can also be used potentially. -The storage abstraction assumes the following features/properties, which most of recent distributed storages have: -- Atomic CRUD operation (Each single-record operation needs to be atomic) +Scalar DB supports only Cassandra storage as a storage implementation in v1.0. More correctly, it supports Cassandra java-driver API. Thus Cassandra java-driver compatible storage systems, such as ScyllaDB, can potentially also be used. The storage abstraction assumes the following features/properties, which most recent distributed storages have: +- Atomic CRUD operations (each single-record operation needs to be atomic) - Sequential consistency support - Atomic/Linearlizable conditional mutation (CUD) - Ability to include user-defined meta-data for each record -Please see the javadoc for more detail and the usage. +Please see the javadoc for more details and usage. ### Transaction -It basically follows Cherry Garicia protocol[3] to achieve highly-available and scalable transaction on top of storage, in other words, paxos-commit[4] like transaction management. -More specifically, it achieves scalable distributed transaction by utilizing atomic conditional mutation for managing transaction state and storing WAL (Write-Ahead-Logging) records in distributed fashion in each record by using meta-data ability. +Transactions basically follow the Cherry Garcia protocol [3] to achieve highly-available and scalable transactions on top of storage, in other words, paxos-commit [4] like transaction management. +More specifically, Scalar DB achieves scalable distributed transaction by utilizing atomic conditional mutation for managing transaction state and storing WAL (Write-Ahead-Logging) records in distributed fashion in each record by using meta-data ability. -Please see the javadoc for more detail and the usage. +Please see the javadoc for more details and usage. ## Future Work -* It might support Hbase for another storage implementation for more performance. -* It might utlize deterministic nature of Scalar DL (ledger middleware used for stroing ledger information with Scalar DB) to avoid heavy-wight global consensus for linearizability and serializablity. +* Support Hbase for another storage implementation for more performance. +* Utilize deterministic nature of Scalar DL (ledger middleware used for storing ledger information with Scalar DB) to avoid heavy-write global consensus for linearizability and serializablity. ## References diff --git a/docs/getting-started.md b/docs/getting-started.md index 2c26caa131..5773cb5b8c 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -1,7 +1,7 @@ # Getting Started with Scalar DB v1 ## Overview -Scalar DB v1 is a library that provides an distributed storage abstraction and client-coordinated distributed transaction on the storage. +Scalar DB v1 is a library that provides a distributed storage abstraction and client-coordinated distributed transaction on the storage. This document briefly explains how you can get started with Scalar DB with a simple electronic money application. ## Install prerequisites @@ -22,19 +22,19 @@ $ cd /path/to/scalardb $ ./gradlew installDist ``` -Let's move to the getting started directory if you don't want to copy-and-paste the following codes. +Let's move to the getting started directory so that we do not have to copy-and-paste too much. ``` $ cd docs/getting-started ``` ## Set up database schema -First of all, you need to define how a data is organized (a.k.a database schema) in an application. -Currently you need to define it with storage implementation specific schema. +First of all, you need to define how the data will be organized (a.k.a database schema) in the application. +Currently you will need to define it with a storage implementation specific schema. For the mapping between Cassandra schema and Scalar DB schema, please take a look at [this document](schema.md). -NOTICE: We are planning to have Scalar DB specific schema definition and schema loader. +NOTICE: We are planning to have a Scalar DB specific schema definition and schema loader. -In this document, let's use the following Cassandra schema. +The following ([`emoney-storage.cql`](getting-started/emoney-storage.cql)) specifies a Cassandra schema. ```sql:emoney-storage.cql DROP KEYSPACE IF EXISTS emoney; @@ -49,16 +49,16 @@ CREATE TABLE emoney.account ( ); ``` -Now, you can load the schema to Cassandra with the following command. +This schema may be loaded into Cassandra with the following command. ``` $ cqlsh -f emoney-storage.cql ``` ## Store & retrieve data with storage service -Here is a simple electronic money application with storage service. -(Be careful that it's simplified for ease of reading and far from practical and production-ready.) -You can find the full source code at [here](./getting-started). +[`ElectronicMoneyWithStorage.java`](./getting-started/src/main/java/sample/ElectronicMoneyWithStorage.java) +is a simple electronic money application with storage service. +(Be careful: it is simplified for ease of reading and far from practical and is certainly not production-ready.) ```java:ElectronicMoneyWithStorage.java public class ElectronicMoneyWithStorage extends ElectronicMoney { @@ -120,18 +120,18 @@ public class ElectronicMoneyWithStorage extends ElectronicMoney { } ``` -LET's run the application. +Now let's run the application. ``` $ ../../gradlew run --args="-mode storage -action charge -amount 1000 -to user1" -$ ../../gradlew run --args="-mode storage -action pay -amount 100 -to merchant1 -from user1" +$ ../../gradlew run --args="-mode storage -action pay -amount 100 -to merchant1 -from user1" ``` ## Store & retrieve data with transaction service -The previous application seems fine in normal cases, but it's problematic when some failure happens during the operation or when multiple operations occur at the same time because it is not transactional. -In other words, money transfer (pay) from `from account's balance` to `to account's balance` is not done atomically in the application, and there might be case where only `from accont's balance` is decreased if a failure happens right after the first `put` or some money will be lost. +The previous application seems fine in ideal conditions, but it's problematic when some failure happens during the operation or when multiple operations occur at the same time because it is not transactional. +For example, money transfer (pay) from `A's balance` to `B's balance` is not done atomically in the application, and there might be a case where only `A's balance` is decreased (and `B's balance` is not increased) if a failure happens right after the first `put` and some money will be lost. -With transaction capability of Scalar DB, we can make such operations to be executed with ACID properties. +With the transaction capability of Scalar DB, we can make such operations to be executed with ACID properties. Before updating the code, we need to update the schema to make it transaction capable. ```sql:emoney-transaction.cql @@ -165,10 +165,10 @@ CREATE TABLE IF NOT EXISTS coordinator.state ( PRIMARY KEY (tx_id) ); ``` -We don't go deeper here to explan what are those, but added definitions are metadata used by client-coordinated transaction of Scalar DB. -For more detail, please take a look at [this document](schema.md). +We will not go deeper into the details here, but the added definitions are metadata used by client-coordinated transactions of Scalar DB. +For more explanation, please take a look at [this document](schema.md). -After re-applying the schema, we can update the code like the following to make it transactional. +After reapplying the schema, we can update the code as follows to make it transactional. ```java:ElectronicMoneyWithTransaction.java public class ElectronicMoneyWithTransaction extends ElectronicMoney { private final TransactionService service; diff --git a/docs/schema.md b/docs/schema.md index 31a81070ff..2eb2db2eba 100644 --- a/docs/schema.md +++ b/docs/schema.md @@ -1,17 +1,17 @@ -## Databaes schema in Scalar DB +## Database schema in Scalar DB -Scalar DB has its own data model (and schema), and it's mapped to implementation-specific data model and schema. +Scalar DB has its own data model and schema and is mapped to a implementation-specific data model and schema. Also, it stores internal metadata for managing transaction logs and statuses. -This document briefly explains how data model and schema is mapped in between Scalar DB and other implementations, and what are the internal metadata. +This document briefly explains how the data model and schema is mapped between Scalar DB and other implementations, and what are the internal metadata. ## Scalar DB and Cassandra -Data model in Scalar DB is pretty similar to the data model in Cassandra except that Scalar DB is more like a simple key-value data model and it does not support secondary indexes. -The primary key in Scalar DB is composed of one or more partition keys and zero or more clustering keys. Similary, the primary key in Cassandra is composed of one or more partition keys and zero or more clustering columns. +The data model in Scalar DB is quite similar to the data model in Cassandra, except that in Scalar DB it is more like a simple key-value data model and it does not support secondary indexes. +The primary key in Scalar DB is composed of one or more partition keys and zero or more clustering keys. Similarly, the primary key in Cassandra is composed of one or more partition keys and zero or more clustering columns. -Data types supported in Scalar DB is a little restricted than Cassandra. -Here is the supported data types and the mapping to Cassandra data types. +Data types supported in Scalar DB are a little more restricted than in Cassandra. +Here are the supported data types and their mapping to Cassandra data types. |Scalar DB |Cassandra | |---|---| @@ -26,9 +26,10 @@ Here is the supported data types and the mapping to Cassandra data types. ## Internal metadata in Scalar DB Scalar DB executes transactions in a client-coordinated manner by storing and retrieving metadata stored along with the actual records. -Thus, additional values for the metadata need to be defined in the schema in addition to required values by applications. +Thus, along with any required values by the application, additional values for the metadata need to be defined in the schema. + +Here is an example schema in Cassandra when it is used with Scalar DB transactions. -Here is an example schmea in Cassadra when it's used with Scalar DB transactions. ```sql CREATE TABLE example.table1 ( ; keys and values required by an application @@ -53,13 +54,14 @@ CREATE TABLE example.table1 ( ); ``` -Lets' assume that k1 is a partition key, k2 is a clustering key and v1 is a value, and those are the values required by an application. +Let's assume that k1 is a partition key, k2 is a clustering key and v1 is a value, and those are the values required by an application. In addition to those, Scalar DB requires metadata for managing transactions. The rule behind it is as follows. -* add `tx_id`, `tx_prepared_at`, `tx_committed_at`, `tx_state`, `tx_version` for metadata for the current record +* add `tx_id`, `tx_prepared_at`, `tx_committed_at`, `tx_state`, `tx_version` as metadata for the current record * add `before_` prefixed values for each existing value except for primary keys (partition keys and clustering keys) for managing before image -Also, we need state table for managing transaction states as follows. +Additionally, we need a state table for managing transaction states as follows. + ```sql CREATE TABLE IF NOT EXISTS coordinator.state ( tx_id text, @@ -71,7 +73,7 @@ CREATE TABLE IF NOT EXISTS coordinator.state ( ## Schema generator and loader -It's a little hard for application developers to care for the schema mapping and metadata for transactions, -we are preparing tools to generate implementation-specific schema and load the schema without caring too much about those. +It is a little hard for application developers to care for the schema mapping and metadata for transactions, +so we are preparing tools to generate implementation-specific schema and to load the schema. coming soon