diff --git a/_posts/2015-08-22-paxos-register.markdown b/_posts/2015-08-22-paxos-register.markdown
index d0fc04ac..c355a74e 100644
--- a/_posts/2015-08-22-paxos-register.markdown
+++ b/_posts/2015-08-22-paxos-register.markdown
@@ -1,25 +1,25 @@
 ---
 layout: post
-title: rystsov::The Paxos Register
-name: The Paxos Register
-tags: ["misc"]
+title: rystsov::Write-once distributed register
+name: Write-once distributed register
+tags: ["distr"]
 desc: "Design of a write-once distributed fault-tolerance register"
 has_comments: true
 ---
-
+To build a key/value storage, each node starts a new instance of the Paxos register whenever it receives a get or put request for an unseen key, and redirects all further requests for that key to that register. Of course, the instances on the same node should share the quorum size and the set of nodes, and use a unified membership change mechanism that generalises the Paxos variable's version of it to work with a dynamic set of Paxos registers. For example, its 2n+1 to 2n+2 transition is:
+1. Increase the quorum size.
+2. Generate an event on each node.
+3. Fetch from each node all the keys it has seen up to the event, take their union and sync every key in it. This operation may be optimized by batching and parallel processing.
+4. Add a new node to the set of nodes.
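
The four steps above can be sketched as a toy in-memory routine. This is only an illustration under stated assumptions: `Node`, `Cluster`, `grow` and the `sync` callback are hypothetical names that do not appear in the post, the "event" is modelled as a snapshot of the keys a node has seen, and the actual Paxos sync of a register is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node: its registers are modelled as a set of keys."""
    name: str
    keys: set = field(default_factory=set)

    def mark_event(self):
        # Step 2: the "event" is modelled as a snapshot of the keys seen so far.
        return frozenset(self.keys)


@dataclass
class Cluster:
    nodes: list
    quorum: int


def grow(cluster, new_node, sync):
    """Sketch of the 2n+1 -> 2n+2 transition; `sync(cluster, key)` is assumed."""
    # 1. Increase the quorum size (n+1 -> n+2).
    cluster.quorum += 1
    # 2. Generate an event on each node.
    snapshots = [node.mark_event() for node in cluster.nodes]
    # 3. Union the keys seen up to the event and sync each of them.
    all_keys = set().union(*snapshots)
    for key in sorted(all_keys):
        sync(cluster, key)
    # 4. Add the new node to the set of nodes.
    cluster.nodes.append(new_node)
    return all_keys
```

Note that every key is synced under the enlarged quorum but before the new node joins, which mirrors the order of the steps in the list.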
-
+Up to this point we have designed a replicated key/value storage with strong consistency and a per-key test-and-set concurrency control primitive. Since the whole dataset fits on one node, nothing prevents us from maintaining an order on the keys and supporting range queries.
-
+The next step is to support sharding.
-
+
+Imagine a key/value storage that lives on three nodes.
+
+
+
+Some day, because of storage usage or high load, we will decide to split the storage. So we pick a key from the key space and split the key/value storage into two logical groups. The first group (A) contains the keys less than or equal to that key, and the second group (B) contains the rest.
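
As a minimal sketch of the logical split, assuming keys are comparable values such as strings: the names `group_for` and `split` are hypothetical, and a plain dict stands in for the replicated storage.

```python
def group_for(key, split_key):
    """Return "A" for keys <= split_key, otherwise "B"."""
    return "A" if key <= split_key else "B"


def split(storage, split_key):
    """Partition a key/value mapping into the two logical groups."""
    groups = {"A": {}, "B": {}}
    for key, value in storage.items():
        groups[group_for(key, split_key)][key] = value
    return groups["A"], groups["B"]
```

At this stage both groups still live on the same nodes; only the routing of a key to its group changes.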
+
+
+
+Then we add a node to the B group.
+
+
+
+And remove a node from it.
+
+
+
+We repeat the process until the A and B clusters work on different sets of nodes.
+
+
+
+As a result, we split the storage without stopping the cluster or losing consistency.
diff --git a/css/common.css b/css/common.css
index 4e2b6d50..d929e3dd 100644
--- a/css/common.css
+++ b/css/common.css
@@ -9,6 +9,10 @@ a {
text-decoration: none;
}
+.anchor {
+ background: none;
+}
+
h1,h2,h3,p {
margin: 0;
}
@@ -24,6 +28,11 @@ h1,h2,h3,p {
font-size: 90%;
}
+.key_value_design {
+ margin-top: 1em;
+ margin-bottom: 1em;
+}
+
.abstract-center {
max-width: 600px;
margin-right: auto;
diff --git a/css/post.css b/css/post.css
index 37eda1cb..3a4e7cb1 100644
--- a/css/post.css
+++ b/css/post.css
@@ -30,3 +30,9 @@ h3 {
#low_internet {
margin-top: 2em;
}
+
+.sharded-paxos-pic {
+ margin-left: auto;
+ margin-right: auto;
+ display: block;
+}
diff --git a/images/sharded-paxos-1.png b/images/sharded-paxos-1.png
new file mode 100644
index 00000000..5434b7ed
Binary files /dev/null and b/images/sharded-paxos-1.png differ
diff --git a/images/sharded-paxos-2.png b/images/sharded-paxos-2.png
new file mode 100644
index 00000000..fe218281
Binary files /dev/null and b/images/sharded-paxos-2.png differ
diff --git a/images/sharded-paxos-3.png b/images/sharded-paxos-3.png
new file mode 100644
index 00000000..5606e128
Binary files /dev/null and b/images/sharded-paxos-3.png differ
diff --git a/images/sharded-paxos-4.png b/images/sharded-paxos-4.png
new file mode 100644
index 00000000..a440010c
Binary files /dev/null and b/images/sharded-paxos-4.png differ
diff --git a/images/sharded-paxos-5.png b/images/sharded-paxos-5.png
new file mode 100644
index 00000000..b9c29720
Binary files /dev/null and b/images/sharded-paxos-5.png differ
diff --git a/index.html b/index.html
index ea816dbb..3aebdc7f 100644
--- a/index.html
+++ b/index.html
@@ -11,6 +11,20 @@