Rasputin DB 🌐

(Significant work is currently happening in the tyler_ranges branch)

flexible linearizable distributed store

triumvirs: operational clarity, performance and composability

currently implemented: linearized KV set/get/cas/del. client code is happy-path only, so it's only fit for playing around with at this point!

current reasons why you don't want to use this beyond playing with it:

Mostly unimplemented. We don't have support for automatic resharding, real transactions or collection types other than KV yet. These are still in the planning phase.
Possibly incorrect. We have not yet proven the correctness of the core consensus algorithm. We may be able to adapt the Raft Coq proof to this end, as we are essentially replacing Raft's preemptible leadership with a non-preempting lease to improve throughput in the presence of partial partitions.
Inefficient. The write path involves a TON of copying. We are in the process of designing a much more efficient buffer management system.
Buggy. Client code is a complete hack that only occasionally works. We have a simulator in place for teasing out bugs in the state machine, but we haven't used it for simulating common datacenter conditions like partitions, delayed message arrival, node restarts/shutdowns/pauses, etc...
Undocumented.
Unpopular. No community and no production users (or, at least I hope nobody is using it yet!).

Running

Run a test cluster

cargo build
./run.sh
tail -f _rasputin_test/*log

Run an individual server

target/debug/rasputind \
    --peer-port=7777 \
    --cli-port=8888 \
    --seed-peers="127.0.0.1:7777" \
    --storage-dir=/var/lib/rasputin/ \
    --logfile=/var/log/rasputin.log

Hit the cluster with a remote client!

Cargo.toml:

[dependencies]
rasputin = "0.1.0"

Code:

extern crate rasputin;

fn main() {
    let peers = vec!["127.0.0.1:8888".parse().unwrap()];
    let nthreads = 1;
    let mut cli = rasputin::Client::new(peers, nthreads);

    cli.set(b"k1", b"v1").unwrap();
    assert!(cli.get(b"k1").unwrap().get_value() == b"v1");

    // CAS returns the current value, and sets the success flag accordingly
    assert!(cli.cas(b"k1", b"v1", b"v12").unwrap().get_value() == b"v12");
    assert!(cli.cas(b"k1", b"vNever", b"vNever2").unwrap().get_value() == b"v12");
    assert!(cli.cas(b"k1", b"vNever", b"vNever2").unwrap().get_success() == false);
    assert!(cli.cas(b"k1", b"v12", b"v13").unwrap().get_value() == b"v13");

    // deletes return the last value
    assert!(cli.del(b"k1").unwrap().get_value() == b"v13");
    assert!(cli.get(b"k1").unwrap().get_success() == false);
}

Planned Work

automatic lexicographic resharding

Rasputin will utilize shard size and request density metrics to faciliate intelligent splitting.

several simple persistent collection types

kv: backed by RocksDB
log: Kafka-like sequential segment files
object: files on system VFS

interest semantics

subscribe: in-order mutation stream
watch: at most once mutation notification

replication modes (per-collection)

consensus: mutations block on replication to a quorum
async: mutations return quickly, and are replicated later

timeseries primitives

logarithmically bucketed histograms for efficient aggregation and consumption of extremely high velocity metrics, a la loghisto.

roadmap

Appendix: The Harpoon Consensus Algorithm

Because Rasputin aims to be as general purpose of a replication mechanism as possible, it needs to be resilient against partitions. We aim to reuse the parts of Raft that work for this as much as we can, and replace the leader election mechanism with a lease-based one that does not preempt in the presence of partial partitions. This obviously needs to be tested extensively, and to that end a comprehensive simulator is being built for testing the state machine (see test/cluster.rs), and fault injection tooling is being built for inducing realistic datacenter conditions on a non-simulated cluster.

Raft is vulnerable to rapid leader churn when a partial partition exists between the leader and any other node. The partially partitioned node will fire its leader election timer and receive quorum. Because the old leader can't talk to this new leader, it will do the same. Leadership bounces a lot and we have suboptimal throughput. Harpoon is essentially just Raft with a modified election mechanism: candidates and leaders request leases from all peers, extend leadership if they reach quorum, and abdicate if they do not reach a quorum of successful extension request responses by the end of their lease. This prevents leadership churn in scenarios where there is a partial partition, which is common over the open internet, for example.

Harpoon has not yet been formally verified, but eventually we will adapt the Raft Coq proof for it.

Name		Name	Last commit message	Last commit date
Latest commit History 135 Commits
doc		doc
include		include
src		src
test		test
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
run.sh		run.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Rasputin DB 🌐

Running

Run a test cluster

Run an individual server

Hit the cluster with a remote client!

Planned Work

automatic lexicographic resharding

several simple persistent collection types

interest semantics

replication modes (per-collection)

timeseries primitives

roadmap

Appendix: The Harpoon Consensus Algorithm

About

Releases

Packages

Contributors 2

Languages

License

spacejam/rasputin

Folders and files

Latest commit

History

Repository files navigation

Rasputin DB 🌐

Running

Run a test cluster

Run an individual server

Hit the cluster with a remote client!

Planned Work

automatic lexicographic resharding

several simple persistent collection types

interest semantics

replication modes (per-collection)

timeseries primitives

roadmap

Appendix: The Harpoon Consensus Algorithm

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages