Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dss: Add a readme for the raft lab and the kvraft lab #64

Merged
merged 1 commit into from Apr 23, 2019
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
310 changes: 310 additions & 0 deletions dss/README.md
@@ -0,0 +1,310 @@
# The raft and kvraft lab

This is a series of labs on a key/value storage system built with the Raft
consensus algorithm. These labs are derived from the [lab2:raft][6824lab2] and
[lab3:kvraft][6824lab3] from the famous [MIT 6.824][6824] course but rewritten
in Rust. The following text material is also very influenced by the text
material there.

[6824lab2]:http://nil.csail.mit.edu/6.824/2017/labs/lab-raft.html
[6824lab3]:http://nil.csail.mit.edu/6.824/2017/labs/lab-kvraft.html
[6824]:http://nil.csail.mit.edu/6.824/2017/index.html

In these labs your will first implement the raft consensus algorithm in the
lab2:raft, and then build a key/value service in the lab3:kvraft.

Raft is a consensus algorithm that is designed to be easy to understand.You can
read material about the Raft itself at [the Raft site][raftsite], including
[the extended Raft paper][raftpaper], an interactive visualization of the Raft,
and other resource. Those material should be helpful for you to complete this lab.

[raftsite]:https://raft.github.io/

## Getting started

First, please clone this repo with `git` to get the source code of the labs.

Then, make sure you have `rustup` install, and override toolchain for this folder
to nightly with `rustup override set nightly` . Also, to make things simpler you
should have `make` installed.

Now you can run `make test_others` to check that things are going right. You
should see all tests passed.

(If you are a Windows user, you may have to figure out how to use `make` on
Windows, or type the commands from the makefile in the console manually, or just
use the Windows Subsystem for Linux)

## The lab2: Raft

In this lab you will implement the Raft consensus algorithm. This lab has 3 parts
named with 2A, 2B and 2C.

To run all the test in this lab, run `make test_2`. Please run the tests multiple
times to mak sure your are not passing the tests just by luck.

To run just a single test, run `make cargo_test_<insert test name here>`.

### The code structure

All your codes in this lab should be in the `src/raft/service.rs` and
`src/raft/mod.rs`.

The `src/raft/mod.rs` file should contains your main implementation of Raft.
The tester (and your key/value server in lab3) will call methods in this file
to use your Raft module.

A service calls `Raft::new` to create a Raft peer, and then calls `Node::new` to
start the Raft peer. The `Node::get_state`, `Node::is_leader` and `Node::term`
will be called to get the current term of the node and whether it thinks it is
leader.

The `Node::start` will be called when the server need to append the command into
the log. `Node::start` should returns immediately without waiting for the log
appends to complete. A channel (`apply_ch`) is passed into `Raft::new` and you
should send a `ApplyMsg` to the channel for each newly committed log entry.

Your implements should use the provided `labrpc` crate to exchange RPCs. The
`labrpc` use channels to simulate sockets internally. This makes it easy to test
your code under challenging network conditions. Your defination of the RPCs
should in `src/raft/service.rs` and you should implement the RPC server in
`impl RaftService for Node`. A set of RPC clients (`peers`) is pass into
`Raft::new` for you to send RPCs to other nodes.

### Part 2A

In this part you should implement the leader election and heartbeats
(`AppendEntries` RPCs without log entries). You should make a single leader to be
elected, make leader to remain leader when there are no failures, and have a new
leader when old leader fails or packets to/from old leader is lost.

To run all the test in this lab, run `make test_2a`.

Here are some hints on this part:
- Add any state you need to the `Raft` struct.
- The `request_vote` RPC is already defined, you just need to fill the
`RequestVoteArgs` and `RequestVoteReply` struct. The lab use `labcodec` crate to
encode and decode messages in RPC, which internally use the `prost` external
crate. See the [prost document][prost] to know how to define structs that is used
as messages with `#[Derive(Message)]` and `#[prost(...)]`.
- You need to define the `append_entries` RPC by yourself. `labrpc` use a
`labrpc::service!` macro to define RPC service and generate server and client
traits from your defination. There is an example in `labrpc/examples/echo.rs`
which may help you to define new RPCs.
- This lab use things from the `futures` external crate heavily like the channels
and the `Future` trait. Read things about futures [here][futures].
- You need to make your code take actions periodically or after delays in time.
You can use lots of threads and call `std::thread::sleep` ([doc][sleep]), but you
can also use the `futures_timer::Delay` and other utilities from the
`futures-timer` external crate ([doc][futures-timer]).
- Don't forget that you need to make sure that the election timeouts in different
peers don't always fire at the same time, or else all peers will vote only for
themselves and no one will become the leader. You can generate random numbers
using the `rand` external crate ([doc][rand]).
- The tester limits the frequency of RPC calls to 10 times per second for each
pair of sender and reciever. Please do not just send RPCs repeatly without wait
for a timeout.
- The tester requires your Raft to elect a new leader within five seconds of the
failure of the old leader (if a majority of peers can still communicate). However,
leader election may require multiple rounds in case of a split vote (which can
happen if packets are lost or if candidates unluckily choose the same random
backoff times). You must pick election timeouts (and thus heartbeat intervals)
that are short enough that it's very likely that an election will complete in less
than five seconds even if it requires multiple rounds.
- But also, since the tester limits the frequency of RPC calls, your election
timeout should not be too small, and larger than the 150~300 milliseconds in the
paper's Section 5.2. Choose the values wisely.
- In Rust we lock data instead of code. Think carefully about what data should be
in the same lock.
- The Figure 2 in the [Raft paper][raftpaper] is usefule when you are in trouble
passing the test.
- It's useful to debug by printing log messages. We use the external
`log` crate ([doc][log]) to print logs in different levels. you can configure
the log level and scope by set LOG_LEVEL environment variable like
`LOG_LEVEL=labs6824=debug make test_2a`. This feature is provided by the
`env_logger` external crate ([doc][env_logger]), read it's documentation for
syntax of the log level. Also, you can collect the output in a file by redirct the
output to a file like `make test_2a 2>test.log`

[prost]:https://github.com/danburkert/prost
[futures]:https://docs.rs/futures/0.1/futures/
[sleep]:https://doc.rust-lang.org/std/thread/fn.sleep.html
[futures-timer]:https://docs.rs/futures-timer/0.1/futures_timer/index.html
[rand]:https://docs.rs/rand/0.4/rand/
[log]:https://docs.rs/log/0.4/log/
[env_logger]:https://docs.rs/env_logger/0.6/env_logger/

### Part 2B

In this part you should implement the log replication. You should implement the
`Node::Start` method, complete the rest fields in the `append_entries` RPC and
send them, and advance `commit_index` at leader.

To run all the test in this lab, run `make test_2b`. You can try to pass the
`test_basic_agree_2b` test first.

Here are some hints on this part:
- Don't forget election restriction, see section 5.4.1 in the
[Raft paper][raftpaper].
- Every server commit new entries independently by write to `apply_ch` in the
correct order. `apply_ch` is a `UnboundedSender` which will buffering the messages
until out of memory, so it is not that easy to create deadlocks as the original
go version.
- Give yourself enough time to rewrite your implementation because only after
writing a first implementation will you realize how to organize your code cleanly.
- The `test_count_2b` requires your number of RPCs to be not too much when there
is no failure. So you should optimize the number of RPCs to the minimum.
- You may need to write code that waits for certain events to occur. In that case
you can just use channels and wait on it.

### Part 2C

In this part you should implement persistence by first adding code that saves and
restores persistent state to `Persister`, like in `Raft::persist` and
`Raft::restore` using `labcodec`. You also need to determine what and when to
persist, and call `Raft::restore` in `Raft::new`.

To run all the test in this lab, run `make test_2b`. You can try to pass the
`test_persist1_2c` test first.

Here are some hints on this part:
- The usage of `labcodec` is covered in the hints of part 2A.
- This part also introduce various challenging test that involving servers failing
and the network losing RPC requests or replies. Check your implementation
carefully to find bugs that only present in this situation.
- In order to pass some of the challenging tests towards the end, such as those
marked "unreliable", you will need to implement the optimization to allow a
follower to back up the leader's nextIndex by more than one entry at a time. See
the description in the [Raft paper][raftpaper] starting at the bottom of page 7
and top of page 8 (marked by a gray line). However, the paper do not have many
details on this. You will need to fill in the gaps, perhaps with the help of
[this 6.824 Raft lectures][optimize-hint].

[optimize-hint]:http://nil.csail.mit.edu/6.824/2017/notes/l-raft2.txt

## The lab3: KvRaft

In this lab you will build a fault-tolerant key-value storage service using the
Raft module in lab 2. This lab has 2 parts named with 3A and 3B.

To run all the test in this lab, run `make test_3`. Please run the tests multiple
times to mak sure your are not passing the tests just by luck.

To run just a single test, run `make cargo_test_<insert test name here>`.

### The code structure

All your codes in this lab should be in the `src/kvraft/service.rs`,
`src/kvraft/server.rs` and `src/kvraft/client.rs`. The file name explains what
they are. Also, you need to modify the files you touched in lab2:raft.

### Part 3A

In this part you should first implement a solution that works when there are no
dropped messages, and no failed servers. Your service must ensure that `get(...)`
and `put_append(...)` are [linearizable][linearizable].

[linearizable]:https://en.wikipedia.org/wiki/Linearizability

That means, completed application calls to the methods on the `Clerk` struct in
`src/kvraft/client.rs` must appear to all clients to have affected the service in
the same linear order, even in there are failures and leader changes. A
`Clerk::Get` that starts after a completed `Clerk::Put` or `Clerk::Append` should
see the value written by the most recent `Clerk::Put` or `Clerk::Append` in the
linear order. Completed calls should have exactly-once semantics.

A reasonable plan of implementation should be:
- Client send RPC request in the `src/kvraft/client.rs`
- Enter the operation from client into the Raft log by `raft::Node::start` in the
RPC handler of `KvServer`
- Execute the operation when the raft log commited, and then reply to RPC

After implement that you should pass the basic one client test, run
`make cargo_test_basic_3a` to check it.

Here are some hints on this part:
- You should recieve commit message from Raft by `apply_ch` at the same time you
recieves RPC.
- If a leader has called `Raft::start` for a Clerk's RPC, but loses its leadership
before the request is committed to the log, you should make the client to re-send
the RPC request to other servers until it finds the new leader. You can detect
this by check things recieved from `apply_ch`.
- The `Clerk` client should remember who is the last leader, and try the last
leader first. This will avoid wasting time searching for the leader on every RPC.
- The server should not complete a `get` RPC if it is not part of a majority and
do not has up-to-date data. You can just put the get operation into the log, or
implement the optimization for read-only operations that is described in Section 8
in the [Raft paper][raftpaper].
- Think how to use lock at the beginning.

Then, you should deal with duplicate client requests, including situations where
the client sends a request to a server leader in one term, times out waiting for a
reply, and re-sends the request to a new leader in another term. The request
should always execute just once.

After this you should pass all tests in this part. To run all the test in
this lab, run `make test_3a`.

Here are some hints on this part:
- You will need to uniquely identify client operations to ensure that the key/
value service executes each one just once.
- Your scheme for duplicate detection should free server memory quickly, like
using a hashtable for and only for uncommitted logs.

### Part 3B

In your current implementation, a rebooting server replays the complete Raft log
in order to restore its state. However, it's not practical for a long-running
server to remember the complete Raft log forever.

Instead, you'll modify Raft and kvserver to cooperate to save space: from time to
time kvserver will persistently store a "snapshot" of its current state, and Raft
will discard log entries that precede the snapshot. When a server restarts (or
falls far behind the leader and must catch up), the server first installs a
snapshot and then replays log entries from after the point at which the snapshot
was created. Section 7 of the [Raft paper][raftpaper] outlines the scheme; you
will have to design the details.

The tester passes a `maxraftstate` to the `KvServer::new` indicating the maximum
allowed size of your persistent Raft state in bytes (including the log, but not
including snapshots). You should check the size of Raft state and when Raft state
size is approaching this threshold, it should save a snapshot, and tell the Raft
library that it has snapshotted, so that Raft can discard old log entries.
The `maxraftstate` is a `Option<usize>` and you do not have to snapshot when it
is `None`.

First you should modify the Raft implement to accept a compaction request and
discard entries before the given index, and continue operating while storing only
log entries after that index. The tests from lab2:raft should still pass.

Then you should modify the kvserver so that it can hands snapshots to Raft and
request compaction when Raft state grows too large. The snapshots should be saved
in `raft::Persister`

Here are some hints on this part:
- You are allowed to add methods to your Raft so that kvserver can manage the
process of trimming the Raft log and manage kvserver snapshots.
- You can test your Raft and kvserver's ability to operate with a trimmed log,
and its ability to re-start from the combination of a kvserver snapshot and
persisted Raft state, by running the Lab 3A tests while overriding `maxraftstate`
to `Some(1)`.
- Think about what should be in the snapshot. You should save new snapshot and
restore latest snapshot with `raft::Persister`.
- Uncommitted logs can also in snapshots, so your kvserver must still be able to
detect duplicated operations under this situation.

After that, you should define the `install_snapshot` RPC and the leader should
send this RPC when the leader has discarded the log entries the follower needs.
When a follower receives an `install_snapshot` RPC, it should send the snapshot to
kvserver (maybe in `apply_ch`).

After this you should pass all tests in this part. To run all the test in
this lab, run `make test_3b`.


Here are some hints on this part:
- You do not need to send the snapshots in multiple RPCs. Just send the entire
snapshot in a single `install_snapshot` RPC and that should be enough for this lab.
- You can try `test_snapshot_rpc_3b` first

[raftpaper]:https://raft.github.io/raft.pdf