Skip to content

Commit

Permalink
Update 25-fasterkv-recovery.md
Browse files Browse the repository at this point in the history
  • Loading branch information
badrishc committed May 6, 2021
1 parent ebe4c31 commit 37b2a78
Showing 1 changed file with 45 additions and 34 deletions.
79 changes: 45 additions & 34 deletions docs/_docs/25-fasterkv-recovery.md
Expand Up @@ -2,56 +2,43 @@
title: "FasterKV Checkpointing and Recovery"
permalink: /docs/fasterkv-recovery/
excerpt: "FasterKV Checkpointing and Recovery"
last_modified_at: 2021-04-29
last_modified_at: 2021-05-06
toc: true
---

## Checkpointing and Recovery

### Overall Summary

FASTER supports asynchronous non-blocking **checkpoint-based recovery**. Every new checkpoint persists (or makes durable) additional user-operations
(Read, Upsert or RMW). FASTER allows clients to keep track of operations that have persisted and those that have not using
a session-based API.
Recall that FASTER basically consists of a primary hash index operating over a hybrid log that spans disk
and main memory. By default, operations on main memory are lost on failure, similar to in-memory stores. To address
this, FASTER supports asynchronous non-blocking **checkpoint-based recovery**.

This feature is based on a recovery model called Concurrent Prefix Recovery (CPR for short). You can read more about
CPR in the research paper [here](https://www.microsoft.com/en-us/research/uploads/prod/2019/01/cpr-sigmod19.pdf).
Briefly, CPR is based on (periodic) group commit. However, instead of using an expensive
write-ahead log (WAL) which can kill FASTER's high performance, CPR: (1) provides a semantic description of committed
operations, of the form “all operations until offset Ti in session i”; and (2) uses asynchronous
incremental checkpointing instead of a WAL to implement group commit in a scalable bottleneck-free manner.

Recall that each FASTER client starts a session, associated with a unique session ID (or name). All FASTER session operations
(Read, Upsert, RMW) carry a monotonic sequence number (sequence numbers are implicit in case of async calls). At any point in
time, one may call the checkpointing API to initiate an asynchronous checkpoint of FASTER. After invoking the checkpoint, each FASTER
session is (eventually) notified of a commit point. A commit point consists of (1) a sequence number, such that all operations
until, and no operations after, that sequence number, are guaranteed to be persisted as part of that checkpoint; (2) an optional
exception list of operations that were not part of the commit because they went pending and could not complete before the
checkpoint, because the session was not active at the time of checkpointing.
The user invokes a checkpoint periodically. Every new checkpoint persists (or makes durable) the effects
of all operations (Read, Upsert, RMW) until that point. FASTER allows clients to keep track of which
operations have persisted using a session-based API. In this recovery model (details [here](#concurrent-prefix-recovery)), a
session can issue a sequence of operations with monotonic operation sequence numbers. After a checkpoint,
and during recovery, the session is informed of what prefix of issued sequence numbers is now durable on
the session.

The commit point information can be used by the session to clear any in-memory buffer of operations waiting to be performed.
During recovery, sessions can continue using `ResumeSession` invoked with the same session ID. The function returns the thread-local
sequence number until which that session hash been recovered. The new thread may use this information to replay all uncommitted
operations since that point.

With async session operations on FASTER, operations return as soon as they complete, before commit. In order to wait for commit,
you simply issue an `await session.WaitForCommitAsync()` call. The call completes only after the operation is made persistent by
an asynchronous commit (checkpoint). The user is responsible for initiating the checkpoint asynchronously.
Note that FASTER operations return as soon as they complete in memory. In order to wait for commit, you simply issue
an `await session.WaitForCommitAsync()` call. This call completes only after all prior operations on the session are
made persistent. Note that the user is responsible for initiating the checkpoint.

### Taking Checkpoints

A FASTER checkpoint consists of an optional index checkpoint, coupled with a later log
checkpoint. FASTER first recovers the index and then replays the relevant part of the log
to get back to a consistent recovered state. If an index checkpoint is unavailable, FASTER
replays the entire log to reconstruct the index. An index checkpoint is taken as follows:
replays the entire log to reconstruct the index. An index-only checkpoint is taken as follows:

```cs
await store.TakeIndexCheckpointAsync();
```

FASTER supports two notions of log checkpointing: Snapshot and Fold-Over.
FASTER supports two types of log checkpointing: Snapshot and Fold-Over.

### Snapshot Checkpoint
### Snapshot Log Checkpoint

This checkpoint is a full snapshot of in-memory portion of the hybrid log into a separate
snapshot file in the checkpoint folder. We recover using the main log followed by reading the
Expand All @@ -64,7 +51,7 @@ a delta log file in the same folder as the base snapshot. This is specified usin
await store.TakeHybridLogCheckpointAsync(CheckpointType.Snapshot, tryIncremental: false);
```

### Fold-Over Checkpoint
### Fold-Over Log Checkpoint

A fold-over checkpoint simply flushes the main data log to disk, making it read-only, and
writes a small metadata file (`info.dat`) to the checkpoint folder. This is an incremental
Expand All @@ -91,8 +78,9 @@ await store.TakeFullCheckpointAsync(CheckpointType.FoldOver);
```

This is usually more expensive than log-only checkpoints as it needs to write the entire
hash table to disk. A preferred approach is to take frequent log-only checkpoints and
take an index checkpoint at coarse grained intervals in order to reduce recovery time.
hash table to disk. When using a large hash index, a preferred approach is to take
frequent log-only checkpoints and take an index checkpoint at coarse-grained intervals
in order to reduce recovery time.

### Checkpoint Management

Expand All @@ -103,7 +91,7 @@ setting `CheckpointSettings.RemoveOutdated` to `true`. The last two index checkp
and the last log checkpoint are kept. We keep the last two index checkpoints because the
last index checkpoint may not be usable in case there is no subsequent log checkpoint
available. Make sure every index checkpoint is followed by at least one log checkpoint, for
the index checkpoint to be usable for recovery.
the index checkpoint to be usable during recovery.

### Examples

Expand Down Expand Up @@ -181,4 +169,27 @@ public class PersistenceExample
t.Start();
}
}
```
```


### Concurrent Prefix Recovery

Concurrent Prefix Recovery (CPR for short) is the default recovery model of FASTER. You can read more about
CPR in the research paper [here](https://www.microsoft.com/en-us/research/uploads/prod/2019/01/cpr-sigmod19.pdf).
Briefly, CPR is based on (periodic) group commit. However, instead of using an expensive write-ahead log (WAL)
which can kill FASTER's high performance, CPR: (1) provides a semantic description of committed
operations, of the form “all operations until offset Ti in session i”; and (2) uses asynchronous
incremental checkpointing instead of a WAL to implement group commit in a scalable bottleneck-free manner.

Recall that each FASTER client starts a session, associated with a unique session ID (or name). All FASTER session operations
(Read, Upsert, RMW) carry a monotonic sequence number (sequence numbers are implicit in case of async calls). At any point in
time, one may call the checkpointing API to initiate an asynchronous checkpoint of FASTER. After invoking the checkpoint, each FASTER
session is (eventually) notified of a commit point. A commit point consists of (1) a sequence number, such that all operations
until, and no operations after, that sequence number, are guaranteed to be persisted as part of that checkpoint; (2) an optional
exception list of operations that were not part of the commit because they went pending and could not complete before the
checkpoint, because the session was not active at the time of checkpointing.

The commit point information can be used by the session to clear any in-memory buffer of operations waiting to be performed.
During recovery, sessions can continue using `ResumeSession` invoked with the same session ID. The function returns the thread-local
sequence number until which that session hash been recovered. The new thread may use this information to replay all uncommitted
operations since that point.

0 comments on commit 37b2a78

Please sign in to comment.