Create 8.0 docs
otoolep committed Dec 6, 2023
1 parent 876d288 commit 4131b5a
Showing 8 changed files with 41 additions and 54 deletions.
12 changes: 6 additions & 6 deletions content/en/docs/Clustering/Automatic clustering/_index.md
@@ -13,27 +13,27 @@ For simplicity, let's assume you want to run a 3-node rqlite cluster. The networ
Node 1:
```bash
rqlited -node-id 1 -http-addr=$HOST1:4001 -raft-addr=$HOST1:4002 \
-bootstrap-expect 3 -join http://$HOST1:4001,http://$HOST2:4001,http://$HOST3:4001 data
-bootstrap-expect 3 -join $HOST1:4002,$HOST2:4002,$HOST3:4002 data
```
Node 2:
```bash
rqlited -node-id 2 -http-addr=$HOST2:4001 -raft-addr=$HOST2:4002 \
-bootstrap-expect 3 -join http://$HOST1:4001,http://$HOST2:4001,http://$HOST3:4001 data
-bootstrap-expect 3 -join $HOST1:4002,$HOST2:4002,$HOST3:4002 data
```
Node 3:
```bash
rqlited -node-id 3 -http-addr=$HOST3:4001 -raft-addr=$HOST3:4002 \
-bootstrap-expect 3 -join http://$HOST1:4001,http://$HOST2:4001,http://$HOST3:4001 data
-bootstrap-expect 3 -join $HOST1:4002,$HOST2:4002,$HOST3:4002 data
```

`-bootstrap-expect` should be set to the number of nodes that must be available before the bootstrapping process will commence, in this case 3. You also set `-join` to the HTTP URL of all 3 nodes in the cluster. **It's also required that each launch command has the same values for `-bootstrap-expect` and `-join`.**
`-bootstrap-expect` should be set to the number of nodes that must be available before the bootstrapping process will commence, in this case 3. You also set `-join` to the Raft addresses of all 3 nodes in the cluster. **It's also required that each launch command has the same values for `-bootstrap-expect` and `-join`.**

After the cluster has formed, you can launch more nodes with the same options. A node will always attempt to first perform a normal cluster-join using the given join addresses, before trying the bootstrap approach.
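For illustration, a hypothetical fourth node (its network address `$HOST4` is assumed here) could be launched with exactly the same options once the cluster has formed, and it will simply join:
```bash
rqlited -node-id 4 -http-addr=$HOST4:4001 -raft-addr=$HOST4:4002 \
 -bootstrap-expect 3 -join $HOST1:4002,$HOST2:4002,$HOST3:4002 data
```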

### Docker
With Docker you can launch every node identically:
```bash
docker run rqlite/rqlite -bootstrap-expect 3 -join http://$HOST1:4001,http://$HOST2:4001,http://$HOST3:4001
docker run rqlite/rqlite -bootstrap-expect 3 -join $HOST1:4002,$HOST2:4002,$HOST3:4002
```
where `$HOST[1-3]` are the expected network addresses of the containers.

@@ -152,6 +152,6 @@ If you wish a single Consul or etcd key-value system to support multiple rqlite
## Design
When using _Automatic Bootstrapping_, each node notifies all other nodes of its existence. The first node to have been contacted by enough other nodes (set by `-bootstrap-expect`) bootstraps the cluster. Only one node can bootstrap a cluster, so any other node that attempts to do so later will fail, and instead become a _Follower_ in the new cluster.

When using either Consul or etcd for automatic clustering, rqlite uses the key-value store of those systems. In each case only one node will succeed in atomically setting its HTTP URL in the key-value store. This node will then declare itself Leader, and other nodes will then join with it. To prevent multiple nodes updating the Leader key at once, nodes use a check-and-set operation, only updating the Leader key if its value has not changed since it was last read by the node. See [this blog post](https://www.philipotoole.com/rqlite-7-0-designing-node-discovery-and-automatic-clustering/) for more details on the design.
When using either Consul or etcd for automatic clustering, rqlite uses the key-value store of those systems. In each case only one node will succeed in atomically setting its HTTP URL and Raft address in the key-value store. This node will then declare itself Leader, and other nodes will then join with it. To prevent multiple nodes updating the Leader key at once, nodes use a check-and-set operation, only updating the Leader key if its value has not changed since it was last read by the node. See [this blog post](https://www.philipotoole.com/rqlite-7-0-designing-node-discovery-and-automatic-clustering/) for more details on the design.

For DNS-based discovery, the rqlite nodes simply resolve the hostname, and use the returned network addresses, once the number of returned addresses is at least as great as the `-bootstrap-expect` value. Bootstrapping then proceeds as though the network addresses were passed at the command line via `-join`.
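As a sketch only -- the exact flag names and configuration format should be checked against the discovery documentation, and the hostname `rqlite.local` is assumed to resolve to the addresses of all nodes -- a DNS-based launch might look like:
```bash
# Assumption: rqlite.local resolves to the addresses of every node in the cluster.
rqlited -node-id 1 -http-addr=$HOST1:4001 -raft-addr=$HOST1:4002 \
 -disco-mode=dns -disco-config='{"name":"rqlite.local"}' -bootstrap-expect 3 data
```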
8 changes: 3 additions & 5 deletions content/en/docs/Clustering/general-guidelines/_index.md
@@ -31,19 +31,19 @@ _It's called the "Raft" address because that will be the port the node will use

With this command a single node is started, listening for client requests on port 4001 and listening on port 4002 for intra-cluster communication requests from other nodes. Note that the addresses passed to `-http-addr` and `-raft-addr` must be reachable from other nodes so that nodes can find each other over the network -- these addresses will be broadcast to other nodes during the _Join_ operation. If a node needs to bind to one address, but advertise a different address to other nodes, you must also set `-http-adv-addr` and `-raft-adv-addr`.
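For example -- a hypothetical setup, with `node1.example.com` standing in for a routable name -- a node behind NAT might bind to all interfaces but advertise a different address:
```bash
# Bind to all interfaces, but advertise the routable hostname to other nodes.
rqlited -node-id 1 -http-addr 0.0.0.0:4001 -http-adv-addr node1.example.com:4001 \
 -raft-addr 0.0.0.0:4002 -raft-adv-addr node1.example.com:4002 ~/node
```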

`-node-id` can be any string, as long as it's unique for the cluster. It also shouldn't change, once chosen for this node. The network addresses can change however. This node stores its state at `~/node`.
`-node-id` can be any string, as long as it's unique for the cluster. It also shouldn't change, once chosen for this node. The network addresses can change however. This node stores its state in `~/node`.

To join a second node to this leader, execute the following command on _host2_:
```bash
# Run this on host 2:
$ rqlited -node-id 2 -http-addr host2:4001 -raft-addr host2:4002 -join http://host1:4001 ~/node
$ rqlited -node-id 2 -http-addr host2:4001 -raft-addr host2:4002 -join host1:4002 ~/node
```
_If a node receives a join request, and that node is not actually the leader of the cluster, the receiving node will automatically redirect the requesting node to the Leader node. As a result a node can actually join a cluster by contacting any node in the cluster. You can also specify multiple join addresses, and the node will try each address until joining is successful -- see the example after the 3-node setup below._

Once executed you now have a cluster of two nodes. But for fault-tolerance you actually need a 3-node cluster, so launch a third node like so on _host3_:
```bash
# Run this on host 3:
$ rqlited -node-id 3 -http-addr host3:4001 -raft-addr host3:4002 -join http://host1:4001 ~/node
$ rqlited -node-id 3 -http-addr host3:4001 -raft-addr host3:4002 -join host1:4002 ~/node
```
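Because a join request sent to any node is redirected to the Leader, a later node can simply list every member. A hypothetical fourth node, for instance, would work through the addresses until one join succeeds:
```bash
# Run this on host 4 (hypothetical extra node):
$ rqlited -node-id 4 -http-addr host4:4001 -raft-addr host4:4002 -join host1:4002,host2:4002,host3:4002 ~/node
```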
_When simply restarting a node, there is no further need to pass `-join`. However, if a node does attempt to join a cluster it is already a member of, and neither its node ID nor its Raft network address has changed, then the cluster Leader will ignore the join request as there is nothing to do -- the joining node is already a fully-configured member of the cluster. However, if either the node ID or Raft network address of the joining node has changed, the cluster Leader will first automatically remove the joining node from the cluster configuration before processing the join request. For most applications this is an implementation detail which can be safely ignored, and cluster-joins are basically idempotent._
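For example, restarting node 2 requires no `-join` at all, since its cluster configuration is already stored under `~/node`:
```bash
# Run this on host 2 -- the node rejoins the cluster using its stored state.
$ rqlited -node-id 2 -http-addr host2:4001 -raft-addr host2:4002 ~/node
```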

@@ -87,8 +87,6 @@ where `host` is any node in the cluster. If you do not remove a failed node the
If you cannot bring sufficient nodes back online such that the cluster can elect a leader, follow the instructions in the section titled _Dealing with failure_.

### Removing a node automatically on shutdown
> This option is **not supported** on clusters which enable Basic Auth on the HTTP API.
Sometimes it makes sense for a node to automatically remove itself when it gracefully shuts down. If you want this behaviour, pass `-raft-cluster-remove-shutdown=true` to rqlite at launch time. If the node is shut down **gracefully** (it receives `SIGTERM` for example) it will first contact the Leader and remove itself from the cluster, and then the rqlite process will terminate. As a result the Leader will not continue to contact the node after it shuts down. This removal operation also reduces the cluster quorum size.
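A sketch of launching a node with this behaviour enabled, reusing the flags from the examples above:
```bash
# On graceful shutdown (e.g. SIGTERM) this node removes itself from the cluster.
rqlited -node-id 3 -http-addr host3:4001 -raft-addr host3:4002 \
 -raft-cluster-remove-shutdown=true ~/node
```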

### Automatically removing failed nodes
2 changes: 1 addition & 1 deletion content/en/docs/FAQ/_index.md
@@ -83,7 +83,7 @@ Then they must do it by sending write requests to the leader node. But if they c
It supports [a form of transactions](/docs/api/api/#transactions). You can wrap a bulk update in a transaction such that all the statements in the bulk request will succeed, or none of them will. However the behaviour of rqlite is undefined if you send explicit `BEGIN`, `COMMIT`, or `ROLLBACK` statements. This is not because they won't work -- they will -- but if your node (or cluster) fails while a transaction is in progress, the system may be left in a hard-to-use state. So until rqlite can offer strict guarantees about its behaviour if it fails during a transaction, using `BEGIN`, `COMMIT`, and `ROLLBACK` is officially unsupported. Unfortunately this does mean that rqlite may not be suitable for some applications.
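For example, the `transaction` query parameter wraps a bulk request in a transaction -- a sketch, assuming a node on localhost:4001 and an existing table `foo`:
```bash
# Either both INSERTs are applied, or neither is.
curl -XPOST 'localhost:4001/db/execute?transaction' \
 -H "Content-Type: application/json" \
 -d '[
   "INSERT INTO foo(name) VALUES(\"fiona\")",
   "INSERT INTO foo(name) VALUES(\"declan\")"
 ]'
```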

## Can I modify the SQLite file directly?
No, you must only change the database using the HTTP API. The moment you directly modify the SQLite file under any node (if running in _on-disk_ mode) the behavior of rqlite is undefined. In other words, you run the risk of breaking your cluster.
No, you must only change the database using the HTTP API. The moment you directly modify the SQLite file, including any changes to its journaling mode, under any node the behavior of rqlite is undefined. In other words, you run the risk of breaking your cluster.

## Can I read the SQLite file directly?
Yes, you can read the SQLite file directly -- some end-users do this in production -- but it has not been deeply tested.
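If you do read the file directly, it would be prudent to open it read-only -- a sketch, with a hypothetical database path (see `-on-disk-path`) and an assumed table `foo`:
```bash
# Open the database read-only so rqlite's own writes cannot be disturbed.
sqlite3 -readonly /disk2/node1/db.sqlite 'SELECT COUNT(*) FROM foo;'
```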
3 changes: 3 additions & 0 deletions content/en/docs/Guides/Monitoring rqlite/_index.md
@@ -43,6 +43,9 @@ runtime:
```bash
curl localhost:4001/nodes?pretty

# Request an improved JSON format, which is easier to parse.
curl 'localhost:4001/nodes?pretty&ver=2'

# Also check non-voting nodes.
curl 'localhost:4001/nodes?nonvoters&pretty'

46 changes: 10 additions & 36 deletions content/en/docs/Guides/Performance/_index.md
@@ -6,16 +6,6 @@ weight: 30
---
rqlite replicates SQLite for fault-tolerance. It does not replicate it for performance. In fact performance is reduced relative to a standalone SQLite database due to the nature of distributed systems. _There is no such thing as a free lunch_.

## In-memory databases

By default rqlite uses an in-memory SQLite database to maximise performance. In this mode no actual SQLite file is created and the entire database is stored in memory. If you wish rqlite to use an actual file-based SQLite database, pass `-on-disk` to rqlite on start-up.

**Does using an in-memory SQLite database put my data at risk?**

No.

Since the Raft log is the authoritative store for all data, and it is stored on disk by each node, an in-memory database can be fully recreated on start-up from the information stored in the Raft log. Using an in-memory database does not put your data at risk.

## Performance Factors

rqlite performance -- defined as the number of database updates performed in a given period of time -- is primarily determined by two factors:
@@ -25,10 +15,10 @@ rqlite performance -- defined as the number of database updates performed in a g
Depending on your machine (particularly its IO performance) and network, individual INSERT performance could be anything from 10 operations per second to more than 200 operations per second.

### Disk
Disk performance is the single biggest determinant of rqlite performance _on a low-latency network_. This is because every change to the system must go through the Raft subsystem, and the Raft subsystem calls `fsync()` after every write to its log. Raft does this to ensure that the change is safely persisted in permanent storage before writing those changes to the SQLite database. This is why rqlite runs with an in-memory database by default, as using an on-disk SQLite database would put even more load on the disk, reducing the disk throughput available to Raft.
Disk performance is the single biggest determinant of rqlite performance _on a low-latency network_. This is because every change to the system must go through the Raft subsystem, and the Raft subsystem calls `fsync()` after every write to its log. Raft does this to ensure that the change is safely persisted in permanent storage before writing those changes to the SQLite database.

### Network
When running a rqlite cluster, network latency is also a factor -- and will become the performance bottleneck once latency gets high enough. This is because Raft must contact every node **twice** before a change is committed to the Raft log. Obviously the faster your network, the shorter the time to contact each node.
When running a rqlite cluster, network latency is also a factor -- and will become the performance bottleneck once latency gets high enough. This is because Raft must contact every other node **twice** before a change is committed to the Raft log (though it does contact those nodes in parallel). Obviously the faster your network, the shorter the time to contact the nodes.

## Improving Performance

@@ -40,7 +30,7 @@ The more SQLite statements you can include in a single request to a rqlite node,
By using the [bulk API](/docs/api/bulk-api/), transactions, or both, throughput will increase significantly, often by 2 orders of magnitude. This speed-up is due to the way Raft and SQLite work. So for high throughput, execute as many operations as possible within a single request, transaction, or both.
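As an illustrative sketch -- node address and table `foo` assumed -- a single bulk request can carry many statements:
```bash
# One request, one Raft round: all three INSERTs travel together.
curl -XPOST 'localhost:4001/db/execute' \
 -H "Content-Type: application/json" \
 -d '[
   "INSERT INTO foo(name) VALUES(\"fiona\")",
   "INSERT INTO foo(name) VALUES(\"sinead\")",
   "INSERT INTO foo(name) VALUES(\"declan\")"
 ]'
```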

### Queued Writes
If you can tolerate a small risk of some data loss in the event that a node crashes, you could consider using [Queued Writes](/docs/api/queued-writes/). Using Queued Writes can easily give you orders of magnitude improvement in performance, without any need to change client code.
If you can tolerate a very small risk of some data loss in the event that a node crashes, you could consider using [Queued Writes](/docs/api/queued-writes/). Using Queued Writes can easily give you orders of magnitude improvement in performance, without any need to change client code.
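A sketch of a queued write, assuming the `queue` query parameter described in the Queued Writes documentation and the same table `foo`:
```bash
# The node acknowledges immediately; the data is committed shortly afterwards.
curl -XPOST 'localhost:4001/db/execute?queue' \
 -H "Content-Type: application/json" \
 -d '["INSERT INTO foo(name) VALUES(\"fiona\")"]'
```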

### Use more powerful hardware
Obviously running rqlite on better disks, better networks, or both, will improve performance.
@@ -52,36 +42,20 @@ mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk
```
**This comes with risks, however**. The behavior of rqlite when a node fails, but committed entries in the Raft log have not actually been permanently persisted, **is not defined**. But if your policy is to completely deprovision your rqlite node, or rqlite cluster, in the event of any node failure, this option may be of interest to you. Perhaps you always rebuild your rqlite cluster from a different source of data, so you can recover a rqlite cluster regardless of its state. Testing shows that using rqlite with a memory-only file system can result in 100x improvement in performance.

### Improving read-write concurrency
SQLite can offer better concurrent read and write support when using an on-disk database, compared to in-memory databases. But as explained above, using an on-disk SQLite database may impact write performance. However, since database-update performance will be better with an in-memory database, improving read-write concurrency may not be needed in practice.

However, if you enable an on-disk SQLite database, but then place the SQLite database on a memory-backed file system, you can have the best of both worlds. You can dedicate your disk to the Raft log, but still get better read-write concurrency with SQLite. You can specify the SQLite database file path via the `-on-disk-path` flag.

An alternative approach would be to place the SQLite on-disk database on a different physical disk than that storing the Raft log, but this may still not be as performant as an in-memory file system for the SQLite database. You should run your own testing to determine which setup meets your needs.

## In-memory Database Limits

> **rqlite was not designed for very large datasets**: While there are no hardcoded limits in the rqlite software, the nature of Raft means that the entire SQLite database is periodically copied to disk, and occasionally copied, in full, between nodes. Your hardware may not be able to process those large data operations successfully. You should test your system carefully when working with multi-GB databases.
In-memory SQLite databases (the default configuration) are currently limited to 2GiB in size. One way to get around this limit is to use an on-disk database, by passing `-on-disk` to `rqlited`. But this may impact write-performance, since disk is slower than memory. However, when running in on-disk mode, rqlite uses SQLite WAL mode, which uses the disk efficiently and any drop in write-performance may not be significant.

If you switch to on-disk SQLite, and find write-performance suffers more than you like, there are a couple of ways to address this. One option is to place the Raft log on one disk, and the SQLite database on a different disk.

Another option is to run rqlite in on-disk mode but place the SQLite database file on a memory-backed filesystem. That way you can use larger databases, and still have performance similar to running with an in-memory SQLite database.

In either case to control where rqlite places the SQLite database file, set `-on-disk-path` when launching `rqlited`. **Note that you should still place the `data` directory on an actual disk, so that the Raft log is always on a physical disk, to ensure your data is not lost if a node restarts.**
### Placing the SQLite database on a different file system
Another option is to run rqlite with the SQLite database file on a different filesystem than the Raft log. This can result in better write performance as each system gets its own dedicated I/O resources.

### Linux examples
#### Linux examples
The first example below shows rqlite storing the Raft log on one disk (_disk1_), but the on-disk SQLite database on a second disk (_disk2_).
```bash
rqlited -on-disk -on-disk-path /disk2/node1/db.sqlite /disk1/data
rqlited -on-disk-path /disk2/node1/db.sqlite /disk1/data
```

A second example of running rqlite with a SQLite file on a memory-backed file system, and keeping the data directory on persistent disk, is shown below. The data directory is where the Raft log is stored. The example below would allow up to a 4GB SQLite database.
A second example of running rqlite with a SQLite file on a memory-backed file system, but keeping the data directory on persistent disk, is shown below. The data directory is where the Raft log is stored.
```bash
# Create a RAM disk, and then launch rqlite, telling it to
# put the SQLite database on the RAM disk.
mount -t tmpfs -o size=4096m tmpfs /mnt/ramdisk
rqlited -on-disk -on-disk-path /mnt/ramdisk/node1/db.sqlite /path_to_persistent_disk/data
rqlited -on-disk-path /mnt/ramdisk/node1/db.sqlite /path_to_persistent_disk/data
```
where `/path_to_persistent_disk/data` is a directory path on your persistent disk.
where `/path_to_persistent_disk/data` is a directory path on your persistent disk. Because the Raft log is the authoritative source of truth, storing the SQLite database on a memory-backed filesystem does not risk data loss. rqlite always completely rebuilds the SQLite database from scratch on restart.
