# Coding Guidelines

Below is a set of coding guidelines followed across the rust-libp2p code base.

## Hierarchical State Machines

If you squint, rust-libp2p is just a big hierarchy of [state machines](https://en.wikipedia.org/wiki/Finite-state_machine) where parents pass events down to their children and children pass events up to their parents.

![Architecture](architecture.svg)

<details>
  <summary>Simplified sketch of the hierarchy (illustrative; see architecture.svg for the full picture)</summary>

```
Swarm
├── Transport
├── NetworkBehaviour (e.g. Kademlia, Gossipsub)
└── Established connections
    └── ConnectionHandler
```
</details>

Using hierarchical state machines is a deliberate choice throughout the rust-libp2p code base.
It makes reasoning about control and data flow simple.
It works well with Rust's `Future` model.
It allows fine-grained control, e.g. over the order in which child state machines are polled.

The above comes with downsides.
It feels more verbose.
The mix of control flow (`loop`, `return`, `break`, `continue`) in `poll` functions together with the asynchronous and thus decoupled communication via events can be very hard to understand.
Both are a form of complexity that we are trading for correctness and performance, which aligns with Rust's and rust-libp2p's goals.

The architecture pattern of hierarchical state machines should be used wherever possible.

### Conventions for `poll` implementations

The `poll` method of a single state machine can be complex, especially when that state machine itself `poll`s many child state machines.
The patterns shown below have proven useful and should be followed across the code base.

``` rust
fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
    loop {
        // Poll the child state machines in a deliberate order. (Sketch;
        // `child_1` and `child_2` are illustrative names, and the event
        // handling is elided.)
        if let Poll::Ready(_event) = self.child_1.poll(cx) {
            // Handle the event, then re-poll from the top, as progress in
            // one child may unblock another.
            continue;
        }

        if let Poll::Ready(_event) = self.child_2.poll(cx) {
            continue;
        }

        // No child can make progress. The child `poll` calls above have
        // registered the waker from `cx`, so it is safe to return `Pending`.
        return Poll::Pending;
    }
}
```

### Prioritize local work over new work from a remote

When handling multiple work streams, prioritize local work items over accepting new work items from a remote.
Take as an example the following state machine, which reads from and writes to a socket and returns results to its parent:

``` rust
struct SomeStateMachine {
    // Illustrative field names; the concrete types are assumptions.
    socket: Socket,
    events_to_return_to_parent: VecDeque<Event>,
    messages_to_send_on_socket: VecDeque<Message>,
}

impl SomeStateMachine {
    fn poll(&mut self, cx: &mut Context<'_>) -> Poll<Event> {
        loop {
            // First, return results of locally finished work to the parent.
            if let Some(event) = self.events_to_return_to_parent.pop_front() {
                return Poll::Ready(event);
            }

            // Next, make progress on local work, e.g. flush queued messages
            // to the socket (elided).

            // Only then accept new work, reading the next item from the
            // socket (elided).

            return Poll::Pending;
        }
    }
}
```

This prioritization keeps local queues small and thus latencies low, and it ensures that results of finished work are delivered to the parent before new work is accepted from the remote.

## Bound everything

The concept of unboundedness is an illusion.
Use bounded mechanisms to prevent unbounded memory growth and high latencies.

### Channels

When using channels (e.g. `futures::channel::mpsc` or `std::sync::mpsc`) always use the bounded variant, never the unbounded variant.
When using a bounded channel, a slow consumer eventually slows down a fast producer once the channel bound is reached, ideally granting the slow consumer more system resources, e.g. CPU time, keeping queues small and thus latencies low.
When using an unbounded channel, a fast producer continues being a fast producer, growing the channel buffer indefinitely and increasing latency until the illusion of unboundedness breaks and the system runs out of memory.
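
As a minimal sketch, assuming the `futures` crate and a hypothetical `Event` payload type, the bounded variant provides this backpressure automatically:

``` rust
use futures::{channel::mpsc, SinkExt, StreamExt};

// Hypothetical payload type for illustration.
struct Event;

async fn produce(mut tx: mpsc::Sender<Event>) {
    loop {
        // `send` resolves only once the channel has free capacity, so a slow
        // consumer eventually slows this producer down.
        if tx.send(Event).await.is_err() {
            break; // The consumer hung up.
        }
    }
}

async fn consume(mut rx: mpsc::Receiver<Event>) {
    while let Some(_event) = rx.next().await {
        // Process the event, thereby granting the producer capacity to send more.
    }
}

fn make_channel() -> (mpsc::Sender<Event>, mpsc::Receiver<Event>) {
    mpsc::channel(32) // Bounded, in contrast to `mpsc::unbounded()`.
}
```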

One may use an unbounded channel if one enforces backpressure through an out-of-band mechanism, e.g. the consumer granting the producer send-tokens through a side-channel.
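
A sketch of such a scheme, with hypothetical names: the consumer pre-grants send-tokens over a bounded side-channel, and the producer takes one token per item sent on the otherwise unbounded data channel, so the token channel's bound limits the number of items in flight:

``` rust
use futures::{channel::mpsc, StreamExt};

async fn produce(data_tx: mpsc::UnboundedSender<u64>, mut tokens_rx: mpsc::Receiver<()>) {
    let mut item = 0;
    // Wait for a token before each send. The consumer sends a `()` on the
    // token channel whenever it is ready to take on more work.
    while tokens_rx.next().await.is_some() {
        if data_tx.unbounded_send(item).is_err() {
            break; // The consumer hung up.
        }
        item += 1;
    }
}
```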

### Local queues

What applies to channels shared across potentially concurrent actors (e.g. future tasks or OS threads) applies equally to queues owned by a single actor.
For example, reading events from a socket into a `Vec<Event>` without a mechanism bounding the size of that `Vec<Event>` can likewise lead to unbounded memory growth and high latencies.
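
A sketch of such a bound, with an assumed constant; the right value depends on the protocol:

``` rust
// Assumed bound for illustration.
const MAX_QUEUED_EVENTS: usize = 1024;

struct Event;

fn on_new_event(events: &mut Vec<Event>, event: Event) -> Result<(), Event> {
    // Refuse (or defer) new work instead of growing the queue indefinitely.
    if events.len() >= MAX_QUEUED_EVENTS {
        return Err(event);
    }
    events.push(event);
    Ok(())
}
```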

Note that rust-libp2p fails at this guideline, i.e. still has many unbounded local queues.

## No premature optimizations

Optimizations that add complexity need to be accompanied by a proof of their effectiveness.

This also applies to increasing buffer or channel sizes, as the downside of such pseudo-optimizations is an increased memory footprint and higher latency.

## Keep things sequential unless proven to be slow

Concurrency adds complexity.
Concurrency adds overhead due to synchronization.
Thus unless proven to be a bottleneck, don't make things concurrent.
As an example, the hierarchical `NetworkBehaviour` state machine runs sequentially.
It is easy to debug because it runs sequentially.
Thus far there has been no proof that shows a speed-up when running it concurrently.

## Use `async/await` for sequential execution only

Using `async/await` for sequential execution makes things significantly simpler.
Though unfortunately using `async/await` does not allow accessing methods on the object being `await`ed unless paired with some synchronization mechanism like an `Arc<Mutex<_>>`.

Example: Read and, once done, write from/to a socket. Use `async/await`.
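
A sketch, assuming an `AsyncRead + AsyncWrite` socket and pre-allocated buffers:

``` rust
use futures::{AsyncReadExt, AsyncWriteExt};

async fn read_then_write(
    socket: &mut (impl AsyncReadExt + AsyncWriteExt + Unpin),
    read_buf: &mut [u8],
    write_buf: &[u8],
) -> std::io::Result<()> {
    // Sequential steps compose naturally with `async/await`.
    socket.read_exact(read_buf).await?;
    socket.write_all(write_buf).await?;
    Ok(())
}
```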

By contrast, reading and writing concurrently calls for a manual `poll`-based loop.
A sketch, with the actual `Poll` handling elided:

``` rust
loop {
    // Drive both directions within a single iteration, e.g. via
    // `socket.poll_read(cx, …)` and `socket.poll_write(cx, …)`,
    // handling `Poll::Ready`/`Poll::Pending` for each.
}
```

When providing `async` methods, make it explicit whether it is safe to cancel the resulting `Future`, i.e. whether it is safe to drop the `Future` returned by the `async` method.
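
One way to make this explicit is in the method's documentation; a sketch with hypothetical types:

``` rust
struct Message;
struct Connection;

impl Connection {
    /// Sends `msg` on the underlying stream.
    ///
    /// The returned `Future` is safe to drop (i.e. safe to cancel): if it is
    /// dropped before completion, no bytes of `msg` have been written yet.
    /// (Illustrative claim for the sketch.)
    async fn send(&mut self, msg: Message) -> std::io::Result<()> {
        let _ = msg;
        todo!()
    }
}
```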

## Don't communicate by sharing memory; share memory by communicating.

The majority of rust-libp2p's code base follows the above Golang philosophy, e.g. using channels instead of mutexes.
This pattern enforces single ownership over data, which works well with Rust's ownership model and makes reasoning about data flow easier.
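
A sketch with assumed names: instead of sharing state through an `Arc<Mutex<_>>`, a single task owns the state, and other tasks communicate with it through messages:

``` rust
use futures::channel::{mpsc, oneshot};
use futures::StreamExt;

enum Command {
    Increment,
    Get { response: oneshot::Sender<u64> },
}

async fn counter_task(mut commands: mpsc::Receiver<Command>) {
    // The counter is exclusively owned by this task; no lock is needed.
    let mut counter: u64 = 0;
    while let Some(command) = commands.next().await {
        match command {
            Command::Increment => counter += 1,
            Command::Get { response } => {
                let _ = response.send(counter);
            }
        }
    }
}
```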

### Further Reading

- https://go.dev/blog/codelab-share

## Use iteration not recursion

Rust does not support tail call optimization, thus using recursion may grow the stack without bound.
Instead, use iteration, e.g. via `loop` or `for`.
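
As a small sketch, a computation that is often written recursively, expressed with constant stack usage via a `for` loop:

``` rust
// Iterative factorial: stack usage stays constant regardless of `n`,
// whereas a recursive formulation grows the stack linearly in `n`.
fn factorial(n: u64) -> u64 {
    let mut result: u64 = 1;
    for i in 2..=n {
        result *= i;
    }
    result
}
```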
