From 3c6e472ba3b03e5626c6f03948310f2623e94aaa Mon Sep 17 00:00:00 2001 From: MasterPtato Date: Wed, 12 Nov 2025 17:41:45 -0800 Subject: [PATCH] chore: write/update docs --- CLAUDE.md | 3 - docs/engine/ACTOR_KV.md | 68 +++++ docs/engine/ACTOR_LIFECYCLE.md | 111 ++++++++ docs/engine/GASOLINE.md | 11 + docs/engine/GUARD.md | 41 +++ docs/engine/HIBERNATING_WS.md | 25 ++ docs/engine/RUNNER_SHUTDOWN.md | 17 ++ docs/engine/SERVERLESS.md | 273 ++++++++++++++++++++ docs/engine/TERMINOLOGY.md | 10 + engine/sdks/schemas/runner-protocol/v3.bare | 1 + 10 files changed, 557 insertions(+), 3 deletions(-) create mode 100644 docs/engine/ACTOR_KV.md create mode 100644 docs/engine/ACTOR_LIFECYCLE.md create mode 100644 docs/engine/GASOLINE.md create mode 100644 docs/engine/GUARD.md create mode 100644 docs/engine/HIBERNATING_WS.md create mode 100644 docs/engine/RUNNER_SHUTDOWN.md create mode 100644 docs/engine/SERVERLESS.md create mode 100644 docs/engine/TERMINOLOGY.md diff --git a/CLAUDE.md b/CLAUDE.md index 27f523d2a7..d5b7daf605 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -40,9 +40,6 @@ cargo test -- --nocapture # Check for linting issues cargo clippy -- -W warnings - -# When adding a new package to the workspace -deno run -A scripts/cargo/update_workspace.ts ``` ### Docker Development Environment diff --git a/docs/engine/ACTOR_KV.md b/docs/engine/ACTOR_KV.md new file mode 100644 index 0000000000..69a6935709 --- /dev/null +++ b/docs/engine/ACTOR_KV.md @@ -0,0 +1,68 @@ +# Actor KV Storage + +Each actor has its own private KV store which can be manipulated or accessed with the various provided KV operations. + +## Keys and Values + +A KV key is a byte array, aka a blob. A KV value is also a byte array/blob. + +Every set KV value contains metadata which includes the version and create timestamp of the key (version being a string byte array denoting the version of the Rivet Engine). 
 + +## Operations + +### Get + +- Input + - List of keys +- Output + - List of keys + - List of values + - List of metadata + +Keys that don't exist aren't included in the output, so it is important to read the output's list of keys. + +### List + +- Input + - Query mode + - All - Lists all keys up to the given limit + - Range - Lists all keys between the two given keys (exclusivity toggleable) + - Prefix - Lists all keys with the given key as a prefix + - Reverse - Whether to iterate keys in descending order instead of ascending + - Limit - Maximum number of keys to return +- Output + - List of keys + - List of values + - List of metadata + +### Put + +- Input + - List of keys + - List of values +- Output + - Empty + +### Delete + +- Input + - List of keys +- Output + - Empty + +### Drop + +- Input + - Empty +- Output + - Empty + +This operation deletes every key in the actor's KV store. Use cautiously. + +## Errors + +Every operation can return an error instead of its regular response. The error includes a message string. + +## Implementation Details + +Each KV request has a u32 request ID which is to be provided by the user (handled internally by RivetKit). Rivet makes no attempt to order or deduplicate the responses to KV requests; it is up to the client to match the responses to the requests via the request ID. 
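The request ID matching described under Implementation Details, and the note that Get omits missing keys, can be illustrated with a small client-side sketch. This is not RivetKit's actual implementation: `KvRequestTracker`, `PendingOp`, `KvResponse`, and `index_get_output` are hypothetical names invented for this example, and the response shapes are simplified (metadata is left out).

```rust
use std::collections::HashMap;

/// Simplified, hypothetical response shapes (metadata omitted for brevity).
#[derive(Debug, Clone, PartialEq)]
pub enum KvResponse {
    Get { keys: Vec<Vec<u8>>, values: Vec<Vec<u8>> },
    Empty,
    Error { message: String },
}

/// Hypothetical label for an in-flight request.
#[derive(Debug, Clone, PartialEq)]
pub enum PendingOp {
    Get,
    Put,
    Delete,
}

/// Tracks in-flight KV requests by their u32 request ID.
#[derive(Default)]
pub struct KvRequestTracker {
    next_id: u32,
    pending: HashMap<u32, PendingOp>,
}

impl KvRequestTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Reserve a request ID that is not currently in flight.
    pub fn register(&mut self, op: PendingOp) -> u32 {
        let mut id = self.next_id;
        // After u32 wraparound an ID may still be pending, so skip those.
        while self.pending.contains_key(&id) {
            id = id.wrapping_add(1);
        }
        self.next_id = id.wrapping_add(1);
        self.pending.insert(id, op);
        id
    }

    /// Match a response to its request. Responses may arrive in any order;
    /// an unknown or duplicate request ID yields `None`.
    pub fn resolve(
        &mut self,
        request_id: u32,
        response: KvResponse,
    ) -> Option<(PendingOp, KvResponse)> {
        self.pending.remove(&request_id).map(|op| (op, response))
    }
}

/// Index a Get output by key. Missing keys are simply absent from the map,
/// which is why callers should read the returned key list rather than
/// assume it lines up with the keys they asked for.
pub fn index_get_output(
    keys: Vec<Vec<u8>>,
    values: Vec<Vec<u8>>,
) -> HashMap<Vec<u8>, Vec<u8>> {
    keys.into_iter().zip(values).collect()
}
```

Because the tracker removes an entry when it resolves, a duplicated response for the same request ID is dropped on the client side, which covers the deduplication that Rivet itself does not perform.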
diff --git a/docs/engine/ACTOR_LIFECYCLE.md b/docs/engine/ACTOR_LIFECYCLE.md new file mode 100644 index 0000000000..7136b38a6f --- /dev/null +++ b/docs/engine/ACTOR_LIFECYCLE.md @@ -0,0 +1,111 @@ +# Actor Lifecycle Flow Diagram + +```mermaid +--- +config: + theme: mc + look: classic +--- +sequenceDiagram + participant A as API + participant U as User + participant G as Gateway + + participant R as Runner + participant RWS as Runner WS + participant RWF as Runner Workflow + participant AWF as Actor Workflow + + critical runner connection + R->>RWS: Connect + RWS->>RWF: Create runner workflow + end + + critical actor creation + U->>A: POST /actors + + A->>AWF: Create actor workflow + A->>U: + end + + critical initial request + U->>G: Request to actor + note over G: Await actor ready + + critical actor allocation + note over AWF: Allocate + + AWF->>RWF: Send StartActor + + RWF->>RWS: + RWS->>R: + note over R: Start actor + R->>RWS: Actor state update: Running + RWS->>RWF: + RWF->>AWF: + note over AWF: Publish Ready msg + end + AWF->>G: Receive runner ID + + G->>RWS: Tunnel ToClientRequestStart + RWS->>R: + note over R: Handle request + R->>RWS: ToServerResponseStart + RWS->>G: + G->>U: + end + + critical second request + U->>G: Request to actor + note over G: Actor already connectable + G->>RWS: Tunnel ToClientRequestStart + RWS->>R: + note over R: Handle request + R->>RWS: ToServerResponseStart + RWS->>G: + G->>U: + end + + note over A, AWF: Time passes + + critical actor sleep + R->>RWS: Actor intent: Sleep + RWS->>RWF: + RWF->>AWF: + note over AWF: Mark as sleeping + AWF->>RWF: Send StopActor + RWF->>RWS: + RWS->>R: + note over R: Stop actor + R->>RWS: Actor state update: Stopped + RWS->>RWF: + RWF->>AWF: + note over AWF: Sleep + end + + critical request to sleeping actor + U->>G: Request to actor + note over G: Actor sleeping + G->>AWF: Wake + note over G: Await actor ready + critical actor allocation + note over AWF: Allocate + AWF->>RWF: Send StartActor + 
 RWF->>RWS: + RWS->>R: + note over R: Start actor + R->>RWS: Actor state update: Running + RWS->>RWF: + RWF->>AWF: + note over AWF: Publish Ready msg + end + AWF->>G: Receive runner ID + + G->>RWS: Tunnel ToClientRequestStart + RWS->>R: + note over R: Handle request + R->>RWS: ToServerResponseStart + RWS->>G: + G->>U: + end +``` \ No newline at end of file diff --git a/docs/engine/GASOLINE.md b/docs/engine/GASOLINE.md new file mode 100644 index 0000000000..8d6ef0cdce --- /dev/null +++ b/docs/engine/GASOLINE.md @@ -0,0 +1,11 @@ +# Gasoline + +Gasoline (at engine/packages/gasoline) is the durable execution engine that runs most persistent processes on Rivet Engine. + +Gasoline consists of: +- Workflows - Similar to the concept of actors (not Rivet Actors) which can sleep (be removed from memory) when not in use +- Signals - Facilitates intercommunication between workflow <-> workflow and other services (such as api) -> workflow +- Messages - Ephemeral "fire-and-forget" communication between workflows -> other services +- Activities - Individual steps in a workflow, each can be individually retried upon failure and "replayed" instead of re-executed with every workflow run +- Operations - Thin wrappers around native Rust functions. Provided for clean interop with the Gasoline ecosystem + diff --git a/docs/engine/GUARD.md b/docs/engine/GUARD.md new file mode 100644 index 0000000000..a4973ee2f5 --- /dev/null +++ b/docs/engine/GUARD.md @@ -0,0 +1,41 @@ +# Rivet Guard + +Guard facilitates HTTP communication between the public internet and various internal Rivet services, as well as tunnelling connections to actors. + +## Routing + +Guard uses request path and/or headers to route requests. 
 + +### Actors + +Guard routes requests to actors when: +- the path matches `/gateway/{actor_id}/{...path}` +- the path matches `/gateway/{actor_id}@{token}/{...path}` +- when connecting a websocket, `Sec-Websocket-Protocol` consists of comma-delimited, dot-separated pairs like `rivet_target.actor,rivet_actor.{actor_id}` +- otherwise, when the `X-Rivet-Target` header is set to `actor` and `X-Rivet-Actor` header is set to the actor id + +### Runners + +Guard accepts runner websocket connections when: +- the path matches `/runners/connect` +- `Sec-Websocket-Protocol` consists of comma-delimited, dot-separated pairs like `rivet_target.runner` + +### API Requests + +Guard routes requests to the API layer when the `X-Rivet-Target` header is set to `api-public` or is unset. + +## Proxying (Gateway) + +The Gateway (a portion of Guard) acts as a proxy for requests and websockets to actors: + +- Internally, the websocket connects to a websocket listener running on the Rivet Engine +- Rivet Engine transmits HTTP requests and websocket messages via the runner protocol to the actor's corresponding runner's websocket + - The runner has a single websocket connection open to Guard which is independent from any client websocket connection + - This single connection multiplexes all actor requests and websocket connections +- The runner delegates requests and websockets to actors +- The runner sends HTTP responses and websocket messages back to Rivet through its websocket via the runner protocol +- Rivet transforms the runner protocol messages into HTTP responses and websocket messages + +### Websocket Hibernation + +The Gateway allows us to implement hibernatable websockets (see HIBERNATING_WS.md) for actors. We can keep a client's websocket connection open while simultaneously allowing actors to sleep, resulting in zero usage when there is no traffic over the websocket. The actor is automatically awoken when a websocket message is transmitted to the Gateway. 
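The rules in the Routing section above can be read as a single decision function. The following is a minimal sketch of that reading, not Guard's actual code: the `Route` enum and the `route` function are hypothetical, the order in which the rules are checked is an assumption, and so is treating either runner condition (path or subprotocol) as sufficient on its own.

```rust
/// Hypothetical routing outcome, named for this example only.
#[derive(Debug, PartialEq)]
pub enum Route {
    Actor { actor_id: String, token: Option<String>, path: String },
    Runner,
    ApiPublic,
    Unmatched,
}

/// Parse a `Sec-Websocket-Protocol` value such as
/// `rivet_target.actor,rivet_actor.abc` into (key, value) pairs.
fn protocol_pairs(header: &str) -> Vec<(&str, &str)> {
    header
        .split(',')
        .filter_map(|pair| pair.trim().split_once('.'))
        .collect()
}

/// Find the value for `key` among parsed subprotocol pairs.
fn lookup<'a>(pairs: &[(&'a str, &'a str)], key: &str) -> Option<&'a str> {
    pairs.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
}

/// Decide where a request goes. `target` and `actor` stand for the
/// `X-Rivet-Target` and `X-Rivet-Actor` header values.
pub fn route(
    path: &str,
    ws_protocol: Option<&str>,
    target: Option<&str>,
    actor: Option<&str>,
) -> Route {
    // Path form: /gateway/{actor_id}/{...path} or /gateway/{actor_id}@{token}/{...path}
    if let Some(rest) = path.strip_prefix("/gateway/") {
        let (head, tail) = rest.split_once('/').unwrap_or((rest, ""));
        let (actor_id, token) = match head.split_once('@') {
            Some((id, tok)) => (id, Some(tok.to_string())),
            None => (head, None),
        };
        if !actor_id.is_empty() {
            return Route::Actor {
                actor_id: actor_id.to_string(),
                token,
                path: format!("/{tail}"),
            };
        }
    }

    // Websocket subprotocol form: rivet_target.actor / rivet_target.runner
    let pairs = ws_protocol.map(protocol_pairs).unwrap_or_default();
    match lookup(&pairs, "rivet_target") {
        Some("actor") => {
            if let Some(id) = lookup(&pairs, "rivet_actor") {
                return Route::Actor {
                    actor_id: id.to_string(),
                    token: None,
                    path: path.to_string(),
                };
            }
        }
        Some("runner") => return Route::Runner,
        _ => {}
    }
    // Assumption: the runner path alone is enough to match.
    if path == "/runners/connect" {
        return Route::Runner;
    }

    // Header form, with the API layer as the default when the target is unset.
    match (target, actor) {
        (Some("actor"), Some(id)) => Route::Actor {
            actor_id: id.to_string(),
            token: None,
            path: path.to_string(),
        },
        (Some("api-public"), _) | (None, _) => Route::ApiPublic,
        _ => Route::Unmatched,
    }
}
```

Under this sketch, `route("/gateway/abc123@secret/foo", None, None, None)` yields an `Actor` route with token `secret` and path `/foo`, while a request with no Rivet headers and an unrelated path falls through to `ApiPublic`.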
 diff --git a/docs/engine/HIBERNATING_WS.md b/docs/engine/HIBERNATING_WS.md new file mode 100644 index 0000000000..18391f5986 --- /dev/null +++ b/docs/engine/HIBERNATING_WS.md @@ -0,0 +1,25 @@ +# Hibernating Websockets + +## Lifecycle + +1. Client establishes a websocket connection to an actor via Rivet, which is managed by Guard (see GUARD.md) +2. Guard checks to see if the actor is awake. If it is, skip step 3 +3. If the actor is not awake, send a Wake signal to its workflow. This will make the actor allocate to an existing runner, or in the case of serverless, start a new runner and allocate to that +4. Guard sends the runner a ToClientWebSocketOpen message via the runner protocol +5. The runner sends back ToServerWebSocketOpen to acknowledge the connection + - The runner must set `.canHibernate = true` for hibernation to work +6. At this point the websocket connection is fully established and any websocket messages sent by the client are proxied through Guard to the runner to be delegated to the actor +7. Should the actor go to sleep, the runner will close the websocket by sending ToServerWebSocketClose with `.hibernate = true` via the runner protocol +8. Guard receives that the websocket has closed on the runner side and starts hibernating. During hibernation nothing happens. +9. One of the following happens: + - If the actor is awoken from any other source, go to step 6. We do not send a ToClientWebSocketOpen message in this case + - If the client sends a websocket message during websocket hibernation, go to step 2 + - If the client closes the websocket, the actor is reawakened (if not already running) and sent a ToClientWebSocketClose + +## State + +To facilitate state management on the runner side (specifically via RivetKit), each hibernating websocket runs a keepalive loop which periodically stores a value to UDB marking it as active. + +When a client websocket closes during hibernation, this value is cleared. 
 + +When a runner receives a CommandStartActor message via the runner protocol, the message contains information about which hibernating requests are still active. diff --git a/docs/engine/RUNNER_SHUTDOWN.md b/docs/engine/RUNNER_SHUTDOWN.md new file mode 100644 index 0000000000..e65cf5bd2c --- /dev/null +++ b/docs/engine/RUNNER_SHUTDOWN.md @@ -0,0 +1,17 @@ +# Runner Shutdown + +## 1. Self-initiated shutdown + +1. Runner sends ToServerStopping to runner WS +2. Runner WS proxies ToServerStopping to runner WF +3. Runner WF sets itself as "draining", preventing future actor allocations to it +4. Runner WF sends GoingAway signal to all actor WFs +5. Once the runner lost threshold is passed, runner WF sends ToClientClose to runner WS +6. Runner WS closes connection to runner, informing it not to attempt reconnection + +## 2. Rivet-initiated shutdown + +1. Runner WF receives Stop signal +2. Runner WF sends GoingAway signal to all actor WFs +3. Once the runner lost threshold is passed, runner WF sends ToClientClose to runner WS +4. Runner WS closes connection to runner, informing it not to attempt reconnection diff --git a/docs/engine/SERVERLESS.md b/docs/engine/SERVERLESS.md new file mode 100644 index 0000000000..7a7da1fa0e --- /dev/null +++ b/docs/engine/SERVERLESS.md @@ -0,0 +1,273 @@ +# Serverless Flow Diagrams + +## Ideal Serverless Flow + +```mermaid +--- +config: + theme: mc + look: classic +--- +sequenceDiagram + participant A as API + participant U as User + participant G as Gateway + + participant R as Runner + participant RWS as Runner WS + participant RWF as Runner Workflow + participant AWF as Actor Workflow + participant S as Serverless + participant SE as Serverless
Endpoint + + note over AWF: Actor already
created and sleeping + + critical request to sleeping actor + U->>G: Request to actor + note over G: Actor sleeping + G->>AWF: Wake + note over G: Await actor ready + + critical actor allocation + note over AWF: Allocate + + note over AWF: No runners available,
Start pending + AWF->>S: Bump + note over S: Desired: 1 + S->>SE: GET /start + SE-->R: Same process + + critical runner connection + R->>RWS: Connect + RWS->>RWF: Create runner workflow + end + + note over RWF: Allocate pending actors + RWF->>AWF: Allocate + + AWF->>RWF: Send StartActor + + RWF->>RWS: + RWS->>R: + note over R: Start actor + R->>RWS: Actor state update: Running + RWS->>RWF: + RWF->>AWF: + note over AWF: Publish Ready msg + end + AWF->>G: Receive runner ID + + G->>RWS: Tunnel ToClientRequestStart + RWS->>R: + note over R: Handle request + R->>RWS: ToServerResponseStart + RWS->>G: + G->>U: + end + +%% note over A, AWF: Time passes + +%% critical actor sleep +%% R->>RWS: Actor intent: Sleep +%% RWS->>RWF: +%% RWF->>AWF: +%% note over AWF: Mark as sleeping +%% AWF->>RWF: Send StopActor +%% RWF->>RWS: +%% RWS->>R: +%% note over R: Stop actor +%% R->>RWS: Actor state update: Stopped +%% RWS->>RWF: +%% RWF->>AWF: +%% note over AWF: Sleep +%% end +``` + +## Messy Serverless Flow + +```mermaid +--- +config: + theme: mc + look: classic +--- +sequenceDiagram + %% participant A as API + participant U as User + participant G as Gateway + + participant R as Runner + participant RWS as Runner WS + participant RWF as Runner Workflow + participant AWF as Actor Workflow + participant S as Serverless + participant SE as Serverless
Endpoint + + note over R, RWF: For simplicity, this represents multiple
runners/runner workflows + note over AWF: Actor already
created and sleeping + + critical request to sleeping actor + U->>G: GET /sleep
(actor endpoint) + note over G: Actor sleeping + G->>AWF: Wake + note over G: Await actor ready + + critical actor allocation + note over AWF: Allocate + note over AWF: No runners available,
Start pending + AWF->>S: Bump + note over S: Desired: 1 + S->>SE: GET /start + SE-->R: Same process + + critical runner connection + R->>RWS: Connect + RWS->>RWF: Create runner workflow + end + + note over RWF: Allocate pending actors + RWF->>AWF: Allocate + + AWF->>RWF: Send StartActor + + RWF->>RWS: + RWS->>R: + note over R: Start actor + R->>RWS: Actor state update: Running + RWS->>RWF: + RWF->>AWF: + note over AWF: Publish Ready msg + end + AWF->>G: Receive runner ID + + G->>RWS: Tunnel ToClientRequestStart + RWS->>R: + note over R: Handle GET /sleep + R->>RWS: Actor intent: Sleep + RWS->>RWF: + R->>RWS: ToServerResponseStart + RWS->>G: + G->>U: + end + + note over U: Immediately request
sleep endpoint again + + critical request to running actor + U->>G: GET /sleep
(actor endpoint) + note over G: Actor running + G->>RWS: Tunnel ToClientRequestStart + RWS->>R: + note over R: Handle GET /sleep + R->>RWS: Actor intent: Sleep + RWS->>RWF: + R->>RWS: ToServerResponseStart + RWS->>G: + G->>U: + end + + critical actor sleep + RWF->>AWF: Actor intent: Sleep + note over AWF: Mark as sleeping + AWF->>RWF: Send StopActor + RWF->>AWF: Second actor intent: Sleep + note over AWF: Ignored, already
marked as sleeping + end + + critical request to actor marked as sleeping + U->>G: GET /sleep
(actor endpoint) + note over G: Actor sleeping + G->>AWF: Wake + note over G: Await actor ready + note over AWF: Actor is currently marked
as sleeping but has not stopped
yet, defer wake after stop + critical actor sleep cont + RWF->>RWS: Proxy StopActor
(from before) + RWS->>R: + note over R: Stop actor + R->>RWS: Actor state update: Stopped + RWS->>RWF: + RWF->>AWF: + note over AWF: Deallocate + AWF->>S: Bump + note over AWF: Send Stopped msg + end + AWF->>G: Receive Stopped msg + G->>AWF: Retry wake + critical actor reallocation + note over AWF: Deferred wake + note over AWF: Allocate + AWF->>RWF: Send StartActor + RWF->>RWS: + RWS->>R: + note over R: Start actor + R->>RWS: Actor state update: Running + note over AWF: Ignore retry wake (from before)
because we are already allocated + + S->>RWF: Stop + note over S: After grace + S->>RWS: Evict WS + RWS->>R: + note over RWF: Remove from alloc idx + note over RWF: Evict running actors + RWF->>AWF: Actor lost + note over AWF: Deallocate + note over AWF: Send Stopped msg + AWF->>G: Receive Stopped msg + G->>AWF: Retry wake + note over AWF: Allocate + note over AWF: No runners available,
Start pending + AWF->>S: Bump + note over S: Desired: 1 + S->>SE: GET /start + note over R, SE: Second runner + SE-->R: Same process + + critical runner connection + R->>RWS: Connect + RWS->>RWF: Create runner workflow + end + + note over RWF: Allocate pending actors + RWF->>AWF: Allocate + + AWF->>RWF: Send StartActor + + RWF->>RWS: + RWS->>R: + note over R: Start actor + R->>RWS: Actor state update: Running + RWS->>RWF: + RWF->>AWF: + note over AWF: Publish Ready msg + end + AWF->>G: Receive runner ID + G->>RWS: Tunnel ToClientRequestStart + RWS->>R: + note over R: Handle GET /sleep + R->>RWS: Actor intent: Sleep + RWS->>RWF: + R->>RWS: ToServerResponseStart + RWS->>G: + G->>U: + end + + critical actor sleep + RWF->>AWF: Actor intent: Sleep + note over AWF: Mark as sleeping + AWF->>RWF: Send StopActor + RWF->>RWS: + RWS->>R: + note over R: Stop actor + R->>RWS: Actor state update: Stopped + RWS->>RWF: + RWF->>AWF: + note over AWF: Deallocate + AWF->>S: Bump + note over AWF: Send Stopped msg + end + + S->>RWF: Stop + note over RWF: Remove from alloc idx + note over S: After grace + S->>RWS: Evict WS + RWS->>R: +``` \ No newline at end of file diff --git a/docs/engine/TERMINOLOGY.md b/docs/engine/TERMINOLOGY.md new file mode 100644 index 0000000000..cd6bd94d42 --- /dev/null +++ b/docs/engine/TERMINOLOGY.md @@ -0,0 +1,10 @@ +# Terminology + +- Rivet Engine - The binary running everything related to Rivet +- Client - the user/app connecting to Rivet +- Runner - the client-side runner code +- Runner Protocol - Rivet <-> Runner communication protocol defined as BARE (see engine/sdks/schemas/runner-protocol) +- Runner WF - the rivet-side runner, manages runner lifecycle +- Actor WF - rivet-side actor +- Gateway - A portion of Guard responsible for proxying requests and websockets to actors running on runners +- Runner WS - The runner connects to this to communicate to Rivet diff --git a/engine/sdks/schemas/runner-protocol/v3.bare 
b/engine/sdks/schemas/runner-protocol/v3.bare index 769807fe63..507e621205 100644 --- a/engine/sdks/schemas/runner-protocol/v3.bare +++ b/engine/sdks/schemas/runner-protocol/v3.bare @@ -12,6 +12,7 @@ type KvKey data type KvValue data type KvMetadata struct { version: data + # TODO: Rename to update_ts createTs: i64 }