doc: kill-switch when buffer is full (#1034)
Signed-off-by: Vigith Maurice <vigith@gmail.com>
vigith authored and whynowy committed Sep 14, 2023
1 parent 73db23a commit 8d5c56f
Showing 3 changed files with 40 additions and 1 deletion.
36 changes: 36 additions & 0 deletions docs/user-guide/reference/edge-tuning.md
@@ -0,0 +1,36 @@
# Edge Tuning

## Drop message onFull

This is an edge-level setting to drop messages when `buffer.isFull == true`. Even if the UDF or UDSink drops messages
due to an internal error in the user-defined code, the processing latency will spike, causing natural back pressure.
A kill switch to drop messages can help alleviate or avoid repercussions on the rest of the DAG.

This edge-level setting is configured via `onFull`. The default is `retryUntilSuccess`; the other option is
`discardLatest`.

This is a **data loss scenario**, but it can be useful for user-introduced experiments on the pipeline, such as
A/B testing, where it is acceptable for the experimental side of the DAG to lose data while production is unaffected.

### discardLatest

Setting `onFull` to `discardLatest` will silently drop the message if the edge's buffer is full.

```yaml
edges:
  - from: a
    to: b
    onFull: discardLatest
```

### retryUntilSuccess

The default setting for `onFull` is `retryUntilSuccess`, which makes sure the message is retried until it is
successfully written.

```yaml
edges:
  - from: a
    to: b
    onFull: retryUntilSuccess
```
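For context, a minimal sketch of how this setting might sit inside a full `Pipeline` spec. The pipeline name, the generator source, and the log sink are illustrative placeholders, not part of this commit:

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: onfull-example
spec:
  vertices:
    - name: a
      source:
        generator:
          rpu: 5
          duration: 1s
    - name: b
      sink:
        log: {}
  edges:
    # Drop messages written to this edge whenever its buffer is full.
    - from: a
      to: b
      onFull: discardLatest
```

Omitting `onFull` on an edge leaves it at the default `retryUntilSuccess` behavior.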
4 changes: 3 additions & 1 deletion docs/user-guide/reference/pipeline-tuning.md
@@ -1,6 +1,8 @@
# Pipeline Tuning

For a data processing pipeline, each vertex keeps running the cycle of reading data from an Inter-Step Buffer (or data source),
processing the data, and writing to the next Inter-Step Buffers (or sinks). Some tuning is possible for this data
processing cycle.

- `readBatchSize` - How many messages to read for each cycle, defaults to `500`.
- `bufferMaxLength` - How many unprocessed messages can exist in the Inter-Step Buffer, defaults to `30000`.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -82,6 +82,7 @@ nav:
- Examples: "user-guide/user-defined-functions/reduce/examples.md"
- Reference:
- user-guide/reference/pipeline-tuning.md
- user-guide/reference/edge-tuning.md
- user-guide/reference/autoscaling.md
- user-guide/reference/conditional-forwarding.md
- user-guide/reference/join-vertex.md