Skip to content

Commit

Permalink
chore: finalize doc for 0.7 release (#472)
Browse files Browse the repository at this point in the history
Signed-off-by: Vigith Maurice <vigith@gmail.com>
  • Loading branch information
vigith committed Jan 13, 2023
1 parent aa59168 commit 3271016
Show file tree
Hide file tree
Showing 8 changed files with 28 additions and 7 deletions.
3 changes: 2 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,8 @@ full-featured stream processing platforms.

## Roadmap

- Data aggregation (e.g. group-by)
- User Defined Transformer at Source for better deserialization and filter for cost reduction (v0.8)
- Multi partitioned edges for higher throughput (v0.9)

## Resources

Expand Down
3 changes: 2 additions & 1 deletion docs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,8 @@ stream processing platforms.

## Roadmap

- Data aggregation (e.g. group-by)
- User Defined Transformer at Source for better deserialization and filter for cost reduction (v0.8)
- Multi partitioned edges for higher throughput (v0.9)

## Getting Started

Expand Down
8 changes: 7 additions & 1 deletion docs/quick-start.md
Original file line number Diff line number Diff line change
Expand Up @@ -122,8 +122,14 @@ The pipeline can be deleted by
kubectl delete -f https://raw.githubusercontent.com/numaproj/numaflow/stable/test/e2e/testdata/even-odd.yaml
```

## A pipeline with reduce (aggregation)

[Reduce Examples](user-guide/user-defined-functions/reduce/examples.md)

## What's Next

Try more examples in the [`examples`](https://github.com/numaproj/numaflow/tree/main/examples) directory.

After exploring how Numaflow pipeline run, you can check what data [Sources](./user-guide/sources/generator.md) and [Sinks](./user-guide/sinks/kafka.md) Numaflow supports out of the box, or learn how to write [User Defined Functions](user-guide/user-defined-functions/map/map.md).
After exploring how Numaflow pipeline run, you can check what data [Sources](./user-guide/sources/generator.md)
and [Sinks](./user-guide/sinks/kafka.md) Numaflow supports out of the box, or learn how to write
[User Defined Functions](user-guide/user-defined-functions/user-defined-functions.md).
4 changes: 3 additions & 1 deletion docs/user-guide/user-defined-functions/reduce/examples.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# Reduce Examples

Please read [reduce](./reduce.md) to get the best out of these examples.

## Prerequisites

Install the ISB
Expand Down Expand Up @@ -139,4 +141,4 @@ and 10, the output from the first reduce vertex will be 25 (5 * 5) and 50 (5 * 1
to the next non-keyed reduce vertex with the fixed window duration of 10s. This being a non-keyed, it will
combine the inputs and produce the output of 150(25 * 2 + 50 * 2), which will be passed to the reduce
vertex with a sliding window of duration 60s and with the slide duration of 10s. Hence the final output
will be 900(150 * 6).
will be 900(150 * 6).
5 changes: 5 additions & 0 deletions docs/user-guide/user-defined-functions/reduce/reduce.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# Reduce UDF

## Overview

Reduce is one of the most commonly used abstractions in a stream processing pipeline to define
aggregation functions on a stream of data. It is the reduce feature that helps us solve problems like
"performs a summary operation(such as counting the number of occurrence of a key, yielding user login
Expand All @@ -9,6 +11,7 @@ bounding condition is "time", eg, "number of users logged in per minute". So whi
unbounded stream of data, we need a way to group elements into finite chunks using time. To build these
chunks the reduce function is applied to the set of records produced using the concept of [windowing](./windowing/windowing.md).

## Reduce Pseudo code
Unlike in _map_ vertex where only an element is given to user-defined function, in _reduce_ since
there is a group of elements, an iterator is passed to the reduce function. The following is a generic
outlook of a reduce function. I have written the pseudo-code using the accumulator to show that very
Expand All @@ -30,6 +33,8 @@ def reduceFn(key: str, datums: Iterator[Datum], md: Metadata) -> Messages:
return Messages(Message.to_vtx(key, acumulator.result()))
```

## Specification

The structure for defining a reduce vertex is as follows.
```yaml
- name: my-reduce-udf
Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# Fixed

## Overview

Fixed windows (sometimes called tumbling windows) are defined by a static window size, e.g. 30 second
windows, one minute windows, etc. They are generally aligned, i.e. every window applies across all
the data for the corresponding period of time. It has a fixed size measured in time and does not
Expand All @@ -24,7 +26,7 @@ vertices:
NOTE: A duration string is a possibly signed sequence of decimal numbers, each with optional fraction
and a unit suffix, such as "300ms", "1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".

## Length
### Length

The `length` is the window size of the fixed window.

Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# Sliding

## Overview

Sliding windows are similar to Fixed windows, the size of the windows is measured in time and is fixed.
The important difference from the Fixed window is the fact that it allows an element to be present in
more than one window. The additional window slide parameter controls how frequently a sliding window
Expand All @@ -22,11 +24,11 @@ vertices:
NOTE: A duration string is a possibly signed sequence of decimal numbers, each with optional fraction
and a unit suffix, such as "300ms", "1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".

## Length
### Length

The `length` is the window size of the fixed window.

## Slide
### Slide

`slide` is the slide parameter that controls the frequency at which the sliding window is created.

Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# Windowing

## Overview

In the world of data processing on an unbounded stream, Windowing is a concept
of grouping data using temporal boundaries. We use event-time to discover
temporal boundaries on an unbounded, infinite stream and [Watermark](../../../watermarks.md) to ensure
Expand Down

0 comments on commit 3271016

Please sign in to comment.