euphoria-core: Consider adding support for side-outputs #124

vanekjar · 2017-05-10T16:00:45Z

When we discussed implementation details of counters we came across that they may be implemented as side-output from each operator. Details in issue #31. But it seems side-outputs is more general concept deserving separate issue.

The purpose of this issue is to think about implementing similar concept to Apache Beam additional outputs or Apache Flink side outputs

According to @je-ik:

We could design a parallel outputs to the main output, by naming them (and therefore construct what looks like counter, but essentially is nothing more than another Dataset). The code would then look like this:

  Flow flow = Flow.create();
  Dataset<Integer> input = ...;
  Dataset<Integer> output = FlatMap.of(input)
      .using((in, ctx) -> {
        ctx.collect( /* do whatever transformation of `in` */ );
        ctx.collect("input-elements", 1L);
      })
      .output();
  Dataset<Long> inputElements = flow.getNamedStream("input-elements");
  // now I can do whatever i want with this stream, I can window it as I wish, aggregate by a function
  // of my choice and so on, and finally, persist the dataset where I wish

So,

in the example above I used strings to identify the corresponding outputs, but of course, it was just an example - this would need to be modified a little to incorporate strong typing of the output Datasets - this goes in the direction of tags in the sense of Beam
the output would probably be neither keyed nor windowed, it would, ofcourse, carry timestamp
the output would be available via the Flow
the executor can know if an output is not used, because user code has to read it (via getTaggedStream)
what you do with the output Dataset is left on the user code, so you can use it for bussiness logic (joining it with some "main" dataset) or monitoring and debugging (storing it into appropriate sink - e.g. elastic search)

A little modified example, which covers the above topics:

  Flow flow = Flow.create();
  Dataset<Integer> input = ...;
  NamedTag<Long> elementsTag = NamedTag.named("input-elements").typed(Long.class);
  Dataset<Integer> output = FlatMap.of(input)
      .using((in, ctx) -> {
        ctx.collect( /* do whatever transformation of `in` */ );
        ctx.collect(elementsTag, 1L);
      })
      .withNamedTags(elementsTag)
      .output();
  Dataset<Long> inputElements = flow.getTaggedStream(elementsTag);
  // now I can do whatever i want with this stream, I can window it as I wish, aggregate by a function
  // of my choice and so on, and finally, persist the dataset where I wish

The text was updated successfully, but these errors were encountered:

je-ik · 2018-11-14T21:50:10Z

In roadmap.

vanekjar added client-api enhancement labels May 10, 2017

je-ik closed this as completed Nov 14, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

euphoria-core: Consider adding support for side-outputs #124

euphoria-core: Consider adding support for side-outputs #124

vanekjar commented May 10, 2017

je-ik commented Nov 14, 2018

euphoria-core: Consider adding support for side-outputs #124

euphoria-core: Consider adding support for side-outputs #124

Comments

vanekjar commented May 10, 2017

je-ik commented Nov 14, 2018