Motivation

svXaverius edited this page Feb 7, 2017 · 9 revisions

Euphoria's origins reach back to 2014, when we started exploring the idea of describing computations independently of a specific engine. The initial goal was to write programs that could then be executed on different big data processing platforms. This would allow us to adopt new technologies without having to fully rewrite our programs.

Once we had a declarative Java API able to describe our batch programs, together with an execution engine on top of Apache Spark and a non-distributed, in-memory executor, we became more and more interested in unifying batch and stream processing. Since then, hiding from client code whether a Euphoria program's input and output are batched or streamed has become an additional goal.

In summary, the goals and motivation for the project are:

  • Provide a unified API supporting batch and stream processing using the same application code.
    • The application code shall be unaware of whether it happens to process a stable dataset or one being streamed from a live source.
    • The main differences between these two processing paradigms are just runtime characteristics (latency, throughput, accuracy) that should not affect business logic.
  • Provide an engine-agnostic API that allows the underlying execution technology to be exchanged without changing the application code. This allows applications to be written without being "locked into" a specific framework.
    • Technologies come and go. Especially in the field of "big data" these days, we see a lot of innovation and want to leverage and try out the new engines/runtimes. With a neutral API, we are free to build adapters to these new technologies and evaluate their potential early on.
  • Provide a type-safe API to avoid "getting lost" in complicated programs and to prevent unnecessary runtime failures caused by invalid assumptions about the processed data types.
  • Provide a declarative, less verbose API leveraging Java 8's lambda expressions.
    • Java 8 can considerably cut down the level of verbosity typically seen in Java code by allowing programmers to leverage lambda expressions. Euphoria's API is designed with lambdas in mind.
  • Provide support for different execution engines (while allowing for a certain, constant overhead).
    • Adapters to specific engines are allowed to have a certain, but constant, overhead. Of course, the desire is to keep this constant overhead as close to zero as possible.
    • Although not trivial, writing an adapter to an execution engine is supposed to be possible with reasonable effort. The API is internally designed in such a way that translating a user-provided program into an engine-specific execution plan requires support for only a minimum of operations/operators.

These points are similar to the goals of the Apache Beam project. Euphoria is in fact an alternative implementation of the concepts described publicly in the 2015 Google paper "The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing". We strongly recommend studying this paper, as well as the two accompanying blog posts, Streaming 101 and Streaming 102, by Tyler Akidau.