Euphoria is an open source Java API for creating unified big-data processing flows. It provides an engine-independent programming model which can express both batch and stream transformations.


Euphoria


Euphoria is an open source Java API for creating unified big-data processing flows. It provides an engine-independent programming model that can express both batch and stream transformations.

The main goal of the API is to ease the creation of programs whose business logic is independent of any specific runtime framework or engine, and independent of the source or destination of the processed data. Such programs can then be moved with little effort to new environments and new data sources or destinations - ideally just by configuration.

Key features

  • Unified API that supports both batch and stream processing using the same code
  • Avoids vendor lock-in - migrating between different engines is a matter of configuration
  • Declarative Java API using Java 8 lambda expressions
  • Support for different notions of time (event time, ingestion time)
  • Flexible windowing (Time, TimeSliding, Session, Count)
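
As a rough illustration of what event-time windowing means in the list above, here is a plain-Java sketch that assigns timestamped events to fixed-length time windows and counts the events per window. It is not Euphoria's windowing API: `TimeWindowSketch`, `Event`, `windowStart`, and `countPerWindow` are hypothetical names invented for this example, and only the simplest case (fixed, non-overlapping time windows) is shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Euphoria.
class TimeWindowSketch {

  // Hypothetical event type: a payload plus the time the event occurred
  // (its event time, in milliseconds).
  static final class Event {
    final String value;
    final long timestampMillis;

    Event(String value, long timestampMillis) {
      this.value = value;
      this.timestampMillis = timestampMillis;
    }
  }

  // Start of the fixed-length window that contains the given timestamp.
  // Assumes non-negative timestamps and a positive window length.
  static long windowStart(long timestampMillis, long windowMillis) {
    return timestampMillis - (timestampMillis % windowMillis);
  }

  // Counts events per window, keyed by window start. The result depends only
  // on the timestamps carried by the events, not on their arrival order.
  static Map<Long, Long> countPerWindow(List<Event> events, long windowMillis) {
    Map<Long, Long> counts = new TreeMap<>();
    for (Event e : events) {
      counts.merge(windowStart(e.timestampMillis, windowMillis), 1L, Long::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<Event> events = Arrays.asList(
        new Event("a", 1_000L),
        new Event("b", 4_500L),
        new Event("c", 5_000L),
        new Event("d", 11_000L));

    // Five-second windows: prints {0=2, 5000=1, 10000=1}
    System.out.println(countPerWindow(events, 5_000L));
  }
}
```

With ingestion time, the same grouping would use the time at which the system received each event instead of the timestamp carried by the event itself.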

Download

The best way to use Euphoria is by adding the following Maven dependency to your pom.xml:

<dependency>
  <groupId>cz.seznam.euphoria</groupId>
  <artifactId>euphoria-core</artifactId>
  <version>0.7.0</version>
</dependency>

You may want to add additional modules, such as support for various engines or I/O data sources and sinks. For more details, read the Maven Dependencies wiki page.

WordCount example

// Define data source and data sinks
DataSource<String> dataSource = new SimpleHadoopTextFileSource(inputPath);
DataSink<String> dataSink = new SimpleHadoopTextFileSink<>(outputPath);

// Define a flow, i.e. a chain of transformations
Flow flow = Flow.create("WordCount");

Dataset<String> lines = flow.createInput(dataSource);

Dataset<String> words = FlatMap.named("TOKENIZER")
    .of(lines)
    .using((String line, Collector<String> context) -> {
      for (String word : line.split("\\s+")) {
        context.collect(word);
      }
    })
    .output();

Dataset<Pair<String, Long>> counted = ReduceByKey.named("COUNT")
    .of(words)
    .keyBy(w -> w)
    .valueBy(w -> 1L)
    .combineBy(Sums.ofLongs())
    .output();

MapElements.named("FORMAT")
    .of(counted)
    .using(p -> p.getFirst() + "\n" + p.getSecond())
    .output()
    .persist(dataSink);

// Initialize an executor and run the flow (using Apache Flink)
try {
  Executor executor = new FlinkExecutor();
  executor.submit(flow).get();
} catch (InterruptedException ex) {
  LOG.warn("Interrupted while waiting for the flow to finish.", ex);
} catch (IOException | ExecutionException ex) {
  throw new RuntimeException(ex);
}
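
To make the semantics of the three operators concrete, the following is a minimal plain-Java sketch of what the flow computes, run on a small in-memory input. It does not use Euphoria at all: `WordCountSketch`, `countWords`, and `format` are names made up for illustration, and the code only mirrors the TOKENIZER, COUNT, and FORMAT steps using `java.util.stream`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of Euphoria.
class WordCountSketch {

  // Mirrors TOKENIZER (split each line on whitespace) followed by
  // COUNT (key by the word itself, map each word to 1L, sum the values per key).
  static Map<String, Long> countWords(List<String> lines) {
    return lines.stream()
        .flatMap(line -> Arrays.stream(line.split("\\s+")))
        .collect(Collectors.groupingBy(
            w -> w, TreeMap::new, Collectors.summingLong(w -> 1L)));
  }

  // Mirrors FORMAT: one output string per (word, count) pair,
  // joined the same way as in the flow above.
  static List<String> format(Map<String, Long> counts) {
    return counts.entrySet().stream()
        .map(e -> e.getKey() + "\n" + e.getValue())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("to be or", "not to be");

    Map<String, Long> counts = countWords(lines);
    // Prints {be=2, not=1, or=1, to=2}
    System.out.println(counts);

    // Prints each formatted record, as the data sink would receive it.
    format(counts).forEach(System.out::println);
  }
}
```

The sketch runs eagerly in a single JVM. In the Euphoria flow, the same three steps are only declared, and the chosen executor decides how and where they run.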

Supported Engines

Euphoria translates flows, also known as data transformation pipelines, into the specific API of a chosen, supported big-data processing engine. Currently, the following are supported:

  • Apache Flink
  • Apache Spark
  • An independent, standalone, in-memory engine, which is part of the Euphoria project and is suitable for running flows in unit tests.

In the WordCount example from above, to switch the execution engine from Apache Flink to Apache Spark, we'd merely need to replace FlinkExecutor with SparkExecutor.

Bugs / Features / Contributing

There's still a lot of room for improvements and extensions. Have a look at the issue tracker and feel free to contribute by reporting new problems, contributing to existing ones, or even opening issues if you have questions. Any constructive feedback is warmly welcome!

As usual with open source, don't hesitate to fork the repo and submit a pull request if you see something that should be changed. We'll be happy to see Euphoria improve over time.

Building

To build the Euphoria artifacts, the following is required:

  • Git
  • Java 8

Building the project itself is a matter of:

git clone https://github.com/seznam/euphoria
cd euphoria
./gradlew publishToMavenLocal -xtest

Documentation

Contact us

License

Euphoria is licensed under the terms of the Apache License 2.0.