Answer how Stratosphere compares to Apache Spark #36

rmetzger · 2014-05-02T17:30:39Z

This message from our mailing list, posted by @fhueske, might be a good skeleton:

Similar to Spark, Stratosphere is a complete data processing system, i.e., it has a programming API, a program compiler (optimizer), and its own execution runtime.
It is also an alternative to Hadoop MapReduce and resembles Spark in several design points:

Programs are executed as DAGs
Higher-level programming primitives (compared to Hadoop MR)
APIs in Scala and Java
Reads data from external data stores (has no data storage of its own), e.g., HDFS, S3, RDBMS, ...
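
To make the "programs are executed as DAGs" point concrete, here is a minimal sketch of a toy dataflow whose operators run in dependency order. It is not Stratosphere's or Spark's API: `run_dag`, the `plan` dictionary, and the operator names are hypothetical, and the sketch materializes each operator's full output instead of pipelining it. It assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def run_dag(operators):
    """Run a toy dataflow given as {name: (input_names, function)}.

    Hypothetical helper for illustration only; every operator's output is
    fully materialized here, which a pipelined engine would avoid.
    """
    # Map each operator to its predecessors and derive a valid execution order.
    graph = {name: set(inputs) for name, (inputs, _) in operators.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        inputs, fn = operators[name]
        results[name] = fn(*(results[i] for i in inputs))
    return results


# A small word-count-like plan in which "split" feeds two downstream operators,
# so the plan is a DAG and not a linear map/reduce chain.
plan = {
    "source": ([], lambda: ["a b", "b c", "a"]),
    "split": (["source"], lambda lines: [w for line in lines for w in line.split()]),
    "count": (["split"], lambda words: {w: words.count(w) for w in set(words)}),
    "keys": (["split"], lambda words: sorted(set(words))),
    "sink": (["count", "keys"], lambda counts, keys: [(k, counts[k]) for k in keys]),
}

result = run_dag(plan)["sink"]
```

The dictionary order of `plan` does not matter; the topological sort decides when each operator can run.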

However, Stratosphere is also different in some aspects:

Database-inspired pipelined processing that gradually spills to disk when memory is insufficient (hybrid hash joins, external sorts)
Sophisticated cost-based optimizer choosing execution strategies (broadcasting vs. partitioning, sort vs. hash joins, ...)
Implemented in Java (in contrast to Spark which uses Scala)
No in-memory materialization of intermediate results yet (this is on the roadmap)
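
As a rough illustration of what "choosing execution strategies" can mean for broadcasting vs. partitioning, the sketch below picks a join shipping strategy by comparing estimated network traffic. The cost formulas, the function name `choose_join_strategy`, and the strategy labels are assumptions made up for this example; they are not Stratosphere's actual cost model.

```python
def choose_join_strategy(left_bytes, right_bytes, parallelism):
    """Pick the cheapest shipping strategy for a two-input join.

    Hypothetical cost model: cost = estimated bytes sent over the network.
    Broadcasting sends one full copy of an input to every parallel task;
    repartitioning sends each input across the network roughly once.
    """
    costs = {
        "broadcast_left": left_bytes * parallelism,
        "broadcast_right": right_bytes * parallelism,
        "repartition_both": left_bytes + right_bytes,
    }
    strategy = min(costs, key=costs.get)
    return strategy, costs[strategy]
```

With a tiny left input, for example `choose_join_strategy(10, 10_000, 8)`, broadcasting the left side wins; with two inputs of similar size, for example `choose_join_strategy(5000, 6000, 8)`, repartitioning both is cheaper. A real optimizer would also weigh local strategies (sort vs. hash join) and disk I/O.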

Stratosphere and Spark are best seen as alternatives to each other.
We do not build on any of Spark's components, as we have our own programming API and execution engine.
