Functional, Typesafe, Declarative Data Pipelines
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
examples/com/intentmedia/mario World 1-1 Jun 5, 2015
project World 1-1 Jun 5, 2015
src World 1-1 Jun 5, 2015
.gitignore World 1-1 Jun 5, 2015
LICENSE.md
README.md JS2 adding link to launch blog post Jun 9, 2015
build.sbt Updating README, fixing duplicated url on pom XML Jun 5, 2015

README.md

Mario

image

Mario is a Scala library focused on defining complex data pipelines in a functional, typesafe, and efficient way. See the launch blog post for more details on the motivation behind the library.

image Defining pipelines

Pipelines are very easy to build, using only the pipe function. You can construct pipelines with and without depedencies. Pipelines can be non-linear, but must be acyclic. The lack of cycles is enforced by the library, so it is impossible to define a cyclic dependency in Mario.

Execution of pipelines is done concurrently, guaranteeing that each step is executed just once.

image Usage

Import

import com.intentmedia.mario.Pipeline._

Example

Here is a simple 3 step pipeline:

// independent step
val step1 = pipe(0 until 100)
// unary dependent step
val step2 = pipe((a: Range) => a.size until 200, step1)
// binary dependent step
val step3 = pipe((a: Range, b: Range) => a ++ b, step1, step2)

step3.run().size
// 200 (step1 will be only executed once)

Independent pipelines can be executed using runWith:

val result = step3.runWith(pipe(println("foo")), pipe(println("bar")))
result.run().size
// 200 (will also print "foo" and "bar")

Pipelines can be composed using for comprehensions:

for {
  step1 <- pipe(0 until 100)
  step2 <- pipe((a: Range) => a.size until 200, step1)
  step3 <- pipe((a: Range, b: Range) => a ++ b, step1, step2)
} yield step3.run().size

image Installation

Add the following to your sbt build:

libraryDependencies += "com.intentmedia.mario" %% "mario" % "0.1.0"

image Roadmap

  • Fault tolerance
  • Implicit caching (in Spark)
  • Web UI