Skip to content

intentmedia/mario

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Mario

image

Mario is a Scala library focused on defining complex data pipelines in a functional, typesafe, and efficient way. See the launch blog post for more details on the motivation behind the library.

image Defining pipelines

Pipelines are very easy to build, using only the pipe function. You can construct pipelines with and without depedencies. Pipelines can be non-linear, but must be acyclic. The lack of cycles is enforced by the library, so it is impossible to define a cyclic dependency in Mario.

Execution of pipelines is done concurrently, guaranteeing that each step is executed just once.

image Usage

Import

import com.intentmedia.mario.Pipeline._

Example

Here is a simple 3 step pipeline:

// independent step
val step1 = pipe(0 until 100)
// unary dependent step
val step2 = pipe((a: Range) => a.size until 200, step1)
// binary dependent step
val step3 = pipe((a: Range, b: Range) => a ++ b, step1, step2)

step3.run().size
// 200 (step1 will be only executed once)

Independent pipelines can be executed using runWith:

val result = step3.runWith(pipe(println("foo")), pipe(println("bar")))
result.run().size
// 200 (will also print "foo" and "bar")

Pipelines can be composed using for comprehensions:

for {
  step1 <- pipe(0 until 100)
  step2 <- pipe((a: Range) => a.size until 200, step1)
  step3 <- pipe((a: Range, b: Range) => a ++ b, step1, step2)
} yield step3.run().size

image Installation

Add the following to your sbt build:

libraryDependencies += "com.intentmedia.mario" %% "mario" % "0.1.0"

image Roadmap

  • Fault tolerance
  • Implicit caching (in Spark)
  • Web UI

About

Functional, Typesafe, Declarative Data Pipelines

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages