
Drake

Drake is a simple-to-use, extensible, text-based data workflow tool that organizes command execution around data and its dependencies. Data processing steps are defined along with their inputs and outputs, and Drake automatically resolves their dependencies and calculates:

  • which commands to execute (based on file timestamps)
  • in what order to execute the commands (based on dependencies)
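Both decisions can be illustrated outside Drake with ordinary shell tools. The following is a minimal sketch of the idea, not Drake's actual implementation: the helpers `is_stale` and `build_order` and the file names are hypothetical, and `tsort` (from coreutils) stands in for Drake's dependency resolution.

```bash
#!/bin/bash
# Hypothetical helper: an output needs rebuilding if it is missing
# or older than its input (the timestamp rule, in miniature).
is_stale() {
  local input=$1 output=$2
  [ ! -e "$output" ] || [ "$input" -nt "$output" ]
}

# Hypothetical helper: read "input output" pairs on stdin and print
# the files in an order where every input comes before its outputs.
build_order() {
  tsort
}

# Example: raw.csv feeds clean.csv, which feeds report.txt.
printf '%s\n' 'raw.csv clean.csv' 'clean.csv report.txt' | build_order
```

A real workflow tool combines the two: it walks the steps in dependency order and runs only those whose outputs are stale.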

Drake is similar to GNU Make, but designed especially for data workflow management. It has HDFS support, allows multiple inputs and outputs, and includes a host of features designed to help you bring sanity to your otherwise chaotic data processing workflows.

Installation

Drake is a Clojure project, so you will need Leiningen to build it.

Note that Drake has been tested under Linux and Mac OS X. We've not tested it on Windows.

Clone the project:

$ git clone git@github.com:Factual/drake.git
$ cd drake

Build the uberjar:

$ lein uberjar

Run Drake from the jar

Once you've built the uberjar, you can run Drake like this:

$ java -jar drake.jar

You can pass in arguments and options to Drake by putting them at the end of the above command, e.g.:

$ java -jar drake.jar --version

A nicer way to run Drake

We recommend you "install" Drake in your environment so that you can run it by just typing "drake". For example, you could have an executable script called drake, like this on your path:

#!/bin/bash
java -cp "$(dirname "$0")/drake.jar" drake.core "$@"

Drake documentation refers to running Drake as "drake". If you are instead running the uberjar, just replace "drake" with "java -jar drake.jar" in the examples.
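One way to set up such a wrapper is sketched below. The function name `install_drake_wrapper` and the choice of target directory are assumptions for illustration. The wrapper expects drake.jar to sit in the same directory as the script, and the quoting around `$0` and `$@` keeps paths and arguments that contain spaces intact.

```bash
#!/bin/bash
# Hypothetical helper: write the wrapper script into the given directory
# (where drake.jar should also live) and mark it executable.
install_drake_wrapper() {
  local dir=$1
  mkdir -p "$dir"
  cat > "$dir/drake" <<'EOF'
#!/bin/bash
java -cp "$(dirname "$0")/drake.jar" drake.core "$@"
EOF
  chmod +x "$dir/drake"
}

# Example (assumes ~/bin is on your PATH and drake.jar has been copied there):
# install_drake_wrapper "$HOME/bin"
```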

Basic Usage

The wiki is the home for Drake's documentation, but here are simple notes on usage:

To build a specific target (and any out-of-date dependencies, if necessary):

$ drake mytarget

To build a target and everything that depends on it (a.k.a. "down-tree" mode):

$ drake ^mytarget

To build a specific target only, without any dependencies, up or down the tree:

$ drake =mytarget

To force build a target:

$ drake +mytarget

To force build a target and all its down-tree dependencies:

$ drake +^mytarget

To force build the entire workflow:

$ drake +...

To exclude targets:

$ drake ... -sometarget -anothertarget

By default, Drake will look for ./workflow.d. The simplest way to run your workflow is to name your workflow file workflow.d, and make sure you're in the same directory. Then, simply:

$ drake

To specify the workflow file explicitly, use -w or --workflow. E.g.:

$ drake -w /myworkflow/my-fav-workflow.d

Use drake --help for the full list of options.
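To try the default lookup, you need a workflow.d in the current directory. The sketch below creates a tiny one. The helper `make_demo_workflow` and the file names are hypothetical, and the workflow body assumes the step syntax described in Drake's wiki (`output <- input`, indented commands, `$INPUT` and `$OUTPUT` variables, `;` comments), so check it against the wiki before relying on it.

```bash
#!/bin/bash
# Hypothetical helper: create a small input file and a one-step
# workflow.d in the given directory.
make_demo_workflow() {
  local dir=$1
  mkdir -p "$dir"
  printf '%s\n' 'keep,1' 'bad,2' 'keep,3' > "$dir/in.csv"
  # Quoted heredoc: $INPUT and $OUTPUT are written literally for Drake.
  cat > "$dir/workflow.d" <<'EOF'
; Drop every line that contains "bad"
out.csv <- in.csv
  grep -v bad $INPUT > $OUTPUT
EOF
}

# Example:
# make_demo_workflow demo && cd demo && drake
```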

Documentation, etc.

The wiki is the home for Drake's documentation.

A lot of work went into designing and specifying Drake. To prove it, here's the 60-page specification document. It can be downloaded as a PDF and treated like a user manual.

There are annotated workflow examples in the demos directory.

There's a Google Group for Drake.

If you like screencasts, check out this Drake walk-through video recorded by Artem, Drake's primary designer.

HDFS Compatibility

Drake provides HDFS support by allowing you to specify inputs and outputs like hdfs://my/big_file.txt.

If you plan to use Drake with HDFS, please see the wiki doc on HDFS Compatibility.

License

Source Copyright © 2012-2013 Factual, Inc.

Distributed under the Eclipse Public License, the same license Clojure uses. See the file COPYING.
