Drake is a simple-to-use, extensible, text-based data workflow tool that organizes command execution around data and its dependencies. Data processing steps are defined along with their inputs and outputs, and Drake automatically resolves their dependencies and calculates:
- which commands to execute (based on file timestamps)
- in what order to execute the commands (based on dependencies)
Drake is similar to GNU Make, but designed especially for data workflow management. It has HDFS support, allows multiple inputs and outputs, and includes a host of features designed to help you bring sanity to your otherwise chaotic data processing workflows.
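The two calculations listed above (what to run, and in what order) can be illustrated with a small sketch. This is not Drake's implementation. It is a minimal Python illustration that assumes each step produces a single output file from a list of input files, and that a step is out of date when its output is missing, is older than any input, or depends on something that is about to be rebuilt. The `Step` class and the `plan` and `is_stale` functions are hypothetical names introduced only for this example.

```python
import os
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Step:
    """Hypothetical workflow step: one output file built from input files."""
    output: str
    inputs: list = field(default_factory=list)
    command: str = ""


def is_stale(step, rebuilt):
    """Return True if the step's output must be (re)built.

    Assumption for this sketch: inputs that do not exist and are not
    produced by another step are simply ignored.
    """
    if not os.path.exists(step.output):
        return True
    if any(i in rebuilt for i in step.inputs):
        return True
    out_time = os.path.getmtime(step.output)
    return any(
        os.path.exists(i) and os.path.getmtime(i) > out_time
        for i in step.inputs
    )


def plan(steps):
    """Return the stale steps in an order that respects dependencies."""
    by_output = {s.output: s for s in steps}
    # Map each output to the inputs that are themselves produced by a step.
    graph = {
        s.output: [i for i in s.inputs if i in by_output] for s in steps
    }
    rebuilt = set()
    ordered = []
    # static_order() yields every node after all of its predecessors.
    for out in TopologicalSorter(graph).static_order():
        step = by_output[out]
        if is_stale(step, rebuilt):
            rebuilt.add(out)
            ordered.append(step)
    return ordered
```

Calling `plan(steps)` on a list of `Step` objects returns only the steps that need to run, with upstream steps first. Drake's own rules are richer than this (for example, multiple outputs per step and HDFS paths), so treat the sketch as a mental model only.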
If you like screencasts, check out the Drake walk-through video recorded by Artem Boytsov, Drake's primary designer.
Drake has been tested under Linux, Mac OS X and Windows 8. We've not tested it on other operating systems.
Drake installs itself on the first run of the `drake` shell script; there is no separate install script. Follow these instructions to install Drake manually:
- Make sure you have Java version 6 or later.
- Download the `drake` script from the `master` branch of this project.
- Place the `drake` script on your `PATH`. (`~/bin` is a good choice if it is on your path.)
- Set it to be executable. (`chmod 755 ~/bin/drake`)
- Run it.
If you're on a Mac you can alternatively use Homebrew to install Drake:
brew install drake
Starting with Drake version 1.0.0, once you have Drake installed you can easily upgrade it by running `drake --upgrade`. The latest version of Drake will be downloaded and installed for you.
Download or build the uberjar
You can build Drake from source or run it from a prebuilt jar. See the detailed instructions.
Use Drake as a Clojure library
You can use Drake programmatically from your Clojure project through Drake's Clojure front end. Your `project.clj` dependencies should include the latest Drake library.
Faster startup time
The wiki is the home for Drake's documentation, but here are simple notes on usage:
To build a specific target (and any out-of-date dependencies, if necessary):
$ drake mytarget
To build a target and everything that depends on it (a.k.a. "down-tree" mode):
$ drake ^mytarget
To build a specific target only, without any dependencies, up or down the tree:
$ drake =mytarget
To force build a target:
$ drake +mytarget
To force build a target and all its downtree dependencies:
$ drake +^mytarget
To force build the entire workflow:
$ drake +...
To exclude targets:
$ drake ... -sometarget -anothertarget
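The target prefixes above can be summarized as a small classifier. This is an illustrative Python sketch, not Drake's actual argument parser: the `Selector` type, the `parse_target` function, and the mode names are hypothetical, and the sketch covers only the forms shown in the examples above (`+` to force, `^` for down-tree, `=` for the target only, `-` to exclude, and `...` for the whole workflow).

```python
from typing import NamedTuple


class Selector(NamedTuple):
    """Hypothetical parsed form of one command-line target."""
    name: str      # target name, or "..." for the whole workflow
    mode: str      # "default", "down-tree", or "only"
    force: bool    # True when the target was prefixed with "+"
    exclude: bool  # True when the target was prefixed with "-"


def parse_target(arg: str) -> Selector:
    """Classify a target argument by its leading prefix characters."""
    exclude = force = False
    if arg.startswith("-"):
        exclude, arg = True, arg[1:]
    elif arg.startswith("+"):
        force, arg = True, arg[1:]

    mode = "default"
    if arg.startswith("^"):
        mode, arg = "down-tree", arg[1:]
    elif arg.startswith("="):
        mode, arg = "only", arg[1:]

    if not arg:
        raise ValueError("missing target name")
    return Selector(arg, mode, force, exclude)
```

For example, `[parse_target(a) for a in "... -sometarget -anothertarget".split()]` yields one selector for the whole workflow followed by two exclusions.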
By default, Drake will look for `./Drakefile`. The simplest way to run your workflow is to name your workflow file `Drakefile` and make sure you're in the same directory. Then simply run `drake`.
To specify the workflow file explicitly, use the `-w` option:
$ drake -w /myworkflow/my-workflow.drake
Run `drake --help` for the full list of options.
The wiki is the home for Drake's documentation.
A lot of work went into designing and specifying Drake. To prove it, here's the 60-page specification and user manual. It's stored in Google Docs, and we encourage everyone to use its superb commenting feature to provide feedback. Just select the text you want to comment on, and click Insert -> Comment (Ctrl + Alt + M on Windows, Cmd + Option + M on Mac). It can also be downloaded as a PDF.
There are annotated workflow examples in the demos directory.
Visualize your workflow
See more detail
Asynchronous Execution of Steps
Please see the wiki page on async.
Drake has a plugin mechanism, allowing developers to publish and use custom plugins that extend Drake. See the Plugin wiki page for details.
Drake provides HDFS support by allowing you to specify HDFS locations as inputs and outputs.
If you plan to use Drake with HDFS, please see the wiki page on HDFS Compatibility.
Amazon S3 Compatibility
Thanks to Chris Howe, Drake now has basic compatibility with Amazon S3 by allowing you to specify S3 locations as inputs and outputs.
If you plan to use Drake with S3, please see the wiki doc on S3 Compatibility.
Drake on the REPL
You can use Drake from your Clojure REPL, via `drake.core/run-workflow`. Please see the Drake on the REPL wiki page for more details.
Stuff outside this repo
Courtesy of @daguar, an alternative approach to installing Drake on Mac OS X.
Original blog post announcing Drake's open source release
An epic knock-down-drag-out set of threads on Hacker News discussing the design merits of Drake
Source Copyright © 2012-2015 Factual, Inc.
Distributed under the Eclipse Public License, the same as Clojure uses. See the file COPYING.