-
Notifications
You must be signed in to change notification settings - Fork 5
A simple easy to use Hadoop map reduce workflow engine
pranab/fluxua
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Introduction ============ Fluxua is a simple workflow driver for Hadoop map reduce jobs. It's non intrusive. It does not require your map reduce implementation to extend any special class and can be plugged into the workflow as is. There are other map reduce workflow engines like oozie, cascading etc. that you can use. Fluxua is itended to be a simple, small foot print alternative. Architecture ============ The Hadoop map reduce jobs are defined as nodes in a DAG (directed acyclic graph). Edges represent the dependency between the jobs. The workflow drives topologically orders the DAG and executes the jobs in proper order, starting with the jobs at the root nodes of the DAG. Each MR job is executed from a separate thread. The executing thread communicates with the driver through a blocking queue. All MR jobs at intermediate DAG nodes should do blocking executions of the job, so that the dependent jobs can be launched only after the parent jobs have completed. Configuration ============= The jobs and the workflow are defined ina JSON file as below. The configuration has three main sections. The first section has the system configurations. The second sections has configuration of all thge jobs. The las sectipns has flow definitions in terms of jobs. There is an example from a data mining project in the resource directory. Sample Shell Script =================== Here is sample shell script for using the driver. The entry point is the class org.fluxua.driver.JobDriver DR_JAR=/home/pranab/Projects/fluxua/target/fluxua-1.0.jar JAR=/home/pranab/Projects/zaal/target/zaal-1.0.jar CL=org.fluxua.driver.JobDriver CONFIG=/home/pranab/Projects/zaal/zaal.json export HADOOP_CLASSPATH=$DR_JAR:$JAR echo $HADOOP_CLASSPATH hadoop fs -rmr /zaal/dor/output hadoop fs -rmr /zaal/bcl/output hadoop jar $JAR $CL -c $CONFIG -f bayesian -i sample Additional Info =============== More details can be found in my blogpost here http://pkghosh.wordpress.com/2011/05/22/hadoop-orchestration
About
A simple easy to use Hadoop map reduce workflow engine
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published