Skip to content

pranab/fluxua

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Introduction
============
Fluxua is a simple workflow driver for Hadoop map reduce jobs. It's non intrusive. It does not require
your map reduce implementation to extend any special class and can be plugged into the workflow as is. There
are other map reduce workflow engines like oozie, cascading etc. that you can use. Fluxua is itended to be a 
simple, small foot print alternative.

Architecture
============
The Hadoop map reduce jobs are defined as nodes in a DAG (directed acyclic graph). Edges represent the dependency
between the jobs. The workflow drives topologically orders the DAG and executes the jobs in proper order, starting
with the jobs at the root nodes of the DAG.

Each MR job is executed from a separate thread. The executing thread communicates with the driver through a blocking 
queue. All MR jobs at intermediate DAG nodes should do blocking executions of the job, so that the dependent jobs can
be launched only after the parent jobs have completed.

Configuration
=============
The jobs and the workflow are defined ina JSON file as below. The configuration has three main sections. The first
section has the system configurations. The second sections has configuration of all thge jobs. The las sectipns has 
flow definitions in terms of jobs. There is an example from a data mining project in the resource directory.


Sample Shell Script
===================
Here is sample shell script for using the driver. The entry point is the class org.fluxua.driver.JobDriver

DR_JAR=/home/pranab/Projects/fluxua/target/fluxua-1.0.jar
JAR=/home/pranab/Projects/zaal/target/zaal-1.0.jar
CL=org.fluxua.driver.JobDriver
CONFIG=/home/pranab/Projects/zaal/zaal.json
export HADOOP_CLASSPATH=$DR_JAR:$JAR
echo $HADOOP_CLASSPATH
hadoop fs -rmr /zaal/dor/output
hadoop fs -rmr /zaal/bcl/output
hadoop jar $JAR $CL -c $CONFIG -f bayesian -i sample

Additional Info
===============
More details can be found in my blogpost here
http://pkghosh.wordpress.com/2011/05/22/hadoop-orchestration



About

A simple easy to use Hadoop map reduce workflow engine

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages