This repository will host all source code and scripts for the Data Algorithms book. The book provides a set of MapReduce algorithms, which are implemented using:

- Java/MapReduce Hadoop 2.5.0
- Java/Spark 1.0.2 (will upgrade to 1.1.0 in the next few days)

Please note that this is a work in progress...

- Title: Data Algorithms
- Author: Mahmoud Parsian
- Publisher: O'Reilly Media
- All source code, libraries, and build scripts are posted here
- Shell scripts for running Spark/Hadoop programs will be posted (soon!)

Software versions used:

| Software | Version |
|---|---|
| Java | JDK7 |
| Hadoop | 2.5.0 |
| Spark | 1.0.2 |
| Ant | 1.9.4 |

Repository contents:

| Name | Description |
|---|---|
| README.md | The file you are reading now |
| README_lib.md | Notes on the `lib` directory (must read before build) |
| src | Source files for MapReduce/Hadoop/Spark |
| lib | Required jar files |
| build.xml | The ant build script |
| dist | The ant build's output directory |
| LICENSE | License for using this repository |
| misc | Miscellaneous files for this repository |
| setenv | Example of how to set your environment variables before building |

Before you build, you should read README_lib.md. Apache Ant 1.9.4 is used to build the project.
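Before running the build, it can help to confirm that the required tools can be found on your PATH. The following is a minimal sketch and not part of this repository: `have_tool` is a hypothetical helper name, and the check only looks for the `java` and `ant` commands, not their versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository):
# succeeds if the named command can be found on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Report whether each tool the build relies on is available.
for tool in java ant; do
  if have_tool "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done
```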
- To clean up: `ant clean`
- To build: `ant` (the build creates the `<install-dir>/dist/data_algorithms_book.jar` file)
- To check your build environment: `ant myenv`
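The clean and build steps can be chained so that a failed build is noticed immediately. This is only a sketch under stated assumptions: `jar_built` and `rebuild` are hypothetical helper names that are not provided by this repository, and the install directory is passed in as the first argument.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the build output jar exists
# under the given install directory.
jar_built() {
  [ -f "$1/dist/data_algorithms_book.jar" ]
}

# Hypothetical wrapper: clean, build, then confirm the jar was produced.
# Runs in a subshell so the caller's working directory is unchanged.
rebuild() {
  ( cd "$1" && ant clean && ant ) && jar_built "$1"
}
```

For example, `rebuild "$BOOK_HOME" && echo "build OK"` would print a message only if both ant targets succeed and the jar file exists afterwards.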
To run programs, make sure that your CLASSPATH contains all of the following JAR files:

- `<install-dir>/dist/data_algorithms_book.jar`
- all jar files in the `<install-dir>/lib/` directory

Make sure that you use the full path for all jar files. This is how you can set up your CLASSPATH in a Linux bash environment:

    # Replace <install-dir> with the full path of your installation
    BOOK_HOME=<install-dir>
    CLASSPATH=.:$BOOK_HOME/dist/data_algorithms_book.jar
    # Append every jar file found under lib
    for j in $(find "$BOOK_HOME/lib" -name '*.jar'); do
      CLASSPATH=$CLASSPATH:$j
    done
    export CLASSPATH
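Because a mistyped path on the CLASSPATH only shows up later as a class-loading error, it may be worth checking the entries right after setting them. The sketch below assumes a colon-separated CLASSPATH without wildcard entries; `missing_classpath_entries` is a hypothetical helper name that is not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper: prints every colon-separated entry of its argument
# that does not exist on disk; returns non-zero if any entry is missing.
missing_classpath_entries() {
  status=0
  old_ifs=$IFS
  IFS=:
  set -f                      # disable globbing while splitting on ':'
  for entry in $1; do
    [ -z "$entry" ] && continue
    if [ ! -e "$entry" ]; then
      echo "missing: $entry"
      status=1
    fi
  done
  set +f
  IFS=$old_ifs
  return $status
}
```

A possible use is `missing_classpath_entries "$CLASSPATH" || echo "fix the entries listed above"`.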
If you have questions or comments, please send me an email: mahmoud.parsian@yahoo.com
Thank you!


