Arabesque Project Skeleton
This repository contains a sample application built on top of Arabesque. It contains:
- A pre-configured pom.xml file for easy building with Maven.
- Execution scripts and configuration files for easily running your applications.
- A sample data file in the format Arabesque expects.
You may use it as a starting point for developing your own algorithms in Arabesque.
Arabesque and this skeleton project are open source under the Apache 2.0 license.
To compile this project, you need:
To run the compiled application, you need:
- A 64-bit JVM running under Linux or Mac.
- A functioning installation of Hadoop 2 with MapReduce (local or in a cluster).
Make it yours
Fork this project on GitHub (don't forget to change the repository name) or set it up manually by running the following:
git clone https://github.com/Qatar-Computing-Research-Institute/Arabesque-Skeleton.git $PROJECT_PATH
cd $PROJECT_PATH
git remote rename origin upstream
git remote add origin $YOUR_REPO_URL
You should then edit the pom.xml file, paying particular attention to the following lines:
<groupId>org.example</groupId>
<artifactId>arabesque-skeleton</artifactId>
<version>1.0</version>
<name>Arabesque Skeleton</name>
<description>Skeleton for a new project using the Arabesque system</description>
Give it a descriptive name and description and make sure to change the group and artifact ids.
You should also change the following line in scripts/run_arabesque.sh to match your new artifactId:
Your application code should go under src/main/java. This skeleton includes a sample implementation of Clique Finding, which you might find a useful starting point for your own implementations. Make sure to rename the package and class to suit your project.
You can compile this project like any other Maven-based project.
If you execute the following command at the root of the project (where the
pom.xml file is located)
Maven will compile and package your application. The resulting jar will
be located under the
- On a machine with access to a Hadoop cluster, create a directory where you'll put everything necessary to execute your computation:
mkdir example
cd example
- Put all the following files in that directory (using SCP/FTP/...):
An input graph in the format Arabesque expects:
# <num vertices> <num edges>
<vertex0Id> <vertex0Label> [<neighbour00Id> <neighbour01Id> ...]
<vertex1Id> <vertex1Label> [<neighbour10Id> <neighbour11Id> ...]
...
Vertex ids should be in the range from 0 to (number of vertices - 1).
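Before uploading a graph, it can help to sanity-check it against the format described above. The following Python sketch is one possible way to do that; it is not part of Arabesque. The function names (`parse_id`, `validate_graph`) and the sample triangle graph are hypothetical. The sample lists each edge in both endpoints' neighbour lists and uses integer labels, both of which are assumptions rather than documented requirements. The check covers only what the format description states: a header line, one line per vertex, and ids within range.

```python
def parse_id(token):
    """Return token as a non-negative int, or None if it is not one."""
    try:
        value = int(token)
    except ValueError:
        return None
    return value if value >= 0 else None


def validate_graph(text):
    """Return a list of problems found in a graph description (empty if none)."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith("#"):
        return ["first line must look like '# <num vertices> <num edges>'"]

    # Header: '#' followed by the vertex count and the edge count.
    counts = [parse_id(tok) for tok in lines[0][1:].split()]
    if len(counts) != 2 or None in counts:
        return ["first line must look like '# <num vertices> <num edges>'"]
    num_vertices = counts[0]

    problems = []
    vertex_lines = lines[1:]
    if len(vertex_lines) != num_vertices:
        problems.append(
            f"expected {num_vertices} vertex lines, found {len(vertex_lines)}"
        )

    for index, line in enumerate(vertex_lines, start=1):
        tokens = line.split()
        if len(tokens) < 2:
            problems.append(f"vertex line {index}: need at least '<id> <label>'")
            continue
        # The label is treated as an opaque token; only ids are range-checked.
        vertex_id, _label, *neighbours = tokens
        for token in [vertex_id, *neighbours]:
            value = parse_id(token)
            if value is None or value >= num_vertices:
                problems.append(
                    f"vertex line {index}: id '{token}' is not in "
                    f"0..{num_vertices - 1}"
                )
    return problems


# Hypothetical sample: a triangle with three vertices, all labelled 1.
SAMPLE = """\
# 3 3
0 1 1 2
1 1 0 2
2 1 0 1
"""

if __name__ == "__main__":
    print(validate_graph(SAMPLE))  # prints [] when no problems are found
```

To check a real file, read it with `open(path).read()` and pass the contents to `validate_graph`. An empty list means none of these basic checks failed; it does not guarantee that Arabesque will accept the graph.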
- Upload your input graph to HDFS. A sample graph is under the data directory. Make sure you have initialized HDFS first.
hdfs dfs -put <input graph file> <destination graph file in HDFS>
Change the settings in application.yaml to match your cluster, application and data settings (input_graph_path should point to the final path of the graph in HDFS according to the previous step).
To start your computation, execute the following (you should probably clean the output directory first):
./run_arabesque.sh cluster.yaml application.yaml
You can check the logs of the Hadoop containers for progress information.
When finished, you can consult the results in the output_path HDFS directory as specified on the