This repository contains a lightweight collection of scripts that you can use to run Spark jobs on SLURM via sbatch. The scripts were tested with Spark 1.4.0 on the UPPMAX (http://www.uppmax.uu.se/) milou HPC cluster.
The minimal configuration requirement is a SPARK_HOME environment variable pointing to the Spark installation folder:
#~/.bashrc
export SPARK_HOME="/path/to/spark-version"
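To quickly check that SPARK_HOME points at a working installation, you can print the Spark version (a minimal sanity check, assuming the standard bin layout under SPARK_HOME):

```bash
# This should print the Spark version (e.g. 1.4.0) if SPARK_HOME
# points to a valid Spark installation.
$SPARK_HOME/bin/spark-submit --version
```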
However, since you are probably running on an HPC cluster, you may want to configure Spark to write logs and temporary files to local scratch:
#$SPARK_HOME/conf/spark-env.sh
#This configuration is known to work on UPPMAX milou
export SPARK_LOCAL_DIRS=$(mktemp -d)
export SPARK_WORKER_DIR=$(mktemp -d)
export SPARK_LOG_DIR="$TMPDIR/spark-logs"
mkdir -p $SPARK_LOG_DIR
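As a quick sanity check (a minimal sketch, assuming you run it inside a SLURM allocation where $TMPDIR points to node-local scratch), you can source the configuration and confirm that the directories were created:

```bash
# Source the configuration and verify that the scratch directories exist.
# Run this inside a SLURM job, where $TMPDIR is set to node-local scratch.
source $SPARK_HOME/conf/spark-env.sh
echo "SPARK_LOCAL_DIRS = $SPARK_LOCAL_DIRS"
echo "SPARK_WORKER_DIR = $SPARK_WORKER_DIR"
echo "SPARK_LOG_DIR    = $SPARK_LOG_DIR"
ls -ld "$SPARK_LOCAL_DIRS" "$SPARK_WORKER_DIR" "$SPARK_LOG_DIR"
```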
Finally, to test your configuration you may want to run:
export SPARK_EXAMPLES_JAR="/path/to/spark-examples.jar"
sbatch -A <your_project_id> examples/spark-pi.sh
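If you prefer to write your own submission script, the sketch below shows roughly what such a job could look like. It is only an illustration, not the actual examples/spark-pi.sh shipped in this repository; the SBATCH resources, partition, and the local[...] master are assumptions you will need to adapt:

```bash
#!/bin/bash -l
# Hypothetical sketch of a Spark job submitted via sbatch (not the
# repository's examples/spark-pi.sh). Adjust project, partition,
# core count and wall time to your needs.
#SBATCH -A <your_project_id>
#SBATCH -p core
#SBATCH -n 16
#SBATCH -t 00:15:00
#SBATCH -J spark-pi

# Run the SparkPi example on a single node, using the allocated cores.
$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master "local[${SLURM_NTASKS:-1}]" \
  $SPARK_EXAMPLES_JAR 1000
```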
## Usage

For usage, please refer to the examples directory in this repository.