
spark-bench

Benchmark Suite for Apache Spark

Current vs. Legacy Version

spark-bench has recently gone through an extensive rewrite. While we think you'll like the new capabilities, it is not quite feature complete with the previous version of spark-bench. Many of the workloads that were available in the legacy version have not yet been ported over, but they will be!

In the meantime, if you would like to see the old version of spark-bench, it's preserved in the legacy branch.

You can also grab the last official release of the legacy version from here.

Current Spark version supported by spark-bench: 2.1.1

Documentation

Visit the docs website: https://sparktc.github.io/spark-bench/

Installation

  1. Grab the latest release from here: https://github.com/Spark-TC/spark-bench/releases/latest.
  2. Unpack the tarball using tar -xvzf.
  3. cd into the newly created folder.
  4. Set your environment variables:
  • Option 1: modify SPARK_HOME and SPARK_MASTER_HOST in bin/spark-bench-env.sh to reflect your environment.
  • Option 2: Recommended! Modify the config files in the examples and set spark-home and spark-args = { master } to reflect your environment. See here for more details.
  5. Start using spark-bench!
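
Putting steps 1 through 3 together, an install session might look like the following. This assumes wget is available, and the tarball name is a placeholder; substitute the actual asset name from the releases page:

# Illustrative install session -- <release> is a placeholder for the real tarball name
wget https://github.com/Spark-TC/spark-bench/releases/download/<tag>/<release>.tgz
tar -xvzf <release>.tgz
cd <release>/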

Building It Yourself

Alternatively, you can clone this repo and build it yourself.

First, install SBT according to the instructions for your system: http://www.scala-sbt.org/0.13/docs/Setup.html

Clone this repo.

git clone https://github.com/Spark-TC/spark-bench.git
cd spark-bench/

The latest changes will always be on develop; the stable version is on master. Optionally check out develop, or skip this step to stay on master.

git checkout develop

Building spark-bench takes more heap space than the default provided by SBT. There are several ways to set these options for SBT; this is just one. I recommend adding the following line to your .bash_profile:

export SBT_OPTS="-Xmx1536M -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=2G -Xss2M"
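
If you'd rather not edit your shell profile, one alternative (assuming the standard sbt launcher script, which forwards -J-prefixed flags to the underlying JVM) is to pass the heap options per invocation:

# One-off alternative: -J-prefixed arguments go straight to the JVM
sbt -J-Xmx1536M -J-Xss2M test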

Now you're ready to test spark-bench, if you so desire.

sbt test

And finally, to build the distribution folder and associated tar file:

sbt dist

Running the Examples From The Distribution

The spark-bench distribution comes bundled with example scripts and configuration files that should run out of the box with only minimal setup.

Creating the Distribution Folder

If you installed spark-bench by unpacking the tar file, you're ready to go. If you cloned the repo, first run sbt dist and then change into that generated folder.
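
For the cloned-repo case, that boils down to the following; the folder name is a placeholder, so substitute whatever folder sbt dist actually generates:

sbt dist
cd <generated-dist-folder>/   # substitute the folder that sbt dist creates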

Setting Environment Variables

There are two ways to set the Spark home and master variables necessary to run the examples.

Option 1: Setting Bash Environment Variables

Inside the bin folder is a file called spark-bench-env.sh. In this file are two environment variables that you will be required to set. The first is SPARK_HOME, which is simply the full path to the top level of your Spark installation on your laptop or cluster. The second is SPARK_MASTER_HOST, which is the same value you would enter as --master in a spark-submit script for this environment. This might be local[2] on your laptop, yarn on a Yarn cluster, or an IP address and port if you're running in standalone mode; you get the idea!

You can set those environment variables in your bash profile or by uncommenting the lines in spark-bench-env.sh and filling them out in place.
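
Concretely, uncommenting and filling out those lines might look like this; the path and master URL are examples only, so adjust them for your environment:

# bin/spark-bench-env.sh -- example values only; adjust for your setup
export SPARK_HOME=/usr/local/spark      # full path to your Spark installation
export SPARK_MASTER_HOST="local[2]"     # same value you would pass to --master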

Option 2: RECOMMENDED! Modifying Example Config Files To Include Environment Info

Take minimal-example.conf, for example, which looks like this:

spark-bench = {
  spark-submit-config = [{
    workload-suites = [
      {
        descr = "One run of SparkPi and that's it!"
        benchmark-output = "console"
        workloads = [
          {
            name = "sparkpi"
            slices = 10
          }
        ]
      }
    ]
  }]
}

Add the spark-home and master keys:

spark-bench = {
  spark-home = "/path/to/your/spark/install/" 
  spark-submit-config = [{
    spark-args = {
      master = "local[*]" // or whatever the correct master is for your environment
    }
    workload-suites = [
      {
        descr = "One run of SparkPi and that's it!"
        benchmark-output = "console"
        workloads = [
          {
            name = "sparkpi"
            slices = 10
          }
        ]
      }
    ]
  }]
}

Running the Examples

From the spark-bench distribution folder, simply run:

./bin/spark-bench.sh ./examples/minimal-example.conf

The example scripts and associated configuration files are a great starting point for learning spark-bench by example. You can also read more about spark-bench at our documentation site.

Previewing the GitHub Pages Site Locally

The spark-bench documentation at https://sparktc.github.io/spark-bench/ is generated from files in the docs/ folder. To see the Jekyll site locally:

  1. Follow the instructions from GitHub regarding installing Ruby, Bundler, etc.

  2. From the docs/ folder, run bundle exec jekyll serve and navigate in your browser to 127.0.0.1:4000
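
Assuming Ruby and Bundler are already installed and docs/ carries a Gemfile (as is typical for GitHub Pages sites), the preview boils down to:

cd docs/
bundle install              # fetch Jekyll and the site's other gems
bundle exec jekyll serve    # then browse to http://127.0.0.1:4000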
