SparkAutoSubmitter is a job submitter for Apache Spark that uses (1) heuristics based on historical data and (2) real-time resource availability to select the optimal runtime configuration for any given job.
`bash spark-submit-wrapper.sh <job-type> <input-data> <cluster-efficiency>`
Where:

- `job-type` is an identifier for the job, e.g. "ScalaWordCount"
- `input-data` is an HDFS path where the input data for the job lives
- `cluster-efficiency` is an integer [0-9], where 0 places the most emphasis on the performance of the single job and 9 on the performance of the computing cluster as a whole
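For example, to submit the ScalaWordCount job with a balanced efficiency setting (the HDFS path below is illustrative):

```bash
bash spark-submit-wrapper.sh ScalaWordCount hdfs:///data/wordcount/input 5
```

Given the description above, the configuration the submitter selects presumably surfaces as standard `spark-submit` resource flags. A hypothetical sketch of the resulting call (the class name, jar, and flag values are assumptions for illustration, not the project's actual output):

```bash
# Hypothetical spark-submit call the wrapper might construct
spark-submit \
  --master yarn \
  --num-executors 8 \
  --executor-memory 4g \
  --executor-cores 2 \
  --class com.example.ScalaWordCount \
  wordcount.jar hdfs:///data/wordcount/input
```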
- Clone the repository into an environment that has the prerequisite software listed below.
- Update `spark-environment.conf` with the values for your system (see the sketch below).
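The exact keys in `spark-environment.conf` depend on the repository; as a minimal sketch, assuming a shell-style key=value file, it might look something like this (all key names and paths are illustrative, matching the prerequisite versions below):

```bash
# spark-environment.conf -- illustrative values only; substitute your system's paths
SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7
HADOOP_HOME=/opt/hadoop-2.8.1
HDFS_NAMENODE=hdfs://namenode:8020
YARN_RESOURCEMANAGER=resourcemanager:8032
```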
| Software | Version |
|---|---|
| Apache Spark | 2.1 (with Hadoop 2.7) |
| Hadoop | 2.8.1 |
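You can check the installed versions with the standard CLI commands:

```bash
spark-submit --version   # should report Spark 2.1.x, built for Hadoop 2.7
hadoop version           # should report Hadoop 2.8.1
```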
- read and write results to an HDFS path
- make more things configurable
- figure out how to support multiple job types (including those outside HiBench)