SparkAutoSubmitter is a job submitter for Apache Spark that uses (1) heuristics based on historical data and (2) real-time resource availability to select the optimal runtime configuration for any given job.
`bash spark-submit-wrapper.sh <job-type> <input-data> <cluster-efficiency>`
Where:

- `job-type` is an identifier for the job, e.g. "ScalaWordCount"
- `input-data` is an HDFS path where the input data for the job lives
- `cluster-efficiency` is an integer [0-9], where 0 places the most emphasis on the performance of the single job and 9 on the performance of the computing cluster as a whole
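For example, to submit the ScalaWordCount job with a balanced efficiency setting (the HDFS path below is illustrative):

```bash
bash spark-submit-wrapper.sh ScalaWordCount hdfs:///data/wordcount/input 5
```

Given the description above, the configuration the submitter selects presumably surfaces as standard `spark-submit` resource flags. A hypothetical sketch of the resulting call (the class name, jar, and flag values are assumptions for illustration, not the project's actual output):

```bash
# Hypothetical spark-submit call the wrapper might construct
spark-submit \
  --master yarn \
  --num-executors 8 \
  --executor-memory 4g \
  --executor-cores 2 \
  --class com.example.ScalaWordCount \
  wordcount.jar hdfs:///data/wordcount/input
```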
- Clone the repository into an environment that has the prerequisite software listed below.
- Update `spark-environment.conf` with the values for your system (see the sketch below).
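The exact keys in `spark-environment.conf` depend on the repository; as a minimal sketch, assuming a shell-style key=value file, it might look something like this (all key names and paths are illustrative, matching the prerequisite versions below):

```bash
# spark-environment.conf -- illustrative values only; substitute your system's paths
SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7
HADOOP_HOME=/opt/hadoop-2.8.1
HDFS_NAMENODE=hdfs://namenode:8020
YARN_RESOURCEMANAGER=resourcemanager:8032
```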
| Software | Version |
|---|---|
| Apache Spark | 2.1 (with Hadoop 2.7) |
| Hadoop | 2.8.1 |
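You can check the installed versions with the standard CLI commands:

```bash
spark-submit --version   # should report Spark 2.1.x, built for Hadoop 2.7
hadoop version           # should report Hadoop 2.8.1
```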
- read and write results to an HDFS path
- make more things configurable
- figure out how to support multiple job types (including those outside HiBench)