What is it?
Hawk is a Hybrid Data Center Scheduler presented at USENIX ATC 2015.
It combines centralized and distributed scheduling to get the benefits of both. It has the following main features:
Hybrid Scheduling. Schedules Long jobs in a centralized way (better scheduling decisions) and Short jobs in a distributed way (better scheduling latency).
Work stealing. To balance load, a node that becomes free contacts another node and 'steals' the short, latency-sensitive jobs waiting in its queue.
Partitioning. It prevents Long jobs from taking all the resources in the cluster so that Short jobs do not experience head-of-line blocking.
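The three features above can be illustrated with a toy model. This is only a sketch, not Hawk's code: the names `Node`, `big_partition`, `place_long`, `place_short` and `steal` are hypothetical, queue length stands in for whatever load estimate the real schedulers use, and the stealing rule is simplified to "take every queued Short task".

```python
import random
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker with a FIFO queue of (kind, duration) tasks."""
    name: str
    queue: deque = field(default_factory=deque)


def big_partition(nodes, percent):
    """Return the first `percent`% of nodes, the only ones Long tasks may use."""
    return nodes[: len(nodes) * percent // 100]


def place_long(nodes, percent, duration):
    """Centralized placement: least-loaded node inside the big partition."""
    target = min(big_partition(nodes, percent), key=lambda n: len(n.queue))
    target.queue.append(("long", duration))
    return target


def place_short(nodes, duration, probes=2, rng=random):
    """Distributed placement: probe a few random nodes, keep the shortest queue."""
    candidates = rng.sample(nodes, probes)
    target = min(candidates, key=lambda n: len(n.queue))
    target.queue.append(("short", duration))
    return target


def steal(idle, victim):
    """Move the victim's queued Short tasks to the idle node; return the count."""
    stolen = [task for task in victim.queue if task[0] == "short"]
    victim.queue = deque(task for task in victim.queue if task[0] != "short")
    idle.queue.extend(stolen)
    return len(stolen)
```

With ten nodes and `percent=80`, `place_long` only ever picks among the first eight nodes, so the last two are always free of Long tasks, while `place_short` may land anywhere in the cluster.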
Eagle is currently a work in progress. A beta version is available here. Eagle aims to avoid the head-of-line blocking that Short jobs experience in distributed schedulers by providing an approximate, fast view of the Long jobs.
To run Hawk and Eagle you need Java JDK 1.7 and Maven installed.
This command installs Hawk/Eagle locally.
$ mvn install -DskipTests
Create a configuration file with the following parameters.
deployment.mode = configbased # currently only this mode is supported
static.node_monitors =<hostname_1>:20502 # comma separated list of nodes where jobs will run
static.app.name = spark # the application name, this can also be changed as a java opt
system.memory =10240000 #
system.cpus=1 # currently only one slot per machine is supported
sample.ratio=2 # number of probes per task for distributed schedulers, 'power of two'
cancellation=no # after a job finishes will cancel the rest of the probes, in practice makes no difference
scheduler.centralized=<centralized_scheduler_ip> # centralized scheduler IP, if no centralized scheduler set 0.0.0.0
big.partition=80 # the percentage of nodes where Long jobs can run
small.partition=100 # the percentage of nodes where Short jobs can run
nodemonitor.stealing=yes # enable Hawk stealing
nodemonitor.stealing_attempts=10 # number of stealing attempts
eagle.piggybacking=no # enable Eagle
eagle.retry_rounds=0 # number of rounds distributed schedulers should try before going to small partition
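Before starting a daemon it can help to sanity-check the file. The sketch below is not part of Hawk/Eagle: `parse_config` and `check_config` are hypothetical helpers, and they assume one `key=value` pair per line with `#` starting a comment, which is how the parameters above are written. The daemon's own parser may be more permissive.

```python
def parse_config(text):
    """Parse `key = value  # comment` lines into a dict of strings."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {raw!r}")
        conf[key.strip()] = value.strip()
    return conf


def check_config(conf):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if conf.get("deployment.mode") != "configbased":
        problems.append("deployment.mode must be configbased")
    for key in ("big.partition", "small.partition"):
        value = conf.get(key, "")
        if not value.isdigit() or not 0 <= int(value) <= 100:
            problems.append(f"{key} must be an integer percentage between 0 and 100")
    return problems
```

Calling `check_config(parse_config(open(path).read()))` on a configuration file returns an empty list when these basic checks pass; it does not verify hostnames or ports.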
To enable Eagle you need to change the following parameters
nodemonitor.stealing=no # enable Hawk stealing
nodemonitor.stealing_attempts=0 # number of stealing attempts
eagle.piggybacking=yes # enable Eagle
eagle.retry_rounds=3 # number of rounds distributed schedulers should try before going to small partition
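If you keep one Hawk configuration and want to derive the Eagle variant from it, the four changes can be applied programmatically. This is a minimal sketch: `EAGLE_OVERRIDES`, `to_eagle` and `render` are hypothetical names, and the configuration is assumed to be held as a plain dict of strings.

```python
# The four parameters that differ between the Hawk and Eagle setups.
EAGLE_OVERRIDES = {
    "nodemonitor.stealing": "no",
    "nodemonitor.stealing_attempts": "0",
    "eagle.piggybacking": "yes",
    "eagle.retry_rounds": "3",
}


def to_eagle(conf):
    """Return a copy of `conf` with the Eagle settings applied."""
    merged = dict(conf)
    merged.update(EAGLE_OVERRIDES)
    return merged


def render(conf):
    """Serialize a config dict back to one key=value pair per line."""
    return "\n".join(f"{key}={value}" for key, value in conf.items()) + "\n"
```

`to_eagle` leaves the input dict untouched, so the Hawk and Eagle variants can be written to separate files from the same base.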
After creating the configuration file, you can run the Hawk/Eagle daemon with the following command (replace JAVA_DIR, EAGLE_JAR and CONF_FILE with their corresponding paths):
$ JAVA_DIR -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCTimeStamps -Xmx2046m -XX:+PrintGCDetails -cp EAGLE_JAR ch.epfl.eagle.daemon.EagleDaemon -c CONF_FILE
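When launching daemons on many nodes from a script, it can be convenient to build this command as an argument list. The sketch below only reassembles the flags shown above; `daemon_command` and `command_string` are hypothetical helpers, and the paths passed to them are placeholders you would replace.

```python
import shlex


def daemon_command(java, jar, conf_file, max_heap="2046m"):
    """Build the daemon launch command as an argv list."""
    return [
        java,
        "-XX:+UseConcMarkSweepGC",
        "-verbose:gc",
        "-XX:+PrintGCTimeStamps",
        f"-Xmx{max_heap}",
        "-XX:+PrintGCDetails",
        "-cp", jar,
        "ch.epfl.eagle.daemon.EagleDaemon",
        "-c", conf_file,
    ]


def command_string(argv):
    """Render an argv list as a single shell-safe string (Python 3.8+)."""
    return shlex.join(argv)
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting problems if a path contains spaces.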
Next, run a front-end application. You can test it with a Spark program, for example.
We also have a plugin for Spark; you can find it here.
You can compile it using the following command, provided you installed Eagle first.
$ build/sbt assembly
You can run an example with JavaSleep. For that, create a file with the sleep times of the jobs. The input file should have the following format:
[Each line a job]
Col1: job arrival time
Col2: number of tasks in job
Col3: estimated job runtime (we normally use the mean)
Col4: (and as many cols as needed) the real duration of each task for the job (for the sleep)
Example: 570 2 2722 2722 2722
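A small parser makes it easy to check an input file before handing it to JavaSleep. This is a sketch under assumptions: columns are taken to be separated by whitespace, as in the example line, and `Job`, `parse_job` and `parse_trace` are hypothetical names that are not part of Eagle or the Spark plugin.

```python
from dataclasses import dataclass


@dataclass
class Job:
    arrival: float            # Col1: job arrival time
    num_tasks: int            # Col2: number of tasks in the job
    estimated_runtime: float  # Col3: estimated job runtime
    task_durations: list      # Col4 onward: real duration of each task


def parse_job(line):
    """Parse one line of the input file into a Job."""
    cols = line.split()
    if len(cols) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(cols)}")
    num_tasks = int(cols[1])
    durations = [float(c) for c in cols[3:]]
    if len(durations) != num_tasks:
        raise ValueError(
            f"job declares {num_tasks} tasks but lists {len(durations)} durations"
        )
    return Job(float(cols[0]), num_tasks, float(cols[2]), durations)


def parse_trace(text):
    """Parse a whole input file, skipping blank lines."""
    return [parse_job(line) for line in text.splitlines() if line.strip()]
```

For the example line, `parse_job("570 2 2722 2722 2722")` yields a job with two tasks whose durations both equal the estimated runtime; a line whose task count does not match the number of duration columns raises `ValueError`.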
Start the driver.
$ spark/bin/spark-run -Dspark.driver.host=<driver_hostname>
org.apache.spark.examples.JavaSleep "eagle@$SCHEDULER:20503" 5 3
SMALL can take the values "small" or "big", depending on whether the scheduler is the distributed or the centralized one (centralized --> big).
Start the backends. This should be run on each of the nodes.
org.apache.spark.scheduler.eagle.EagleExecutorBackend --driver-url spark://EagleSchedulerBackend@<driver_hostname>:60501
Hawk and Eagle are meant to improve job completion times in large clusters. To evaluate them with tens of thousands of nodes we used a simulator. The simulator is written in Python; please refer to its README for further information.