feat(jobserver): Load config and jar from Hadoop
A standalone cluster in cluster mode requires users to
provide the config/jar file to each worker, either through
the local file system or a Hadoop-supported filesystem.

According to the Spark documentation:
"If your application is launched through Spark submit,
then the application jar is automatically distributed
to all worker nodes"

The above statement is not correct.
* If the driver folder doesn't exist on the worker, Spark
creates the folder and downloads the jar file via whatever
method the user provided (file/hdfs/s3).
* If the driver folder already exists, Spark just uses the
jar that is already there.

Further, according to the Spark documentation, the
configuration "spark.files" places the listed files in the
working directory of each executor.

Until now, jobserver was only capable of loading files from
the local file system in standalone mode.

Since we need the configuration file in the driver (not just
in the executors' working directories), we cannot use this
option.

This change allows users to pass any Hadoop-supported
filesystem path in the MANAGER_CONF_FILE and MANAGER_JAR_FILE
environment variables. We pass the jar file path directly to
Spark, because Spark can already load jars from
Hadoop-supported filesystems.
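
A minimal sketch of what reading and validating those
environment variables could look like; the object and method
names here are illustrative, not the actual jobserver code,
and only the environment variable names come from this
change:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path

    object ManagerFileValidator {
      // Hypothetical helper: resolve the filesystem from the
      // URI scheme (file://, hdfs://, s3a://, ...) and check
      // that the user-supplied path exists before passing it on.
      def validated(envVar: String): String = {
        val pathStr = sys.env.getOrElse(envVar, sys.error(s"$envVar is not set"))
        val path = new Path(pathStr)
        val fs = path.getFileSystem(new Configuration())
        require(fs.exists(path), s"$envVar points to a missing file: $pathStr")
        pathStr
      }
    }

    // Usage:
    //   val confPath = ManagerFileValidator.validated("MANAGER_CONF_FILE")
    //   val jarPath  = ManagerFileValidator.validated("MANAGER_JAR_FILE")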

The driver uses Hadoop APIs to read the config file at
runtime. The path that the user specifies is passed (after
the basic validation sketched above) to the driver, which
loads the config file at startup.
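
A sketch of that startup load, assuming the config file is
parsed with Typesafe Config (the object name is illustrative):

    import java.io.InputStreamReader
    import java.nio.charset.StandardCharsets

    import com.typesafe.config.{Config, ConfigFactory}
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path

    object HadoopConfigLoader {
      // Open the file through whichever FileSystem
      // implementation matches the URI scheme and parse the
      // stream as a Typesafe config.
      def load(pathStr: String): Config = {
        val path = new Path(pathStr)
        val fs = path.getFileSystem(new Configuration())
        val in = fs.open(path)
        try ConfigFactory.parseReader(new InputStreamReader(in, StandardCharsets.UTF_8))
        finally in.close()
      }
    }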

Note: If HDFS is set up in HA mode, the nameservice will
handle routing requests to the active namenode.
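In that case the path points at the logical nameservice
rather than a specific namenode host:port, for example (the
nameservice name here is made up):

    MANAGER_CONF_FILE=hdfs://mynameservice/jobserver/jobserver.conf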