SPARK-1680: use configs for specifying environment variables on YARN
Note that this also documents spark.executorEnv.*, which to me means it's public. If we don't want that, please speak up.

Author: Thomas Graves <tgraves@apache.org>

Closes apache#1512 from tgravescs/SPARK-1680 and squashes the following commits:

11525df [Thomas Graves] more doc changes
553bad0 [Thomas Graves] fix documentation
152bf7c [Thomas Graves] fix docs
5382326 [Thomas Graves] try fix docs
32f86a4 [Thomas Graves] use configs for specifying environment variables on YARN
tgravescs committed Aug 5, 2014
1 parent 74f82c7 commit 41e0a21
Showing 4 changed files with 43 additions and 6 deletions.
8 changes: 8 additions & 0 deletions docs/configuration.md
@@ -206,6 +206,14 @@ Apart from these, the following properties are also available, and may be useful
used during aggregation goes above this amount, it will spill the data into disks.
</td>
</tr>
<tr>
<td><code>spark.executorEnv.[EnvironmentVariableName]</code></td>
<td>(none)</td>
<td>
Add the environment variable specified by <code>EnvironmentVariableName</code> to the Executor
process. The user can specify multiple of these to set multiple environment variables.
</td>
</tr>
</table>

#### Shuffle Behavior
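To illustrate the `spark.executorEnv.[EnvironmentVariableName]` property added above, here is a minimal sketch that could be pasted into a spark-shell session; the variable names and values are hypothetical, not part of this commit:

import org.apache.spark.SparkConf

// Hypothetical variable names and values, for illustration only.
val conf = new SparkConf()
  .setAppName("env-var-demo")
  // Each spark.executorEnv.<NAME> entry becomes one environment variable
  // in every executor process.
  .set("spark.executorEnv.JAVA_HOME", "/jdk64")
  .set("spark.executorEnv.FOO", "bar")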
22 changes: 17 additions & 5 deletions docs/running-on-yarn.md
@@ -17,10 +17,6 @@ To build Spark yourself, refer to the [building with Maven guide](building-with-

Most of the configs are the same for Spark on YARN as for other deployment modes. See the [configuration page](configuration.html) for more information on those. These are configs that are specific to Spark on YARN.

#### Environment Variables

* `SPARK_YARN_USER_ENV`, to add environment variables to the Spark processes launched on YARN. This can be a comma separated list of environment variables, e.g. `SPARK_YARN_USER_ENV="JAVA_HOME=/jdk64,FOO=bar"`.

#### Spark Properties

<table class="table">
@@ -110,7 +106,23 @@ Most of the configs are the same for Spark on YARN as for other deployment modes
<td><code>spark.yarn.access.namenodes</code></td>
<td>(none)</td>
<td>
A list of secure HDFS namenodes your Spark application is going to access. For
example, `spark.yarn.access.namenodes=hdfs://nn1.com:8032,hdfs://nn2.com:8032`.
The Spark application must have access to the namenodes listed and Kerberos must
be properly configured to be able to access them (either in the same realm or in
a trusted realm). Spark acquires security tokens for each of the namenodes so that
the Spark application can access those remote HDFS clusters.
</td>
</tr>
<tr>
<td><code>spark.yarn.appMasterEnv.[EnvironmentVariableName]</code></td>
<td>(none)</td>
<td>
Add the environment variable specified by <code>EnvironmentVariableName</code> to the
Application Master process launched on YARN. The user can specify multiple of
these to set multiple environment variables. In yarn-cluster mode this controls
the environment of the Spark driver, and in yarn-client mode it only controls
the environment of the executor launcher.
</td>
</tr>
</table>
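As a sketch of the new `spark.yarn.appMasterEnv.[EnvironmentVariableName]` property in use (the variable and value below are hypothetical):

import org.apache.spark.SparkConf

// Hypothetical value, for illustration only. In yarn-cluster mode this
// variable reaches the Spark driver (which runs inside the Application
// Master); in yarn-client mode it only affects the executor launcher.
val conf = new SparkConf()
  .set("spark.yarn.appMasterEnv.HADOOP_CONF_DIR", "/etc/hadoop/conf")

Unlike SPARK_YARN_USER_ENV, these settings travel with the application's SparkConf, so they can vary per application without touching the submitting shell's environment.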
ClientBase.scala
@@ -259,6 +259,14 @@ trait ClientBase extends Logging {
localResources
}

  /** Get all application master environment variables set on this SparkConf */
  def getAppMasterEnv: Seq[(String, String)] = {
    val prefix = "spark.yarn.appMasterEnv."
    sparkConf.getAll.filter { case (k, v) => k.startsWith(prefix) }
      .map { case (k, v) => (k.substring(prefix.length), v) }
  }

  def setupLaunchEnv(
      localResources: HashMap[String, LocalResource],
      stagingDir: String): HashMap[String, String] = {
@@ -276,6 +284,11 @@
    distCacheMgr.setDistFilesEnv(env)
    distCacheMgr.setDistArchivesEnv(env)

    getAppMasterEnv.foreach { case (key, value) =>
      YarnSparkHadoopUtil.addToEnvironment(env, key, value, File.pathSeparator)
    }

    // Keep this for backwards compatibility but users should move to the config
    sys.env.get("SPARK_YARN_USER_ENV").foreach { userEnvs =>
      // Allow users to specify some environment variables.
      YarnSparkHadoopUtil.setEnvFromInputString(env, userEnvs, File.pathSeparator)
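For illustration, a self-contained sketch of the prefix-stripping that getAppMasterEnv performs above; the keys and values are hypothetical:

import org.apache.spark.SparkConf

object AppMasterEnvDemo {
  def main(args: Array[String]): Unit = {
    val prefix = "spark.yarn.appMasterEnv."
    val sparkConf = new SparkConf(loadDefaults = false)
      .set("spark.yarn.appMasterEnv.EXTRA_PATH", "/opt/tools/bin") // hypothetical
      .set("spark.app.name", "demo") // ignored: lacks the appMasterEnv prefix
    val appMasterEnv = sparkConf.getAll
      .filter { case (k, _) => k.startsWith(prefix) }
      .map { case (k, v) => (k.stripPrefix(prefix), v) }
    appMasterEnv.foreach { case (k, v) => println(s"$k -> $v") }
    // prints: EXTRA_PATH -> /opt/tools/bin
  }
}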
ExecutorRunnableUtil.scala
@@ -171,7 +171,11 @@ trait ExecutorRunnableUtil extends Logging {
    val extraCp = sparkConf.getOption("spark.executor.extraClassPath")
    ClientBase.populateClasspath(null, yarnConf, sparkConf, env, extraCp)

    // Allow users to specify some environment variables
    sparkConf.getExecutorEnv.foreach { case (key, value) =>
      YarnSparkHadoopUtil.addToEnvironment(env, key, value, File.pathSeparator)
    }

    // Keep this for backwards compatibility but users should move to the config
    YarnSparkHadoopUtil.setEnvFromInputString(env, System.getenv("SPARK_YARN_USER_ENV"),
      File.pathSeparator)

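Finally, a rough sketch of how the two mechanisms layer when the executor environment is assembled, assuming (not verified here) that YarnSparkHadoopUtil.addToEnvironment appends to an existing entry with the given separator; addToEnv and all names and values below are hypothetical stand-ins:

import java.io.File
import scala.collection.mutable

object ExecutorEnvDemo {
  // Assumed semantics: append to any existing entry using the separator.
  def addToEnv(env: mutable.HashMap[String, String], key: String, value: String, sep: String): Unit =
    env(key) = env.get(key).map(_ + sep + value).getOrElse(value)

  def main(args: Array[String]): Unit = {
    val env = new mutable.HashMap[String, String]()
    // New-style config entries (spark.executorEnv.*) are applied first ...
    addToEnv(env, "FOO", "bar", File.pathSeparator)
    // ... then deprecated SPARK_YARN_USER_ENV ("K=V,K=V") entries on top.
    "JAVA_HOME=/jdk64,FOO=baz".split(",").foreach { pair =>
      val Array(k, v) = pair.split("=", 2)
      addToEnv(env, k, v, File.pathSeparator)
    }
    println(env) // FOO becomes "bar" + separator + "baz" under the assumed semantics
  }
}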
