Skip to content

Commit

Permalink
[SPARK-31308][PYSPARK] Merging pyFiles to files argument for Non-PySp…
Browse files Browse the repository at this point in the history
…ark applications

### What changes were proposed in this pull request?

This PR (SPARK-31308) proposed to add python dependencies even it is not Python applications.

### Why are the changes needed?

For now, we add `pyFiles` argument to `files` argument only for Python applications, in SparkSubmit. Like the reason in apache#21420, "for some Spark applications, though they're a java program, they require not only jar dependencies, but also python dependencies.", we need to add `pyFiles` to `files` even it is not Python applications.

### Does this PR introduce any user-facing change?

Yes. After this change, for non-PySpark applications, the Python files specified by `pyFiles` are also added to `files` like PySpark applications.

### How was this patch tested?

Manually test on jupyter notebook or do `spark-submit` with `--verbose`.

```
Spark config:
...
(spark.files,file:/Users/dongjoon/PRS/SPARK-PR-28077/a.py)
(spark.submit.deployMode,client)
(spark.master,local[*])
```

Closes apache#28077 from viirya/pyfile.

Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
  • Loading branch information
2 people authored and Seongjin Cho committed Apr 14, 2020
1 parent f6de6b8 commit f014af4
Showing 1 changed file with 6 additions and 4 deletions.
10 changes: 6 additions & 4 deletions core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
Original file line number Diff line number Diff line change
Expand Up @@ -474,10 +474,12 @@ private[spark] class SparkSubmit extends Logging {
args.mainClass = "org.apache.spark.deploy.PythonRunner"
args.childArgs = ArrayBuffer(localPrimaryResource, localPyFiles) ++ args.childArgs
}
if (clusterManager != YARN) {
// The YARN backend handles python files differently, so don't merge the lists.
args.files = mergeFileLists(args.files, args.pyFiles)
}
}

// Non-PySpark applications can need Python dependencies.
if (deployMode == CLIENT && clusterManager != YARN) {
// The YARN backend handles python files differently, so don't merge the lists.
args.files = mergeFileLists(args.files, args.pyFiles)
}

if (localPyFiles != null) {
Expand Down

0 comments on commit f014af4

Please sign in to comment.