[SPARK-8437] [DOCS] Using directory path without wildcard for filenam…

…e slow for large number of files with wholeTextFiles and binaryFiles Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/' Author: Sean Owen <sowen@cloudera.com> Closes apache#7036 from srowen/SPARK-8437 and squashes the following commits: 0e813ae [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS implementations that 'dir/'
maropu · Jun 30, 2015 · 5d30eae · 5d30eae
1 parent fbf7573
commit 5d30eae
Showing 1 changed file with 6 additions and 2 deletions.
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -831,6 +831,8 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * }}}
    *
    * @note Small files are preferred, large file is also allowable, but may cause bad performance.
+   * @note On some filesystems, `.../path/*` can be a more efficient way to read all files in a directory
+   *       rather than `.../path/` or `.../path`
    *
    * @param minPartitions A suggestion value of the minimal splitting number for input data.
    */
@@ -878,9 +880,11 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    *   (a-hdfs-path/part-nnnnn, its content)
    * }}}
    *
-   * @param minPartitions A suggestion value of the minimal splitting number for input data.
-   *
    * @note Small files are preferred; very large files may cause bad performance.
+   * @note On some filesystems, `.../path/*` can be a more efficient way to read all files in a directory
+   *       rather than `.../path/` or `.../path`
+   *
+   * @param minPartitions A suggestion value of the minimal splitting number for input data.
    */
   @Experimental
   def binaryFiles(