[SPARK-21039][SPARK CORE] Use treeAggregate instead of aggregate in DataFrame.stat.bloomFilter

## What changes were proposed in this pull request?

Use `treeAggregate` instead of `aggregate` in `DataFrame.stat.bloomFilter` to parallelize merging the per-partition Bloom filters. With `aggregate`, every partition's filter is sent to the driver and merged there one after another. `treeAggregate` first combines partial filters on the executors in a multi-level tree (depth 2 by default), so the driver performs only the final merges.

## How was this patch tested?

Unit tests passed.

Author: Rishabh Bhardwaj <rbnext29@gmail.com>
Author: Rishabh Bhardwaj <admin@rishabh.local>
Author: Rishabh Bhardwaj <r0b00ko@rishabh.Dlink>
Author: Rishabh Bhardwaj <admin@Admins-MacBook-Pro.local>
Author: Rishabh Bhardwaj <r0b00ko@rishabh.local>

Closes apache#18263 from rishabhbhardwaj/SPARK-21039.
ianlcsd · Jun 13, 2017 · 9b2c877
1 parent 2aaed0a
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
@@ -551,7 +551,7 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
         )
     }
 
-    singleCol.queryExecution.toRdd.aggregate(zero)(
+    singleCol.queryExecution.toRdd.treeAggregate(zero)(
       (filter: BloomFilter, row: InternalRow) => {
         updater(filter, row)
         filter
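This one-line change is safe because the merge step does not depend on order or grouping. Merging two Bloom filters of the same size and hash count (in Spark, `BloomFilter.mergeInPlace`) ORs their bit arrays. OR is associative and commutative, so partial filters can be combined pairwise in a tree instead of one by one. The sketch below illustrates this with plain Scala collections standing in for RDD partitions. `SimpleBloom`, `TreeMergeDemo`, `buildPartition`, `flatMerge` and `treeMerge` are hypothetical names. `SimpleBloom` is a toy filter, not Spark's `org.apache.spark.util.sketch.BloomFilter`.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical toy Bloom filter for illustration only; it is NOT Spark's
// org.apache.spark.util.sketch.BloomFilter.
final case class SimpleBloom(bits: Vector[Boolean], numHashes: Int) {
  // Derive numHashes bit positions from two base hashes (double hashing).
  private def indexes(item: String): Seq[Int] = {
    val h1 = item.hashCode
    val h2 = MurmurHash3.stringHash(item)
    (0 until numHashes).map(i => Math.floorMod(h1 + i * h2, bits.length))
  }

  // Set every bit for this item; returns a new filter.
  def put(item: String): SimpleBloom =
    copy(bits = indexes(item).foldLeft(bits)((b, idx) => b.updated(idx, true)))

  // No false negatives: true for every item that was put.
  def mightContain(item: String): Boolean = indexes(item).forall(i => bits(i))

  // Merge = bitwise OR of equally sized bit arrays. OR is associative and
  // commutative, so partial filters can be combined in any grouping or order.
  def merge(other: SimpleBloom): SimpleBloom = {
    require(bits.length == other.bits.length && numHashes == other.numHashes)
    copy(bits = bits.zip(other.bits).map { case (a, b) => a || b })
  }
}

object TreeMergeDemo {
  val NumBits = 256
  val NumHashes = 3

  def empty: SimpleBloom = SimpleBloom(Vector.fill(NumBits)(false), NumHashes)

  // Per-partition step (the role seqOp plays): fold rows into one filter.
  def buildPartition(rows: Seq[String]): SimpleBloom = rows.foldLeft(empty)(_ put _)

  // aggregate-style: a single place merges every partial filter in sequence.
  def flatMerge(parts: Seq[SimpleBloom]): SimpleBloom = parts.foldLeft(empty)(_ merge _)

  // treeAggregate-style: merge groups of fanIn filters per level until one remains.
  def treeMerge(parts: Seq[SimpleBloom], fanIn: Int): SimpleBloom = {
    require(fanIn >= 2, "fanIn must be at least 2")
    var level: Seq[SimpleBloom] = parts
    while (level.size > 1)
      level = level.grouped(fanIn).map(_.reduce(_ merge _)).toVector
    level.headOption.getOrElse(empty)
  }
}
```

For example, splitting 200 strings into 8 partitions, building one filter per partition, and merging with either `flatMerge` or `treeMerge(parts, 2)` yields the same filter. The tree version only changes where the merge work happens, which in Spark means spreading it across executors instead of concentrating it on the driver.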