
update shortest path test to make it fail on spark 1.5 and 1.6 #23

Merged
merged 2 commits into graphframes:master on Feb 24, 2016

Conversation

mengxr
Contributor

@mengxr mengxr commented Feb 24, 2016

This could be a bug in our integral ID mapping.

I can reproduce the bug in Spark 1.6:

val df = sqlContext.range(10).select(col("id"), monotonicallyIncreasingId().as("long_id"))
df.show()

+---+-----------+
| id|    long_id|
+---+-----------+
|  0|          0|
|  1|          1|
|  2| 8589934592|
|  3| 8589934593|
|  4| 8589934594|
|  5|17179869184|
|  6|17179869185|
|  7|25769803776|
|  8|25769803777|
|  9|25769803778|
+---+-----------+

df.filter(col("id") === 3).show()

+---+----------+
| id|   long_id|
+---+----------+
|  3|8589934592|
+---+----------+

It seems that the filter is pushed down below monotonicallyIncreasingId, which is wrong.

== Physical Plan ==
Project [id#837L,monotonicallyincreasingid() AS long_id#838L]
+- Filter (id#837L = 3)
   +- Scan ExistingRDD[id#837L]
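
The semantic consequence of the incorrect pushdown can be shown without Spark at all. In this plain-Scala toy model (hypothetical names), a row's id depends on its position in the input, just as monotonicallyIncreasingId depends on partition and offset, so evaluating the filter before or after the projection yields different results:

```scala
// Toy model of a position-dependent ("nondeterministic") projection:
// each row's long_id is its position in the input, mimicking
// monotonicallyIncreasingId's dependence on partition and offset.
def addIds(rows: Seq[Int]): Seq[(Int, Long)] =
  rows.zipWithIndex.map { case (id, pos) => (id, pos.toLong) }

val data = Seq(0, 1, 2, 3)

// Correct order: project first (assign ids), then filter.
val projectThenFilter = addIds(data).filter { case (id, _) => id == 3 }

// Incorrectly pushed-down order: filter first, then assign ids.
// The surviving row now sits at position 0, so it gets a different long_id.
val filterThenProject = addIds(data.filter(_ == 3))

println(projectThenFilter) // List((3,3))
println(filterThenProject) // List((3,0))
```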

@rxin

@liancheng

I'm looking into this one.

@liancheng

Please note that, due to the implementation strategy of MonotonicallyIncreasingID, the DataFrame's partition count is crucial for reproducing this bug. The filter must select an element that belongs to a partition other than the first, and that element must not be the first element within its partition.

For example, on my 8-core laptop, where Spark's default parallelism is 8, the following code reproduces the issue:

import org.apache.spark.sql.functions._

val df = sqlContext
  .range(16) // 8 partitions, 2 elements per partition
  .select(
    col("id"),
    monotonicallyIncreasingId().as("long_id")
  )

df.show()
df.filter(col("id") === 3) // 2nd element in the 2nd partition
  .show()
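
The dependence on partitioning follows from how the id is constructed: the Spark docs describe monotonicallyIncreasingId as packing the partition index into the upper 31 bits and the record offset within the partition into the lower 33 bits. A small helper (hypothetical name) reproduces the values seen in the output above:

```scala
// monotonicallyIncreasingId packs (partition index, offset within partition):
// the upper 31 bits hold the partition id, the lower 33 bits the record offset.
def monotonicId(partition: Int, offset: Long): Long =
  (partition.toLong << 33) | offset

// With 8 partitions of 2 elements, id = 3 is the 2nd element of partition 1:
println(monotonicId(1, 1)) // 8589934593

// After the bad pushdown it becomes the only (1st) element of its partition,
// which is exactly the value the filtered query printed:
println(monotonicId(1, 0)) // 8589934592
```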

@mengxr
Contributor Author

mengxr commented Feb 24, 2016

@liancheng Could you create a Spark JIRA and post the link here? We will implement a workaround in graphframes while waiting for the official fix. Thanks!
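
For reference, one possible shape such a workaround could take (a sketch only, not necessarily what graphframes ended up shipping): generate the ids through `RDD.zipWithUniqueId`, which materializes them on the RDD side before Catalyst can reorder a later filter around them. A `sqlContext` is assumed to be in scope.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Sketch: assign ids on the RDD side so they are fixed before any
// optimizer rewrite can move a filter below them.
val base = sqlContext.range(10)
val rowsWithIds = base.rdd.zipWithUniqueId().map { case (row, uid) =>
  Row.fromSeq(row.toSeq :+ uid)
}
val schema = StructType(
  base.schema.fields :+ StructField("long_id", LongType, nullable = false))
val stable = sqlContext.createDataFrame(rowsWithIds, schema)

// The filter can no longer change which long_id a row receives.
stable.filter(col("id") === 3).show()
```

Note that `zipWithUniqueId` does not produce consecutive or monotonically increasing ids across partitions, but for an integral vertex-id mapping uniqueness is what matters.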

@mengxr mengxr changed the title update the test to make it fail on spark 1.5 and 1.6 update shortest path test to make it fail on spark 1.5 and 1.6 Feb 24, 2016
mengxr added a commit that referenced this pull request Feb 24, 2016
update shortest path test to make it fail on spark 1.5 and 1.6
@mengxr mengxr merged commit 9cf9741 into graphframes:master Feb 24, 2016
asfgit pushed a commit to apache/spark that referenced this pull request Feb 25, 2016
…ministic field(s)

## What changes were proposed in this pull request?

Predicates shouldn't be pushed through project with nondeterministic field(s).

See graphframes/graphframes#23 and SPARK-13473 for more details.

This PR targets master, branch-1.6, and branch-1.5.

## How was this patch tested?

A test case is added in `FilterPushdownSuite`. It constructs a query plan in which a filter sits above a project with a nondeterministic field; the optimized query plan should be left unchanged in this case.

Author: Cheng Lian <lian@databricks.com>

Closes #11348 from liancheng/spark-13473-no-ppd-through-nondeterministic-project-field.
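
For readers following along, a test of this shape in `FilterPushdownSuite` plausibly looks like the sketch below, written against Catalyst's plan DSL. `testRelation`, `Optimize`, and `comparePlans` are suite-level helpers, and the exact `Rand` constructor may differ between branches, so treat this as an illustration rather than the committed test:

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.expressions.Rand

// A filter above a project containing a nondeterministic field (rand).
// The optimizer must leave this plan unchanged instead of pushing the
// filter below the project.
val originalQuery = testRelation
  .select('a, Rand(10).as('rand))
  .where('a > 5)
  .analyze

comparePlans(Optimize.execute(originalQuery), originalQuery)
```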
asfgit pushed a commit to apache/spark that referenced this pull request Feb 25, 2016
…ministic field(s)

(cherry picked from commit 3fa6491)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
asfgit pushed a commit to apache/spark that referenced this pull request Feb 25, 2016
…ministic field(s)

(cherry picked from commit 3fa6491)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@liancheng

@mengxr Sorry, missed your last comment. JIRA link: https://issues.apache.org/jira/browse/SPARK-13473
