update shortest path test to make it fail on spark 1.5 and 1.6 #23
Conversation
I'm looking into this one.
Please note that due to the implementation strategy of `monotonicallyIncreasingId()`, the values it produces depend on how the data is partitioned. For example, on my 8-core laptop, where Spark's default parallelism is 8, the following code reproduces this issue:

```scala
import org.apache.spark.sql.functions._

val df = sqlContext
  .range(16) // 8 partitions, 2 elements per partition
  .select(
    col("id"),
    monotonicallyIncreasingId().as("long_id")
  )

df.show()

df.filter(col("id") === 3) // 2nd element in the 2nd partition
  .show()
```
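For background, here is a sketch (plain Python, based on the documented behavior of `monotonically_increasing_id`, not Spark source) of why the generated value depends on partitioning: the ID encodes the partition index in the upper bits and the record offset within the partition in the lower 33 bits, so the value a row receives is determined by which partition it lands in and how many rows precede it there.

```python
def monotonic_id(partition_index, row_offset):
    # Documented layout: partition ID in the upper 31 bits,
    # record number within the partition in the lower 33 bits.
    return (partition_index << 33) + row_offset

# With range(16) over 8 partitions (2 rows each), id = 3 is the
# 2nd element of the 2nd partition (partition index 1, offset 1):
print(monotonic_id(1, 1))  # (1 << 33) + 1 = 8589934593
```

If a filter is evaluated before this projection, fewer rows reach each partition, so the offsets, and therefore the generated IDs, change.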
@liancheng Could you create a Spark JIRA and post the link here? We will implement a workaround in graphframes while waiting for the official fix. Thanks!
…ministic field(s)

## What changes were proposed in this pull request?

Predicates shouldn't be pushed through project with nondeterministic field(s). See graphframes/graphframes#23 and SPARK-13473 for more details. This PR targets master, branch-1.6, and branch-1.5.

## How was this patch tested?

A test case is added in `FilterPushdownSuite`. It constructs a query plan where a filter is over a project with a nondeterministic field. Optimized query plan shouldn't change in this case.

Author: Cheng Lian <lian@databricks.com>

Closes #11348 from liancheng/spark-13473-no-ppd-through-nondeterministic-project-field.

(cherry picked from commit 3fa6491)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
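The guard described in this commit message can be sketched as follows (plain Python with hypothetical names, not Spark's actual optimizer classes): a predicate may only be pushed below a Project when every projected expression is deterministic.

```python
class Expr:
    """Minimal stand-in for a projected expression (hypothetical,
    not Spark's Expression class)."""
    def __init__(self, name, deterministic=True):
        self.name = name
        self.deterministic = deterministic

def can_push_filter_through_project(project_exprs):
    # The rule from the commit message: pushing a filter below a
    # Project is only safe if every projected field is deterministic.
    return all(e.deterministic for e in project_exprs)

plain = [Expr("id")]
with_mono_id = [Expr("id"),
                Expr("monotonicallyIncreasingId()", deterministic=False)]

print(can_push_filter_through_project(plain))         # True
print(can_push_filter_through_project(with_mono_id))  # False
```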
@mengxr Sorry, missed your last comment. JIRA link: https://issues.apache.org/jira/browse/SPARK-13473
This could be a bug in our integral ID mapping.

I can reproduce the bug in Spark 1.6. It seems that the `filter` is pushed down below `monotonicallyIncreasingId`, which is wrong. @rxin
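The effect of this reordering can be illustrated outside Spark with a toy model (plain Python, hypothetical helper names): assign a position-dependent ID, then compare filtering after the projection (the correct plan) with filtering before it (the incorrectly pushed-down plan).

```python
def assign_ids(rows):
    # Stand-in for a nondeterministic projection such as
    # monotonicallyIncreasingId(): the assigned value depends on the
    # row's position in the input at evaluation time.
    return [(r, i) for i, r in enumerate(rows)]

rows = list(range(16))

# Correct plan: project first, then filter.
correct = [t for t in assign_ids(rows) if t[0] == 3]

# Incorrectly pushed-down plan: filter first, then project.
pushed = assign_ids([r for r in rows if r == 3])

print(correct)  # [(3, 3)]
print(pushed)   # [(3, 0)] -- a different id for the same row
```

The surviving row keeps its original ID only when the filter stays above the projection, which is why pushing predicates through nondeterministic fields changes query results.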