Properly support Spark Connect filter pushdown #1186

phillipleblanc · 2024-04-23T14:57:57Z

Fixes a bug in the Databricks Data Connector where filters were not being pushed down to Databricks. Queries were effectively executed as `SELECT * FROM big_table`, with the filtering then done in memory, which made them inefficient and caused some timeouts.
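To illustrate what "pushing filters down" means here: instead of fetching every row and filtering locally, the connector folds the filters into the SQL it sends to the remote engine. In a DataFusion `TableProvider`, pushed-down filters arrive as the `filters` argument of `scan`. The sketch below is a minimal, hypothetical illustration and not this PR's code: `Filter` is a made-up, simplified expression type (not DataFusion's `Expr`), `build_query` is a hypothetical helper, and column names are assumed to be plain identifiers that need no quoting.

```rust
/// Hypothetical, simplified filter expression used only for this sketch.
/// It is NOT DataFusion's `Expr` type.
#[derive(Debug, Clone)]
pub enum Filter {
    /// Column reference; assumed to be a plain identifier needing no quoting.
    Column(String),
    /// Integer literal.
    Int(i64),
    /// Binary comparison, e.g. `=`, `>`, `<=`.
    Cmp {
        left: Box<Filter>,
        op: &'static str,
        right: Box<Filter>,
    },
}

impl Filter {
    /// Render this filter as a SQL fragment.
    pub fn to_sql(&self) -> String {
        match self {
            Filter::Column(name) => name.clone(),
            Filter::Int(v) => v.to_string(),
            Filter::Cmp { left, op, right } => {
                format!("{} {} {}", left.to_sql(), op, right.to_sql())
            }
        }
    }
}

/// Build the query sent to the remote engine. Each pushed-down filter is
/// parenthesized and the filters are ANDed into a WHERE clause, so the remote
/// side does the filtering. With no filters, this is the full scan
/// (`SELECT * FROM <table>`) described above.
pub fn build_query(table: &str, filters: &[Filter]) -> String {
    let mut sql = format!("SELECT * FROM {table}");
    if !filters.is_empty() {
        let predicates: Vec<String> = filters
            .iter()
            .map(|f| format!("({})", f.to_sql()))
            .collect();
        sql.push_str(" WHERE ");
        sql.push_str(&predicates.join(" AND "));
    }
    sql
}
```

For example, a filter `id > 100` on `big_table` yields `SELECT * FROM big_table WHERE (id > 100)`, so only matching rows leave the remote engine.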

I attempted to use DataFusion's built-in unparser, but ran into an issue with column quoting, so I raised a PR to fix it upstream: apache/datafusion#10198
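Column quoting matters when generating SQL for a remote engine: an unquoted column name such as `order date` (which contains a space) would not parse. The sketch below shows one way to quote identifiers, assuming Spark SQL's convention of wrapping identifiers in backticks and doubling any embedded backtick. It is a hypothetical illustration, not the DataFusion unparser's code and not a description of the specific upstream issue; `quote_identifier` and `eq_predicate` are made-up helper names.

```rust
/// Quote a column name as a Spark SQL identifier (assumed convention):
/// wrap it in backticks and double any backtick inside the name.
pub fn quote_identifier(name: &str) -> String {
    format!("`{}`", name.replace('`', "``"))
}

/// Render a hypothetical `column = <integer>` predicate with the column
/// name quoted, so names with spaces or backticks stay valid SQL.
pub fn eq_predicate(column: &str, value: i64) -> String {
    format!("{} = {}", quote_identifier(column), value)
}
```

Quoting every identifier unconditionally keeps the helper simple and avoids having to decide which names are "safe" to leave bare.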

Once that PR lands and is included in a new DataFusion release, we can start removing our own `expr::to_sql` method.

ablyler · 2024-04-23T15:16:57Z

I've confirmed that this fixes the issue I've seen. Thanks for the quick fix!

Properly support Spark Connect filter pushdown

e7dc673

phillipleblanc added the kind/bug Something isn't working label Apr 23, 2024

phillipleblanc added this to the v0.12-alpha milestone Apr 23, 2024

phillipleblanc self-assigned this Apr 23, 2024

phillipleblanc requested a review from a team as a code owner April 23, 2024 14:57

digadeesh approved these changes Apr 23, 2024


digadeesh merged commit 3792065 into trunk Apr 24, 2024
16 checks passed

digadeesh deleted the phillip/240423-spark-push-down branch April 24, 2024 00:20

digadeesh modified the milestones: v0.12-alpha, v0.11.2-alpha Apr 24, 2024
