WIP(iox-10577): patched df upgrade 2024-04-14 #12

Closed
appletreeisyellow wants to merge 22 commits

@appletreeisyellow commented Apr 24, 2024

⚠️ This will not be merged. ⚠️

All the patches below are already included in DataFusion as of its last commit on April 13, 2024!

This PR is based on #5 and #10, which include the following patches:

  1. Bring us up to DataFusion as of 2024-04-14

  2. PATCH: add the named struct patch

    commit 66f4fcb4664fc797ffb046d5b2ebcfca65ba4cd7
    Author: Andrew Lamb <andrew@nerdnetworks.org>
    Date:   Tue Apr 2 17:21:02 2024 -0400
    
        Use `struct` instead of `named_struct` when there are no aliases (#9897)
    
  3. PATCH: include the patch requested (per Slack) for the upstream coalesce bug.
    apache@4d85979 / coercion vec[Dictionary, Utf8] to Dictionary for coalesce function apache/datafusion#9958

    commit f0eec349a1abed14bcb2ee8a9fbf98bbb19b8f9a (HEAD -> iox-10350/df-upgrade-2024-03-31)
    Author: Lordworms <48054792+Lordworms@users.noreply.github.com>
    Date:   Fri Apr 5 15:57:48 2024 -0500
    
        coercion vec[Dictionary, Utf8] to Dictionary for coalesce function (#9958)
    
  4. PATCH: patch for the function re-writer, visiting subqueries within expressions.
    apache@e161cd6 / fix NamedStructField should be rewritten in OperatorToFunction in subquery regression (change ApplyFunctionRewrites to use TreeNode API) apache/datafusion#10032 (merged in DF on April 12, 2024)

    commit e8de1c612a986ae4b0348ce0a9d92f08d93c258c
    Author: Andrew Lamb <andrew@nerdnetworks.org>
    Date:   Wed Apr 10 11:14:02 2024 -0400
    
        fix NamedStructField should be rewritten in OperatorToFunction in subquery
    
  5. PATCH: cherry-picked apache@671cef8 / Prune pages are all null in ParquetExec by row_counts and fix NOT NULL prune apache/datafusion#10051 (merged in DF on April 13, 2024)
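For readers unfamiliar with patch 5, here is a minimal sketch of the idea behind pruning all-null pages by row count. It is not the code from apache/datafusion#10051: `PageStats` and `keep_pages_for_is_not_null` are hypothetical names made up for illustration, and the sketch assumes each page exposes a row count and a null count. When the two are equal, every value in the page is NULL, so a `col IS NOT NULL` filter can skip that page.

```rust
/// Hypothetical per-page statistics (an assumption for this sketch,
/// not DataFusion's actual types).
#[derive(Debug, Clone, Copy)]
struct PageStats {
    row_count: u64,
    null_count: u64,
}

impl PageStats {
    /// True when the page is non-empty and every row in it is NULL.
    fn is_all_null(&self) -> bool {
        self.row_count > 0 && self.null_count == self.row_count
    }
}

/// For each page, report whether it must still be read to evaluate
/// `col IS NOT NULL`. Pages that are entirely NULL can be skipped;
/// empty pages are conservatively kept.
fn keep_pages_for_is_not_null(pages: &[PageStats]) -> Vec<bool> {
    pages.iter().map(|p| !p.is_all_null()).collect()
}

fn main() {
    let pages = [
        PageStats { row_count: 100, null_count: 0 },   // no nulls: keep
        PageStats { row_count: 100, null_count: 100 }, // all null: prune
        PageStats { row_count: 100, null_count: 40 },  // mixed: keep
    ];
    println!("{:?}", keep_pages_for_is_not_null(&pages));
}
```

The sketch only prunes when the statistics prove that no row can match, which is the conservative direction for any statistics-based pruning: a page that is kept unnecessarily costs I/O, while a page that is pruned wrongly changes query results.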

alamb and others added 16 commits April 5, 2024 12:40
…che#9897)

* Revert "use alias (apache#9894)"

This reverts commit 9487ca0.

* Use `struct` instead of `named_struct` when there are no aliases

* Update docs

* fmt
…pache#9958)

* for debug

finish

remove print

add space

* fix clippy

* finish

* fix clippy
…L prune (apache#10051)

* Prune pages are all null in ParquetExec by row_counts
and fix NOT NULL prune

* fix clippy

* Update datafusion/core/src/physical_optimizer/pruning.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/core/tests/parquet/page_pruning.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/core/tests/parquet/page_pruning.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/core/tests/parquet/page_pruning.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/core/tests/parquet/page_pruning.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* remove allocate vec

* better way avoid allocate vec

* simply expr

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@github-actions bot added the core label Apr 24, 2024
@appletreeisyellow changed the title WIP(iox-10577): patched df upgrade 2024-04-TBD WIP(iox-10577): patched df upgrade 2024-04-14 Apr 24, 2024
@crepererum

Is the patch list still up-to-date? Based on the PR dates, many of them should be merged in the 2024-04-14 main version of DataFusion already.

@appletreeisyellow
Author

> Is the patch list still up-to-date? Based on the PR dates, many of them should be merged in the 2024-04-14 main version of DataFusion already.

@crepererum This patch list keeps a history of how we got to 2024-04-14 without breaking any CI pipeline. You are right! All the patches are already merged in the 2024-04-24 main version of DataFusion 🎉

I think it works the same whether the DataFusion version in the IOx Cargo.toml points to this forked influxdata/arrow-datafusion repo branch or to the apache/datafusion repo main. Since I plan to update the next patch for @wiedld, I'll just keep pointing to this forked repo.

@appletreeisyellow
Author

The upgrade is done. Closing

@appletreeisyellow deleted the chunchun/update-df-apr-week-2 branch April 29, 2024 16:41
wiedld pushed a commit that referenced this pull request Jul 17, 2024
* Add dialect param to use CHAR instead of TEXT for Utf8 unparsing for MySQL (#12)

* Configurable data type instead of flag for Utf8 unparsing

* Fix type in comment